Qingyu Shi (史清宇)
Ph.D. Student

I am Qingyu Shi, a third-year Ph.D. candidate at Peking University (PKU) under the supervision of Professor Yunhai Tong. Before that, I received my Bachelor's degree from Beijing Institute of Technology (BIT) in July 2023.

My research interests lie in multimodal generation, with a particular focus on diffusion models and unified models.


Education
  • Peking University
    School of Intelligence Science and Technology
    Ph.D. Student
    Sep. 2023 - present
  • Beijing Institute of Technology
    B.S., School of Mathematics and Statistics
    Sep. 2019 - Jul. 2023
Experience
  • ByteDance
    Research Intern
    Dec. 2025 - present
  • TeleAI
    Research Intern
    Mar. 2025 - Dec. 2025
Honors & Awards
  • CETC 14th Research Institute Glarun Scholarship
    2024
  • Merit Student
    2024
  • Beijing Excellent Graduate
    2023
  • National Scholarship
    2022
  • National Scholarship
    2021
News
2025
  • Jun 26: DeT is accepted by ICCV 2025.
  • May 28: We release Muddit, the first unified discrete diffusion model built on strong visual priors.
  • Mar 14: We release DeT, a state-of-the-art motion transfer method built on the video Diffusion Transformer (DiT) architecture.
  • Feb 28: DreamRelation is accepted by CVPR 2025. Congrats!
  • Jan 22: RMP-SAM is accepted by ICLR 2025. Congrats!
2024
  • Nov 20: We release DreamRelation, the first method to address the relation-aware customization task.
  • Jan 18: We release RMP-SAM, the first real-time multi-purpose segment anything model.
2023
  • Sep 01: Starting my Ph.D. journey at the School of Intelligence Science and Technology, PKU!
  • Jul 01: Obtained my B.S. degree from the School of Mathematics and Statistics, Beijing Institute of Technology.
Selected Projects and Papers
Beyond Text-to-Image: Liberating Generation with a Unified Discrete Diffusion Model

Qingyu Shi, Jinbin Bai, Zhuoran Zhao, Wenhao Chai, Kaidong Yu, Jianzong Wu, Shuangyong Song, Yunhai Tong, Xiangtai Li, Xuelong Li, Shuicheng YAN

Under review, 2025

We introduce Muddit, a unified discrete diffusion transformer that enables fast and parallel generation across both text and image modalities. Empirical results show that Muddit achieves competitive or superior performance compared to significantly larger autoregressive models in both quality and efficiency.

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer

Qingyu Shi, Jianzong Wu, Jinbin Bai, Jiangning Zhang, Lu Qi, Yunhai Tong, Xiangtai Li

ICCV 2025 Poster

In this paper, we propose DeT, a method that adapts DiT models to improve their motion transfer ability. Our approach introduces a simple yet effective temporal kernel that smooths DiT features along the temporal dimension, facilitating the decoupling of foreground motion from background appearance.

DreamRelation: Bridging Customization and Relation Generation

Qingyu Shi, Lu Qi, Jianzong Wu, Jinbin Bai, Jingbo Wang, Yunhai Tong, Xiangtai Li

CVPR 2025 Oral

We introduce DreamRelation, a framework that disentangles identity and relation learning using a carefully curated dataset. Our training data consists of relation-specific images, independent object images containing identity information, and text prompts to guide relation generation. Extensive results on our proposed benchmarks demonstrate the superiority of DreamRelation in generating precise relations while preserving object identities across a diverse set of objects and relationships.
