2025

Beyond Text-to-Image: Liberating Generation with a Unified Discrete Diffusion Model

Qingyu Shi, Jinbin Bai, Zhuoran Zhao, Wenhao Chai, Kaidong Yu, Jianzong Wu, Shuangyong Song, Yunhai Tong, Xiangtai Li, Xuelong Li, Shuicheng Yan

Under review. 2025

We introduce Muddit, a unified discrete diffusion transformer that enables fast, parallel generation across both text and image modalities. Empirical results show that Muddit matches or surpasses significantly larger autoregressive models in both generation quality and efficiency.
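
For intuition, the following is a minimal sketch of the mask-and-unmask parallel decoding loop that discrete diffusion generators of this kind rely on. The function name, model interface, and confidence schedule are illustrative assumptions, not Muddit's actual API.

import torch

@torch.no_grad()
def discrete_diffusion_decode(model, seq_len, mask_id, steps=8):
    # Illustrative sketch, not Muddit's code: every position starts as
    # [MASK]; each step the model predicts all tokens in one forward
    # pass and the most confident predictions are kept, so a sequence
    # is generated in a few parallel steps rather than token by token.
    tokens = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = model(tokens)                        # (1, L, vocab), one pass
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        still_masked = tokens.eq(mask_id)
        conf = conf.masked_fill(~still_masked, -1.0)  # only fill masked slots
        target = seq_len * (step + 1) // steps        # tokens revealed so far
        n_fill = target - int((~still_masked).sum())
        if n_fill > 0:
            idx = conf.topk(n_fill, dim=-1).indices
            tokens.scatter_(1, idx, pred.gather(1, idx))
    return tokens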

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers For Motion Transfer

Qingyu Shi, Jianzong Wu, Jinbin Bai, Jiangning Zhang, Lu Qi, Yunhai Tong, Xiangtai Li

ICCV 2025 Poster

In this paper, we propose DeT, a method that adapts video Diffusion Transformers (DiTs) to improve their motion transfer ability. Our approach introduces a simple yet effective temporal kernel that smooths DiT features along the temporal dimension, facilitating the decoupling of foreground motion from background appearance.
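
As a rough illustration (a sketch, not the paper's implementation), such a temporal kernel can be thought of as a 1D smoothing of per-token features along the frame axis; the Gaussian form and tensor layout below are assumptions made for the example.

import torch
import torch.nn.functional as F

def temporal_smooth(features, kernel_size=5, sigma=1.0):
    # Illustrative sketch, not the authors' code: smooth per-token
    # features along the frame axis with a 1D Gaussian. Slowly varying
    # appearance survives the low-pass filter while high-frequency
    # temporal variation is attenuated, which matches the intuition of
    # separating motion from background appearance described above.
    # features: (B, T, N, C) = batch, frames, spatial tokens, channels
    b, t, n, c = features.shape
    x = features.permute(0, 2, 3, 1).reshape(b * n * c, 1, t)
    coords = torch.arange(kernel_size, dtype=torch.float32) - kernel_size // 2
    kernel = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    kernel = (kernel / kernel.sum()).view(1, 1, -1)
    x = F.conv1d(x, kernel, padding=kernel_size // 2)
    return x.reshape(b, n, c, t).permute(0, 3, 1, 2)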

2024

DreamRelation: Bridging Customization and Relation Generation

Qingyu Shi, Lu Qi, Jianzong Wu, Jinbin Bai, Jingbo Wang, Yunhai Tong, Xiangtai Li

CVPR 2025 Oral

We introduce DreamRelation, a framework that disentangles identity and relation learning using a carefully curated dataset. Our training data consists of relation-specific images, independent object images containing identity information, and text prompts to guide relation generation. Extensive results on our proposed benchmarks demonstrate the superiority of DreamRelation in generating precise relations while preserving object identities across a diverse set of objects and relationships.
