Zhihui Xie (谢知晖)
Ph.D. Student, The University of Hong Kong
zhxieml@gmail.com

I am a second-year Ph.D. student at HKU, advised by Lingpeng Kong and Qi Liu.

I am particularly interested in building scalable methods that enable models to produce useful feedback, learn effectively from that feedback, and improve their reasoning and decision-making over time.

Previously, I obtained my Master’s degree from Shanghai Jiao Tong University under the supervision of Shuai Li. I received my Bachelor’s degree from the IEEE Honor Class at Shanghai Jiao Tong University, where I was fortunate to work with Junchi Yan.

I always enjoy talking with people from different backgrounds. If you are interested in my work or would simply like to connect, feel free to reach out via WeChat.

Selected Works

Please find more on my Google Scholar profile.
Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models, 2026, arXiv
Siyan Zhao, Zhihui Xie, Mengchen Liu, Jing Huang, Guan Pang, Feiyu Chen, Aditya Grover
Dream-Coder 7B: An Open Diffusion Language Model for Code, 2025, arXiv
Zhihui Xie*, Jiacheng Ye*, Lin Zheng*, Jiahui Gao, Jingwei Dong, Zirui Wu, Xueliang Zhao, Shansan Gong, Xin Jiang, Zhenguo Li, Lingpeng Kong
Dream 7B: Diffusion Large Language Models, 2025, arXiv
Jiacheng Ye*, Zhihui Xie*, Lin Zheng*, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong
POLARIS: A POst-training recipe for scaling reinforcement Learning on Advanced ReasonIng modelS, 2025, Blog
Chenxin An, Zhihui Xie, Xiaonan Li, Lei Li, Jun Zhang, Shansan Gong, Ming Zhong, Jingjing Xu, Xipeng Qiu, Mingxuan Wang, Lingpeng Kong
Teaching Language Models to Critique via Reinforcement Learning, 2025, ICML
Zhihui Xie, Jie Chen, Liyu Chen, Weichao Mao, Jingjing Xu, Lingpeng Kong
Learning Versatile Skills with Curriculum Masking, 2024, NeurIPS
Yao Tang*, Zhihui Xie*, Zichuan Lin, Deheng Ye, Shuai Li
VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models, 2024, CVPR
Lei Li*, Yuancheng Wei*, Zhihui Xie*, Xuqing Yang*, Yifan Song, Peiyi Wang, Chenxin An, Tianyu Liu, Sujian Li, Bill Yuchen Lin, Lingpeng Kong, Qi Liu
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment, 2024, EMNLP
Lei Li*, Zhihui Xie*, Mukai Li, Shunian Chen, Peiyi Wang, Liang Chen, Yazheng Yang, Benyou Wang, Lingpeng Kong
Calibrating Reasoning in Language Models with Internal Consistency, 2024, NeurIPS
Zhihui Xie, Jizhou Guo, Tong Yu, Shuai Li
Future-conditioned Unsupervised Pretraining for Decision Transformer, 2023, ICML
Zhihui Xie, Zichuan Lin, Deheng Ye, Qiang Fu, Wei Yang, Shuai Li
Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations, 2022, EMNLP
Zhihui Xie, Handong Zhao, Tong Yu, Shuai Li
Comparison-based Conversational Recommender System with Relative Bandit Feedback, 2021, SIGIR
Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li