Zhihui Xie (谢知晖)
Ph.D. Student, The University of Hong Kong
zhxieml@gmail.com

I am a second-year Ph.D. student at HKU, advised by Lingpeng Kong and Qi Liu.

I am particularly interested in building scalable methods that enable models to produce useful feedback, learn effectively from that feedback, and improve their reasoning and decision-making over time.

Previously, I obtained my Master’s degree from Shanghai Jiao Tong University, under the supervision of Shuai Li. I received my Bachelor’s degree from the IEEE Honor Class at Shanghai Jiao Tong University, where I was fortunate to work with Junchi Yan.

News

Selected Works

Please find more on my Google Scholar profile.
Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models, 2026, arXiv
Siyan Zhao, Zhihui Xie, Mengchen Liu, Jing Huang, Guan Pang, Feiyu Chen, Aditya Grover
Dream-Coder 7B: An Open Diffusion Language Model for Code, 2025, arXiv
Zhihui Xie*, Jiacheng Ye*, Lin Zheng*, Jiahui Gao, Jingwei Dong, Zirui Wu, Xueliang Zhao, Shansan Gong, Xin Jiang, Zhenguo Li, Lingpeng Kong
Dream 7B: Diffusion Large Language Models, 2025, arXiv
Jiacheng Ye*, Zhihui Xie*, Lin Zheng*, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong
POLARIS: A POst-training recipe for scaling reinforcement Learning on Advanced ReasonIng modelS, 2025, Blog
Chenxin An, Zhihui Xie, Xiaonan Li, Lei Li, Jun Zhang, Shansan Gong, Ming Zhong, Jingjing Xu, Xipeng Qiu, Mingxuan Wang, Lingpeng Kong
Teaching Language Models to Critique via Reinforcement Learning, 2025, ICML
Zhihui Xie, Jie Chen, Liyu Chen, Weichao Mao, Jingjing Xu, Lingpeng Kong
Learning Versatile Skills with Curriculum Masking, 2024, NeurIPS
Yao Tang*, Zhihui Xie*, Zichuan Lin, Deheng Ye, Shuai Li
VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models, 2024, CVPR
Lei Li*, Yuancheng Wei*, Zhihui Xie*, Xuqing Yang*, Yifan Song, Peiyi Wang, Chenxin An, Tianyu Liu, Sujian Li, Bill Yuchen Lin, Lingpeng Kong, Qi Liu
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment, 2024, EMNLP
Lei Li*, Zhihui Xie*, Mukai Li, Shunian Chen, Peiyi Wang, Liang Chen, Yazheng Yang, Benyou Wang, Lingpeng Kong
Calibrating Reasoning in Language Models with Internal Consistency, 2024, NeurIPS
Zhihui Xie, Jizhou Guo, Tong Yu, Shuai Li
Future-conditioned Unsupervised Pretraining for Decision Transformer, 2023, ICML
Zhihui Xie, Zichuan Lin, Deheng Ye, Qiang Fu, Wei Yang, Shuai Li
Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations, 2022, EMNLP
Zhihui Xie, Handong Zhao, Tong Yu, Shuai Li
Comparison-based Conversational Recommender System with Relative Bandit Feedback, 2021, SIGIR
Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li