Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models,
2026,
arXiv
Siyan Zhao, Zhihui Xie, Mengchen Liu, Jing Huang, Guan Pang, Feiyu Chen, Aditya Grover
Dream-Coder 7B: An Open Diffusion Language Model for Code,
2025,
arXiv
Zhihui Xie*, Jiacheng Ye*, Lin Zheng*, Jiahui Gao, Jingwei Dong, Zirui Wu, Xueliang Zhao, Shansan Gong, Xin Jiang, Zhenguo Li, Lingpeng Kong
Dream 7B: Diffusion Large Language Models,
2025,
arXiv
Jiacheng Ye*, Zhihui Xie*, Lin Zheng*, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong
POLARIS: A POst-training recipe for scaling reinforcement Learning on Advanced ReasonIng modelS,
2025,
Blog
Chenxin An, Zhihui Xie, Xiaonan Li, Lei Li, Jun Zhang, Shansan Gong, Ming Zhong, Jingjing Xu, Xipeng Qiu, Mingxuan Wang, Lingpeng Kong
Teaching Language Models to Critique via Reinforcement Learning,
2025,
ICML
Zhihui Xie, Jie Chen, Liyu Chen, Weichao Mao, Jingjing Xu, Lingpeng Kong
Learning Versatile Skills with Curriculum Masking,
2024,
NeurIPS
Yao Tang*, Zhihui Xie*, Zichuan Lin, Deheng Ye, Shuai Li
VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models,
2024,
CVPR
Lei Li*, Yuancheng Wei*, Zhihui Xie*, Xuqing Yang*, Yifan Song, Peiyi Wang, Chenxin An, Tianyu Liu, Sujian Li, Bill Yuchen Lin, Lingpeng Kong, Qi Liu
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment,
2024,
EMNLP
Lei Li*, Zhihui Xie*, Mukai Li, Shunian Chen, Peiyi Wang, Liang Chen, Yazheng Yang, Benyou Wang, Lingpeng Kong
Calibrating Reasoning in Language Models with Internal Consistency,
2024,
NeurIPS
Zhihui Xie, Jizhou Guo, Tong Yu, Shuai Li
Future-conditioned Unsupervised Pretraining for Decision Transformer,
2023,
ICML
Zhihui Xie, Zichuan Lin, Deheng Ye, Qiang Fu, Wei Yang, Shuai Li
Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations,
2022,
EMNLP
Zhihui Xie, Handong Zhao, Tong Yu, Shuai Li
Comparison-based Conversational Recommender System with Relative Bandit Feedback,
2021,
SIGIR
Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li