Teaching Language Models to Critique via Reinforcement Learning, 2025, arXiv
Zhihui Xie, Jie Chen, Liyu Chen, Weichao Mao, Jingjing Xu, Lingpeng Kong
Learning Versatile Skills with Curriculum Masking, 2024, NeurIPS
Yao Tang*, Zhihui Xie*, Zichuan Lin, Deheng Ye, Shuai Li
VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models, 2024, arXiv
Lei Li*, Yuancheng Wei*, Zhihui Xie*, Xuqing Yang*, Yifan Song, Peiyi Wang, Chenxin An, Tianyu Liu, Sujian Li, Bill Yuchen Lin, Lingpeng Kong, Qi Liu
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment, 2024, EMNLP
Lei Li*, Zhihui Xie*, Mukai Li, Shunian Chen, Peiyi Wang, Liang Chen, Yazheng Yang, Benyou Wang, Lingpeng Kong
Calibrating Reasoning in Language Models with Internal Consistency, 2024, NeurIPS
Zhihui Xie, Jizhou Guo, Tong Yu, Shuai Li
Future-conditioned Unsupervised Pretraining for Decision Transformer, 2023, ICML
Zhihui Xie, Zichuan Lin, Deheng Ye, Qiang Fu, Wei Yang, Shuai Li
Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations, 2022, EMNLP
Zhihui Xie, Handong Zhao, Tong Yu, Shuai Li
Doubly-Adaptive Reinforcement Learning for Cross-Domain Interactive Recommendation, 2022, SIGIR
Junda Wu*, Zhihui Xie*, Tong Yu, Handong Zhao, Ruiyi Zhang, Shuai Li
Comparison-based Conversational Recommender System with Relative Bandit Feedback, 2021, SIGIR
Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li