Runzhe Wu

Hi, I'm Runzhe Wu, a Ph.D. candidate in Computer Science at Cornell University, based at the Cornell Tech campus in New York City. I am advised by Wen Sun. My current research focuses on reinforcement learning, particularly its core methodology and its interplay with language models.

Before Cornell Tech, I spent the first three years of my Ph.D. at Cornell's Ithaca campus. Prior to that, I obtained my Bachelor's degree in Computer Science from the ACM Honors Class at Shanghai Jiao Tong University, where I conducted research at the APEX Lab under Weinan Zhang and Yong Yu.

You can reach me via email at rw646 at cornell dot edu.

CV  /  Google Scholar  /  X (Twitter)

Research
Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
Runzhe Wu, Ankur Samanta, Ayush Jain, Scott Fujimoto, Jeongyeol Kwon, Ben Kretzu, Youliang Yu, Kaveh Hassani, Boris Vidolov, Yonathan Efroni
Preprint, 2025
Internalizing Self-Consistency in Language Models: Multi-Agent Consensus Alignment
Ankur Samanta, Akshayaa Magesh, Youliang Yu, Runzhe Wu, Ayush Jain, Daniel Jiang, Boris Vidolov, Paul Sajda, Yonathan Efroni, Kaveh Hassani
Preprint, 2025
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
Runzhe Wu*, Ayush Sekhari*, Akshay Krishnamurthy, Wen Sun
ICLR, 2025 (Oral — top 1.8%)
[Talk at RL Theory Seminars]
Diffusing States and Matching Scores: A New Framework for Imitation Learning
Runzhe Wu, Yiding Chen, Gokul Swamy, Kianté Brantley, Wen Sun
ICLR, 2025
[Code]
Making RL with Preference-based Feedback Efficient via Randomization
Runzhe Wu, Wen Sun
ICLR, 2024
Contextual Bandits and Imitation Learning via Preference-Based Active Queries
(Alphabetical order) Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
NeurIPS, 2023 (Also at the ILHF & MFPL Workshops @ ICML, 2023)
[Code]
Selective Sampling and Imitation Learning via Online Regression
(Alphabetical order) Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
NeurIPS, 2023 (Also at the ILHF Workshop @ ICML, 2023)
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning
Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun
NeurIPS, 2023
[Code]
Distributional Offline Policy Evaluation with Predictive Error Guarantees
Runzhe Wu, Masatoshi Uehara, Wen Sun
ICML, 2023
[Code]
MALib: A parallel framework for population-based multi-agent reinforcement learning
Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang
JMLR, 2023
[Website]   [Code]
Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration
Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang
NeurIPS, 2021
Invited Talks
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
@ ICLR 2025, Oral Presentation (Apr 26, 2025)   [Slides]   [Recording (1:12:00-1:25:00)]
@ RL Theory Seminars (Nov 19, 2024)   [Slides]   [Recording]
Experience
Meta

Research Scientist Intern

Worked on RL post-training of language models, with an emphasis on multi-task learning.

May 2025 - Aug. 2025

Education
Cornell Tech

Ph.D. in Computer Science

(Transferred from the Ithaca campus of Cornell)

May 2025 - Present

Cornell University

Ph.D. in Computer Science

(M.S. earned en route, then transferred to Cornell Tech campus)

Aug. 2022 - May 2025

Shanghai Jiao Tong University

B.Eng. in Computer Science

Sep. 2018 - Jun. 2022



Website template from here.