Runzhe Wu

Hi, I'm Runzhe Wu, a Ph.D. candidate in Computer Science at Cornell University, based at the Cornell Tech campus in New York City. I am advised by Wen Sun. My current research focuses on reinforcement learning, particularly its core methodology and its interplay with language models.

Before Cornell Tech, I spent the first three years of my Ph.D. at Cornell's Ithaca campus. Prior to that, I obtained my Bachelor's degree in Computer Science from the ACM Honors Class at Shanghai Jiao Tong University, where I conducted research at the APEX Lab under Weinan Zhang and Yong Yu.

You can reach me via email at rw646 at cornell dot edu.

CV  /  Google Scholar  /  X (Twitter)

Research
Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
Runzhe Wu, Ankur Samanta, Ayush Jain, Scott Fujimoto, Jeongyeol Kwon, Ben Kretzu, Youliang Yu, Kaveh Hassani, Boris Vidolov, Yonathan Efroni
Preprint, 2025
Internalizing Self-Consistency in Language Models: Multi-Agent Consensus Alignment
Ankur Samanta, Akshayaa Magesh, Youliang Yu, Runzhe Wu, Ayush Jain, Daniel Jiang, Boris Vidolov, Paul Sajda, Yonathan Efroni, Kaveh Hassani
Preprint, 2025
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
Runzhe Wu*, Ayush Sekhari*, Akshay Krishnamurthy, Wen Sun
ICLR, 2025 (Oral — top 1.8%)
[Talk at RL Theory Seminars]
Diffusing States and Matching Scores: A New Framework for Imitation Learning
Runzhe Wu, Yiding Chen, Gokul Swamy, Kianté Brantley, Wen Sun
ICLR, 2025
[Code]
Making RL with Preference-based Feedback Efficient via Randomization
Runzhe Wu, Wen Sun
ICLR, 2024
Contextual Bandits and Imitation Learning via Preference-Based Active Queries
(Alphabetical order) Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
NeurIPS, 2023 (Also at the ILHF & MFPL Workshops @ ICML, 2023)
[Code]
Selective Sampling and Imitation Learning via Online Regression
(Alphabetical order) Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
NeurIPS, 2023 (Also at the ILHF Workshop @ ICML, 2023)
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning
Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun
NeurIPS, 2023
[Code]
Distributional Offline Policy Evaluation with Predictive Error Guarantees
Runzhe Wu, Masatoshi Uehara, Wen Sun
ICML, 2023
[Code]
MALib: A parallel framework for population-based multi-agent reinforcement learning
Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang
JMLR, 2023
[Website]   [Code]
Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration
Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang
NeurIPS, 2021
Invited Talks
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
@ ICLR 2025, Oral Presentation (Apr 26, 2025)   [Slides]   [Recording (1:12:00-1:25:00)]
@ RL Theory Seminars (Nov 19, 2024)   [Slides]   [Recording]
Experience
Meta

Research Scientist Intern

Worked on RL post-training of language models, with an emphasis on multi-task learning.

May 2025 - Aug. 2025

Education
Cornell Tech

Ph.D. in Computer Science

(Transferred from the Ithaca campus of Cornell)

May 2025 - Present

Cornell University

Ph.D. in Computer Science

(M.S. earned en route, then transferred to Cornell Tech campus)

Aug. 2022 - May 2025

Shanghai Jiao Tong University

B.Eng. in Computer Science

Sep. 2018 - Jun. 2022



Website template from here.