Hi, I'm Zhi Wang, an Associate Professor and PhD Supervisor at Nanjing University. I received my PhD from City University of Hong Kong and my Bachelor's degree from Nanjing University. I was a visiting scholar at the University of New South Wales, Nanyang Technological University, and the Chinese Academy of Sciences.

My research focuses on developing general-purpose decision-making agents across reinforcement learning, post-training and alignment of foundation models, vision-language-action models, and embodied AI. My long-term goal is to bridge foundation models and embodied intelligence, enabling agents to perceive, reason, decide, and act autonomously in complex real-world environments.

Research

Foundation-Model Decision Agents

Grounding language and multimodal foundation models as agents that retrieve evidence, plan hierarchically, and make decisions in interactive environments.

Language Agents · Vision-Language-Action Models · Multimodal Decision-Making

Post-Training & Alignment of Foundation Models

Using reinforcement learning to improve reasoning, exploration, and output diversity in language and generative models through principled reward and optimization design.

Reinforcement Fine-Tuning · Reasoning · Generative Models

Self-Improving Agents

Designing agents that autonomously refine their evidence, reasoning, and tool-use workflows through iterative interaction and feedback.

Agentic RAG · Agent Harness

Generalizable & Continual Reinforcement Learning

Developing agents that generalize across tasks and adapt over time through in-context, offline, meta, continual, and lifelong reinforcement learning.

In-Context RL · Offline & Meta-RL · Continual Learning

News

  1. ICML 2026

    Two papers on RL for LLM reasoning and agentic RAG were accepted, including one Oral paper.

  2. ICLR 2026

    Three papers on RL for LLM reasoning and in-context RL were accepted.

  3. NeurIPS 2025

    Three papers on in-context RL, language agents, and RL for LLM reasoning were accepted.

  4. ICML 2025

    One paper on hierarchical LLM agents was accepted.

  5. TPAMI

    One paper on interpretable multi-agent reinforcement learning was accepted.

Earlier news
  1. NeurIPS 2024

    One paper on generalist RL agents was accepted.

  2. ICLR 2024

    One paper on efficient multi-agent RL coordination was accepted.

Representative Publications

Foundation-Model Decision Agents

Post-Training & Alignment of Foundation Models

Figure: ExGRPO method overview
ICLR 2026

ExGRPO: Learning to Reason from Experience

Runzhe Zhan, Yafu Li, Zhi Wang, Xiaoye Qu, Dongrui Liu, Jing Shao, Derek F. Wong, Yu Cheng

Valuable reasoning experiences are prioritized using rollout correctness and entropy to balance exploration and exploitation.

Self-Improving Agents

Generalizable & Continual Reinforcement Learning

Figure: Scalable in-context Q-learning framework
ICLR 2026

Scalable In-Context Q-Learning

Jinmei Liu, Fuhong Liu, Zhenhong Sun, Jianye Hao, Bo Wang, Huaxiong Li, Daoyi Dong, Chunlin Chen, Zhi Wang*

Dynamic programming and world modeling steer in-context RL toward efficient reward maximization and generalization.

Academic Service

Program Chair

IEEE International Conference on Cybernetics 2026

Area Chair

NeurIPS & AAMAS

Guest Editor

IEEE Transactions on Cybernetics

Associate Editor

IEEE SMC & IEEE ICNSC

IEEE SMC 2021–2023, 2026 · IEEE ICNSC 2020

Teaching

Course information, announcements, and lecture materials for students at Nanjing University.