Selected Papers
Preprints
Zican Hu, Shilin Zhang, Yafu Li, Jianhao Yan, Xuyang Hu, Leyang Cui, Xiaoye Qu, Chunlin Chen, Yu Cheng, Zhi Wang, "Diversity-Incentivized Exploration for Versatile Reasoning," arXiv preprint 2509.26209, 2025. [pdf] [code]
Runzhe Zhan, Yafu Li, Zhi Wang, Xiaoye Qu, Dongrui Liu, Jing Shao, Derek F. Wong, Yu Cheng, "ExGRPO: Learning to Reason from Experience," arXiv preprint 2510.02245, 2025. [pdf] [code]
Ganqu Cui, Yuchen Zhang, Jiacheng Chen, Lifan Yuan, Zhi Wang, Yuxin Zuo, Haozhan Li, Yuchen Fan, Huayu Chen, Weize Chen, Zhiyuan Liu, Hao Peng, Lei Bai, Wanli Ouyang, Yu Cheng, Bowen Zhou, Ning Ding, "The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models," arXiv preprint 2505.22617, 2025. [pdf] [code]
Jinmei Liu, Fuhong Liu, Jianye Hao, Bo Wang, Huaxiong Li, Chunlin Chen, Zhi Wang*, "Scalable In-Context Q-Learning," arXiv preprint 2506.01299, 2025. [pdf] [code]
Jinmei Liu, Wenbin Li, Xiangyu Yue, Shilin Zhang, Chunlin Chen, and Zhi Wang*, "Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay," arXiv preprint 2404.10662, 2024. [code]
Conferences
Wenhao Wu, Fuhong Liu, Haoru Li, Zican Hu, Daoyi Dong, Chunlin Chen, Zhi Wang*, "Mixture-of-Experts Meets In-Context Reinforcement Learning," Advances in Neural Information Processing Systems (NeurIPS), 2025. [pdf] [code]
Shilin Zhang, Zican Hu, Wenhao Wu, Xinyi Xie, Jianxiang Tang, Chunlin Chen, Daoyi Dong, Yu Cheng, Zhenhong Sun, Zhi Wang*, "Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision," Advances in Neural Information Processing Systems (NeurIPS), 2025. [pdf] [code]
Jianhao Yan, Yafu Li, Zican Hu, Zhi Wang, Ganqu Cui, Xiaoye Qu, Yu Cheng, Yue Zhang, "Learning to Reason under Off-Policy Guidance," Advances in Neural Information Processing Systems (NeurIPS), 2025. [pdf] [code]
Zican Hu, Wei Liu, Xiaoye Qu, Xiangyu Yue, Chunlin Chen, Zhi Wang*, and Yu Cheng, "Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning," in Proceedings of International Conference on Machine Learning (ICML), 2025. [paper] [code]
Zhi Wang, Li Zhang, Wenhao Wu, Yuanheng Zhu, Dongbin Zhao, and Chunlin Chen, "Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement," in Advances in Neural Information Processing Systems (NeurIPS), 2024. [code] [pdf]
Zican Hu, Zongzhang Zhang, Huaxiong Li, Chunlin Chen, Hongyu Ding, and Zhi Wang*, "Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning," in Proceedings of International Conference on Learning Representations (ICLR), 2024. [code] [pdf]
Junyi Wang, Yuanyang Zhu, Zhi Wang*, Yan Zheng, Jianye Hao, and Chunlin Chen, "BiERL: A Meta Evolutionary Reinforcement Learning Framework via Bilevel Optimization," in Proceedings of European Conference on Artificial Intelligence (ECAI), 2023. [code]
Zhi Wang, Wei Bi, Yan Wang, and Xiaojiang Liu, "Better fine-tuning via instance weighting for text classification," in Proceedings of AAAI Conference on Artificial Intelligence (AAAI), 2019, 7241-7248. [pdf] [supp]
Journals
Zichuan Liu, Yuanyang Zhu, Zhi Wang*, Yang Gao, and Chunlin Chen, "MIXRTs: Toward Interpretable Multi-Agent Reinforcement Learning Via Mixing Recurrent Soft Decision Trees," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025, 47(5): 4090-4017. [pdf]
Jinmei Liu, Zhi Wang*, Chunlin Chen, and Daoyi Dong, "Efficient Bayesian Policy Reuse With a Scalable Observation Model in Deep Reinforcement Learning," IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024, 35(10): 14797-14809. [pdf]
Donghan Xie, Zhi Wang*, Chunlin Chen, and Daoyi Dong, "Depthwise convolution for multi-agent communication with enhanced mean-field approximation," IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024, 35(6): 8557-8569. [pdf]
Hongyu Ding, Yuanze Tang, Qing Wu, Bo Wang, Chunlin Chen, Zhi Wang*, "Magnetic field-based reward shaping for goal-conditioned reinforcement learning," IEEE-CAA Journal of Automatica Sinica (JAS), 2023, 10(12): 1-15. [pdf] [code] [video]
Junyi Wang, Zhi Wang*, Huaxiong Li, Chunlin Chen, "Adaptive noise-based evolutionary reinforcement learning with maximum entropy," Acta Automatica Sinica, 2023, 49(1): 54-66. [pdf]
Zhi Wang, Chunlin Chen, and Daoyi Dong, "A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning," IEEE Transactions on Cybernetics (TCYB), 2023, 53(12): 7509-7520. [pdf] [code]
Zhi Wang, Chunlin Chen, and Daoyi Dong, "Instance weighted incremental evolution strategies for reinforcement learning in dynamic environments," IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023, 34(12): 9742-9756. [pdf] [code]
Yuanyang Zhu, Zhi Wang*, Chunlin Chen, and Daoyi Dong, "Rule-based reinforcement learning for efficient robot navigation with space reduction," IEEE-ASME Transactions on Mechatronics (TMECH), 2022, 27(2): 846-857. [pdf] [supp]
Zhi Wang, Chunlin Chen, and Daoyi Dong, "Lifelong incremental reinforcement learning with online Bayesian inference," IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022, 33(8): 4003-4016. [pdf] [code]
Zhi Wang and Han-Xiong Li, "Dissimilarity analysis based multimode modeling for complex distributed parameter systems," IEEE Transactions on Systems, Man, and Cybernetics: Systems (TSYS), 2021, 51(5): 2789-2797. [pdf]
Zhi Wang, Han-Xiong Li, and Chunlin Chen, "Incremental reinforcement learning in continuous spaces via policy relaxation and importance weighting," IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2020, 31(6): 1870-1883. [pdf] [code]
Zhi Wang, Han-Xiong Li, and Chunlin Chen, "Reinforcement learning based optimal sensor placement for spatiotemporal modeling," IEEE Transactions on Cybernetics (TCYB), 2020, 50(6): 2861-2871. [pdf]
Zhi Wang, Chunlin Chen, Han-Xiong Li, Daoyi Dong, and Tzyh-Jong Tarn, "Incremental reinforcement learning with prioritized sweeping for dynamic environments," IEEE-ASME Transactions on Mechatronics (TMECH), 2019, 24(2): 621-632. [pdf] [code]
Zhi Wang and Han-Xiong Li, "Incremental spatiotemporal learning for online modeling of distributed parameter systems," IEEE Transactions on Systems, Man, and Cybernetics: Systems (TSYS), 2019, 49(12): 2612-2622. [pdf]
Note: * indicates the corresponding author.