Zhi WANG (王志)
Reinforcement Learning | Robotics

I am currently a Ph.D. candidate at City University of Hong Kong, Hong Kong, China, where I am advised by Han-Xiong Li. I work on reinforcement learning, machine learning, and system modeling.

I received my bachelor's degree in engineering from Nanjing University, Nanjing, China, in 2015, where I was advised by Chunlin Chen. I worked on reinforcement learning and robotics.

Email  /  CV  /  Chinese CV  /  Biography  /  Google Scholar  /  Github


I'm interested in reinforcement learning (RL), machine learning, and robotics. Specifically, I work on how learning algorithms can scale RL agents to dynamic environments, allowing them to autonomously adapt to the non-stationary task distributions in real-world domains. This includes a wide range of topics such as incremental learning, online learning, continual learning, transfer learning, model-based learning, and meta-learning. I have also worked on learning-based intelligent modeling of distributed parameter systems (DPSs).

Journal Articles

Incremental reinforcement learning with prioritized sweeping for dynamic environments,
Zhi Wang, Chunlin Chen, Han-Xiong Li, Daoyi Dong, and Tzyh-Jong Tarn,
IEEE/ASME Transactions on Mechatronics, 2019.
pdf / code / BibTeX / notes / Chinese notes

Traditional RL algorithms focus on learning in a stationary environment. We propose a novel Incremental Reinforcement Learning (IRL) algorithm for learning in dynamic environments where the reward function may change over time. IRL offers a way to save a significant amount of computational resources, and the dynamic-environment setting is expected to arise in many challenging real-world domains.


Reinforcement learning based optimal sensor placement for spatiotemporal modeling,
Zhi Wang, Han-Xiong Li, and Chunlin Chen,
IEEE Transactions on Cybernetics, 2019.
pdf / BibTeX

Optimizing the sensor locations within a distributed process is challenging since most distributed processes are intrinsically nonlinear with infinite dimensions. The ability of RL to learn by itself in unknown environments makes it a promising candidate for the optimization or control of real systems. In this paper, we develop an integral RL-based optimal sensor placement method for spatiotemporal modeling of DPSs. The sensor placement configuration is mathematically formulated as a Markov decision process (MDP) with specified elements, and the sensor locations are optimized by learning the optimal policies of the MDP according to the spatial objective function.

Incremental learning for online modeling of distributed parameter systems,
Zhi Wang and Han-Xiong Li,
IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2018.
pdf / BibTeX

Traditional spatiotemporal modeling methods are performed in batch mode, which limits their online applications. We propose an incremental learning method that recursively updates the spatial basis functions and the temporal model. In this way, the synthesized model is inherited and updated efficiently as streaming data accumulate over time in an online setting.

Conference Papers

A novel incremental learning scheme for reinforcement learning in dynamic environments,
Zhi Wang, Chunlin Chen, Han-Xiong Li, Daoyi Dong, and Tzyh-Jong Tarn,
In: 12th World Congress on Intelligent Control and Automation, 2016.
pdf / BibTeX

We introduce the concept of incremental learning to the RL community and propose a new scheme that automatically adjusts the optimal policy to adapt to an ever-changing environment.

Invited Talks

Incremental reinforcement learning for dynamic environments,
Zhi Wang, School of Engineering and Information Technology, University of New South Wales, Canberra, Apr. 2019.

- Changing factors and unexpected perturbations are very common in real-world scenarios
- Avoid repeated training and save substantial computational resources
- Maintain and update learned knowledge for online applications


Learning based intelligent modeling for distributed parameter systems,
Zhi Wang, Department of Control and Systems Engineering, Nanjing University, Oct. 2018.

- Reinforcement learning based optimal sensor placement
- Incremental learning for online modeling
- Multimode modeling for complex distributed processes