Learning with sparse rewards remains a significant challenge in reinforcement learning (RL), especially when the aim is to train a policy capable of achieving multiple different goals. To date, the most successful approaches for dealing with multi-goal, sparse reward environments have been model-free RL algorithms. In this work we propose PlanGAN, a model-based algorithm specifically designed for solving multi-goal tasks in environments with sparse rewards. Our method builds on the fact that any trajectory of experience collected by an agent contains useful information about how to achieve the goals observed during that trajectory. We use this to train an ensemble of conditional generative models (GANs) to generate plausible trajectories that lead the agent from its current state towards a specified goal. We then combine these imagined trajectories into a novel planning algorithm in order to achieve the desired goal as efficiently as possible. The performance of PlanGAN has been tested on a number of robotic navigation/manipulation tasks in comparison with a range of model-free reinforcement learning baselines, including Hindsight Experience Replay. Our studies indicate that PlanGAN can achieve comparable performance whilst being around 4-8 times more sample efficient.
This paper proposes a probabilistic planner that can solve goal-conditional tasks such as complex continuous control problems. The approach reaches state-of-the-art performance when compared to current deep reinforcement learning algorithms. However, the method relies on an ensemble of deep generative models and is computationally intensive. It would be interesting to reproduce the results presented in this paper on their robotic manipulation and navigation problems as these are very challenging problems that current reinforcement learning methods cannot easily solve (and when they do, they require a significantly larger number of experiences). Can the results be reproduced out-of-the-box with the provided code?
They should focus on the model parameters such as the number of GANs, and the roll-out horizon.