We invested a lot of work to make the analyses from the paper reproducible, and we are very curious how the documentation could be improved and whether people run into any problems.
I used a range of tools and strategies to make this paper easy to reproduce at different levels. There's a Docker container for the highest level of reproducibility, and package versions are managed with renv. The data used in the paper is hosted on Zenodo, both to avoid long queue times when downloading from the Climate Data Store and to future-proof the analysis for when that service goes away. The data is checksummed before it is used.
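For anyone who wants to repeat the checksum step outside the project's own tooling, the idea can be sketched as follows. This is a minimal illustration, not the paper's actual code: the function names, the file name `sample.nc`, and the choice of SHA-256 are all assumptions, and the demonstration runs on a throwaway file instead of the Zenodo data.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected, algorithm="sha256"):
    """Raise ValueError if the file's digest differs from the expected one."""
    actual = file_checksum(path, algorithm)
    if actual != expected.lower():
        raise ValueError(f"{path}: expected {expected}, got {actual}")
    return True


# Demonstration on a temporary file (hypothetical name, not the paper's data).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.nc"
    sample.write_bytes(b"example payload")

    # Stands in for the checksum published alongside the data.
    recorded = file_checksum(sample)
    ok = verify_checksum(sample, recorded)

    # Simulate a corrupted or altered download.
    sample.write_bytes(b"corrupted payload")
    try:
        verify_checksum(sample, recorded)
        corrupted_detected = False
    except ValueError:
        corrupted_detected = True
```

In practice the expected digest would come from the data repository's record, and the algorithm would need to match whatever the repository publishes.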
This paper proposes a probabilistic planner that can solve goal-conditional tasks such as complex continuous control problems. The approach reaches state-of-the-art performance when compared to current deep reinforcement learning algorithms. However, the method relies on an ensemble of deep generative models and is computationally intensive. It would be interesting to reproduce the results presented in this paper on their robotic manipulation and navigation problems as these are very challenging problems that current reinforcement learning methods cannot easily solve (and when they do, they require a significantly larger number of experiences). Can the results be reproduced out-of-the-box with the provided code?
I suggested a few papers last year, and I’m hoping that we’ve improved our reproducibility with this one. We’ve done our best to package it up both in Docker and as an R package. I’d be curious to know which way of reproducing it works best: working through the vignettes or spinning up a Docker instance?
It uses the drake R package, which should make reproducing R projects much easier (just run make.R and you're done). However, it depends on very specific package versions, which are provided by the accompanying Docker image.
This paper is reproduced weekly in a Docker container on continuous integration, but it is also set up to work via local installs. It would be interesting to see whether a human operator who knows nothing of the project or toolchain can reproduce it.