Review of
"Measuring the impact of COVID-19 vaccine misinformation on vaccination intent in the UK and USA"

Submitted by samuelpawel  

May 21, 2022, 9:27 a.m.

Lead reviewer

pdegen

Review team members

samuelpawel

Review Body

Reproducibility

Did you manage to reproduce it?
Partially Reproducible
Reproducibility rating
How much of the paper did you manage to reproduce?
8 / 10
Briefly describe the procedure followed/tools used to reproduce it

We cloned the project repository from GitHub and worked with the latest commit (fd9822d). We created a Conda environment with the Python version specified in the paper's Reporting Summary (3.7.3). Because the Reporting Summary does not state the exact PyStan version, we first tried to run the code with a recent release (3+), but the Jupyter notebook for the statistical analysis failed to execute. We then noticed that one of the paper's references cites PyStan 2.17.1.0; after installing that version, the analysis notebook ran successfully.

In parallel, Samuel created a reproducible environment using a Docker container. The forked repository can be found at:

https://github.com/SamCH93/covid19-misinfo

Briefly describe your familiarity with the procedure/tools used by the paper.

I have a high degree of familiarity with the Jupyter environment and the Python data science stack. I have never used PyStan before.

Samuel is actively engaged in Bayesian statistics and reproducible research.

Which type of operating system were you working in?
Linux/FreeBSD or other Open Source Operating system
What additional software did you need to install?

PyStan 2.17.1.0

What software did you use?

Python 3.7.3
Docker

Additionally, in my local environment:

Numpy 1.21.5
Pandas 1.3.5
Matplotlib 3.5.1
Seaborn 0.11.2 (for the figures in import_data.ipynb)
Jupyter Lab 3.4.2

What were the main challenges you ran into (if any)?

The Jupyter notebook that creates the figures (tables_figures.ipynb) raised a pandas AttributeError. The notebook ran successfully after making the following two adjustments:

Line 911 in function subset_df() in utils.py:

# del df.index.name
df = df.rename_axis(None, axis=0)

Line 118 in function image_impact() in paper.py:

# del out.index.name
out = out.rename_axis(None, axis=0)
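
The effect of these two replacements can be checked in isolation. The following is a minimal sketch on a toy DataFrame; the column and index names are made up for illustration and are not taken from the repository.

```python
import pandas as pd

# Toy frame with a named index (names are hypothetical, for illustration only).
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.Index(["a", "b", "c"], name="group"),
)

# The original code cleared the index name with `del df.index.name`,
# which raised an AttributeError in our environment (pandas 1.3.5).
# rename_axis(None, axis=0) returns a new frame whose index has no name
# and leaves the data and the original frame untouched.
cleaned = df.rename_axis(None, axis=0)

print(cleaned.index.name)
```

Because rename_axis returns a new object rather than modifying in place, the result has to be assigned back, as in the two adjustments above.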

We did not encounter any errors in the two other notebooks (statistical_analysis.ipynb and import_data.ipynb). However, we did not manage to run the full statistical analysis within the allotted time of the ReproHack, as fitting the models takes a lot of computing time. The models up to and including social media use completed without errors. The model on trusted information sources had not finished after roughly two hours on a six-core laptop (Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz). Notably, no diagnostic information was printed for me while the models were being fitted.

What were the positive features of this approach?

Creating a Docker environment and forking the repository allows other users to easily recreate our reproduction attempt.

Any other comments/suggestions on the reproducibility approach?

I chose the option "Partially Reproducible" because we could not run the full analysis within the given time frame. Since we did not run into any other major issues that we could not fix ourselves, the results may well be fully reproducible.


Documentation

Documentation rating
How well was the material documented?
7 / 10
How could the documentation be improved?

We could not find the exact package versions used in the analysis in either the paper's Reporting Summary or the README.md. This is crucial, as the analysis notebook does not run out of the box with newer versions of PyStan (notably, even the import statement has changed between versions 2 and 3). The AttributeError mentioned earlier was also likely the result of a difference in pandas versions.
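
To illustrate the import change: PyStan 2.x is imported as `pystan`, whereas PyStan 3.x is imported as `stan`. A notebook could fail early with a clear message instead of a confusing import error. The sketch below is our own suggestion, not part of the authors' code; the function names are hypothetical, and the check is only a heuristic based on which top-level module is importable.

```python
import importlib.util


def detect_pystan_major():
    """Return 2 or 3 depending on which PyStan module is importable, else None.

    Heuristic only: PyStan 2.x installs a module named `pystan`,
    PyStan 3.x installs a module named `stan`.
    """
    if importlib.util.find_spec("pystan") is not None:
        return 2
    if importlib.util.find_spec("stan") is not None:
        return 3
    return None


def require_pystan2():
    """Raise a descriptive error unless a PyStan 2.x module is available."""
    major = detect_pystan_major()
    if major != 2:
        found = "no PyStan installation" if major is None else f"PyStan {major}.x"
        raise RuntimeError(
            f"This analysis was run with PyStan 2.17.1.0, but found {found}."
        )


print(detect_pystan_major())
```

Calling require_pystan2() at the top of the analysis notebook would point a reader directly at the version mismatch we ran into.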

Besides specifying the exact package versions in the paper's Reporting Summary, it would be beneficial to always include something like a Conda environment.yml file or a Dockerfile as part of the Git repository.
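
As a rough sketch of how such version pins could be collected from a working environment, the snippet below prints "name==version" lines for a list of packages. It is an illustration under our own assumptions, not the authors' procedure: the helper name pin_versions is hypothetical, and importlib.metadata is only part of the standard library from Python 3.8 onwards (on Python 3.7, the importlib_metadata backport would be needed instead).

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def pin_versions(packages):
    """Return 'name==version' strings for every listed package that is installed."""
    pins = []
    for name in packages:
        try:
            pins.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Skip packages that are not installed in this environment.
            continue
    return pins


if __name__ == "__main__":
    # Packages used in our reproduction attempt; adjust as needed.
    for line in pin_versions(["pystan", "numpy", "pandas", "matplotlib", "seaborn"]):
        print(line)
```

The resulting lines can be pasted into a requirements file or into the dependencies section of a Conda environment.yml.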

What do you like about the documentation?

The study design, mathematical details, and general flow of the notebooks are well documented. The analysis notebook can be read as a self-contained summary of the paper.

After attempting to reproduce, how familiar do you feel with the code and methods used in the paper?
6 / 10
Any suggestions on how the analysis could be made more transparent?

Reusability

Reusability rating
Rate the project on reusability of the material
10 / 10
Permissive Data license included:  
Permissive Code license included:  

Any suggestions on how the project could be more reusable?


Any final comments