Review of
"Measuring the impact of COVID-19 vaccine misinformation on vaccination intent in the UK and USA"

Submitted by NeWildeSache  

June 30, 2024, 2:36 p.m.

Lead reviewer

NeWildeSache

Review team members

User3318

Review Body

Reproducibility

Did you manage to reproduce it?
Partially Reproducible
Reproducibility rating
How much of the paper did you manage to reproduce?
5 / 10
Briefly describe the procedure followed/tools used to reproduce it

We cloned the repository from GitHub at the latest commit (fd9822d) and opened it in our IDEs (VS Code and PyCharm). After briefly examining the repository structure, we saw that we would need to run several Jupyter notebooks. To do so, we used conda to create a Python environment with the packages specified in the paper. While running the notebooks, we hit errors from missing packages, so we installed those as well. We also ran into missing data files, and the pystan model fitting in the statistical analyses notebook ran indefinitely without producing any results.

Briefly describe your familiarity with the procedure/tools used by the paper.

We have extensive experience with most of the technical tools used, namely Python, Jupyter notebooks and pandas, but had never used the pystan package. Our knowledge of the statistical analyses used in the paper was very limited, though.

Which type of operating system were you working in?
Linux/FreeBSD or other Open Source Operating system
What additional software did you need to install?

We had to set up a Python environment with the necessary packages. The packages needed to run the notebooks were: pystan, pandas, numpy, pyreadstat, scipy, matplotlib.

What software did you use

We used Python and VS Code/PyCharm.

What were the main challenges you ran into (if any)?

We had major difficulty finding information on the required Python environment. The only information provided was: "Python version 3.7.3 was used for all analysis with the following libraries: -- pystan -- pandas -- numpy -- matplotlib", and this appeared only in the appendix of the paper. The repository's README gave no information at all. The packages scipy and pyreadstat were missing from that list, and since no package versions were specified, we probably installed different versions than the authors used. We also ran into two problems while running the notebooks. First, the notebook 'statistical_analyses.ipynb' appeared to get stuck while fitting the model. This could be because we did not let the fitting run long enough (we stopped after 30 minutes), but compiler warnings suggested otherwise. Second, we could not run the notebook 'import_data.ipynb' because it requires the raw data files, which are not provided with the repository.
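The missing raw data only surfaced once the import notebook failed part-way through. A minimal sketch of a fail-fast check that could sit in the first cell of such a notebook is shown below. The directory 'data/raw' and the two file names are hypothetical placeholders, since we do not know the names of the raw files the authors used.

```python
from pathlib import Path

# Hypothetical location and names: placeholders only, not the authors' actual files.
RAW_DATA_DIR = Path("data/raw")
EXPECTED_FILES = ["survey_uk.sav", "survey_usa.sav"]


def missing_files(data_dir, expected):
    """Return the names in `expected` that do not exist inside `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


def require_raw_data(data_dir=RAW_DATA_DIR, expected=EXPECTED_FILES):
    """Raise a readable error up front instead of failing deep inside the import."""
    missing = missing_files(data_dir, expected)
    if missing:
        raise FileNotFoundError(
            f"Raw data files not found in '{data_dir}': {', '.join(missing)}. "
            "These files are not distributed with the repository."
        )
```

Calling require_raw_data() at the top of the notebook would tell the reader immediately that the raw files are needed and absent.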

What were the positive features of this approach?

The repository was easy to access and well structured, and the Jupyter notebooks documented the workflow very well in their Markdown cells. In general, Jupyter notebooks are a great tool for reproducibility and documentation.

Any other comments/suggestions on the reproducibility approach?

Documentation

Documentation rating
How well was the material documented?
6 / 10
How could the documentation be improved?

Authors should always provide a complete list of the required packages, including package versions. There are easy-to-use standards for automating the installation of Python environments, such as a 'requirements.txt' file, which can be generated automatically from the authors' own environment. We highly recommend attaching such a file. The authors should also have documented the system they used to run the code: there was no information on the operating system or on the compiler used for the C++ code that pystan generates. Lastly, it should have been clearly stated that the raw data files are not included in the repository. We had to run the notebook 'import_data.ipynb' to find out that files were missing, which is far from ideal.
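As an illustration of the recommendation above, the following minimal sketch records the installed versions of the packages we needed, together with the Python version and operating system. It uses only the standard library (importlib.metadata, available from Python 3.8, so newer than the 3.7.3 named in the paper). The package list is the one from our own attempt, and we assume each distribution name matches its import name. Running 'pip freeze > requirements.txt' in the authors' environment would achieve much the same for the package list.

```python
import platform
from importlib import metadata

# Packages named in the paper's appendix plus the two we had to add ourselves.
PACKAGES = ["pystan", "pandas", "numpy", "matplotlib", "scipy", "pyreadstat"]


def pinned_requirements(packages):
    """Return (pinned, missing): 'name==version' lines and names not installed."""
    pinned, missing = [], []
    for name in packages:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return pinned, missing


def environment_summary():
    """Return comment lines describing the interpreter and operating system."""
    return [
        f"# Python {platform.python_version()}",
        f"# Platform: {platform.platform()}",
    ]


if __name__ == "__main__":
    pinned, missing = pinned_requirements(PACKAGES)
    # Lines starting with '#' are comments, so the output is a valid requirements file.
    print("\n".join(environment_summary() + pinned))
    if missing:
        print("# not installed:", ", ".join(missing))
```

Redirecting the output to a file and committing it next to the notebooks would let reviewers recreate a matching environment with 'pip install -r'.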

What do you like about the documentation?

The Jupyter notebooks describe the workflow very well, and the many README files help with understanding the repository and data structure. There was also information on how to cite the materials in the paper.

After attempting to reproduce, how familiar do you feel with the code and methods used in the paper?
8 / 10
Any suggestions on how the analysis could be made more transparent?

The raw data should be publicly available for full transparency.


Reusability

Reusability rating
Rate the project on reusability of the material
8 / 10
Permissive Data license included:  
Permissive Code license included:  

Any suggestions on how the project could be more reusable?

The repository includes a very permissive MIT license that places no restrictions on use of the software. We are not sure whether it also covers the data included in the repository, since the license only explicitly grants unrestricted use of "this software and associated documentation files". An explicit license for the data would help reusability. Improving the documentation, listing all required packages and making the raw data publicly available, as mentioned above, would also improve reusability.



Any final comments