Review of
"What do analyses of city size distributions have in common?"

Submitted by mweber  

June 14, 2024, 9:40 a.m.

Lead reviewer

mweber

Review team members

DerH0lgi

Review Body

Reproducibility

Did you manage to reproduce it?
Partially Reproducible
Reproducibility rating
How much of the paper did you manage to reproduce?
9 / 10
Briefly describe the procedure followed/tools used to reproduce it

We took the notebook provided at https://clementinecttn.github.io/MetaZipf/metametazipf_notebook.nb.html, which contains the main code for the findings, and ran its code cells. Since it is an HTML notebook that cannot be run directly, we copied the code from the cells into a Jupyter notebook with an R kernel running in JupyterLab.

Briefly describe your familiarity with the procedure/tools used by the paper.

We have basic knowledge of working with the programming language R. We have good working knowledge of descriptive statistics and visualization.

Which type of operating system were you working in?
Apple Operating System (macOSX)
What additional software did you need to install?

We ran the notebook in a JupyterLab environment started in a Docker container on the host machine. For this we used a Docker image from quay.io/jupyter, which already contained the R kernel for Jupyter notebooks.

What software did you use
  1. docker
  2. jupyter-notebook with R-kernel
What were the main challenges you ran into (if any)?
  1. Finding out which R packages are needed to run the code, since there are no imports in the given HTML notebook.
  2. Installing the dependencies.
  3. Finding helper functions, which are not given in the notebook, but can be found in other files in the repository.
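The first two challenges can be partly automated. The following is a minimal sketch, not the procedure we actually followed: it scans R code for `library()`/`require()` calls and `pkg::fun` references to list candidate dependencies. The function name `find_packages` and the example code lines are hypothetical, and the regular expressions will miss packages that are loaded indirectly.

```r
# Hypothetical helper: list package names referenced in lines of R code.
find_packages <- function(code_lines) {
  # library(pkg), library("pkg"), require(pkg)
  attach_pat <- "(?:library|require)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?"
  # Namespace-qualified calls such as pkg::fun or pkg:::fun
  ns_pat <- "([A-Za-z][A-Za-z0-9.]*):{2,3}"

  # Return the first capture group of every match of `pattern`.
  extract <- function(pattern) {
    hits <- unlist(regmatches(code_lines,
                              gregexpr(pattern, code_lines, perl = TRUE)))
    sub(pattern, "\\1", hits, perl = TRUE)
  }

  sort(unique(c(extract(attach_pat), extract(ns_pat))))
}

# Illustrative input only; these lines are not taken from the reviewed notebook.
example_code <- c(
  "library(ggplot2)",
  "df <- dplyr::filter(df, n > 10)",
  "require(\"plyr\")"
)
pkgs <- find_packages(example_code)
print(pkgs)

# Packages from the list that are not yet installed locally.
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
# install.packages(missing_pkgs)  # uncomment to install them
```

In practice, `code_lines` could come from `readLines()` applied to each R file in a cloned repository.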
What were the positive features of this approach?
  1. The code in the notebook is well structured, and it is clear which code snippets are used for which visualization in the paper.
  2. Having the data inside the same repository makes it easy to just clone the repository and get the code running.
Any other comments/suggestions on the reproducibility approach?

It would be great to have just a single notebook which loads the raw data in the beginning and runs all the necessary steps until the results are reached.


Documentation

Documentation rating
How well was the material documented?
6 / 10
How could the documentation be improved?
  1. Imports/dependencies should be listed inside the given notebook.
  2. The notebook should contain the relevant helper functions or import them via a relative path from other files in the repository.
  3. The given notebook should not be an html-file but a runnable file of some kind.
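To illustrate the first two suggestions, a setup chunk at the top of the notebook could look roughly like the sketch below. It assumes a hypothetical package list and a hypothetical helper directory; the names `required_packages`, `load_dependencies`, `source_helpers`, and `rank_sizes` are ours and do not come from the repository. The demo writes a stand-in helper file to a temporary directory so that the sketch runs on its own.

```r
# Hypothetical dependency list; the real notebook's packages will differ.
required_packages <- c("stats", "utils")

# Fail early with a readable message if a dependency is not installed.
load_dependencies <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (!all(ok)) {
    stop("Missing packages: ", paste(pkgs[!ok], collapse = ", "))
  }
  invisible(ok)
}

# Source every .R file in `dir` (a path relative to the notebook) into `envir`.
source_helpers <- function(dir, envir = globalenv()) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) {
    source(f, local = envir)
  }
  invisible(files)
}

# Demo with a stand-in helper file; in a real repository `helper_dir`
# would be a relative path to the existing helper scripts.
helper_dir <- file.path(tempdir(), "helpers")
dir.create(helper_dir, showWarnings = FALSE)
writeLines("rank_sizes <- function(x) rank(-x)",
           file.path(helper_dir, "rank_sizes.R"))

load_dependencies(required_packages)
helpers <- new.env()
source_helpers(helper_dir, envir = helpers)
print(helpers$rank_sizes(c(10, 30, 20)))
```

Sourcing into a dedicated environment keeps the demo tidy; a notebook would more likely use the default and source the helpers into the global environment.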
What do you like about the documentation?
  1. The given notebook was well structured and contained the relevant code for the visualizations and data in the paper.
After attempting to reproduce, how familiar do you feel with the code and methods used in the paper?
8 / 10
Any suggestions on how the analysis could be made more transparent?

The path from the raw data to the results could be clearer. The analysis of the data is currently spread across several files in the repository.


Reusability

Reusability rating
Rate the project on reusability of the material
5 / 10
Permissive Data license included:  
Permissive Code license included:  

Any suggestions on how the project could be more reusable?

A license should be added to allow for reuse. Without that, only the ideas but not the concrete code can be reused by other researchers.



Any final comments

Overall, it is clear that a lot of effort has been made to make the work reproducible. Nevertheless, the structure of the repository could be simplified. Adding a permissive license to the repository would give other researchers legal certainty when using the material.