Review of
"A multi-level analysis of data quality for formal software citation"

Submitted by AnjaEggert  

Sept. 21, 2023, 12:41 p.m.

Lead reviewer


Review Body


Did you manage to reproduce it?
Partially Reproducible
Reproducibility rating
How much of the paper did you manage to reproduce?
8 / 10
Briefly describe the procedure followed/tools used to reproduce it
  • cloned the GitHub repository
  • aimed to reproduce the analysis in Python (original: R)
  • as images were missing from the repository, analysis.qmd was successfully rendered in R (but on a different computer/system), which required installing the R packages
  • suggested solution for the missing images: set self-contained: true in the YAML header
  • lines 60:63: it is not clear why software type "book" is changed to "software_article"
  • lines 98-100: df_tmp yields different counts
  • line 101: no Python function that yields multinomial estimates could be found, only functions for confidence intervals
  • R function: MultinomCI(), default method is "sisonglaz". Python function and method used: statsmodels.stats.proportion.multinomial_proportions_confint(method='sison-glaz'). The confidence intervals are much larger in R, so the methods may not match; maybe method='goodman'?
  • lines 112:134: code gives slightly different plot (different axis labels, order of software-citation-types, and respective counts)
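
Regarding line 101 above: a minimal standard-library sketch of how point estimates and simultaneous intervals could be returned together in Python, in the same est / lower / upper layout that MultinomCI() gives in R. The function name multinomial_estimates and the counts are hypothetical and not taken from the paper. The intervals are hand-written Goodman-type intervals (Wilson score intervals with a Bonferroni-adjusted chi-square quantile), not the Sison-Glaz method, so they are not expected to match the R output exactly.

```python
from math import sqrt
from statistics import NormalDist


def multinomial_estimates(counts, alpha=0.05):
    """Return a list of (estimate, lower, upper) tuples, one per category.

    Point estimates are the observed proportions. The simultaneous
    intervals are Goodman-type: Wilson score intervals that use a
    Bonferroni-adjusted chi-square quantile.
    """
    n = sum(counts)
    k = len(counts)
    # Chi-square quantile (1 degree of freedom) at 1 - alpha/k, computed
    # as the square of the standard normal quantile at 1 - alpha/(2k).
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    chi2 = z * z
    rows = []
    for count in counts:
        p = count / n
        delta = chi2 * chi2 + 4 * n * p * chi2 * (1 - p)
        lower = (2 * n * p + chi2 - sqrt(delta)) / (2 * (chi2 + n))
        upper = (2 * n * p + chi2 + sqrt(delta)) / (2 * (chi2 + n))
        rows.append((p, lower, upper))
    return rows


# Hypothetical counts for three citation types (not taken from the paper).
example_counts = [120, 45, 35]
for est, lwr, upr in multinomial_estimates(example_counts):
    print(f"est={est:.3f}  lwr={lwr:.3f}  upr={upr:.3f}")
```

Running the same counts through both this sketch and the statsmodels call with each available method would show which method, if any, comes close to the R intervals.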
Briefly describe your familiarity with the procedure/tools used by the paper.

We are five people with different programming backgrounds, mostly in Python and some also in R.

Which type of operating system were you working in?
Linux/FreeBSD or other Open Source Operating system
What additional software did you need to install?
  • Python packages: numpy, pandas, matplotlib, statsmodels
What software did you use?

Python v. 3.10.13 in Visual Studio Code

What were the main challenges you ran into (if any)?
  • getting the missing images
  • translating R code into Python
  • the original code constantly overwrites the same variables
What were the positive features of this approach?
  • recognizing that the original R code is very condensed
  • translating code from R into Python requires full comprehension of the code
Any other comments/suggestions on the reproducibility approach?


Documentation rating
How well was the material documented?
8 / 10
How could the documentation be improved?
  • add the main results (maybe as a figure) to the README
  • add a dedicated licence file
What do you like about the documentation?
  • documentation is provided
  • necessary R packages including version numbers are given
  • link to the paper
After attempting to reproduce, how familiar do you feel with the code and methods used in the paper?
10 / 10
Any suggestions on how the analysis could be made more transparent?
  • n.a. - as we only reproduced Fig. 3 of the paper


Reusability rating
Rate the project on reusability of the material
9 / 10
Permissive Data license included:  
Permissive Code license included:  

Any suggestions on how the project could be more reusable?
  • a CC BY 4.0 licence is given, but Creative Commons licences should not be used for code
  • consider a software licence instead, e.g. the MIT licence

Any final comments
  • Link to the paper on the ReproHack homepage is broken