The paper describes pyKNEEr, a Python package for open and reproducible research on femoral knee cartilage that uses Jupyter notebooks as a user interface. I created this paper with the specific intent of making both the workflows it describes and the paper itself open and reproducible, following guidelines from authorities in the field. Therefore, two things in the paper can be reproduced: 1) workflow results: Table 2 contains links to all the Jupyter notebooks used to calculate the results. The computations are long and might require a server, so if you want to run them locally, I recommend using only 2 or 3 images as inputs. The paper should be sufficient, but if you need further introductory information, there is a documentation website: https://sbonaretti.github.io/pyKNEEr/ and a "how to" video: https://youtu.be/7WPf5KFtYi8 2) paper graphs: The captions of Figures 1, 4, and 5 contain links to the data repository, the code (a Jupyter notebook), and the computational environment (Binder) needed to fully reproduce each graph. These computations can easily be run locally and take a few seconds. All Jupyter notebooks automatically download data from Zenodo and provide dependencies, which should make reproducibility easier.
The paper, code, and data were published 4 years ago; will they still work? I always try to release the data and code needed to reproduce my papers, but I seldom receive feedback. It would be useful to have comments from a team of reproducers, in order to improve sharing for future research (I have already switched from MATLAB to Python).
This paper provides a novel approach to identifying oncogenes based on RNA overexpression in subsets of tumors relative to adjacent normal tissue. Showing that this study can be reproduced would aid other researchers who are attempting to identify oncogenes in other cancer types using the same methodology.
It would be a great help to have the scientific record I've published independently checked, so that errors, if there are any, can be corrected. Also, if you could give me feedback, I would learn how to share the data in a way that is more accessible to others.
I tried hard to make this paper as reproducible as possible, but as techniques and dependencies become more complex, it is hard to make it 100% clear. Any form of feedback is more than welcome.
Currently submitted paper on COVID-19 and mental health. Unique clinical data (time series during the pandemic onset) and methods, hopefully fun to work on. Possibly too boring or too easy to reproduce given my data and code? Not sure.
To see whether we did a good enough job in providing data and methods, and to check how the code has aged with respect to current libraries.
- This paper is a good example of a standard social science study that is (I hope!) fully reproducible, from the main analysis to the supplementary analyses and figures.
- I have not yet received any external feedback on its reproducibility, so I would be interested to see whether I have overlooked any gaps in the reproduction workflow that I anticipated.
If all went right, the analysis should be fully reproducible without the need to make any adjustments. The paper aims to find optimal locations for new parkruns, but we were not 100% sure how 'optimal' should be defined. We provide a few examples, but the code was meant to be flexible enough to allow potential decision makers to specify their own, alternative objectives. The spatial data set is also quite interesting and fun to play around with. Caveat: The full analysis takes a while to run (~30+ min) and might require >= 8 GB of RAM.
Open data and reproducibility were important in this project.
It is a rare example of full reproducibility in the field of plant disease epidemiology.
Low Energy Electron Microscopy (LEEM) is a somewhat specific form of electron microscopy used to study surfaces and 2D materials. In this paper we describe a set of data processing techniques applied to LEEM and adapted to the peculiarities of LEEM. This is combined with a parallelized Python implementation using Dask in separate notebooks. So if you are interested in microscopy, image analysis, clustering of experimental physics data or parallel Python, this paper should be interesting to you.
The results of the four individual studies could be interpreted as support for the hypothesis, but the meta-analysis suggested that implicit identification was not a useful predictor overall. This conclusion is an important goalpost for future work.
We propose a simple method to retrieve optical constants from single optical transmittance measurements, in particular in the fundamental absorption region. The construction of the required envelopes is arbitrary and depends on the user. However, the method should still be robust and deliver similar results.
This paper shows a fun and interesting simulation result. I find it (of course) very important that our results are reproducible. In this paper, however, we did not include the exact code for these specific simulations, but the results should be reproducible using the code from our previous paper in PLOS Computational Biology (Van Oers, Rens et al. https://doi.org/10.1371/journal.pcbi.1003774). I am genuinely curious to see whether the Biophys J paper contains sufficient information or whether we should have done better. Other people have already successfully built upon the 2014 (PLOS) paper using our code; see, e.g., https://journals.aps.org/pre/abstract/10.1103/PhysRevE.97.012408 and https://doi.org/10.1101/701037.
The format of the paper is a bit unusual: it is contained in, and compiled as, an R package. Although this would seem, on its face, to make the paper easier to reproduce, it is an open question how obvious the process will be. I wonder to what extent people reproducing the results would prefer this to simple R scripts.
We made a huge effort to ensure the paper is reproducible. But is it?
The original data took quite a while to produce for a previous paper, but for this paper, all tables and figures should be exactly reproducible by simply running the Jupyter notebook.
This is a small dataset with a lot of missing data, so it's quite challenging to produce reliable results. It uses multiple imputation to fill in the missing data, so it would be interesting to see whether the results hold up when this is redone. However, since the multiple imputation takes a couple of hours to run (on a decent laptop), the final multiply imputed data is also included. Additionally, multiply imputed data needs a different statistical analysis approach, which this is a chance to get familiar with.
We've tried to make it as easy as possible to reproduce. There's some fun physics in the paper and it's all done with Python!