The cornerstone of all scientific research is that every scientific result should be replicable. This means that independent researchers should be able to conduct the same study again and obtain the same results as the original study. In recent years, across disciplines ranging from business studies to medicine, researchers have made increasing efforts to replicate published findings.
Replication is not possible, however, if independent researchers cannot access the original data. This is a bigger issue today than ever before, given growing awareness of the importance of data protection and privacy and an increasing number of laws that prevent confidential data from being freely shared. In economics, for instance, around 40% of the empirical papers published in the top academic journals use confidential data.
A new agency for certifying computational research
Christophe Pérignon and colleagues have recently launched the Certification Agency for Scientific Code and Data (cascad). The cascad agency is a not-for-profit certification agency created by academics with the support of the French National Centre for Scientific Research. It is a trusted third party that formally checks whether the results presented in a paper can be reproduced from the researchers' data and computer code.
The cascad agency offers two kinds of certification: one for research based on open data and another for research based on confidential data. The latter is offered in collaboration with the Centre d’Accès Sécurisé aux Données (CASD), a French public research infrastructure that allows users to access and work with confidential government data under secure conditions. The centre currently provides access to data from the French Statistical Institute and the French Ministries for Finance, Justice, Education, Labor, and Agriculture, as well as social security contribution data and health data. Data cannot be downloaded; instead, researchers work remotely through a virtual machine, which they reach from a dedicated piece of hardware protected by a fingerprint reader.
Applying for access to CASD data takes around six months and involves presenting the research project to the French Statistical Secrecy Committee. This creates a major roadblock for the referees tasked with evaluating research papers, who currently have to go through exactly the same process as the original researchers to gain access. “This has been a clear impediment to research reproducibility and we had to come up with a solution”, explains Pérignon. Thanks to the cascad-CASD partnership, researchers can now signal the reproducibility of their work based on confidential data.
Introducing the reproducibility reviewer
Researchers can now request a reproducibility certification for a paper when they intend to publish it. A “reproducibility reviewer”, a full-time cascad employee specialized in the software used by the author, then verifies the results presented in the paper by accessing a CASD virtual machine that is a clone of the one used by the author. This clone includes a copy of the source dataset and of the author’s computer code, as well as all the software required to run the code.
Having a dedicated team of reproducibility reviewers whose job is to verify research results brings a range of benefits. “We can do it in a few days rather than in a few months and we can do it systematically rather than waiting for someone who is interested or brave enough or patient enough to come along and do this themselves,” says Pérignon. When researchers see that a paper has been certified, they know that a third party has been able to use the very same code and data as the original researchers and has successfully reproduced the results. This boosts trust in the published results, and in science more broadly.
Enriching the academic review process
Currently, journals are expected to take care of the academic review process. They enlist researchers who volunteer their time to evaluate research, and in most cases these reviewers have no access to the core data used to produce the work. Reviewers do this to help the community, but they are already swamped by their existing workload. In reality, checking that research papers accurately represent the data they are based on takes special skills and quite a bit of time. So “there are economies of scale of having a specialized agency with full-time experts in software and data. Journals can outsource this activity to a specialized agency like cascad,” explains Pérignon. This frees up reviewers’ time and allows them to have greater confidence in the work they review.
Beyond academia
Now that cascad is up and running, there is no reason for it to be limited to academia. “The tool that we designed here can go beyond academia; it could be used to redo analyses made by all sorts of agencies and to test algorithms used by corporations and public administrations,” says Pérignon. Given the increasingly important role played by algorithms in our society, having a trusted third party able to certify that a given algorithm is valid and unbiased would be extremely useful.