On February 18, the Center for Open and Reproducible Science (CORES) held its launch event to kick off the newly established center. It was a fantastic day filled with interdisciplinary lectures on the open science movement and panels on the current status of, and open issues in, open science at Stanford. We were given a glimpse into the emerging universe of open science and how we can further its adoption at Stanford and beyond. We are sharing the videos, with summaries, for each presentation.
We started the launch event with introductory remarks from Russell Poldrack (Professor of Psychology at Stanford, Director of CORES). Dr. Poldrack began by drawing our attention to a 2020 Pew Research Center survey showing that scientists are among the most trusted groups in the United States. Transparency and reproducibility lie at the heart of the integrity and public credibility of science, and Stanford is well positioned to make advances on these issues.
Open science practices are challenging for many colleagues to implement in their everyday research operations. Open science represents a set of values for how we decide to do our research: transparency and accessibility, diversity and inclusion, and community-mindedness. These objectives align nicely with a 2018 report from the National Academies of Sciences, Engineering, and Medicine, which conveyed a set of open science recommendations for research institutions to adopt. As open science is further adopted, real value can be derived from greater transparency, and reproducibility is crucial for trusting the results scientists publish. There are, however, barriers that currently make open science practices difficult to implement. This is where the new CORES center (and, more broadly, Stanford Data Science) can address those challenges. CORES has three primary areas of focus: open science, evidence synthesis, and reproducibility.
Mercè moved on to illustrate recent advances in data sharing and university computing resources. New data policies are being implemented by journals across many disciplines, by funders (e.g., the National Institutes of Health (NIH)), and by scientific communities. More domain-specific and domain-general repositories are coming online to support open data sharing. As open data sharing continues to grow, so does the need for university services to support it. These services can live within libraries and IT offices or in academic departments such as statistics or bioinformatics; the typical service is general consulting. In designing their offerings, these services will need to support researchers throughout the research lifecycle: planning, active research, and dissemination and preservation.
This panel was moderated by Russell Poldrack (Professor of Psychology and Director of CORES).
The panel participants: Kam Moler (Vice Provost and Dean of Research, Marvin Chodorow Professor and Professor of Applied Physics and of Physics)
Melissa Bondy (Stanford Medicine Discovery Professor and Professor of Epidemiology and Population Health, Co-Director, Stanford Center for Population Health Sciences, Associate Director, Population Sciences at the Stanford Cancer Institute)
Emmanuel Candes (Barnum-Simons Chair in Mathematics and Statistics, Professor of Statistics, Professor of Electrical Engineering (by courtesy), Faculty Director, Stanford Data Science Institute)
Steve Goodman (Associate Dean of Clinical and Translational Research, Professor of Epidemiology and Population Health and of Medicine (Primary care and Population Health), Co-Director, Meta-research Innovation Center at Stanford (METRICS))
Jon Krosnick (Frederic O. Glover Professor in Humanities and Social Sciences, Professor of Communication, Professor of Political Science, Professor of Psychology (by courtesy))
The panel discussion focused on the current state of open science initiatives at Stanford.
Steve Goodman began by introducing a new program on rigor and reproducibility in the School of Medicine. Rigor refers to the strength of an experiment's research design and analysis; reproducibility refers to making the data accessible both to the primary investigator and to the research community. The first step of this initiative is data gathering through surveys. This is not solely a technical venture: there are courses and tools to support data sharing and reproducibility, but what the program considers most important is culture change, and culture change happens from the bottom up. They have working groups on modifying promotion criteria and enhancing CVs, and they want to make it as easy and simple as possible for researchers to incorporate open science practices into their workflows.
Kam Moler discussed the values and mission of the CORES center. Kam is excited by the aspirational message the mission conveys. The values the center holds have seen uptake at Stanford, with positive byproducts in the broader context. Transparency and openness are important values with clear connections to ongoing Stanford efforts and wide support from government agencies, where these values have demonstrated their worth. Truly innovative and advanced research projects are strengthened by open sharing policies, and transparency and openness nurture public trust and strengthen research credibility.
Emmanuel Candes began by noting that we do open science to achieve replicability, so that results can stand the test of time. Replicability has two aspects: the ability to reproduce each step of a data analysis and obtain the same results, and the ability of fellow scientists to confirm the findings. The second aspect can be framed as a statistical problem, letting the community evaluate the statistical finding. The statistics department has devoted energy to developing methods and tools to ensure that what we report stands up to scientific scrutiny. Culture change is another important aspect, and the hope is that a course can be developed to teach students about the challenges of reproducibility.
Jon Krosnick started by referring us to the Wikipedia entry on open science and highlighting the dissemination and accessibility of research products. We can stretch the definition of open science to also include learning and synthesizing the lessons of the past in a simple and easy way. Openness is challenging, but it forces scientists to do better housekeeping of their own work, and the possibility that one's work will be closely scrutinized leads us to implement better research practices. There is fear of getting caught in a mistake, but that probability is tiny if the scientific community is sharing its research products. One example of where open science falls short is polling: when a community shifts from good to suboptimal methodology, it affects the ultimate results and the conclusions reached. We were introduced to a causal diagram of potential pathways explaining why a scientist may not do optimal science.
Melissa Bondy started by describing the importance of rigor and reproducibility in science, including training the next generation of scientists. Incorporating standardized best practices is important for enhancing research methodology. One funder, the National Institutes of Health, has implemented several checks and balances to keep scientists honest and to push further toward open data sharing. Particularly in human research, safeguards need to be put in place to protect research participants from potential later identification. It is also important to ensure the data are cleaned and wrangled to address the scientific questions under investigation.
In the open panel section, the group discussed incentives. The key to incorporating open science is framing it through the lens of value added. One potential change could be a CV format that provides space to highlight open science contributions; this could be used not only at tenure review but also at salary review, to capture the full span of professorial ranks. Senior professors can help junior professors by giving them opportunities in larger team science projects. Questions about the current incentive structure, such as publications, may have to be answered to continue the culture change we seek.
One challenge of implementing open science practices is not knowing how well you, or an institution, is doing. One approach to this problem is automated digital dashboards for visualizing progress, which make it easier to benchmark yourself or to compare institutions against each other. These dashboards could also make it easier for journals to evaluate themselves and understand how to further enhance their policies.
Another way to value and highlight open science achievements is by reimagining our current implementation of the CV, which could then be used when evaluating a faculty member for hiring, promotion, and tenure. Change will come as incentive structures are reevaluated and enhanced, though potential changes do need to be supported by evidence. Another avenue is for funders to value and promote the incorporation of open science practices in grant proposals.
This panel was moderated by Monica Bobra (Research Scientist, Hansen Experimental Physics Laboratory).
The panel participants: David Studdert (Senior Associate Vice Provost for Data Resources, Professor of Medicine (Primary Care Outcomes Research), Professor of Law)
Sharad Goel (Assistant Professor of Management Science and Engineering, Assistant Professor of Sociology (by courtesy), Assistant Professor of Computer Science (by courtesy), and Assistant Professor of Law (by courtesy))
Ashley Jester (Assistant Director, Science and Engineering Libraries)
Quay (Ph.D. student in Civil and Environmental Engineering)
The panel discussion focused on the practical elements of a scientific study.
The panel began by discussing the pros and cons of open scientific data. "Open data" can be ambiguous; one perspective is that open data should be FAIR data (findable, accessible, interoperable, and reusable). There also needs to be clarity about the differing levels of access permitted for different types of data. Different domains sit at various points along the open data spectrum, so policy generation needs to acknowledge that there is no one-size-fits-all model: policies must be tailored to each particular domain. In some domains, researchers are users of data collected by various agencies, and stepping into the data sharing role is not always feasible. Another wrinkle is embargoed data: some studies take a long time to collect, and removing that incentive is challenging to reconcile with immediate reproducibility. In other cases, sharing data may be legally permissible but carry ethical considerations that argue for holding it back. Even when a dataset is made immediately available, there is still value in being the data creator and collector: the group that collected the data knows it better than anyone and controls what questions are incorporated into the design.
The next question concerned the accessibility and usability of open datasets. These also vary widely and typically aren't standardized, even within a single source. The primary contributor of open data is typically the government, and modernizing the data delivery methods can be quite costly and is likely underfunded.
The panel moved on to the topic of open source software and evaluated how feasible a fully reproducible environment is. Some aspects are easier to achieve than others; when external services are part of the research pipeline, full reproducibility typically cannot be achieved. There are both technical and practical challenges to reaching full reproducibility. One valuable piece of a reproducible project is a how-to guide for running the analysis and generating the figures; unfortunately, that is not yet a common research product in the community. Standardization, or best practices, for data organization and code sharing is one way this can be addressed. An interesting discussion emerged regarding whether proprietary software can still be considered open science: restrictive software licensing can significantly limit the global accessibility of a project, and open science should be lowering barriers to our research products and their reproducibility. Stanford could also play a part in the open source community, whether by funding open source projects, helping maintain them, or incorporating open source software into the classroom. Another way Stanford could help is through its research software engineers, making contributions back to open source projects part of their role.
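To make the "how-to guide" idea concrete, here is a minimal sketch of what a rerunnable analysis script might look like: a fixed random seed so every rerun produces the same numbers, plus a recorded provenance block so others know what software produced the results. The structure and names here are our own hypothetical illustration, not a panel recommendation or a CORES standard.

```python
import json
import platform
import random
import sys


def run_analysis(seed: int = 2022) -> dict:
    """Run the full analysis deterministically from a fixed seed."""
    rng = random.Random(seed)  # seeded RNG: same seed, same "results" every run
    data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    mean = sum(data) / len(data)
    return {"n": len(data), "mean": round(mean, 6)}


def provenance() -> dict:
    """Record enough context for someone else to recreate the run."""
    return {
        "python": platform.python_version(),
        "command": " ".join(sys.argv) or "interactive",
    }


if __name__ == "__main__":
    # Writing results alongside provenance makes the run auditable:
    # a reader sees both the numbers and the environment that produced them.
    record = {"results": run_analysis(), "provenance": provenance()}
    print(json.dumps(record, indent=2))
```

A one-line README instruction ("run `python analysis.py` to regenerate all results") plus this kind of determinism covers much of what the panel described as the missing how-to guide.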
We would like to thank all of our speakers for their presentations, panelists for their thoughts, and our attendees for tuning in!
See you at another CORES event!