Data-driven inquiry is key to all aspects of science and discovery, and data-based decisions are becoming integral to society. The attendant challenges, and the importance of meeting them, are especially acute when the goal is to obtain relevant, valid, and reproducible scientific insights. The Stanford Data Science Collaboratory will confront these challenges by creating a community of faculty, postdoctoral scholars, students, and research fellows who leverage data science methods and domain knowledge to tackle pressing problems. In the Collaboratory, data scientists will work closely with scholars from other fields who rely on large, accurate, dependable datasets and data science techniques. The Collaboratory will foster the work of researchers who study the ethical issues related to data collection and use, and will use data to solve societal and scientific problems. A hallmark will be thorough validation of data and careful statistical calibration of the evidence to avoid misinterpretations that could have adverse consequences. A second major goal of the Collaboratory is the growth of a citizenry literate in data science: universities have an obligation to ensure that the next generation understands how to interpret and learn from data, and how to collect and manage it.
The Collaboratory identifies five high-profile, high-impact projects that domain scientists deem important and on which they believe they cannot make progress without a paradigm shift in how they approach datasets. The first two concern sustainable relationships between humans and the environment, namely
- (1) managing coral reefs in a changing climate and
- (2) reducing illegal fishing and forced labor in tuna supply chains.
To make progress, the project will leverage new data sources: satellite remote sensing, soundscape-based ground monitoring stations, and genomic measurements that track biodiversity and evolution.
The other three projects address fractures in society and steps toward a more sustainable one:
- (3) How to understand the determinants of poverty in the U.S.;
- (4) How to detect and track political framing in digital media; and
- (5) How to develop data science tools that support equitable treatment of individuals.
Public data streams (e.g., social media apps, Wikipedia, Wikidata, and moderate-resolution satellite imagery), as well as private-sector data (e.g., cell phone records, Facebook activity, internet search queries, drone imagery, and fine-resolution satellite data), will inform understanding of the mechanisms causing poverty.
To meet the research goals, the Collaboratory will incentivize faculty, students and postdocs to come together to find new data science solutions by supporting collaborative research teams and brainstorming working groups. The Collaboratory will also engage the undergraduate community by providing hands-on guided scientific research experience. To enlarge collaboration beyond Stanford, the Collaboratory will host outside visitors and invite scientists to campus for an annual symposium.