2022 Data Science for Social Good

The Data Science for Social Good summer program trains aspiring researchers to work on data science projects with social impact. Working closely with governments and nonprofits, participants take on real-world problems in education, health, energy, public safety, transportation, economic development, international development, and more. Participants form a diverse and inclusive cohort of students who spend the summer on campus working with the program.

This fourth summer of the Stanford Data Science for Social Good (DSSG) program ran from June 20th to August 12th, 2022.

The goal of the DSSG program is to train the next generation of ethically aware data scientists and to deliver measurable results on projects with social impact. This summer's program had seven student fellows from a variety of backgrounds, ranging from computer science to statistics to sociology. This year, Stanford also invited fellows from other US universities! The fellows divided into two teams, each of which worked with a different partner organization to bring critical insights into a core data science challenge.


COVID-19 Mortality

  • Final Presentation (Video/Slides)

More than one million Americans have died from COVID-19 since the start of the pandemic. Because of challenges in access to COVID-19 testing and indirect deaths resulting from the pandemic, it has been shown that official counts miss additional COVID-19 mortality. Excess death is the difference between actual deaths and expected deaths for a given time and place. In 2020 and 2021, the COVID-19 pandemic contributed to increased global mortality that was not always accounted for in the COVID-19 mortality data. In this project, we model expected mortality at the county level for the 500 most populous U.S. counties to generate their excess death estimates. Furthermore, we use policy, health, vaccine, and socioeconomic features to investigate how these factors affect county-level excess mortality. Results from this project will reveal factors that contributed to high and low levels of excess death during the COVID-19 pandemic, which can guide policy implementation and improve pandemic response in the future.
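As a rough illustration of the excess death definition above, the sketch below subtracts an expected count from an observed count. It is not the team's model: the expected value here is simply the mean of the same period in earlier years, the function names are hypothetical, and the counts are invented for a made-up county.

```python
from statistics import mean


def expected_deaths(baseline_counts):
    """Naive expected deaths: the mean of the same period in prior years.

    Assumption for illustration only; the project fit its own
    county-level mortality model, which is not described here.
    """
    return mean(baseline_counts)


def excess_deaths(observed, expected):
    """Excess deaths = observed deaths minus expected deaths."""
    return observed - expected


# Invented counts for one hypothetical county and period (not real data).
baseline = [100, 110, 105]  # deaths in the same period, three prior years
observed = 140              # deaths in the pandemic-year period

expected = expected_deaths(baseline)
excess = excess_deaths(observed, expected)
print(f"expected={expected}, observed={observed}, excess={excess}")
```

A positive result means more deaths occurred than the baseline predicts; a negative result means fewer.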

Identifying behavioral health conditions from police records

  • Final Presentation (Video/Slides)

Law enforcement officers seriously injured or killed 3,600 people in California from 2016 to 2020. In many cases, the victim’s mental health or substance abuse was a factor. In Bakersfield, that share was as high as 44%, in stark contrast with the 3% reported by the agency. The team partnered with journalists at Big Local News to automatically identify such cases, enabling journalists to more easily hold police agencies accountable. Previously, each case, which often contained hundreds of pages, was read and manually annotated by two journalists. Our team created a classification pipeline that 1) labels whether a document contains references to behavioral/mental health, and 2) if so, outputs the relevant information (e.g., text, page numbers). Ultimately, our pipeline will be used to analyze police reports at scale as journalists develop a database of misconduct cases for reporting stories.
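The two outputs described above (a document-level label, plus the matching text and page numbers) can be pictured with a minimal sketch. This uses plain keyword matching as a stand-in; the team's actual classifier is not described here, and the keyword list, names, and sample pages below are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list, for illustration only.
KEYWORDS = ["mental health", "psychiatric", "substance abuse", "intoxicated"]
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


@dataclass
class Match:
    page: int  # 1-based page number within the document
    text: str  # sentence containing the keyword


def classify_document(pages):
    """Return (label, matches) for a document given as a list of page strings.

    label is True when any page contains a keyword; matches lists each
    matching sentence together with its page number.
    """
    matches = []
    for page_number, page_text in enumerate(pages, start=1):
        # Crude sentence split on whitespace following ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", page_text):
            if PATTERN.search(sentence):
                matches.append(Match(page_number, sentence.strip()))
    return bool(matches), matches


# Invented two-page example (not taken from any real report).
sample_pages = [
    "The officer arrived at the scene.",
    "The subject had a history of psychiatric holds. He was detained.",
]
label, found = classify_document(sample_pages)
print(label, [(m.page, m.text) for m in found])
```

Returning the page number with each matched sentence lets a reader jump straight to the relevant passage instead of rereading the whole case file.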

Sign up for DSSG announcements through the Stanford Data Science mailing list.

Are you interested in becoming a student fellow or mentor next summer? Add yourself to the mailing list and we’ll contact you when next summer’s applications for fellows and mentors open in early spring. Summer 2023 will be open to non-Stanford-affiliated students!

Do you have a social good project that you think DSSG could help with? If you’re interested in partnering with us, please add your name to this list, and we will notify you later this winter when the application for partnerships for next summer goes live.