2023 Data Science for Social Good

The Data Science for Social Good summer program trains aspiring researchers to work on data science projects with social impact. Working closely with governments and nonprofits, participants take on real-world problems in education, health, energy, public safety, transportation, economic development, international development, and more. Each cohort is a diverse and inclusive group of students who spend the summer on campus working on these projects.

This fifth summer of the Stanford Data Science for Social Good (DSSG) program ran from June 26th to August 17th, 2023.

The goal of the DSSG program is to train the next generation of ethically aware data scientists and to make a measurable difference on projects with social impact. This summer's program had seven student fellows from a variety of backgrounds, ranging from computer science to math to political science. The fellows divided into three teams, and each team worked with a different partner organization to bring critical insight to a core data science challenge.

Projects

View the intro to the program and all final presentations. Final presentations were held on Wednesday, August 16, 2023, from 10:00 to 11:30 am.

DSSG Leadership Team:

Balasubramanian "Naras" Narasimhan is a senior advisor and research data scientist in Stanford Data Science, Statistics and Biomedical Data Science Departments.

Shilaan Alzahawi is a Master's student in Statistics at Ghent University and a Ph.D. candidate in Organizational Behavior at the Stanford Graduate School of Business. Shilaan is interested in open, reproducible, and rigorous science, and is affiliated with the Stanford Center for Open and Reproducible Science. In her free time, she enjoys deadlifting, hiking, and taking board games much too seriously. At the Stanford Data Science for Social Good program, Shilaan served as the head organizer in Summer 2023 and Summer 2022 and as a technical mentor in Summer 2021.

Sophia Lu is a Ph.D. candidate in the Department of Statistics. Prior to graduate studies, she received her B.S. with honors in Mathematical and Computational Science from Stanford. Broadly, her research interests lie at the intersection of Bayesian modeling & inference, statistical machine learning, robust inference under distributional shifts, and their applications to computational genomics. Her current work focuses on developing efficient sampling algorithms for posterior inference. Outside of research, she is an avid portrait photographer, food critic, and chocolate connoisseur.

Maternal and Child Health - A Satellite’s Perspective

Final Presentation (Video/Slides)

Currently, basic indicators of maternal and child health (MCH) and of coverage with essential MCH services (e.g., childhood vaccinations) are obtained from expensive, nationally representative household surveys. This research project explores the use of machine learning with satellite imagery and other publicly available geotagged data to estimate key MCH indicators. We use three main sources of training data: an aggregation of geotagged data collected using Google Earth Engine, features extracted from satellite imagery using the MOSAIKS API, and raw images from the Landsat satellites. Using household surveys collected by USAID’s Demographic and Health Surveys (DHS) program as our ground truth, we train a set of regression and classification models to predict the following MCH indicators: Mean/Median BMI, Under Five Mortality Rate, Unmet Need Rate, Skilled Birth Attendant Rate, and Stunted Growth Rate. With the numerical data from Google Earth Engine and MOSAIKS as our training datasets, we use Microsoft Azure’s automated machine learning functionality to generate stack ensemble models. For the Landsat images, we experimented with convolutional neural networks and vision transformers (ViT). Our trained regression models perform reasonably well for several health indicators; the model for Skilled Birth Attendant Rate achieves the highest R-squared, 0.668. Among our classification models, the one for Skilled Birth Attendant Rate also achieved the highest accuracy, 68%. Altogether, our research suggests that satellite imagery and other geotagged data offer a promising approach to estimating MCH indicators.
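For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how an R-squared value and a classification accuracy are typically computed from survey-based ground truth and model predictions. It is not the team's code: the indicator values, the 0.6 threshold used to turn rates into classes, and the function names are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true, labels_pred):
    """Fraction of predicted labels that match the true labels."""
    matches = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return matches / len(labels_true)


# Hypothetical per-region rates (NOT project data): survey ground truth
# versus the output of some regression model.
observed = [0.40, 0.55, 0.70, 0.85, 0.90]
predicted = [0.45, 0.50, 0.75, 0.80, 0.95]

# Hypothetical threshold that converts a rate into a binary class.
THRESHOLD = 0.6
observed_class = [rate >= THRESHOLD for rate in observed]
predicted_class = [rate >= THRESHOLD for rate in predicted]

r2 = r_squared(observed, predicted)
acc = accuracy(observed_class, predicted_class)
print(f"R-squared: {r2:.3f}, accuracy: {acc:.0%}")
```

An R-squared of 1 means the predictions explain all of the variance in the survey values, while a value near 0 means they do no better than predicting the mean everywhere.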

Technical Mentor
Haojie Wang is a Postdoctoral Fellow at Stanford Data Science. His research focuses on the development of intelligent earth observation approaches for global population health monitoring, especially for low-resource regions like low- and middle-income countries (LMICs). He is also interested in using data science techniques to address research challenges associated with natural hazard forecasting, risk assessment and global environmental change. Outside of work, he is passionate about cooking, hiking and exploring new places through travel.

Fellows
Emily Wesel is a Bachelor’s and Master’s student in Computer Science at Stanford. During the school year, she does research in computational radiology, applying machine learning to MRI images. Emily strives for all her machine learning research to be deeply grounded in a thorough understanding of its respective field. Additionally, Emily minored in history and loves geospatial visualization. Outside of school, Emily loves running, swimming, Taylor Swift, historical novels, and crime shows. At the Stanford Data Science for Social Good program, Emily contributed to the satellite images for maternal health team.

Poojit Hegde is graduating with a B.S. degree in Mathematical and Computational Science, and is currently pursuing a coterminal Master’s degree in Computer Science. Poojit enjoys working on interdisciplinary projects that relate science and society. A student of history, Poojit likes to study the past through books and films. Poojit also enjoys spending time outside, going for walks, and climbing trees.

Mac Ya is an undergraduate junior at Stanford University majoring in Computer Science, with a potential double major or minor in Linguistics. Mac’s academic interests are centered around language models in Natural Language Processing (NLP) and the broader applications within computer vision. Prior to his current academic pursuits, he developed a keen interest in the intricacies of linguistic structures and their computational representations. Outside of academia, Mac has a passion for skiing, tennis, and motor racing.

Addressing Missing Data in a Study on Predictors of Adherence to HIV Treatment among Young Women in Kisumu, Kenya

Final Presentation (Video/Slides)

The complex relationships between intimate partner violence, mental health, and adherence to HIV treatment are severely understudied, especially in adolescents. As this population regularly struggles to meet the Joint United Nations Programme on HIV/AIDS (UNAIDS) 95-95-95 goals, a failure to understand the factors that impact HIV medication adherence may put the global 95-95-95 goals at risk. Our study investigates the factors that influence HIV treatment adherence among adolescent girls and young women in Kisumu, Kenya, the county with the second highest prevalence of HIV nationwide, at roughly 20%. However, because of data collection challenges coinciding with the COVID-19 pandemic, the study only has viral load counts for approximately 50% of the 309 study participants. Our task was to predict the adherence of participants with missing data using multiple imputation, informed by two analyses: one of the predictors of adherence among participants with viral load data, and one of the observable differences between participants who are missing viral load data and those who are not.
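As a rough illustration of the general idea behind multiple imputation, the sketch below fills in missing binary outcomes several times, estimates a rate on each completed dataset, and pools the results with Rubin's rules. It is a simplified toy under stated assumptions, not the study's method: the records, the group labels "a" and "b", the Beta-Bernoulli imputation model, and the function names are all hypothetical.

```python
import random
from statistics import mean, variance


def impute_once(records, rng):
    """Return one completed copy of `records` with missing outcomes filled in."""
    # Count observed outcomes (1 = adherent, 0 = not) within each group.
    counts = {}
    for group, outcome in records:
        if outcome is not None:
            yes, no = counts.get(group, (0, 0))
            counts[group] = (yes + outcome, no + (1 - outcome))
    completed = []
    for group, outcome in records:
        if outcome is None:
            yes, no = counts.get(group, (0, 0))
            # Draw the group's adherence probability from a Beta posterior
            # (uniform prior) so that parameter uncertainty is propagated,
            # then draw the missing outcome from that probability.
            p = rng.betavariate(1 + yes, 1 + no)
            outcome = 1 if rng.random() < p else 0
        completed.append((group, outcome))
    return completed


def pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance."""
    m = len(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # variance of the estimates across imputations
    return mean(estimates), within + (1 + 1 / m) * between


# Hypothetical toy records (NOT study data): (covariate group, outcome or None).
records = [("a", 1), ("a", 1), ("a", 0), ("a", None), ("a", None),
           ("b", 0), ("b", 0), ("b", 1), ("b", None), ("b", None)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
M = 20                  # number of imputed datasets (an arbitrary choice)
estimates, variances = [], []
for _ in range(M):
    outcomes = [o for _, o in impute_once(records, rng)]
    p_hat = mean(outcomes)
    estimates.append(p_hat)
    variances.append(p_hat * (1 - p_hat) / len(outcomes))

pooled_rate, pooled_variance = pool(estimates, variances)
print(f"pooled adherence rate: {pooled_rate:.3f} (variance {pooled_variance:.4f})")
```

Pooling across several imputed datasets, rather than filling in each missing value once, lets the final variance reflect the extra uncertainty introduced by the missing data.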

Evaluating the Nutritional Landscape of US Food Banks

Final Presentation (Video/Slides)

Food banks play a critical role in reducing food waste and alleviating food insecurity. In this project, we develop metrics and data visualizations to evaluate the food supply for 32 food banks across the United States. Collaborating with several food bank partners, we develop a ‘nutritional profile’ for each of these 32 food banks that provides information about specific values, overall trends, and future directions. Our results indicate that food banks play an important role in providing healthy, nutritious, and sustainable food options for the communities they serve.

Sign up for DSSG announcements through the Stanford Data Science mailing list.

Are you interested in becoming a student fellow or mentor next summer? Add yourself to the mailing list and we’ll contact you when next summer’s applications for fellows and mentors are up in early spring.

Do you have a social good project that you think DSSG could help with? If you’re interested in partnering with us, please add your name to this list, and we will notify you later this winter when the application for partnerships for next summer goes live.
