The Data Science for Social Good summer program trains aspiring researchers to work on data science projects with social impact. Working closely with governments and nonprofits, participants take on real-world problems in education, health, energy, public safety, transportation, economic development, international development and more. Participants include a diverse and inclusive cohort of students who spend the summer on campus working with the program.
This second summer of the Stanford Data Science for Social Good (DSSG) program ran from June 29th to August 21st, 2020.
The goal of the DSSG program is to train the next generation of ethically aware data scientists and to deliver measurable results for projects with social impact. This summer's program had nine student fellows from a variety of backgrounds, ranging from computer science to statistics to sociology. The fellows divided into three teams, each of which worked with a different partner organization to bring critical insights into a core data science challenge.
- View intro to the program from the final presentations here
Improving predictions for targeted human trafficking investigations in Brazil
Human trafficking remains a pervasive problem in modern Brazil. According to the Global Slavery Index, hundreds of thousands of people in the country are victims of modern slavery at any given time, amounting to almost 2 victims for every thousand Brazilian citizens.
Despite these grave statistics, a strong commitment to fight human trafficking from the Brazilian government coupled with robust federal open data policies presents an opportunity to effectively address this hideous problem.
This summer, we will partner with the Human Trafficking Data Lab at Stanford to help the Brazilian Federal Labor Prosecution Office in targeting their investigations into firms involved in human trafficking. Specifically, we will utilize the tools of data science to help build the Intuition Engine – an ensemble predictive model combining regression models, spatial data science, natural language processing, deep learning and network analyses to better detect the risk of trafficking. In our work, we will primarily focus on the regression part of the Intuition Engine, constructing statistical models to identify the strongest predictors of human trafficking.
To make the predictions as precise as possible, we will utilize two main datasets. First, we will access a comprehensive database of all enterprises registered in Brazil as well as a “Dirty List” of companies found to have been engaged in human trafficking in the past. Second, we will process a set of reports from more than 5,000 investigations carried out by Brazilian prosecutors. In analyzing these two datasets, we will look for characteristics that increase the likelihood of a company being involved in human trafficking. Ultimately, the Intuition Engine will be used by Brazilian prosecutors as a decision support tool to improve the direction of future investigations into human trafficking in Brazil.
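As a rough illustration of the regression idea described above, the sketch below fits a small logistic regression and ranks characteristics by the size of their coefficients. It is not the team's model: the feature names (`remote_location`, `prior_labor_fine`), the toy rows, and the labels are all invented for the example, and the plain gradient-descent fit stands in for whatever statistical tooling the project actually used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical company characteristics (invented, not from the real datasets).
FEATURES = ["remote_location", "prior_labor_fine"]

# Invented toy rows: one row per company, 1 = characteristic present.
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
# Invented labels: 1 = company appears on a list of past offenders.
y = [1, 1, 1, 0, 0, 0, 0, 1]

weights, intercept = fit_logistic(X, y)

# Rank characteristics by absolute coefficient, largest first.
ranked = sorted(zip(FEATURES, weights), key=lambda t: abs(t[1]), reverse=True)
for name, coef in ranked:
    print(f"{name}: {coef:+.2f}")
```

In this toy data the first characteristic is associated with the label and the second is not, so the first should receive the larger coefficient. With real data, coefficient size alone would not establish importance without standardized features and uncertainty estimates.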
Madeleine Gates is a second-year student in Stanford’s Master of Science in Statistics program. Alongside her graduate studies, she was a core member of the women’s volleyball team that won the 2019-2020 NCAA Championship. Before arriving at Stanford, she graduated from UCLA with a Bachelor of Arts in Economics and a minor in Spanish in 2019.
Veer Shah is studying for a joint Bachelor of Science in Mathematical and Computational Science and a Master of Science at Stanford’s Institute for Computational and Mathematical Engineering. He loves tennis more than anything else in the world.
Michal Skreta is a senior at Stanford double majoring in Economics and Political Science. He is also a student in the Interdisciplinary Honors Program in Democracy, Development and the Rule of Law at the Stanford Freeman Spogli Institute for International Studies.
Building a network of land ownership in Kenya
Partnering with Code for Africa, our project aims to build a network of corporations and persons involved in land transfer and ownership, focusing on Kenya’s public gazette document records. We intend for our work to be used by journalists in Kenya to fight corruption and promote good governance of land resources.
From a technical standpoint, our outputs will be four-fold. We will create:
- A dataset of high-quality PDF-to-text conversions of the Kenya Gazettes.
- A pipeline which extracts names, addresses, ownership status, and other important information from the gazettes themselves.
- A pipeline which identifies relationships, such as location (between an address and a business) or ownership (between a person and a title).
- Rigorous documentation of our process, with a focus on reproducibility.
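The extraction and relationship steps listed above can be sketched roughly as follows. This is a minimal illustration under strong assumptions: the sample notice wording, the regular expression, and the relation names `OWNS` and `LOCATED_AT` are invented for the example and do not reflect the real format of the Kenya Gazettes or the team's actual pipeline.

```python
import re
from typing import List, NamedTuple

class Relation(NamedTuple):
    source: str
    kind: str
    target: str

# Hypothetical notice pattern; real gazette text is far less regular.
NOTICE = re.compile(
    r"(?P<owner>[A-Z][a-z]+(?: [A-Z][a-z]+)+) of "
    r"(?P<address>P\.O\. Box \d+, [A-Z][a-z]+) "
    r"is the registered owner of title (?P<title>[A-Z]+/\d+)"
)

def extract_relations(text: str) -> List[Relation]:
    """Return ownership and location relations for every matching notice."""
    relations = []
    for match in NOTICE.finditer(text):
        relations.append(Relation(match["owner"], "OWNS", match["title"]))
        relations.append(Relation(match["owner"], "LOCATED_AT", match["address"]))
    return relations

# Invented sample text standing in for PDF-to-text output.
SAMPLE = (
    "Jane Doe of P.O. Box 100, Nairobi is the registered owner of title LR/12345. "
    "John Smith of P.O. Box 7, Mombasa is the registered owner of title MN/678."
)

relations = extract_relations(SAMPLE)
for relation in relations:
    print(relation)
```

Storing each relationship as a (source, kind, target) triple keeps the output easy to load into a graph or network tool, which is one plausible way to support the network-building goal.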
Our first three outputs aim to help journalists directly in investigative work they do concerning land use in Kenya. In order to ensure that our tools are integrated into journalists’ workflow, we will integrate them into Code for Africa’s existing data analysis tool, Aleph. Through thorough research, rigorous documentation, and a detailed final report, we hope to enable others to build off of our project or complete their own.
Tsion (T) Tesfaye is pursuing a master’s degree in statistics (data science track) at Stanford. She is particularly interested in the intersection of data science and design thinking to build innovative systems for purpose-driven companies. Her hobbies include reading autobiographies, exploring post-impressionist paintings and listening to country music.
Thea Rossman is an incoming master’s student in Computer Science at Stanford, where she just finished a B.A.S. in Mathematical & Computational Science and Ethnic Studies. She is interested in applying anti-oppression frameworks to shape public data systems. She loves speculative fiction and fantasy books, math education, and youth organizing.
Robbie Thompson is a junior at Stanford University studying Mathematical and Computational Science. He is particularly interested in quantitative methods in social science. Robbie’s hobbies include barefoot running and pickup basketball.
Identifying CAFO characteristics using satellite imagery
The team partnered with Stanford Law School’s Regulation, Evaluation, and Governance Lab (RegLab). RegLab leverages state-of-the-art advances in machine learning, artificial intelligence, and causal inference to design and evaluate programs, policies and technologies that modernize government.
Enforcement of environmental law depends critically on permitting and monitoring intensive animal agricultural facilities, known in the United States as ‘concentrated animal feeding operations’ (CAFOs). The current legal landscape in the United States has made it difficult for government agencies, environmental groups and the public to know where such facilities are located. To address this issue, RegLab has applied a deep convolutional neural network to high-resolution satellite images, an approach that offers an effective, highly accurate and lower-cost way to detect CAFO locations. This work by Handan-Nader and Ho was published in Nature Sustainability as “Deep learning to map concentrated animal feeding operations” (2019).
The purpose of the DSSG project is to build on the work of RegLab to address the unpermitted expansion of CAFOs across the United States. Specifically, the team seeks to 1) identify and segment CAFOs from satellite images and 2) detect their unpermitted expansion.
The team successfully implemented several machine learning algorithms and neural networks that correctly identify CAFOs in satellite images. The models are not perfect, but because the team’s primary goal is to compare images across time, they quantify CAFOs well enough to determine whether additional CAFOs appear in later satellite images.
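To make the idea of comparing images across time concrete, here is a minimal sketch that assumes a segmentation model has already produced a binary mask for the same area at two dates. The masks, the function names, and the 4-neighbour definition of a connected region are illustrative assumptions, not the team's method.

```python
def count_regions(mask):
    """Count 4-connected groups of 1-pixels in a binary mask (list of rows)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                regions += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:  # iterative flood fill over one region
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return regions

def expansion_report(before, after):
    """Compare two masks of the same area taken at different dates."""
    return {
        "new_regions": count_regions(after) - count_regions(before),
        "area_change": sum(map(sum, after)) - sum(map(sum, before)),
    }

# Invented masks: 1 marks pixels a model labelled as part of a facility.
before = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
after = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

report = expansion_report(before, after)
print(report)
```

Because only the difference between dates matters here, a consistent bias in the masks tends to cancel out, which fits the point that imperfect models can still be adequate for detecting expansion.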
Sandy Lee is a second year Master’s student in Management Science and Engineering at Stanford. Her interest is in computational social science. Previously, she obtained her B.S. degree in Computer Science and Economics from Duke University.
David Kang is a recent graduate of the Master’s program in Statistics at Stanford. His interests include computer vision, deep learning, and applied statistics. He earned his B.S./B.A. degree in Statistics and Economics from the University of North Carolina at Chapel Hill.
Seiji Eicher recently graduated with a B.S. in Mathematical and Computational Science at Stanford and will return for a coterm Master’s in Computer Science in the fall. They are interested in partnering with communities to share the possibilities of data science as a tool for social change.
DSSG is happening again, summer 2021!
Are you interested in becoming a student fellow or mentor next summer? Add yourself to the mailing list and we’ll contact you when next summer’s applications for fellows and mentors are up in early spring. Summer 2021 will be open to non-Stanford affiliated students!
Do you have a social good project that you think DSSG could help with? If you’re interested in partnering with us, please add your name to this list, and we will notify you later this winter when the application for partnerships for next summer goes live.