2020 Data Science for Social Good
The Data Science for Social Good summer program trains aspiring researchers to work on data science projects with social impact. Working closely with governments and nonprofits, participants take on real-world problems in education, health, energy, public safety, transportation, economic development, international development and more. Participants include a diverse and inclusive cohort of students who spend the summer on campus working with the program.
This second summer of the Stanford Data Science for Social Good (DSSG) program ran from June 29th to August 21st, 2020.
The goal of the DSSG program is to train the next generation of ethically aware data scientists and to deliver measurable results for projects with social impact. This summer's program had nine student fellows from a variety of backgrounds, ranging from computer science to statistics to sociology. The fellows divided into three teams, each of which worked with a different partner organization to bring critical insights into a core data science challenge.
- View intro to the program from the final presentations here
Improving predictions for targeted human trafficking investigations in Brazil
Human trafficking remains a pervasive problem in modern Brazil. According to the Global Slavery Index, hundreds of thousands of people in the country are victims of modern slavery at any given time, amounting to almost 2 victims for every thousand Brazilian citizens.
Despite these grave statistics, a strong commitment to fight human trafficking from the Brazilian government coupled with robust federal open data policies presents an opportunity to effectively address this hideous problem.
This summer, we will partner with the Human Trafficking Data Lab at Stanford to help the Brazilian Federal Labor Prosecution Office in targeting their investigations into firms involved in human trafficking. Specifically, we will utilize the tools of data science to help build the Intuition Engine – an ensemble predictive model combining regression models, spatial data science, natural language processing, deep learning and network analyses to better detect the risk of trafficking. In our work, we will primarily focus on the regression part of the Intuition Engine, constructing statistical models to identify the strongest predictors of human trafficking.
To make the predictions as precise as possible, we will utilize two main datasets. First, we will access a comprehensive database of all enterprises registered in Brazil as well as a “Dirty List” of companies found to have been engaged in human trafficking in the past. Second, we will process a set of reports from more than 5,000 investigations carried out by Brazilian prosecutors. In analyzing these two datasets, we will look for characteristics that increase the likelihood of a company being involved in human trafficking. Ultimately, the Intuition Engine will be used by Brazilian prosecutors as a decision support tool to improve the direction of future investigations into human trafficking in Brazil.
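As a rough illustration of how a regression model can rank candidate predictors, the sketch below fits a logistic regression by gradient descent and sorts features by the size of their coefficients. The choice of logistic regression, the feature names `feature_a` and `feature_b`, and the synthetic data are all assumptions made for this example; they are not the project's actual model, variables, or data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression weights with batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            pred = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = pred - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_toy_data(n=400, seed=0):
    """Synthetic data: only the first feature influences the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        informative = rng.gauss(0, 1)  # drives the outcome
        noise = rng.gauss(0, 1)        # unrelated to the outcome
        p = sigmoid(2.0 * informative - 1.0)
        X.append([informative, noise])
        y.append(1 if rng.random() < p else 0)
    return X, y


# Hypothetical, standardized feature names used only for this sketch.
FEATURE_NAMES = ["feature_a", "feature_b"]

X, y = make_toy_data()
weights, bias = fit_logistic(X, y)

# Rank features by absolute coefficient size (comparable because both
# synthetic features share the same scale).
ranked = sorted(zip(FEATURE_NAMES, weights), key=lambda t: abs(t[1]), reverse=True)
for name, weight in ranked:
    print(f"{name}: {weight:+.2f}")
```

Comparing coefficient magnitudes in this way is only meaningful when features are on a common scale, so real firm-level variables would likely need standardizing first.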
Building a network of land ownership in Kenya
Partnering with Code for Africa, our project aims to build a network of corporations and persons involved in land transfer and ownership, focusing on Kenya’s public gazette document records. We intend for our work to be used by journalists in Kenya to fight corruption and promote good governance of land resources.
From a technical standpoint, our outputs will be four-fold. We will create:
- A dataset of high-quality PDF-to-text conversions of the Kenya Gazettes.
- A pipeline which extracts names, addresses, ownership status, and other important information from the gazettes themselves.
- A pipeline which identifies relationships, such as location (between an address and a business) or ownership (between a person and a title).
- Rigorous documentation of our process, with a focus on reproducibility.
Our first three outputs aim to directly support journalists in their investigative work on land use in Kenya. To ensure that our tools fit into journalists’ workflow, we will integrate them into Code for Africa’s existing data analysis tool, Aleph. Through thorough research, rigorous documentation, and a detailed final report, we hope to enable others to build on our project or complete their own.
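The extraction and relationship pipelines described above can be pictured with a small sketch. It pulls names, addresses, and title numbers out of text using a regular expression, then turns each record into ownership and location edges. The notice wording, the pattern, and every name in `SAMPLE_TEXT` are invented for illustration; real gazette notices are far less regular, and this is not the team's actual pipeline.

```python
import re
from collections import defaultdict

# Invented notice layout, for illustration only. Not real gazette text.
SAMPLE_TEXT = """\
NOTICE 101: Jane Doe of P.O. Box 1, Exampletown is the registered owner of Title No. EX/001.
NOTICE 102: Example Holdings Ltd of P.O. Box 2, Sampleville is the registered owner of Title No. EX/002.
NOTICE 103: Jane Doe of P.O. Box 1, Exampletown is the registered owner of Title No. EX/003.
"""

# Hypothetical pattern matching the invented layout above.
NOTICE_PATTERN = re.compile(
    r"NOTICE (?P<notice>\d+): (?P<owner>.+?) of (?P<address>.+?) "
    r"is the registered owner of Title No\. (?P<title>[A-Z]+/\d+)\."
)


def extract_records(text):
    """Step 1: pull structured fields out of converted gazette text."""
    return [match.groupdict() for match in NOTICE_PATTERN.finditer(text)]


def build_edges(records):
    """Step 2: express each record as (source, relation, target) edges."""
    edges = []
    for record in records:
        edges.append((record["owner"], "owns", record["title"]))
        edges.append((record["owner"], "located_at", record["address"]))
    return edges


def holdings_by_owner(edges):
    """Group titles by owner so repeat owners stand out."""
    holdings = defaultdict(list)
    for source, relation, target in edges:
        if relation == "owns":
            holdings[source].append(target)
    return dict(holdings)


records = extract_records(SAMPLE_TEXT)
edges = build_edges(records)
holdings = holdings_by_owner(edges)

for owner, titles in holdings.items():
    print(f"{owner}: {', '.join(titles)}")
```

Storing relationships as simple triples keeps the output easy to load into a graph tool later, although the format any particular tool expects would need to be checked against its own documentation.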
Identifying CAFO characteristics using satellite imagery
The team partnered with Stanford Law School’s Regulation, Evaluation, and Governance Lab (RegLab). RegLab leverages state-of-the-art advances in machine learning, artificial intelligence, and causal inference to design and evaluate programs, policies and technologies that modernize government.
Enforcement of environmental law depends critically on permitting and monitoring intensive animal agricultural facilities, known in the United States as ‘concentrated animal feeding operations’ (CAFOs). The current legal landscape in the United States has made it difficult for government agencies, environmental groups and the public to know where such facilities are located. To address this issue, RegLab has shown that applying a deep convolutional neural network to high-resolution satellite images offers an effective, highly accurate and lower-cost approach to detecting CAFO locations. This work by Handan-Nader and Ho was published in Nature Sustainability as “Deep learning to map concentrated animal feeding operations” (2019).
The purpose of the DSSG project is to build on the work of RegLab to address the unpermitted expansion of CAFOs across the United States. Specifically, the team seeks to 1) identify and segment CAFOs from satellite images and 2) detect their unpermitted expansion.
The team successfully implemented several machine learning algorithms and neural networks that can correctly identify CAFOs in satellite images. The models are not perfect, but because the team’s primary goal is to compare images across time, they quantify CAFOs well enough to determine whether additional CAFOs have appeared between images of the same location taken at different times.
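One simple way to picture the comparison across time is to take two binary segmentation masks of the same location and measure how much new facility footprint appears in the later one. The sketch below does this on tiny hand-made masks. The masks, the `min_growth` threshold, and the function names are assumptions for illustration; the source does not describe how the team actually compared images.

```python
def mask_area(mask):
    """Count pixels labeled 1 (facility) in a binary mask."""
    return sum(sum(row) for row in mask)


def new_footprint(before, after):
    """Pixels that are facility in the later mask but not the earlier one."""
    return [
        [1 if a and not b else 0 for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def expansion_report(before, after, min_growth=0.2):
    """Summarize footprint growth between two aligned masks.

    min_growth is a hypothetical threshold: the fraction of the earlier
    footprint that must be added before the site is flagged.
    """
    area_before = mask_area(before)
    added = mask_area(new_footprint(before, after))
    if area_before > 0:
        growth = added / area_before
    elif added > 0:
        growth = float("inf")  # a facility appeared where none existed
    else:
        growth = 0.0
    return {
        "area_before": area_before,
        "added_pixels": added,
        "growth": growth,
        "expanded": growth >= min_growth,
    }


# Toy masks standing in for model output on two image dates.
MASK_EARLIER = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
MASK_LATER = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

report = expansion_report(MASK_EARLIER, MASK_LATER)
unchanged = expansion_report(MASK_EARLIER, MASK_EARLIER)
print(report)
```

Because the comparison only needs the change between two dates, moderate segmentation errors that appear in both masks tend to cancel out, which fits the point that the models need not be perfect for this purpose.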
DSSG is happening again, summer 2021!
Sign up for DSSG announcements through the Stanford Data Science mailing list.
Are you interested in becoming a student fellow or mentor next summer? Add yourself to the mailing list and we’ll contact you when next summer’s applications for fellows and mentors are up in early spring. Summer 2021 will be open to non-Stanford affiliated students!
Do you have a social good project that you think DSSG could help with? If you’re interested in partnering with us, please add your name to this list, and we will notify you later this winter when the application for partnerships for next summer goes live.