Data Science for Economics

Traditionally, data analyses in economics have focused on answering causal questions. Although this is not universally true, many of the most pressing questions in empirical economics are causal, such as the short- and long-run impact of educational choices on labor market outcomes, or of economic policies on distributions of outcomes. This makes them conceptually quite different from the predictive questions that many recently developed machine learning methods are primarily designed for. Often these questions involve deliberate treatment choices (e.g., educational choices or price decisions) made by individuals or firms to optimize outcomes, so causal effects cannot be learned simply by comparing similar treated and control units. In addition, inference plays a more important role than in prediction problems. Nevertheless, the models economists use often have predictive components, where the predictive tools developed by computer scientists and statisticians can be used after being adapted to the specific context. Many economists now use large-scale administrative data from government (e.g., the work by Raj Chetty in the economics department using Internal Revenue Service data) and from private companies (e.g., supermarket data, in the work by Susan Athey from the Stanford GSB and David Blei), as well as text data (e.g., in the work by Matt Gentzkow in studies of polarization and the media) and methods for optimizing medical decisions (Mohsen Bayati at the GSB). As a result, these methods are becoming increasingly popular, and demand is growing for more sophisticated methods that take account of the causal nature of these questions and the richness of the data.