Learning Mechanisms Without Experiments

A very large fraction of the “big data” everyone talks about is of the type statisticians call “observational.” In other words, it represents a record of something that has happened “in the wild” (in uncontrolled settings, without the researcher’s intervention).

Observing the world around us is a powerful way of learning, and one that humans exploit. It also has limitations: some things can be learned only by acting on the world, as when a baby discovers by experimenting that crying alone does not retrieve a toy from across the room.

When the data we mine is very large, it is often quite possible to identify an association between variables (or functions of variables) that is very strong but that does not correspond to a causal mechanism. Believing that the relation is causal when it is not can lead us to serious mistakes. For example, parsing a plethora of data, we might observe that most computer programmers are male and most homemakers are female. If we took this to mean that we should hire only male candidates for programming jobs, we would miss valuable employees and engage in an unfair practice.

Learning to distinguish causal from non-causal associations from observational data is notoriously difficult. When the stakes are high (as when deciding on the efficacy of a treatment), we have learned to resort to experiments whenever possible. But experiments are costly, time-consuming, and sometimes not possible.
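The gap between a strong association and a causal mechanism can be illustrated with a small simulation. The sketch below is purely hypothetical and uses made-up synthetic data: a hidden variable `z` drives both `x` and `y`, while `x` has no effect on `y` at all. In the "observational" setting the two are strongly correlated; when `x` is instead assigned at random, as in an experiment, the correlation vanishes. The names `pearson` and `simulate`, the noise levels, and the sample size are assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0):
    """Return (observational correlation, experimental correlation)."""
    rng = random.Random(seed)

    # Hidden confounder: it influences both x and y.
    z = [rng.gauss(0, 1) for _ in range(n)]

    # Observational setting: x and y both depend on z, but x never enters y.
    x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
    y_obs = [zi + rng.gauss(0, 0.5) for zi in z]

    # Experimental setting: x is assigned at random, independently of z.
    # y is generated by the same mechanism as before (from z only).
    x_exp = [rng.gauss(0, 1) for _ in range(n)]
    y_exp = [zi + rng.gauss(0, 0.5) for zi in z]

    return pearson(x_obs, y_obs), pearson(x_exp, y_exp)


if __name__ == "__main__":
    r_obs, r_exp = simulate()
    print(f"observational correlation: {r_obs:.2f}")
    print(f"experimental correlation:  {r_exp:.2f}")
```

In this toy setup the observational correlation is high even though changing `x` would do nothing to `y`, which is exactly the kind of mistake a causal reading of the data would invite. Real observational data rarely reveals its confounders so conveniently.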

Important questions arise: How can we take advantage of large-scale observational data to estimate the outcome of an intervention, such as a change in regulation? How can we design low-cost experiments that capitalize on the data that we already have? How do we communicate clearly the nature of the association we find? A number of Stanford researchers have been leveraging what we know about causality and introducing new methodologies that allow mining of large data sets with a causal viewpoint. But it is only by interacting with domain experts interested in specific types of interventions and by examining new data sets that the full spectrum of problems and potential solutions will emerge.

Examples of faculty working in the area include Guido Imbens, Susan Athey, Stefan Wager, Mohsen Bayati, Jens Hainmueller, James Zou, Emmanuel Candes, Dominik Rothenhausler, Guillaume Basse, Mike Baiocchi and Daniel Ho.