Ethics and Data Science
New technologies often raise new moral questions. For example, the emergence of nuclear weapons placed great pressure on the distinction between combatants and non-combatants that had been central to the just war theory formulated in the Middle Ages. New theories were needed to reinterpret the meaning of this distinction in a nuclear age. With the emergence of new machine learning techniques, and the possibility of using algorithms both to perform tasks previously done by human beings and to generate new knowledge, we again face a set of new ethical questions. These questions concern not only the possibility of harm through the misuse of data but also how to preserve privacy where data is sensitive, how to avoid bias in data selection, how to prevent disruption and “hacking” of data, and how to ensure transparency in data collection, research and dissemination. Underlying many of these questions is a larger question about who owns the data, who has the right of access to it, and under what conditions.
There are currently no agreed-upon responses to these questions. Nonetheless, it is extremely important to confront them and to attempt to work out shared ethical guidelines. Where agreement is not possible, it is important to attend to the competing values at stake and to articulate explicitly the underlying assumptions at work in different models. An interesting illustration involves the debate over fairness in models predicting the risk of recidivism among black and white defendants in Broward County, Florida. Should a risk score be equally accurate in predicting the likelihood of recidivism for members of different racial groups? Should members of different groups have the same chance of being wrongly predicted to recidivate? Or should failures to predict recidivism occur at the same rate across groups? Recent work has established that satisfying all three criteria at the same time is impossible except in special cases, such as when the groups have the same underlying rate of recidivism or when predictions are perfect; meeting two will mean failing to comply with the third. So we need to decide which aspects of fairness matter most.
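The tension among these three criteria can be made concrete with a little arithmetic. What follows is a minimal sketch under simplifying assumptions, not a reconstruction of the Broward County analysis: the risk score is reduced to a binary high/low-risk label, "equally accurate" is read as equal positive predictive value (PPV, the share of people labelled high risk who actually reoffend), and the counts for the two groups are invented for illustration. The names `ConfusionCounts` and `implied_fpr` are hypothetical helpers. The sketch relies on an identity that follows directly from the definitions of the three rates: FPR = p/(1 − p) · (1 − PPV)/PPV · (1 − FNR), where p is a group's base rate of recidivism, FPR the false positive rate and FNR the false negative rate.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical outcome counts for one group under a binary high/low-risk label."""
    tp: int  # labelled high risk, did reoffend
    fp: int  # labelled high risk, did not reoffend
    fn: int  # labelled low risk, did reoffend
    tn: int  # labelled low risk, did not reoffend

    def base_rate(self) -> float:
        # Share of the group that actually reoffended.
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)

    def ppv(self) -> float:
        # Of those labelled high risk, the share who reoffended
        # (an assumed stand-in for "equally accurate").
        return self.tp / (self.tp + self.fp)

    def fpr(self) -> float:
        # Of those who did not reoffend, the share wrongly labelled high risk.
        return self.fp / (self.fp + self.tn)

    def fnr(self) -> float:
        # Of those who did reoffend, the share the label failed to flag.
        return self.fn / (self.fn + self.tp)


def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """FPR forced by a given base rate, PPV and FNR: p/(1-p) * (1-PPV)/PPV * (1-FNR)."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Two invented groups of 100 people each, with different base rates (50% vs 20%),
# chosen so that PPV and FNR are identical across the groups.
groups = {
    "A": ConfusionCounts(tp=40, fp=10, fn=10, tn=40),
    "B": ConfusionCounts(tp=16, fp=4, fn=4, tn=76),
}

for name, g in groups.items():
    print(
        f"group {name}: base rate={g.base_rate():.2f} "
        f"PPV={g.ppv():.2f} FPR={g.fpr():.2f} FNR={g.fnr():.2f} "
        f"implied FPR={implied_fpr(g.base_rate(), g.ppv(), g.fnr()):.2f}"
    )
```

With PPV held at 0.80 and FNR at 0.20 in both groups, group A's false positive rate comes out at 0.20 and group B's at 0.05. Because the base rates differ, the identity leaves no room to equalize the third rate as well; equalizing FPR instead would force PPV or FNR apart.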
Developing a shared framework will take collaboration between programmers, statisticians, legal scholars and philosophers.