Data Science for Humanity

Our modern era is characterized by massive amounts of data documenting the behaviors of individuals, groups, organizations, cultures, and indeed entire societies. This wealth of data on modern humanity is accompanied by the large-scale digitization of historical data, both textual and numeric. Historical newspapers, literary and linguistic corpora, economic data, censuses, and other government records, gathered and preserved over centuries, are now being digitized, acquired, and made available by libraries, scholars, and commercial entities.

Data science methods and approaches allow scholars to act on this mass of data, enhancing our understanding of humanity in all its configurations and activities across time and space. They support work that ranges from studying individual human behavior to modeling human communities, organizations, countries, and societies more effectively, and from understanding our current moment to tracing the histories that have brought us here. Data science provides analytical leverage on long-standing questions: What factors determine economic development and well-being? What leads to well-functioning societies and democracy? Why are some societies plagued by violence, repression, and conflict? Data science opens up new ways of answering these questions because it provides tools to interpret the beliefs and behaviors of people, groups, and organizations; it can be used to interpret the cultural and linguistic output of entire periods and peoples; and it can help us reason about the causal relationships and complex motivations of societal and state actors. Both the data itself and a science to make sense of it are critical for advances across the social sciences, the humanities, business, education, and law.

Data science for humanity, serving not only the academic study of the humanities and the social sciences but also the betterment of humanity itself, is a deeply interdisciplinary effort. Machine learning, statistical reasoning, natural language processing, classification, textual analysis, and other data science methods that developed largely (but not exclusively) within computer science have all become essential tools for scholars and students across the disciplines. This interdisciplinarity runs in both directions: data science itself has benefited from complex, important, and consequential research questions focused both on the rich history of human cultures and societies and on the present state of humankind, in both its triumphs and its troubles. Data science can help diagnose those troubles and suggest solutions; it offers a new window into the hidden histories and mysterious mechanisms of human cultures.