Data Science for Astrophysics and Particle Physics

Astrophysicists and particle physicists at Stanford and at the SLAC National Accelerator Laboratory study the Universe at both the largest and smallest scales, using state-of-the-art instrumentation at telescopes and accelerator facilities. A common element of these facilities is their rich petascale datasets, which can be mined for precious signals that shed light on the nature of dark matter, dark energy, the evolution of black holes, and physics beyond the “standard model” of particle physics and cosmology. These data are being delivered now from instruments such as the ATLAS and CMS detectors at CERN. Meanwhile, a new generation of astronomical surveys is beginning that will revolutionize our understanding of the Universe, mapping it across the electromagnetic spectrum, from radio to gamma rays, to far greater depth and resolution than ever before. In a few years, we will begin to receive images from the world’s largest digital camera (currently under construction at SLAC), positioned at the heart of the Large Synoptic Survey Telescope, while complementary ground- and space-based observatories will map the cosmos to unprecedented resolution and depth at other wavelengths. The capabilities of these instruments rest on huge leaps in detector technology, in many cases led by Stanford teams, that have transformed the data-gathering power of telescopes. Many of the most exciting discoveries are likely to come from combining datasets taken at different wavelengths and different times, with different spatial or temporal resolution, which will present substantial data analysis and modeling challenges.

These datasets are rich in volume, velocity, and variety, and they contain information not easily described with traditional likelihoods: galaxies, for example, exhibit a myriad of complex shapes that change dramatically as a function of wavelength. In response, scientists working in these domains are turning to data-driven machine learning solutions. Particle collisions, time series, and astronomical data can all be expressed as images suitable for input into deep neural networks, while generative modeling fits naturally within the forward-modeling philosophy that physicists have long worked in. The challenge is to recast the data modeling that physicists need to do so that state-of-the-art machine learning architectures can be applied without compromising either the high level of systematic error control or the detailed understanding of statistical uncertainty that the high-precision physics questions of the 21st century demand.