Event Recap: Celebrating Innovation at the Sustainability Data Science Conference

On April 10, about 100 early-career data scientists gathered at the Simonyi Conference Center in Stanford's Computing & Data Science (CoDa) Building for the Sustainability Data Science Conference—a dynamic day of sharing, discovery, and inspiration at the intersection of data science and sustainability.
From cutting-edge research to practical applications, this year’s presenters raised the bar, leaving the judging panel with the tough task of selecting just a few standouts.
Congratulations to Our Best Presentation Award Winners!
- 🏅 Ellianna Abrahams – Best Postdoc Presentation
- 🏅 Alexandra DiGiacomo – Best Student Presentation
- 🥈 Rebecca Grekin – Best Student Presentation, 2nd Place
- 🌟 Chloe Yu-Ning Cheng – Undergraduate Rising Star Award
- 🌟 Yuchen Li – Undergraduate Rising Star Award
Their work exemplifies the power of data to drive sustainability forward, and their passion continues to inspire our entire community. We’re excited to see where their research leads next!
♻️ Walking the Talk: A Zero-Waste Event

A partnership between Stanford Data Science and the Stanford Doerr School of Sustainability, the conference also embraced sustainable event practices in its mission toward zero waste. Black trash bins were replaced with thoughtful composting and recycling systems, and attendees enjoyed a delicious, sustainable feast curated by Chef Andrew Mayne (Stanford Catering)—a clear favorite among guests.
Enjoy the summary of the award-winning presentations and the image gallery (scroll down to the end of the article for the photo album), courtesy of David Gonzales.
Ellianna Abrahams – Best Postdoc Presentation: Enhancing Deep Learning on Satellite Imagery with Smarter Tiling
Ellianna Abrahams, a postdoctoral researcher in data science at Stanford University (hosted by the Geophysics department), recently presented a novel approach to preparing large-scale Earth observation imagery for deep learning applications. Her work addresses a fundamental challenge: the limited onboard memory of GPUs, which restricts how much image data can be processed at once.

This limitation becomes especially relevant when dealing with satellite data like that from the Landsat mission, which has captured high-resolution images of the Earth for decades. A single image cube from one location over a single year can easily exceed GPU memory capacities, making tiling—a process of dividing images into smaller pieces—a necessity.
However, traditional tiling approaches can disrupt semantic context. For instance, slicing an image of the San Francisco Bay Area may split meaningful features like the Golden Gate Bridge across tiles. Since deep learning models process each tile independently, tiling can cause the models to miss the broader spatial context, potentially impacting classification accuracy, particularly for underrepresented classes, like rare or poorly understood phenomena, which are often the classes of highest interest in scientific use cases.
Abrahams emphasized that expert annotators often rely on semantic clues from surrounding, well-represented classes to identify rare features. This led her to ask: Could preserving spatial context in tiles improve classification performance on these rare classes?
Drawing inspiration from the U-Net paper by Ronneberger et al., which introduced a 50% overlapping tile strategy to increase contextual information, Abrahams highlighted a critical drawback: redundancy. With overlap tiling, the same feature is repeated multiple times in the training data in the same orientation, skewing the dataset's distribution and potentially biasing model outcomes.
To address this, Abrahams and her collaborators developed Flip-n-Slide: a tiling strategy that augments data through spatial permutations (such as sliding, flipping, and rotating tiles) without introducing redundancy. By permuting tiles only in ways that are physically realistic for satellite data (excluding distortions like color shifts or blurs), this method maintains distributional integrity while extending the training data by 8x.
When tested on Arctic satellite imagery, Flip-n-Slide significantly improved classification performance on rare classes like moss, lichen, and polar grassland, without requiring any changes to model architecture or loss function weighting. The method performed on par or better than the conventional 50% overlap approach, especially for underrepresented classes, while maintaining performance on common classes.
This approach is now available as an open-source Python package: 📦 pip install flipnslide
Researchers can quickly integrate this new tiling strategy into PyTorch, TensorFlow, or scikit-learn workflows. A Quick Start Guide (QR code available in the original talk) walks users through setup in just a few lines of code.
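To make the core idea concrete, here is a minimal NumPy sketch (an illustration of the concept, not the flipnslide package's actual API): tiles are cut with 50% overlap, but each tile is assigned one of the eight flip/rotation symmetries of a square, so repeated content never reappears in the same orientation.

```python
import numpy as np

def permuted_overlap_tiles(image, tile=64, stride=32):
    """Cut an image into 50%-overlapping tiles, giving each tile a
    distinct flip/rotation so repeated content never reappears in the
    same orientation (the core idea behind Flip-n-Slide)."""
    # The 8 symmetries of a square: 4 rotations x optional vertical flip.
    transforms = [lambda t, k=k, f=f: np.rot90(np.flip(t, axis=0) if f else t, k)
                  for k in range(4) for f in (False, True)]
    tiles = []
    i = 0
    h, w = image.shape[:2]
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            t = image[y:y + tile, x:x + tile]
            tiles.append(transforms[i % 8](t))  # cycle through the 8 orientations
            i += 1
    return tiles

# Toy example: a 128x128 "satellite" scene yields 9 overlapping 64x64 tiles
scene = np.arange(128 * 128, dtype=float).reshape(128, 128)
tiles = permuted_overlap_tiles(scene)
print(len(tiles), tiles[0].shape)
```

The cycling through orientations is what distinguishes this from plain U-Net-style overlap tiling: the overlapping regions still appear multiple times, but never identically.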
Abrahams and her team are continuing to explore how the inclusion of context affects deep learning outcomes under class imbalance: both implicitly, through semantic context added by methods like Flip-n-Slide, and explicitly, through model architecture and optimization choices.
Alexandra DiGiacomo – Best Student Presentation: From Dorsal Fins to Drones: Shaping the Future of Marine Science with Data Science
At Stanford’s Hopkins Marine Station, PhD candidate Alexandra DiGiacomo is advancing the use of data science to study marine megafauna—specifically, white sharks in Monterey Bay. Historically, the study of marine megafauna relied heavily on direct observation. In the 20th century, scientists stationed at the Farallon Islands in Northern California logged sightings of sharks, seabirds, and marine mammals by hand. Today, however, remote observation technology has transformed the field into a big data discipline.

Alexandra and the wider Block Lab team at Stanford collect data at multiple scales—from individual animals to entire communities—using underwater cameras, drones, and animal-borne tags. This has resulted in a massive influx of information, creating what Alexandra calls a “data tsunami.”
To translate this data into meaningful insights, the team is developing scalable, non-invasive computer vision techniques that enable rapid and reliable ecological analysis of sensitive and difficult-to-study species like white sharks.
Case Study 1: Identifying Individual Sharks Using Computer Vision
White sharks can be individually identified by the unique shape of their dorsal fins, referred to as their ‘fin-ID’. Traditionally, researchers performed this matching manually, printing photos and comparing them by hand, an approach that has become increasingly time-consuming and impractical as the image database has grown over 20 years of data collection.
To streamline this process, Alexandra partners with the LINQS Lab at UC Santa Cruz to implement vision transformers that analyze fin images and generate feature embeddings. These embeddings are used to compare visual similarity across fins and power a recommender system, allowing the team to quickly retrieve the top potential matches for new dorsal fin images coming in from the field.
This innovation enabled the team to match decades worth of fin-ID data in just a few months. As a result, they can now track real-time population dynamics, monitoring individuals recently appearing in Monterey Bay and at long-term monitoring sites.
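The retrieval step of such a recommender system can be sketched in a few lines. This is a hedged illustration, not the team's actual pipeline: it assumes fin embeddings have already been produced by a vision transformer, and ranks catalog entries by cosine similarity to a new query.

```python
import numpy as np

def top_matches(query_emb, catalog_embs, k=3):
    """Rank catalog fin embeddings by cosine similarity to a query.

    query_emb: (d,) embedding of a new dorsal-fin photo
    catalog_embs: (n, d) embeddings of known individuals
    Returns indices of the k most similar catalog entries.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = catalog_embs / np.linalg.norm(catalog_embs, axis=1, keepdims=True)
    sims = c @ q                       # cosine similarity per catalog fin
    return np.argsort(sims)[::-1][:k]  # best matches first

# Toy catalog of 5 "fins" with 4-dim embeddings
rng = np.random.default_rng(0)
catalog = rng.normal(size=(5, 4))
query = catalog[2] + 0.01 * rng.normal(size=4)  # near-duplicate of fin 2
print(top_matches(query, catalog))  # fin 2 should rank first
```

A human expert then only needs to confirm among the top few candidates rather than scan the entire catalog, which is what makes matching decades of imagery tractable.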
Case Study 2: Measuring Shark Size from the Sky
Shark size has traditionally been visually estimated by researchers in the field, as these animals are too large to measure directly. This approach, however, is limited in precision, consistency, and depth of information. Now, Alexandra is using drone imagery and segmentation masks to accurately measure body size. With photogrammetry, she extracts measurements such as total length and girth (a proxy for fat stores and overall health), allowing the team to analyze size distributions across aggregation sites in Monterey Bay.
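The photogrammetry arithmetic itself is simple: flight altitude and camera geometry give a ground sampling distance (meters per pixel), which converts a mask length in pixels to meters. The sketch below uses illustrative camera parameters (roughly a 1-inch-sensor drone camera), not the team's actual hardware or calibration.

```python
def ground_sampling_distance(altitude_m, sensor_width_mm, focal_length_mm, image_width_px):
    """Meters of ground covered by one pixel for a nadir-looking camera."""
    return (altitude_m * sensor_width_mm) / (focal_length_mm * image_width_px)

def length_from_mask(length_px, altitude_m,
                     sensor_width_mm=13.2, focal_length_mm=8.8, image_width_px=5472):
    """Convert a segmentation-mask length in pixels to meters.
    Default camera parameters are illustrative, not a specific survey setup."""
    return length_px * ground_sampling_distance(
        altitude_m, sensor_width_mm, focal_length_mm, image_width_px)

# A shark spanning 400 px, photographed from 40 m altitude
print(round(length_from_mask(400, 40.0), 2))  # ≈ 4.39 m with these parameters
```

In practice the same conversion applies to girth measured across the body, and uncertainty in altitude feeds directly into the size estimate, which is why precise flight telemetry matters.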
Their findings show that smaller sharks tend to gather in the bay’s warmer, more protected waters, while larger sharks appear in colder, more exposed areas. The team also used this method to identify an emaciated individual in 2021 who was entangled and had limited foraging success, demonstrating the potential of this technique for detecting malnutrition and physical distress in marine animals.
Case Study 3: Analyzing Swimming Biomechanics
Fish swimming kinematics are often studied in captivity or through biologging devices. Alexandra’s team took a novel approach, applying pose estimation to drone videos of free-swimming white sharks.
By tracking key anatomical points across video frames, they calculated tailbeat frequency, amplitude, and other biomechanical markers such as regional flexion. These data provide insights into baseline activity levels and help detect behavioral deviations linked to environmental stress or human disturbance. This work shows how computer vision can reduce lag time in population monitoring, improve measurement precision, and scale up behavioral studies. These tools are essential for understanding and conserving protected species like white sharks in a rapidly changing world.
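One way to extract tailbeat frequency from tracked points, sketched here as an assumption about the general approach rather than the team's exact method, is to take the dominant spectral peak of the tail tip's lateral displacement over time.

```python
import numpy as np

def tailbeat_frequency(lateral_disp, fps):
    """Dominant oscillation frequency (Hz) of a tracked tail-tip signal.

    lateral_disp: 1-D array of the tail tip's side-to-side position per frame
    fps: video frame rate
    """
    x = lateral_disp - np.mean(lateral_disp)   # remove the mean body track
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    return freqs[np.argmax(spectrum[1:]) + 1]  # skip the zero-frequency bin

# Synthetic track: a 0.5 Hz tailbeat sampled at 30 fps for 20 s, with noise
fps, f_true = 30, 0.5
t = np.arange(0, 20, 1 / fps)
disp = np.sin(2 * np.pi * f_true * t) + 0.1 * np.random.default_rng(1).normal(size=t.size)
print(tailbeat_frequency(disp, fps))  # ≈ 0.5 Hz
```

Amplitude and regional flexion follow similarly from the tracked keypoints, which is what lets a single drone video yield several biomechanical markers at once.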
Rebecca Grekin—Best Student Presentation, 2nd Place: University Campus District Energy Consumption Survey (UCDECS)—Drivers for Heating and Cooling Needs in Thirty North American Universities
Purpose of the Study:

To address the gap in energy research related to existing commercial buildings (especially university campuses), which are often overlooked in favor of new construction or simulation models.
Methodology:
- Collected daily heating and cooling data from multiple U.S. and Canadian universities, with 1–10+ years of data per campus.
- Normalized data by floor area (square meters) to allow comparison across campuses.
- Analyzed based on climate zones across North America (temperature and humidity differences).
Key Findings:
- Cooling demand is closely tied to outside air enthalpy (temperature + humidity).
- Heating demand correlates more strongly with outside air temperature alone.
- This contradicts common practice, which often ignores humidity (enthalpy) in heating/cooling modeling.
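The enthalpy point is easy to make concrete with the standard psychrometric approximation for moist air, h = 1.006·T + w·(2501 + 1.86·T) kJ per kg of dry air, where w is the humidity ratio. The numbers below are illustrative, not from the study:

```python
def moist_air_enthalpy(temp_c, humidity_ratio):
    """Specific enthalpy of moist air (kJ per kg of dry air).

    Standard psychrometric approximation:
      h = 1.006*T + w*(2501 + 1.86*T)
    temp_c: dry-bulb temperature in deg C
    humidity_ratio: kg water vapor per kg dry air
    """
    return 1.006 * temp_c + humidity_ratio * (2501.0 + 1.86 * temp_c)

# Same 30 C day, dry vs humid air: the cooling loads differ sharply
dry = moist_air_enthalpy(30.0, 0.005)    # ≈ 43.0 kJ/kg
humid = moist_air_enthalpy(30.0, 0.020)  # ≈ 81.3 kJ/kg
print(dry, humid)
```

At the same outdoor temperature, the humid day carries nearly twice the enthalpy, which is why cooling demand tracks enthalpy rather than temperature alone.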
Simultaneous Heating and Cooling:
- Identified opportunities for heat recovery chillers to recover waste heat during times when heating and cooling overlap.
- These chillers can reduce fossil fuel use, save energy, and lower emissions.
- Found up to 50% of heating/cooling demand at some campuses could be met via heat recovery chillers.
- Average energy savings: ~20%; CO₂ emissions savings: ~18%, depending on campus and grid carbon intensity.
Next Steps:
- Continue data analysis across more campuses.
- Publish findings.
- Share results with university facilities teams to inform policy and promote adoption of more efficient energy systems.
Chloe Yu-Ning Cheng – Undergraduate Rising Star Award: Storm-Driven Mixing in the Southern Ocean
Background & Importance:

- The Southern Ocean around Antarctica has a unique structure:
  - A cold, fresh surface layer sits above a warm, salty, dense layer.
  - This setup traps heat below, unless it is mixed upward by strong winds or storms.
- The sea ice zone between the open ocean and the Antarctic coast is key for climate:
  - It mediates the exchange of heat, carbon, moisture, and momentum.
  - Sea ice is sensitive to heating from below, which influences melting.
Research Questions:
- Can we observe heat flux from storm-induced mixing, not just model it?
- Do storms entrain warm water upwards, melting sea ice?
Data & Methods:
- Combined:
  - Sea ice thickness data (though noisy/unreliable),
  - Argo floats for temperature, salinity, and pressure profiles,
  - Atmospheric reanalysis (ERA) for identifying storms.
- Compared Argo profiles after storm events vs. non-storm times.
- Focused on a region with high Argo float density near Antarctica.
Findings:
- After storms, temperature profiles are warmer and salinity is fresher:
  - Warming due to upward mixing of deeper, warmer water.
  - Freshening due to the melting of sea ice (which dilutes salinity).
- The freshening layer extends ~50 m deep—evidence of a deepened mixed layer.
- Using this data, they estimated:
  - The amount of ice melted,
  - The heat involved, and
  - The resulting heat flux.
Even under conservative assumptions, the storm-driven fluxes were greater than typical wintertime heat fluxes, suggesting they are climatically significant.
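The flux estimate follows back-of-envelope thermodynamics: melting a layer of ice of thickness dh over time dt requires an average flux Q = ρ_ice · L_f · dh/dt. The numbers below are illustrative placeholders, not the study's values:

```python
def melt_heat_flux(melt_thickness_m, duration_days,
                   rho_ice=917.0, latent_heat=3.34e5):
    """Average heat flux (W/m^2) implied by melting a layer of sea ice.

    Q = rho_ice * L_f * dh / dt
    rho_ice: sea-ice density (kg/m^3); latent_heat: fusion, J/kg
    """
    dt_seconds = duration_days * 86400.0
    return rho_ice * latent_heat * melt_thickness_m / dt_seconds

# Illustrative: 10 cm of ice melted over a 3-day storm
print(round(melt_heat_flux(0.10, 3.0), 1))  # ≈ 118.2 W/m^2
```

Even modest melt over a few days implies fluxes on the order of a hundred watts per square meter, which conveys why episodic storm mixing can rival or exceed background wintertime fluxes.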
Implications:
- Storms play a key role in bringing up heat and melting sea ice.
- If storm patterns change due to climate change, it could impact ice melt and ocean-atmosphere interaction.
- Need higher-resolution data to detect and quantify individual storm effects accurately.
Yuchen Li—Undergraduate Rising Star Award: On the Design, Merits, and Limitations of ML Models for Seasonal Sea Ice Forecasting

Can machine learning help predict Antarctic sea ice? New research from Yuchen's group explores this question using data-driven methods to forecast sea ice conditions 1 to 6 months in advance—critical for navigation, research expeditions, and understanding ocean-atmosphere interactions in the Southern Ocean.
Why Use Machine Learning?
Traditional physical climate models often struggle to simulate even the average state of the Southern Ocean, making accurate forecasts difficult. Recent work suggests that deep learning models, when trained on satellite observations, can make surprisingly skillful predictions of sea ice extent on seasonal timescales.
However, a major limitation is data scarcity: the satellite record spans only about 50 years, giving roughly 600 monthly data points—an extremely small dataset by machine learning standards.
Approach:
To test the potential of deep learning under richer data conditions, the team used simulations from the CESM “large ensemble”—a set of Earth system model simulations run under the same external conditions but with different starting points. This provides a synthetic yet physically consistent dataset much larger than what's available from observations alone.
The research focused on a U-Net architecture trained to predict Antarctic sea ice maps one month ahead, using several months of prior sea ice data. The team systematically increased the training dataset size (from 1x to 10x) and evaluated model accuracy using the Anomaly Correlation Coefficient (ACC).
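The ACC metric itself is straightforward: subtract a shared climatology from both forecast and observation, then correlate the resulting anomaly maps over all grid points. A minimal sketch (toy data, not the team's evaluation code):

```python
import numpy as np

def anomaly_correlation(forecast, observed, climatology):
    """Anomaly Correlation Coefficient between a forecast map and observations.

    Both anomalies are taken relative to the same climatology, then
    correlated over all grid points (centered form).
    """
    f = (forecast - climatology).ravel()
    o = (observed - climatology).ravel()
    f = f - f.mean()
    o = o - o.mean()
    return float(f @ o / (np.linalg.norm(f) * np.linalg.norm(o)))

# Toy maps: a forecast that partially captures the observed anomaly pattern
rng = np.random.default_rng(0)
clim = np.zeros((8, 8))
obs = rng.normal(size=(8, 8))
fcst = obs + 0.5 * rng.normal(size=(8, 8))  # imperfect forecast
print(anomaly_correlation(fcst, obs, clim))  # high, but below 1
```

An ACC of 1 means the forecast reproduces the observed anomaly pattern exactly; a persistence forecast (last month's map carried forward) provides the natural baseline to beat.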
Key Findings:
- Short lead-time forecasts are more accurate across all seasons.
- Summer forecasts at long lead times are the hardest, but show the biggest improvement with more data.
- Initial gains in performance come from the model learning persistence (simple autocorrelation), but the greater gains occur where persistence fails and the physical system is less predictable.
- With more training data, model improvements saturate, except in the most challenging prediction windows (e.g., summer at long lead times).
Physical Insight:
Two key sources of predictability in the sea ice system:
- Persistence – Month-to-month memory within the sea ice itself.
- Ocean heat storage – Subsurface heat retained during retreat seasons can re-emerge to influence ice growth months later.
The deep learning model leverages both, but particularly shines in capturing patterns where persistence breaks down.
Conclusions:
- Data-driven forecasts of Antarctic sea ice are promising but limited by the short observational record.
- Deep learning models improve forecasts where physical memory is weakest.
- Even with large training datasets, forecast skill eventually plateaus—except in the most complex cases, where additional data still helps.