Leptoukh Lecture Winners

2023 Leptoukh Lecture

Rahul Ramachandran
NASA Marshall Space Flight Center

From Petabytes to Insights: Tackling Earth Science's Scaling Problems in Data, Information and Processes

Earth Science faces the challenge of converting petabytes of data into actionable insights. Managing Big Data involves addressing vast, multidimensional data volumes, necessitating innovative production, ingest, storage, access, and distribution solutions. This includes synchronizing data from varied sources, integrating different data formats, and managing dataset replication across multiple storage locations. Analysis of these large datasets demands not only robust computational resources but also algorithms optimized for cloud infrastructures and careful management of dataset interdependencies. Modern science's collaborative nature, spanning multiple institutions and countries, amplifies the complexity of coordinating large-scale research. The scientific community requires new visualization and analytics tools to extract insights from these large datasets. The surge in scientific publications complicates the task of identifying quality information: differentiating between high-quality and subpar publications and staying current with interdisciplinary research are key challenges. Efficient Big Data management calls for new data lifecycle management strategies that ensure cost-effective data storage, management, and accessibility. Addressing these scalability challenges requires a blend of new technological solutions, re-evaluation of existing practices, and a shift in organizational mindset. In this presentation, I'll highlight how emerging technologies such as AI can pave the way to overcoming these scaling challenges and how, as members of the Earth Science Informatics community, we act as catalysts, continuously confronting these challenges and transforming the way science is conducted.

Read Essay: https://www.earthdata.nasa.gov/learn/articles/2023-leptoukh-lecture-essay 

Watch Recorded Lecture: https://www.youtube.com/watch?v=eSUGkiOIR1A

2022 Leptoukh Lecture

Paul Wessel
Department of Earth Sciences, School of Ocean and Earth Science and Technology, University of Hawai'i at Manoa

The Generic Mapping Tools and Animations for the Masses

Wessel was recognized for his pioneering effort in developing widely used open-source software for the Earth science community, specifically the globally recognized Generic Mapping Tools (GMT). With fellow Columbia University graduate student Walter H.F. Smith, Wessel initiated GMT development in the late 1980s, and while at UH he has secured continuous US National Science Foundation funding for GMT since 1993. Today a whole generation of scientists has used GMT for almost 30 years, and it is ingrained in their mission-critical workflows. GMT is widely used in marine geology and geophysics, solid-earth geophysics, geodynamics, and oceanography, as well as in planetary geoscience applications. GMT also provides a foundation for other open-source software used by the scientific community and is available for all operating systems and via wrappers for the Python, Julia, and MATLAB environments.

2021 Leptoukh Lecture

Charles Zender
Professor of Earth System Science (ESS) and of Computer Science, University of California, Irvine

What Geoscientists Want: Short and Sweet Commands with Eco-friendly Data

The twin pressures to achieve mind-share and to harness available computing power drive the evolution of geoscientific data analysis tools. Such tools have enabled a remarkable progression in the atomic or fundamental unit of data they can easily analyze. In the mid-1980s we analyzed one or a few naked arrays at a time, and now researchers routinely intercompare climatological ensembles each comprising thousands of files of heterogeneous variables richly dressed in metadata. Two complementary semantic trends have empowered this analytical revolution: more intuitive and concise analysis commands that can exploit more standardized and brokered self-describing data stores. This talk highlights how tool developers can leverage these trends to successfully imagine and build the analysis tools of tomorrow by understanding the needs of domain researchers and the power of domain-specific languages today.
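As a concrete illustration of this trend, consider how a modern self-describing data stack lets one concise command stand in for what once required hand-written loops over naked arrays. The sketch below uses Python's xarray library; the file pattern and the variable name tas are illustrative assumptions, not datasets from the talk.

```python
# A hedged sketch of the "short and sweet command" trend: xarray treats a
# whole multi-file ensemble as one self-describing object.
# Requires xarray plus dask and netCDF4 for lazy multi-file access.
import xarray as xr

# Lazily open thousands of NetCDF files as a single dataset; coordinates
# and metadata inside the files tell xarray how the pieces fit together.
ds = xr.open_mfdataset("ensemble/*/tas_*.nc", combine="by_coords")

# One concise, intention-revealing line computes a monthly climatology
# of the (assumed) surface air temperature variable.
climatology = ds["tas"].groupby("time.month").mean("time")
print(climatology)
```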

This talk will also highlight recent improvements in compression speed and interoperability that geoscientists can exploit to reduce our carbon footprint. Observations and simulations to advance Earth system sciences generate exabytes of archived data per year. Storage accounts for about 40% of datacenter power consumption, with its attendant consequences for greenhouse gas emissions and environmental sustainability. Precision-preserving lossy compression can further reduce the size of losslessly compressed data by 10-25% without compromising its scientific content. Modern lossless codecs (e.g., Zstandard or Zlib-ng) accelerate compression and decompression, relative to the traditional Zlib, by factors of 2-5 with no penalty in compression ratio. These proven modern compression technologies can help geoscientific datacenters become significantly greener.
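The two techniques named above can be sketched in a few lines of Python. The following is a minimal illustration, not NCO's actual implementation: the bit_round helper, the choice of 10 explicit mantissa bits, and the synthetic temperature field are all assumptions for the demo, and it requires the third-party zstandard package.

```python
# A minimal sketch of precision-preserving lossy quantization followed by
# modern lossless codecs (pip install zstandard for the Zstandard bindings).
import time
import zlib

import numpy as np
import zstandard


def bit_round(a: np.ndarray, keep_bits: int) -> np.ndarray:
    """Round float32 values to keep_bits explicit mantissa bits.

    Zeroing trailing mantissa bits keeps roughly 3 significant decimal
    digits at keep_bits=10 while making the bytes far more compressible.
    (NaN/Inf handling is omitted in this sketch.)
    """
    a = np.ascontiguousarray(a, dtype=np.float32)
    bits = a.view(np.uint32)
    drop = 23 - keep_bits                               # float32 has 23 mantissa bits
    half = np.uint32(1 << (drop - 1))                   # round to nearest
    mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    return ((bits + half) & mask).view(np.float32)


# Hypothetical surface-temperature field standing in for real model output.
rng = np.random.default_rng(0)
field = rng.normal(288.0, 10.0, size=(256, 512)).astype(np.float32)

for label, payload in [("lossless only", field.tobytes()),
                       ("bit-rounded first", bit_round(field, 10).tobytes())]:
    t0 = time.perf_counter()
    z = zlib.compress(payload, 6)                       # traditional Zlib
    t1 = time.perf_counter()
    zst = zstandard.ZstdCompressor(level=3).compress(payload)  # Zstandard
    t2 = time.perf_counter()
    print(f"{label:18s} zlib ratio {len(z)/len(payload):.2f} ({t1-t0:.3f}s)  "
          f"zstd ratio {len(zst)/len(payload):.2f} ({t2-t1:.3f}s)")
```

On typical geophysical fields the quantized payload compresses markedly better, because the zeroed trailing mantissa bits are highly redundant, which is the mechanism behind the further 10-25% reduction cited above.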

2020 Leptoukh Lecture

Erin Robinson
Former Executive Director of Earth Science Information Partners (ESIP)

Putting Data to Work: Moving science forward together beyond where we thought possible!

We live in a world rich with data, where use and reuse would benefit not just science but also national security and society at large. Air quality impacts from forest fires, which are increasing in frequency, are one example of large, data-intensive science with societal impact. Understanding the long-range transport of smoke, the problem where I started my career and worked with Dr. Greg Leptoukh, for whom this lecture is named, required a variety of datasets from satellites, surface observations, and models. Together with Greg, we formed the ESIP Air Quality Cluster, a community of practice, to agree on which data access and metadata standards would best support the broader air quality research community, and how to use them. Forest fire smoke analysis was based on datasets not originally intended for our purpose, but because the data were findable, accessible, interoperable, and reusable (FAIR) and we were willing to reuse them, we reduced the time spent wrangling data and were able to ask and answer new questions about each smoke event.

Today, we are seeing more and more examples like mine of science that was not possible without open data, standards, and tools. However, our scientific data enterprise is evolving and maturing in an unmanaged fashion, and due to insufficient coordination across planning, management, and resources, the potential benefits of all these data and distributed infrastructure are not fully realized. Reliable, long-term funding as well as cultural changes, including financial incentives and rewards, are needed to make Science Data Infrastructure a first-class citizen equal to Science. This talk will explore what it means to put data to work and examine the relationship between data-intensive science, data management, and collaborative community efforts like the Earth Science Information Partners (ESIP) and Openscapes to move science forward beyond where we thought possible!

2019 Leptoukh Lecture

Barbara Jane Ryan
Group on Earth Observations (GEO), former

Fashioning a Global Terrestrial Observation Infrastructure after the Global Weather Forecasting Infrastructure

Enabled by broad, open data policies and practices for the U.S. Landsat satellite series and the European suite of Sentinel satellites in the Copernicus Programme, tremendous advancements have been made in information technologies and platforms for terrestrial observations. Included among these advancements are the work that Australia has done with Data Cube technologies at its National Computing Center, the Joint Research Centre's (JRC) work on the Global Surface Water Explorer using Google Earth Engine's 60,000+ computers worldwide, and the U.S. Geological Survey's (USGS) work on Land Change Monitoring, Assessment, and Projection (LCMAP) at its Earth Resources Observation and Science (EROS) Center in Sioux Falls, South Dakota.

These technological advancements, combined with international collaboration and cooperation, are permitting the analysis of landscape change at regional, continental and, indeed, global scales. In fact, many of the elements that are emerging for this global terrestrial infrastructure can be compared to those that have historically comprised the global mid- to long-term weather forecasting system. Yet the status of modeling and predicting landscape change falls short of similar efforts in the atmospheric and oceanic domains. This lecture will explore the essential elements of a global terrestrial observation infrastructure by fashioning it after the existing global weather forecasting infrastructure.

2018 Leptoukh Lecture

Benjamin James Kingston Evans
Australian National University

Evolving Data-driven science: the unprecedented coherence of Big Data, HPC, and informatics, and crossing the next chasms

As we approach the AGU Centenary, we celebrate the successes of data-driven science whilst looking anxiously at the future, with consideration of the hardware, software, workflows, and interconnectedness that need further attention.

Colocating scientific datasets with HPC/cloud compute has demonstrably supercharged our research productivity. Over time we have questioned whether to “bring data to the compute” or “compute to the data”, and considered and reconsidered the benefits, weaknesses, and challenges, both technical and social. The gap between how large-volume data and long-tail data are managed is steadily closing, and standards for interoperability and connectivity between scientific fields have been slowly maturing. In many cases transdisciplinary science is now a reality.

However, computing technology is no longer advancing according to Moore's law (and its equivalents) and is evolving in unexpected ways. For some major computational software codes, these technology changes are forcing us to reconsider the development strategy: how to transition existing code to address the need for scientific improvements in capability while at the same time improving the ability to adjust to changes in the underlying technical infrastructure. In doing so, some old assumptions about data precision and reproducibility are being reconsidered. Quantum computing is now on the horizon, which will mean further consideration of software and data access mechanisms.

Currently, for data management, despite the apparent value and opportunity, the demand for high-quality datasets that can be used for new data-driven methods is testing the funding/business case and overall value proposition for celebrated open data and its FAIRness. Powerful new technologies such as AI and deep learning have a voracious appetite for big data and much stronger (and unappreciated) requirements around data quality, information management, connectivity, and persistence. These new technologies are evolving at the same time as ubiquitous IoT, fog computing, and blockchain pipelines have emerged, creating even more complexity and, potentially, hyper-coherence issues.

In this talk I will discuss the journey so far in data-intensive computational science, and consider the chasms we have yet to cross.

2017 Leptoukh Lecture

Kirk Martinez
University of Southampton

Earth sensing: from ice to the Internet of Things

The evolution of technology has led to improvements in our ability to use sensors for Earth science research. Radio communications have improved in terms of range and power use. Miniaturisation means we now use 32-bit processors with embedded memory, storage, and interfaces. Sensor technology makes it simpler to integrate devices such as accelerometers, compasses, and gas and biosensors. Programming languages have developed so that it has become easier to create software for these systems. This, combined with the power of the processors, has made research into advanced algorithms and communications feasible. The term environmental sensor networks describes these advanced systems, which are designed specifically to take sensor measurements in the natural environment.

Through a decade of research into sensor networks, deployed mainly in glaciers, many areas of this still-emerging technology have been explored. From deploying the first subglacial sensor probes with custom electronics and protocols, we learned how to tune systems for harsh environments and manage their energy budgets. More recently, installing sensor systems in the mountains of Scotland has shown that standards now allow complete internet and web integration.

This talk will discuss the technologies used in a range of recent deployments in Scotland and Iceland, focussed on creating new data streams for cryospheric and climate change research.

2016 Leptoukh Lecture

Cynthia Chandler, Woods Hole Oceanographic Institution

Data, data everywhere…

The scientific research endeavor requires data, and in some cases massive amounts of complex and highly diverse data. From experimental design, through data acquisition and analysis, hypothesis testing, and finally drawing conclusions, data collection and proper stewardship are critical to science. Even a single experiment conducted by a single researcher will produce data to test the working hypothesis. The types of complex science questions being tackled today often require large, diverse, multi-disciplinary teams of researchers who must be prepared to exchange their data.

This 2016 AGU Leptoukh Lecture comprises a series of vignettes that illustrate a brief history of data stewardship: where we have come from, how and why we have arrived where we are today, and where we are headed with respect to data management. The specific focus will be on management of marine ecosystem research data and will include observations on the drivers, challenges, strategies, and solutions that have evolved over time. The lessons learned should be applicable to other disciplines and the hope is that many will recognize parallels in their chosen domain.

From historical shipboard logbooks to the high-volume, digital, quality-controlled ocean science data sets created by today’s researchers, there have been enormous changes in the way ocean data are collected and reported. Rapid change in data management practices is being driven by new data exchange requirements, by modern expectations for machine-interoperable exchange, and by the desire to achieve research transparency. Advances in technology and cultural shifts contribute to the changing conditions through which data managers and informatics specialists must navigate.

The unique challenges associated with collecting and managing environmental data, complicated by the onset of the big data era, make this a fascinating time to be responsible for data. It seems there are data everywhere, being collected by everyone, for all sorts of reasons, and people have recognized the value of access to data. Properly managed and documented data, freely available to all, hold enormous potential for reuse beyond the original reason for collection.

2015 Leptoukh Lecture

Dawn Wright, Environmental Systems Research Institute

Toward a Digital Resilience (with a Dash of Location Enlightenment)

The AGU Earth and Space Science Informatics Focus Group addresses a compelling array of research questions and projects. This year's session topics range from large-scale data management within global cyberinfrastructures or virtual observatories, to intelligent systems theory, semantics, and handling of near-real-time data streams, to issues of “dark data,” data transparency, reproducibility, and more. The aim of this lecture is to build in part on these themes but to consider more broadly how we might push the boundaries of informatics knowledge along the lines of use-inspired science (responsive to the needs and perspectives of society while still being fundamental and cutting edge). To wit, as we contend with human impacts on the biosphere, recent innovations in computational and data science are now facilitating community resilience to climate change (e.g., helping communities to monitor air quality or drought, find available drinking water, determine habitat vulnerability, etc.).

But a path toward digital resilience is not often discussed. If digital tools are to continue helping communities, it stands to reason that they must engender some resilience themselves. The capacity to deal effectively with change and threats, to recover quickly from challenges or difficulties, even to withstand stress and catastrophe, can apply to data too. As investments in digital data continue to rise, we find ourselves in a new “digital world order” composed of ubiquitous technologies from satellites to wristwatches to human biochip implants. And a significant proportion of these are geospatial, given the incredible power of maps to communicate, persuade, inspire, promote understanding, and elicit action. The lecture therefore reviews and recommends seven fundamental digital research and communication practices, with the aim of ensuring not only a modicum of resilience for our nascent discipline, but also of prototyping and delivering repeatable solutions that all can use to help guide the planet toward a more resilient future.

2014 Leptoukh Lecture

Bryan Lawrence

Director of Models and Data at the UK National Centre for Atmospheric Science, Professor of Weather and Climate Computing at the University of Reading, and the Director of the STFC Centre for Environmental Data Archival (CEDA).

Trends in Computing for Climate Research

The grand challenges of climate science will stress our informatics infrastructure severely in the next decade. Our drive for ever greater simulation resolution/complexity/length/repetition, coupled with new remote and in-situ sensing platforms, presents us with problems in computation, data handling, and information management, to name but three. These problems are compounded by the background trends: Moore's Law is no longer doing us any favours, as computing is getting harder to exploit now that we have to bite the parallelism bullet, and Kryder's Law (if it ever existed) isn't going to help us store the data volumes we can see ahead. The variety of data, the rate at which it arrives, and the complexity of the tools we need and use all strain our ability to cope. The solutions, as ever, will revolve around more and better software, but “more” and “better” will require some attention.

In this talk we discuss how these issues have played out in the context of CMIP5, and might be expected to play out in CMIP6 and successors. Although the CMIPs will provide the thread, we will digress into modelling per se, regional climate modelling (CORDEX), observations from space (Obs4MIPs and friends), climate services (as they might play out in Europe), and the dependency of progress on how we manage people in our institutions. It will be seen that most of the issues we discuss apply to the wider environmental sciences, if not science in general. They all have implications for the need for both sustained infrastructure and ongoing research into environmental informatics.

2013 Leptoukh Lecture

Simon Cox
CSIRO

Simon Cox is a Senior Principal Research Scientist at CSIRO. He trained as a geophysicist, with a PhD in experimental rock mechanics from Columbia (Lamont-Doherty) following degrees in geological sciences at Cambridge and Imperial College London. He came to Australia for a post-doc with CSIRO, and then spent four years teaching at Monash University in Melbourne, where he first began using GIS. Returning to CSIRO in Perth in 1994 to work on information management for the Australian Geodynamics CRC, he moved its focus for reporting onto the emerging World Wide Web, deploying a web-mapping system for Australian geology and geophysics in 1995. The challenge of maintaining the AGCRC website led to metadata-based systems and to Simon's engagement with the standards community when he joined the Dublin Core Advisory Council.

Work on XML-based standards for mineral exploration data led on to the foundation of the GeoSciML project in collaboration with a number of geological surveys. An interest in tying these into broader interoperability systems led to engagement with the Open Geospatial Consortium (OGC), where he co-edited the Geography Markup Language (GML) v2 and v3. In OGC he developed Observations and Measurements as a common language for in situ, ex situ, and remote sensing; it went on to become an ISO standard and forms the basis for operational systems in diverse fields, including air-traffic, water data transfer, and environmental monitoring applications. In 2009-10 he spent a year as a senior fellow at the EC Joint Research Centre in Italy, working on integration of GEOSS and INSPIRE. He served on the council of the IUGS Commission for Geoscience Information and the International Association for Mathematical Geosciences. In 2006 he was awarded OGC's highest honor, the Gardels Medal. He has been a member of AGU since 1982. Simon is currently based in CSIRO Land and Water in Melbourne, working on a variety of projects across environmental informatics and spatial data systems.

2013 presentation at AGU

2012 Leptoukh Lecture

Christopher Lynnes

Bridging Informatics and Earth Science: A Look at Gregory Leptoukh's Contributions

With the tragic passing this year of Gregory Leptoukh, the Earth and Space Sciences community lost a tireless participant in, and advocate for, science informatics. Throughout his career at NASA, Dr. Leptoukh established a theme of bridging the gulf between the informatics and science communities. Nowhere is this more evident than in his leadership in the development of Giovanni (GES DISC Interactive Online Visualization ANd aNalysis Infrastructure). Giovanni is an online tool that hides the often complex technical details of data format and structure, making science data easier for Earth scientists to explore and use. To date, Giovanni has been acknowledged as a contributor in 500-odd scientific articles. In recent years, Leptoukh concentrated his efforts on multi-sensor data inter-comparison, merging, and fusion.

This work exposed several challenges at the intersection of data and science. One of these was the ease with which a naive user might generate spurious comparisons, a potential hazard that was the genesis of the Multi-sensor Data Synergy Advisor (MDSA). The MDSA uses semantic ontologies and inference rules to organize knowledge about dataset quality and other salient characteristics in order to advise users on potential caveats when comparing or merging two datasets. Recently, Leptoukh also led the development of AeroStat, an online Giovanni instance for investigating aerosols via statistics from station and satellite comparisons and merged maps of data from more than one instrument. AeroStat offers a neural-network-based bias adjustment to "harmonize" data by removing systematic offsets between datasets before merging. These examples exhibit Leptoukh's talent for adopting advanced computer technologies in the service of making science data more accessible to researchers. In this, he set an example that is at once both vital and challenging for the ESSI community to emulate.
