Representation Learning in Earth Science

"In this project, we will perform large scale machine learning on historical observational data of the atmosphere to infer a description of the dynamics derived directly from real-world measurements."

The AtmoRep project asks whether a single neural network can be trained to represent and describe all atmospheric dynamics. AtmoRep's ambition is hence to demonstrate that large-scale representation learning, whose feasibility in principle and potential were established by large language models such as OpenAI's GPT line and Google's PaLM, is also applicable to scientific data and in particular to atmospheric dynamics. The project is enabled by the large amounts of atmospheric observations that have been made in the past, as well as by advances in neural network architectures and self-supervised learning that allow for effective training on petabytes of data. We aim to train on all of the ERA5 reanalysis and, furthermore, to fine-tune on observational data such as satellite measurements to move beyond the limits of reanalyses.

Downstream applications

The AtmoRep model is currently being adapted and tested on the following downstream applications to improve the next generation of weather and climate models.

Short-term weather forecasting

The short-term forecasting skill of the pre-trained AtmoRep model is being tested against the numerical weather prediction models from ECMWF (IFS) and DWD (ICON), as well as the main machine-learning-based forecasting models now available, such as FourCastNet. We are working to extend AtmoRep's forecasting skill to the medium range. We will also investigate the impact of additional variables related to atmosphere-ocean interactions.
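As an illustration of how forecasts from different models can be compared on a latitude-longitude grid, the sketch below computes a latitude-weighted root-mean-square error. This is a commonly used verification score, but it is an assumption here: the text does not state which metrics are used in the AtmoRep comparison. The function name and the nested-list grid layout are hypothetical.

```python
import math

def lat_weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE between two fields given as [lat][lon] lists.

    Rows near the poles cover less area on a regular lat-lon grid, so each
    row is weighted by cos(latitude), normalized to a mean weight of 1.
    (Illustrative metric only; not taken from the AtmoRep evaluation.)
    """
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_weight = sum(weights) / len(weights)
    total, count = 0.0, 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += (w / mean_weight) * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)

# Toy example: a forecast that is uniformly 2 units too warm.
truth = [[280.0, 281.0], [290.0, 291.0], [285.0, 286.0]]
forecast = [[v + 2.0 for v in row] for row in truth]
lats = [60.0, 0.0, -60.0]
error = lat_weighted_rmse(forecast, truth, lats)
```

A lower score for one model than for another on the same reference data indicates a better forecast with respect to this particular metric.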

Downscaling

Downscaling, known as super-resolution in computer vision, aims to increase the resolution of a coarse spatial input. For atmospheric fields, this can be achieved with weather forecasting models, which is, however, computationally expensive. With the pre-trained transformer, one can use the decoder for downscaling to reach resolutions of a few kilometers. In particular, we will be using the COSMO-REA6 reanalysis dataset, which reaches a resolution of 6 km.
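To make the task concrete, the sketch below maps a coarse grid to a finer one using plain bilinear interpolation. This is not the AtmoRep decoder; it is only the kind of simple interpolation baseline that a learned downscaling model would be expected to improve upon, since interpolation cannot add fine-scale detail. The function name and the integer refinement factor are assumptions made for illustration.

```python
def bilinear_upsample(coarse, factor):
    """Refine a 2D grid (list of rows, at least 2x2) by an integer factor.

    Original grid points are preserved; new points are bilinear blends of
    the four surrounding coarse values. (Baseline sketch, not AtmoRep.)
    """
    n_rows, n_cols = len(coarse), len(coarse[0])
    out_rows = (n_rows - 1) * factor + 1
    out_cols = (n_cols - 1) * factor + 1
    fine = []
    for i in range(out_rows):
        # Index of the coarse cell containing this row, and offset within it.
        y0 = min(i // factor, n_rows - 2)
        ty = i / factor - y0
        row = []
        for j in range(out_cols):
            x0 = min(j // factor, n_cols - 2)
            tx = j / factor - x0
            top = (1 - tx) * coarse[y0][x0] + tx * coarse[y0][x0 + 1]
            bottom = (1 - tx) * coarse[y0 + 1][x0] + tx * coarse[y0 + 1][x0 + 1]
            row.append((1 - ty) * top + ty * bottom)
        fine.append(row)
    return fine

# A 2x2 coarse field refined by a factor of 2 becomes a 3x3 field.
coarse_field = [[0.0, 2.0], [4.0, 6.0]]
fine_field = bilinear_upsample(coarse_field, 2)
```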

Bias corrections

Biases in models can cause shifts and trends in reanalyses, which can lead to sub-optimal forecasts. In AtmoRep, we will correct for biases in total precipitation forecasts by fine-tuning the pre-trained model directly on radar observations. We will use the German RadarKlimatologie (RADKLIM) dataset, a radar-based, high-resolution precipitation dataset covering Germany from 2001 to the present day.
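For readers unfamiliar with the term, the sketch below shows what a systematic bias looks like in its simplest form, using classical linear scaling: forecast precipitation is multiplied by the ratio of the observed mean to the modeled mean. This is a deliberately swapped-in textbook baseline, not the AtmoRep approach, which fine-tunes the network on radar data. All function names and numbers are hypothetical.

```python
def linear_scaling_factor(model_precip, observed_precip):
    """Ratio of mean observed to mean modeled precipitation.

    A factor below 1 means the model is too wet on average, above 1 too dry.
    (Classical baseline for illustration; not the AtmoRep fine-tuning.)
    """
    model_mean = sum(model_precip) / len(model_precip)
    observed_mean = sum(observed_precip) / len(observed_precip)
    if model_mean == 0:
        raise ValueError("modeled mean precipitation is zero; factor undefined")
    return observed_mean / model_mean

def apply_linear_scaling(forecast_precip, factor):
    """Scale a forecast by the factor, keeping precipitation non-negative."""
    return [max(0.0, value * factor) for value in forecast_precip]

# Made-up values in mm: the model is systematically twice as wet as observed.
past_model = [2.0, 4.0, 6.0]
past_observed = [1.0, 2.0, 3.0]
factor = linear_scaling_factor(past_model, past_observed)
corrected = apply_linear_scaling([8.0, 0.0], factor)
```

A single scaling factor only corrects the mean; it cannot fix errors that depend on location, intensity, or weather regime, which is part of the motivation for learning the correction from high-resolution observations.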

History

Presentations

Journal publications:

Talks:

Posters:

Who we are

AtmoRep is a multidisciplinary collaboration among computer scientists from the University of Magdeburg, Earth scientists from the Jülich Supercomputing Centre, and physicists from CERN.

Christian Lessig

Christian Lessig is an assistant professor at Otto-von-Guericke-Universität Magdeburg. His background is in computer science, but he now also works in scientific computing and numerical analysis. In recent years, his research has moved towards addressing climate change, in particular by developing hybrid weather and climate simulation models that combine classical discretizations of the governing partial differential equations with neural networks that account for phenomena that are too expensive to simulate or whose physics is not well understood.

Ilaria Luise

Ilaria Luise is a Senior Research Fellow at CERN, the European Organization for Nuclear Research in Geneva. She works as a physicist within the Innovation Division of the CERN IT Department. Her background is in high energy physics and big data management. She is Co-PI of the EMP2 project at CERN, which is part of the CERN Innovation Programme on Environmental Applications (CIPEA). The EMP2 project aims to integrate the AtmoRep model into a digital twin engine. This work is carried out in collaboration with the EU-funded InterTwin project and the Digital Twin initiative at CERN.

Martin Schultz

Martin Schultz is the group leader of the Earth System Data Exploration research group at the Jülich Supercomputing Centre. He has more than 30 years of experience in working with atmospheric data and numerical modeling of atmospheric composition and climate. He has authored or co-authored more than 130 publications and was listed as a highly cited researcher in the field of environmental sciences in 2017 and 2020. He holds an ERC Advanced Grant (IntelliAQ), in which he explores the potential of machine learning for the analysis of air quality data.

Bing Gong

Bing Gong has been a postdoctoral researcher at the Jülich Supercomputing Centre since 2019. Her current work in the group is developing state-of-the-art, scalable deep neural networks, with a focus on time series prediction and video frame prediction in weather and air quality applications. She obtained her Ph.D. in artificial intelligence applied to environmental science and energy from the Technical University of Madrid, Spain, in July 2017.

Michael Langguth

Michael Langguth holds a Master's degree in Physics of the Earth and Atmosphere from the Rheinische Friedrich-Wilhelms-Universität Bonn. During his PhD, he implemented a hybrid parametrization scheme for deep convection in the ICOsahedral Non-hydrostatic (ICON) model developed by the DWD and the MPI-M. His current research focuses on machine learning for the atmospheric Earth system, combined with expertise from numerical modelling.

Scarlet Stadtler

Scarlet Stadtler is a postdoctoral associate at the Jülich Supercomputing Centre (JSC). Her research focuses on explainable machine learning and uncertainty quantification. A trained meteorologist and atmospheric chemist, she applies data-driven techniques in air quality research. As PI of the KISTE project (AI strategy for Earth System data), she leads the construction of an Earth-AI software platform and an Earth-AI e-learning platform.