Talk / Overview

Foundation models are artificial intelligence (AI) models that are pre-trained on large unlabeled datasets through self-supervision and then fine-tuned for different downstream tasks. There is increasing interest in the scientific community in investigating whether this approach can be successfully applied to domains beyond natural language processing and computer vision to effectively build generalist AI models. Here, we introduce Prithvi, a geospatial AI foundation model pre-trained on a large corpus of multispectral satellite imagery from the NASA Harmonized Landsat and Sentinel-2 (HLS) dataset. Prithvi is a temporal vision transformer with positional and temporal embeddings, trained on the IBM Cloud Vela cluster (NVIDIA A100 GPUs) using a masked autoencoder approach and a mean absolute error loss function, for a total of 10k GPU hours. On benchmark downstream tasks such as flood mapping and burn scar identification, Prithvi could be successfully fine-tuned to produce state-of-the-art AI models for Earth observation, with the potential to reach peak performance on test data more quickly and with less training. As an example use case, we consider the applicability of geospatial AI foundation models to monitoring reforestation activities in Kenya’s National Water Towers. The pre-trained model and fine-tuning workflows are available open source on Hugging Face (https://huggingface.co/ibm-nasa-geospatial).
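
To make the pretraining recipe concrete, the sketch below implements one masked-autoencoder training step with a mean absolute error reconstruction loss on multi-temporal, multispectral tiles. It is a minimal illustration under stated assumptions, not Prithvi's actual code: the class name TinyTemporalMAE, all layer sizes, and the simplified masking scheme (learned mask tokens substituted in place, rather than dropping masked tokens before the encoder) are assumptions made for the example.

    # Minimal sketch of masked-patch pretraining with a mean absolute
    # error loss; names, sizes, and the in-place masking scheme are
    # illustrative assumptions, not Prithvi's implementation.
    import torch
    import torch.nn as nn

    class TinyTemporalMAE(nn.Module):
        def __init__(self, bands=6, frames=3, img=224, patch=16, dim=128):
            super().__init__()
            self.patch = patch
            self.n_tokens = frames * (img // patch) ** 2
            patch_dim = bands * patch * patch              # pixels per patch, per time step
            self.embed = nn.Linear(patch_dim, dim)         # patch embedding
            self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
            # learned positional + temporal embedding over all space-time tokens
            self.pos = nn.Parameter(torch.zeros(1, self.n_tokens, dim))
            layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(dim, patch_dim)          # reconstruct raw pixel values

        def patchify(self, x):
            # x: (B, bands, frames, H, W) -> (B, frames * patches, bands * patch * patch)
            b, c, t, h, w = x.shape
            p = self.patch
            x = x.reshape(b, c, t, h // p, p, w // p, p)
            return x.permute(0, 2, 3, 5, 1, 4, 6).reshape(b, self.n_tokens, -1)

        def forward(self, x, mask_ratio=0.75):
            target = self.patchify(x)
            tokens = self.embed(target)
            # randomly replace a fraction of tokens with the learned mask token
            mask = torch.rand(tokens.shape[:2], device=x.device) < mask_ratio
            tokens = torch.where(mask.unsqueeze(-1),
                                 self.mask_token.expand_as(tokens), tokens)
            pred = self.head(self.encoder(tokens + self.pos))
            # mean absolute error, computed on the masked patches only
            return (pred - target).abs()[mask].mean()

    # one pretraining step on a random batch of 6-band, 3-frame HLS-like tiles
    model = TinyTemporalMAE()
    loss = model(torch.randn(2, 6, 3, 224, 224))
    loss.backward()

Because the loss is scored only on masked patches, the model must infer hidden content from the visible spatio-temporal context; fine-tuning for a downstream task such as flood or burn scar mapping then replaces the reconstruction head with a task-specific head.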

Talk / Speakers

Julian Kuehnert

Research Scientist at IBM Research Africa | Climate & Sustainability