Model improves accuracy for remote sensing interpretation

A new foundation model called RingMo has been developed to improve the accuracy of remote sensing image interpretation.

This is according to the Aerospace Information Research Institute (AIR) of the Chinese Academy of Sciences (CAS).

The study, titled “RingMo: A Remote Sensing Foundation Model with Masked Image Modeling”, was published in IEEE Transactions on Geoscience and Remote Sensing.

Remote sensing images

Remote sensing images are used for tasks such as classification and change detection, and deep learning approaches have driven rapid progress in remote sensing image interpretation. The most widely used training paradigm applies models pre-trained on ImageNet to remote sensing data for specific tasks.

However, this paradigm has problems, such as the domain gap between natural and remote sensing scenes and the poor generalization capacity of remote sensing models. A foundation model with general remote sensing feature representations is therefore needed. Because a large amount of unlabeled data is available, self-supervised methods are better suited to remote sensing than fully supervised ones.

RingMo features

The study proposes a remote sensing foundation model framework that leverages the benefits of generative self-supervised learning for remote sensing images. RingMo is built on a large-scale dataset of two million remote sensing images collected from satellite and aerial platforms, covering multiple scenes and objects around the world. In addition, its training method is designed for dense and small objects in complicated remote sensing scenes.
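
To give a sense of what generative self-supervised learning with masked image modeling involves, here is a minimal sketch in plain Python. It shows the generic idea only: hide a random subset of image patches, then score a reconstruction on the hidden patches alone, so no labels are needed. It is not RingMo's actual masking strategy or loss, which this report does not describe. The function names, the 0.5 mask ratio, and the one-value-per-patch simplification are all assumptions made for illustration.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio=0.5, seed=0):
    """Return a grid_h x grid_w grid of booleans; True marks a hidden patch.

    Hypothetical helper: generic random masking, not RingMo's own strategy.
    """
    rng = random.Random(seed)
    num_patches = grid_h * grid_w
    num_masked = int(num_patches * mask_ratio)
    # Pick which patch indices to hide, without repetition.
    masked = set(rng.sample(range(num_patches), num_masked))
    return [[(r * grid_w + c) in masked for c in range(grid_w)]
            for r in range(grid_h)]


def masked_l1_loss(target, predicted, mask):
    """Mean absolute error over hidden patches only.

    Each patch is simplified to a single number (an assumption for brevity).
    """
    errors = [abs(t - p)
              for t_row, p_row, m_row in zip(target, predicted, mask)
              for t, p, m in zip(t_row, p_row, m_row)
              if m]
    return sum(errors) / len(errors) if errors else 0.0


# Toy example: a 4 x 4 grid of patches, half of them hidden.
mask = random_patch_mask(4, 4, mask_ratio=0.5, seed=0)
target = [[1.0] * 4 for _ in range(4)]      # stand-in for the true patches
predicted = [[0.0] * 4 for _ in range(4)]   # stand-in for a model's output
loss = masked_l1_loss(target, predicted, mask)
```

In a real system a neural network would produce the predictions from the visible patches, and the loss would be minimized over a large unlabeled image collection.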

RingMo is the first generative foundation model for cross-modal remote sensing data. In the future, the model can be applied to 3D reconstruction, residential construction, transportation, water conservancy, environmental protection and other fields.

This report was first published by CAS.