Table of Contents
- 1. Introduction
- 2. Single Modality
- 2.1 Foundation Models
- RingMo: A Remote Sensing Foundation Model With Masked Image Modeling
- Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
- SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery
- Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
- A Billion-scale Foundation Model for Remote Sensing Images
- SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding
- Towards Geospatial Foundation Models via Continual Pretraining
- 2.2 Image Segmentation
- SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model
- RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model
- The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot
- RingMo-SAM: A Foundation Model for Segment Anything in Multimodal Remote-Sensing Images
- SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints
- 3. Text-Image Multimodality
- 3.1 Foundation Models
- RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
- RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Model
- RSGPT: A Remote Sensing Vision Language Model and Benchmark
- GeoChat: Grounded Large Vision-Language Model for Remote Sensing
- Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
- 3.2 Referring Remote Sensing Image Segmentation (RRSIS)