Object-Centric Masked Image Modelling for Self-Supervised Pre-Training in Remote Sensing Object Detection
Metadata
Publisher
Universidad de Granada
Subject
Remote sensing technology; Object detection; Object-Centric Masked Image Modelling; Attention-Guided Mask Generator
Date
2024-12-31
Bibliographic reference
AR. Sivakumaran, K. Shiva Prasanna, L. Sai Sreeja, K. Sai Srija (2024). Object-Centric Masked Image Modelling for Self-Supervised Pre-Training in Remote Sensing Object Detection, Vol. 15(5), 265-276. ISSN 1989-9572
Abstract
The proliferation of remote sensing technologies has led to an increasing demand for effective object
detection in satellite and aerial imagery, with applications ranging from environmental monitoring to urban
planning. Traditional methods for analyzing such imagery often rely on manual inspection, which is both
time-consuming and prone to human error. While recent advancements in automated object detection have
improved efficiency, these systems frequently suffer from limitations in accurately identifying and
classifying objects due to their reliance on simplistic masking techniques and insufficient context
understanding. In this work, we propose a novel Object-Centric Masked Image Modelling (OCMIM)
algorithm designed to enhance self-supervised pre-training for remote sensing object detection. The
OCMIM algorithm comprises two key components: the Object-Centric Data Generator (OCDG) and the
Attention-Guided Mask Generator (AGMG). The OCDG component empowers the model to capture
comprehensive object-level context information, accommodating various scales and multiple categories,
thus enriching the pre-training process. Complementing this, the AGMG focuses on improving the
reconstruction of object regions by intelligently masking the most attention-worthy regions instead of
employing random masking, thereby enabling more accurate object detection and classification. Our
proposed OCMIM algorithm leverages the strengths of existing pre-trained models such as Mask R-CNN
(M-RCNN) and RetinaNet, enhancing their performance through the integration of OCDG and AGMG. For
evaluation purposes, we utilized several pre-trained models, including M-RCNN and RetinaNet, and
conducted experiments on diverse datasets such as NWPU, DIAR, and UCAS. Given the extensive training
time required for these models, we specifically employed M-RCNN in conjunction with OCMIM for
detailed experiments on the NWPU dataset.
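The abstract does not give implementation details for the Attention-Guided Mask Generator, but its core idea — masking the most attention-worthy patches rather than masking at random — can be sketched as follows. This is a minimal illustration under assumed interfaces (a per-patch attention score array and a target mask ratio); the function name and parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def attention_guided_mask(attention_map, mask_ratio=0.5):
    """Hypothetical sketch of attention-guided masking: select the
    patches with the highest attention scores for masking, rather
    than choosing them uniformly at random.

    attention_map: 1-D array of per-patch attention scores.
    Returns a boolean array where True marks a masked patch
    (i.e. a region the model must reconstruct during pre-training).
    """
    n_patches = attention_map.size
    n_masked = int(round(n_patches * mask_ratio))
    # Indices of the most attention-worthy patches, in descending score order.
    top_idx = np.argsort(attention_map)[::-1][:n_masked]
    mask = np.zeros(n_patches, dtype=bool)
    mask[top_idx] = True
    return mask

# Toy example: 8 patches, where higher scores mark likely object regions.
scores = np.array([0.05, 0.9, 0.1, 0.8, 0.02, 0.7, 0.15, 0.6])
mask = attention_guided_mask(scores, mask_ratio=0.5)
print(mask)  # the four highest-scoring patches are masked
```

Forcing reconstruction of high-attention (object-bearing) regions is what, per the abstract, steers the pre-training signal toward object-level features instead of background texture.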