TimeSpec4LULC: a global multispectral time series database for training LULC mapping models with machine learning
Metadata
Author
Khaldi, Rohaifa; Alcaraz Segura, Domingo; Benhammou, Yassir; Herrera Triguero, Francisco; Tabik, Siham
Publisher
Copernicus
Date
2022-03-30
Bibliographic reference
Khaldi, R... [et al.]. TimeSpec4LULC: a global multispectral time series database for training LULC mapping models with machine learning, Earth Syst. Sci. Data, 14, 1377–1411, [https://doi.org/10.5194/essd-14-1377-2022], 2022.
Sponsor
DETECTOR (Universidad de Granada/FEDER) A-RNM-256-UGR18; LifeWatch SmartEcoMountains (Ministerio de Ciencia e Innovacion/Universidad de Granada/FEDER); LifeWatch-2019-10-UGR-01; BBVA DeepSCOP (Ayudas Fundacion BBVA a Equipos de Investigacion Cientifica 2018); DeepL-ISCO (Ministerio de Ciencia e Innovacion/FEDER) A-TIC-458-UGR18; SMART-DASCI (Ministerio de Ciencia e Innovacion/Universidad de Granada/FEDER) TIN2017-89517-P; BigDDL-CET (Ministerio de Ciencia e Innovacion/Universidad de Granada/FEDER) P18-FR-4961; RESISTE (Consejeria de Economia, Conocimiento y Universidad from the Junta de Andalucia/FEDER) P18-RT-1927; European Commission 641762; Conselleria de Educacion, Cultura y Deporte de la Generalitat Valenciana; European Social Fund (ESF) APOSTD/2021/188; European Research Council (ERC); European Commission 647038/BIODESERT; Group on Earth Observations and Google Earth Engine (Essential Biodiversity Variables -ScaleUp project) PID2020-119478GB-I00
Abstract
Land use and land cover (LULC) mapping is of paramount importance for monitoring and understanding
the structure and dynamics of the Earth system. One of the most promising ways to create accurate global
LULC maps is by building high-quality state-of-the-art machine learning models. Building such models requires
large and global datasets of annotated time series of satellite images, which are not available yet. This
paper presents TimeSpec4LULC (https://doi.org/10.5281/zenodo.5913554; Khaldi et al., 2022), a smart open-source
global dataset of multispectral time series for 29 LULC classes ready to train machine learning models.
TimeSpec4LULC was built based on the seven spectral bands of the MODIS sensors at 500 m resolution, from
2000 to 2021, and was annotated using spatial–temporal agreement across the 15 global LULC products available
in Google Earth Engine (GEE). The 22-year monthly time series of the seven bands were created globally
by (1) applying different spatial–temporal quality assessment filters on MODIS Terra and Aqua satellites; (2) aggregating
their original 8 d temporal granularity into monthly composites; (3) merging Terra+Aqua data into a
combined time series; and (4) extracting, at the pixel level, 6 076 531 time series of size 262 for the seven bands
along with a set of metadata: geographic coordinates, country and departmental divisions, spatial–temporal consistency
across LULC products, temporal data availability, and the global human modification index. A balanced
subset of the original dataset was also provided by selecting 1000 evenly distributed samples from each class
such that they are representative of the entire globe. To assess the annotation quality of the dataset, a sample of
pixels, evenly distributed around the world from each LULC class, was selected and validated by experts using
very high resolution imagery from both Google Earth and Bing Maps. This smartly pre-processed and
annotated dataset is targeted towards scientific users interested in developing various machine learning models,
including deep learning networks, to perform global LULC mapping.
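
Steps (1) to (3) of the processing chain described in the abstract can be illustrated for a single pixel and a single band with the minimal Python sketch below. It is not the authors' implementation (the dataset was built in Google Earth Engine). The function names `monthly_composite` and `merge_sensors`, the boolean quality flag, the reflectance values, the use of the mean for the monthly composite, and the averaging of Terra and Aqua where both are available are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_composite(observations):
    """Average the quality-passing 8-day observations in each (year, month).

    `observations` is a list of (date, value, passed_qa) tuples.
    Using the mean as the compositing rule is an assumption of this sketch.
    """
    buckets = defaultdict(list)
    for day, value, passed_qa in observations:
        if passed_qa:  # step (1): discard observations that fail the QA filter
            buckets[(day.year, day.month)].append(value)
    # step (2): one value per month
    return {month: mean(values) for month, values in buckets.items()}


def merge_sensors(terra, aqua):
    """Step (3): combine two monthly series into one.

    Where both sensors have a value the two are averaged; otherwise the
    available one is used. This merging rule is an assumption of this sketch.
    """
    merged = {}
    for month in sorted(set(terra) | set(aqua)):
        values = [series[month] for series in (terra, aqua) if month in series]
        merged[month] = mean(values)
    return merged


# Hypothetical reflectance values for one pixel and one band.
terra_obs = [
    (date(2003, 1, 1), 0.20, True),
    (date(2003, 1, 9), 0.30, True),
    (date(2003, 1, 17), 0.90, False),  # rejected by the quality filter
    (date(2003, 2, 2), 0.40, True),
]
aqua_obs = [
    (date(2003, 1, 1), 0.35, True),
    (date(2003, 3, 6), 0.50, True),
]

combined = merge_sensors(monthly_composite(terra_obs), monthly_composite(aqua_obs))
for (year, month), value in combined.items():
    print(f"{year}-{month:02d}: {value:.2f}")
```

Months with no valid observation from either sensor are simply absent from the result, so a downstream model would need its own strategy for such gaps.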