Variables Selection from the Patterns of the Features Applied to Spectroscopic Data—An Application Case
Metadata
Publisher
MDPI
Subject
big data; dimension reduction; features
Date
2024-12-29
Bibliographic reference
Romero Béjar, J.L. & Esquivel Sánchez, F.J. & Esquivel Guerrero, J.A. Mathematics 2025, 13, 99 [https://doi.org/10.3390/math13010099]
Abstract
Spectroscopic data provide relevant information about the
composition of samples and have been used for research in scientific disciplines such as chemistry,
geology, archaeology, Mars research, pharmacy, and medicine, as well as in important
industrial applications. In archaeology, they allow the characterization and classification of artifacts
and ecofacts, the analysis of patterns, the characterization and study of the exchange of
materials, etc. Spectrometers provide a large amount of data, of the so-called “big data” type,
which requires the use of multivariate statistical techniques, mainly principal component
analysis, cluster analysis, and discriminant analysis. This work is focused on reducing
the dimensionality of the data by selecting a small subset of variables to characterize the
samples and presents a mathematical methodology for the selection of the most efficient
variables. The objective is to identify a subset of variables based on spectral features that
allow characterization of the samples under study with the least possible errors when
performing quantitative analyses or discriminations between different samples. The subset
is not predetermined and, in each case, is obtained for each set of samples based on the
most important features of the samples under study, which allows for a good fit to the
data. The number of variables is reduced substantially, to a degree that depends on
the previously chosen minimum difference between features, while keeping a good fit to the raw data. Thus,
instead of 2151 variables, a minimum optimal subset of 32 valleys and 31 peaks is obtained
for a minimum difference between peaks or between valleys of 20 nm. This methodology
has been applied to a sample of minerals and rocks extracted from the ECOSTRESS 1.0
spectral library.
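
The idea of keeping only peaks and valleys that are separated by a minimum wavelength difference can be illustrated with a minimal sketch. This is not the authors' procedure: the greedy rule (keep the most extreme candidates first), the function names `local_extrema` and `select_features`, and the synthetic spectrum on a 1 nm grid are all assumptions made for illustration only.

```python
import math


def local_extrema(values, kind="peak"):
    """Return indices of strict local maxima ("peak") or minima ("valley")."""
    sign = 1.0 if kind == "peak" else -1.0
    found = []
    for i in range(1, len(values) - 1):
        here = sign * values[i]
        if here > sign * values[i - 1] and here > sign * values[i + 1]:
            found.append(i)
    return found


def select_features(wavelengths, values, min_sep_nm=20.0, kind="peak"):
    """Greedy selection (an assumed rule): visit candidates from most to
    least extreme and keep one only if it lies at least min_sep_nm away
    from every feature already kept."""
    sign = 1.0 if kind == "peak" else -1.0
    candidates = local_extrema(values, kind)
    candidates.sort(key=lambda i: sign * values[i], reverse=True)
    kept = []
    for i in candidates:
        if all(abs(wavelengths[i] - wavelengths[j]) >= min_sep_nm for j in kept):
            kept.append(i)
    return sorted(kept)


# Synthetic spectrum (hypothetical data): 2151 points on a 1 nm grid,
# a slow oscillation plus a fast ripple that creates many small extrema.
wavelengths = [350 + i for i in range(2151)]
reflectance = [
    0.5
    + 0.2 * math.sin(2 * math.pi * w / 300.0)
    + 0.02 * math.sin(2 * math.pi * w / 7.0)
    for w in wavelengths
]

peaks = select_features(wavelengths, reflectance, 20.0, "peak")
valleys = select_features(wavelengths, reflectance, 20.0, "valley")
print(len(wavelengths), "variables reduced to", len(peaks) + len(valleys))
```

The resulting counts depend entirely on the synthetic signal and on the assumed selection rule, so they are not expected to match the 32 valleys and 31 peaks reported above.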