On the use of the observation-wise k-fold operation in PCA cross-validation
Metadata
Subject
Cross-validation; Principal component analysis; Dimensionality assessment
Date
2015-07
Bibliographic reference
Saccenti, E., and Camacho, J. (2015), On the use of the observation-wise k-fold operation in PCA cross-validation. J. Chemometrics, 29, 467–478. doi: 10.1002/cem.2726.
Abstract
Cross-validation (CV) is a common approach for determining the optimal number of
components in a principal component analysis (PCA) model. To guarantee
independence between model testing and calibration, the observation-wise k-fold
operation is commonly implemented in each cross-validation step. This operation
makes the CV algorithm computationally intensive and is the main limitation in
applying CV to very large data sets. In this paper we carry out an empirical and
theoretical investigation of the use of this operation in the element-wise k-fold
(ekf) algorithm, the state-of-the-art CV algorithm. We show that when very large
data sets need to be cross-validated and computational time is a concern, the
observation-wise k-fold operation can be skipped. The theoretical properties of
the resulting modified algorithm, referred to as the column-wise k-fold (ckf)
algorithm, are derived, and its performance is evaluated with several artificial
and real data sets. We suggest the ckf algorithm as a valid alternative to the
standard ekf for reducing the computational time needed to cross-validate a data set.