A Connection Between Pattern Classification by Machine Learning and Statistical Inference With the General Linear Model
Metadata
Author
Gorriz Sáez, Juan Manuel; Jiménez Mesa, Carmen; Segovia Román, Fermín; Ramírez, J.; SIPBA group
Publisher
IEEE
Subject
General linear model; Linear regression model; Pattern classification; Upper bounds; Permutation tests; Cross-validation
Date
2021-08-04
Bibliographic reference
J. M. Górriz et al., "A Connection Between Pattern Classification by Machine Learning and Statistical Inference With the General Linear Model," IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 11, pp. 5332-5343, Nov. 2022, doi: 10.1109/JBHI.2021.3101662
Sponsor
Ministerio de Ciencia e Innovación (España)/FEDER RTI2018-098913B100; Junta de Andalucía; European Commission CV20-45250 A-TIC-080-UGR18 P20-00525; National Health and Medical Research Council (NHMRC) of Australia 18/04902
Abstract
A connection between the general linear model (GLM) with frequentist statistical testing and machine learning (MLE) inference is derived and illustrated. Initially, the estimation of GLM parameters is expressed as a linear regression model (LRM) of an indicator matrix; that is, in terms of the inverse problem of regressing the observations. Both approaches, i.e. GLM and LRM, apply to different domains, the observation and the label domains, and are linked by a normalization value in the least-squares solution. Subsequently, we derive a more refined predictive statistical test: the linear support vector machine (SVM), which maximizes the class margin of separation within a permutation analysis. This MLE-based inference employs a residual score and an associated upper bound to compute a better estimate of the actual (real) error. Experimental results demonstrate how the parameter estimates derived from each model lead to different classification performance in the equivalent inverse problem. Moreover, using real data, the MLE-based inference including model-free estimators demonstrates an efficient trade-off between type I errors and statistical power.
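The GLM/LRM duality described in the abstract can be sketched in a minimal two-class toy example (a hypothetical illustration, not the paper's code): regressing observations on an indicator label vector (GLM direction) and regressing labels on observations (LRM direction) yield least-squares solutions that share the same cross-term and differ only by a normalization value. The second part sketches a permutation analysis; here the permutation statistic is the simple cross-term |t·y| rather than the paper's SVM margin with residual-score bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-class toy data: balanced +/-1 indicator labels and 1-D observations.
t = np.repeat([1.0, -1.0], 20)                 # label (indicator) vector
y = 1.5 * t + rng.normal(0.0, 1.0, t.size)     # observations with a group effect

# Forward problem (GLM direction): regress observations on labels.
#   y ~ t * b  ->  least-squares b = (t.y) / (t.t)
b = (t @ y) / (t @ t)

# Inverse problem (LRM direction): regress labels on observations.
#   t ~ y * w  ->  least-squares w = (y.t) / (y.y)
w = (y @ t) / (y @ y)

# Both solutions contain the same cross-term t.y and differ only by a
# normalization value, so b * (t.t) == w * (y.y) up to rounding.
assert np.isclose(b * (t @ t), w * (y @ y))

# Permutation analysis (sketch): how often does a random relabelling
# produce a class separation as large as the observed one?
obs = abs(t @ y)
perm = np.array([abs(rng.permutation(t) @ y) for _ in range(1000)])
p_value = (1 + np.sum(perm >= obs)) / (1 + perm.size)
print(f"b={b:.3f}, w={w:.3f}, permutation p-value={p_value:.4f}")
```

The identity holds for any data: the two fits live in different domains (observation vs. label), but their least-squares solutions are rescalings of one another, which is the link the abstract refers to.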