Using principal components for estimating logistic regression with high-dimensional multicollinear data

Aguilera Del Pino, Ana María; Escabias Machuca, Manuel; Valderrama Bonnet, Mariano José

Logistic regression; Multicollinearity; Principal components

The logistic regression model is used to predict a binary response variable in terms of a set of explanatory variables. When there is multicollinearity (high dependence) among the predictors, the estimation of the model parameters is inaccurate and their interpretation in terms of odds ratios may be erroneous. Another important problem is the large number of explanatory variables usually needed to explain the response. In order to improve the estimation of the logistic model parameters under multicollinearity and to reduce the dimension of the problem with continuous covariates, it is proposed to use as covariates of the logistic model a reduced set of optimum principal components of the original predictors. Finally, the performance of the proposed principal component logistic regression model is analyzed by developing a simulation study where different methods for selecting the optimum principal components are compared.

2022-03-01T12:39:03Z
2006-04-10
info:eu-repo/semantics/article

Ana M. Aguilera, Manuel Escabias, Mariano J. Valderrama, Using principal components for estimating logistic regression with high-dimensional multicollinear data, Computational Statistics & Data Analysis, Volume 50, Issue 8, 2006, Pages 1905-1924, ISSN 0167-9473, https://doi.org/10.1016/j.csda.2005.03.011

http://hdl.handle.net/10481/73050
eng
http://creativecommons.org/licenses/by-nd/3.0/es/
info:eu-repo/semantics/embargoedAccess
Attribution-NoDerivs 3.0 Spain
Elsevier
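
The idea summarized in the abstract, replacing collinear predictors by a reduced set of their principal components before fitting the logistic model, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the simulated data, the choice to retain only the first component, the helper names (`standardize`, `first_loading`, `fit_logistic`), and the use of power iteration and Newton-Raphson as numerical methods. The article itself compares several methods for selecting components, which this sketch does not attempt to reproduce.

```python
import math
import random

def standardize(X):
    """Center each column and scale it to unit sample variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
           for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]

def first_loading(Z, iters=100):
    """Leading eigenvector of the sample covariance of Z (power iteration)."""
    n, p = len(Z), len(Z[0])
    S = [[sum(r[i] * r[j] for r in Z) / (n - 1) for j in range(p)]
         for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_logistic(z, y, iters=25):
    """Newton-Raphson maximum likelihood fit of logit P(y=1) = a + g*z."""
    a = g = 0.0
    for _ in range(iters):
        p = [sigmoid(a + g * zi) for zi in z]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))            # score for a
        g1 = sum((yi - pi) * zi for yi, pi, zi in zip(y, p, z))  # score for g
        h00 = sum(w)
        h01 = sum(wi * zi for wi, zi in zip(w, z))
        h11 = sum(wi * zi * zi for wi, zi in zip(w, z))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        g += (h00 * g1 - h01 * g0) / det
    return a, g

# Simulated data (an assumption): three predictors driven by one latent
# factor t, hence strongly collinear; the response depends on t only.
random.seed(0)
n = 300
X, y = [], []
for _ in range(n):
    t = random.gauss(0, 1)
    X.append([t + random.gauss(0, 0.1) for _ in range(3)])
    y.append(1 if random.random() < sigmoid(2.0 * t) else 0)

Z = standardize(X)
v = first_loading(Z)                                   # loadings of PC 1
scores = [sum(zj * vj for zj, vj in zip(row, v)) for row in Z]
alpha, gamma = fit_logistic(scores, y)                 # logistic fit on PC 1
beta = [gamma * vj for vj in v]   # implied coefficients on standardized X

print("loadings:", v)
print("intercept, PC coefficient:", alpha, gamma)
print("coefficients on standardized predictors:", beta)
```

Because the component score is the linear combination `Z v`, the fitted model `alpha + gamma * (Z v)` equals `alpha + Z (gamma * v)`, which is why `beta` can be read as coefficients on the standardized original predictors. Fitting on one well-determined component avoids the unstable individual estimates that near-duplicate predictors would otherwise produce.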