Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys
Identifiers
URI: http://hdl.handle.net/10481/73063
Publisher
Springer
Subject
Online surveys · Propensity Score Adjustment · Selection bias · Variable selection · Raking calibration
Date
2022-02-08
Citation
Ferri-García, Ramón; Rueda, María del Mar. Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys. Statistical Papers, Accepted: 8 February 2022
Sponsor
Ministerio de Ciencia e Innovación, Spain [Grant No. PID2019-106861RBI00/AEI/10.13039/501100011033]. FPU grant from Ministerio de Ciencia, Innovación y Universidades. Funding for open access charge: Universidad de Granada / CBUA Spain. IMAG-Maria de Maeztu CEX2020-001105-M/AEI/10.13039/501100011033
Abstract
The development of new survey data collection methods such as online surveys has
been particularly advantageous for social studies in terms of reduced costs, immediacy
and enhanced questionnaire possibilities. However, many such methods are strongly
affected by selection bias, leading to unreliable estimates. Calibration and Propensity
Score Adjustment (PSA) have been proposed as methods to remove selection bias in
online nonprobability surveys. Calibration requires population totals to be known for
the auxiliary variables used in the procedure, while PSA estimates the volunteering
propensity of an individual using predictive modelling. The variables included in
these models must be carefully selected in order to maximise the accuracy of the final
estimates. This study presents an application, using synthetic and real data, of variable
selection techniques developed for knowledge discovery in data to choose the best
subset of variables for propensity estimation. We also compare the performance of PSA
using different classification algorithms, after which calibration is applied. Finally,
we present an application of this methodology in a real-world situation, using it to obtain
estimates of population parameters. The results obtained show that variable selection
using appropriate methods can provide less biased and more efficient estimates than
using all available covariates.
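
As an illustration of the propensity estimation step described in the abstract, the following minimal sketch fits a propensity model on a volunteer sample stacked with a reference sample and reweights the volunteers. It is not the authors' implementation. The data are hypothetical toy counts, the single binary covariate x is invented for the example, logistic regression stands in for the classification algorithms the paper compares, and the weight (1 - p) / p is one common choice among several used in the PSA literature. Variable selection and the subsequent calibration step are omitted.

```python
import math

def fit_logistic(X, z, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent.
    Returns [intercept, coef_1, ..., coef_k]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, zi in zip(X, z):
            err = propensity(beta, xi) - zi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, xi):
    """Estimated probability of belonging to the volunteer sample."""
    lin = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-lin))

# Hypothetical toy data (assumption): one binary covariate x.
# Reference (probability) sample: x is balanced, 25 ones and 25 zeros.
reference = [[1.0]] * 25 + [[0.0]] * 25
# Volunteer (online) sample: x = 1 is over-represented, 40 versus 10.
volunteers = [[1.0]] * 40 + [[0.0]] * 10
# Target variable in the volunteer sample: 28 of 40 positives when x = 1,
# 3 of 10 positives when x = 0.
y_vol = [1.0] * 28 + [0.0] * 12 + [1.0] * 3 + [0.0] * 7

# Stack both samples; z = 1 marks volunteers, z = 0 marks reference units.
X = volunteers + reference
z = [1.0] * len(volunteers) + [0.0] * len(reference)
beta = fit_logistic(X, z)

# One common PSA weight (assumption): w = (1 - p) / p for each volunteer.
weights = [(1.0 - p) / p for p in (propensity(beta, x) for x in volunteers)]

naive = sum(y_vol) / len(y_vol)
adjusted = sum(w * y for w, y in zip(weights, y_vol)) / sum(weights)
print(f"naive mean:    {naive:.3f}")     # 0.620
print(f"adjusted mean: {adjusted:.3f}")  # 0.500
```

In this toy setting the unweighted volunteer mean is 0.62, while the weighted mean moves to about 0.50, the value implied by the balanced reference sample (0.5 × 0.7 + 0.5 × 0.3). The correction works here only because x drives both participation and the target variable, which is the reason the choice of covariates for the propensity model matters.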