Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys

Ferri García, Ramón; Rueda García, María Del Mar

Identifiers

URI: http://hdl.handle.net/10481/73063

DOI: https://doi.org/10.1007/s00362-022-01296-x

Publisher

Springer

Subject

Online surveys · Propensity Score Adjustment · Selection bias · Variable selection · Raking calibration

Date

2022-02-08

Bibliographic reference

Ferri-García, Ramón; Rueda, María del Mar. Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys. Statistical Papers, Accepted: 8 February 2022

Sponsor

Ministerio de Ciencia e Innovación, Spain [Grant No. PID2019-106861RBI00/AEI/10.13039/501100011033]. FPU grant from Ministerio de Ciencia, Innovación y Universidades. Funding for open access charge: Universidad de Granada / CBUA Spain. IMAG-Maria de Maeztu CEX2020-001105-M/AEI/10.13039/501100011033

Abstract

The development of new survey data collection methods such as online surveys has been particularly advantageous for social studies in terms of reduced costs, immediacy and enhanced questionnaire possibilities. However, many such methods are strongly affected by selection bias, leading to unreliable estimates. Calibration and Propensity Score Adjustment (PSA) have been proposed as methods to remove selection bias in online nonprobability surveys. Calibration requires population totals to be known for the auxiliary variables used in the procedure, while PSA estimates the volunteering propensity of an individual using predictive modelling. The variables included in these models must be carefully selected in order to maximise the accuracy of the final estimates. This study presents an application, using synthetic and real data, of variable selection techniques developed for knowledge discovery in data to choose the best subset of variables for propensity estimation. We also compare the performance of PSA using different classification algorithms, after which calibration is applied. Finally, we present an application of this methodology in a real-world situation, using it to obtain estimates of population parameters. The results show that variable selection using appropriate methods can provide less biased and more efficient estimates than using all available covariates.
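
As an illustration of the PSA step described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a volunteer (nonprobability) sample and a simple random reference sample that share one covariate, fits a logistic model for membership in the volunteer sample, and weights each volunteer by (1 - p) / p, which is one of several weighting formulas used with PSA. The function names `fit_logistic` and `psa_weights`, the simulated data and the choice of logistic regression are all assumptions made for this example. The variable selection and calibration steps studied in the paper are not shown.

```python
import numpy as np


def fit_logistic(X, z, n_iter=25, ridge=1e-8):
    """Fit a logistic regression by Newton-Raphson (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (z - p)
        # A tiny ridge term keeps the Hessian invertible (illustrative choice).
        hess = (A * (p * (1.0 - p))[:, None]).T @ A + ridge * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def psa_weights(X_vol, X_ref):
    """Return (weights, propensities) for the volunteer sample.

    Assumes the reference sample is a simple random sample, so its
    units enter the propensity model with equal weight.
    """
    X = np.vstack([X_vol, X_ref])
    z = np.concatenate([np.ones(len(X_vol)), np.zeros(len(X_ref))])
    beta = fit_logistic(X, z)
    A_vol = np.column_stack([np.ones(len(X_vol)), X_vol])
    p_vol = 1.0 / (1.0 + np.exp(-A_vol @ beta))
    return (1.0 - p_vol) / p_vol, p_vol


# Hypothetical synthetic population: y depends on x, and so does volunteering.
rng = np.random.default_rng(0)
N = 20000
x = rng.normal(size=N)
y = 2.0 + x + rng.normal(scale=0.5, size=N)

# Individuals with larger x are more likely to volunteer (selection bias).
volunteer = rng.random(N) < 1.0 / (1.0 + np.exp(-x))
ref_idx = rng.choice(N, size=2000, replace=False)

X_vol = x[volunteer].reshape(-1, 1)
X_ref = x[ref_idx].reshape(-1, 1)
y_vol = y[volunteer]

w, p_vol = psa_weights(X_vol, X_ref)

naive_mean = y_vol.mean()                      # unadjusted volunteer mean
adjusted_mean = np.average(y_vol, weights=w)   # PSA-weighted mean
true_mean = y.mean()

print(f"population mean: {true_mean:.3f}")
print(f"naive estimate:  {naive_mean:.3f}")
print(f"PSA estimate:    {adjusted_mean:.3f}")
```

In this simulated setting the weighted mean should lie closer to the population mean than the unadjusted volunteer mean, because volunteers who resemble the reference sample receive larger weights. Swapping the single covariate for a selected subset of covariates would be the natural place to plug in a variable selection step.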

Collections

FQM365 - Articles

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Spain