Inference from Non-Probability Surveys with Statistical Matching and Propensity Score Adjustment Using Modern Prediction Techniques
Metadata
Publisher
MDPI
Subject
Nonprobability surveys; Machine learning; Matching; Propensity score adjustment; Sampling
Date
2020-06-01
Citation
Castro-Martín, L., Rueda, M. D. M., & Ferri-García, R. (2020). Inference from Non-Probability Surveys with Statistical Matching and Propensity Score Adjustment Using Modern Prediction Techniques. Mathematics, 8(6), 879. [doi: 10.3390/math8060879]
Sponsor
Ministerio de Economia, Industria y Competitividad, Spain MTM2015-63609-R; Ministerio de Ciencia, Innovacion y Universidades, Spain FPU17/02177
Abstract
Online surveys are increasingly common in social and health studies, as they provide
fast and inexpensive results in comparison to traditional ones. However, these surveys often
work with biased samples, because data collection is typically non-probabilistic: internet coverage
is lacking in certain population groups, and many online surveys rely on self-selection.
Several procedures have been proposed to mitigate this bias, such as propensity score
adjustment (PSA) and statistical matching. In PSA, the propensity to participate in a nonprobability
survey is estimated using a probability reference survey and is then used to obtain weighted estimates.
In statistical matching, the nonprobability sample is used to train models to predict the values of
the target variable, and the predictions of the models for the probability sample can be used to
estimate population values. In this study, both methods are compared using three datasets to
simulate pseudopopulations from which nonprobability and probability samples are drawn and used
to estimate population parameters. In addition, the study compares the use of linear models and
machine learning prediction algorithms for propensity estimation in PSA and for predictive modeling
in statistical matching. The results show that statistical matching outperforms PSA in terms of bias
reduction and root mean square error (RMSE), and that simpler prediction models, such as linear
models and k-nearest neighbors, provide better outcomes than bagging algorithms.
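
The two adjustments described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use their datasets. Everything in it is an assumption made for illustration: the simulated pseudopopulation, the single covariate x, the self-selection mechanism, the sample sizes, and the helper names fit_logistic and fit_linear. The PSA weight (1 - p) / p is one of several weighting formulas found in the PSA literature, and the matching step uses a plain linear model, the simplest of the predictors the abstract mentions.

```python
import math
import random

rng = random.Random(0)

# Hypothetical pseudopopulation: one covariate x and a target y that depends on x.
N = 20000
pop_x = [rng.gauss(0, 1) for _ in range(N)]
pop_y = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in pop_x]
true_mean = sum(pop_y) / N

# Nonprobability sample: self-selection, units with larger x volunteer more often.
nonprob = [i for i in range(N) if rng.random() < 1 / (1 + math.exp(3.0 - pop_x[i]))]
# Probability reference sample: simple random sample (equal design weights).
prob = rng.sample(range(N), 1000)


def fit_logistic(xs, zs, iters=25):
    """Logistic regression with intercept and one slope, fitted by Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, z in zip(xs, zs):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            w = p * (1 - p)
            g0 += z - p
            g1 += (z - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def fit_linear(xs, ys):
    """Ordinary least squares with intercept and one slope (closed form)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


np_x = [pop_x[i] for i in nonprob]
np_y = [pop_y[i] for i in nonprob]
ref_x = [pop_x[i] for i in prob]  # y is treated as unobserved in the reference sample

# Unadjusted estimate from the nonprobability sample alone.
naive_estimate = sum(np_y) / len(np_y)

# PSA: pool both samples, model membership in the nonprobability sample,
# then weight each volunteer by (1 - p) / p (an assumed choice of weight).
b0, b1 = fit_logistic(np_x + ref_x, [1] * len(np_x) + [0] * len(ref_x))
weights = []
for x in np_x:
    p = 1 / (1 + math.exp(-(b0 + b1 * x)))
    weights.append((1 - p) / p)
psa_estimate = sum(w * y for w, y in zip(weights, np_y)) / sum(weights)

# Statistical matching: train on the nonprobability sample, predict y for the
# reference sample, and average the predictions.
a0, a1 = fit_linear(np_x, np_y)
matching_estimate = sum(a0 + a1 * x for x in ref_x) / len(ref_x)

print("population mean:", round(true_mean, 3))
print("naive:", round(naive_estimate, 3))
print("PSA:", round(psa_estimate, 3))
print("matching:", round(matching_estimate, 3))
```

In this toy setting the outcome model is correctly specified, so the matching estimate is expected to track the population mean closely. A single run of this kind says nothing about the relative performance of the two methods, which the study assesses through repeated sampling from its pseudopopulations.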