Evaluation of available techniques and its combinations to address selection bias in nonprobability surveys
Identifiers
URI: https://hdl.handle.net/10481/104749
Author
Rueda-Sánchez, Jorge Luis; Ferri García, Ramón; Rueda García, María del Mar; Cobo Rodríguez, Beatriz
Publisher
Springer Nature
Subject
Inference; Nonprobability samples; Survey sampling; Selection bias; Variable selection; Weighted models
Date
2025
Sponsor
This research was partially supported by a grant from the Ministry of Science and Innovation (PID2019-106861RB-I00, PDC2022-133293-I00, Spain), by the Strategic Action in Health (DTS23/00032, Spain), by IMAG-María de Maeztu CEX2020-001105-M/AEI/10.13039/501100011033, and by the Own Research and Transfer Plan of the University of Granada (PPJIA2023-030, Spain).
Abstract
New survey methodologies that often produce nonprobability samples have recently become very important. However, estimates from nonprobability samples can be subject to selection bias, caused primarily by a lack of coverage and by respondents' ability to decide whether or not to participate in the survey. In such cases, inclusion probabilities can be zero or unknown, the estimators normally used in sample surveys cannot be applied, and methods to reduce this bias must be employed instead. There is a wide variety of techniques for doing so, depending on the auxiliary information available, but no study has determined which of them performs best. In this paper, we briefly describe most of these methods and conduct an extensive study to compare their performance. We study superpopulation models, which require knowledge of the auxiliary variables for all individuals in the population; linear calibration, which requires the population totals of the covariates; and several techniques that use a reference probability sample, such as propensity score adjustment, propensity-adjusted probability prediction, kernel weighting, statistical matching, and doubly robust estimators. In addition, we compare their performance when using linear regression or XGBoost as the predictive model, with and without the design weights in estimating inclusion probabilities, and with and without prior variable selection. The study was performed on five different datasets to determine which technique provides accurate and reliable estimates from nonprobability samples.
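To make the reference-sample approach concrete, the lines below give a minimal sketch of propensity score adjustment, one of the weighting techniques compared in the abstract. It assumes two pandas DataFrames, np_sample (the nonprobability sample) and ref_sample (a reference probability sample with design weights in a column named "d"); these names, the logistic propensity model, and the inverse-propensity weighting variant are illustrative assumptions, not the exact specification used in the paper.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def psa_weights(np_sample, ref_sample, covariates, design_weight_col=None):
    # Stack the covariates of both samples; z = 1 marks nonprobability units.
    X = pd.concat([np_sample[covariates], ref_sample[covariates]], ignore_index=True)
    z = np.concatenate([np.ones(len(np_sample)), np.zeros(len(ref_sample))])
    # Optionally weight reference units by their design weights when fitting
    # the propensity model (one common variant of PSA).
    fit_w = None
    if design_weight_col is not None:
        fit_w = np.concatenate([np.ones(len(np_sample)),
                                ref_sample[design_weight_col].to_numpy()])
    model = LogisticRegression(max_iter=1000)
    model.fit(X, z, sample_weight=fit_w)
    # Estimated propensity of belonging to the nonprobability sample.
    p = model.predict_proba(np_sample[covariates])[:, 1]
    # Inverse-propensity pseudo-weights for the nonprobability units.
    return 1.0 / p

# Hypothetical usage: a pseudo-weighted estimate of the population mean of y.
# w = psa_weights(np_sample, ref_sample, ["x1", "x2", "x3"], design_weight_col="d")
# y_hat = np.average(np_sample["y"].to_numpy(), weights=w)

The same skeleton extends to the other reference-sample methods mentioned above: the fitted propensities can feed kernel weighting or statistical matching, and combining them with an outcome model (for example, XGBoost in place of the logistic regression) yields a doubly robust estimator.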