Evaluation of available techniques and its combinations to address selection bias in nonprobability surveys
Identifiers
URI: https://hdl.handle.net/10481/104749
Author
Rueda-Sánchez, Jorge Luis; Ferri García, Ramón; Rueda García, María del Mar; Cobo Rodríguez, Beatriz
Publisher
Springer Nature
Subject
Inference; Nonprobability samples; Survey sampling; Selection bias; Variable selection; Weighted models
Date
2025
Sponsor
This research was partially supported by a grant from the Ministry of Science and Innovation (PID2019-106861RB-I00, PDC2022-133293-I00, Spain), by the Strategic Action in Health (DTS23/00032, Spain), by IMAG-María de Maeztu CEX2020-001105-M/AEI/10.13039/501100011033, and by the Own Research and Transfer Plan of the University of Granada (PPJIA2023-030, Spain).
Abstract
New survey methodologies that often produce nonprobability samples have recently become very important. However, estimates from nonprobability samples can be subject to selection bias, caused primarily by a lack of coverage and by respondents' ability to decide whether or not to participate in the survey. In such cases, inclusion probabilities can be zero or unknown, the estimators normally used in sample surveys cannot be applied, and methods to reduce this bias must be employed instead. There is a wide variety of techniques for doing so, depending on the auxiliary information available, but no study has determined which of them performs best. In this paper, we briefly describe most of these methods and conduct an extensive study to compare their performance. We study superpopulation models, which require knowledge of the auxiliary variables for all individuals in the population; linear calibration, which requires the population totals of the covariates; and several techniques that use a reference probability sample, such as propensity score adjustment, propensity-adjusted probability prediction, kernel weighting, statistical matching, and doubly robust estimators. In addition, we compare their performance when using linear regression or XGBoost as the predictive model, with and without the design weights in estimating inclusion probabilities, and with and without prior variable selection. The study was performed on five different datasets to determine which technique provides accurate and reliable estimates from nonprobability samples.
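To make the reference-sample approach concrete, the lines below give a minimal sketch of propensity score adjustment, one of the weighting techniques compared in the abstract. It assumes two pandas DataFrames, np_sample (the nonprobability sample) and ref_sample (a reference probability sample with design weights in a column named "d"); these names, the logistic propensity model, and the inverse-propensity weighting variant are illustrative assumptions, not the exact specification used in the paper.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def psa_weights(np_sample, ref_sample, covariates, design_weight_col=None):
    # Stack the covariates of both samples; z = 1 marks nonprobability units.
    X = pd.concat([np_sample[covariates], ref_sample[covariates]], ignore_index=True)
    z = np.concatenate([np.ones(len(np_sample)), np.zeros(len(ref_sample))])
    # Optionally weight reference units by their design weights when fitting
    # the propensity model (one common variant of PSA).
    fit_w = None
    if design_weight_col is not None:
        fit_w = np.concatenate([np.ones(len(np_sample)),
                                ref_sample[design_weight_col].to_numpy()])
    model = LogisticRegression(max_iter=1000)
    model.fit(X, z, sample_weight=fit_w)
    # Estimated propensity of belonging to the nonprobability sample.
    p = model.predict_proba(np_sample[covariates])[:, 1]
    # Inverse-propensity pseudo-weights for the nonprobability units.
    return 1.0 / p

# Hypothetical usage: a pseudo-weighted estimate of the population mean of y.
# w = psa_weights(np_sample, ref_sample, ["x1", "x2", "x3"], design_weight_col="d")
# y_hat = np.average(np_sample["y"].to_numpy(), weights=w)

The same skeleton extends to the other reference-sample methods mentioned above: the fitted propensities can feed kernel weighting or statistical matching, and combining them with an outcome model (for example, XGBoost in place of the logistic regression) yields a doubly robust estimator.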