Enhancing estimation methods for integrating probability and non-probability survey samples with machine-learning techniques. An application to a Survey on the impact of the COVID-19 pandemic in Spain
Metadata
Author
Rueda García, María del Mar; Pasadas del Amo, Sara; Cobo Rodríguez, Beatriz; Castro Martín, Luis; Ferri García, Ramón
Subject
COVID-19; machine-learning techniques; nonprobability surveys; propensity score adjustment; survey sampling
Date
2023
Bibliographic reference
Rueda, María del Mar (AC); Pasadas del Amo, Sara; Cobo, Beatriz; Castro, Luis; Ferri, Ramón. (2023). Enhancing estimation methods for integrating probability and non-probability survey samples with machine-learning techniques. An application to a Survey on the impact of the COVID-19 pandemic in Spain. Biometrical Journal, 65(2).
Sponsor
The authors would like to thank the Institute for Advanced Social Studies at the Spanish National Research Council (IESA-CSIC) for providing data and information about the Survey on the impact of the COVID-19 pandemic in Spain (ESPACOV). This study was partially supported by Ministerio de Educación y Ciencia (PID2019-106861RB-I00, Spain), IMAG-Maria de Maeztu CEX2020-001105-M/AEI/10.13039/501100011033, and FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades (FQM170-UGR20, A-SEJ-154-UGR20) and by Universidad de Granada / CBUA for open access charges.
Abstract
Web surveys have replaced face-to-face and computer-assisted telephone interviewing
(CATI) as the main mode of data collection in most countries. This trend
was reinforced as a consequence of COVID-19 pandemic-related restrictions.
However, this mode still faces significant limitations in obtaining probability-based
samples of the general population. For this reason, most web surveys rely
on nonprobability survey designs. Whereas probability-based designs continue
to be the gold standard in survey sampling, nonprobability web surveys may
still prove useful in some situations. For instance, when small subpopulations
are the group under study and probability sampling is unlikely to meet sample
size requirements, complementing a small probability sample with a larger
nonprobability one may improve the efficiency of the estimates. Nonprobability
samples may also be designed as a means of compensating for known biases in
probability-based web survey samples by purposely targeting respondent profiles
that tend to be underrepresented in these surveys. This is the case in the Survey
on the impact of the COVID-19 pandemic in Spain (ESPACOV) that motivates
this paper. In this paper, we propose a methodology for combining probability
and nonprobability web-based survey samples with the help of machine-learning
techniques. We then assess the efficiency of the resulting estimates by comparing
them with other strategies that have been used before. Our simulation study
and the application of the proposed estimation method to the second wave of the
ESPACOV Survey allow us to conclude that this is the best option for reducing
the biases observed in our data.
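
The subject terms above mention propensity score adjustment, a common building block when a nonprobability sample is corrected with the help of a probability (reference) sample. The following is a minimal sketch of that general idea only, not the estimator proposed in the paper. It assumes a single categorical covariate, so the propensity of belonging to the nonprobability sample can be estimated by simple cell proportions instead of a machine-learning model, and it uses the (1 - p) / p weighting, which is one of several variants in use. All names and data values (psa_weights, weighted_mean, the toy samples) are hypothetical.

```python
from collections import defaultdict


def psa_weights(ref_x, ref_d, np_x):
    """Return one adjustment weight per nonprobability unit.

    ref_x: covariate values in the probability (reference) sample
    ref_d: design weights of the reference sample
    np_x:  covariate values in the nonprobability sample
    """
    pop_size = defaultdict(float)  # estimated population size per cell
    np_count = defaultdict(int)    # nonprobability sample count per cell
    for x, d in zip(ref_x, ref_d):
        pop_size[x] += d
    for x in np_x:
        np_count[x] += 1

    weights = []
    for x in np_x:
        # Estimated propensity of being in the nonprobability sample,
        # from the pooled data with the reference sample design-weighted.
        p = np_count[x] / (np_count[x] + pop_size[x])
        weights.append((1 - p) / p)
    return weights


def weighted_mean(y, w):
    """Hajek-type weighted mean."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Toy data (assumed): the reference sample says both groups are equally
# large, but the nonprobability sample over-represents the "young" group.
ref_x = ["young", "young", "old", "old"]
ref_d = [250.0, 250.0, 250.0, 250.0]
np_x = ["young"] * 8 + ["old"] * 2
np_y = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]

naive = sum(np_y) / len(np_y)
adjusted = weighted_mean(np_y, psa_weights(ref_x, ref_d, np_x))
print(round(naive, 3), round(adjusted, 3))
```

In this toy example the unadjusted mean is 0.7, while the adjusted mean is 0.625, because the under-represented group receives larger weights. With several covariates, the cell proportions would be replaced by a fitted model (logistic regression or a machine-learning classifier) that predicts membership of the nonprobability sample.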