Evaluating Machine Learning methods for estimation in online surveys with superpopulation modeling
Superpopulation modeling; Machine Learning; Online surveys; Simulation
Ramón Ferri-García, Luis Castro-Martín, María del Mar Rueda, Evaluating Machine Learning methods for estimation in online surveys with superpopulation modeling, Mathematics and Computers in Simulation, Volume 186, 2021, Pages 19-28, ISSN 0378-4754, https://doi.org/10.1016/j.matcom.2020.03.005.
Sponsorship: Ministerio de Economía y Competitividad, Spain; Ministerio de Ciencia, Innovación y Universidades, Spain
Online surveys, despite their advantages in cost and effort, are particularly prone to selection bias because of differences between the target population and the population that can actually be reached online. As a result, estimates from online samples are unreliable unless further adjustments are applied. Several techniques have been developed in recent years to address this issue; among them, superpopulation modeling can be useful in Big Data contexts where censuses are accessible. This technique uses the sample to train a model that captures the behavior of the target variable to be estimated, and applies it to the nonsampled individuals to obtain population-level estimates. The modeling step has usually been done with linear regression or LASSO models, but machine learning (ML) algorithms have been pointed out as promising alternatives. In this study we examine the use of these algorithms in the online survey context in order to evaluate and compare their performance and suitability for the problem. A simulation study shows that ML algorithms can effectively reduce volunteering bias to a greater extent than traditional methods in several scenarios.
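The train-on-sample, predict-for-nonsampled idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the synthetic population, the selection mechanism, the function name `model_based_mean`, and the use of ordinary least squares as a stand-in for whichever regression or ML model is being evaluated.

```python
import numpy as np

def model_based_mean(X_sample, y_sample, X_nonsample):
    """Estimate a population mean from observed y (sampled units)
    plus model predictions (nonsampled units).
    Hypothetical helper; OLS stands in for any predictive model."""
    # Fit ordinary least squares with an intercept on the sample only.
    A = np.column_stack([np.ones(len(X_sample)), X_sample])
    beta, *_ = np.linalg.lstsq(A, y_sample, rcond=None)
    # Apply the fitted model to the nonsampled individuals.
    B = np.column_stack([np.ones(len(X_nonsample)), X_nonsample])
    y_pred = B @ beta
    # Combine observed and predicted values into a population-level estimate.
    total = y_sample.sum() + y_pred.sum()
    return total / (len(y_sample) + len(y_pred))

# Synthetic population (assumed): one covariate known for everyone, as in a census.
rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(size=N)
y = 2.0 + 3.0 * x + rng.normal(size=N)

# Assumed volunteering mechanism: units with larger x are more likely to respond.
p_respond = 1.0 / (1.0 + np.exp(-(x - 2.0)))
in_sample = rng.random(N) < p_respond

true_mean = y.mean()
naive = y[in_sample].mean()  # unadjusted mean of the volunteer sample
estimate = model_based_mean(x[in_sample, None], y[in_sample], x[~in_sample, None])

print(f"true mean:      {true_mean:.3f}")
print(f"naive estimate: {naive:.3f}")
print(f"model-based:    {estimate:.3f}")
```

In this toy setting the selection depends only on the covariate, so a correctly specified model fitted on the volunteers can recover the population mean. When the relationship between covariates and the target variable is nonlinear, a more flexible model would replace the OLS step, which is the kind of comparison the study addresses.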