<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel rdf:about="https://hdl.handle.net/10481/68300">
<title>Grupo: Diseño y análisis estadístico de encuestas por muestreo (FQM365)</title>
<link>https://hdl.handle.net/10481/68300</link>
<description/>
<items>
<rdf:Seq>
<rdf:li rdf:resource="https://hdl.handle.net/10481/104749"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/104748"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/104747"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/104746"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/104745"/>
</rdf:Seq>
</items>
<dc:date>2026-04-19T16:43:00Z</dc:date>
</channel>
<item rdf:about="https://hdl.handle.net/10481/104749">
<title>Evaluation of available techniques and its combinations to address selection bias in nonprobability surveys</title>
<link>https://hdl.handle.net/10481/104749</link>
<description>Evaluation of available techniques and its combinations to address selection bias in nonprobability surveys
Rueda-Sánchez, Jorge Luis; Ferri García, Ramón; Rueda García, María del Mar; Cobo Rodríguez, Beatriz
New survey methodologies that often produce nonprobability samples have recently become very important. However, estimates from nonprobability samples can be subject to selection bias, caused primarily by lack of coverage and by respondents' ability to decide whether or not to participate in the survey. In such cases, inclusion probabilities can be zero or unknown. When this happens, the estimators normally used in sample surveys are useless, and methods must be employed to reduce this bias. A wide variety of techniques exist for this purpose, depending on the auxiliary information available, but no study has determined which performs best overall. In this paper, we briefly explain most of these methods and conduct an extensive study comparing their performance. We study superpopulation models, which require knowledge of the auxiliary variables for all individuals in the population; linear calibration, which requires the population totals of the covariates; and several techniques that use a reference probability sample, such as propensity score adjustment, propensity-adjusted probability prediction, kernel weighting, statistical matching and doubly robust estimators. In addition, we compare their performance when using linear regression or XGBoost as the predictive model, with or without design weights in estimating the inclusion probabilities, and with or without prior variable selection. The study was performed on five different datasets to determine which technique provides accurate and reliable estimates from nonprobability samples.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/104748">
<title>A new technique for handling non-probability samples based on model-assisted kernel weighting</title>
<link>https://hdl.handle.net/10481/104748</link>
<description>A new technique for handling non-probability samples based on model-assisted kernel weighting
Cobo Rodríguez, Beatriz; Rueda-Sánchez, Jorge Luis; Ferri García, Ramón; Rueda García, María del Mar
Non-probability samples are increasingly used for their low research costs and the speed with which results are obtained, but such surveys are expected to suffer from strong selection bias, caused by several mechanisms, that can ultimately lead to unreliable estimates of the population parameters of interest. The classical methods of statistical inference therefore do not apply, because the probabilities of inclusion in the sample are unknown for individual members of the population. Accordingly, in the last few decades, new possibilities for inference from non-probability sources have appeared.
Statistical theory offers different methods for addressing selection bias based on the availability of auxiliary information about other variables related to the main variable, which must have been measured in the non-probability sample. Two important approaches are inverse probability weighting and mass imputation; other methods can be regarded as combinations of these two.
This study proposes a new estimation technique for non-probability samples, which we call model-assisted kernel weighting and which is combined with machine learning techniques. The proposed technique is evaluated in a simulation study using data from a population, drawing samples under designs of varying complexity, to study the relative bias and mean squared error of the estimator under certain conditions. The results show that the proposed estimator has the smallest relative bias and mean squared error across the sample sizes considered, and that, in general, the kernel weighting methods reduced bias more than those based on inverse probability weighting. We also studied the behaviour of the estimators under different techniques, such as generalized linear regression versus machine learning algorithms, but were unable to find a method that is best in all cases. Finally, we studied the influence of the density function used, triangular or standard normal, and conclude that both work similarly.
A case study involving a non-probability sample collected during the COVID-19 lockdown was conducted to verify the real-world performance of the proposed methodology, obtain a better estimate, and control the value of the variance.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/104747">
<title>Estimating response propensities in nonprobability surveys using machine learning weighted models.</title>
<link>https://hdl.handle.net/10481/104747</link>
<description>Estimating response propensities in nonprobability surveys using machine learning weighted models.
Ferri García, Ramón; Rueda-Sánchez, Jorge Luis; Rueda García, María del Mar; Cobo Rodríguez, Beatriz
Propensity Score Adjustment (PSA) is a widely accepted method for reducing selection bias in nonprobability samples. In this approach, the (unknown) response probability of each individual in a nonprobability sample is estimated using a reference probability sample. Thus, the researcher obtains a representation of the target population reflecting the differences (for a set of auxiliary variables) between the population and the nonprobability sample, from which response probabilities can be estimated.
Auxiliary probability samples are usually produced by surveys with complex sampling designs, meaning that the use of design weights is crucial for accurately calculating response probabilities. When a linear model is used for this task, maximising a pseudo log-likelihood function that involves the design weights provides consistent estimates for the inverse probability weighting estimator. However, little is known about how design weights may benefit the estimates when techniques such as machine learning classifiers are used.
This study investigates the behaviour of Propensity Score Adjustment with machine learning classifiers, subject to the use of weights in the modelling step. A theoretical approximation to the problem is presented, together with a simulation study highlighting the properties of estimators that use different types of weights in the propensity modelling step.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/104746">
<title>Kernel Weighting for blending probability and non-probability survey samples</title>
<link>https://hdl.handle.net/10481/104746</link>
<description>Kernel Weighting for blending probability and non-probability survey samples
Rueda García, María Del Mar; Cobo Rodríguez, Beatriz; Rueda-Sánchez, Jorge Luis; Ferri García, Ramón; Castro Martín, Luis
In this paper we review methods proposed in the literature for combining a nonprobability and a probability sample, with the aim of obtaining an estimator with smaller bias and standard error than the estimators obtainable from the probability sample alone. We propose a new methodology based on the kernel weighting method, and discuss the properties of the new estimator both when there is only selection bias and when coverage and selection biases are both present. We perform an extensive simulation study to better understand the behaviour of the proposed estimator.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/104745">
<title>Estimation of the distribution function and quantiles through data integration</title>
<link>https://hdl.handle.net/10481/104745</link>
<description>Estimation of the distribution function and quantiles through data integration
Cobo Rodríguez, Beatriz; Martínez, Sergio; Rueda García, María del Mar
Non-probability sampling is a relatively inexpensive data source, although it requires special treatment because the estimates may suffer from sample selection bias. In this paper, we consider methods for integrating a non-representative volunteer sample into a probability survey. We investigate several approaches to correcting non-probability sample selection bias in the estimation of the distribution function, combining the estimators of the distribution function that correct the selection bias with the design-unbiased estimators based on the probability sample. Our methodology for combining the volunteer and probability samples can be applied to other non-linear parameters. Empirical evidence of the improvements offered by the proposed methodology is provided in simulation settings.
</description>
</item>
</rdf:RDF>
