Handling uncertainty in citizen science data: Towards an improved amateur-based large-scale classification
Jiménez, Manuel; Triguero, Isaac; John, Robert
Citizen science; Classification; Astroinformatics; Galaxy morphologies; Uncertainty; Data analysis
The work of M. Jiménez was funded by a Ph.D. scholarship from the School of Computer Science of the University of Nottingham.
Citizen Science, traditionally known as the engagement of amateur participants in research, is showing great potential for large-scale processing of data. In areas such as astronomy, biology, or geo-sciences, where emerging technologies generate huge volumes of data, Citizen Science projects enable image classification at a rate that experts alone could not achieve. However, this approach spreads biases and uncertainty through the results, since the participants involved are typically non-experts in the problem and have varying levels of skill. Consequently, the research community tends not to trust Citizen Science outcomes, claiming a generalised lack of accuracy and validation. We introduce a novel multi-stage approach to handle uncertainty within data labelled by amateurs in Citizen Science projects. Firstly, our method proposes a set of transformations that leverage the uncertainty in amateur classifications. Then, a hybridisation strategy provides the best aggregation of the transformed data for improving the quality of, and confidence in, the results. As a case study, we consider the Galaxy Zoo, a project pursuing the labelling of galaxy images. A limited set of expert classifications allows us to validate the experiments, confirming that our approach greatly boosts accuracy and classifies more images than the state of the art.
2026-01-13T07:41:12Z
2019-04
journal article
Published version: Jiménez, Manuel et al. Handling uncertainty in citizen science data: Towards an improved amateur-based large-scale classification.
Information Sciences, Volume 479, April 2019, Pages 301-320. https://doi.org/10.1016/j.ins.2018.12.011
https://hdl.handle.net/10481/109576
10.1016/j.ins.2018.12.011
eng
http://creativecommons.org/licenses/by/4.0/
open access
Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License
Attribution 4.0 International
Elsevier