A fusocelular skin dataset with whole slide images for deep learning models
del Amor, Rocío; López-Pérez, Miguel; Meseguer, Pablo; Morales, Sandra; Terradez, Liria; Aneiros Fernández, José; Mateos Delgado, Javier; Molina Soriano, Rafael; Naranjo, Valery
This work has received funding from the Spanish Ministry of Economy and Competitiveness through projects PID2019-105142RB-C21 and PID2019-105142RB-C22 (AI4SkIN) and PID2022-140189OB-C21 and PID2022-140189OB-C22 (ASSIST), and from the Generalitat Valenciana through project COM-TACTS2 (CIPROM/2022/20). The work of Rocío del Amor was supported by the Spanish Ministry of Universities (FPU20/05263). The work of Pablo Meseguer was funded by valgrAI - Valencian Graduate School and Research Network of Artificial Intelligence. The work of Sandra Morales was co-funded by the Universitat Politècnica de València through the program PAID-10-20. The work of M. López-Pérez was supported by the grant JDC2022-048318-I funded by MICIU/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR.
Cutaneous spindle cell (CSC) lesions encompass a spectrum from benign to malignant neoplasms, often posing significant diagnostic challenges. Computer-aided diagnosis systems offer a promising solution to make pathologists’ decisions more objective and faster. These systems usually require large-scale datasets with curated labels for effective training; however, manual annotation is time-consuming and expensive. To overcome this challenge, crowdsourcing has emerged as a popular and valuable strategy to scale up the labeling process by distributing the effort among different non-expert annotators. This work introduces AI4SkIN, the first public dataset of Whole Slide Images (WSIs) for CSC neoplasms, annotated using an innovative crowdsourcing protocol. The AI4SkIN dataset contains 641 Hematoxylin and Eosin-stained WSIs with multiclass labels from both expert and trainee pathologists.
The dataset improves CSC neoplasm diagnosis using advanced machine learning and crowdsourcing based on Gaussian Processes, showing that models trained on non-expert labels perform comparably to those using expert labels. In conclusion, we illustrate that AI4SkIN provides a good resource for developing and validating methods for multiclass CSC neoplasm classification.
2025-06-30T11:52:53Z 2025-06-30T11:52:53Z 2025-05-14
journal article
del Amor, R., López-Pérez, M., Meseguer, P. et al. A fusocelular skin dataset with whole slide images for deep learning models. Sci Data 12, 788 (2025). [DOI: 10.1038/s41597-025-05108-3]
https://hdl.handle.net/10481/104987
10.1038/s41597-025-05108-3
eng
http://creativecommons.org/licenses/by-nc-nd/4.0/
open access
Attribution-NonCommercial-NoDerivatives 4.0 International
Springer Nature