The CrowdGleason dataset: Learning the Gleason grade from crowds and experts
Author
López Pérez, Miguel; Morquecho, Alba; Schmidt, Arne; Pérez Bueno, Fernando; Martín Castro, Aurelio; Mateos Delgado, Javier; Molina Soriano, Rafael
Publisher
Elsevier
Subject
Computational pathology; Crowdsourcing; Prostate cancer
Date
2024-11-01
Bibliographic reference
López Pérez, M. et al. Computer Methods and Programs in Biomedicine 257 (2024) 108472. [https://doi.org/10.1016/j.cmpb.2024.108472]
Sponsorship
FEDER/Junta de Andalucía under project P20_00286; grant PID2022-140189OB-C22 funded by MICIU/AEI/10.13039/501100011033 and by "ERDF/EU"; grants JDC2022-048318-I and JDC2022-048784-I funded by MICIU/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU"/PRTR
Abstract
Background: Currently, prostate cancer (PCa) diagnosis relies on the human analysis of prostate biopsy
Whole Slide Images (WSIs) using the Gleason score. Since this process is error-prone and time-consuming,
recent advances in machine learning have promoted the use of automated systems to assist pathologists.
Unfortunately, labeled datasets for training and validation are scarce due to the need for expert pathologists
to provide ground-truth labels.
Methods: This work introduces a new prostate histopathological dataset named CrowdGleason, which consists
of 19,077 patches from 1045 WSIs with various Gleason grades. The dataset was annotated using a
crowdsourcing protocol involving seven pathologists-in-training to distribute the labeling effort. To provide a
baseline analysis, two crowdsourcing methods based on Gaussian Processes (GPs) were evaluated for Gleason
grade prediction: SVGPCR, which learns a model from the CrowdGleason dataset, and SVGPMIX, which
combines data from the public dataset SICAPv2 and the CrowdGleason dataset. The performance of these
methods was compared with other crowdsourcing and expert label-based methods through comprehensive
experiments.
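As a point of reference for how crowdsourced annotations can be handled, the sketch below illustrates the simplest aggregation baseline mentioned in the results (per-patch majority voting over the annotators). It is a minimal illustration, not the authors' implementation; the array names, shapes, and class encoding are assumptions.

```python
# Minimal sketch (not the authors' code): aggregating crowdsourced Gleason-grade
# labels by per-patch majority voting. Classes are encoded 0..3 (e.g., NC, GG3,
# GG4, GG5); -1 marks a patch an annotator did not label. Illustrative only.
import numpy as np

def majority_vote(crowd_labels: np.ndarray) -> np.ndarray:
    """crowd_labels: (n_patches, n_annotators) int array, -1 = not annotated."""
    aggregated = np.empty(crowd_labels.shape[0], dtype=int)
    for i, row in enumerate(crowd_labels):
        votes = row[row >= 0]                   # keep only available annotations
        counts = np.bincount(votes)             # votes per class index
        aggregated[i] = int(np.argmax(counts))  # ties resolved by lowest class index
    return aggregated

# Toy example: 3 patches annotated by up to 4 pathologists-in-training.
labels = np.array([[0,  0, 1, -1],
                   [2,  2, 3,  2],
                   [1, -1, 1,  1]])
print(majority_vote(labels))   # -> [0 2 1]
```

Crowdsourcing models such as SVGPCR instead keep the individual annotations and model each annotator's reliability, rather than collapsing them into a single label as above.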
Results: The results demonstrate that our GP-based crowdsourcing approach outperforms other methods for
aggregating crowdsourced labels (κ = 0.7048 ± 0.0207 for SVGPCR vs. κ = 0.6576 ± 0.0086 for SVGP with
majority voting). SVGPCR trained with crowdsourced labels performs better than a GP trained with expert
labels from SICAPv2 (κ = 0.6583 ± 0.0220) and outperforms most individual pathologists-in-training (mean
κ = 0.5432). Additionally, SVGPMIX, trained with a combination of SICAPv2 and CrowdGleason, achieves the
highest performance on both datasets (κ = 0.7814 ± 0.0083 and κ = 0.7276 ± 0.0260, respectively).
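The agreement values above can be reproduced with a standard metric implementation; the sketch below assumes κ denotes the quadratically weighted Cohen's kappa commonly used for Gleason grading (an assumption here, since the abstract does not spell out the weighting), and the label arrays are purely illustrative.

```python
# Sketch of the agreement metric, assuming quadratically weighted Cohen's kappa
# (an assumption; see the paper for the exact definition used).
from sklearn.metrics import cohen_kappa_score

y_true = [0, 1, 2, 3, 2, 1, 0, 3]   # reference Gleason-grade classes (illustrative)
y_pred = [0, 1, 2, 2, 2, 1, 1, 3]   # model predictions (illustrative)

kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"quadratic Cohen's kappa = {kappa:.4f}")
```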
Conclusion: The experiments show that the CrowdGleason dataset can be successfully used for training and
validating supervised and crowdsourcing methods. Furthermore, the crowdsourcing methods trained on this
dataset obtain competitive results against those using expert labels. Interestingly, combining expert and
non-expert labels opens the door to future large-scale labeling efforts that involve both expert pathologists
and pathologists-in-training as annotators.