Unsupervised Learning Applied to the Stratification of Preterm Birth Risk in Brazil with Socioeconomic Data
Preterm birth; Clustering; Unsupervised learning; PTB risk; Brazil
Lopes, M.L.B., Jr.; Barbosa, R.d.M.; Fernandes, M.A.C. Unsupervised Learning Applied to the Stratification of Preterm Birth Risk in Brazil with Socioeconomic Data. Int. J. Environ. Res. Public Health 2022, 19, 5596. [https://doi.org/10.3390/ijerph19095596]
Sponsorship: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Preterm birth (PTB) poses risks and challenges for the survival of the newborn child. Despite many advances in research, not all causes of PTB are yet clear. PTB risk is understood to be multifactorial and can also be associated with socioeconomic factors. This article therefore uses unsupervised learning techniques to stratify PTB risk in Brazil using only socioeconomic data. From datasets made publicly available by the Federal Government of Brazil, a new dataset was generated with municipality-level socioeconomic data and a PTB occurrence rate. This dataset was processed using several unsupervised learning techniques, including k-means, principal component analysis (PCA), and density-based spatial clustering of applications with noise (DBSCAN). After validation, four clusters with high levels of PTB occurrence were identified, as well as three with low levels. The clusters with high PTB consisted mostly of municipalities with lower levels of education, worse quality of public services (such as basic sanitation and garbage collection), and a smaller proportion of white residents. The regional distribution of the clusters was also examined, with high-PTB clusters located mostly in the North and Northeast regions of Brazil. The results indicate a positive influence of quality of life and the provision of public services on reducing PTB risk.
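To make the clustering step concrete, the following is a minimal sketch of k-means applied to standardized municipality-level features. It is not the authors' pipeline: it covers only k-means (not PCA or DBSCAN), the three feature columns and all numeric values are invented purely for illustration, and the choice to keep the PTB rate out of the clustering and use it only to describe the resulting clusters is an assumption of this sketch. The function names `standardize`, `sq_dist`, and `kmeans` are hypothetical helpers written with the Python standard library only.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Scale each column to zero mean and unit variance (z-scores)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids

# INVENTED toy data, not taken from the paper:
# (illiteracy %, sanitation coverage %, garbage collection %), PTB rate %
municipalities = [
    ((22.0, 35.0, 50.0), 12.5),
    ((25.0, 30.0, 45.0), 13.0),
    ((20.0, 40.0, 55.0), 12.0),
    ((5.0, 90.0, 98.0), 9.0),
    ((4.0, 95.0, 99.0), 8.5),
    ((6.0, 88.0, 97.0), 9.5),
]

features = standardize([list(f) for f, _ in municipalities])
labels, centroids = kmeans(features, k=2)

# The PTB rate is not a clustering input here; it only describes each cluster.
ptb_by_cluster = {
    j: mean(rate for (_, rate), lab in zip(municipalities, labels) if lab == j)
    for j in set(labels)
}
print(labels, ptb_by_cluster)
```

In a sketch like this, standardizing first keeps any single indicator from dominating the distance computation, and comparing the mean PTB rate across clusters afterwards is one simple way to label clusters as higher or lower risk.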