Please use this identifier to cite or link to this item: http://repositorio.ufla.br/jspui/handle/1/59902
Title: Avaliação de modelos de aprendizado de máquina para predição do diabetes mellitus
Other Titles: Evaluation of machine learning models for predicting diabetes mellitus
Authors: Guimaraes, Paulo Henrique Sales
Pereira, Geraldo Magela da Cruz
Oliveira, Anderson Castro Soares de
Paixão, Crysttian Arantes
Keywords: Vigitel
Aprendizado de Máquina
Machine learning
Predição de Diabetes
Prediction of Diabetes
Issue Date: 10-Apr-2025
Publisher: Universidade Federal de Lavras
Citation: MACÁRIO, Noé Osório. Avaliação de modelos de aprendizado de máquina para predição do diabetes mellitus. 2025. 92 p. Dissertação (Mestrado em Estatística e Experimentação Agropecuária) - Universidade Federal de Lavras, Lavras, 2025.
Abstract: This work evaluates the performance of different machine learning (ML) models in predicting diabetes, a chronic condition of great relevance to public health. Using VIGITEL (2023) data, which include more than 21 thousand observations, a full preprocessing pipeline was carried out, involving variable selection, class balancing, treatment of missing values, and data standardization. The models analyzed were Decision Trees, Random Forests, Naive Bayes, Artificial Neural Networks, and XGBoost. Model performance was evaluated using metrics such as sensitivity and area under the ROC curve, which are fundamental for identifying positive cases and discriminating efficiently between groups. The XGBoost model stood out as the most efficient, presenting the best sensitivity, specificity, and area under the ROC curve in almost all approaches (all variables, MIC - Maximal Information Coefficient, and PCA - Principal Component Analysis), for both balanced and unbalanced data, which shows its superior predictive capacity. In contrast, the Decision Tree model had the worst performance, highlighting its limitations when applied to unbalanced data. The results reinforce the potential of machine learning for the early detection of chronic diseases such as diabetes, underlining its relevance for improving medical diagnosis, optimizing costs, and providing crucial support for more efficient clinical interventions.
URI: http://repositorio.ufla.br/jspui/handle/1/59902
Appears in Collections:Estatística e Experimentação Agropecuária - Mestrado (Dissertações)



This item is licensed under a Creative Commons License