New Advanced Optimization Models for Diagnoses of Diseases with Imbalanced Datasets


Mustafa Fayez, Dr. Sefer Kurnaz


Effective diagnosis from imbalanced medical data requires modern approaches to processing and interpreting that data. The advanced optimization models developed in this study aim to provide healthcare specialists with real-time feedback, and models of this kind have been evolving as an alternative way to increase the quality of diagnostic research. How well these systems generalize depends on the characteristics of the dataset used to develop them. Cardiovascular disease (CVD) has a very complex incidence ratio in the population, which renders the dataset somewhat imbalanced. We address this problem in two stages, at the data level and at the algorithmic level. First, at the data level, we apply pre-processing, feature engineering, and high-performance re-sampling with the Neighbourhood Cleaning Rule (NCL) combined with an optimized synthetic minority over-sampling technique (SMOTE). Second, at the algorithmic level, we implement our new advanced optimization models: AutoML, advanced XGBoost, an optimized random forest, and advanced stacking ensembles. The best accuracy was 92% with the advanced stacking model, followed by 91% with the optimized random forest and 87% with AutoML. Finally, we expect that these contributions will strengthen doctors' understanding of machine learning through mature, optimized machine learning technology and encourage broad clinical use of artificial intelligence (AI) techniques.
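
To make the data-level stage concrete, the following is a minimal sketch of plain SMOTE in standard-library Python. It is not the optimized variant used in the study, whose details are not given here. The function names `smote` and `balance`, the neighbour count `k=5`, and the choice to oversample until both classes are the same size are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolation.

    minority: list of feature vectors (lists of floats), all from the
    minority class. Hypothetical helper, not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position on the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until it matches the majority size.

    Assumes a binary problem; returns new lists and leaves X and y intact.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, so the minority class grows without duplicating rows exactly. In practice the imbalanced-learn library provides `SMOTE` and `NeighbourhoodCleaningRule` classes, and re-sampling is normally applied to the training split only so that the evaluation data keep their original class ratio.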