Implementation of the SMOTE Algorithm as a Solution for Imbalanced High-Dimensional Datasets

Rinci Kembang Hapsari, Tutuk Indriyani

Abstract


In real-world settings, particularly in the medical field, multiclass classification problems often come with imbalanced input data, that is, an imbalanced dataset. The majority class is the class with more records, while the minority class has only a few. An imbalanced dataset strongly affects classification accuracy: the performance of a classification algorithm degrades when it is given imbalanced input data. The input data therefore needs to be balanced to maintain the performance of the classification algorithm. This study applies the SMOTE algorithm to address the unequal class distribution in imbalanced datasets. Three datasets are used: Dataset 1 with 68 records, Dataset 2 with 180 records, and Dataset 3 with 371 records. After the SMOTE algorithm was applied, all three datasets became balanced.
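The balancing step described in the abstract can be illustrated with a minimal sketch of SMOTE-style oversampling. This is not the authors' implementation, which the abstract does not show. The function name `smote_oversample`, the parameters `k` and `seed`, and the toy data are all hypothetical, and the sketch assumes numeric features, Euclidean distance, and that every class is oversampled up to the size of the largest class.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=5, seed=0):
    """Add synthetic samples until every class matches the largest class.

    Hypothetical helper for illustration only (SMOTE-style interpolation).
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class
    X_out, y_out = [list(row) for row in X], list(y)

    for label, n in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if n == target or len(members) < 2:
            continue  # already balanced, or no neighbor to interpolate with
        for _ in range(target - n):
            base = rng.choice(members)
            # k nearest same-class neighbors of the chosen sample (itself excluded)
            neighbors = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k]
            other = rng.choice(neighbors)
            gap = rng.random()  # interpolation factor in [0, 1)
            # New point lies on the segment between the sample and its neighbor
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy three-class data (made up for this example, not from the paper)
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.0, 1.2],
     [5.0, 5.0], [5.5, 5.2],
     [9.0, 1.0], [9.2, 1.1], [8.8, 0.9]]
y = ["a"] * 6 + ["b"] * 2 + ["c"] * 3

X_bal, y_bal = smote_oversample(X, y, k=3)
print(Counter(y_bal))
```

Each synthetic record is placed on the line segment between a minority-class record and one of its nearest same-class neighbors, so new records stay inside the region the minority class already occupies instead of being exact duplicates. In practice a maintained library implementation would usually be preferable to a hand-written one.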


Keywords


SMOTE Algorithm; High Dimensional Datasets; Imbalanced

DOI: https://doi.org/10.31284/p.snestik.2022.2868

Copyright (c) 2022 Rinci Kembang Hapsari, Tutuk Indriyani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.