Lexicon-Based, Naïve Bayes, C4.5 for Analyzing Visitor Data Reviews as Recommendations for Priority Development of Labuan Bajo Tourism

Arnoldus Janssen Dahur, Defitroh Chen Sami'un

Abstract


Labuan Bajo is one of Indonesia’s national priority destinations, known for its natural beauty and unique ecosystem. However, tourist complaints about prices, infrastructure, and environmental cleanliness may affect its image and the sustainability of its tourism. This study analyzes tourist perceptions of Labuan Bajo using 7,000 reviews from TripAdvisor and Google Maps obtained through web crawling. Sentiment labeling was conducted using a Lexicon-Based approach, and classification was performed using the Naïve Bayes and C4.5 algorithms, both with and without the Synthetic Minority Oversampling Technique (SMOTE). The results showed 4,374 positive, 2,769 negative, and 1,804 neutral reviews. In a workflow following the CRISP-DM method, Naïve Bayes achieved the higher accuracy, 88%, compared to 78% for C4.5. Dominant positive terms such as beautiful, stunning, and sustainable highlight Labuan Bajo’s natural strengths, while dominant negative terms such as price, toilet, and trash indicate areas requiring improvement. The findings provide strategic recommendations for improving tourism management and service quality in support of sustainable tourism development.
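
The two-stage idea described in the abstract (lexicon-based labeling followed by Naïve Bayes classification) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the toy word lists, the sample reviews, and the names `lexicon_label` and `NaiveBayes` are hypothetical and do not come from the study, whose actual lexicon, preprocessing, and tooling are not stated here. C4.5 and SMOTE are omitted to keep the sketch short.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (assumption): not the lexicon used in the study.
POSITIVE = {"beautiful", "stunning"}
NEGATIVE = {"dirty", "expensive"}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def lexicon_label(text):
    """Label a review by counting positive minus negative lexicon hits."""
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokenize(doc):
                if token in self.vocab:  # ignore words never seen in training
                    logp += math.log((self.word_counts[label][token] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Made-up sample reviews (assumption): not taken from the study's data set.
reviews = [
    "beautiful island and stunning sunset",
    "stunning view beautiful beach",
    "expensive boat and dirty toilet",
    "dirty harbor expensive food",
    "we arrived by boat in the morning",
    "the harbor is near the airport",
]

labels = [lexicon_label(r) for r in reviews]  # stage 1: lexicon labeling
model = NaiveBayes().fit(reviews, labels)     # stage 2: train the classifier

for text in ["beautiful beach", "dirty toilet"]:
    print(text, "->", model.predict(text))
```

In this sketch the lexicon supplies the training labels, so the classifier can only learn what the lexicon already encodes plus the co-occurring words around it. A real pipeline would also hold out test data to measure accuracy and, where classes are imbalanced, compare results with and without oversampling.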


Keywords


Labuan Bajo; Lexicon-Based; Naïve Bayes; C4.5; Priorities; SMOTE



DOI: https://doi.org/10.31284/j.iptek.2025.v29i2.8178
