In the evolving retail landscape, data-driven decision-making has become essential for understanding customer behavior and predicting sales trends. This study combines clustering and classification techniques to analyze a retail sales dataset of 1,000 transactions obtained from Kaggle. Using the K-Means algorithm, with the number of clusters chosen by the Elbow Method, three customer clusters were identified, with an average within-centroid distance of 25,272.635 and a Davies–Bouldin Index of 0.443, indicating clear cluster separation. The subsequent classification phase compared the predictive performance of three algorithms (Naïve Bayes, Decision Tree, and Random Forest) using a 70:30 training-to-testing split. Naïve Bayes attained 94.67% accuracy, while both Decision Tree and Random Forest achieved 100% accuracy. These findings highlight the robustness and adaptability of tree-based models for complex retail datasets: on this data they outperformed the probabilistic method in both accuracy and generalization. The results suggest that integrating clustering and classification gives retailers a powerful analytical framework for identifying high-value customer segments, optimizing marketing strategies, and enhancing inventory management. Despite these strong results, the study acknowledges the limitations of its dataset and recommends that future research use larger and more diverse datasets, as well as additional features, to improve model scalability and predictive precision.
Retail Analytics; K-Means; Naïve Bayes; Decision Tree; Random Forest
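As an aid to reading the Davies–Bouldin Index reported in the abstract, the sketch below computes the index in its standard form: for each cluster, take the worst-case ratio of combined within-cluster scatter to centroid distance against any other cluster, then average over clusters. Lower values mean tighter, better-separated clusters. This is an illustration only, not the study's implementation; the function names and the toy points are hypothetical, and the code assumes Euclidean distance and distinct cluster centroids.

```python
import math
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(points, labels):
    """Davies-Bouldin Index for a labelled clustering (lower is better).

    Assumes Euclidean distance and that no two centroids coincide.
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    keys = sorted(clusters)
    centers = {k: centroid(clusters[k]) for k in keys}
    # Scatter: mean distance from each member to its own centroid.
    scatter = {
        k: sum(math.dist(p, centers[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }

    total = 0.0
    for i in keys:
        # Worst-case similarity of cluster i to any other cluster j.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


# Hypothetical toy data: two clusters, each with scatter 1.0,
# whose centroids are 10 units apart.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
toy_score = davies_bouldin(toy_points, toy_labels)
```

For the toy data each cluster has scatter 1 and the centroids are 10 apart, so the index is (1 + 1) / 10 = 0.2. Moving the clusters further apart while keeping their spread lowers the value, which is why a small index is read as a sign of clear separation.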