Evaluation of data analytics based clustering algorithms for knowledge mining in a student engagement data
Abstract
Applying data analytics based algorithms to knowledge mining in a student dataset is an
important strategy for improving learning outcomes and student success, and for supporting strategic decision making in
higher education institutions. However, the widely used data analytics based clustering algorithms are highly data dependent,
making it pertinent to find the most effective algorithm for knowledge mining in a dataset associated with student
engagement. In this study, the performance of five well-known clustering algorithms is evaluated for this purpose. The k-means
algorithm was benchmarked with 22 distance functions based on the Silhouette index, Dunn's index and partition entropy
internal validity metrics. The hierarchical clustering algorithm was benchmarked with the Cophenetic correlation coefficient
computed for different combinations of distance and linkage functions. The Fuzzy c-means algorithm was benchmarked with the
partition entropy, partition coefficient, Silhouette index and modified partition coefficient. The k-nearest neighbor algorithm
was applied to determine the optimum epsilon value for the density-based spatial clustering of applications with noise
(DBSCAN) algorithm. The default parameter settings were accepted for the expectation-maximization algorithm. The overall
ranking of the clustering algorithms was based on cluster potentiality using the median deviation statistics. The results of the
evaluation show that the well-known k-means algorithm has the highest cluster potentiality, demonstrating its effectiveness
for the task of knowledge mining in a student engagement dataset.
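
The abstract states that a k-nearest neighbor step was used to choose the epsilon value for the density-based algorithm, but it does not describe the procedure. One common way to do this, shown here only as a minimal sketch and not as the authors' method, is to sort each point's distance to its k-th nearest neighbor and take the value just before the sharpest rise in that curve. The function names `k_distances` and `suggest_epsilon`, the Euclidean distance, the largest-jump heuristic and the toy data are all assumptions made for illustration.

```python
import math


def k_distances(points, k):
    """Return, in ascending order, each point's distance to its k-th nearest neighbour."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point (Euclidean distance is an assumption).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def suggest_epsilon(points, k):
    """Crude knee heuristic (an assumption): the value just before the largest jump."""
    curve = k_distances(points, k)
    jumps = [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]
    knee = max(range(len(jumps)), key=jumps.__getitem__)
    return curve[knee]


# Hypothetical toy data: two tight groups of four points and one isolated point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (50.0, 50.0),
]
eps = suggest_epsilon(points, k=2)
print(eps)
```

In practice the sorted curve is usually plotted and the knee is read off by eye. The largest-jump rule above is a simple stand-in that works on clean toy data but may be unreliable on noisy real datasets.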
Keywords
Q Science (General)