MODIFIED POSSIBILISTIC FUZZY C-MEANS ALGORITHM FOR CLUSTERING INCOMPLETE DATA SETS
Keywords: incomplete data, fuzzy clustering, possibilistic clustering, missing-value imputation.
The possibilistic fuzzy c-means (PFCM) algorithm was proposed to address two weaknesses of fuzzy c-means (FCM) and possibilistic c-means (PCM): sensitivity to noise and coincident clusters. However, the PFCM algorithm is applicable only to complete data sets. This research therefore modifies PFCM for clustering incomplete data sets, yielding two algorithms, OCSPFCM and NPSPFCM, whose performance is evaluated on three criteria: 1) accuracy percentage, 2) number of iterations, and 3) centroid error. The results show that NPSPFCM outperforms OCSPFCM on all experimental data sets for missing-value rates ranging from 5% to 30%. The average accuracies of OCSPFCM and NPSPFCM lie between 78.98% and 97.75% and between 92.49% and 98.86%, respectively.
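The abstract does not spell out how NPSPFCM fills in missing values. As a rough illustration only, assuming the "NPS" prefix refers to the nearest prototype strategy known from fuzzy c-means clustering of incomplete data, a single imputation step might look like the Python sketch below. The function names, the toy data, and the fixed centroids are hypothetical, and the PFCM membership, typicality, and centroid updates are omitted entirely.

```python
import math


def partial_distance_sq(x, v):
    """Squared partial distance between a row x and a centroid v.

    Only features observed in x (not NaN) contribute; the sum is rescaled
    by (total features / observed features).
    """
    observed = [(xi - vi) ** 2 for xi, vi in zip(x, v) if not math.isnan(xi)]
    if not observed:
        raise ValueError("row has no observed features")
    return len(x) / len(observed) * sum(observed)


def nearest_prototype_impute(X, V):
    """Replace each missing entry with the matching feature of the nearest centroid.

    Assumption: 'nearest' is measured with the partial distance above.
    """
    completed = []
    for x in X:
        nearest = min(V, key=lambda v: partial_distance_sq(x, v))
        completed.append(
            [vi if math.isnan(xi) else xi for xi, vi in zip(x, nearest)]
        )
    return completed


# Toy data (hypothetical): NaN marks a missing value.
nan = float("nan")
X = [[1.0, nan], [nan, 9.0], [0.5, 1.5]]
V = [[1.0, 2.0], [8.0, 9.0]]  # centroids from the current iteration

completed = nearest_prototype_impute(X, V)
print(completed)
```

In an iterative clustering loop, a step of this kind would be repeated after each centroid update, so the imputed values follow the centroids as they move.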
Copyright (c) 2021 Rustam, Koredianto Usman, Mudyawati Kamaruddin, Dina Chamidah, Nopendri, Khaerudin Saleh, Yulinda Eliskar, Ismail Marzuki
This work is licensed under a Creative Commons Attribution 4.0 International License.