MODIFIED POSSIBILISTIC FUZZY C-MEANS ALGORITHM FOR CLUSTERING INCOMPLETE DATA SETS
DOI:
https://doi.org/10.14311/AP.2021.61.0364
Keywords:
Incomplete data, fuzzy clustering, possibilistic clustering, missing values imputation.
Abstract
The possibilistic fuzzy c-means (PFCM) algorithm was proposed to address two weaknesses of its predecessors: the noise sensitivity of fuzzy c-means (FCM) and the coincident clusters produced by possibilistic c-means (PCM). However, the PFCM algorithm is applicable only to complete data sets. This research therefore modifies PFCM for clustering incomplete data sets, yielding two algorithms, OCSPFCM and NPSPFCM, whose performance is evaluated on three criteria: 1) accuracy percentage, 2) number of iterations, and 3) centroid errors. The results show that NPSPFCM outperforms OCSPFCM on all experimental data sets when 5% to 30% of the values are missing. The average accuracies of the two algorithms range from 97.75% down to 78.98% and from 98.86% down to 92.49%, respectively.
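The abstract does not expand the OCS and NPS prefixes. Assuming they stand for the optimal completion strategy and the nearest prototype strategy, the names commonly used for incomplete-data variants of FCM, the imputation step of each can be sketched as below. This is an illustration only, not the authors' implementation: the function names, the use of None to mark a missing value, and the weights passed to ocs_impute are all hypothetical, and the exact PFCM weighting used in the paper is not given here.

```python
def partial_distance(x, v):
    """Squared Euclidean distance over the observed features of x only,
    rescaled by (number of features / number of observed features)."""
    observed = [j for j, xj in enumerate(x) if xj is not None]
    if not observed:
        raise ValueError("point has no observed features")
    total = sum((x[j] - v[j]) ** 2 for j in observed)
    return total * len(x) / len(observed)


def nps_impute(x, centroids):
    """Assumed NPS step: copy each missing feature from the nearest
    centroid, where 'nearest' uses the partial distance above."""
    nearest = min(centroids, key=lambda v: partial_distance(x, v))
    return [nearest[j] if xj is None else xj for j, xj in enumerate(x)]


def ocs_impute(x, centroids, weights):
    """Assumed OCS step: replace each missing feature by a weighted mean
    of the centroids' values. 'weights' is a hypothetical per-cluster
    weight (for example, a power of the membership degree)."""
    total = sum(weights)
    return [
        sum(w * v[j] for w, v in zip(weights, centroids)) / total
        if xj is None else xj
        for j, xj in enumerate(x)
    ]


# Toy example: two centroids, one point with its second feature missing.
centroids = [[0.0, 0.0], [10.0, 10.0]]
point = [9.0, None]
print(nps_impute(point, centroids))
print(ocs_impute(point, centroids, weights=[0.25, 0.75]))
```

In a full clustering loop, a step like this would alternate with the usual centroid and membership updates, so the filled-in values change from one iteration to the next.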
License
Copyright (c) 2021 Rustam, Koredianto Usman, Mudyawati Kamaruddin, Dina Chamidah, Nopendri, Khaerudin Saleh, Yulinda Eliskar, Ismail Marzuki

This work is licensed under a Creative Commons Attribution 4.0 International License.