MODIFIED POSSIBILISTIC FUZZY C-MEANS ALGORITHM FOR CLUSTERING INCOMPLETE DATA SETS

Authors

  • Rustam, Telkom University, School of Electrical Engineering, Department of Telecommunication Engineering, Jl. Telekomunikasi No.1 Dayeuh Kolot, 40257 Kabupaten Bandung, Jawa Barat, Indonesia; https://orcid.org/0000-0001-8331-5793
  • Koredianto Usman, Telkom University, School of Electrical Engineering, Department of Telecommunication Engineering, Jl. Telekomunikasi No.1 Dayeuh Kolot, 40257 Kabupaten Bandung, Jawa Barat, Indonesia; https://orcid.org/0000-0002-5228-1348
  • Mudyawati Kamaruddin, Universitas Muhammadiyah Semarang, Faculty of Health Sciences, Semarang, Jawa Tengah, Indonesia; https://orcid.org/0000-0001-6932-1150
  • Dina Chamidah, Universitas Wijaya Kusuma Surabaya, Faculty of Language and Science, Department of Biology Education, Surabaya, Jawa Timur, Indonesia; https://orcid.org/0000-0001-9353-456X
  • Nopendri, Telkom University, School of Industrial Engineering, Department of Industrial Engineering, Jawa Barat, Indonesia; https://orcid.org/0000-0001-9641-677X
  • Khaerudin Saleh, Telkom University, School of Electrical Engineering, Department of Telecommunication Engineering, Jl. Telekomunikasi No.1 Dayeuh Kolot, 40257 Kabupaten Bandung, Jawa Barat, Indonesia; https://orcid.org/0000-0002-2688-070X
  • Yulinda Eliskar, Telkom University, School of Electrical Engineering, Department of Telecommunication Engineering, Jl. Telekomunikasi No.1 Dayeuh Kolot, 40257 Kabupaten Bandung, Jawa Barat, Indonesia; https://orcid.org/0000-0002-7698-1445
  • Ismail Marzuki, Fajar University, Department of Chemical Engineering, Makassar, Sulawesi Selatan, Indonesia; https://orcid.org/0000-0003-3316-0484

DOI:

https://doi.org/10.14311/AP.2021.61.0364

Keywords:

Incomplete data, fuzzy clustering, possibilistic clustering, missing values imputation.

Abstract

The possibilistic fuzzy c-means (PFCM) algorithm was proposed to address two weaknesses of its predecessors: the noise sensitivity of fuzzy c-means (FCM) and the coincident clusters of possibilistic c-means (PCM). However, PFCM is only applicable to complete data sets. This research therefore modifies PFCM for clustering incomplete data sets, yielding two variants, OCSPFCM and NPSPFCM, whose performance is evaluated on three aspects: 1) accuracy percentage, 2) number of iterations, and 3) centroid errors. The results show that NPSPFCM outperforms OCSPFCM for missing-value rates from 5% to 30% on all experimental data sets. Furthermore, both algorithms provide average accuracies in the ranges 97.75%–78.98% and 98.86%–92.49%, respectively.
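
The abstract does not describe how missing values are treated inside the clustering loop. As a rough illustration only, the sketch below assumes that the "NPS" prefix refers to the nearest prototype strategy that Hathaway and Bezdek introduced for FCM: each missing feature is filled with the matching coordinate of the prototype closest to the data point, where closeness is measured with a partial distance over the observed features. The function names `partial_distances` and `nps_impute`, the NaN encoding of missing values, and the assumption that every row has at least one observed feature are all hypothetical choices made for this example, not details taken from the paper.

```python
import numpy as np

def partial_distances(X, V):
    """Squared partial distances between rows of X and prototypes V.

    X: (n, p) data with NaN marking missing entries (assumed encoding).
    V: (c, p) complete cluster prototypes.
    Assumes every row of X has at least one observed feature.
    """
    observed = ~np.isnan(X)                                   # (n, p)
    # Differences over observed features only; missing ones contribute 0.
    diff = np.where(observed[:, None, :],
                    X[:, None, :] - V[None, :, :], 0.0)       # (n, c, p)
    # Rescale by p / (number of observed features) per data point.
    scale = X.shape[1] / observed.sum(axis=1)                 # (n,)
    return scale[:, None] * (diff ** 2).sum(axis=2)           # (n, c)

def nps_impute(X, V):
    """Fill missing entries with the coordinates of the nearest prototype."""
    nearest = partial_distances(X, V).argmin(axis=1)          # (n,)
    missing = np.isnan(X)
    X_filled = X.copy()
    X_filled[missing] = V[nearest][missing]
    return X_filled

# Toy data: two prototypes, three points, two missing entries.
X = np.array([[1.0, np.nan],
              [5.0, 5.2],
              [np.nan, 0.9]])
V = np.array([[1.0, 1.0],
              [5.0, 5.0]])
X_filled = nps_impute(X, V)
```

In an iterative scheme of this kind, an imputation step like `nps_impute` would be repeated after each prototype update, so the filled-in values follow the prototypes as they move. How the paper interleaves this with the PFCM membership and typicality updates is not stated in the abstract.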

Published

2021-04-30

Section

Articles