Modified K-Nearest Neighbour Using Proposed Similarity Fuzzy Measure for Missing Data Imputation on Medical Datasets (MKNNMBI)

B. Mathura Bai, Mangathayaru N., Padmaja Rani B.
Copyright: © 2022 | Pages: 15
DOI: 10.4018/IJFSA.306278

Abstract

Early disease diagnosis is a pressing problem in the health sector, the medical domain, and disease management. Data analysis yields reliable results only when the data are complete, and missing values reduce the efficiency of data analysis tasks. Researchers have proposed various imputation methods, but a better imputation method is still needed. The objective of this paper is to propose an imputation method, called Modified k-Nearest Neighbour for imputation of missing data (MKNNMBI), which uses a proposed similarity fuzzy measure to find the k most similar instances and impute missing values from them. The proposed method outperformed the existing imputation methods MV EM, MV BPCA, MV Ignore, MV KMeans, MV FKMeans, MV KNN, MV MC, MV WKNNimpute, MV SVDimpute, MV SVMimpute, and CBC-IM-FUZZY. These imputation methods were studied on different benchmark datasets and tested for performance with classifiers such as C4.5, SVM, kNN, and NB. The results show that the proposed method leads to accurate imputation and improves accuracy.

1. Introduction

Decision making must be accurate, especially in the medical and health sector. A critical decision-making system needs complete information; when information is missing, the system degrades and decisions can be misinterpreted. To handle missing values, Khan et al. (2013) proposed a medical decision system. Nowadays, sophisticated applications are widely used in all fields and collect huge quantities of data on a daily basis. Storing, analysing, and mining such large volumes of data requires computational intelligence techniques and data science analysis tools. Fernandez-Delgado et al. (2014) conducted an exhaustive experiment on different datasets with various classifiers, using data analysis tools such as R, Weka, C, and Matlab. The performance of such data analysis tools is affected by various issues, and data analysts need to pay more attention to these challenges to achieve better analysis. The challenges that commonly occur during data analysis and machine learning tasks are explored in (Zhang et al., 2003), (Bai et al., 2015), (Zhu & Li, 2016), and (Li & Ren, 2015). One of the most important of these challenges is missing values, which pose a hidden and unpredictable problem that needs to be addressed. Missing values (Allison, 2001) are inevitable in real-world data collected from different application domains. These applications use data mining and machine learning techniques to either impute or ignore such values. Possible reasons for missing values include faulty devices, human error, inaccurate or inconsistent entries, inadequate measurements, and unanswered sensitive questions in surveys. The existence of such missing values results in biased decisions, which affects the accuracy of prediction. Incorrect data analysis or decisions may have severe consequences in the medical domain and health sector (Gomila & Clark, 2020) (Stiglic et al., 2019), in various financial applications, and elsewhere.
There are two possible ways to handle missing values. One is to ignore the instances that have missing data. The other is to replace the missing data with approximate values, known as parameter estimation, so that correct decisions can still be made. Ignoring the instances with missing values is a simple and often-used method, but it reduces the amount of data and thus affects the learning process. The missing-values ignore method affects the performance of the prediction model and leads to inaccurate decisions (Stiglic et al., 2019). Parameter estimation methods, also called model-based imputation methods, such as the EM algorithm are sensitive to outliers. The best alternative would be to impute the missing values using a Machine Learning (ML) based method (Lakshminarayan et al., 1999) (Little & Rubin, 2019). Such a process is treated as a data cleaning task in the pre-processing phase of data analysis and machine learning. The process of inferring a missing value from the existing data is called missing data imputation (Myrtveit et al., 2001). Most data mining and machine learning algorithms need a complete dataset for knowledge extraction, pattern recognition, and decision making. Researchers have proposed various missing data imputation methods for data analysis tasks such as classification, regression, and clustering.
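
The contrast between ignoring incomplete instances and imputing from similar instances can be illustrated with a minimal Python sketch. This is not the MKNNMBI method: the similarity fuzzy measure is not defined in this preview, so the sketch assumes a plain Euclidean distance computed over the features that both instances have observed. The function names (drop_incomplete, shared_distance, knn_impute), the toy data, and the choice k = 2 are hypothetical and serve only as illustration.

```python
import math


def drop_incomplete(rows):
    """Ignore strategy: keep only the rows that have no missing (None) entry."""
    return [r for r in rows if all(v is not None for v in r)]


def shared_distance(a, b):
    """Root-mean-square difference over features observed in both rows.

    Assumption: a stand-in for the paper's similarity measure, which is
    not given in this preview. Returns infinity if no feature is shared.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that feature over the
    k nearest rows in which the feature is observed."""
    result = [list(r) for r in rows]  # copy, so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that observe feature j.
            donors = [
                (shared_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no comparable donor exists
                result[i][j] = sum(v for _, v in nearest) / len(nearest)
    return result


# Toy data (hypothetical): the second instance lacks its third feature.
rows = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [1.2, 1.9, 3.2],
]

print(len(drop_incomplete(rows)))  # rows that survive the ignore strategy
print(knn_impute(rows, k=2)[1])    # second instance after imputation
```

In this toy example the ignore strategy discards one of the four instances, whereas the imputation fills the gap from the two instances closest to the incomplete one and keeps all four available for learning.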
