
Auto Associative Extreme Learning Machine Based Hybrids for Data Imputation

Chandan Gautam, Vadlamani Ravi

Source Title: Handbook of Research on Intelligent Techniques and Modeling Applications in Marketing Analytics

DOI: 10.4018/978-1-5225-0997-4.ch005


Abstract

This chapter presents three novel hybrid techniques for data imputation, viz., (1) Auto-associative Extreme Learning Machine (AAELM) with Principal Component Analysis (PCA) (PCA-AAELM), (2) Gray system theory (GST) + AAELM with PCA (Gray+PCA-AAELM), and (3) AAELM with Evolving Clustering Method (ECM) (ECM-AAELM). Our primary aim is to remove the randomness in AAELM caused by its random weights, using ECM and PCA. This chapter also proposes local learning by invoking ECM as a preprocessor for AAELM. The proposed methods are tested on several regression, classification and bank datasets using 10-fold cross-validation. The results, in terms of Mean Absolute Percentage Error (MAPE), are compared with those of K-Means+Multilayer perceptron (MLP) imputation (Ankaiah & Ravi, 2011), K-Medoids+MLP, K-Means+GRNN, K-Medoids+GRNN (Nishanth & Ravi, 2013), PSO_Covariance imputation (Krishna & Ravi, 2013) and ECM-Imputation (Gautam & Ravi, 2014). It is concluded that the proposed methods achieved better imputation on most of the datasets, as evidenced by the Wilcoxon signed rank test.
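
The abstract reports results as MAPE but this preview does not spell out the formula. The sketch below assumes the usual definition (the mean of absolute errors relative to the true values, expressed in percent); the function name `mape` and the sample numbers are illustrative only and are not taken from the chapter.

```python
def mape(actual, imputed):
    """Mean Absolute Percentage Error (in percent) between true and imputed values."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the true value.
    # True values equal to zero are not handled in this sketch.
    total = sum(abs((a - p) / a) for a, p in zip(actual, imputed))
    return 100.0 * total / len(actual)


# Hypothetical true values and the values an imputation method produced for them.
true_values = [10.0, 20.0, 40.0]
imputed_values = [11.0, 18.0, 40.0]
score = mape(true_values, imputed_values)
```

A lower MAPE means the imputed values are closer to the values that were removed, which is how competing imputation methods are typically ranked.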

Chapter Preview


Introduction

Missing data can be observed in many datasets that have been collected in real time. It can arise for many reasons: survey respondents may not answer every question because of privacy concerns, and data entry operators may leave fields blank through lack of concentration or for other reasons. Failure of a system or of sensor nodes in a wireless sensor network can also lead to missing data. Missing data is a very challenging issue in the field of analytics because the completeness and quality of the data play a crucial role in analyzing it. Replacing a missing value with an appropriate value is called imputation. In general, data mining algorithms cannot handle incomplete data on their own, so missing values must be imputed with appropriate values using a suitable data imputation algorithm (Ankaiah & Ravi, 2011; Abdella & Marwala, 2005; García & Kalenatic, 2011; Nishanth, Ravi, Ankaiah & Bose, 2012).

Kline (1988) proposed the following procedures to handle missing data:

1. Deletion procedures, viz., listwise deletion and pairwise deletion (Song & Shepperd, 2007),
2. Imputation procedures (Schafer, 1997),
3. Model-based procedures, and
4. Machine learning methods.
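
The two deletion procedures named in the first item can be contrasted with a small sketch. It assumes the common reading of the terms: listwise deletion drops every record that has any missing attribute, while pairwise deletion keeps, for each analysis, every record in which the attributes involved are observed. The records, attribute names, and function names are hypothetical.

```python
# Toy records; None marks a missing value.
rows = [
    {"age": 25, "income": 50.0, "score": 3.0},
    {"age": 32, "income": None, "score": 4.0},
    {"age": None, "income": 61.0, "score": 5.0},
    {"age": 41, "income": 58.0, "score": None},
    {"age": 38, "income": 72.0, "score": 2.0},
]


def listwise_delete(records):
    """Keep only records in which every attribute is observed."""
    return [r for r in records if all(v is not None for v in r.values())]


def pairwise_records(records, attr_a, attr_b):
    """Keep records in which both attributes of interest are observed."""
    return [r for r in records if r[attr_a] is not None and r[attr_b] is not None]


complete = listwise_delete(rows)
age_income = pairwise_records(rows, "age", "income")
```

Here listwise deletion keeps two of the five records, whereas an analysis of age against income under pairwise deletion can use three. Both discard information, which is the motivation for imputation.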

The remainder of this chapter is organized as follows. First, a brief review of the literature on imputation of missing data is presented. The proposed methods are then explained, followed by a description of the datasets and the experimental design. Results and discussion are presented in the penultimate section, and the last section concludes the chapter.


Background

In the case of numerical attributes, missing data can be handled in various ways. Several types of approach are possible, including machine learning (ML) based imputation, deletion of missing values, and model-based approaches. ML-based approaches include auto-associative neural network imputation with genetic algorithms (Abdella & Marwala, 2005), self-organizing maps (SOM) (Merlin, Sorjamaa, Maillet & Lendasse, 2010), multi-layer perceptron (Gupta & Lam, 1996), K-Nearest Neighbor (Batista & Monard, 2002), fuzzy-neural network (Gabrys, 2002) etc. Batista and Monard (2002, 2003) and Jerez, Molina, Subirates and Franco (2006) employed K-nearest neighbour (K-NN) for handling missing data. A mutual K-NN method was proposed by Liu and Zhang (2012) to classify noisy and incomplete data. For handling missing data, Samad and Harp (1992) employed a SOM-based approach, and Austin and Escobar (2005) employed Monte Carlo simulations. Several studies employed the multi-layer perceptron (MLP) for imputation: the MLP is trained as an auto-associative model on records without missing attributes, and records with missing attributes are then passed to the trained model for imputation (Sharpe & Solly, 1995; Nordbotten, 1996; Gupta & Lam, 1996; Yoon & Lee, 1999; Silva-Ramírez, Pino-Mejías, López-Coello & Cubiles-de-la-Vega, 2011; Nkuna & Odiyo, 2011). The auto-associative neural network (AANN) has also been employed for this task by keeping the input and output variables identical (Marseguerra & Zoia, 2002; Marwala & Chakraverty, 2006). Ragel and Cremilleux (1999) employed the Robust Association Rules Algorithm (RAR) to address multiple missing values in a database. Chen, Huang, F. Tian and S. Tian (2008) proposed a selective Bayes classifier to handle missing data. The fuzzy c-means algorithm has been employed by Nouvo (2011) to handle incomplete data. Principles of chaos theory have been employed by Elshorbagy, Simonovic and Panu (2002) to handle missing data in stream flow data.
The expectation maximization (EM) algorithm was employed by Dempster, Laird and Rubin (1977) to handle missing values in multivariate data. García and Kalenatic (2011) also proposed a genetic algorithm (GA) based approach to handle missing attributes in multivariate data. Ankaiah and Ravi (2011) handled missing data using a two-stage hybrid method: K-means was employed in the first stage and MLP in the second.

Key Terms in this Chapter

Evolving Clustering Method (ECM): A fast, one-pass clustering method based on normalized Euclidean distances. It can be applied in two modes: on-line and off-line. Each data sample is processed only once; no iteration over previously processed samples is required.
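
The one-pass behaviour described above can be sketched as follows. This is a simplified illustration of an ECM-style on-line procedure, not the chapter's implementation: it assumes a single distance threshold (named `dthr` here), a normalized distance equal to the Euclidean distance divided by the square root of the dimension, and clusters that are either created or enlarged as each sample arrives. All names and sample values are hypothetical.

```python
import math


def norm_dist(x, y):
    """Euclidean distance divided by sqrt(number of dimensions)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def ecm(samples, dthr):
    """Simplified one-pass ECM-style clustering; returns (centres, radii)."""
    centres, radii = [], []
    for x in samples:  # every sample is visited exactly once
        if not centres:
            centres.append(list(x))
            radii.append(0.0)
            continue
        dists = [norm_dist(x, c) for c in centres]
        # Case 1: the sample already lies inside an existing cluster.
        if any(d <= r for d, r in zip(dists, radii)):
            continue
        # Case 2: pick the cluster that minimises distance + current radius.
        sums = [d + r for d, r in zip(dists, radii)]
        a = min(range(len(sums)), key=sums.__getitem__)
        if sums[a] > 2 * dthr:
            # Too far from every cluster: start a new one at the sample.
            centres.append(list(x))
            radii.append(0.0)
        else:
            # Enlarge cluster a and shift its centre towards the sample
            # so that the sample lies on the new boundary.
            new_r = sums[a] / 2
            t = (dists[a] - new_r) / dists[a]
            centres[a] = [c + t * (xi - c) for c, xi in zip(centres[a], x)]
            radii[a] = new_r
    return centres, radii


samples = [(0.0, 0.0), (0.1, 0.0), (0.05, 0.01), (1.0, 1.0), (1.1, 1.0)]
centres, radii = ecm(samples, dthr=0.2)
```

With these five points and `dthr=0.2` the sketch ends with two clusters, one near the origin and one near (1, 1), and no cluster radius exceeds the threshold.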

Gray System Theory (GST): Gray Relational Analysis (GRA) is a method of Gray System Theory (GST) that measures the degree of similarity between two systems. GRA requires two quantities to be calculated: the Gray Relational Coefficient (GRC) and the Gray Relational Grade (GRG). A larger GRG indicates that two systems or elements are more similar, and a smaller GRG indicates that they are less similar.
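
As an illustration of the two quantities, the sketch below uses the commonly cited form of the coefficient, GRC(k) = (Δmin + ρ·Δmax) / (Δ(k) + ρ·Δmax), with the grade taken as the mean coefficient over all attributes. The distinguishing coefficient ρ = 0.5, the function name, and the sample series are assumptions; the preview does not state which variant the chapter uses, and the inputs are assumed to be already scaled to a comparable range.

```python
def gray_relational_grades(reference, candidates, rho=0.5):
    """Return one Gray Relational Grade per candidate series."""
    # Absolute differences between the reference and each candidate.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every candidate is identical to the reference.
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        # Gray Relational Coefficient for each attribute of this candidate.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        # Gray Relational Grade: mean of the coefficients.
        grades.append(sum(coeffs) / len(coeffs))
    return grades


reference = [0.2, 0.5, 0.9]
candidates = [[0.25, 0.5, 0.8], [0.9, 0.1, 0.3]]
grades = gray_relational_grades(reference, candidates)
```

The first candidate tracks the reference closely and therefore receives the larger grade, which is the sense in which a larger GRG indicates greater similarity.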

Autoassociative Neural Network: Also called an autoencoder. Autoencoders are generally used to learn a representation of a dataset and for dimensionality reduction. In an autoassociative neural network, the target output is identical to the input, i.e., the network tries to reconstruct its input at the output layer.

Principal Component Analysis (PCA): A very popular dimensionality reduction technique. It converts correlated variables into linearly uncorrelated variables, which are orthogonal to each other. Each principal component is a linear combination of the original (correlated) variables, so PCA is a dimensionality reduction technique rather than a feature selection technique.
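
The decorrelating property can be checked on a small two-variable example. The sketch below is restricted to two dimensions so that the principal axes follow from a closed-form rotation angle rather than a general eigendecomposition; the function name `pca_2d` and the data are illustrative assumptions, not material from the chapter.

```python
import math


def pca_2d(points):
    """PCA for two variables; returns ((pc1, pc2), scores)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centred) / (n - 1)
    b = sum(x * y for x, y in centred) / (n - 1)
    c = sum(y * y for _, y in centred) / (n - 1)
    # Rotation angle that diagonalises a symmetric 2x2 matrix and
    # points along the direction of largest variance.
    theta = 0.5 * math.atan2(2 * b, a - c)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    # Each score is a linear combination of the two original variables.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1]) for x, y in centred]
    return (pc1, pc2), scores


points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
components, scores = pca_2d(points)
```

The two original variables are strongly correlated, while the two score columns have zero covariance up to rounding error, and every score mixes both original variables, which is why PCA is not a feature selection method.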
