1. Introduction
The support vector machine (SVM) is a machine learning method proposed by Vapnik and others on the basis of statistical learning theory. Because it effectively avoids the local-minimum problem and offers good generalization performance and classification accuracy, SVM has in recent years been applied more and more widely to pattern recognition, regression analysis, and feature extraction, and has become a new international research hotspot in the fields of artificial intelligence and machine learning. However, large training sets bring slow learning speed and large storage demands, which directly hinder the application of the SVM technique. Moreover, when the training data are mingled with outliers relative to their class, training on them often fails to improve classification capability; on the contrary, the outliers greatly increase the computational burden of training and may cause over-learning, which raises the VC dimension of the classification discriminant function, enlarges the confidence interval, and ultimately harms the generalization ability of SVM. Many improved SVM algorithms have therefore been proposed (Agarwal, 2002; Daniael & Cao, 2004; Luo, 2007; Xiao, Li, & Zhang, 2006; Li, Wang, & Yuan, 2003; Zeng, 2007; Tan & Ding, 2008; Cao, Liu, & Zhang, 2006).
The reduction strategies in Zeng (2007) and Cao, Liu, and Zhang (2006) are based on the idea of the class center: after obtaining the clustering centers of the positive and negative samples, the training set is reduced according to a prescribed radius relationship between each sample and its cluster center. This method, however, is suitable only for convex sample sets. Xiao, Li, and Zhang (2006) restrict the training set by C-means clustering: if all samples in a cluster come from the same class, the cluster center replaces them; otherwise, all samples of the cluster are retained. That method can effectively reduce non-convex training sets, but when the number of clusters is less than 1/20 of the number of samples, the reduction effect is not obvious; for larger sample sets, the computation time grows with the number of clusters and outweighs the benefit of the reduction, so the method loses practical significance. The NN-SVM algorithm proposed by Li, Wang, and Yuan (2003) accepts or rejects each sample according to whether its class agrees with that of its nearest neighbors. This method not only reduces the size of the training set but also reduces the influence of outliers on SVM generalization performance; however, searching for the nearest neighbor of every sample point is very time-consuming, so for larger sample sets the algorithm's efficiency is extremely low and it, too, loses practical significance. Another reduction strategy, PSCC, is presented in Luo (2007): it exploits the geometric characteristics of training samples that are linearly separable in the high-dimensional feature space, reducing the sample set by computing the angle between each sample and the line connecting the positive and negative cluster centers in that space. This algorithm has practical significance, but it requires a great deal of kernel computation, so its efficiency is not high.
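To illustrate the class-center idea underlying the simplest of these strategies, the following is a minimal sketch in NumPy. It keeps, for each class, only the samples farthest from their class centroid (likely margin candidates) and discards interior points; the `keep_ratio` parameter and the Euclidean-distance criterion are assumptions for illustration, not the exact rules of the cited papers.

```python
import numpy as np

def class_center_reduce(X, y, keep_ratio=0.6):
    """Reduce a training set by the class-center idea (illustrative sketch).

    For each class, compute the centroid, rank samples by distance to it,
    and keep only the farthest keep_ratio fraction, which are the samples
    most likely to lie near the decision boundary.
    Returns the sorted indices of the retained samples.
    """
    keep = []
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        center = X[idx].mean(axis=0)                      # class centroid
        dist = np.linalg.norm(X[idx] - center, axis=1)    # distance to centroid
        n_keep = max(1, int(round(keep_ratio * len(idx))))
        # samples farthest from the centroid are the boundary candidates
        keep.extend(idx[np.argsort(dist)[-n_keep:]])
    return np.sort(np.array(keep))
```

As the surrounding discussion notes, a rule of this kind works only when each class forms a roughly convex region around its centroid; for non-convex classes, interior/exterior distance to a single center no longer identifies boundary samples.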
In view of the above analysis of the basic ideas, advantages, and disadvantages of the existing improved algorithms, this paper puts forward a new reduction strategy for large-scale SVM training samples based on point-set theory. For large-scale training sets mingled with class outliers, it can effectively reduce both the sample size and the influence of the outliers on the classification discriminant function, thereby increasing training speed without degrading SVM classification performance.