
Detection Approaches for Categorization of Spam and Legitimate E-Mail

Rachnana Dubey, Jay Prakash Maurya, R. S. Thakur

Source Title: Handbook of Research on Pattern Engineering System Development for Big Data Analytics

DOI: 10.4018/978-1-5225-3870-7.ch016


Abstract

The internet has become very popular, and electronic mail has made it easy and cheap to communicate with many people. However, users also receive many undesired messages, and the larger share of these e-mails is termed spam. The goal of spam classification is to distinguish between spam and legitimate e-mail messages. With the popularization of the internet, it is challenging to develop spam filters that can automatically and effectively eliminate the increasing volumes of unwanted e-mail before it enters a user's mailbox. The main objective of this chapter is to examine and identify the best detection approach for spam categorization. Different types of algorithms and data mining models are proposed, implemented, and evaluated on data sets. To improve spam filtering techniques, the authors analyze methods of feature selection and give recommendations for their use. The chapter concludes that data mining models using a combination of supervised learning algorithms provide better results than single data models.


Introduction

E-mail is the most powerful medium of communication today, but e-mail spam is one of the major problems for internet users, who face it in their day-to-day communication. As e-mail communication grows, the volume of spam grows with it. Spamming is the use of electronic communication systems to send unsolicited bulk messages, or to promote merchandise or services, that are almost universally unwanted. Spam mail causes many problems. One of the most serious is that many companies suffer large financial losses (AnirudhRama, 2006). Another is that users need to spend time checking for spam and deleting it from their inboxes. In addition, because spam e-mails may contain malicious software (i.e., phishing software) and illegal advertising, such as image schemes and attractive information, spam has become a serious security issue on the internet. One of the best solutions to the spam problem is data mining with machine learning algorithms (Nema et al., 2016). Data mining is the approach used to find spam and legitimate text patterns in large amounts of data through machine learning (Yadav et al., 2016) and to discover the similar patterns adopted by smart spammers, as shown in Figure 1.

Figure 1. Flow chart to find out spam

Five algorithms have been used to categorize e-mail as spam or legitimate. The results are based on supervised learning algorithms (Naïve Bayes, Random Forest, Random Tree, Bagging, and Boosting). A Support Vector Machine (SVM), which is also a supervised learning algorithm, can be used for spam categorization as well. SVM works on the linear separability of the classes at different feature levels. In this proposed work, the machine learning algorithms are evaluated using WEKA, RapidMiner, and an SVM tool to measure the accuracy and efficiency of the classifiers and various types of errors. We have analyzed the most effective categorization methodology on a benchmark dataset. This comprises 9324 records and 500 instances (70% for training and 30% for testing) used to build the model. We describe approaches and learning models for eliminating bulk commercial mail, malicious code, and fraudulent e-mails. The main aim is to find the unwanted keywords that are most often used in spam (Battista, 2011).
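The general idea of supervised spam categorization with a 70%/30% train/test split can be illustrated with a small sketch. This is not the authors' WEKA or RapidMiner workflow and does not use their benchmark dataset: it is a minimal pure-Python multinomial Naïve Bayes with add-one smoothing, and the toy messages, the label names, and all function names (tokenize, train_naive_bayes, predict, split_70_30) are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(messages, labels):
    """Count messages and words per class for a multinomial Naive Bayes model."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-posterior for the message."""
    class_counts, word_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_messages)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def split_70_30(messages, labels, seed=0):
    """Shuffle the labelled messages and split them 70% train / 30% test."""
    pairs = list(zip(messages, labels))
    random.Random(seed).shuffle(pairs)
    cut = int(0.7 * len(pairs))
    return pairs[:cut], pairs[cut:]


# Hypothetical toy data, not the chapter's benchmark dataset.
MESSAGES = [
    "win free money now", "free prize claim now", "cheap loans win cash",
    "claim your free cash prize", "win money fast",
    "meeting agenda for monday", "project report attached",
    "lunch tomorrow with the team", "please review the report",
    "schedule for the team meeting",
]
LABELS = ["spam"] * 5 + ["legitimate"] * 5

train, test = split_70_30(MESSAGES, LABELS)
model = train_naive_bayes([t for t, _ in train], [l for _, l in train])
correct = sum(predict(model, text) == label for text, label in test)
print(f"held-out accuracy: {correct}/{len(test)}")
```

On a dataset this small the held-out accuracy is not meaningful. The sketch only shows the shape of the pipeline: split, train on word counts, and score unseen messages.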

Key Terms in this Chapter

Optimization: Optimization is the process of adjusting a system in an attempt to make it more effective.

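Relative Absolute Error: The absolute error is the magnitude of the difference between the exact value and the approximation. The relative absolute error is the total absolute error of the predictions divided by the total absolute error of a simple baseline that always predicts the mean of the actual values.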

Mean Absolute Error: The mean absolute error (MAE) is a quantity used to measure how close predictions are to the eventual outcomes.

Mean Squared Error: The average of the squared differences between the estimator and what is estimated.

Categorization: Categorization is a process in which objects are understood, recognized, and differentiated.

Legitimate: According to law. In this chapter, a legitimate e-mail is a message that is not spam.
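The three error measures listed among the key terms can be made concrete with a short sketch. It uses the common textbook definitions (mean of absolute errors, mean of squared errors, and total absolute error relative to a baseline that predicts the mean of the actual values); the function names and the sample values are illustrative assumptions, not figures from the chapter.

```python
def mean_absolute_error(actual, predicted):
    """Average magnitude of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_absolute_error(actual, predicted):
    """Total absolute error divided by that of a predict-the-mean baseline."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(a - mean_actual) for a in actual)
    return model_error / baseline_error


# Hypothetical values: 1 = spam, 0 = legitimate; predictions are
# classifier scores between 0 and 1.
actual = [1, 0, 1, 1]
predicted = [0.9, 0.2, 0.6, 1.0]

print(mean_absolute_error(actual, predicted))
print(mean_squared_error(actual, predicted))
print(relative_absolute_error(actual, predicted))
```

A relative absolute error below 1 means the predictions are closer to the actual values than the mean baseline is. Note that the ratio is undefined when all actual values are identical, because the baseline error is then zero.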
