An Algorithm for Multi-Domain Website Classification


Mohammad Aman Ullah, Anika Tahrin, Sumaiya Marjan
DOI: 10.4018/IJWLTT.2020100104

Abstract

The web is the largest worldwide communication system of computers, comprising local, academic, commercial, and government sites. As the number of website types increases, manual classification becomes costly and cumbersome and cannot satisfy growing internet service demands; automated classification has therefore become important for better and more accurate search engine results. This research proposes an algorithm for classifying different websites automatically using randomly collected textual data from their webpages. It also contributes ten dictionaries covering different domains, which are used as training data in the classification process. Finally, classification was carried out with both the proposed algorithm and Naïve Bayes, and the proposed algorithm outperformed Naïve Bayes in accuracy by 1.25%. This research suggests that the proposed algorithm could be applied to any number of domains if the related dictionaries are available.

Introduction

A website is an assortment of web content, images, videos, or other digital assets hosted on one or more web servers and usually accessible via the Internet. Websites are frequently devoted to a particular subject, ranging from entertainment and social networking to news and education. With the growth in the number and variety of sites, the need for website classification gains traction (Wang et al., 2010). Website classification is a very challenging task and requires human expertise if done manually. The labor cost of such standard classification is also becoming progressively high, and the classification itself increasingly troublesome (Deng, 2012). To overcome the usual classification problems of websites, researchers have applied many machine learning algorithms, such as Naïve Bayes, support vector machines, and random forests. In most of these works, classic algorithms were used for classification, and only a single domain was classified.
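The paper does not reproduce the baseline implementation, but the Naïve Bayes classifier it compares against can be sketched minimally as follows. The training samples, word lists, and labels below are purely illustrative, not the authors' data:

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (tokenized page text, domain label).
TRAIN = [
    ("recipe chef food kitchen menu".split(), "food"),
    ("course exam student lecture campus".split(), "education"),
    ("flight hotel booking tour destination".split(), "travel"),
]

def train_nb(samples):
    """Collect per-label word frequencies, label counts, and the vocabulary."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter()
    vocab = set()
    for words, label in samples:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(words, word_counts, label_counts, vocab):
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior + sum of smoothed log likelihoods
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

wc, lc, vocab = train_nb(TRAIN)
print(classify("student lecture food".split(), wc, lc, vocab))  # → education
```

Laplace smoothing keeps words unseen for a label from zeroing out its probability, which matters for short, randomly sampled webpage text.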

This research has proposed an algorithm for classifying different websites automatically using randomly collected textual data from the web pages. It also contributed ten dictionaries covering different domains, which were used as training data in the classification process. The classification was done with both the proposed algorithm and Naïve Bayes, and the proposed algorithm was found to outperform Naïve Bayes in accuracy by 1.25%. This study suggests that the proposed algorithm could be applied to any number of domains provided that the related dictionaries are available.
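The exact scoring rule of the proposed algorithm is not given in this preview, so the following is only a plausible sketch of dictionary-based domain classification: score each domain by how many page words appear in its dictionary. The word lists and the size normalization are assumptions for illustration, not the authors' published method:

```python
# Hypothetical per-domain dictionaries (the paper's ten dictionaries are
# not reproduced here; these word lists are illustrative only).
DICTIONARIES = {
    "food":      {"recipe", "chef", "menu", "restaurant", "kitchen"},
    "education": {"course", "exam", "student", "lecture", "campus"},
    "shopping":  {"cart", "checkout", "discount", "price", "order"},
}

def classify_page(words):
    """Assign the domain whose dictionary matches the most page words.

    Scores are normalized by dictionary size so that larger dictionaries
    do not dominate -- an assumed design choice, not taken from the paper.
    """
    scores = {
        domain: sum(w in vocab for w in words) / len(vocab)
        for domain, vocab in DICTIONARIES.items()
    }
    return max(scores, key=scores.get)

print(classify_page("recipe chef discount menu".split()))  # → food
```

Under this scheme, extending the classifier to a new domain only requires supplying one more dictionary, which matches the paper's claim that the approach scales to any number of domains.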

Therefore, the contributions of this research are:

  1. Proposal of an algorithm to classify websites of different domains, such as food, business, education, shopping, travel, and social media;

  2. Creation of different dictionaries to characterize the said domains;

  3. Improvement of the accuracy of web search.

This paper is structured as follows: section 2 includes a narrative of related works; section 3 presents the problem statement. In section 4, the description of the methodology is provided. Section 5 contains the details of data collection and preprocessing. Section 6 describes the experiments and the proposed algorithm. Section 7 presents the experimental results and analysis. The comparison is discussed in section 8. Finally, in section 9, conclusions and future work directions are discussed.

Related Works

Most of the work done so far emphasized classification using classic classifiers and classified at most two to three domains. Patil et al. (2012) applied a Naïve Bayes algorithm to categorize websites using the content of their homepages; according to them, web pages could be classified into a more specific category using different feature sets. Roul et al. (2014) classified web documents using an association mining technique: classification was based on frequent itemsets created by the Frequent Pattern (FP) Growth algorithm, with final classification performed on the feature set by a Naïve Bayes classifier. Slamet et al. (2018) proposed a simple web scraping method for finding job vacancies from a search engine using a Naïve Bayes classifier. Klassen et al. (2010) worked on web document classification by keywords using random forests; their experiments showed that increasing the number of domains reduces the accuracy of the classifier.
