Application of Convolution Neural Networks in Web Search Log Mining for Effective Web Document Clustering

Suruchi Chawla

Source Title: International Journal of Information Retrieval Research (IJIRR) 12(1)

DOI: 10.4018/IJIRR.300367

Abstract

The volume of web search data stored in search engine logs is increasing and has become big search log data. The web search log has been a source of data for mining based on web document clustering techniques to improve the efficiency and effectiveness of information retrieval. In this paper, the deep learning model Convolution Neural Network (CNN) is used in big web search log data mining to learn the semantic representation of a document. These semantic document vectors are clustered using K-means to group relevant documents for effective web document clustering. An experiment was conducted on a data set of web search queries and associated clicked URLs to measure the quality of clusters based on the document semantic representation learned by the CNN. Cluster analysis was performed based on WCSS (the sum of squared distances of document samples to their closest cluster center), and the decrease in WCSS in comparison to TF.IDF keyword-based clusters confirms the effectiveness of CNN in web search log mining for effective web document clustering.

1. INTRODUCTION

The volume of web data is increasing rapidly every day and is responsible for the information overload problem (Gantz & Reinsel, 2012). Artificial intelligence techniques have been applied to big data to obtain an abstract representation of the knowledge present in the data for various applications (Adomavicius & Tuzhilin, 2005).

Document clustering techniques are used to improve the efficiency and effectiveness of information retrieval (IR). Partition-based document clustering improves retrieval efficiency because the document collection is partitioned and queries are matched against cluster centroids only. Efficiency is gained by reducing the number of query-document comparisons, but retrieval effectiveness decreases. Retrieval effectiveness is the percentage of relevant documents retrieved (Salton & Buckley, 1988). Hybrids of optimization techniques such as ACO, as well as trust, Genetic Algorithm and Ontology, have been used for effective personalized web search (Chawla, 2016; Chawla, 2018).
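The efficiency argument above can be illustrated with a minimal sketch, assuming documents and queries are already represented as numeric vectors. The function names (`cosine`, `cluster_retrieve`), the toy vectors, and the choice of cosine similarity are assumptions made for illustration and are not taken from the paper. The query is first compared with the cluster centroids only, and then with the members of the single best-matching cluster.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def cluster_retrieve(query, centroids, clusters, docs):
    """Return (ranked ids from the best-matching cluster, comparisons made)."""
    # Step 1: compare the query with each cluster centroid only.
    best = max(range(len(centroids)), key=lambda k: cosine(query, centroids[k]))
    # Step 2: rank only the members of the selected cluster.
    members = clusters[best]
    ranked = sorted(members, key=lambda d: cosine(query, docs[d]), reverse=True)
    return ranked, len(centroids) + len(members)

# Hypothetical toy data: six 2-D document vectors in two clusters.
docs = {
    "d1": [1.0, 0.1], "d2": [0.9, 0.2], "d3": [0.8, 0.0],
    "d4": [0.1, 1.0], "d5": [0.2, 0.9], "d6": [0.0, 0.8],
}
clusters = [["d1", "d2", "d3"], ["d4", "d5", "d6"]]
centroids = [[0.9, 0.1], [0.1, 0.9]]

ranked, comparisons = cluster_retrieve([1.0, 0.0], centroids, clusters, docs)
print(ranked, comparisons)  # 5 comparisons instead of 6 for a full scan
```

The saving grows with collection size, since only one partition is scanned. The cost is the one noted in the text: relevant documents that fall in other clusters are never compared with the query.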

Deep learning models are widely used in big data mining to identify abstract semantic features from low-level input data. The input data vector is passed through successive layers of nonlinear transformation to generate a high-level semantic abstraction. These semantic representations of web documents and queries serve as an effective source of knowledge for fast and effective information retrieval. Deep learning techniques such as the convolution neural network have been used effectively to extract semantic representations of web search queries and clicked documents. CNN has proved effective in learning semantics and patterns from queries, documents, users and items (Shen et al., 2014).
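The idea of successive nonlinear layers can be sketched as follows. This is a generic illustration and not the architecture used in the paper: the function names (`conv_max_pool`, `dense`), the window size, the tanh activation, and the hand-picked weights are all assumptions, and a real model would learn its weights from click data. A convolution slides over windows of consecutive word vectors, max pooling keeps the strongest response of each filter, and a final dense layer maps the pooled features to a low-dimensional document vector.

```python
import math

def conv_max_pool(word_vectors, filters, window=2):
    """Apply each filter to every window of consecutive word vectors (tanh),
    then keep the maximum activation per filter (max pooling over positions).
    Assumes the document has at least `window` words."""
    features = []
    for weights in filters:
        activations = []
        for start in range(len(word_vectors) - window + 1):
            # Flatten the word vectors inside the window into one patch.
            patch = [x for vec in word_vectors[start:start + window] for x in vec]
            activations.append(math.tanh(sum(w * x for w, x in zip(weights, patch))))
        features.append(max(activations))
    return features

def dense(vector, weight_rows):
    """Fully connected tanh layer mapping pooled features to the output vector."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector))) for row in weight_rows]

# Hypothetical inputs: four words with 2-D embeddings, three filters of
# length window * embedding_dim = 4, and a 3-to-2 output layer.
word_vectors = [[0.1, 0.3], [0.7, -0.2], [-0.4, 0.5], [0.2, 0.9]]
filters = [
    [0.5, -0.1, 0.3, 0.2],
    [-0.3, 0.8, 0.1, -0.5],
    [0.2, 0.2, -0.6, 0.4],
]
semantic_weights = [[0.6, -0.4, 0.1], [0.3, 0.7, -0.2]]

pooled = conv_max_pool(word_vectors, filters)
doc_vector = dense(pooled, semantic_weights)
print(doc_vector)  # a 2-D vector with components in (-1, 1)
```

Because max pooling collapses the position axis, documents of different lengths map to vectors of the same size, which is what makes them directly usable as input to a clustering step.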

In (Xu, He & Li, 2018), a convolution neural network is used to learn low-dimensional semantic feature vectors of documents and queries for search, as well as neural collaborative filtering models for recommendation. K-means is simple and efficient for a wide variety of data types. It has low computational requirements and stores only the documents, the cluster membership of the documents and the cluster centroids (DeFreitas & Bernard, 2015).
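A minimal sketch of K-means (Lloyd's iteration) shows the small state described above: the points, one cluster label per point, and the centroids. The function names, the toy points, and the choice to pass initial centroids explicitly are assumptions made to keep the example deterministic; they are not details from the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = [list(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: label each point with its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no label changed
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
    [5.0, 5.0], [5.2, 4.8], [4.8, 5.2],
]
assignments, centroids = kmeans(points, [points[0], points[3]])
print(assignments)
print(centroids)
```

Each iteration costs one distance computation per point per centroid, and nothing beyond the labels and centroids has to be kept between iterations.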

In this paper, the deep learning model convolution neural network (CNN) is used in web query session mining to generate abstract document semantic vectors. The resulting semantic vectors are then clustered using K-means to reveal the search patterns of web users, and the clusters are evaluated for quality.

An experiment is conducted on a data set of web search query sessions to analyze the effect of the CNN on the quality of web document clusters. The results of cluster analysis based on WCSS have been compared with those of TF.IDF-based clusters. The results show that WCSS decreases drastically for clustering using the CNN-based document representation, which confirms the improvement in cluster quality.
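WCSS, defined in the abstract as the sum of squared distances of document samples to their closest cluster center, can be computed with a short function. The sketch below uses made-up 2-D points and centroids purely to show that tighter clusters give a lower value; the numbers are not results from the paper, and the name `wcss` is an assumption.

```python
def wcss(points, centroids):
    """Sum over all points of the squared distance to the closest centroid."""
    total = 0.0
    for p in points:
        total += min(
            sum((x - c) ** 2 for x, c in zip(p, centroid))
            for centroid in centroids
        )
    return total

# Hypothetical data: the same two centers with tight and with loose clusters.
tight_points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
tight_centroids = [[0.0, 1.0], [10.0, 1.0]]

loose_points = [[0.0, 0.0], [0.0, 6.0], [10.0, 0.0], [10.0, 6.0]]
loose_centroids = [[0.0, 3.0], [10.0, 3.0]]

tight = wcss(tight_points, tight_centroids)
loose = wcss(loose_points, loose_centroids)
print(tight, loose)  # 4.0 36.0
```

A lower WCSS means that documents lie closer to their cluster centers, which is the sense in which the text reads a decrease in WCSS as an improvement in cluster quality.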

The paper is organized as follows: Section 2 provides a detailed survey of related work, Section 3 covers the basic concepts used in the paper, Section 4 details the proposed work, Section 5 explains the experimental study and Section 6 concludes the paper.
