Application of Convolution Neural Networks in Web Search Log Mining for Effective Web Document Clustering

Suruchi Chawla
Copyright: © 2022 | Pages: 14
DOI: 10.4018/IJIRR.300367

Abstract

The volume of web search data stored in search engine logs keeps increasing and has grown into big search log data. Web search logs have been a source of data for mining based on web document clustering techniques, with the aim of improving the efficiency and effectiveness of information retrieval. In this paper, the deep learning model Convolution Neural Network (CNN) is used in big web search log data mining to learn the semantic representation of a document. These semantic document vectors are clustered using K-means to group relevant documents for effective web document clustering. An experiment was conducted on a data set of web search queries and their associated clicked URLs to measure the quality of clusters built on the CNN-based semantic document representation. Cluster analysis was performed using WCSS (the sum of squared distances of document samples to their closest cluster center), and the decrease in WCSS in comparison to TF.IDF keyword-based clusters confirms the effectiveness of CNN in web search log mining for effective web document clustering.

1. INTRODUCTION

The volume of web data is increasing rapidly every day and is responsible for the information overload problem (Gantz & Reinsel, 2012). Artificial intelligence techniques have been applied to big data to obtain an abstract representation of the knowledge present in the data for various applications (Adomavicius & Tuzhilin, 2005).

Document clustering techniques are used to improve the efficiency and effectiveness of information retrieval (IR). Partition-based document clustering improves retrieval efficiency because the document collection is partitioned and queries are matched against cluster centroids only. Efficiency is gained by reducing the number of query-document comparisons, but at the cost of a decrease in retrieval effectiveness. Retrieval effectiveness is the percentage of relevant documents retrieved (Salton & Buckley, 1988). Hybrids of optimization techniques such as ACO, as well as trust, Genetic Algorithm and Ontology, have been used for effective personalized web search (Chawla, 2016; Chawla, 2018).
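The centroid-first matching described above can be illustrated with a minimal sketch. It assumes cosine similarity as the matching function and uses a hypothetical six-document toy collection with known cluster labels; the function names (`cosine`, `cluster_retrieve`) are illustrative and do not come from the paper.

```python
import numpy as np

def cosine(vec, matrix):
    """Cosine similarity between one vector and every row of a matrix."""
    vec = vec / np.linalg.norm(vec)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix @ vec

def cluster_retrieve(query, docs, labels, centroids, top_k=2):
    """Match the query against centroids, then rank only the winning cluster."""
    best_cluster = int(np.argmax(cosine(query, centroids)))
    member_ids = np.flatnonzero(labels == best_cluster)
    scores = cosine(query, docs[member_ids])
    order = np.argsort(-scores)[:top_k]
    # Comparisons made: one per centroid plus one per document in the chosen cluster.
    comparisons = len(centroids) + len(member_ids)
    return member_ids[order], comparisons

# Hypothetical toy collection: six documents in two well-separated clusters.
docs = np.array([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
                 [0.1, 1.0], [0.2, 0.9], [0.0, 1.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
centroids = np.array([docs[labels == c].mean(axis=0) for c in (0, 1)])

query = np.array([0.95, 0.15])
hits, comparisons = cluster_retrieve(query, docs, labels, centroids)
print("retrieved document ids:", hits)
print("comparisons:", comparisons, "instead of", len(docs))
```

Documents outside the selected cluster are never scored, which is where the efficiency gain comes from and also why relevant documents assigned to other clusters can be missed.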

Deep learning models are widely used in big data mining to identify abstract semantic features from low-level input data. The input data vector is passed through successive layers of non-linear transformation to generate a high-level semantic abstraction. These semantic representations of web documents and queries serve as an effective source of knowledge for fast and effective information retrieval. Deep learning techniques such as the convolution neural network have been used effectively to extract semantic representations of web search queries and clicked documents. CNN has proved effective in learning semantics and patterns from queries, documents, users and items (Shen et al., 2014).
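As a rough illustration of how successive non-linear layers can turn a variable-length document into a fixed-length semantic vector, the sketch below applies a convolution over sliding word windows, a max-pooling step and a final projection. It is not the architecture used in this paper: the weights are random and untrained, the word vectors are random stand-ins for real embeddings, and all names and dimensions (`conv_max_pool`, `emb_dim`, `n_filters`, `out_dim`) are assumptions chosen for the example.

```python
import numpy as np

def conv_max_pool(word_vecs, filters, bias, window=3):
    """Convolve over word windows, apply tanh, then max-pool over positions.

    Assumes an odd window size so that each word sits at the centre of one window.
    """
    n_words, emb_dim = word_vecs.shape
    pad = window // 2
    zeros = np.zeros((pad, emb_dim))
    padded = np.vstack([zeros, word_vecs, zeros])
    # Each row concatenates `window` consecutive word vectors.
    windows = np.stack([padded[i:i + window].ravel() for i in range(n_words)])
    feature_maps = np.tanh(windows @ filters + bias)  # shape: (n_words, n_filters)
    return feature_maps.max(axis=0)                   # shape: (n_filters,)

rng = np.random.default_rng(0)
emb_dim, n_filters, window, out_dim = 8, 16, 3, 4

# Untrained, randomly initialised parameters (illustration only).
filters = rng.normal(scale=0.1, size=(window * emb_dim, n_filters))
bias = np.zeros(n_filters)
proj = rng.normal(scale=0.1, size=(n_filters, out_dim))

# Two hypothetical documents of different lengths, as random word vectors.
doc_short = rng.normal(size=(5, emb_dim))
doc_long = rng.normal(size=(12, emb_dim))

vec_short = np.tanh(conv_max_pool(doc_short, filters, bias, window) @ proj)
vec_long = np.tanh(conv_max_pool(doc_long, filters, bias, window) @ proj)
print(vec_short.shape, vec_long.shape)
```

Max pooling is what makes the output length independent of document length, so documents of 5 and 12 words both map to vectors of the same dimensionality and can be clustered together.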

In (Xu, He & Li, 2018), a convolution neural network is used to learn low-dimensional semantic feature vectors for documents as well as queries, for search as well as for neural collaborative filtering models for recommendation. K-means is simple and efficient for a wide variety of data types. It has low computational requirements and stores only the documents, the cluster membership of the documents and the cluster centroids (DeFreitas & Bernard, 2015).
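A minimal sketch of K-means (Lloyd's algorithm) shows the small amount of state involved: the document vectors, one cluster label per document and one centroid per cluster. The random initialisation, the iteration cap and the toy data are assumptions made for this example, not settings reported in the paper.

```python
import numpy as np

def kmeans(docs, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct documents chosen at random.
    centroids = docs[rng.choice(len(docs), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every document to every centroid.
        dists = ((docs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for c in range(k):
            members = docs[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                new_centroids[c] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical document vectors forming two obvious groups.
docs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(docs, k=2)
print("cluster membership:", labels)
print("centroids:", centroids)
```

Which group receives label 0 depends on the random initialisation, so only the grouping itself is meaningful, not the label values.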

In this paper, the deep learning model convolution neural network (CNN) is used in web query session mining to generate abstract document semantic vectors. The resulting semantic vectors are then clustered using K-means to reveal the search patterns of web users, and the clusters are evaluated for quality.

An experiment is conducted on a data set of web search query sessions to analyze the effect of the CNN-based document representation on the quality of web document clusters. The results of cluster analysis based on WCSS have been compared with TF.IDF-based clusters. The results show that WCSS decreases drastically for clustering using the CNN-based document representation, which confirms the improvement in cluster quality.
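WCSS, the sum of squared distances of document samples to their cluster center, can be computed in a few lines. The sketch below uses hypothetical two-dimensional vectors and compares two groupings of the same four points, purely to show that tighter clusters give a lower WCSS; it does not reproduce the CNN versus TF.IDF comparison or any figures from the paper.

```python
import numpy as np

def wcss(docs, labels, centroids):
    """Sum of squared distances of each document to its assigned centroid."""
    diffs = docs - centroids[labels]
    return float((diffs ** 2).sum())

# Hypothetical document vectors at the corners of a 4 x 2 rectangle.
docs = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])

# Grouping A pairs the points that are close together.
labels_a = np.array([0, 0, 1, 1])
centroids_a = np.array([docs[labels_a == c].mean(axis=0) for c in (0, 1)])

# Grouping B pairs the points that are far apart.
labels_b = np.array([0, 1, 0, 1])
centroids_b = np.array([docs[labels_b == c].mean(axis=0) for c in (0, 1)])

print(wcss(docs, labels_a, centroids_a))  # 4.0
print(wcss(docs, labels_b, centroids_b))  # 16.0
```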

The paper is organized as follows: Section 2 provides a detailed survey of related work, Section 3 covers the basic concepts used in the paper, Section 4 provides the details of the proposed work, Section 5 explains the experimental study, and Section 6 presents the conclusion.
