1. INTRODUCTION
The volume of web data grows rapidly every day and contributes to the information overload problem (Gantz & Reinsel, 2012). Artificial intelligence techniques have been applied to big data to obtain abstract representations of the knowledge present in the data for various applications (Adomavicius & Tuzhilin, 2005).
Document clustering techniques are used to improve the efficiency and effectiveness of information retrieval (IR). Partition-based document clustering improves retrieval efficiency because the document collection is partitioned and queries are matched against cluster centroids only. This efficiency comes from reducing the number of query-document comparisons, but it is accompanied by a decrease in retrieval effectiveness, where retrieval effectiveness is the percentage of relevant documents retrieved (Salton & Buckley, 1988). Hybrids of techniques such as ant colony optimization (ACO), trust, Genetic Algorithm and Ontology have been used for effective personalized web search (Chawla, 2016; Chawla, 2018).
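The centroid-matching idea described above can be illustrated with a minimal sketch. It is not the retrieval system of any cited work: the toy vectors, the cluster ids, the choice of cosine similarity and the helper names (`cosine`, `centroid`, `cluster_search`) are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster_search(query, clusters):
    """Match the query against centroids, then rank only the best cluster."""
    centroids = {cid: centroid(list(docs.values())) for cid, docs in clusters.items()}
    best = max(centroids, key=lambda cid: cosine(query, centroids[cid]))
    docs = clusters[best]
    ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
    # Similarity computations performed: one per centroid plus one per
    # document in the selected cluster.
    comparisons = len(centroids) + len(docs)
    return best, ranked, comparisons

# Hypothetical toy collection: six documents in two clusters.
clusters = {
    "c0": {"d1": [1.0, 0.0, 0.0], "d2": [0.9, 0.1, 0.0],
           "d3": [0.8, 0.0, 0.2], "d4": [1.0, 0.1, 0.1]},
    "c1": {"d5": [0.0, 1.0, 0.0], "d6": [0.0, 0.9, 0.2]},
}
query = [0.0, 1.0, 0.1]
best, ranked, comparisons = cluster_search(query, clusters)
total_docs = sum(len(docs) for docs in clusters.values())
print(best, ranked, comparisons, total_docs)
```

In this toy collection the query is compared with two centroids and two documents instead of all six documents. Documents outside the selected cluster are never scored, which is how retrieval effectiveness can drop.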
Deep learning models are widely used in big data mining to identify abstract semantic features from low-level input data. The input data vector is passed through successive layers of non-linear transformation to generate a high-level semantic abstraction. These semantic representations of web documents and queries serve as an effective source of knowledge for fast and effective information retrieval. Deep learning techniques such as the convolutional neural network (CNN) have been used effectively to extract semantic representations of web search queries and clicked documents. CNNs have proved effective in learning semantics and patterns from queries, documents, users and items (Shen et al., 2014).
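As a rough illustration of how a convolutional layer followed by pooling can turn a variable-length token sequence into a fixed-length vector, consider the sketch below. It is not the architecture of Shen et al. (2014) or of this paper: the token vectors and filter weights are hand-picked, nothing is trained, and the `tanh` non-linearity, the window width and the helper names `conv1d` and `max_pool` are assumptions.

```python
import math

def conv1d(seq, filters, width):
    """Apply each filter to every window of `width` consecutive token vectors."""
    out = []
    for i in range(len(seq) - width + 1):
        # Concatenate the token vectors in the window into one flat vector.
        window = [x for tok in seq[i:i + width] for x in tok]
        # One non-linear activation per filter for this window.
        out.append([math.tanh(sum(w * x for w, x in zip(f, window))) for f in filters])
    return out

def max_pool(feature_maps):
    """Keep the strongest activation of each filter across all windows."""
    return [max(col) for col in zip(*feature_maps)]

# Hypothetical input: 4 tokens, each a 2-dimensional vector.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Hypothetical filters: 3 filters, each spanning 2 tokens x 2 dimensions.
filters = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
]
vec = max_pool(conv1d(tokens, filters, width=2))

# A longer sequence still maps to a vector with one entry per filter.
longer = tokens + [[0.5, 0.5], [1.0, 0.0]]
vec_longer = max_pool(conv1d(longer, filters, width=2))
print(len(vec), len(vec_longer))
```

The output length equals the number of filters regardless of how many tokens the input contains, which is what makes such vectors convenient inputs for a clustering step.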
In (Xu, He & Li, 2018), a convolutional neural network is used to learn low-dimensional semantic feature vectors of documents and queries for search, as well as neural collaborative filtering models for recommendation. K-means is simple and efficient for a wide variety of data types. It has low computational requirements and stores only the documents, the cluster membership of the documents and the cluster centroids (DeFreitas & Bernard, 2015).
In this paper, the convolutional neural network (CNN) deep learning model is used in web query session mining to generate abstract document semantic vectors. The resulting semantic vectors are then clustered using K-means to reveal the search patterns of web users, and the clusters are evaluated for quality.
An experiment is conducted on a data set of web search query sessions to analyze the effect of the CNN deep learning model on the quality of web document clusters. The results of cluster analysis based on the within-cluster sum of squares (WCSS) are compared with those of TF.DF based clusters. The results show that WCSS decreases drastically when clustering uses the CNN based document representation, which confirms the improvement in cluster quality.
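For readers unfamiliar with the measure, WCSS is the sum of squared distances between each point and the centroid of its assigned cluster, so lower values indicate tighter clusters. The sketch below shows one way it might be computed after a plain K-means (Lloyd's algorithm) run. The four 2-dimensional points, the deterministic initialization from the first k points and the helper names are assumptions for illustration, not the paper's data or implementation.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(points, k, iters=100):
    """Lloyd's algorithm, initialized with the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

def wcss(points, labels, centroids):
    """Within-cluster sum of squares for a given clustering."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))

# Hypothetical document vectors forming two well-separated groups.
points = [[0.0, 0.0], [10.0, 0.0], [0.0, 2.0], [10.0, 2.0]]
labels, centroids = kmeans(points, k=2)
score = wcss(points, labels, centroids)
print(labels, centroids, score)
```

Comparing this score across two document representations clustered with the same k gives the kind of comparison described above. K-means is sensitive to initialization, so in practice several initializations would normally be tried.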
The paper is organized as follows: Section 2 surveys related work, Section 3 covers the basic concepts used in the paper, Section 4 details the proposed work, Section 5 explains the experimental study and Section 6 presents the conclusion.