Application of Fuzzy C-Means Clustering and Semantic Ontology in Web Query Session Mining for Intelligent Information Retrieval

Suruchi Chawla
Copyright: © 2021 | Pages: 19
DOI: 10.4018/IJFSA.2021010101

Abstract

Keyword-based information retrieval often returns irrelevant documents because of the vocabulary gap between document content and search queries. The keyword vector representation of web documents is very high dimensional, and keyword terms are unable to capture the semantics of document content. Ontologies have been built in various domains to represent the semantics of documents through concepts relevant to the document subject. Web documents often contain multiple topics; therefore, fuzzy c-means document clustering has been used to discover clusters with overlapping boundaries. This paper proposes a method for intelligent information retrieval that uses a hybrid of fuzzy c-means clustering and ontology in query session mining. Using fuzzy clusters of web query session concept vectors improves cluster quality for effective web search. The proposed method was evaluated experimentally, and the results show an improvement in the precision of search results.

1. Introduction

Intelligent web information retrieval techniques customize web search to the information needs of the web user. Various personalized web search techniques based on web mining have been proposed for Intelligent Information Retrieval (Liu, Yu & Meng, 2004; Leung, Lee & Lee, 2010; Speretta & Gauch, 2005; Palleti, Karnick & Mitra, 2007; Pan, Wang & Gu, 2007; Peng, Niu, Huang, & Zhao, 2012; Senthilkumar & Geetha, 2010; Vanitha, 2013; Bedekar, Deshpande, & Joshi, 2008; Jie & Fangfang, 2010; Kim et al., 2010; Chawla & Bedi, 2007).

The keyword vector representation of web documents cannot represent their semantics because of synonyms as well as hypernyms/hyponyms of words (Wang & Hodges, 2006). To capture the semantics of web documents, domain ontology has been used to generate concept vectors of documents by augmenting terms with their synonyms/hyponyms. Ontologies serve as domain knowledge for identifying semantic relations and for structuring data for effective information retrieval.
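To illustrate how ontology-based augmentation narrows the vocabulary gap, the following is a minimal sketch, assuming a tiny hand-written ontology (`TOY_ONTOLOGY`, hypothetical) in place of WordNet and a hypothetical `related_weight` parameter that down-weights added concepts relative to the original terms. It is not the paper's exact weighting scheme.

```python
from collections import Counter

# Hypothetical toy ontology mapping a term to related concepts (synonyms/hyponyms).
# WordNet plays this role in the paper; these entries are illustrative only.
TOY_ONTOLOGY = {
    "film": ["movie", "picture"],
    "movie": ["film", "picture"],
    "football": ["soccer"],
    "course": ["class", "lecture"],
}

def concept_vector(terms, ontology=TOY_ONTOLOGY, related_weight=0.5):
    """Sparse concept vector: 1.0 per occurrence of an original term,
    plus related_weight (a hypothetical down-weighting) per related concept."""
    vec = Counter()
    for term in terms:
        term = term.lower()
        vec[term] += 1.0
        for related in ontology.get(term, []):
            vec[related] += related_weight
    return dict(vec)

query = concept_vector(["Film", "review"])
session = concept_vector(["movie", "review"])
print(query)    # {'film': 1.0, 'movie': 0.5, 'picture': 0.5, 'review': 1.0}
print(sorted(set(query) & set(session)))  # ['film', 'movie', 'picture', 'review']
```

As plain keyword vectors, "film review" and "movie review" share only the term "review"; after augmentation they share four concepts, so a similarity measure can recognize them as related.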

Clustering techniques have been widely used in web data mining to determine web usage patterns. Hard clustering techniques such as k-means are not well suited to inferring user behavior from web usage patterns, because web data is vague and imprecise. Fuzzy c-means has been widely used to generate web data clusters with overlapping boundaries, where each data point belongs to each cluster with some degree of membership. The fuzzy clusters are obtained by minimizing an objective function based on the Euclidean distance measure (Karthikeyan & Sengottuvelan, 2010; Ansari et al., 2015).
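For reference, the standard fuzzy c-means formulation can be written as follows. The notation is ours (N data points $x_i$, C clusters with centers $v_j$, memberships $u_{ij}$, fuzzifier $m > 1$), and the preview does not state which value of $m$ the paper uses. The objective and the alternating updates that minimize it are:

```latex
J_m(U, V) = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - v_j \rVert^{2},
\qquad \text{subject to } \sum_{j=1}^{C} u_{ij} = 1, \quad u_{ij} \in [0, 1]

u_{ij} = \left( \sum_{k=1}^{C} \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}} \right)^{-1},
\qquad
v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}
```

As $m$ approaches 1 the memberships become crisp and the assignments approach those of hard k-means; larger values of $m$ give softer, more overlapping clusters.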

An algorithm is designed for intelligent information retrieval based on fuzzy c-means clustering and semantic ontology. The algorithm executes in two phases, Phase I and Phase II. In Phase I, the WordNet ontology is used to build a concept vector representation of each web query session. Fuzzy c-means clustering of these semantic web query session concept vectors generates clusters of user web query sessions with overlapping boundaries. The cluster centers are computed as the weighted sum of the web query session concept vectors belonging to a given cluster.
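Phase I's clustering step can be sketched with the textbook fuzzy c-means iteration, assuming session concept vectors have already been aligned to dense lists over a shared concept vocabulary. The center update here is the membership-weighted mean (weights $u_{ij}^m$), which is one reading of the "weighted sum" described above; the fuzzifier `m`, tolerance, seed, and toy data are illustrative assumptions, not the paper's settings.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two dense vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means on dense concept vectors X (list of equal-length lists).

    Returns (centers, U), where U[i][j] is the membership of session i in cluster j.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial memberships; each row is normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = [[0.0] * d for _ in range(c)]
    for _ in range(max_iter):
        # Centre update: mean of all sessions weighted by u_ij ** m.
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = max(sum(w), 1e-12)
            centers[j] = [sum(w[i] * X[i][k] for i in range(n)) / total
                          for k in range(d)]
        # Membership update from the relative distances to every centre.
        new_U = []
        for i in range(n):
            dists = [euclidean(X[i], centers[j]) for j in range(c)]
            zeros = [j for j in range(c) if dists[j] == 0.0]
            if zeros:
                # Session coincides with one or more centres: split membership among them.
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                p = 2.0 / (m - 1.0)
                row = [1.0 / sum((dists[j] / dists[k]) ** p for k in range(c))
                       for j in range(c)]
            new_U.append(row)
        shift = max(abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(c))
        U = new_U
        if shift < tol:
            break
    return centers, U

# Toy session concept vectors (hypothetical): two topical groups.
sessions = [[1.0, 0.9, 0.0, 0.0],
            [0.9, 1.0, 0.1, 0.0],
            [0.0, 0.1, 1.0, 0.9],
            [0.0, 0.0, 0.9, 1.0]]
centers, U = fuzzy_c_means(sessions, c=2)
for row in U:
    print([round(u, 2) for u in row])
```

Every session keeps a nonzero membership in both clusters, which is what allows a multi-topic session to contribute to more than one cluster center.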

In Phase II, the search query is augmented with concepts related to the terms of the user query. The resulting concept query is used to select a cluster based on the cosine similarity between the query and the cluster centers, and the URLs clicked in the selected cluster are recommended to the user. The user's responses to the recommended URLs are tracked and stored in a user profile. A concept vector of the user profile is then generated using the ontology and is used to select the most similar cluster, whose clicked URLs are recommended for personalized web search. Thus, a novel algorithm is implemented for personalized web search based on fuzzy clusters of semantic web query session vectors. The flowchart of the proposed approach is given in Figure 1.
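The cluster-selection step of Phase II can be sketched as follows, assuming concept vectors and cluster centers are stored as sparse dicts. Because clusters are fuzzy, the sketch also assumes a hypothetical membership `threshold` to decide which sessions count as belonging to the selected cluster; the preview does not say how the paper makes that choice. The same `recommend` call would apply unchanged to a user-profile concept vector. All URLs are placeholders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(query_vec, centers, memberships, clicked_urls, threshold=0.5):
    """Select the centre most cosine-similar to query_vec, then return the clicked
    URLs of sessions whose membership in that cluster is >= threshold (hypothetical)."""
    best = max(range(len(centers)), key=lambda j: cosine(query_vec, centers[j]))
    urls = []
    for i, row in enumerate(memberships):
        if row[best] >= threshold:
            for url in clicked_urls[i]:
                if url not in urls:  # keep first-seen order, drop duplicates
                    urls.append(url)
    return best, urls

centers = [{"film": 0.9, "movie": 0.8}, {"football": 0.9, "soccer": 0.7}]
memberships = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
clicked = [["http://example.org/reviews"], ["http://example.org/scores"],
           ["http://example.org/cinema"]]
query = {"movie": 1.0, "film": 0.5}
print(recommend(query, centers, memberships, clicked))
# (0, ['http://example.org/reviews', 'http://example.org/cinema'])
```

Session 2 has a membership of 0.6 in the film cluster and 0.4 in the sports cluster, so its clicks are still recommended for a film query even though it is partly about another topic.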

Figure 1. Flowchart of the steps of execution of the proposed method, which uses a hybrid of fuzzy c-means clustering and semantic ontology in web user search session mining

The algorithm was evaluated on a data set of web query sessions captured in the Academics, Entertainment, and Sports domains. The results were compared with related approaches to personalized web search based on keyword vectors and on semantic concept vectors (Chawla & Bedi, 2007; Chawla, 2018). The experimental results confirm that the algorithm designed in this paper for Intelligent Information Retrieval significantly improves the precision of search results.
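The preview does not define exactly how precision was measured. One common convention for ranked recommendations is precision at k, sketched below under that assumption; the URLs and relevance judgements are hypothetical.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the first k recommended URLs that the user judged relevant.
    Dividing by k (not by the list length) is the usual precision@k convention."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for url in recommended[:k] if url in relevant)
    return hits / k

ranked = ["u1", "u2", "u3", "u4"]  # hypothetical recommended URLs, best first
relevant = {"u1", "u3"}            # hypothetical relevance judgements
print(precision_at_k(ranked, relevant, k=4))  # 0.5
print(precision_at_k(ranked, relevant, k=2))  # 0.5
```

Comparing two methods then amounts to averaging this value over the same set of test queries for each method.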

Drakshayani and Prasad (2012) used a new model for document representation that assigns semantic weights to document phrases. Thangamani and Thangaraj (2010) built an ontology construction method into unsupervised fuzzy document learning to derive text semantics within the purview of linguistics. The high-dimensional keyword vector of a web document is unable to capture the semantics of its content. A semantic domain ontology is therefore built through content mining of documents and stores the relationships between terms and concepts using synonym, meronym, and hypernym relations.