Promoting Document Relevance Using Query Term Proximity for Exploratory Search

Vikram Singh
Copyright: © 2023 | Pages: 22
DOI: 10.4018/IJIRR.325072

Abstract

In information retrieval systems, relevance manifestation is pivotal and is usually based on document-term statistics, e.g., term frequency (tf) and inverse document frequency (idf). Query term proximity (QTP) within matched documents remains largely under-explored. In this article, a novel information retrieval framework is proposed to promote documents among all relevant retrieved ones. The relevance estimate is a weighted combination of document statistics and query term statistics. Term-term proximity is treated as an aggregate of the diverse user preferences expressed in query formulation, and is therefore adapted into the framework alongside conventional relevance measures. Intuitively, QTP is exploited to promote documents for balanced exploitation and exploration, and eventually to navigate a search towards its goals. The evaluation supports the usability of QTP measures for balancing several seeking tradeoffs, e.g., relevance, novelty, result diversification (coverage, topicality), and overall retrieval. The assessment of user search trails indicates significant growth in learning outcomes (due to novelty).

Introduction

Information seeking is a fundamental human endeavor, and several information search systems have been designed to help a user pose queries and retrieve informative data to accomplish search goals. Traditional systems rely heavily on the user’s ability to phrase a precise request, and they perform better when requests are short and navigational. A potential obstacle for such systems is the astonishing rate of information overload, which makes it difficult for a user to identify useful information. Therefore, the focus of search is now shifting from finding information to understanding it (White & Roth, 2009), especially in discovery-oriented search. When a user wants information for learning, decision making, or another cognitive activity, conventional search methodologies are unable to assist, whereas data exploration is helpful. Data exploration synthesizes focused search and exploratory browsing to discover interesting data objects. However, exploration becomes a recall-oriented navigation over complex and huge datasets using short, typed, ill-phrased data requests (Idreos, Papaemmanouil, & Chaudhuri, 2015; White, 2016; Marchionini, 2006), and thus requires strong support for adaptive relevance measures in the retrieval framework (Nandi & Jagadish, 2011).

In the data deluge, retrieving relevant data requires either formal awareness of the complex schema and content in order to formulate a data retrieval request, or assistance from the information system (Kersten, Idreos, Manegold, & Liarou, 2011; Huston, Culpepper, & Croft, 2014). In both situations, the system employs implicit measures to outline matched objects and explicit measures to eventually steer the search towards a region of interest. Most existing retrieval models score a document predominantly on document-term statistics, i.e., document length, query-term frequencies, inverse document frequencies, etc. (Van, 1977; Daoud & Huang, 2013). Intuitively, query term proximities (QTPs) within the pre-fetched result set could be exploited to re-position or re-rank the documents, favoring those in which the matched query terms are close to each other. For example, consider an information search with the query ‘exploratory search’ over two documents, both matching each of the two query terms once:

Doc1: {…exploratory search………}.

Doc2: {….exploratory….search….}.

Intuitively, Doc1 should be ranked higher, as the occurrences of the two query terms are closest to each other. In Doc2, by contrast, the two query terms are far apart, and their combination does not necessarily imply the meaning of ‘exploratory search’.
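The intuition behind the Doc1 and Doc2 example can be made concrete with a small sketch. It assumes one simple proximity measure, the minimum distance in token positions between any occurrence of the two query terms; this is an illustrative choice, not necessarily the measure used in the article. The sample texts and the function names `term_positions` and `min_pair_distance` are hypothetical.

```python
from itertools import product


def term_positions(tokens, term):
    """Return every token index at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def min_pair_distance(text, term_a, term_b):
    """Smallest gap (in token positions) between any occurrence of the two terms.

    Returns None when either term is missing from the text.
    Tokenization is plain whitespace splitting, an assumption made for brevity.
    """
    tokens = text.lower().split()
    pos_a = term_positions(tokens, term_a)
    pos_b = term_positions(tokens, term_b)
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i, j in product(pos_a, pos_b))


# Hypothetical stand-ins for Doc1 and Doc2: each matches both query terms once.
doc1 = "an exploratory search system helps users learn about unfamiliar topics"
doc2 = "an exploratory study of how users refine each search over time"

d1 = min_pair_distance(doc1, "exploratory", "search")  # terms are adjacent
d2 = min_pair_distance(doc2, "exploratory", "search")  # terms are far apart
print(d1, d2)
```

Under this measure the first text yields a smaller distance than the second, which matches the expectation that the document with adjacent query terms deserves the higher rank.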

The term-term affinity within a matched document has a role to play during retrieval and, eventually, in positioning the document at an appropriate relevance rank (Salton & Buckley, 1988; Borlund, 2003; Verma, 2016). In an information search, a user specifies a data request using more than one term, with an anticipated inherent closeness among them. The closeness of the query terms characterizes the structural constraints of a user query and helps determine the relative importance of two matched documents in information seeking. Query term proximity is one such measure; however, it has been largely under-explored in traditional retrieval frameworks and models, mainly due to intrinsic design concerns (how proximity can be modeled) and questions about its overall usability (what purpose it serves) in a retrieval model.
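One way to picture how a proximity signal might sit alongside conventional relevance measures is a linear mix of a base score and a proximity score. The sketch below is only an assumption-laden illustration, not the article's framework: the inverse-distance mapping in `proximity_score`, the mixing weight `lam`, and the candidate scores are all hypothetical.

```python
def proximity_score(min_distance):
    """Map a minimum term distance to a score in [0, 1].

    Inverse distance is an assumed mapping: adjacent terms score 1.0,
    distant terms approach 0, and a missing term (None) scores 0.
    """
    if min_distance is None:
        return 0.0
    return 1.0 / max(min_distance, 1)


def combined_score(base_score, min_distance, lam=0.3):
    """Weighted combination of a conventional score and a proximity score.

    `base_score` stands in for any document-term statistic (e.g. a tf-idf score);
    `lam` is a hypothetical weight controlling how much proximity matters.
    """
    return (1 - lam) * base_score + lam * proximity_score(min_distance)


# Hypothetical candidates: (name, base score, minimum query-term distance).
# Both have the same base score, so only proximity separates them.
candidates = [("Doc2", 0.5, 7), ("Doc1", 0.5, 1)]

ranked = sorted(
    candidates,
    key=lambda c: combined_score(c[1], c[2]),
    reverse=True,
)
print([name for name, _, _ in ranked])
```

With equal base scores, the document whose query terms sit closer together moves ahead. Setting `lam` to 0 recovers the conventional ranking, which makes the contribution of proximity easy to isolate in an experiment.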
