Dealing with Relevance Ranking in Cross-Lingual Cross-Script Text Reuse


Aarti Kumar, Sujoy Das
Copyright: © 2016 | Pages: 20
DOI: 10.4018/IJIRR.2016010102

Abstract

The proliferation of multilingual content on the web has paved the way for text reuse to become cross-lingual and also cross-script. Identifying cross-language text reuse becomes harder when the languages involved are cross-script and less resourced. This paper focuses on identifying text reuse between English and Hindi news articles and improving their relevance ranking in two phases: (i) a heuristic retrieval phase for reducing the search space and (ii) a post-processing phase for improving the relevance ranking. A dictionary-based strategy from Cross-Language Information Retrieval is used for heuristic retrieval, and the Parse Feature Vector Model (PFVS) is proposed for post-processing to improve the relevance ranking. The model has been successful in tackling the obfuscation problems of synonymy, hyponymy, hypernymy, antonymy, sentence addition/deletion and word inflection. Instead of traditional approaches, Parse Feature Vectors are explored to detect the reused documents and, to the best of the authors' knowledge, this is a novel contribution for this language pair.
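
The heuristic retrieval phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionary `HI_EN`, the function names, the overlap-based score and the cut-off `k` are all assumptions made for illustration. It only shows the general idea of dictionary-based query translation followed by a coarse ranking that shrinks the search space before any post-processing.

```python
import re
from collections import Counter

# Hypothetical toy bilingual dictionary (Hindi -> English candidates).
# A real system would load a full Hindi-English lexicon instead.
HI_EN = {
    "भारत": ["india"],
    "चुनाव": ["election", "poll"],
    "सरकार": ["government"],
    "जीत": ["victory", "win"],
}


def translate_query(hindi_text, dictionary):
    """Replace each Hindi token with all of its dictionary translations.

    Tokens missing from the dictionary are skipped.
    """
    terms = []
    for token in hindi_text.split():
        terms.extend(dictionary.get(token, []))
    return terms


def tokenize(text):
    """Lowercase an English text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def retrieve(hindi_text, english_docs, dictionary, k=2):
    """Return up to k (doc_id, score) candidates ranked by term overlap.

    The score is the number of occurrences of translated query terms
    in the document; documents with a zero score are dropped.
    """
    query = set(translate_query(hindi_text, dictionary))
    scored = []
    for doc_id, text in english_docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query)
        if score > 0:
            scored.append((doc_id, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


docs = {
    "d1": "The government of India announced the election schedule.",
    "d2": "A cricket win for the home team.",
    "d3": "Stock markets closed higher today.",
}
candidates = retrieve("भारत सरकार चुनाव", docs, HI_EN)
print(candidates)
```

In this sketch the surviving candidates would then be handed to a second, more expensive stage for re-ranking; the Parse Feature Vector post-processing itself is not reproduced here because the preview does not specify it.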
Article Preview

Detecting cross-lingual reuse has long been an area of research interest. Stephan Vogel, Hermann Ney and Christoph Tillmann (1996) used a Hidden Markov Model to align words in statistically translated English and French text. In contrast to the common approach, in which alignment probabilities depend on the absolute positions of words, they made these probabilities depend on relative positions.
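
The contrast between absolute and relative position can be written out as follows. This is a sketch based on the standard formulation of HMM-based word alignment, not on the article itself; the notation is assumed here: a source sentence f_1 ... f_J, a target sentence e_1 ... e_I, an alignment a_j giving the target position linked to source position j, and non-negative jump parameters c(·).

```latex
% Position-based alignment: the alignment probability depends on the
% absolute source position j.
\Pr(f_1^J \mid e_1^I) = \prod_{j=1}^{J} \sum_{i=1}^{I} p(i \mid j, J, I)\, p(f_j \mid e_i)

% HMM-based alignment: the alignment probability depends on the
% previous alignment a_{j-1} (first-order dependence).
\Pr(f_1^J \mid e_1^I) = \sum_{a_1^J} \prod_{j=1}^{J} p(a_j \mid a_{j-1}, I)\, p(f_j \mid e_{a_j})

% The transition is parameterized by the jump width i - i' only,
% i.e. by relative rather than absolute position.
p(i \mid i', I) = \frac{c(i - i')}{\sum_{i''=1}^{I} c(i'' - i')}
```

Because the transition depends only on the jump width, an alignment that shifts a whole phrase by a fixed offset is scored the same way wherever the phrase occurs in the sentence.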

Noah A. Smith (2002) devised an approach, adaptable to any multilingual corpus, for classifying document pairs as either translationally equivalent or not.
