Finding Influence by Cross-Lingual Blog Mining through Multiple Language Lists

Blogs have become one of the important sources of information on the internet. Nowadays, a lot of Indian-language content is being generated in the form of blogs, where people express their opinions on various situations and events. Blog content may contain named entities: names of people, places, and organizations. These named entities include the names of eminent personalities who are famous within or outside that language community. The goal of this paper is to find the influence of a personality among cross-language bloggers. Our approach is to collect information from blog pages, remove irrelevant information, and index the named entities along with their probabilities of occurrence. When a user submits a query in an Indian language to find the influence of a personality, we use a cross-language lexicon in the form of multiple-language parallel lists to transliterate the query into other Indian languages, and we mine blogs to return the influence of the personality across Indian-language bloggers. An overview of the system and preliminary results are described.
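To make the pipeline above concrete, the following is a minimal Python sketch and not the system's actual implementation. It assumes that named entities have already been extracted from each blog, that "probability of occurrence" means an entity's relative frequency among all entity mentions in one language, and that the parallel list is a small in-memory table. The names `PARALLEL_LIST`, `build_index`, `cross_lingual_forms`, and `influence` are hypothetical, as are the sample entries.

```python
from collections import Counter

# Hypothetical parallel list: each row holds one name written in several
# languages (language codes and entries are illustrative only).
PARALLEL_LIST = [
    {"hi": "गांधी", "te": "గాంధీ", "en": "Gandhi"},
    {"hi": "सचिन", "te": "సచిన్", "en": "Sachin"},
]


def build_index(mentions_by_lang):
    """Map each language to {entity: relative frequency among its mentions}.

    Assumption: entity mentions were already extracted from the blog pages
    and irrelevant content was removed upstream.
    """
    index = {}
    for lang, mentions in mentions_by_lang.items():
        counts = Counter(mentions)
        total = sum(counts.values())
        index[lang] = {e: c / total for e, c in counts.items()} if total else {}
    return index


def cross_lingual_forms(query, parallel_list):
    """Return the row of the parallel list that contains the query, if any."""
    for row in parallel_list:
        if query in row.values():
            return dict(row)
    return {}


def influence(query, index, parallel_list):
    """Look up the query's equivalents in every language of the index."""
    forms = cross_lingual_forms(query, parallel_list)
    return {
        lang: index.get(lang, {}).get(name, 0.0)
        for lang, name in forms.items()
    }


# Usage: a Telugu query is mapped to its Hindi form through the list.
gandhi, sachin = PARALLEL_LIST
mentions = {
    "hi": [gandhi["hi"], gandhi["hi"], sachin["hi"], gandhi["hi"]],
    "te": [sachin["te"], gandhi["te"]],
}
index = build_index(mentions)
scores = influence(gandhi["te"], index, PARALLEL_LIST)
```

A table lookup is used here in place of rule-based transliteration because the abstract describes the lexicon as parallel lists; a name that is missing from the list simply yields no cross-language forms in this sketch.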