Finding Inexact Quotations Within a Tibetan Buddhist Corpus

Text matching can be a powerful tool for exploring historical texts. Here, we compare two Tibetan texts and identify pairs of nearly identical strings. A maximal-path algorithm, previously used for matching biological sequences, lies at the heart of our method. The matches found have been verified by Tibetan scholars. They have proved to be of concrete value for Tibetan studies and open up previously inaccessible research avenues.
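To illustrate the kind of maximal-path matching described above, here is a minimal sketch of a Smith-Waterman-style local alignment, a standard dynamic-programming method from biological sequence matching. The abstract does not specify the exact algorithm, scoring scheme, or text units, so the function name `local_align`, the scores (`match`, `mismatch`, `gap`), and the use of single characters as units are all assumptions made for illustration.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Find the highest-scoring pair of similar substrings of a and b.

    Hypothetical illustration: scoring values are assumed, not taken
    from the paper. Returns (score, substring_of_a, substring_of_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of an alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                0,                        # start a fresh alignment here
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,    # skip a character of a
                score[i][j - 1] + gap,    # skip a character of b
            )
            score[i][j] = cell
            if cell > best:
                best, best_pos = cell, (i, j)

    # Trace the maximal path back until the score drops to zero.
    i, j = best_pos
    end_a, end_b = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1

    return best, a[i:end_a], b[j:end_b]
```

Calling `local_align("xxhello worldyy", "zzhelo worldqq")` would pair the two nearly identical substrings while ignoring the unrelated context around them. For Tibetan text, a natural variation (again an assumption) would be to pass lists of syllables rather than strings of characters, since the function only relies on indexing and equality.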
