Formal Structure of Sanskrit Text: Requirements Analysis for a Mechanical Sanskrit Processor

We discuss the mathematical structure of the various levels of representation of Sanskrit text, in order to guide the design of computer aids for useful processing of the digitised Sanskrit corpus. Two main levels are identified, called the linear level and the functional level. The design space of these two levels is sketched, and the computational implications of the main design choices are discussed. Current solutions to the problems of mechanical segmentation, tagging, and parsing of Sanskrit text are briefly surveyed in this light. An analysis of the requirements on the relevant linguistic resources is provided, with a view to justifying standards that allow interoperability of computer tools. This paper does not attempt to provide definitive solutions to the representation of Sanskrit at the various levels. It should rather be read as a survey of the available choices, allowing an open discussion of such issues within a formally precise general framework.
