The 5th Workshop on Balto-Slavic Natural Language Processing, BSNLP@RANLP 2015, Hissar, Bulgaria, September 10-11, 2015

[1]  Charu C. Aggarwal,et al.  A Survey of Text Classification Algorithms , 2012, Mining Text Data.

[2]  Dawid Weiss,et al.  A survey of Web clustering engines , 2009, CSUR.

[3]  Efstathios Stamatatos,et al.  An empirical text categorizing computational model based on stylistic aspects , 1996, Proceedings Eighth IEEE International Conference on Tools with Artificial Intelligence.

[4]  Maciej Eder,et al.  Does size matter? Authorship attribution, small samples, big problem , 2015, Digit. Scholarsh. Humanit..

[5]  Maciej Piasecki,et al.  Automated Generation of Derivative Relations in the Wordnet Expansion Perspective , 2012 .

[6]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[7]  Gerard Kempen Clausal coordination and coordinative ellipsis in a model of the speaker , 2009 .

[8]  Robert Kabacoff,et al.  R in Action , 2011 .

[9]  Q. Mcnemar Note on the sampling error of the difference between correlated proportions or percentages , 1947, Psychometrika.

[10]  Matthew L. Jockers,et al.  A comparative study of machine learning methods for authorship attribution , 2010, Lit. Linguistic Comput..

[11]  Borislav Rizov,et al.  Coping with Derivation in the Bulgarian Wordnet , 2014, GWC.

[12]  Patrick Juola,et al.  Future Trends in Authorship Attribution , 2007, IFIP Int. Conf. Digital Forensics.

[13]  Zeljko Agic,et al.  Improving Part-of-Speech Tagging Accuracy for Croatian by Morphological Analysis , 2008, Informatica.

[14]  Ronen Feldman,et al.  Techniques and applications for sentiment analysis , 2013, CACM.

[15]  Franciska de Jong,et al.  An Unsupervised Aspect Detection Model for Sentiment Analysis of Reviews , 2013, NLDB.

[16]  Michael Schulte-Mecklenbeck,et al.  Information search in the laboratory and on the Web: With or without an experimenter , 2003, Behavior research methods, instruments, & computers : a journal of the Psychonomic Society, Inc.

[17]  Sampo Pyysalo,et al.  brat: a Web-based Tool for NLP-Assisted Text Annotation , 2012, EACL.

[18]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[19]  Kiril Ivanov Simov,et al.  Practical Annotation Scheme for an HPSG Treebank of Bulgarian , 2003, LINC@EACL.

[20]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[21]  Jakub Piskorski,et al.  Language Resources for Named Entity Annotation in the National Corpus of Polish , 2010 .

[22]  Rada Mihalcea,et al.  Building a Sense Tagged Corpus with Open Mind Word Expert , 2002, SENSEVAL.

[23]  Lei Zhang,et al.  A Survey of Opinion Mining and Sentiment Analysis , 2012, Mining Text Data.

[24]  Mathieu Lafourcade,et al.  Bénéfices et limites de l'acquisition lexicale dans l'expérience JeuxDeMots , 2013 .

[25]  Scott Cotton,et al.  SENSEVAL-2: Overview , 2001, *SEMEVAL.

[26]  Gérard Roland,et al.  Dimensions of politics in the European Parliament , 2006 .

[27]  Zeljko Agic,et al.  Croatian Dependency Treebank: Recent Development and Initial Experiments , 2012, LREC.

[28]  Cristina Bosco,et al.  Converting the parallel treebank ParTUT in Universal StanfordDependencies , 2014 .

[29]  Fabio Crestani,et al.  Finding Participants in a Chat: Authorship Attribution for Conversational Documents , 2013, 2013 International Conference on Social Computing.

[30]  Kiril Ivanov Simov,et al.  Constituency Parsing of Bulgarian: Word- vs Class-based Parsing , 2014, LREC.

[31]  Yasemin Altun,et al.  Broad-Coverage Sense Disambiguation and Information Extraction with a Supersense Sequence Tagger , 2006, EMNLP.

[32]  Saso Dzeroski,et al.  Towards a Slovene Dependency Treebank , 2006, LREC.

[33]  Marine Carpuat,et al.  Improving Statistical Machine Translation Using Word Sense Disambiguation , 2007, EMNLP.

[34]  David Y. W. Lee,et al.  Genres, Registers, Text Types, Domains and Styles: Clarifying the Concepts and Navigating a Path through the BNC Jungle , 2001 .

[35]  Vytautas Mickevicius,et al.  Analysing voting behavior of the Lithuanian parliament using cluster analysis and multidimensional scaling : technical aspects , 2014 .

[36]  Ivelina Stoyanova,et al.  Wordnet-Based Cross-Language Identification of Semantic Relations , 2013, BSNLP@ACL.

[37]  Suresh Manandhar,et al.  SemEval-2015 Task 12: Aspect Based Sentiment Analysis , 2015, *SEMEVAL.

[38]  Douglas Biber,et al.  Developing a bottom‐up, user‐based method of web register classification , 2015, J. Assoc. Inf. Sci. Technol..

[39]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[40]  Hans van Halteren,et al.  New Machine Learning Methods Demonstrate the Existence of a Human Stylome , 2005, J. Quant. Linguistics.

[41]  Dong-Hong Ji,et al.  Word Sense Disambiguation Using Label Propagation Based Semi-Supervised Learning , 2005, ACL.

[42]  Julie Weeds,et al.  Finding Predominant Word Senses in Untagged Text , 2004, ACL.

[43]  Byoung-Tak Zhang,et al.  Word Sense Disambiguation by Learning from Unlabeled Data , 2000, ACL.

[44]  Hany Hassan,et al.  Graph Based Semi-Supervised Approach for Information Extraction , 2006 .

[45]  Robert Sigley,et al.  Text categories and where you can stick them : A crude formality index , 1997 .

[46]  Suresh Manandhar,et al.  SemEval-2014 Task 4: Aspect Based Sentiment Analysis , 2014, *SEMEVAL.

[47]  Jurgita Kapociute-Dzikiene,et al.  Predicting Party Group from the Lithuanian Parliamentary Speeches , 2014, Inf. Technol. Control..

[48]  Slav Petrov,et al.  A Universal Part-of-Speech Tagset , 2011, LREC.

[49]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[50]  F. Mosteller,et al.  Inference in an Authorship Problem , 1963 .

[51]  Ana Zwitter Vitez Authorship Attribution: Specifics for Slovene , 2012 .

[52]  Gregory M. Kobele,et al.  Eliding the derivation: A minimalist formalization of ellipsis , 2012, Proceedings of the International Conference on Head-Driven Phrase Structure Grammar.

[53]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[54]  Jingbo Zhu,et al.  Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem , 2007, EMNLP.

[55]  Natalia V. Loukachevitch,et al.  Evaluating Sentiment Analysis Systems in Russian , 2013, BSNLP@ACL.

[56]  George M. Mohay,et al.  Mining e-mail content for author identification forensics , 2001, SGMD.

[57]  Serge Sharoff,et al.  In the Garden and in the Jungle Comparing Genres in the BNC and Internet , 2010 .

[58]  Shlomo Argamon,et al.  Effects of Age and Gender on Blogging , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[59]  Adam Kilgarriff,et al.  The Senseval-3 English lexical sample task , 2004, SENSEVAL@ACL.

[60]  Steven Abney,et al.  Semisupervised Learning for Computational Linguistics , 2007 .

[61]  T C Mendenhall,et al.  THE CHARACTERISTIC CURVES OF COMPOSITION. , 1887, Science.

[62]  Fei Xia,et al.  The Penn Chinese TreeBank: Phrase structure annotation of a large corpus , 2005, Natural Language Engineering.

[63]  J. R. Landis,et al.  An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. , 1977, Biometrics.

[64]  Oren Etzioni,et al.  Extracting Product Features and Opinions from Reviews , 2005, HLT.

[65]  Janna Lipenkova,et al.  Converting Russian Dependency Treebank to Stanford Typed Dependencies Representation , 2014, EACL.

[66]  William A. Gale,et al.  A sequential algorithm for training text classifiers , 1994, SIGIR '94.

[67]  Kyle Johnson,et al.  Gapping Is Not (VP-) Ellipsis , 2009, Linguistic Inquiry.

[68]  Seth Kulick,et al.  Enhanced Annotation and Parsing of the Arabic Treebank , 2008 .

[69]  Vera Lúcia Strube de Lima,et al.  Open information extraction based on lexical semantics , 2015, Journal of the Brazilian Computer Society.

[70]  Adam Przepiórkowski,et al.  PoliMorf: a (not so) new open morphological dictionary for Polish , 2012, LREC.

[71]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[72]  William Lamb,et al.  Scottish Gaelic Speech and Writing: Register Variation in an Endangered Language , 2007 .

[73]  Benoît Sagot,et al.  Building a free French wordnet from multilingual resources , 2008 .

[74]  Olga Lyashevskaya,et al.  Automatic Word Sense Disambiguation and Construction Identification Based on Corpus Multilevel Annotation , 2011, TSD.

[75]  Elena Tutubalina,et al.  Clause-Based Approach to Extracting Problem Phrases from User Reviews of Products , 2014, AIST.

[76]  Efstathios Stamatatos A survey of modern authorship attribution methods , 2009 .

[77]  Ulf Brefeld,et al.  An Off-the-shelf Approach to Authorship Attribution , 2014, COLING.

[78]  Olga Kanishcheva Using of the Statistical Method for Authorship Attribution of the Text , 2014 .

[79]  John Tait,et al.  Word sense disambiguation in information retrieval revisited , 2003, SIGIR.

[80]  Verginica Barbu Mititelu Increasing the Effectiveness of the Romanian Wordnet in NLP Applications , 2013, Comput. Sci. J. Moldova.

[81]  Adam Radziszewski A Tiered CRF Tagger for Polish , 2013, Intelligent Tools for Building a Scientific Information Platform.

[82]  John R. te Velde Deriving coordinate symmetries , 2005 .

[83]  Christopher D. Manning,et al.  The Stanford Typed Dependencies Representation , 2008, CF+CDPE@COLING.

[84]  Preslav Nakov,et al.  Feature-Rich Part-of-speech Tagging for Morphologically Complex Languages: Application to Bulgarian , 2012, EACL.

[85]  Jingbo Zhu,et al.  Multi-Criteria-Based Strategy to Stop Active Learning for Data Annotation , 2008, COLING.

[86]  Oren Etzioni,et al.  Open Information Extraction: The Second Generation , 2011, IJCAI.

[87]  Manuel Montes-y-Gómez,et al.  Modality Specific Meta Features for Authorship Attribution in Web Forum Posts , 2011, IJCNLP.

[88]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[89]  Steven S. Smith,et al.  The Dimensionality of Congressional Voting Reconsidered , 2016 .

[90]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[91]  Koji Yatani,et al.  Analysis of Adjective-Noun Word Pair Extraction Methods for Online Review Summarization , 2011, IJCAI.

[92]  Michael A. Bailey Comparable Preference Estimates across Time and Institutions for the Court, Congress, and Presidency , 2007 .

[93]  Nancy Ide,et al.  Fine-Grained Word Sense Disambiguation Based on Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets , 2004, COLING.

[94]  Gonzalo Navarro,et al.  A guided tour to approximate string matching , 2001, CSUR.

[95]  Krzysztof A. Cyran Machine learning approach to authorship attribution of literary texts , 2007 .

[96]  Joakim Nivre,et al.  Universal Stanford dependencies: A cross-linguistic typology , 2014, LREC.

[97]  Justin Zobel,et al.  Effective and Scalable Authorship Attribution Using Function Words , 2005, AIRS.

[98]  Jakub Piskorski,et al.  Real-Time News Event Extraction for Global Crisis Monitoring , 2008, NLDB.

[99]  Nikola Ljubešić,et al.  What Makes Sense? : Searching for Strong WSD Predictors in Croatian , 2007 .

[100]  Andelka Zecevic,et al.  N-gram Based Text Classification According To Authorship , 2011, RANLP.

[101]  Veronika Laippala,et al.  Universal Dependencies 1.4 , 2015 .

[102]  Philipp Koehn,et al.  Europarl: A Parallel Corpus for Statistical Machine Translation , 2005, MTSUMMIT.

[103]  Petya Osenova,et al.  Joint Ensemble Model for POS Tagging and Dependency Parsing , 2014 .

[104]  Frank Keller,et al.  Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality , 2001 .

[105]  James Shaw,et al.  Segregatory Coordination and Ellipsis in Text Generation , 1998, ACL.

[106]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[107]  Milos Utvic,et al.  An Authorship Attribution for Serbian , 2012, BCI.

[108]  Sebastian Riedel,et al.  The CoNLL 2007 Shared Task on Dependency Parsing , 2007, EMNLP.

[109]  Malvina Nissim,et al.  SemEval-2007 Task 08: Metonymy Resolution at SemEval-2007 , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[110]  Hwee Tou Ng,et al.  Getting Serious about Word Sense Disambiguation , 2002 .

[111]  Roberto Navigli,et al.  Word sense disambiguation: A survey , 2009, CSUR.

[112]  Rada Mihalcea,et al.  Co-training and Self-training for Word Sense Disambiguation , 2004, CoNLL.

[113]  Vittorio Murino,et al.  Conversationally-inspired stylometric features for authorship attribution in instant messaging , 2012, ACM Multimedia.

[114]  D. Holmes The Evolution of Stylometry in Humanities Scholarship , 1998 .

[115]  Shlomo Argamon,et al.  Computational methods in authorship attribution , 2009 .

[116]  Dmitry V. Khmelev,et al.  Using Literal and Grammatical Statistics for Authorship Attribution , 2001, Probl. Inf. Transm..

[117]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[118]  Zdenek Zabokrtský,et al.  Prague Dependency Style Treebank for Tamil , 2012, LREC.

[119]  Veronika Laippala,et al.  Universal Dependencies for Finnish , 2015, NODALIDA.

[120]  N.H.J. Oostdijk,et al.  Using the MF/MD Method for Automatic Text Classification , 2003 .

[121]  Jurgita Kapociute-Dzikiene,et al.  The Effect of Author Set Size in Authorship Attribution for Lithuanian , 2015, NODALIDA.

[122]  M. Piasecki,et al.  Polish tagger TaKIPI: rule based construction and optimization , 2007 .

[123]  Maciej Piasecki,et al.  Corpus-Based Semantic Filtering in Discovering Derivational Relations , 2012, AIMSA.

[124]  Martha Palmer,et al.  The English all-words task , 2004, SENSEVAL@ACL.

[125]  Efstathios Stamatatos,et al.  Computer-Based Authorship Attribution Without Lexical Measures , 2001, Comput. Humanit..

[126]  Keith T. Poole,et al.  Spatial Models of Parliamentary Voting , 2005 .

[127]  Christiane Fellbaum,et al.  Putting Semantics into WordNet's "Morphosemantic" Links , 2009, LTC.

[128]  Sabine Buchholz,et al.  CoNLL-X Shared Task on Multilingual Dependency Parsing , 2006, CoNLL.

[129]  Tianshun Yao,et al.  Active Learning with Sampling by Uncertainty and Density for Word Sense Disambiguation and Text Classification , 2008, COLING.

[130]  Tomas Krilavicius,et al.  Automatic Thematic Classification of the Titles of the Seimas Votes , 2015, NODALIDA.

[131]  Christiane Fellbaum,et al.  English Tasks: All-Words and Verb Lexical Sample , 2001, *SEMEVAL.

[132]  Li Chen,et al.  Tri-Training for Authorship Attribution with Limited Training Data , 2014, ACL 2014.

[133]  Rudolf Rosa,et al.  HamleDT 2.0: Thirty Dependency Treebanks Stanfordized , 2014, LREC.

[134]  Efstathios Stamatatos,et al.  Automatic Text Categorization In Terms Of Genre and Author , 2000, CL.

[135]  Karin Harbusch Incremental sentence production inhibits clausal coordinate ellipsis: A treebank study into Dutch and German , 2011, Dialogue Discourse.

[136]  Fredrik Olsson,et al.  A literature survey of active machine learning in the context of natural language processing , 2009 .

[137]  Lei Zhang,et al.  Extracting Resource Terms for Sentiment Analysis , 2011, IJCNLP.

[138]  Karel Pala,et al.  Derivational Relations in Czech WordNet , 2007, ACL 2007.

[139]  E S Klyshinsky,et al.  Development of Russian subcategorization frames and its properties investigation , 2013 .

[140]  Daniel Zeman,et al.  Reusable Tagset Conversion Using Tagset Drivers , 2008, LREC.

[141]  Serge Sharoff,et al.  Automatic Classification of Web Texts Using Functional Text Dimensions , 2015 .

[142]  Daniel S. Weld,et al.  Open Information Extraction Using Wikipedia , 2010, ACL.

[143]  Božo Bekavac,et al.  Building Croatian WordNet , 2008 .

[144]  Lidia Pivovarova,et al.  Automatic Collocation Extraction and Classification of Automatically Obtained Bigrams , 2014 .

[145]  Hang Li,et al.  Word Translation Disambiguation Using Bilingual Bootstrapping , 2004, Computational Linguistics.

[146]  Amélie Marian,et al.  Beyond the Stars: Improving Rating Predictions using Review Text Content , 2009, WebDB.

[147]  D. S. Guru,et al.  Representation and Classification of Text Documents: A Brief Review , 2010 .

[148]  Moshe Koppel,et al.  Measuring Differentiability: Unmasking Pseudonymous Authors , 2007, J. Mach. Learn. Res..

[149]  Dirk Speelman,et al.  Register analysis in blogs: Correlation between professional sector and functional dimensions , 2013 .

[150]  Maciej Piasecki,et al.  Semi-supervised word sense disambiguation based on weakly controlled sense induction , 2009, 2009 International Multiconference on Computer Science and Information Technology.

[151]  Nikola Ljubesic,et al.  {bs,hr,sr}WaC - Web Corpora of Bosnian, Croatian and Serbian , 2014, WaC@EACL.

[152]  Adam Przepiórkowski,et al.  Slavic Information Extraction and Partial Parsing , 2007, ACL 2007.

[153]  Mark Sanderson,et al.  Ambiguous queries: test collections need more sense , 2008, SIGIR '08.

[154]  Emma Marsden,et al.  IRIS : a new resource for second language research , 2014 .

[155]  Junsaku Nakamura Statistical Methods and Large Corpora — A New Tool for Describing Text Types , 1993 .

[156]  Aidan Finn,et al.  Learning to classify documents according to genre , 2006, J. Assoc. Inf. Sci. Technol..

[157]  Martha Palmer,et al.  SemEval-2007 Task-17: English Lexical Sample, SRL and All Words , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[158]  Hsinchun Chen,et al.  A framework for authorship identification of online messages: Writing-style features and classification techniques , 2006 .

[159]  Marcin Sydow,et al.  DEBORA: Dependency-Based Method for Extracting Entity-Relationship Triples from Open-Domain Texts in Polish , 2012, ISMIS.

[160]  Maciej Eder,et al.  Style-markers in authorship attribution : a cross-language study of the authorial fingerprint , 2011 .

[161]  Shlomo Argamon,et al.  Authorship attribution in the wild , 2010, Lang. Resour. Evaluation.

[162]  Jingbo Zhu,et al.  Learning a Stopping Criterion for Active Learning for Word Sense Disambiguation and Text Classification , 2008, IJCNLP.

[163]  Maciej Piasecki,et al.  Towards Word Sense Disambiguation of Polish , 2008, 2008 International Multiconference on Computer Science and Information Technology.

[164]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[165]  Dawn Xiaodong Song,et al.  On the Feasibility of Internet-Scale Author Identification , 2012, 2012 IEEE Symposium on Security and Privacy.

[166]  Hua Xu,et al.  Weakness Finder: Find product weakness from Chinese reviews by using aspects based sentiment analysis , 2012, Expert Syst. Appl..

[167]  Helmut Schmid,et al.  Estimation of Conditional Probabilities With Decision Trees and an Application to Fine-Grained POS Tagging , 2008, COLING.

[168]  Nancy Ide,et al.  Sense Discrimination with Parallel Corpora , 2002, SENSEVAL.

[169]  Adam Kilgarriff,et al.  Introduction to the special issue on evaluating word sense disambiguation systems , 2002, Natural Language Engineering.

[170]  Artur Silic,et al.  Automatic Authorship Attribution for Texts in Croatian Language Using Combinations of Features , 2010, KES.

[171]  Martha Palmer,et al.  An Empirical Study of the Behavior of Active Learning for Word Sense Disambiguation , 2006, NAACL.

[172]  Walter Daelemans,et al.  Improving Topic Classification for Highly Inflective Languages , 2012, International Conference on Computational Linguistics.

[173]  A. Goldberg Constructions at Work: The Nature of Generalization in Language , 2006 .

[174]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[175]  S. Gries,et al.  Behavioral profiles: A corpus-based approach to cognitive semantic analysis , 2009 .

[176]  Walter Daelemans,et al.  Authorship Attribution and Verification with Many Authors and Limited Data , 2008, COLING.

[177]  Stefan Th. Gries,et al.  Collostructions: Investigating the interaction of words and constructions , 2003 .

[178]  Serge Sharoff,et al.  Document dissimilarity within and across languages: A benchmarking study , 2014, Lit. Linguistic Comput..

[179]  Yejin Choi,et al.  Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning , 2013, ACL.

[180]  Orhan Bilgin,et al.  Morphosemantic Relations In and Across Wordnets A Study Based on Turkish , 2004 .

[181]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[182]  Jati K. Sengupta,et al.  Introduction to Information , 1993 .

[183]  Narendra K. Gupta,et al.  Extracting Phrases Describing Problems with Products and Services from Twitter Messages , 2013 .

[184]  Lidia Pivovarova,et al.  Automatic Detection of Stable Grammatical Features in N-Grams , 2013, MWE@NAACL-HLT.

[185]  Petya Osenova,et al.  Design and Implementation of the Bulgarian HPSG-based Treebank , 2004 .