Integral Business Intelligence System for the Croatian Language: Proper Name Recognition Module. The Fifth International Conference: Information Technology and Journalism - Journalism - The Next Step, Dubrovnik, Inter University Center, 22 - 26 May 2000

The aim of this work was to build a database of all names currently in use in Croatia, in all of their word forms as required by the rules of Croatian grammar, and to set up rules for combining first names with family names in Croatian. Because we could not obtain access to the social security database or to the database of the Ministry of the Interior, we had to rely on other publicly available sources, which are much less accurate than those databases. In the future, we hope to gain access to them and improve our inflectional database. The database consists of 9,538 different male names (682,283 occurrences), 8,963 female names (568,703 occurrences) and 75,298 family names (1,251,106 occurrences), in all possible word forms. Until now there has been no lexical or inflectional database of proper names, although several general lexical and inflectional databases exist for Croatian. This database can serve as an additional resource for building a Croatian spelling checker, as a generative database of all possible forms of Croatian names, as a search aid for finding all forms of a given name, and as a proper name recognition module.
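The search-aid and recognition uses described above come down to inverting the inflectional database: instead of mapping each name to its word forms, map each word form back to the name(s) it can belong to. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. `NAME_FORMS`, `build_form_index`, `find_name_tokens` and `all_forms` are hypothetical names. The two seven-case paradigms (nominative, genitive, dative, accusative, vocative, locative, instrumental) stand in for the real database. The capital-letter check is an assumed heuristic. Because a single form can belong to more than one name (for example, "Ivana" is also a female first name), each form maps to a set of names.

```python
import re
from collections import defaultdict

# Hypothetical excerpt of an inflectional name database:
# (name, gender) -> word forms in case order N, G, D, A, V, L, I.
NAME_FORMS = {
    ("Ivan", "male"): ["Ivan", "Ivana", "Ivanu", "Ivana", "Ivane", "Ivanu", "Ivanom"],
    ("Marija", "female"): ["Marija", "Marije", "Mariji", "Mariju", "Marijo", "Mariji", "Marijom"],
}


def build_form_index(name_forms):
    """Invert name -> forms into lowercased form -> set of (name, gender)."""
    index = defaultdict(set)
    for name, forms in name_forms.items():
        for form in forms:
            index[form.lower()].add(name)
    return index


def find_name_tokens(text, index):
    """Return (token, sorted candidate names) for each capitalised token
    that matches a known name form (assumed recognition heuristic)."""
    hits = []
    for token in re.findall(r"\w+", text):  # \w also matches č, ć, đ, š, ž
        if not token[0].isupper():
            continue
        candidates = index.get(token.lower())  # .get does not add keys
        if candidates:
            hits.append((token, sorted(candidates)))
    return hits


def all_forms(name_forms, name, gender):
    """Search aid: every distinct word form of one name."""
    return sorted(set(name_forms.get((name, gender), [])))
```

For example, `find_name_tokens("Razgovarao sam s Ivanom i Marijom.", build_form_index(NAME_FORMS))` finds "Ivanom" as a form of Ivan and "Marijom" as a form of Marija. A real recognition module would still need to resolve ambiguous forms from context and apply the first-name/family-name combination rules.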