A priori algorithm for sub-category classification analysis of handwriting

The sub-category classification problem is that of discriminating a pattern among all sub-categories. Sub-category classification performance estimates are useful information to mine because many researchers are interested in trends of patterns within specific sub-categories. This paper presents a data mining technique for a database consisting of experimental and observational unit variables. Experimental unit variables are the attributes that define sub-categories of an entity, e.g., demographic data; observational unit variables are the features observed to classify the entity, e.g., test results or handwriting styles. Because the experimental unit variables induce an enormous number of possible sub-categories, we apply the a priori algorithm to select only those sub-categories that have enough support in a given database. The selected sub-categories are then discriminated using the observational unit variables as input features to an Artificial Neural Network (ANN) classifier. The importance of this paper is twofold. First, we propose an algorithm that quickly selects all sub-categories that have both sufficient support and a sufficient classification rate. Second, we successfully apply the proposed algorithm to the field of handwriting analysis, where the task is to determine the similarity of handwriting style within a specific group of people. Document examiners are interested in trends in the handwriting of specific groups, e.g., (i) does a male write differently from a female? (ii) can we distinguish the handwriting of the 25-45 age group from that of others? Subgroups of white males in the age group 15-24 and white females in the age group 45-64 show 87% correct classification performance.
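The support-based selection step can be illustrated with a minimal sketch of level-wise a priori search over attribute-value conditions. This is not the paper's implementation: the function name `frequent_subcategories`, the attribute names, the six sample records, and the threshold of 2 are all hypothetical, and the subsequent ANN classification of the selected sub-categories is not shown.

```python
from itertools import combinations


def frequent_subcategories(records, min_support):
    """Return {sub-category: support count} for every combination of
    (attribute, value) conditions matched by at least `min_support` records.
    Hypothetical helper, written only to illustrate the selection idea."""
    # Level 1: count single (attribute, value) conditions.
    counts = {}
    for rec in records:
        for item in rec.items():
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {k: c for k, c in counts.items() if c >= min_support}
    frequent = dict(level)

    k = 2
    while level:
        # Join step: merge frequent (k-1)-sets into k-set candidates.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != k:
                continue
            # A sub-category constrains each attribute at most once.
            if len({attr for attr, _ in union}) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in level for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count the support of each surviving candidate.
        level = {}
        for cand in candidates:
            n = sum(1 for rec in records
                    if all(rec.get(attr) == val for attr, val in cand))
            if n >= min_support:
                level[cand] = n
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical writer records (experimental unit variables only).
records = [
    {"gender": "male", "age": "15-24", "ethnicity": "white"},
    {"gender": "male", "age": "15-24", "ethnicity": "white"},
    {"gender": "male", "age": "25-44", "ethnicity": "white"},
    {"gender": "female", "age": "45-64", "ethnicity": "white"},
    {"gender": "female", "age": "45-64", "ethnicity": "white"},
    {"gender": "female", "age": "15-24", "ethnicity": "black"},
]

result = frequent_subcategories(records, min_support=2)
for subcat, support in sorted(result.items(), key=lambda kv: len(kv[0])):
    print(sorted(subcat), support)
```

Because a sub-category can only have enough support if every more general sub-category containing it does, the prune step discards most combinations without scanning the database, which is what makes the selection fast when the number of possible sub-categories is very large.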