The head as a place of articulation: From automated detection to linguistic analysis

Research into sign languages benefits from computer-vision-based tools that can automatically extract information from large sets of video data. In our previous work (Karppa et al. 2011, 2012), we developed such a tool for the analysis of movement; here we extend the system to the study of the place of articulation (POA). Specifically, we focus on the head and test how accurately our system (i) detects signs with a head POA as occlusion events and (ii) assigns the realisations of the POA to the five main head location classes used in Suvi, the online dictionary of FinSL (Figure 1). We also investigate the linguistic question of (iii) how much and where the hands move in the vicinity of the face during signing.
Our POA analyser works by detecting local properties of the sign language video that correspond to events in which the hands occlude the head. Occlusions are detected primarily by tracking local image neighbourhoods over the frames of the video. An occlusion is reported when a neighbourhood originating from outside the head region (prototypically the active hand) drifts over the head region. The tracking is based on template matching of constellations of nearby points. However, as the tracking at this stage is rather approximate, the tracking information is combined with a measure of local motion abruptness. This measure compounds the residual template matching error between frames with the non-smoothness of the estimated motion field. In addition to identifying hand-head occlusions, the system encodes the locations of the occlusions relative to the parts of the face according to Suvi’s facial coding scheme. All results are presented in ELAN (Figure 2).
At the time of writing, our data comprise 7950 frames of continuous semi-formal monologue, recorded at 25 fps in frontal view from four seated signers, and include 101 head POA signs.
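The occlusion rule described above can be illustrated with a minimal sketch. It is not the system's implementation: the rectangular head region, the product used to combine the two abruptness cues, the threshold, and all names (`Box`, `abruptness`, `occlusion_frames`) are assumptions made for illustration only, since the text states merely that the measure "compounds" the template matching residual with the non-smoothness of the motion field.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixels, y growing downwards


@dataclass
class Box:
    """Axis-aligned head region (hypothetical representation)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, p: Point) -> bool:
        x, y = p
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def abruptness(residual: float, non_smoothness: float) -> float:
    """Combine the two cues into a single score.

    ASSUMPTION: a plain product. The source does not specify how the
    residual matching error and the non-smoothness are combined.
    """
    return residual * non_smoothness


def occlusion_frames(track: List[Point], scores: List[float],
                     head: Box, threshold: float) -> List[int]:
    """Return the frames in which a tracked neighbourhood that started
    outside the head lies over the head region while its abruptness
    score is at least `threshold` (assumed decision rule)."""
    if not track or head.contains(track[0]):
        return []  # only neighbourhoods originating outside the head count
    return [i for i, (p, s) in enumerate(zip(track, scores))
            if head.contains(p) and s >= threshold]


# Toy values, not data from the study.
head = Box(left=40, top=10, right=80, bottom=60)
track = [(10, 90), (30, 70), (45, 55), (50, 40), (20, 80)]
residuals = [0.1, 0.2, 0.9, 0.8, 0.2]
roughness = [0.5, 0.5, 1.0, 1.0, 0.5]
scores = [abruptness(r, n) for r, n in zip(residuals, roughness)]
print(occlusion_frames(track, scores, head, threshold=0.5))  # [2, 3]
```

In this toy run the neighbourhood starts below and to the left of the head, passes over it in frames 2 and 3 with a high abruptness score, and then leaves again, so only those two frames are reported.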
Concerning the first question, the analysis yielded an average detection accuracy of 91% (Table 1). As nearly all of the undetected signs also showed partial occlusion, we consider this result to be highly positive. For the second question, the average classification success rate was 73% (Table 2). Misclassification was relatively frequent in this task but, as the incorrectly classified instances typically represented borderline cases between two vertically adjacent facial areas, we consider this result too to be very encouraging. Finally, for the third question, we found that the hands were located in the vicinity of the face for 36% of the signing time, the most frequent locations on the head being the sides of the lower half of the face. As head POA signs were further found to account for only 36% of all occlusions, we conclude that signers raise their hands to the area of the head not just to produce head POA signs but perhaps also to maximise the perceptibility of the signing to the addressee, whose point of fixation is the signer’s face (Siple 1978).
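As a rough illustration of how a time share such as the one reported for the third question might be derived, the sketch below computes the proportion of frames flagged as occlusions and converts frame counts to seconds. The per-frame flags are toy values, not the study's data, and the function names are hypothetical; only the frame rate (25 fps) and the total frame count (7950) come from the text.

```python
from typing import List

FPS = 25  # frame rate stated for the recordings


def occlusion_share(flags: List[bool]) -> float:
    """Fraction of frames in which a hand-head occlusion is active."""
    return sum(flags) / len(flags) if flags else 0.0


def frames_to_seconds(n_frames: int, fps: int = FPS) -> float:
    """Convert a frame count to a duration in seconds."""
    return n_frames / fps


# Toy per-frame flags (True = hand over the head region in that frame).
flags = [False, True, True, False, False, True, False, False, False, False]
print(occlusion_share(flags))    # 0.3
print(frames_to_seconds(7950))   # 318.0
```

The second call shows that the 7950 frames of data correspond to 318 seconds of video at 25 fps.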