The Classification of Programming Languages by Usage

Relationships between 16 programming languages have been investigated using data from 1062 U.K. software firms. The number of firms that use both languages of a given pair is recorded for all pairings of the 16 languages. Above-average co-occurrence of a pair is taken as evidence of a relationship between the two languages. Alternatively, the number of firms that use neither language of a given pair is recorded for all pairs. We call these two methods of deriving similarity matrices the AND analysis (relationship by co-occurrence) and the NOR analysis (relationship by co-absence), by analogy with the Boolean operators. The AND and NOR similarity matrices first undergo separate quasi chi-square fits to remove the size contributions; the residuals (observed minus expected values) are then used as the raw input to a simple hierarchical clustering algorithm. The separate AND and NOR analyses reveal a consistent picture of inter-language relationships. Subjectively labelled, the broadest dichotomy seems to be between traditional languages, quite often considered clumsy (such as BASIC, COBOL, FORTRAN, Assembler…), and more modern, elegant languages (such as the Algol family and APL). Business versus scientific use seems to be a secondary dichotomy. Dependence and dominance relationships can be examined by an XOR analysis, which counts the firms in which one language of a pair is used while the other is not. Relative dominance (once the size effect has been removed) is modelled by a simple directed graph, with five sub-groups of languages as the nodes. Some other similarity measures that might be used to relate programming languages are discussed in the Introduction; any of them may contribute to similarity by usage. Finally, the general method of analysis is applicable to many different situations in which binary data about the co-occurrence of events are gathered across a large number of elements.
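
The pipeline described above (pair counts, removal of the size effect, clustering of the residuals) can be illustrated with a minimal Python sketch. Several things here are assumptions and not taken from the paper: the expected counts come from a plain independence model (number of firms times the product of the two marginal proportions) standing in for the quasi chi-square fit; the clustering is ordinary average-linkage agglomeration, one possible reading of "a simple hierarchical clustering algorithm"; the XOR counts are kept directed (uses i but not j); and the function names and the four-language toy data are hypothetical.

```python
from itertools import combinations

def pair_counts(usage, mode):
    """Count firms per language pair. usage holds one 0/1 row per firm."""
    n = len(usage[0])
    counts = [[0] * n for _ in range(n)]
    for row in usage:
        for i, j in combinations(range(n), 2):
            if mode == "AND":      # both languages in use
                hit = row[i] and row[j]
            elif mode == "NOR":    # neither language in use
                hit = not (row[i] or row[j])
            else:
                raise ValueError("mode must be 'AND' or 'NOR'")
            if hit:
                counts[i][j] += 1
                counts[j][i] += 1
    return counts

def used_without(usage):
    """Directed XOR counts: entry [i][j] = firms using i but not j."""
    n = len(usage[0])
    counts = [[0] * n for _ in range(n)]
    for row in usage:
        for i in range(n):
            for j in range(n):
                if i != j and row[i] and not row[j]:
                    counts[i][j] += 1
    return counts

def residuals(usage, mode):
    """Observed minus expected pair counts.

    Assumption: expected = n_firms * p_i * p_j (independence), a stand-in
    for the paper's quasi chi-square fit, which is not specified here.
    """
    n_firms, n = len(usage), len(usage[0])
    p = [sum(row[k] for row in usage) / n_firms for k in range(n)]
    if mode == "NOR":              # probability of NOT using each language
        p = [1 - x for x in p]
    obs = pair_counts(usage, mode)
    res = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        res[i][j] = res[j][i] = obs[i][j] - n_firms * p[i] * p[j]
    return res

def cluster(sim, names):
    """Average-linkage agglomeration on a similarity matrix.

    Repeatedly merges the two clusters with the highest mean similarity
    and returns the merges in order as pairs of name tuples.
    """
    def avg(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    clusters = [[k] for k in range(len(names))]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda ab: avg(clusters[ab[0]], clusters[ab[1]]))
        merges.append((tuple(names[i] for i in clusters[a]),
                       tuple(names[i] for i in clusters[b])))
        clusters[a] = clusters[a] + clusters[b]   # a < b, so b is still valid
        del clusters[b]
    return merges

# Toy data (invented for illustration): 10 firms, 4 languages.
names = ["A", "B", "C", "D"]
usage = ([[1, 1, 0, 0]] * 4 + [[0, 0, 1, 1]] * 4
         + [[1, 1, 1, 1]] + [[0, 0, 0, 0]])

for mode in ("AND", "NOR"):
    print(mode, cluster(residuals(usage, mode), names))
print("XOR", used_without(usage))
```

Working on residuals instead of raw counts matters because a widely used language co-occurs with everything simply by being common; subtracting the expected count leaves only the association beyond what the marginal frequencies predict.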
