Reading the book of life
With the publication of the human genome sequence, we are passing into a new phase in the analysis of what is popularly being called the ‘Book of Life’. However symbolic the shift, it is nevertheless palpable: the data will be in a more or less contiguous, more or less stable form, constituting a reference text against which coding and regulatory regions, polymorphisms, model organism sequences, genetic phenomena, etc., can be systematically and reliably pinned. The role of the bioinformatics practitioner may also be expected to change, by degrees: one may become less like an archaeologist, discovering and poring over shards of evidence to piece together rudimentary translations, and more like a literary critic, attuned to theme and variation, elucidating ever more subtle nuances of meaning and interrelationship in a well-worn textus receptus. Indeed, the integrative task at hand has been characterized as ‘biosequence exegesis’ (Boguski, 1999).
The notion of genome as literature may be seen as an extension of the linguistic metaphor that has dominated molecular biology from its inception, as is evident from the terminology used in the field. In fact, the span of modern molecular biology over the last half-century has some striking parallels to developments in the field of linguistics. In 1953, Michael Ventris and John Chadwick succeeded in deciphering Linear B, showing it to be a dialect of Mycenaean Greek written in an early syllabic script that was an important milestone on the path to alphabetic languages (Robinson, 1995). This achievement, of course, occurred just at the time that Watson and Crick were establishing the same sort of lexical foundation for our present understanding of the genome. (Students of history, and of irony, may be interested to learn that Ventris and Chadwick were crucially aided in their endeavor by the earlier groundbreaking efforts of a woman, Alice Kober, whose premature death denied her any possibility of achieving similar renown.)
Just four years later, Crick went on to posit the central dogma of molecular biology, that genetic information flows by transcription of DNA to RNA, which is then translated to protein via what proved to be an essentially universal genetic code. In that same year, 1957, Noam Chomsky postulated a universal ‘deep structure’ underlying all human language, which he conceived as arising via generation from grammars and subsequently undergoing further surface transformation according to another set of rules (Chomsky, 1957). It is interesting to note that the biologists named their stepwise processes using linguistic terminology (‘transcription’, ‘translation’), while the linguists identified theirs with rather organic notions (‘generation’, ‘transformation’).
The correspondences do not stop with these lexical and syntactic parallels. For example, in 1955 an important distinction was made by the philosopher J. L. Austin between linguistic utterances that he termed constative, which convey information and may be said to be true or false, and those he called performative, which in themselves accomplish the action they denote (Austin, 1975). Thus, ‘He promised to be there’ is a constative statement, while ‘I promise to pay you’ is said to be a performative ‘speech act’. In that same year Crick proposed his adaptor hypothesis, which foresaw that the informational universe of DNA (and what came to be identified as mRNA) required some sort of mechanical connector to the structural universe of amino acids and protein (Judson, 1979). These adaptors proved to be tRNAs, which along with rRNA (and, much later, even catalytic RNAs) demonstrated that nucleic acids not only convey information but may also possess a structure-based function in their own right. While such coincidences may be dismissed as mere curiosities in the history of science, there may well be deeper parallels that suggest actual utility.
The molecular biologist’s hierarchy of ‘sequence’ to ‘structure’ to ‘function’, lately augmented with an additional step to ‘role’ so as to capture the importance of a macromolecule’s purpose in a larger physiological context and not just its mechanistic function in isolation, corresponds remarkably well to a similar progression long used by computational linguists, viz.: