Invited: Using novel small RNAs to understand genome structure and function

Centromeres are the sites of kinetochore assembly and spindle attachment on a chromosome during meiosis and mitosis. Thus, the proper functioning of centromeres is a prerequisite for faithful segregation of chromosomes, the failure of which underlies a broad spectrum of human diseases. The functional characterization of centromeres on abnormal chromosomes (e.g., mini- and marker chromosomes, B chromosomes and neocentromeres) in a variety of eukaryotes has been highly informative about the epigenetic features of eukaryotic centromeres. However, at the genomic level these special cases seem to differ from the normal, complex centromeres that define eukaryotic species. With the exception of these special cases and the point centromeres of the budding yeast S. cerevisiae, the centromeres of most multicellular eukaryotes consist of highly repetitive DNA composed of large arrays of simple satellites and transposable elements, which are not amenable to traditional sequencing technologies and assembly algorithms. As a consequence, complex eukaryotic centromeres have, to date, been virtually intractable to fine-scale mapping and sequencing. What has been shown is a lack of conservation of centromeric sequences, even among closely related species, suggesting that the genomic component of eukaryotic centromeres is evolving relatively rapidly and is under epigenetic control. Mounting evidence supports the hypothesis that a small RNA component is a crucial part of this epigenetic cascade. Previous work in plants and marsupials has identified centromere-specific RNA sequences that are processed into small RNAs. These RNAs are larger than previously described small RNA classes and fall outside the miRNA/siRNA and piRNA size ranges. Given the paucity of genomic information for centromeres from any animal species, we have developed a toolkit that includes bioinformatic and functional assays to define the role these small RNAs play in centromere structure and function.
Using massively parallel sequencing and both genome- and repeat-specific analyses, we have evaluated small RNAs from six different primates to further define the function and conservation of this larger size fraction of small RNAs. Among the progenitor sequences, we have identified novel genomic repeats and centromeric elements within the human genome. Expression profiling indicates that these small RNAs are variably expressed among stem cells and differentiated cells, providing clues to their cellular function and role in the epigenetic control of gene transcription.