Statistical Analysis of Gene Expression Micro Arrays

Advancements in genetic research have produced growing amounts of data, often without efficient techniques for analyzing them. One area of genetics that has developed a great deal in the past several years is micro arrays, and one area of micro array experimentation is gene expression. Gene expression experiments use arrays of thousands of genes together with one or two targeted strands of DNA that are fluorescently tagged to identify which genes are expressed. Such an experiment can identify which conditions cause certain genes to be activated in different cells. It can also be used to track certain cellular changes through the expression of genes. The data gathered from a micro array experiment are stored in a large matrix holding the gene expression results for a micro array plate. Some of the standard analysis techniques and programs can be inefficient and inconclusive. The purpose of this analysis is to arrive at conclusive results with efficient methods. The primary statistical software used in the analysis was the SAS/STAT system, version 8.2. A series of statistical tests was applied to the data set to determine the meaningfulness of the results and the efficiency of the tests.

INTRODUCTION

Every living organism is made up of a single cell or many groups of cells. The identity and function of these cells are determined by genes. Genes are segments of DNA that provide the code for producing proteins. Different organisms contain different numbers of genes. For example, a human being contains an estimated 30,000 genes, while the fruit fly contains only about 13,000. Identifying genes depends on knowledge of the DNA sequence and of its different alleles (different forms of a gene). The human genome has recently been fully mapped. However, not every gene within that sequence has been fully identified.
Certain organisms have been fully mapped; these are mainly smaller organisms with fewer distinct genes. The advantages of knowing fully sequenced genes will be discussed later with micro arrays. Understanding genes is important; unfortunately, genes are often hidden within compacted DNA strands and are not easily identifiable. Genes also interact in different ways: some have similar properties and responsibilities, while others are totally different. Genes that are different may nevertheless be involved in the same reactions inside a cell and, conversely, genes that have similar properties may not be involved in the same reactions. Micro arrays offer an efficient method of comparing multiple genes quickly and easily. Micro arrays, however, require several analyses upon completion of an experiment. The analysis of micro arrays offers substantial evidence about which genes may or may not be related in a cell.

BACKGROUND

Gene expression is important in cellular identification and gene function. With new technologies and research, gene expression and identification have become an ever-growing area of biotechnology, with opportunities for new, more efficient analyses. The field of cellular genetics has shown that changing pH and temperature causes certain genes to be expressed and others not to be expressed. These settings can be altered in a lab and the expressed genes identified. Most genes are known by the proteins they produce and the function of those proteins. It is possible to analyze large groups of proteins as well as genes. This process will be discussed later. Analyzing which genes are expressed can determine when certain reactions take place in the body, or which processes some of the genes are responsible for. Understanding which genes are expressed is important to understanding gene interactions and cellular identities.
According to Campbell, "In all organisms, the expression of specific genes is most commonly regulated at the level of transcription by DNA-binding proteins that also interact with other proteins and often with external signals. For that reason, the term gene expression is often equated with gene activity, that is, transcription, for both prokaryotes (cells lacking membrane enclosed nucleus and membrane enclosed organelles) and eukaryotes (cell with membrane enclosed nucleus and membrane enclosed organelles). However, the greater complexity of eukaryotic cell structure and function provides opportunities for controlling gene expression at additional stages" (Campbell 2002, p. 362).

Figure 1. Cellular Structure (Brazma 2003, p. 1)

Gene expression can be broken down into several stages, from gene to functional protein activity, and it can be identified at any one of them. The first stage in expression is the unpacking of DNA from chromatin (the complex of DNA and proteins that makes up a chromosome) (Figure 1). In chromatin form, DNA is closely packed, and the regulatory proteins used in transcription are often unable to access certain portions of the DNA; consequently, certain genes are not expressed. Chromatin packing differs from one cell type to another, and thus some cells inhibit the expression of certain genes while others allow it. Chromatin packing therefore serves a regulatory function in gene expression. Even though entire DNA strands exist in chromatin form at times, they can be unpacked to allow access to certain strands for protein production and replication. Methylation also plays a part at this stage: methyl groups (CH3) attached to the DNA affect whether a gene is accessible to the compounds used in transcription, and heavily methylated genes are generally not expressed.
During transcription, the DNA is copied into RNA, which is processed into mRNA and exported from the nucleus for protein production. The mRNA is moved to the cytoplasm, where protein translation is accomplished. Polypeptides (polymer chains in which amino acids are linked together by peptide bonds) are created from the mRNA code and, after cleaving and modification, are known as proteins. From this stage the proteins are sent to certain areas of the cell for purposes determined by the type of protein. The mRNA strand breaks down in the cytoplasm, usually after being translated into several copies of the protein, and its nucleotides are absorbed and recycled for later use. This whole process is known as gene expression, and again any stage of it can be used to identify the expression of a gene (Campbell 2002, p. 364).

Cellular identification is determined by the genes expressed and the proteins produced. Different cells have different roles in the body and in turn produce a number of different proteins. By learning which conditions activate certain genes and deactivate others, more can be understood about certain cellular identities. Furthermore, much can be understood about cellular life and death and whether death occurs as a consequence of time or as a malfunction (Campbell 2002). Cells generally have the ability to regenerate and reproduce themselves. These processes are marked by the expression of certain genes. When and how these processes take place and, more importantly, why some cells do not go through them can be better understood through gene expression analysis.

The main objective in micro array analysis is to identify gene expression and to compare several genes at once. Identifying and comparing the genes expressed in a cell or culture is a complex process that results in large amounts of data.
The process can be simplified into six main steps (see Figure 2):

• Selecting the cell culture for analysis
• Identifying the specific DNA gene sequence
• Fluorescently tagging the DNA sequences
• Hybridization of the array (hybridization is the process in which the fluorescently tagged cDNA is applied to the array)
• Laser intensity readings from the plate
• Interpreting the results of the hybridized array

Of course, only fully sequenced genes can be used in this experimentation, since known DNA sequences are hybridized to an array of thousands of different genes at one time. The idea behind creating this array is to identify genes at certain points in their expression and to isolate the conditions under which certain genes are expressed.

There are several different types of micro array experimentation. The first type is gene expression analysis using DNA. This method applies the DNA sequences of specific genes to an arrayed, cultured plate with the goal of identifying genes in the cells and comparing their interactions. Initially, mRNA is used in this experiment. The mRNA, messenger RNA (RNA transcribed from DNA), is used in the production of proteins and is copied directly from DNA in the cell. When the mRNA is copied, it is built from RNA bases, which differ slightly from those of DNA (uracil replaces thymine). The mRNA sequences are known to be unstable and to deteriorate after a short amount of time, making them unsuitable for hybridization reactions. For this reason cDNA, complementary DNA (DNA copied from mRNA by the enzyme reverse transcriptase), is used instead for its ease of use and compatibility (Fortina 2000). Because it is DNA, the cDNA uses the same bases as the original DNA. The cDNA is then tagged with fluorescent markers and applied to an array of cellular cultures.

Figure 3. Microarray plate (Leming 2003, p. 1)

Another type of micro array experimentation uses DNA sequences.
A DNA sequence, again fluorescently tagged, can be used so that the same sequence can be identified in different cells across an array. This identification can be used to understand evolutionary traits across species. A third type uses proteins. Proteins are tagged fluorescently and applied to a slide consisting of an environment with protein-carrying structures. This method can be used to determine the relationships of many proteins at once and to show how they