SIEGE: Smoking Induced Epithelial Gene Expression Database

The SIEGE (Smoking Induced Epithelial Gene Expression) database is a clinical resource for compiling and analyzing gene expression data from epithelial cells of the human intra-thoracic airway. The database supports a translational research study whose goal is to profile the changes in airway gene expression induced by cigarette smoke. RNA is isolated from airway epithelium obtained at bronchoscopy from current, former and never smokers and hybridized to Affymetrix HG-U133A GeneChips, which measure the expression levels of ∼22 500 human transcripts. Study administrators upload the resulting microarray data, together with relevant patient information, to SIEGE through the database's web interface at http://pulm.bumc.bu.edu/siegeDB. Perl scripts integrated with SIEGE perform quality control functions, including processing, filtering and formatting the stored data. The R statistical package imports expression values from the database and runs statistical analyses including t-tests, correlation coefficients and hierarchical clustering. The results of all statistical analyses can be queried through CGI-based tools and web forms in the ‘Search’ section of the database website. Query results include graphical displays and links to other databases containing valuable gene resources, including Entrez Gene, GO, Biocarta, GeneCards, dbSNP and the NCBI Map Viewer.
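As a rough illustration of the kind of per-gene comparison the abstract describes (t-tests between smoking groups), the following is a minimal Python sketch, not the SIEGE implementation, which uses R. It assumes Welch's unequal-variance form of the t statistic; the abstract does not say which variant SIEGE uses. The gene names, group labels and expression values are hypothetical and are not taken from the database.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    # Standard error of the difference in means, using sample variances.
    std_err = sqrt(variance(group_a) / n_a + variance(group_b) / n_b)
    return (mean(group_a) - mean(group_b)) / std_err


# Hypothetical expression values per gene and smoking group (assumption:
# made-up numbers for illustration only, not SIEGE data).
expression = {
    "GENE_A": {"current": [9.0, 10.0, 11.0], "never": [5.0, 6.0, 7.0]},
    "GENE_B": {"current": [7.0, 8.0, 9.0], "never": [7.0, 8.0, 9.0]},
}

# One t statistic per gene: current smokers versus never smokers.
t_scores = {
    gene: welch_t(groups["current"], groups["never"])
    for gene, groups in expression.items()
}

# Rank genes by the magnitude of the difference between groups.
ranked = sorted(t_scores, key=lambda gene: abs(t_scores[gene]), reverse=True)
```

In a real analysis the t statistics would be converted to p-values and adjusted for testing roughly 22 500 transcripts at once; that step is omitted here.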
