USNAP: fast unique dense region detection and its application to lung cancer

Abstract

Motivation: Many real-world problems can be modeled as annotated graphs. Because these graphs are large, vary in topology, and carry diverse node and edge annotations, scalable graph algorithms that extract actionable information from them are in demand. Graphs that change over time form dynamic graphs, which make it possible to find patterns across time points. In this article, we introduce a scalable algorithm that finds unique dense regions across time points in dynamic graphs. Such algorithms have applications in many areas, including the biological, financial, and social domains.

Results: This manuscript makes three contributions. First, we designed a scalable algorithm, USNAP, that identifies dense subgraphs unique to a time stamp in a dynamic graph. Importantly, USNAP provides a lower bound on the density measure at each step of its greedy algorithm. Second, insights obtained from validating USNAP on real data show its effectiveness. Although USNAP is domain independent, we applied it to four non-small cell lung cancer gene expression datasets. Stages of non-small cell lung cancer were modeled as dynamic graphs and given as input to USNAP. Pathway enrichment analyses and interpretation based on the literature show that USNAP identified biologically relevant mechanisms for different stages of cancer progression. Third, USNAP is scalable, with a time complexity of O(m + m_c log n_c + n_c log n_c), where m and n are the numbers of edges and vertices in the dynamic graph, and m_c and n_c are the numbers of edges and vertices in the collapsed graph.

Availability and implementation: The code of USNAP is available at https://www.cs.utoronto.ca/~juris/data/USNAP22.
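To make the idea of a greedy dense-subgraph search concrete, the sketch below shows classical greedy peeling on a single static graph: repeatedly remove a minimum-degree vertex and remember the densest intermediate subgraph. This is not the USNAP algorithm, which additionally handles uniqueness across time points and a collapsed graph; the function name `greedy_densest_subgraph`, the edge-list input, and the density measure |E|/|V| are assumptions chosen for illustration only.

```python
import heapq
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Greedy peeling sketch (illustrative; not the USNAP implementation).

    edges: iterable of (u, v) pairs of an undirected graph.
    Returns (node_set, density) with density = |E| / |V| (an assumed measure).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops in this sketch
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    num_nodes = len(adj)
    num_edges = sum(degree.values()) // 2

    # Min-heap keyed by degree; stale entries are skipped when popped.
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)

    removed = set()
    removal_order = []
    best_density = num_edges / num_nodes if num_nodes else 0.0
    best_prefix = 0  # number of removals made before the best snapshot

    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != degree[v]:
            continue  # outdated heap entry
        removed.add(v)
        removal_order.append(v)
        num_edges -= degree[v]
        num_nodes -= 1
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
        if num_nodes > 0:
            density = num_edges / num_nodes
            if density > best_density:
                best_density = density
                best_prefix = len(removal_order)

    best_nodes = set(adj) - set(removal_order[:best_prefix])
    return best_nodes, best_density


# Toy input: a 4-clique on a, b, c, d with one pendant vertex e.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"),
             ("b", "c"), ("b", "d"), ("c", "d"), ("a", "e")]
nodes, density = greedy_densest_subgraph(toy_edges)
print(sorted(nodes), density)
```

On this toy input the pendant vertex is peeled first, leaving the 4-clique as the densest subgraph found. The lazy-deletion heap keeps the sketch short; it is a design choice of this example and makes no claim about how USNAP achieves its stated complexity.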
