Exploring the wild birds’ migration data for the disease spread study of H5N1: a clustering and association approach

Knowledge of how migratory bird species use wetlands during their annual life cycle is of great interest to biologists, as it is critically important to decisions such as the siting of conservation areas and the control of avian influenza. The raw data on habitat areas and migration routes, when collected by GPS satellite telemetry, are usually large in scale and highly complex. In this paper, we convert these biological problems into computational ones and introduce efficient algorithms for the data analysis. Our key ideas are hierarchical clustering to localize migration habitats, and association rules to discover migration routes from the scattered location points in the GIS. One of our clustering results is a tree structure, which we call a spatial-tree, that serves as an illustrative map of the breeding and wintering home ranges of bar-headed geese. A related result is an association pattern indicating that the bar-headed geese’s autumn migration routes likely run between the breeding sites at Qinghai Lake, China and the wintering sites in the Tibet river valley. Given the susceptibility of geese to H5N1 and their potential to spread it, and on the basis of the chronology and rates of the bar-headed geese’s migration movements, we conjecture that bar-headed geese play an important role in the spread of the H5N1 virus at a regional scale on the Qinghai-Tibetan Plateau.
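The two ideas named above, distance-based clustering of location fixes and association rules between the resulting sites, can be illustrated with a minimal sketch. It is not the paper's algorithm: it swaps in plain single-linkage clustering cut at a fixed distance, and every name, coordinate, threshold, and bird record below is a made-up assumption for illustration only.

```python
import math
from itertools import combinations

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def single_linkage(points, cut_km):
    """Single-linkage clusters obtained by cutting the dendrogram at cut_km.

    Merging every pair closer than cut_km (via union-find) yields the same
    groups as cutting a single-linkage hierarchy at that height.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if haversine_km(points[i], points[j]) <= cut_km:
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]

def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    n = len(transactions)
    has_a = sum(1 for t in transactions if antecedent in t)
    has_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = has_both / n
    confidence = has_both / has_a if has_a else 0.0
    return support, confidence

# Hypothetical GPS fixes: three near a northern lake, three in a southern valley.
fixes = [(36.90, 100.10), (36.95, 100.20), (37.00, 100.15),
         (29.65, 91.10), (29.60, 91.20), (29.70, 91.15)]
labels = single_linkage(fixes, cut_km=50.0)

# Hypothetical per-bird sets of visited sites (one "transaction" per bird).
birds = [{"breeding", "wintering"}, {"breeding", "wintering"}, {"breeding"},
         {"breeding", "wintering"}, {"wintering"}]
support, confidence = rule_stats(birds, "breeding", "wintering")
```

With these toy inputs, the fixes fall into two clusters, and the rule "breeding site -> wintering site" can be ranked by its support and confidence. A real analysis would need a density- or hierarchy-aware method and time-ordered movements rather than unordered site sets.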
