A Modified Markov Clustering Approach for Protein Sequence Clustering

In this paper we propose a modified Markov clustering algorithm for the efficient clustering of large protein sequence databases, based on previously evaluated sequence similarity criteria. The modification is an exponentially decreasing inflation rate: strong inflation in the first iterations quickly establishes the hard structure of the clusters, and weaker inflation thereafter produces fine partitions. The algorithm was tested and validated on the whole SCOP95 database and on randomly selected 10-50% subsets of it; it generally converges within 12-14 iteration cycles and provides clusters of high quality. Furthermore, a novel generalized formula is given for the inflation operation, and an efficient matrix symmetrization technique is presented, both intended to improve the partition quality at a relatively low additional computational cost. A large graph layout technique is also employed for the efficient visualization of the obtained clusters.
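To make the idea of a decreasing inflation rate concrete, the following is a minimal sketch of standard Markov clustering (expansion followed by inflation) in which the inflation exponent decays exponentially over the iterations. It is an illustration under stated assumptions, not the authors' implementation: the schedule `inflation_rate` and its parameters `r_start`, `r_end`, and `tau` are hypothetical choices, the inflation step is the classical element-wise power rather than the generalized formula mentioned above, and the symmetrization step is omitted because the abstract does not specify it.

```python
import math


def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Classical inflation: element-wise power r, then column normalization."""
    return normalize_columns([[x ** r for x in row] for row in m])


def inflation_rate(t, r_start=3.0, r_end=1.5, tau=3.0):
    """Hypothetical schedule: decays exponentially from r_start toward r_end."""
    return r_end + (r_start - r_end) * math.exp(-t / tau)


def markov_cluster(similarity, max_iter=50, tol=1e-9):
    """Cluster items from a square, non-negative similarity matrix.

    Returns one label per item: the index of the row (attractor) holding
    the largest value in that item's column after the iterations stop.
    """
    n = len(similarity)
    m = normalize_columns([[float(x) for x in row] for row in similarity])
    for t in range(max_iter):
        nxt = inflate(expand(m), inflation_rate(t))
        change = max(abs(nxt[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = nxt
        if change < tol:  # matrix no longer changes: stop iterating
            break
    return [max(range(n), key=lambda i: m[i][j]) for j in range(n)]


# Toy similarity matrix (made-up values): two groups of three items
# joined by a single weak link between items 1 and 3.
demo = [
    [1.00, 0.90, 0.80, 0.00, 0.00, 0.00],
    [0.90, 1.00, 0.85, 0.05, 0.00, 0.00],
    [0.80, 0.85, 1.00, 0.00, 0.00, 0.00],
    [0.00, 0.05, 0.00, 1.00, 0.90, 0.80],
    [0.00, 0.00, 0.00, 0.90, 1.00, 0.85],
    [0.00, 0.00, 0.00, 0.80, 0.85, 1.00],
]
print(markov_cluster(demo))
```

In this sketch the large early exponent suppresses the weak link between the two groups almost immediately, while the smaller later exponent lets the remaining within-group structure settle more gently. Dense Python lists are used only to keep the example self-contained; a database the size of SCOP95 would call for sparse matrices.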
