Live and learn from mistakes: A lightweight system for document classification

We present a Life-Long Learning from Mistakes (3LM) algorithm for document classification, which can be used in scenarios such as spam filtering, blog classification, and web resource categorization. We extend the ideas of online clustering and batch-mode centroid-based classification to online learning with negative feedback. 3LM is a competitive learning algorithm that avoids the over-smoothing characteristic of centroid-based classifiers by using a different class representative, which we call a clusterhead. The clusterheads, competing for vector-space dominance, are drawn toward misclassified documents, eventually bringing the model to a "balanced state" for a fixed distribution of documents. Subsequently, the clusterheads oscillate between the misclassified documents, heuristically minimizing the rate of misclassifications, an NP-complete problem. Further, the 3LM algorithm prevents over-fitting by "leashing" the clusterheads to their respective centroids. A clusterhead provably converges if its class can be separated by a hyperplane from all other classes. Lifelong learning with a fixed learning rate allows 3LM to adapt to a possibly changing distribution of the data and to continually learn and unlearn document classes. We report on experiments that demonstrate high accuracy of document classification on the Reuters-21578, OHSUMED, and TREC07p-spam datasets. The 3LM algorithm did not show over-fitting, while consistently outperforming the centroid-based, Naive Bayes, C4.5, AdaBoost, kNN, and SVM classifiers whose accuracy had been reported on the same three corpora.
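The abstract does not state the update rule, so the following is only a minimal sketch of the general idea, not the published 3LM algorithm. It assumes cosine similarity for classification, a running mean as the centroid, a clusterhead that moves toward a misclassified document by a fixed rate, and a "leash" implemented as a maximum Euclidean distance from the centroid. The names `MistakeDrivenClassifier`, `leash_to`, `rate`, and `leash` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity; evaluates to 0.0 if either vector is all zeros."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def leash_to(head, centroid, radius):
    """Assumed leash: pull `head` back so it lies within `radius` of `centroid`."""
    d = math.dist(head, centroid)
    if d <= radius:
        return head
    t = radius / d
    return [c + t * (h - c) for h, c in zip(head, centroid)]


class MistakeDrivenClassifier:
    """Illustrative online classifier with one clusterhead per class."""

    def __init__(self, dim, labels, rate=0.1, leash=0.5):
        self.rate = rate      # fixed learning rate (never decayed)
        self.leash = leash    # assumed maximum head-to-centroid distance
        self.heads = {c: [0.0] * dim for c in labels}
        self.centroids = {c: [0.0] * dim for c in labels}
        self.counts = {c: 0 for c in labels}

    def predict(self, doc):
        # The class whose clusterhead is most similar to the document wins;
        # ties go to the label listed first.
        return max(self.heads, key=lambda c: cosine(self.heads[c], doc))

    def learn(self, doc, label):
        """Predict, then use the true label as feedback. Returns the prediction."""
        doc = normalize(doc)
        predicted = self.predict(doc)

        # Keep a running mean of each class's documents as its centroid.
        self.counts[label] += 1
        n = self.counts[label]
        self.centroids[label] = [
            m + (x - m) / n for m, x in zip(self.centroids[label], doc)
        ]

        if n == 1:
            # First document of a class seeds its clusterhead.
            self.heads[label] = list(doc)
        elif predicted != label:
            # Learn only from mistakes: move the true class's clusterhead
            # toward the misclassified document, then apply the leash.
            moved = [h + self.rate * (x - h) for h, x in zip(self.heads[label], doc)]
            self.heads[label] = leash_to(moved, self.centroids[label], self.leash)
        return predicted
```

Because `rate` stays fixed, later mistakes move a clusterhead as much as earlier ones, which is one simple way to let the model follow a drifting document distribution; the leash keeps any single run of mistakes from dragging the clusterhead far from the bulk of its class.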

[1]  Richard Nock,et al.  Mixed Bregman Clustering with Approximation Guarantees , 2008, ECML/PKDD.

[2]  Deyu Qi,et al.  Web prediction using online support vector machine , 2005, 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'05).

[3]  Kjell Lemström,et al.  COMPRESSING QUANTIZED TONAL CENTROID VECTORS FOR COVER SONG IDENTIFICATION , 2011 .

[4]  Heikki Mannila,et al.  Evaluation of HapMap data in six populations of European descent , 2008, European Journal of Human Genetics.

[5]  V. Mäkinen,et al.  Detection of Viruses in Sweetpotato from Honduras and Guatemala Augmented by Deep-Sequencing of Small-RNAs. , 2012, Plant disease.

[6]  Patrik O. Hoyer,et al.  Supplementary Material for “ Statistical test for consistent estimation of causal effects in linear non-Gaussian models ” , 2012 .

[7]  Esko Ukkonen,et al.  Finding Significant Matches of Position Weight Matrices in Linear Time , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[8]  Dimitrios Gunopulos,et al.  Embedding-based subsequence matching in time-series databases , 2011, TODS.

[9]  Michael Gutmann Unsupervised learning by discriminating data from artificial noise , 2009 .

[10]  Erkki Oja,et al.  Macadamia: Master's Programme in Machine Learning and Data Mining , 2008 .

[11]  Stephan Bloehdorn,et al.  Boosting for Text Classification with Semantic Features , 2004, WebKDD.

[12]  Heikki Mannila,et al.  The diffusion of language change in real time: Progressive and conservative individuals and the time depth of change , 2011, Language Variation and Change.

[13]  Terttu Nevalainen,et al.  CEECing the baseline: lexical stability and significant change in a historical corpus , 2012 .

[14]  Mikko Arvas,et al.  Detecting novel genes with sparse arrays. , 2010, Gene.

[15]  Aapo Hyvärinen,et al.  Characterization of Spontaneous Neuromagnetic Brain Rhythms Using Independent Component Analysis of Short-Time Fourier Transforms , 2010 .

[16]  Jukka M. Toivanen,et al.  Brains on Art: It's not just in your head , 2013 .

[17]  Szymon Grabowski,et al.  Approximate pattern matching with k-mismatches in packed text , 2013, Inf. Process. Lett..

[18]  Jaakko Hollmén,et al.  Pathways affected by asbestos exposure in normal and tumour tissue of lung cancer patients , 2008, BMC Medical Genomics.

[19]  Andreas Björklund,et al.  Covering and packing in linear space , 2010, Inf. Process. Lett..

[20]  Jakub Piskorski,et al.  Real-time text mining in multilingual news for the creation of a pre-frontier intelligence picture , 2010, ISI-KDD '10.

[21]  John Shawe-Taylor,et al.  Can eyes reveal interest? Implicit queries from gaze patterns , 2009, User Modeling and User-Adapted Interaction.

[22]  Heikki Mannila,et al.  Determining Attributes to Maximize Visibility of Objects , 2009, IEEE Transactions on Knowledge and Data Engineering.

[23]  Veli Mäkinen,et al.  Storage and Retrieval of Individual Genomes and other Repetitive Sequence Collections , 2008 .

[24]  Jyrki Kullaa,et al.  Three-way analysis of Structural Health Monitoring data , 2010, 2010 IEEE International Workshop on Machine Learning for Signal Processing.

[25]  Szymon Grabowski,et al.  New algorithms for binary jumbled pattern matching , 2013, Inf. Process. Lett..

[26]  Jaakko Hollmén,et al.  Selection of important input variables for RBF network using partial derivatives , 2008, ESANN.

[27]  Pauli Miettinen,et al.  A Case of Visual and Interactive Data Analysis: Geospatial Redescription Mining , 2012 .

[28]  Yeow Meng Chee,et al.  An enumeration of graphical designs , 2007, 0712.3895.

[29]  Aristides Gionis,et al.  A randomized approximation algorithm for computing bucket orders , 2009, Inf. Process. Lett..

[30]  Juha Kärkkäinen,et al.  Indexed Multi-pattern Matching , 2012, LATIN.

[31]  Antti Ukkonen Mining Local Correlation Patterns in Sets of Sequences , 2009, Discovery Science.

[32]  Nikolaj Tatti Itemsets for Real-Valued Datasets , 2013, 2013 IEEE 13th International Conference on Data Mining.

[33]  Esa Junttila,et al.  Patterns in permuted binary matrices , 2011 .

[34]  Heikki Mannila,et al.  The living and the fossilized: how well do unevenly distributed points capture the faunal information in a grid? , 2010 .

[35]  Wilhelmiina Hämäläinen,et al.  Efficient search for statistically significant dependency rules in binary data , 2010 .

[36]  Tapio Lokki,et al.  Canonical analysis of individual vocabulary profiling data , 2010, 2010 Second International Workshop on Quality of Multimedia Experience (QoMEX).

[37]  Juha Kärkkäinen,et al.  Medium-Space Algorithms for Inverse BWT , 2010, ESA.

[38]  Lidia Pivovarova,et al.  Adapting the PULS event extraction framework to analyze Russian text , 2013, BSNLP@ACL.

[39]  Juha Kärkkäinen,et al.  Grammar Precompression Speeds Up Burrows-Wheeler Compression , 2012, SPIRE.

[40]  Hannu Toivonen,et al.  A model for mining relevant and non-redundant information , 2012, SAC '12.

[41]  Jilles Vreeken,et al.  Finding Good Itemsets by Packing Data , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[42]  Jaakko Hollmén,et al.  Multi-year network level road maintenance programming by genetic algorithms and variable neighbourhood search , 2010, 13th International IEEE Conference on Intelligent Transportation Systems.

[43]  Jaakko Hollmén,et al.  Feature Extraction and Selection from Vibration Measurements for Structural Health Monitoring , 2009, IDA.

[44]  Aapo Hyvärinen,et al.  Learning a selectivity-invariance-selectivity feature extraction architecture for images , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[45]  Alessandro Valitutti Creative Coding for Humor Design: A Preliminary Exploration , 2012 .

[46]  Aapo Hyvärinen,et al.  Estimating Markov Random Field Potentials for Natural Images , 2009, ICA.

[47]  Aapo Hyvärinen,et al.  On the learning of nonlinear visual features from natural images by optimizing response energies , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[48]  B. Ylstra,et al.  Genomic Profiles Associated with Early Micrometastasis in Lung Cancer: Relevance of 4q Deletion , 2009, Clinical Cancer Research.

[49]  Aapo Hyvärinen,et al.  Learning Features by Contrasting Natural Images with Noise , 2009, ICANN.

[50]  Hannu Toivonen,et al.  On Creative Uses of Word Associations , 2012, SMPS.

[51]  Mikko Koivisto,et al.  Partial Order MCMC for Structure Discovery in Bayesian Networks , 2011, UAI.

[52]  Aapo Hyvärinen,et al.  Pairwise likelihood ratios for estimation of non-Gaussian structural equation models , 2013, J. Mach. Learn. Res..

[53]  Patric R. J. Östergård,et al.  Properties of the Steiner Triple Systems of Order 19 , 2010, Electron. J. Comb..

[54]  Niko Välimäki,et al.  Applications of Compressed Data Structures on Sequences and Structured Data , 2012 .

[55]  H. Parkinson,et al.  A global map of human gene expression , 2010, Nature Biotechnology.

[56]  Mohit Singh,et al.  Set Covering with our Eyes Closed , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[57]  Jouni Sirén Sampled Longest Common Prefix Array , 2010, CPM.

[58]  Tapani Raiko,et al.  Learning mixture models - courseware for finite mixture distributions of multivariate Bernoulli distributions , 2008 .

[59]  Martin Vingron,et al.  Integrating sequence, evolution and functional genomics in regulatory genomics , 2009, Genome Biology.

[60]  Aapo Hyvärinen,et al.  Representation of Cross-Frequency Spatial Phase Relationships in Human Visual Cortex , 2009, The Journal of Neuroscience.

[61]  Panagiotis Papapetrou,et al.  IBSM: Interval-Based Sequence Matching , 2013, SDM.

[62]  Aapo Hyvärinen ESTIMATION THEORY AND INFORMATION GEOMETRY BASED ON DENOIS ING , 2008 .

[63]  Fang Zhou,et al.  Compression of weighted graphs , 2011, KDD.

[64]  Pekka Orponen,et al.  Circumspect descent prevails in solving random constraint satisfaction problems , 2007, Proceedings of the National Academy of Sciences.

[65]  Susan T. Dumais,et al.  Using latent semantic analysis to improve information retrieval , 1988, CHI 1988.

[66]  Teppo E. Ahonen Compression-Based Clustering of Chromagram Data: New Method and Representations , 2012 .

[67]  Dimitri P. Bertsekas,et al.  Convergence Results for Some Temporal Difference Methods Based on Least Squares , 2009, IEEE Transactions on Automatic Control.

[68]  Hannu Toivonen,et al.  Predicting and preventing student failure - using the k-nearest neighbour method to predict student performance in an online course environment , 2010, Int. J. Learn. Technol..

[69]  Mika Sulkava,et al.  Comparative Analysis of Power Consumption in University Buildings Using envSOM , 2011, IDA.

[70]  Kjell Lemström,et al.  Compression-based Similarity Measures in Symbolic, Polyphonic Music , 2011, ISMIR.

[71]  Patrik O. Hoyer,et al.  Causal Search in Structural Vector Autoregressive Models , 2009, NIPS Mini-Symposium on Causality in Time Series.

[72]  Petteri Kaski,et al.  Tight Local Approximation Results for Max-Min Linear Programs , 2008, ALGOSENSORS.

[73]  Pauli Miettinen,et al.  On the Positive-Negative Partial Set Cover problem , 2008, Inf. Process. Lett..

[74]  Gemma C. Garriga,et al.  Permutation Tests for Studying Classifier Performance , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[75]  Hannu Toivonen,et al.  Fast Discovery of Reliable k-terminal Subgraphs , 2010, PAKDD.

[76]  Susan T. Dumais,et al.  A Bayesian Approach to Filtering Junk E-Mail , 1998, AAAI 1998.

[77]  Panagiotis Papapetrou,et al.  Visually Controllable Data Mining Methods , 2010, 2010 IEEE International Conference on Data Mining Workshops.

[78]  Lidia Pivovarova,et al.  Automatic Detection of Stable Grammatical Features in N-Grams , 2013, MWE@NAACL-HLT.

[79]  H Mannila,et al.  Transposition and time-scale invariant geometric music retrieval , .

[80]  Veli Mäkinen,et al.  Combinatorial Approaches for Mass Spectra Recalibration , 2005, IEEE ACM Trans. Comput. Biol. Bioinform..

[81]  Frederick Eberhardt,et al.  Learning linear cyclic causal models with latent variables , 2012, J. Mach. Learn. Res..

[82]  Gonzalo Navarro,et al.  Fast in-memory XPath search using compressed indexes , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[83]  Dimitri P. Bertsekas,et al.  Distributed asynchronous policy iteration in dynamic programming , 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[84]  Aapo Hyvärinen,et al.  Causal modelling combining instantaneous and lagged effects: an identifiable model based on non-Gaussianity , 2008, ICML '08.

[85]  Aapo Hyvärinen,et al.  Testing Independent Component Patterns by Inter-Subject or Inter-Session Consistency , 2013, Front. Hum. Neurosci..

[86]  Gonzalo Navarro,et al.  Dynamic entropy-compressed sequences and full-text indexes , 2006, TALG.

[87]  George Karypis,et al.  Centroid-Based Document Classification: Analysis and Experimental Results , 2000, PKDD.

[88]  Bernhard Schölkopf,et al.  Telling cause from effect based on high-dimensional observations , 2009, ICML.

[89]  Alberto Apostolico,et al.  Efficient algorithms for the discovery of gapped factors , 2011, Algorithms for Molecular Biology.

[90]  Joseph S. B. Mitchell,et al.  Minimum-perimeter enclosures , 2008, Inf. Process. Lett..

[91]  Michael A. Arbib,et al.  The handbook of brain theory and neural networks , 1995, A Bradford book.

[92]  Fang Zhou,et al.  A Framework for Path-Oriented Network Simplification , 2010, IDA.

[93]  Aristides Gionis,et al.  Searching the wikipedia with contextual information , 2008, CIKM '08.

[94]  Petri Kontkanen,et al.  Construction of irregular histograms by penalized maximum likelihood: A comparative study , 2012, 2012 IEEE Information Theory Workshop.

[95]  Juha Kärkkäinen,et al.  Suffix Array Construction , 2008, Encyclopedia of Algorithms.

[96]  Panagiotis Papapetrou,et al.  Size Matters: Finding the Most Informative Set of Window Lengths , 2012, ECML/PKDD.

[97]  Kerkko Luosto,et al.  The normalized maximum likelihood distribution of the multinomial model class with positive maximum likelihood parameters , 2012 .

[98]  Veli Mäkinen,et al.  Filtering methods for content-based retrieval on indexed symbolic music databases , 2010, Information Retrieval.

[99]  Juho Rousu,et al.  Structured Output Prediction of Anti-cancer Drug Activity , 2010, PRIB.

[100]  Hannu Toivonen,et al.  Team Association Analysis for Named Entity Filtering , 2012, TREC.

[101]  Aapo Hyvärinen Pairwise Measures of Causal Direction in Linear Non-Gaussian Acyclic Models , 2010, ACML.

[102]  Oskar Gross,et al.  ArNePo: Arts, News & Poetry , 2013 .

[103]  Hannu Toivonen Basket Analysis , 2010, Encyclopedia of Machine Learning.

[104]  Juha Kärkkäinen,et al.  Linear Time Lempel-Ziv Factorization: Simple, Fast, Small , 2012, CPM.

[105]  Markus Ojala,et al.  Assessing Data Mining Results on Matrices with Randomization , 2010, 2010 IEEE International Conference on Data Mining.

[106]  Heikki Mannila,et al.  Gaussian Clusters and Noise: An Approach Based on the Minimum Description Length Principle , 2010, Discovery Science.

[107]  Shi Zhong,et al.  Efficient online spherical k-means clustering , 2005, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005..

[108]  Hannu Toivonen,et al.  Discovery of Novel Term Associations in a Document Collection , 2012, Bisociative Knowledge Discovery.

[109]  Aapo Hyvärinen,et al.  Estimating exogenous variables in data with more variables than observations , 2011, Neural Networks.

[110]  Dimitrios Gunopulos,et al.  Finding representative objects using link analysis ranking , 2012, PETRA '12.

[111]  Gonzalo Navarro,et al.  Document Listing on Repetitive Collections , 2013, CPM.

[112]  S. Gries Dispersions and adjusted frequencies in corpora , 2008 .

[113]  Heikki Mannila,et al.  Phasing genotypes using a hidden Markov model , 2008 .

[114]  Tobias Andersson Granberg,et al.  Socially optimal allocation of ATM resources via truthful market-based mechanisms , 2012 .

[115]  Aapo Hyvärinen,et al.  Learning reconstruction and prediction of natural stimuli by a population of spiking neurons , 2009, ESANN.

[116]  Aapo Hyvärinen,et al.  Structural equations and divisive normalization for energy-dependent component analysis , 2011, NIPS.

[117]  Valentin Polishchuk,et al.  Periodic Multi-labeling of Public Transit Lines , 2010, GIScience.

[118]  D. Sculley,et al.  Relaxed online SVMs for spam filtering , 2007, SIGIR.

[119]  H. Norppa,et al.  Aberrations of chromosome 19 in asbestos-associated lung cancer and in asbestos-induced micronuclei of bronchial epithelial cells in vitro. , 2008, Carcinogenesis.

[120]  J S Brownstein,et al.  An overview of internet biosurveillance. , 2013, Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases.

[121]  Valentin Polishchuk,et al.  Flexible Airlane Generation to Maximize Flow Under Hard and Soft Constraints , 2011 .

[122]  Johannes Fischer,et al.  Space Efficient String Mining under Frequency Constraints , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[123]  Juho Rousu,et al.  Minimum Mutation Algorithm for Gapless Metabolic Network Evolution , 2011, BIOINFORMATICS.

[124]  Markus Ojala,et al.  Randomization algorithms for assessing the significance of data mining results , 2011 .

[125]  Heikki Mannila,et al.  Correlations and Co-Occurrences of Taxa: the Role of Temporal, Geographic and Taxonomic Restrictions , 2011 .

[126]  Joseph S. B. Mitchell,et al.  Distributed localization and clustering using data correlation and the Occam's razor principle , 2011, 2011 International Conference on Distributed Computing in Sensor Systems and Workshops (DCOSS).

[127]  Leena Salmela Merkkijonoalgoritmeja monen hahmon hakuun , 2010 .

[128]  Thomas Whitington,et al.  Transcription Factor Binding in Human Cells Occurs in Dense Clusters Formed around Cohesin Anchor Sites , 2013, Cell.

[129]  Heikki Mannila,et al.  Randomization techniques for assessing the significance of gene periodicity results , 2011, BMC Bioinformatics.

[130]  Kai Puolamäki,et al.  Dental functional traits of mammals resolve productivity in terrestrial ecosystems past and present , 2012, Proceedings of the Royal Society B: Biological Sciences.

[131]  Kai Puolamäki,et al.  Precipitation and large herbivorous mammals II: application to fossil data , 2010 .

[132]  Petr A. Golovach,et al.  On the Parameterized Complexity of Cutting a Few Vertices from a Graph , 2013, MFCS.

[133]  Jakub Piskorski,et al.  Information Extraction: Past, Present and Future , 2013, Multi-source, Multilingual Information Extraction and Summarization.

[134]  Inderjit S. Dhillon,et al.  Concept Decompositions for Large Sparse Text Data Using Clustering , 2004, Machine Learning.

[135]  Frederick Eberhardt,et al.  Experiment selection for causal discovery , 2013, J. Mach. Learn. Res..

[136]  Heikki Mannila,et al.  Evaluation of BIC and Cross Validation for model selection on sequence segmentations , 2010, Int. J. Data Min. Bioinform..

[137]  Aapo Hyvärinen,et al.  Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity , 2010, J. Mach. Learn. Res..

[138]  Antti Laaksonen Efficient and Simple Algorithms for Time-Scaled and Time-Warped Music Search , 2013 .

[139]  Juho Rousu,et al.  Multilabel classification through random graph ensembles , 2014, Machine Learning.

[140]  D. Bertsekas,et al.  Journal of Computational and Applied Mathematics Projected Equation Methods for Approximate Solution of Large Linear Systems , 2022 .

[141]  Hannes Heikinheimo Extending data mining techniques for frequent pattern discovery: Trees, low-entropy sets, and crossmining , 2010 .

[142]  Jaakko Hollmén,et al.  Analyzing subjective expert opinions about standardization of tree-ring series , 2010 .

[143]  Gonzalo Navarro,et al.  Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections , 2008, SPIRE.

[144]  Matti Vuorinen,et al.  Assessment of Utility in Web Mining for the Domain of Public Health , 2010, Louhi@NAACL-HLT.

[145]  Esther M. Arkin,et al.  Scandinavian Thins on Top of Cake: On the Smallest One-Size-Fits-All Box , 2012, FUN.

[146]  Hannu Toivonen,et al.  “Let Everything Turn Well in Your Wife”: Generation of Adult Humor Using Lexical Constraints , 2013, ACL.

[147]  Gemma C. Garriga,et al.  Randomization Techniques for Graphs , 2009, SDM.

[148]  Gonzalo Navarro,et al.  Faster entropy-bounded compressed suffix trees , 2009, Theor. Comput. Sci..

[149]  Veli Mäkinen,et al.  Unified View of Backward Backtracking in Short Read Mapping , 2010, Algorithms and Applications.

[150]  Mikko Koivisto,et al.  Partitioning into Sets of Bounded Cardinality , 2009, IWPEC.

[151]  Marko Salmenkivi Frequent Pattern , 2008, Encyclopedia of GIS.

[152]  Aapo Hyvärinen,et al.  Causal discovery of linear acyclic models with arbitrary distributions , 2008, UAI.

[153]  Oyer,et al.  Causal Inference by Independent Component Analysis: Theory and Applications∗ , 2012 .

[154]  Alex Norta,et al.  Utility Evaluation of Tools for Collaborative Development and Maintenance of Ontologies , 2010, 2010 14th IEEE International Enterprise Distributed Object Computing Conference Workshops.

[155]  Valentin Polishchuk,et al.  Local 3-approximation algorithms for weighted dominating set and vertex cover in quasi unit-disk graphs , 2008 .

[156]  Peter Sanders,et al.  Better external memory suffix array construction , 2008, JEAL.

[157]  Irina Rish,et al.  An empirical study of the naive Bayes classifier , 2001 .

[158]  Juho Rousu,et al.  Towards structured output prediction of enzyme function , 2008, BMC proceedings.

[159]  Dimitri P. Bertsekas,et al.  Q-learning and enhanced policy iteration in discounted dynamic programming , 2010, 49th IEEE Conference on Decision and Control (CDC).

[160]  Esko Ukkonen,et al.  Fast scaffolding with small independent mixed integer programs , 2011, Bioinform..

[161]  Heikki Mannila,et al.  Approximating the Minimum Chain Completion problem , 2009, Inf. Process. Lett..

[162]  Kjell Lemström Towards More Robust Geometric Content-Based Music Retrieval , 2010, ISMIR.

[163]  Hannu Toivonen,et al.  Software Newsroom – an approach to automation of news search and editing , 2013 .

[164]  Aapo Hyvärinen,et al.  Testing the ICA mixing matrix based on inter-subject or inter-session consistency , 2011, NeuroImage.

[165]  Eliza Congdon,et al.  Early Environment and Neurobehavioral Development Predict Adult Temperament Clusters , 2012, PloS one.

[166]  Dimitrios Gunopulos,et al.  Hum-a-song: A Subsequence Matching with Gaps-Range-Tolerances Query-By-Humming System , 2012, Proc. VLDB Endow..

[167]  Niko Vuokko,et al.  Consecutive Ones Property and Spectral Ordering , 2010, SDM.

[168]  Aapo Hyvärinen,et al.  Learning Topographic Representations for Linearly Correlated Components , 2011 .

[169]  Esther Galbrun,et al.  Topical organization of user comments and application to content recommendation , 2013, WWW '13 Companion.

[170]  Dieter Merkl Document Classification with Self-Organizing Maps , 1999 .

[171]  Hannu Toivonen,et al.  Named Entity Filtering Based on Concept Association Graphs , 2013, Res. Comput. Sci..

[172]  Michael Biehl,et al.  Dynamics of on-line competitive learning , 1997 .

[173]  Mika Sulkava,et al.  Learning from environmental data : methods for analysis of forest nutrition time series , 2008 .

[174]  Antoine Doucet,et al.  XML-aided phrase indexing for hypertext documents , 2008, SIGIR '08.

[175]  Jouni Sirén,et al.  Compressed Suffix Arrays for Massive Data , 2009, SPIRE.

[176]  Simon J. Puglisi,et al.  Lempel-Ziv factorization: Simple, fast, practical , 2013, ALENEX.

[177]  Fang Zhou,et al.  Network Simplification with Minimal Loss of Connectivity , 2010, 2010 IEEE International Conference on Data Mining.

[178]  Jussi Kollin,et al.  Computational Methods for Detecting Large-Scale Chromosome Rearrangements in SNP Data , 2010 .

[179]  Pasi Rastas,et al.  A General Framework for Local Pairwise Alignment Statistics with Gaps , 2009, WABI.

[180]  Jouni Sirén,et al.  Compressed Full-Text Indexes for Highly Repetitive Collections , 2012 .

[181]  Juho Rousu,et al.  Metabolite Identification through Machine Learning — Tackling CASMI Challenge Using FingerID , 2013, Metabolites.

[182]  Maxime Crochemore,et al.  Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching , 1992 .

[183]  Ilkka Autio,et al.  Modeling Efficient Classification as a Process of Confidence Assessment and Delegation , 2008 .

[184]  Hannu Toivonen,et al.  Finding reliable subgraphs from large probabilistic graphs , 2008, Data Mining and Knowledge Discovery.

[185]  Heikki Mannila,et al.  Randomization methods for assessing data analysis results on real‐valued matrices , 2009, Stat. Anal. Data Min..

[186]  Aristides Gionis,et al.  Algorithms for unimodal segmentation with applications to unimodality detection , 2006, Knowledge and Information Systems.

[187]  Mikko Koivisto,et al.  Exact Structure Discovery in Bayesian Networks with Less Space , 2009, UAI.

[188]  Markus Reichstein,et al.  The European carbon balance. Part 3: forests , 2010 .

[189]  Thomas Zeugmann,et al.  Proceedings of the 22nd international conference on Algorithmic learning theory , 1995 .

[190]  Aapo Hyvärinen,et al.  A General Linear Non-Gaussian State-Space Model , 2011, ACML.

[191]  Aapo Hyvärinen,et al.  Optimal Approximation of Signal Priors , 2008, Neural Computation.

[192]  Andreas Björklund,et al.  The traveling salesman problem in bounded degree graphs , 2012, TALG.

[193]  P. Hoyer,et al.  On Causal Discovery from Time Series Data using FCI , 2010 .

[194]  Travis Gagie On the Value of Multiple Read/Write Streams for Data Compression , 2013, Information Theory, Combinatorics, and Search Theory.

[195]  Peter Bak,et al.  Visual Analytics for Spatial Clustering: Using a Heuristic Approach for Guided Exploration , 2013, IEEE Transactions on Visualization and Computer Graphics.

[196]  Hannu Toivonen,et al.  Harnessing Constraint Programming for Poetry Composition , 2013, ICCC.

[197]  Aapo Hyvärinen,et al.  Independent component analysis of short-time Fourier transforms for spontaneous EEG/MEG analysis , 2010, NeuroImage.

[198]  Manfred Opper,et al.  A Bayesian approach to on-line learning , 1999 .

[199]  Panagiotis Papapetrou,et al.  Tracking your steps on the track: body sensor recordings of a controlled walking experiment , 2010, PETRA '10.

[200]  Esther M. Arkin,et al.  Maximum thick paths in static and dynamic environments , 2008, SCG '08.

[201]  Samuel Kaski,et al.  Learning to learn implicit queries from gaze patterns , 2008, ICML '08.

[202]  Sach Mukherjee Multiple Hypothesis Testing for Data Mining , 2009, Encyclopedia of Data Warehousing and Mining.

[203]  Petri Kontkanen,et al.  On the performance of histogram-based entropy estimators , 2012, 2012 IEEE International Workshop on Machine Learning for Signal Processing.

[204]  Valentin Polishchuk,et al.  Shape approximation using k-order alpha-hulls , 2010, SoCG '10.

[205]  Petteri Kaski,et al.  Approximating max-min linear programs with local algorithms , 2007, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[206]  Jukka-Pekka Kauppi,et al.  Inter-Subject Correlation in fMRI: Method Validation against Stimulus-Model Based Analysis , 2012, PloS one.

[207]  Mikko Koivisto,et al.  Bayesian structure discovery in Bayesian networks with less space , 2010, AISTATS.

[208]  Sami Hanhijärvi Multiple Hypothesis Testing in Pattern Discovery , 2011, Discovery Science.

[209]  Wing-Kai Hon,et al.  New Algorithms for Position Heaps , 2013, CPM.

[210]  Andreas Björklund,et al.  Fast zeta transforms for point lattices , 2012, SODA 2012.

[211]  Juho Rousu,et al.  Reaction Kernels - Structured Output Prediction Approaches for Novel Enzyme Function , 2018, BIOINFORMATICS.

[212]  Alexandru I. Tomescu,et al.  Motif matching using gapped patterns , 2014, Theor. Comput. Sci..

[213]  Gemma C. Garriga,et al.  Banded structure in binary matrices , 2008, Knowledge and Information Systems.

[214]  Juha Kärkkäinen,et al.  Permuted Longest-Common-Prefix Array , 2009, CPM.

[215]  Hannu Toivonen,et al.  Finding Representative Nodes in Probabilistic Graphs , 2012, Bisociative Knowledge Discovery.

[216]  Kimmo Hätönen,et al.  Data mining for telecommunications network log analysis , 2009 .

[217]  Dimitri P. Bertsekas,et al.  A Unifying Polyhedral Approximation Framework for Convex Optimization , 2011, SIAM J. Optim..

[218]  Moshe Lewenstein,et al.  Forbidden Patterns , 2012, LATIN.

[219]  Frederick Eberhardt,et al.  Causal discovery for linear cyclic models with latent variables , 2010 .

[220]  K. Puolamäki,et al.  Precipitation and large herbivorous mammals I: estimates from present-day communities , 2010 .

[221]  Esko Ukkonen,et al.  Mining the UKIDSS GPS: Star Formation and Embedded Clusters , 2012 .

[222]  Aapo Hyvärinen,et al.  Independent component analysis: recent advances , 2013, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[223]  Aapo Hyvärinen,et al.  Unsupervised learning of dependencies between local luminance and contrast in natural images , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[224]  Panagiotis Papapetrou,et al.  ARTEMIS: Assessing the Similarity of Event-Interval Sequences , 2011, ECML/PKDD.

[225]  Veli Mäkinen,et al.  Normalized N50 assembly metric using gap-restricted co-linear chaining , 2022 .

[226]  Teppo E. Ahonen Combining Chroma Features For Cover Version Identification , 2010, ISMIR.

[227]  Dilip Kumar,et al.  EECHE: energy-efficient cluster head election protocol for heterogeneous wireless sensor networks , 2009, ICAC3 '09.

[228]  Heikki Mannila,et al.  Lower Extinction Risk in Sleep‐or‐Hide Mammals , 2008, The American Naturalist.

[229]  Pauli Miettinen,et al.  Interpretable nonnegative matrix decompositions , 2008, KDD.

[230]  Patrik O. Hoyer,et al.  Data-driven covariate selection for nonparametric estimation of causal effects , 2013, AISTATS.

[231]  Antoine Doucet,et al.  Filtering news for epidemic surveillance: towards processing more languages with fewer resources , 2010 .

[232]  Teppo Niinimaki,et al.  Local Structure Discovery in Bayesian Networks , 2012, UAI.

[233]  Yoshua Bengio,et al.  Convergence Properties of the K-Means Algorithms , 1994, NIPS.

[234]  Jayanta Basak,et al.  Online Adaptive Decision Trees: Pattern Classification and Function Approximation , 2006, Neural Computation.

[235]  Indre Zliobaite,et al.  Fault Tolerant Regression for Sensor Data , 2013, ECML/PKDD.

[236]  Fang Zhou,et al.  Simplification of Networks by Edge Pruning , 2012, Bisociative Knowledge Discovery.

[237]  Sami Kilpinen,et al.  Application of Active and Kinase-Deficient Kinome Collection for Identification of Kinases Regulating Hedgehog Signaling , 2008, Cell.

[238]  Esther M. Arkin,et al.  Data transmission and base-station placement for optimizing network lifetime , 2010, DIALM-POMC '10.

[239]  Hwee Tou Ng,et al.  Bayesian online classifiers for text classification and filtering , 2002, SIGIR '02.

[240]  Jan Schröder,et al.  BIOINFORMATICS ORIGINAL PAPER , 2022 .

[241]  Aapo Hyvärinen,et al.  Causality Discovery with Additive Disturbances: An Information-Theoretical Perspective , 2009, ECML/PKDD.

[242]  Pauli Miettinen,et al.  The Boolean column and column-row matrix decompositions , 2008, Data Mining and Knowledge Discovery.

[243]  Hannu Toivonen,et al.  Apriori Algorithm , 2010, Encyclopedia of Machine Learning.

[244]  Antoine Bordes,et al.  The Huller: A Simple and Efficient Online SVM , 2005, ECML.

[245]  A Mawudeku,et al.  Landscape of international event-based biosurveillance , 2010, Emerging health threats journal.

[246]  Andreas Björklund,et al.  Trimmed Moebius Inversion and Graphs of Bounded Degree , 2008, Theory of Computing Systems.

[247]  Mikko Koivisto,et al.  Finding Efficient Circuits for Ensemble Computation , 2012, SAT.

[248]  Salla Ruosaari,et al.  Microarrays in Lung Cancer Research: From Comparative Analyses to Verified Findings , 2008 .

[249]  Alon Efrat,et al.  Optimization Schemes for Protective Jamming , 2012, MobiHoc '12.

[250]  Karl Rihaczek,et al.  What Is Data Mining? , 2019, Data Mining for the Social Sciences.

[251]  Joydeep Ghosh,et al.  Generative Model-based Document Clustering: a Comparative Study , 2003 .

[252]  Valentin Polishchuk,et al.  Almost Stable Matchings by Truncating the Gale–Shapley Algorithm , 2009, Algorithmica.

[253]  Heikki Mannila,et al.  Higher origination and extinction rates in larger mammals , 2008, Proceedings of the National Academy of Sciences.

[254]  Petteri Kaski,et al.  Local Approximability of Max-Min and Min-Max Linear Programs , 2010, Theory of Computing Systems.

[255]  Pekka Parviainen,et al.  Algorithms for Exact Structure Discovery in Bayesian Networks , 2012 .

[256]  Panagiotis Papapetrou,et al.  Analyzing Word Frequencies in Large Text Corpora Using Inter-arrival Times and Bootstrapping , 2011, ECML/PKDD.

[257]  Aapo Hyvärinen,et al.  Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics , 2012, J. Mach. Learn. Res..

[258]  Mika Sulkava,et al.  Photosynthesis, temperature and radial growth of Scots pine in northern Finland: identifying the influential time intervals , 2011, Trees.

[259]  Petteri Kaski,et al.  Significance of Patterns in Time Series Collections , 2011, SDM.

[260]  Esko Ukkonen,et al.  The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling , 2009, Nature Genetics.

[261]  Aapo Hyvärinen,et al.  A Family of Computationally Efficient and Simple Estimators for Unnormalized Statistical Models , 2010, UAI.

[262]  Roman Yangarber,et al.  Hidden Markov Models for Induction of Morphological Structure of Natural Language , 2010 .

[263]  Jason D. M. Rennie ifile: An Application of Machine Learning to E-Mail Filtering , 2000 .

[264]  Hannu Toivonen,et al.  Biomine: A Network-Structured Resource of Biological Entities for Link Prediction , 2012, Bisociative Knowledge Discovery.

[265]  Aapo Hyvärinen,et al.  Correlated topographic analysis: estimating an ordering of correlated components , 2013, Machine Learning.

[266]  Antti Ukkonen,et al.  Example-dependent Basis Vector Selection for Kernel-Based Classifiers , 2010, ECML/PKDD.

[267]  Roman Yangarber,et al.  A Database of the Uralic Language Family for Etymological Research , 2008 .

[268]  Jakub Piskorski,et al.  News mining for border security Intelligence , 2010, 2010 IEEE International Conference on Intelligence and Security Informatics.

[269]  Mikko Koivisto,et al.  Fast Bayesian Haplotype Inference Via Context Tree Weighting , 2008, WABI.

[270]  Jaakko Hollmén,et al.  Functional prediction of unidentified lipids using supervised classifiers , 2010, Metabolomics.

[271]  Fang Zhou,et al.  Review of network abstraction techniques , 2009 .

[272]  Hannu Toivonen,et al.  Effective Pruning for the Discovery of Conditional Functional Dependencies , 2013, Comput. J..

[273]  Heikki Mannila,et al.  Mixture Model Clustering of Phenotype Features Reveals Evidence for Association of DTNBP1 to a Specific Subtype of Schizophrenia , 2009, Biological Psychiatry.

[274]  Andreas Björklund,et al.  The Travelling Salesman Problem in Bounded Degree Graphs , 2008, ICALP.

[275]  Juho Rousu,et al.  BMC Systems Biology BioMed Central Methodology article , 2009 .

[276]  Heikki Mannila,et al.  Finding Links and Initiators: A Graph-Reconstruction Problem , 2009, SDM.

[277]  Heikki Mannila,et al.  Randomization of real-valued matrices for assessing the significance of data mining results , 2008, SDM.

[278]  Gonzalo Navarro,et al.  An(other) Entropy-Bounded Compressed Suffix Tree , 2008, CPM.

[279]  Eliza Congdon,et al.  Temperament Clusters in a Normal Population: Implications for Health and Disease , 2012, PloS one.

[280]  Aapo Hyvärinen,et al.  A direct method for estimating a causal ordering in a linear non-Gaussian acyclic model , 2009, UAI.

[281]  Juho Rousu,et al.  Metabolite identification and molecular fingerprint prediction through machine learning , 2012, Bioinform..

[282]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[283]  Joonas Paalasmaa,et al.  Biomusic performance, Gallery Jade, The Night of the Arts , 2013 .

[284]  Heikki Mannila,et al.  The Effect of Scale, Climate and Environment on Species Richness and Spatial Distribution of Finnish Birds , 2011 .

[285]  Dino Ienco,et al.  Clustering Based Active Learning for Evolving Data Streams , 2013, Discovery Science.

[286]  Joseph S. B. Mitchell,et al.  Planning Routes with Wiggle Room in En Route Weather- Impacted Airspaces , 2009 .

[287]  Heikki Mannila,et al.  Determining significance of pairwise co-occurrences of events in bursty sequences , 2008, BMC Bioinformatics.

[288]  Juho Rousu,et al.  Efficient Path Kernels for Reaction Function Prediction , 2012, BIOINFORMATICS.

[289]  Gemma C. Garriga,et al.  Randomization Techniques for Statistical Significance Testing on Graphs , 2008 .

[290]  Heikki Mannila,et al.  Complexity control in a mixture model by the Hardy-Weinberg equilibrium , 2009, Comput. Stat. Data Anal..

[291]  Aapo Hyvärinen,et al.  Source Separation and Higher-Order Causal Analysis of MEG and EEG , 2010, UAI.

[292]  Patric R. J. Östergård,et al.  The Cycle Switching Graph of the Steiner Triple Systems of Order 19 is Connected , 2011, Graphs Comb..

[293]  S. Knuutila,et al.  Classification of human cancers based on DNA copy number amplification modeling , 2008, BMC Medical Genomics.

[294]  Ralf Steinberger,et al.  Automatic Epidemiological Surveillance from On-line News in MedISys and PULS , 2009 .

[295]  I. King,et al.  Competitive Learning Clustering for Information Retrieval in Image Databases , 1997, ICONIP.

[296]  Patrik O. Hoyer,et al.  Estimating a Causal Order among Groups of Variables in Linear Models , 2012, ICANN.

[297]  Wolfgang Gerlach,et al.  Engineering a compressed suffix tree implementation , 2007, JEAL.

[298]  Hannu Toivonen,et al.  Decomposition and Distribution of Humorous Effect in Interactive Systems , 2012, AAAI Fall Symposium: Artificial Intelligence of Humor.

[299]  Mika Sulkava,et al.  Effects of daily temperature and photosynthetic production on growth variation of Scots pine in northern Finland , 2009 .

[300]  Markku Saloheimo,et al.  13C-metabolic flux ratio and novel carbon path analyses confirmed that Trichoderma reesei uses primarily the respirative pathway also on the preferred carbon source glucose , 2009, BMC Systems Biology.

[301]  Susana Ladra,et al.  Approximate All-Pairs Suffix/Prefix Overlaps , 2010, CPM.

[302]  Nada Lavrac,et al.  SegMine workflows for semantic microarray data analysis in Orange4WS , 2011, BMC Bioinformatics.

[303]  C. Campi,et al.  Estimating the whole bone-marrow asset in humans by a computational approach to integrated PET/CT imaging , 2012, European Journal of Nuclear Medicine and Molecular Imaging.

[304]  Panagiotis Papapetrou,et al.  Benchmarking dynamic time warping for music retrieval , 2010, PETRA '10.

[305]  Mikko Koivisto,et al.  A space-time tradeoff for permutation problems , 2010, SODA '10.

[306]  Patrik O. Hoyer,et al.  Estimation of causal effects using linear non-Gaussian causal models with hidden variables , 2008, Int. J. Approx. Reason..

[307]  Hannu Toivonen,et al.  Retrieval of Relevant and Non-redundant Nodes , 2012 .

[308]  Jussi T. Lindgren,et al.  Learning Nonlinear Visual Processing from Natural Images , 2008 .

[309]  Gemma C. Garriga,et al.  An approximation ratio for biclustering , 2008, Inf. Process. Lett..

[310]  Heikki Mannila Finding Total and Partial Orders from Data for Seriation , 2008, Discovery Science.

[311]  Andreas Björklund,et al.  Computing the Tutte Polynomial in Vertex-Exponential Time , 2007, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[312]  Juho Rousu,et al.  An analytic and systematic framework for estimating metabolic flux ratios from 13C tracer experiments , 2008, BMC Bioinformatics.

[313]  Petteri Kaski,et al.  There are 1,132,835,421,602,062,347 nonisomorphic one‐factorizations of K14 , 2007, 0801.0202.

[314]  Andreas Björklund,et al.  Set Partitioning via Inclusion-Exclusion , 2009, SIAM J. Comput..

[315]  Margus Lukk,et al.  Construction of a global map of human gene expression : the process, tools and analysis , 2010 .

[316]  Stephen Muggleton,et al.  Machine Learning for Systems Biology , 2005, ILP.

[317]  Vladimir Vapnik,et al.  Principles of Risk Minimization for Learning Theory , 1991, NIPS.

[318]  Jukka M. Toivanen,et al.  Brains on Art: Brain Poetry , 2013 .

[319]  Patric R. J. Östergård,et al.  Steiner triple systems of order 19 and 21 with subsystems of order 7 , 2008, Discret. Math..

[320]  Jaakko Hollmén,et al.  Spatio-temporal Road Condition Forecasting with Markov Chains and Artificial Neural Networks , 2008, HAIS.

[321]  Frederick Eberhardt,et al.  Causal Discovery of Linear Cyclic Models from Multiple Experimental Data Sets with Overlapping Variables , 2012, UAI.

[322]  Ole Winther,et al.  Optimal perceptron learning: an online Bayesian approach , 1999 .

[323]  A. Hyvärinen,et al.  Spatial frequency tuning in human retinotopic visual areas. , 2008, Journal of vision.

[324]  Leena Salmela,et al.  Correction of sequencing errors in a mixed set of reads , 2010, Bioinform..

[325]  Petteri Kaski,et al.  An optimal local approximation algorithm for max-min linear programs , 2009, SPAA '09.

[326]  Kjell Lemström,et al.  Identifying cover songs using normalized compression distance , 2008 .

[327]  Markus Heinonen,et al.  Computational methods for small molecules , 2012 .

[328]  Abhay Harpale,et al.  Document Classification Through Interactive Supervision of Document and Term Labels , 2004, PKDD.

[329]  Jyrki Jaakkola,et al.  Wear and chemical resistance of sol-gel coatings on the stainless steel substrate , 2008 .

[330]  Hannu Toivonen,et al.  Automatical Composition of Lyrical Songs , 2013, ICCC.

[331]  Petteri Kaski,et al.  Fast monotone summation over disjoint sets , 2012, Inf. Process. Lett..

[332]  Heikki Huttunen,et al.  Mind reading with regularized multinomial logistic regression , 2012, Machine Vision and Applications.

[333]  Esko Ukkonen,et al.  Point Pattern Matching , 2016, Encyclopedia of Algorithms.

[334]  Aapo Hyvärinen,et al.  A Two-Layer Model of Natural Stimuli Estimated with Score Matching , 2010, Neural Computation.

[335]  Samuel Kaski,et al.  Bayesian Solutions to the Label Switching Problem , 2009, IDA.

[336]  D. Béroule The never-ending learning , 1989 .

[337]  Dimitri P. Bertsekas,et al.  Error Bounds for Approximations from Projected Linear Equations , 2010, Math. Oper. Res..

[338]  S. Knuutila,et al.  Integrated gene copy number and expression microarray analysis of gastric cancer highlights potential target genes , 2008, International journal of cancer.

[339]  Janne H. Korhonen,et al.  Exact Learning of Bounded Tree-width Bayesian Networks , 2013, AISTATS.

[340]  Juho Rousu,et al.  Structured Output Prediction of Novel Enzyme Function with Reaction Kernels , 2010, BIOSTEC.

[341]  Jaakko Hollmén,et al.  Novelty Detection in Projected Spaces for Structural Health Monitoring , 2010, IDA.

[342]  Kwang-Hyun Cho,et al.  Encyclopedia of Systems Biology , 2013, Springer New York.

[343]  Gonzalo Navarro,et al.  String matching with alphabet sampling , 2012, J. Discrete Algorithms.

[344]  Compton Mackenzie A Musical Chair , 1939 .

[345]  Lasse Kiviluoto,et al.  A catalogue of the Steiner triple systems of order 19 , 2009 .

[346]  David J. Murphy,et al.  Analysis of Noisy Biosignals for Musical Performance , 2012, IDA.

[347]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[348]  Aristides Gionis,et al.  Discovering Nested Communities , 2013, ECML/PKDD.

[349]  Juha Kärkkäinen,et al.  Multi-pattern Matching with Bidirectional Indexes , 2012, COCOON.

[350]  Joseph S. B. Mitchell,et al.  Improved Approximation Algorithms for Relay Placement , 2008, ESA.

[351]  Tapio Elomaa,et al.  Algorithms and Applications, Essays Dedicated to Esko Ukkonen on the Occasion of His 60th Birthday , 2010, Algorithms and Applications.

[352]  Fang Zhou,et al.  Network Compression by Node and Edge Mergers , 2012, Bisociative Knowledge Discovery.

[353]  Lidia Pivovarova,et al.  Event representation across genre , 2013, EVENTS@NAACL-HLT.

[354]  Shin Ishii,et al.  Sparse and Low-Rank Estimation of Time-Varying Markov Networks with Alternating Direction Method of Multipliers , 2010, ICONIP.

[355]  Teppo E. Ahonen Compressing lists for audio classification , 2010, MML '10.

[356]  Stefan Szeider,et al.  On finding optimal polytrees , 2012, Theor. Comput. Sci..

[357]  Martin Nöllenburg,et al.  Dynamic one-sided boundary labeling , 2010, GIS '10.

[358]  Mika Sulkava,et al.  Smoothed Prediction of the Onset of Tree Stem Radius Increase Based on Temperature Patterns , 2008, Discovery Science.

[359]  Nada Lavrac,et al.  Semantic Subgroup Discovery and Cross-Context Linking for Microarray Data Analysis , 2012, Bisociative Knowledge Discovery.

[360]  Andrew R. Gehrke,et al.  Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo , 2010, The EMBO journal.

[361]  Petteri Kaski,et al.  Local Approximation Algorithms for Scheduling Problems in Sensor Networks , 2007, ALGOSENSORS.

[362]  Ole Winther,et al.  Optimal Perceptron Learning: an Online Bayesian Approach , 1998 .

[363]  Patrik O. Hoyer,et al.  Bayesian Discovery of Linear Acyclic Causal Models , 2009, UAI.

[364]  Juha Kärkkäinen Multidimensional String Matching , 2008, Encyclopedia of Algorithms.

[365]  Ata Kabán,et al.  Factorisation and denoising of 0-1 data: A variational approach , 2008, Neurocomputing.

[366]  Ella Bingham,et al.  Recommendation of Multimedia Items by Link Analysis and Collaborative Filtering , 2008, ICWSM.

[367]  Patrik O. Hoyer,et al.  Discovering Cyclic Causal Models by Independent Components Analysis , 2008, UAI.

[368]  Esa Pitkänen,et al.  Computational Methods for Reconstruction and Analysis of Genome-Scale Metabolic Networks , 2010 .

[369]  Huizhen Yu,et al.  Convergence of Least Squares Temporal Difference Methods Under General Conditions , 2010, ICML.

[370]  Aapo Hyvärinen Learning Natural Image Structure with a Horizontal Product Model , 2009 .

[371]  Juho Rousu,et al.  Biomarker Discovery by Sparse Canonical Correlation Analysis of Complex Clinical Phenotypes of Tuberculosis and Malaria , 2013, PLoS Comput. Biol..

[372]  Silvia Miksch,et al.  Proceedings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery: Integrating Automated Analysis with Interactive Exploration, Paris, France, June 28, 2009 , 2009, KDD Workshop on Visual Analytics and Knowledge Discovery.

[373]  Fang Zhou,et al.  Methods for Network Abstraction , 2012 .

[374]  Samuel Kaski,et al.  Two-Way Grouping by One-Way Topic Models , 2009, IDA.

[375]  Petri Auvinen,et al.  Lep-MAP: fast and accurate linkage map construction for large SNP datasets , 2013, Bioinform..

[376]  Juho Rousu,et al.  Mass Spectrometry Informatics in Systems Biology , 2010 .

[377]  Joseph S. B. Mitchell,et al.  Throughput / Complexity Tradeoffs for Routing Traffic in the Presence of Dynamic Weather , 2010 .

[378]  Panagiotis Papapetrou,et al.  A Shapley Value Approach for Influence Attribution , 2011, ECML/PKDD.

[379]  김삼묘,et al.  Introducing the “Bioinformatics” Special Issue , 2000 .

[380]  Mika Sulkava,et al.  Automatic detection of onset and cessation of tree stem radius increase using dendrometer data , 2010, Neurocomputing.

[381]  Dimitri P. Bertsekas,et al.  Basis function adaptation methods for cost approximation in MDP , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.

[382]  Frederick Eberhardt,et al.  Discovering Cyclic Causal Models with Latent Variables: A General SAT-Based Procedure , 2013, UAI.

[383]  Juho Rousu,et al.  A Computational Method for Reconstructing Gapless Metabolic Networks , 2008, BIRD.

[384]  M. Tervaniemi,et al.  The preattentive processing of major vs. minor chords in the human brain: An event-related potential study , 2011, Neuroscience Letters.

[385]  Antoine Doucet,et al.  A Proposal for a Multilingual Epidemic Surveillance System , 2009, UCMedia.

[386]  Pauli Miettinen,et al.  The Discrete Basis Problem , 2006, IEEE Transactions on Knowledge and Data Engineering.

[387]  Juha Kärkkäinen,et al.  Lightweight Lempel-Ziv Parsing , 2013, SEA.

[388]  Juho Rousu,et al.  Computational methods for metabolic reconstruction. , 2010, Current opinion in biotechnology.

[389]  Gemma C. Garriga,et al.  Feature Selection in Taxonomies with Applications to Paleontology , 2008, Discovery Science.

[390]  Patric R. J. Östergård,et al.  Classification of resolvable balanced incomplete block designs — the unitals on 28 points , 2009 .

[391]  Pauli Miettinen,et al.  Siren: an interactive tool for mining and visualizing geospatial redescriptions , 2012, KDD.

[392]  Hannu Toivonen,et al.  Fast Discovery of Reliable Subnetworks , 2010, 2010 International Conference on Advances in Social Networks Analysis and Mining.

[393]  Antti Ukkonen,et al.  The Support Vector Tree , 2010, Algorithms and Applications.

[394]  Aapo Hyvärinen,et al.  Decoding Magnetoencephalographic Rhythmic Activity Using Spectrospatial Information , 2022 .

[395]  Patric R. J. Östergård,et al.  There are exactly five biplanes with k=11 , 2006, Electron. Notes Discret. Math..

[396]  Jorma Tarhio,et al.  Bit-Parallel Search Algorithms for Long Patterns , 2010, SEA.

[397]  Alessandro Valitutti,et al.  How Many Jokes are Really Funny? Towards a New Approach to the Evaluation of Computational Humour Generators , 2011 .

[398]  Bernhard Schölkopf,et al.  Nonlinear causal discovery with additive noise models , 2008, NIPS.

[399]  Esko Ukkonen,et al.  Maximal and minimal representations of gapped and non-gapped motifs of a string , 2009, Theor. Comput. Sci..

[400]  Jaakko Hollmén,et al.  Collaborative Filtering for Coordinated Monitoring in Sensor Networks , 2011, 2011 IEEE 11th International Conference on Data Mining Workshops.

[401]  David Rizo,et al.  Polyphonic Music Retrieval with Classifier Ensembles , 2011 .

[402]  Prem Raj Adhikari,et al.  Patterns from multiresolution 0-1 data , 2010, UP '10.

[403]  Heikki Mannila,et al.  Tell me something I don't know: randomization strategies for iterative data mining , 2009, KDD.

[404]  Ella Bingham,et al.  Enhancing the Stability of Spectral Ordering with Sparsification and Partial Supervision: Application to Paleontological Data , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[405]  Jakub Piskorski,et al.  Automated Event Extraction in the Domain of Border Security , 2009, UCMedia.

[406]  Mika Sulkava,et al.  Daily temperature and daily photosynthetic production vs. Scots pine growth , 2010 .

[407]  Andreas Björklund,et al.  Evaluation of permanents in rings and semirings , 2010, Inf. Process. Lett..

[408]  Panagiotis Papapetrou,et al.  Mining poly-regions in DNA , 2012, Int. J. Data Min. Bioinform..

[409]  Jaana Wessman,et al.  Mixture Model Clustering in the Analysis of Complex Diseases , 2012 .

[410]  Juan M. Vaquerizas,et al.  Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. , 2010, Genome research.

[411]  Juha Kärkkäinen,et al.  Slashing the Time for BWT Inversion , 2012, 2012 Data Compression Conference.

[412]  Riitta Mahlberg,et al.  Soil resistant and self-cleaning surfaces of stainless steel with new Sol-gel and ALD coatings , 2008 .

[413]  Aapo Hyvärinen,et al.  Estimation of linear non-Gaussian acyclic models for latent factors , 2009, Neurocomputing.

[414]  Valentin Polishchuk,et al.  A simple local 3-approximation algorithm for vertex cover , 2008, Inf. Process. Lett..

[415]  Ari Laaksonen,et al.  Climate effects of northern hemisphere volcanic eruptions in an Earth System Model , 2012 .

[416]  Dario Papale,et al.  Does the European eddy flux tower network represent the climatic and ecophysiological diversity of Europe , 2009 .

[417]  Geraint A. Wiggins,et al.  Formalizing Invariances for Content-based Music Retrieval , 2009, ISMIR.

[418]  Aapo Hyvärinen,et al.  Visual Features Underlying Perceived Brightness as Revealed by Classification Images , 2009, PloS one.

[419]  Patric R. J. Östergård,et al.  The number of Latin squares of order 11 , 2009, Math. Comput..

[420]  Aapo Hyvärinen,et al.  Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.

[421]  Andreas Björklund,et al.  Counting Paths and Packings in Halves , 2009, ESA.

[422]  Heikki Mannila,et al.  Autumn temperature and carbon balance of a boreal Scots pine forest in Southern Finland. , 2010 .

[423]  Petteri Kaski,et al.  Testing the Significance of Patterns in Data with Cluster Structure , 2010, 2010 IEEE International Conference on Data Mining.

[424]  E. Kandel,et al.  Proceedings of the National Academy of Sciences of the United States of America. Annual subject and author indexes. , 1990, Proceedings of the National Academy of Sciences of the United States of America.

[425]  Petteri Sevon,et al.  Subgraph Queries by Context-free Grammars , 2008, J. Integr. Bioinform..

[426]  Esther M. Arkin,et al.  The snowblower problem , 2011, Comput. Geom..

[427]  Mika Sulkava,et al.  EnvSOM: A SOM Algorithm Conditioned on the Environment for Clustering and Visualization , 2011, WSOM.

[428]  Ella Bingham,et al.  Enhancing the stability and efficiency of spectral ordering with partial supervision and feature selection , 2009, Knowledge and Information Systems.

[429]  Veli Mäkinen,et al.  Indexing Finite Language Representation of Population Genotypes , 2010, WABI.

[430]  Mikko Koivisto,et al.  Finding optimal Bayesian networks using precedence constraints , 2013, J. Mach. Learn. Res..

[431]  Mika Sulkava,et al.  How are N and S in deposition, in percolation water and in upper soil layers reflected in the chemical composition of needles in Finland? , 2008 .

[432]  Esther Galbrun,et al.  Towards Finding Relational Redescriptions , 2012, Discovery Science.

[433]  S. Linnarsson,et al.  Counting absolute numbers of molecules using unique molecular identifiers , 2011, Nature Methods.

[434]  Mikko Koivisto,et al.  Ancestor Relations in the Presence of Unobserved Variables , 2011, ECML/PKDD.

[435]  Heikki Mannila,et al.  Convergence in the distribution patterns of Europe’s plants and mammals is due to environmental forcing , 2012 .

[436]  Sang Won Bae,et al.  Geometric stable roommates , 2009, Inf. Process. Lett..

[437]  Esther M. Arkin,et al.  Not being (super)thin or solid is hard: A study of grid Hamiltonicity , 2009, Comput. Geom..

[438]  Ralf Steinberger,et al.  Text Mining from the Web for Medical Intelligence , 2007, NATO ASI Mining Massive Data Sets for Security.

[439]  Juho Rousu,et al.  Multi-task Drug Bioactivity Classification with Graph Labeling Ensembles , 2011, PRIB.

[440]  Valentin Polishchuk,et al.  Simple Wriggling is Hard Unless You Are a Fat Hippo , 2011, Theory of Computing Systems.

[441]  Malik Yousef,et al.  Document classification on neural networks using only positive examples (poster session) , 2000, SIGIR '00.

[442]  Prem Raj Adhikari,et al.  Fast progressive training of mixture models for model selection , 2013, Journal of Intelligent Information Systems.

[443]  Lidia Pivovarova,et al.  MDL-Based Models for Transliteration Generation , 2013, SLSP.

[444]  J. Linge,et al.  Internet surveillance systems for early alerting of health threats. , 2009, Euro surveillance : bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin.

[445]  Prem Raj Adhikari,et al.  Preservation of Statistically Significant Patterns in Multiresolution 0-1 Data , 2010, PRIB.

[446]  Aapo Hyvärinen,et al.  Distinguishing causes from effects using nonlinear acyclic causal models , 2008, NIPS 2010.

[447]  Hannu Toivonen,et al.  Lexical Creativity from Word Associations , 2012, 2012 Seventh International Conference on Knowledge, Information and Creativity Support Systems.

[448]  Jaakko Hollmén,et al.  Sequential input selection algorithm for long-term prediction of time series , 2008, Neurocomputing.

[449]  Esko Ukkonen,et al.  Efficient construction of maximal and minimal representations of motifs of a string , 2009, Theor. Comput. Sci..

[450]  Dimitrios Gunopulos,et al.  Applying Electromagnetic Field Theory Concepts to Clustering with Constraints , 2009, ECML/PKDD.

[451]  Evangelos Kranakis,et al.  Analysing local algorithms in location-aware quasi-unit-disk graphs , 2011, Discret. Appl. Math..

[452]  Mika Timonen,et al.  Modelling a Query Space Using Associations , 2011, EJC.

[453]  Belur V. Dasarathy,et al.  Nearest neighbor (NN) norms: NN pattern classification techniques , 1991 .

[454]  Dimitri P. Bertsekas,et al.  On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP , 2008, Math. Oper. Res..

[455]  Eric Rivals,et al.  Exact Search Algorithms for Biological Sequences , 2010 .

[456]  V. Podpecan,et al.  Constructing Information Networks from Text Documents , 2009 .

[457]  Frederick Eberhardt,et al.  Combining Experiments to Discover Linear Cyclic Models with Latent Variables , 2010, AISTATS.

[458]  Antti Ukkonen,et al.  Approximate Top-k Retrieval from Hidden Relations , 2010, ArXiv.

[459]  Gonzalo Navarro,et al.  Storage and Retrieval of Highly Repetitive Sequence Collections , 2010, J. Comput. Biol..

[460]  Kjell Lemström,et al.  Error-Tolerant Content-Based Music-Retrieval with Mathematical Morphology , 2010, CMMR.

[461]  Veli Mäkinen,et al.  Fast Index Based Filters for Music Retrieval , 2008, ISMIR.

[462]  Nikolaj Tatti,et al.  Maximum entropy based significance of itemsets , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[463]  Heikki Mannila,et al.  Standing Out in a Crowd: Selecting Attributes for Maximum Visibility , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[464]  Aapo Hyvärinen,et al.  Discovery of Exogenous Variables in Data with More Variables Than Observations , 2009, ICANN.

[465]  Petteri Hintsanen,et al.  Simulation and graph mining tools for improving gene mapping efficiency , 2011 .

[466]  Nada Lavrac,et al.  Contrasting Subgroup Discovery , 2012, Comput. J..

[467]  Tapio Elomaa,et al.  Discovery Science - Proceedings of the 14th International Conference (DS 2011) , 2011 .

[468]  Shunsuke Inenaga,et al.  Missing pattern discovery , 2011, J. Discrete Algorithms.

[469]  Dario Papale,et al.  Assessing and improving the representativeness of monitoring networks: The European flux tower network example , 2011 .

[470]  Petri Kontkanen,et al.  Clustgrams: an extension to histogram densities based on the minimum description length principle , 2011, Central European Journal of Computer Science.

[471]  S. Knuutila,et al.  Prognostic classification of patients with acute lymphoblastic leukemia by using gene copy number profiles identified from array-based comparative genomic hybridization data. , 2010, Leukemia research.

[472]  David Rizo,et al.  Tree structured and combined methods for comparing metered polyphonic music , 2008 .

[473]  Hannu Toivonen,et al.  Biomine: predicting links between biological entities using network models of heterogeneous databases , 2012, BMC Bioinformatics.

[474]  Kai Puolamäki,et al.  Neogene aridification of the Northern Hemisphere , 2012 .

[475]  Joseph S. B. Mitchell,et al.  Routing multi-class traffic flows in the plane , 2012, Comput. Geom..

[476]  Mikhail Kopotev,et al.  Building Support Tools for Russian-Language Information Extraction , 2011, TSD.

[477]  Dimitri P. Bertsekas,et al.  New error bounds for approximations from projected linear equations , 2008, 2008 46th Annual Allerton Conference on Communication, Control, and Computing.

[478]  Juho Rousu,et al.  Computing Atom Mappings for Biochemical Reactions without Subgraph Isomorphism , 2011, J. Comput. Biol..

[479]  Valentin Polishchuk,et al.  A Local 2-Approximation Algorithm for the Vertex Cover Problem , 2009, DISC.

[480]  Huizhen Yu,et al.  Least Squares Temporal Difference Methods: An Analysis under General Conditions , 2012, SIAM J. Control. Optim..

[481]  Nada Lavrac,et al.  Bisociative Knowledge Discovery for Microarray Data Analysis , 2010, ICCC.

[482]  Juha Kärkkäinen,et al.  Near in Place Linear Time Minimum Redundancy Coding , 2013, 2013 Data Compression Conference.

[483]  Bart Goethals,et al.  Mining Association Rules of Simple Conjunctive Queries , 2008, SDM.

[484]  Luc De Raedt,et al.  Patterns and Logic for Reasoning with Networks , 2012, Bisociative Knowledge Discovery.

[485]  Methods for statistical data analysis with decision trees Problems of the multivariate statistical analysis , 2003 .

[486]  David Rizo,et al.  Ensemble of state-of-the-art methods for polyphonic music comparison , 2009 .

[487]  Jaakko J. Väyrynen,et al.  WordICA—emergence of linguistic representations for words by independent component analysis , 2010, Natural Language Engineering.

[488]  Aapo Hyvärinen,et al.  Statistical Models of Natural Images and Cortical Visual Representation , 2010, Top. Cogn. Sci..

[489]  Aapo Hyvärinen,et al.  On the Identifiability of the Post-Nonlinear Causal Model , 2009, UAI.

[490]  Leena Salmela Average complexity of backward q-gram string matching algorithms , 2012, Inf. Process. Lett..

[491]  José Manuel Iñesta Quereda,et al.  Tree Representation in Combined Polyphonic Music Comparison , 2008, CMMR.

[492]  A. Hyvärinen,et al.  Characterization of neuromagnetic brain rhythms over time scales of minutes using spatial independent component analysis , 2012, Human brain mapping.

[493]  Jorma Laaksonen,et al.  Ubiquitous Contextual Information Access with Proactive Retrieval and Augmentation , 2009 .

[494]  Jaakko Hollmén Mixture modeling of gait patterns from sensor data , 2012, PETRA '12.

[495]  Thierry Poibeau,et al.  Multi-source, Multilingual Information Extraction and Summarization , 2012, Theory and Applications of Natural Language Processing.

[496]  Gemma C. Garriga,et al.  Evaluating Query Result Significance in Databases via Randomizations , 2010, SDM.

[497]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[498]  Gonzalo Navarro,et al.  Run-Length Compressed Indexes for Repetitive Sequence Collections , 2008 .

[499]  Nada Lavrac,et al.  Closed Sets for Labeled Data , 2006, PKDD.

[500]  Gad M. Landau,et al.  Binary Jumbled Pattern Matching on Trees and Tree-Like Structures , 2013, Algorithmica.

[501]  Chen Meng,et al.  Gene Selection in Time-Series Gene Expression Data , 2011, PRIB.

[502]  Valentin Polishchuk,et al.  Faster Algorithms for Minimum-Link Paths with Restricted Orientations , 2011, WADS.

[503]  Tuomas Sivula,et al.  Soul Music: Making music of your dreams , 2013 .

[504]  Stefano Basagni,et al.  Secure pebblenets , 2001, MobiHoc '01.

[505]  Minyi Guo,et al.  A class-feature-centroid classifier for text categorization , 2009, WWW '09.

[506]  Panagiotis Papapetrou,et al.  The smallest set of constraints that explains the data: a randomization approach , 2010 .

[507]  Luc De Raedt,et al.  Probabilistic Inductive Querying Using ProbLog , 2010, Inductive Databases and Constraint-Based Data Mining.

[508]  Hannu Toivonen,et al.  Unobtrusive online monitoring of sleep at home , 2012, 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society.

[509]  Igor Goryanin,et al.  Journal of Integrative Bioinformatics , 2015 .

[510]  Roman Yangarber,et al.  Content Collection and Analysis in the Domain of Epidemiology , 2008 .

[511]  Antti Ukkonen,et al.  Algorithms for Finding Orders and Analyzing Sets of Chains , 2008 .

[512]  Arto Vihavainen,et al.  Predicting Relevance of Event Extraction for the End User , 2013, Multi-source, Multilingual Information Extraction and Summarization.

[513]  Jarkko Tikka,et al.  Input Variable Selection Methods for Construction of Interpretable Regression Models , 2008 .

[514]  Travis Gagie,et al.  Heaviest Induced Ancestors and Longest Common Substrings , 2013, CCCG.

[515]  Hannes Heikinheimo,et al.  Decomposable Families of Itemsets , 2008, ECML/PKDD.

[516]  Aapo Hyvärinen,et al.  DirectLiNGAM: A Direct Method for Learning a Linear Non-Gaussian Structural Equation Model , 2011, J. Mach. Learn. Res..

[517]  Aapo Hyvärinen,et al.  Hermite Polynomials and Measures of Non-gaussianity , 2011, ICANN.

[518]  Esko Ukkonen Geometric Point Pattern Matching in the Knuth-Morris-Pratt Way , 2010, J. Univers. Comput. Sci..

[519]  Esko Ukkonen,et al.  On the complexity of finding gapped motifs , 2008, J. Discrete Algorithms.

[520]  Javad Nouri,et al.  Information-theoretic modeling of etymological sound change , 2013 .

[521]  Juha Kärkkäinen,et al.  Crochemore's String Matching Algorithm: Simplification, Extensions, Applications , 2013, Stringology.

[522]  Joonas Paalasmaa,et al.  Quantifying respiratory variation with force sensor measurements , 2011, 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society.

[523]  Aapo Hyvärinen,et al.  Spatial dependencies between local luminance and contrast in natural images. , 2008, Journal of vision.

[524]  Joseph S. B. Mitchell,et al.  Routing a maximum number of disks through a scene of moving obstacles , 2008, SCG '08.