Machine Learning and Knowledge Discovery in Databases: International Workshops of ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part II

In this ongoing study, we propose a higher-order data mining approach for modelling the behaviour of district heating (DH) substations and linking representative operational behaviour profiles to different performance indicators. We first build operational behaviour models of a substation by extracting weekly patterns and clustering them into groups of similar patterns. The resulting models are then analyzed and integrated into an overall substation model by applying consensus clustering. The operational behaviour profiles, represented by the exemplars of the consensus clustering model, are linked to performance indicators. The labelled behaviour profiles are then deployed over the whole heating season to derive insights about the substation's performance. The results show that the proposed method can be used for modelling, analyzing, and understanding deviating and suboptimal DH substation behaviour.
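
The pipeline described above (several clusterings of weekly patterns, a consensus step, exemplars linked to an indicator) can be illustrated with a minimal sketch. This is not the authors' implementation: the consensus step here uses a simple co-association (evidence accumulation) scheme as a stand-in, the exemplar is taken to be the cluster medoid, and all names, the 0.5 threshold, the three-value "weekly" profiles, and the indicator values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def co_association(labelings):
    """Fraction of base clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Connected components of the graph that links items co-clustered
    in more than `threshold` of the base clusterings (assumed scheme)."""
    co = co_association(labelings)
    n = len(co)
    assigned = [None] * n
    next_id = 0
    for start in range(n):
        if assigned[start] is not None:
            continue
        assigned[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if assigned[j] is None and co[i][j] > threshold:
                    assigned[j] = next_id
                    stack.append(j)
        next_id += 1
    return assigned


def exemplar(profiles, members):
    """Medoid: the member with the smallest total squared distance to the others."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(profiles[a], profiles[b]))
    return min(members, key=lambda m: sum(dist(m, o) for o in members))


def link_indicators(profiles, labels, indicator):
    """Map each consensus cluster to (exemplar index, mean indicator value)."""
    linked = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        linked[c] = (exemplar(profiles, members),
                     mean(indicator[i] for i in members))
    return linked


# Hypothetical data: six weekly patterns, shortened to three values each.
profiles = [
    [1.0, 1.1, 0.9], [1.0, 1.0, 1.0], [1.2, 1.0, 0.8],
    [3.0, 3.1, 2.9], [3.0, 3.0, 3.0], [2.8, 3.2, 3.0],
]
# Three base clusterings of the same six weeks (e.g. from different models).
labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 2],
    [0, 0, 1, 1, 2, 2],
]
# Hypothetical per-week performance indicator values.
indicator = [40.0, 42.0, 41.0, 30.0, 28.0, 32.0]

labels = consensus_clusters(labelings)
profile_summary = link_indicators(profiles, labels, indicator)
print(labels)
print(profile_summary)
```

With real data, each profile would hold a full week of measurements and the base clusterings would come from the per-model clustering step; the threshold controls how much agreement between base clusterings is required before two weeks are merged into the same consensus profile.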
