Big Data Analytics and Knowledge Discovery: 22nd International Conference, DaWaK 2020, Bratislava, Slovakia, September 14–17, 2020, Proceedings

The International Conference on Big Data Analytics and Knowledge Discovery (DaWaK) has become a key venue for exchanging experience and knowledge among researchers and practitioners in data warehousing and knowledge discovery. This study quantitatively analyzes the 775 papers published at DaWaK from 1999 to 2019, presents the knowledge structure of these papers, and traces the evolution of research topics in the field. Several text mining techniques were applied to analyze the content of the research areas and to structure the knowledge presented at DaWaK, and Dirichlet Multinomial Regression (DMR) was used to examine trends in the research topics. Research metrics (citation counts, readers, and number of downloads) were used to assess the performance of the conference and of individual papers. The study shows that DaWaK research outcomes have received consistent attention from the scholarly community over the past 21 years: the 775 papers were cited 4,339 times, an average of about 207 citations per proceedings volume and about six citations per paper.
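As a quick check of the averages quoted above, and to illustrate what DMR adds over plain LDA, here is a minimal Python sketch. The citation figures are those reported in the abstract; the per-volume average assumes one proceedings volume per year for the 21 editions from 1999 to 2019. The `dmr_topic_prior` function covers only the document-specific Dirichlet prior of DMR (Mimno and McCallum, 2008): each topic k has a weight vector λ_k, and a document d with metadata features x_d gets prior α_{d,k} = exp(x_d · λ_k), so metadata such as publication year can shift expected topic proportions over time. The feature vector and weights below are hypothetical values chosen for illustration; fitting λ and the topics themselves (e.g. by Gibbs sampling) is not shown.

```python
import math
from typing import List, Tuple


def citation_averages(total_citations: int, num_papers: int,
                      num_volumes: int) -> Tuple[float, float]:
    """Return (citations per proceedings volume, citations per paper)."""
    return total_citations / num_volumes, total_citations / num_papers


def dmr_topic_prior(features: List[float],
                    topic_weights: List[List[float]]) -> List[float]:
    """DMR document-specific Dirichlet parameters: alpha_k = exp(x . lambda_k).

    features      -- metadata vector x_d for one document
    topic_weights -- one weight vector lambda_k per topic (same length as x_d)
    """
    return [math.exp(sum(x * w for x, w in zip(features, lam)))
            for lam in topic_weights]


if __name__ == "__main__":
    # Figures from the abstract: 4,339 citations, 775 papers, 21 volumes (1999 to 2019).
    per_volume, per_paper = citation_averages(4339, 775, 21)
    print(f"{per_volume:.1f} citations per volume, {per_paper:.1f} per paper")

    # Hypothetical metadata: an intercept of 1.0 and the year scaled to [0, 1] (2019 -> 1.0).
    x = [1.0, 1.0]
    # Hypothetical weights for two topics: one rising over time, one declining.
    lambdas = [[0.0, 1.0], [0.0, -1.0]]
    print([round(a, 3) for a in dmr_topic_prior(x, lambdas)])
```

Running it prints `206.6 citations per volume, 5.6 per paper`, which rounds to the 207 and six reported, followed by `[2.718, 0.368]`: for a recent paper, the prior favors the hypothetical rising topic, which is how DMR lets publication year express a topic trend.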

[285]  Eser Kandogan,et al.  LabBook: Metadata-driven social collaborative data analysis , 2015, 2015 IEEE International Conference on Big Data (Big Data).

[286]  Lovekesh Vig,et al.  Long Short Term Memory Networks for Anomaly Detection in Time Series , 2015, ESANN.

[287]  Nadia Essoussi,et al.  MapReduce-based k-prototypes clustering method for big data , 2015, 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA).

[288]  Murtaza Haider,et al.  Beyond the hype: Big data concepts, methods, and analytics , 2015, Int. J. Inf. Manag..

[289]  George C. Runger,et al.  Time series representation and similarity based on local autopatterns , 2016, Data Mining and Knowledge Discovery.

[290]  Vadlamani Ravi,et al.  One-class support vector machine based undersampling: Application to churn prediction and insurance fraud detection , 2015, 2015 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC).

[291]  Zhe Wang,et al.  A novel cluster center initialization method for the k-prototypes algorithms using centrality and distance , 2015 .

[292]  Liang Wang,et al.  Multi-view clustering via pairwise sparse subspace representation , 2015, Neurocomputing.

[293]  Kai Chen,et al.  A LSTM-based method for stock returns prediction: A case study of China stock market , 2015, 2015 IEEE International Conference on Big Data (Big Data).

[294]  Hassan H. Alrehamy,et al.  Personal Data Lake with Data Gravity Pull , 2015, 2015 IEEE Fifth International Conference on Big Data and Cloud Computing.

[295]  Nadia Essoussi,et al.  Using Sequences of Words for Non-Disjoint Grouping of Documents , 2015, Int. J. Pattern Recognit. Artif. Intell..

[296]  Xiufeng Liu,et al.  An ETL optimization framework using partitioning and parallelization , 2015, SAC.

[297]  Ladjel Bellatreche,et al.  Managing Data Warehouse Traceability: A Life-Cycle Driven Approach , 2015, CAiSE.

[298]  Gang Hu,et al.  SQLGraph: An Efficient Relational-Based Property Graph Store , 2015, SIGMOD Conference.

[299]  Madhav V. Marathe,et al.  A Space-Efficient Parallel Algorithm for Counting Exact Triangles in Massive Networks , 2015, 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems.

[300]  Felix Naumann,et al.  Data profiling , 2017, 2016 IEEE 32nd International Conference on Data Engineering (ICDE).

[301]  Jordi Cabot,et al.  JSONDiscoverer: Visualizing the schema lurking behind JSON documents , 2016, Knowl. Based Syst..

[302]  Alberto Abelló,et al.  Incremental Consolidation of Data-Intensive Multi-Flows , 2016, IEEE Transactions on Knowledge and Data Engineering.

[303]  Boumediene Belkhouche,et al.  A Comparative Analysis of Machine Learning Classifiers for Twitter Sentiment Analysis , 2016, Res. Comput. Sci..

[304]  C. Anthony Di Benedetto,et al.  Customer equity and value management of global brands: Bridging theory and practice from financial and marketing perspectives: Introduction to a Journal of Business Research Special Section , 2016 .

[305]  Luis Gravano,et al.  k-Shape: Efficient and Accurate Clustering of Time Series , 2016, SIGMOD Rec..

[306]  Beth Plale,et al.  Provenance as Essential Infrastructure for Data Lakes , 2016, IPAW.

[307]  Hongxin Hu,et al.  Cyberbullying Detection with a Pronunciation Based Convolutional Neural Network , 2016, 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA).

[308]  Eamonn J. Keogh,et al.  The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances , 2016, Data Mining and Knowledge Discovery.

[309]  Christoph Quix,et al.  Metadata Extraction and Management in Data Lakes With GEMMS , 2016, Complex Syst. Informatics Model. Q..

[310]  Adriana Mexicano,et al.  The early stop heuristic: A new convergence criterion for K-means , 2016 .

[311]  Panagiotis Papapetrou,et al.  STIFE: A Framework for Feature-Based Classification of Sequences of Temporal Intervals , 2016, DS.

[312]  ChengXiang Zhai,et al.  Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining , 2016 .

[313]  Montserrat Guillen,et al.  Predicting Probability of Customer Churn in Insurance , 2016, MS.

[314]  Carlos Guestrin,et al.  "Why Should I Trust You?": Explaining the Predictions of Any Classifier , 2016, ArXiv.

[315]  Sandra Geisler,et al.  Constance: An Intelligent Data Lake System , 2016, SIGMOD Conference.

[316]  Bernhard Mitschang,et al.  The Stuttgart IT Architecture for Manufacturing - An Architecture for the Data-Driven Factory , 2016, ICEIS.

[317]  Wei Shi,et al.  Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification , 2016, ACL.

[318]  Lakhdar Sais,et al.  A SAT-Based Approach for Mining Association Rules , 2016, IJCAI.

[319]  Emmanuel Müller,et al.  Detecting Change Processes in Dynamic Networks by Frequent Graph Evolution Rule Mining , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[320]  Saeed Jalili,et al.  Single-pass and linear-time k-means clustering based on MapReduce , 2016, Inf. Syst..

[321]  Alon Y. Halevy,et al.  Managing Google's data lake: an overview of the Goods system , 2016, IEEE Data Eng. Bull..

[322]  Nouredine Melab,et al.  A GPU-based Branch-and-Bound algorithm using Integer-Vector-Matrix data structure , 2016, Parallel Comput..

[323]  Nikolaos Aletras,et al.  Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective , 2016, PeerJ Comput. Sci..

[324]  UN DESA Transforming our world: The 2030 Agenda for Sustainable Development , 2016 .

[325]  Raymond Y. K. Lau,et al.  Time series k-means: A new k-means type smooth subspace clustering for time series data , 2016, Inf. Sci..

[326]  David S. Rosenblum,et al.  From action to activity: Sensor-based activity recognition , 2016, Neurocomputing.

[327]  Liang Zhao,et al.  Time series clustering via community detection in networks , 2015, Inf. Sci..

[328]  Wellington Cabrera,et al.  The Gamma Matrix to Summarize Dense and Sparse Data Sets for Big Data Analytics , 2016, IEEE Transactions on Knowledge and Data Engineering.

[329]  Tong Zhang,et al.  Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings , 2016, ICML.

[330]  Dirk Hovy,et al.  Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter , 2016, NAACL.

[331]  Michael Stonebraker,et al.  The BigDAWG polystore system and architecture , 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).

[332]  Panagiotis Papapetrou,et al.  Generalized random shapelet forests , 2016, Data Mining and Knowledge Discovery.

[333]  Ulf Leser,et al.  Fast and Accurate Time Series Classification with WEASEL , 2017, CIKM.

[334]  Hong Yu,et al.  Multi-view clustering via multi-manifold regularized non-negative matrix factorization , 2017, Neural Networks.

[335]  Di Xiao,et al.  Improving I/O Complexity of Triangle Enumeration , 2017, 2017 IEEE International Conference on Data Mining (ICDM).

[336]  Liang Wang,et al.  Unified subspace learning for incomplete and unlabeled multi-view data , 2017, Pattern Recognit..

[337]  Robert Wrembel,et al.  From conceptual design to performance optimization of ETL workflows: current state of research and open problems , 2017, The VLDB Journal.

[338]  Osmar R. Zaïane,et al.  Exploiting statistically significant dependent rules for associative classification , 2017, Intell. Data Anal..

[339]  Frits W. Vaandrager,et al.  Model learning , 2017, Commun. ACM.

[340]  Diego Klabjan,et al.  Predicting litigation likelihood and time to litigation for patents , 2016, ICAIL.

[341]  Heri Ramampiaro,et al.  Efficient high utility itemset mining using buffered utility-lists , 2017, Applied Intelligence.

[342]  Subutai Ahmad,et al.  Unsupervised real-time anomaly detection for streaming data , 2017, Neurocomputing.

[343]  Scott Lundberg,et al.  A Unified Approach to Interpreting Model Predictions , 2017, NIPS.

[344]  Michael T. Ewing,et al.  The impact of personalised incentives on the profitability of customer retention campaigns , 2017 .

[345]  Wellington Cabrera,et al.  Scalable parallel graph algorithms with matrix–vector multiplication evaluated with queries , 2017, Distributed and Parallel Databases.

[346]  Holger Ziekow,et al.  Benchmarking Big Data Technologies for Energy Procurement Efficiency , 2017, AMCIS.

[347]  Xavier Franch,et al.  A software reference architecture for semantic-aware Big Data systems , 2017, Inf. Softw. Technol..

[348]  Yun Sing Koh,et al.  mHUIMiner: A Fast High Utility Itemset Mining Algorithm for Sparse Datasets , 2017, PAKDD.

[349]  Zhi Cheng,et al.  Mining Recurrent Patterns in a Dynamic Attributed Graph , 2017, PAKDD.

[350]  Xuelong Li,et al.  Multi-View Clustering and Semi-Supervised Classification with Adaptive Neighbours , 2017, AAAI.

[351]  Josef van Genabith,et al.  Predicting the Law Area and Decisions of French Supreme Court Cases , 2017, RANLP.

[352]  Fabrice SPMF: A Java Open-Source Data Mining Library , 2017 .

[353]  Weibo Li,et al.  Detecting causality from short time-series data based on prediction of topologically equivalent attractors , 2017, BMC Systems Biology.

[354]  Lucas Dixon,et al.  Ex Machina: Personal Attacks Seen at Scale , 2016, WWW.

[355]  Stanley B. Zdonik,et al.  Data Ingestion for the Connected World , 2017, CIDR.

[356]  George Ostrouchov,et al.  Programming with BIG Data in R: Scaling Analytics from One to Thousands of Nodes , 2017, Big Data Res..

[357]  Dario Colazzo,et al.  Schema Inference for Massive JSON Datasets , 2017, EDBT.

[358]  Tao Chen,et al.  Expert Systems With Applications , 2022 .

[359]  Benjamin Letham,et al.  Forecasting at Scale , 2018, PeerJ Prepr..

[360]  Ulf Leser,et al.  Multivariate Time Series Classification with WEASEL+MUSE , 2017, ArXiv.

[361]  Weiping Li,et al.  Deep and Shallow Model for Insurance Churn Prediction Service , 2017, 2017 IEEE International Conference on Services Computing (SCC).

[362]  Lakhdar Sais,et al.  Enumerating Non-redundant Association Rules Using Satisfiability , 2017, PAKDD.

[363]  Jeffrey Xu Yu,et al.  All-in-One: Graph Processing in RDBMSs Revisited , 2017, SIGMOD Conference.

[364]  H. You Ex Post Lobbying , 2017, The Journal of Politics.

[365]  Tim Oates,et al.  Time series classification from scratch with deep neural networks: A strong baseline , 2016, 2017 International Joint Conference on Neural Networks (IJCNN).

[366]  Jacky Akoka,et al.  Model driven reverse engineering of NoSQL property graph databases: The case of Neo4j , 2017, 2017 IEEE International Conference on Big Data (Big Data).

[367]  Hong Cheng,et al.  Efficient MapReduce algorithms for triangle listing in billion-scale graphs , 2017, Distributed and Parallel Databases.

[368]  Nadia Essoussi,et al.  One-pass MapReduce-based clustering method for mixed large scale data , 2019, Journal of Intelligent Information Systems.

[369]  Meng Zhang,et al.  Neural Network Methods for Natural Language Processing , 2017, Computational Linguistics.

[370]  Wellington Cabrera,et al.  Comparing columnar, row and array DBMSs to process recursive queries on graphs , 2017, Inf. Syst..

[371]  Duong Tuan Anh,et al.  A novel clustering-based method for time series motif discovery under time warping measure , 2017, International Journal of Data Science and Analytics.

[372]  Nadia Essoussi,et al.  KP-S: A Spark-Based Design of the K-Prototypes Clustering for Big Data , 2017, 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA).

[373]  Ankur Taly,et al.  Axiomatic Attribution for Deep Networks , 2017, ICML.

[374]  Yun Fu,et al.  Multi-View Clustering via Deep Matrix Factorization , 2017, AAAI.

[375]  Joseph M. Hellerstein,et al.  Ground: A Data Context Service , 2017, CIDR.

[376]  Vachik S. Dave,et al.  Triangle counting in large networks: a review , 2018, WIREs Data Mining Knowl. Discov..

[377]  Syed Muhammad Fawad Ali,et al.  Next-generation ETL Framework to Address the Challenges Posed by Big Data , 2018, DOLAP.

[378]  Nadia Essoussi,et al.  Scalable Random Sampling K-Prototypes Using Spark , 2018, DaWaK.

[379]  Han Zou,et al.  Non-Parametric Outliers Detection in Multiple Time Series A Case Study: Power Grid Data Analysis , 2018, AAAI.

[380]  Amina Adadi,et al.  Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI) , 2018, IEEE Access.

[381]  Di Wu,et al.  Machine Learning for Building Energy and Indoor Environment: A Perspective , 2017, ArXiv.

[382]  Yugang Niu,et al.  Hourly day-ahead solar irradiance prediction using weather forecasts by LSTM , 2018 .

[383]  Nadia Essoussi,et al.  A Novel Tweets Clustering Method using Word Embeddings , 2018, 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA).

[384]  Andreas Schütze,et al.  Sensors 4.0 – smart sensors and measurement technology enable Industry 4.0 , 2018 .

[385]  Markus Spiekermann,et al.  A Metadata Model for Data Goods , 2018 .

[386]  Andri Pranolo,et al.  Modeling Data Containing Outliers using ARIMA Additive Outlier (ARIMA-AO) , 2018 .

[387]  Domenico Ursino,et al.  A New Metadata Model to Uniformly Handle Heterogeneous Data Lake Sources , 2018, ADBIS.

[388]  Victor C. M. Leung,et al.  Incomplete multi-view clustering via deep semantic mapping , 2018, Neurocomputing.

[389]  Nicolas Lachiche,et al.  A scalable robust and automatic propositionalization approach for Bayesian classification of large mixed numerical and categorical data , 2018, Machine Learning.

[390]  Amit Dhurandhar,et al.  Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives , 2018, NeurIPS.

[391]  Takaaki Goto,et al.  A Framework to Convert NoSQL to Relational Model , 2018, ACIT 2018.

[392]  Matteo Golfarelli,et al.  Schema profiling of document-oriented databases , 2018, Inf. Syst..

[393]  Wanwan Wang,et al.  An empirical evaluation of high utility itemset mining algorithms , 2018, Expert Syst. Appl..

[394]  Torben Bach Pedersen,et al.  Analytical metadata modeling for next generation BI systems , 2018, J. Syst. Softw..

[395]  Carlos Ordonez,et al.  Big Data Analytics: Exploring Graphs with Optimized SQL Queries , 2018, DEXA Workshops.

[396]  Massimo Lamanna,et al.  SWAN: A service for interactive analysis in the cloud , 2018, Future Gener. Comput. Syst..

[397]  Houshang Darabi,et al.  LSTM Fully Convolutional Networks for Time Series Classification , 2017, IEEE Access.

[398]  Peter Robinson,et al.  On the Distributed Complexity of Large-Scale Graph Computations , 2016, SPAA.

[399]  Anne Laurent,et al.  Mining Spatial Gradual Patterns: Application to Measurement of Potentially Avoidable Hospitalizations , 2018, SOFSEM.

[400]  Chang-Dong Wang,et al.  TW-Co-k-means: Two-level weighted collaborative k-means for multi-view clustering , 2018, Knowl. Based Syst..

[401]  Amit Awekar,et al.  Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms , 2018, ECIR.

[402]  Peter Filzmoser,et al.  Time Series Analysis: Unsupervised Anomaly Detection Beyond Outlier Detection , 2018, ISPEC.

[403]  Penghua Li,et al.  Law text classification using semi-supervised convolutional neural networks , 2018, 2018 Chinese Control And Decision Conference (CCDC).

[404]  Michael Flynn,et al.  The UEA multivariate time series classification archive, 2018 , 2018, ArXiv.

[405]  Nima Hatami,et al.  Bag of recurrence patterns representation for time-series classification , 2018, Pattern Analysis and Applications.

[406]  Nadia Essoussi,et al.  Overview of Scalable Partitional Methods for Big Data Clustering , 2018, Clustering Methods for Big Data Analytics.

[407]  Filomena Ferrucci,et al.  Using Hadoop MapReduce for Parallel Genetic Algorithms: A Comparison of the Global, Grid and Island Models , 2018, Evolutionary Computation.

[408]  Alberto Abelló,et al.  Intelligent assistance for data pre-processing , 2018, Comput. Stand. Interfaces.

[409]  Fernando Bação,et al.  Oversampling for Imbalanced Learning Based on K-Means and SMOTE , 2017, Inf. Sci..

[410]  Engelbert Mephu Nguifo,et al.  An Approach for Extracting Frequent (Closed) Gradual Patterns Under Temporal Constraint , 2018, 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE).

[411]  Mustafa Gokce Baydogan,et al.  Autoregressive forests for multivariate time series modeling , 2018, Pattern Recognit..

[412]  J. Abonyi,et al.  Evaluating the Interconnectedness of the Sustainable Development Goals Based on the Causality Analysis of Sustainability Indicators , 2018, Sustainability.

[413]  Dimitrios Kampas,et al.  Deep learning in law: early adaptation and legal word embeddings trained on large corpora , 2018, Artificial Intelligence and Law.

[414]  Ioana Giurgiu,et al.  Additive Explanations for Anomalies Detected from Multivariate Temporal Data , 2019, CIKM.

[415]  Nadia Essoussi,et al.  STiMR k-Means: An Efficient Clustering Method for Big Data , 2019, Int. J. Pattern Recognit. Artif. Intell..

[416]  Germain Forestier,et al.  Adversarial Attacks on Deep Neural Networks for Time Series Classification , 2019, 2019 International Joint Conference on Neural Networks (IJCNN).

[417]  Alaa El. Sagheer,et al.  Time series forecasting of petroleum production using deep LSTM recurrent networks , 2019, Neurocomputing.

[418]  Carlos Ordonez,et al.  Scalable Machine Learning in the R Language Using a Summarization Matrix , 2019, DEXA.

[419]  Houshang Darabi,et al.  Multivariate LSTM-FCNs for Time Series Classification , 2018, Neural Networks.

[420]  Dario Colazzo,et al.  Parametric schema inference for massive JSON datasets , 2019, The VLDB Journal.

[421]  Germain Forestier,et al.  Deep learning for time series classification: a review , 2018, Data Mining and Knowledge Discovery.

[422]  Martha Tatusch,et al.  Show Me Your Friends and I'll Tell You Who You Are. Finding Anomalous Time Series by Conspicuous Cluster Transitions , 2019, AusDM.

[423]  Carlos Ordonez,et al.  ER4ML: An ER Modeling Tool to Represent Data Transformations in Data Science , 2019, ER Forum/Posters/Demos.

[424]  Jen-Wei Huang,et al.  Mining frequent and top-K High Utility Time Interval-based Events with Duration patterns , 2019, Knowledge and Information Systems.

[425]  Philippe Fournier-Viger,et al.  HUE-Span: Fast High Utility Episode Mining , 2019, ADMA.

[426]  Jerry Chun-Wei Lin,et al.  A Survey of High Utility Itemset Mining , 2019, Studies in Big Data.

[427]  Michael Scriney,et al.  Automating Data Mart Construction from Semi-structured Data Sources , 2019, Comput. J..

[428]  Jiachen Zhao,et al.  Long short-term memory - Fully connected (LSTM-FC) neural network for PM2.5 concentration prediction , 2019, Chemosphere.

[429]  Robert Wrembel,et al.  Towards a Cost Model to Optimize User-Defined Functions in an ETL Workflow Based on User-Defined Performance Metrics , 2019, ADBIS.

[430]  Maik Thiele,et al.  Parallelizing user–defined functions in the ETL workflow using orchestration style sheets , 2019, Int. J. Appl. Math. Comput. Sci..

[431]  Rob J Hyndman,et al.  A New Tidy Data Structure to Support Exploration and Modeling of Temporal Data , 2019, Journal of Computational and Graphical Statistics.

[432]  Nadia Essoussi,et al.  Ensemble Method for Multi-view Text Clustering , 2019, ICCCI.

[433]  Cécile Favre,et al.  Metadata Systems for Data Lakes: Models and Features , 2019, ADBIS.

[434]  Christoph Gröger,et al.  Ganzheitliches Metadatenmanagement im Data Lake: Anforderungen, IT-Werkzeuge und Herausforderungen in der Praxis [Holistic Metadata Management in the Data Lake: Requirements, IT Tools and Challenges in Practice] , 2019, BTW.

[435]  Robert Dale,et al.  Law and Word Order: NLP in Legal Tech , 2018, Natural Language Engineering.

[436]  Holger Schwarz,et al.  Quality-driven early stopping for explorative cluster analysis for big data , 2019, SICS Software-Intensive Cyber-Physical Systems.

[437]  Amal Ait Brahim,et al.  Model Driven Extraction of NoSQL Databases Schema: Case of MongoDB , 2019, KDIR.

[438]  Frans Coenen,et al.  Sustainable Development Goal Attainment Prediction: A Hierarchical Framework using Time Series Modelling , 2019, KDIR.

[439]  Marc Boullé,et al.  FEARS: a Feature and Representation Selection approach for Time Series Classification , 2019, ACML.

[440]  Holger Schwarz,et al.  ASAP-DM: a framework for automatic selection of analytic platforms for data mining , 2019, SICS Software-Intensive Cyber-Physical Systems.

[441]  Mark Roantree,et al.  Detecting Multi-Relationship Links in Sparse Datasets , 2019, ICEIS.

[442]  Lakhdar Sais,et al.  Mining Gradual Itemsets Using Sequential Pattern Mining , 2019, 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE).

[443]  Alexandre Termier,et al.  Agnostic Local Explanation for Time Series Classification , 2019, 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI).

[444]  Alexandre Quemy,et al.  Data Pipeline Selection and Optimization , 2019, DOLAP.

[445]  Zhi Cheng,et al.  Mining significant trend sequences in dynamic attributed graphs , 2019, Knowl. Based Syst..

[446]  Muhammad Imran,et al.  A Churn Prediction Model Using Random Forest: Analysis of Machine Learning Techniques for Churn Prediction and Factor Identification in Telecom Sector , 2019, IEEE Access.

[447]  Alexandre Quemy,et al.  Binary Classification in Unstructured Space With Hypergraph Case-Based Reasoning , 2018, Inf. Syst..

[448]  Jérôme Darmont,et al.  Metadata Management for Textual Documents in Data Lakes , 2019, ICEIS.

[449]  Dan Wang,et al.  Relaxed Functional Dependency Discovery in Heterogeneous Data Lakes , 2019, ER.

[450]  Antonio Peregrín,et al.  Evolutionary Design of Linguistic Fuzzy Regression Systems with Adaptive Defuzzification in Big Data Environments , 2019, Cognitive Computation.

[451]  Andreas Dengel,et al.  FuseAD: Unsupervised Anomaly Detection in Streaming Sensors Data by Fusing Statistical and Deep Learning Models , 2019, Sensors.

[452]  Alvin Cheung,et al.  The Seattle Report on Database Research , 2020, SIGMOD Rec..

[453]  Hong Jiang,et al.  LiteTE: Lightweight, Communication-Efficient Distributed-Memory Triangle Enumerating , 2019, IEEE Access.

[454]  Bernhard Mitschang,et al.  Leveraging the Data Lake: Current State and Challenges , 2019, DaWaK.

[455]  Holger Schwarz,et al.  Initializing k-Means Efficiently: Benefits for Exploratory Cluster Analysis , 2019, OTM Conferences.

[456]  Yusef Esa,et al.  Communication-Based Control for DC Microgrids , 2018, IEEE Transactions on Smart Grid.

[457]  Chien-Liang Liu,et al.  Multivariate Time Series Early Classification with Interpretability Using Deep Learning and Attention Mechanism , 2019, PAKDD.

[458]  Rogério Luís de Carvalho Costa,et al.  Comparing Time Series Prediction Approaches for Telecom Analysis , 2018, Contributions to Statistics.

[459]  Franck Ravat,et al.  Metadata Management for Data Lakes , 2019, ADBIS.

[460]  Rajeswari Devarajan,et al.  Computational grid scheduling architecture using MapReduce model-based non-dominated sorting genetic algorithm , 2019, Soft Comput..

[461]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[462]  Wei-keng Liao,et al.  Scalable Algorithms for MPI Intergroup Allgather and Allgatherv , 2019, Parallel Comput..

[463]  Yuval Shahar,et al.  Temporal Probabilistic Profiles for Sepsis Prediction in the ICU , 2019, KDD.

[464]  Vasileios Theodorou,et al.  A Metadata Framework for Data Lagoons , 2019, ADBIS.

[465]  Don Mitchell Wilkes,et al.  Machine Learning-based Natural Scene Recognition for Mobile Robot Localization in An Unknown Environment , 2019 .

[466]  Min Zhou,et al.  A survey of pattern mining in dynamic graphs , 2020, WIREs Data Mining Knowl. Discov..

[467]  Leepakshi Bindra,et al.  Bi-Level Associative Classifier Using Automatic Learning on Rules , 2020, DEXA.

[468]  Martha Tatusch,et al.  Fuzzy Clustering Stability Evaluation of Time Series , 2020, IPMU.

[469]  Carl A. B. Pearson,et al.  The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study , 2020, The Lancet Public Health.

[470]  Maslina Binti Zolkepli,et al.  Genetic Algorithm Based Parallel K-Means Data Clustering Algorithm Using MapReduce Programming Paradigm on Hadoop Environment (GAPKCA) , 2020, SCDM.

[471]  Kadan Aljoumaa,et al.  A comparative dimensionality reduction study in telecom customer segmentation using deep learning and PCA , 2020, Journal of Big Data.

[472]  Alaa Tharwat,et al.  Classification assessment methods , 2020, Applied Computing and Informatics.

[473]  Xin Hu,et al.  Research on a Customer Churn Combination Prediction Model Based on Decision Tree and Neural Network , 2020, 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA).

[474]  Giovanni Colavizza,et al.  Data Engineering for Data Analytics: A Classification of the Issues, and Case Studies , 2020, ArXiv.

[475]  P. Klepac,et al.  Early dynamics of transmission and control of COVID-19: a mathematical modelling study , 2020, The Lancet Infectious Diseases.

[476]  Sharma Chakravarthy,et al.  Query Processing on Large Graphs: Approaches To Scalability and Response Time Trade Offs , 2020, Data Knowl. Eng..

[477]  Michael Scriney,et al.  Predicting Customer Churn for Insurance Data , 2020, DaWaK.

[478]  Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI , 2019, Inf. Fusion.

[479]  Nadia Essoussi,et al.  Self-Organizing Map for Multi-view Text Clustering , 2020, DaWaK.