The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015

In this paper we retrace the recent history of statistics by analyzing all the papers published since 1970 in five prestigious statistical journals: Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, Series B, and Statistical Science. The aim is to construct a kind of "taxonomy" of statistical papers by organizing and clustering them into main themes. In this sense, being identified in a cluster means being important enough to be uncluttered in the vast and interconnected world of statistical research. Since the main statistical research topics are naturally born, evolve, or die over time, we also develop a dynamic clustering strategy, in which a group in one time period is allowed to migrate to or merge into different groups in the following one. Results show that statistics is a very dynamic and evolving science, stimulated by the rise of new research questions and types of data.
