Mining Strengths and Weaknesses of Cricket Players Using Short Text Commentary

Knowledge of players' strengths and weaknesses is key to team selection and strategy planning in any team sport, including cricket. Computationally, this problem is mostly unexplored. Existing methods rely only on aggregate, macroscopic statistics that ignore many details. The central idea of our paper is to mine strength and weakness rules from short text commentary data. This data is compact, semi-structured, and accurate, yet it has so far been ignored by the machine learning community. We collect fine-grained information about each player from the short text commentary dataset and represent it using domain-specific features that we identify. We apply correspondence analysis, a dimensionality reduction method suited to discrete random variables, to construct a semantic relation between bowlers and batsmen. This relation is plotted using biplots, from which human-readable strength and weakness rules are extracted. We perform experiments on a large dataset that describes over one million deliveries, and we validate the extracted rules using both intrinsic and extrinsic validation.
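The correspondence analysis step can be illustrated with a minimal sketch. It assumes the textbook formulation (singular value decomposition of the standardized residuals of a contingency table) and is not necessarily the exact procedure used in the paper. The function name `correspondence_analysis` and the small count table are hypothetical, chosen only for illustration; rows could stand for players and columns for commentary-derived features.

```python
import numpy as np

def correspondence_analysis(counts):
    """Simple correspondence analysis of a two-way contingency table.

    Assumes every row and every column has a non-zero total.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    expected = np.outer(r, c)             # independence model
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: the points that would be drawn in a biplot
    rows = (U * sv) / np.sqrt(r)[:, None]
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]
    return rows, cols, sv ** 2            # sv**2 are the principal inertias

# Made-up counts (illustrative only, not data from the paper)
counts = np.array([[30, 12, 5],
                   [10, 25, 8],
                   [4, 9, 22]])
rows, cols, inertia = correspondence_analysis(counts)
```

Plotting the first two columns of `rows` and `cols` on the same axes gives a biplot. Row and column points that lie in the same direction from the origin co-occur more often than independence would predict, which is the kind of association a strength or weakness rule would be read from.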
