Text style analysis using trace ratio criterion patch alignment embedding

This paper proposes an effective algorithm for extracting cues of text style. When processing a document collection, the documents are first converted to a high-dimensional data set with the assistance of a group of style markers. The Trace Ratio Criterion Patch Alignment Embedding (TR-PAE) is then employed to obtain a lower-dimensional representation in a textual space. TR-PAE has two advantages. First, inter-class separability and intra-class compactness are well characterized by a specially designed intrinsic graph and penalty graph, both based on a discriminative patch alignment strategy. Second, the method is based on the trace ratio criterion, which directly represents the average between-class distance and the average within-class distance in the low-dimensional space. To evaluate the proposed algorithm, three corpora are designed and collected from existing popular corpora and real-life data covering diverse topics and genres. Extensive simulations are conducted to illustrate the feasibility and effectiveness of the implementation. The simulations demonstrate that the proposed method is able to extract the deeply hidden style information of given documents and to provide reliable analysis results on text styles efficiently. © 2014 Elsevier B.V. All rights reserved.
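The trace ratio criterion named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the intrinsic and penalty graphs of TR-PAE are built, so the code below substitutes ordinary LDA-style within-class and between-class scatter matrices as stand-ins; this is an assumption, not the authors' construction. It then applies the commonly used iterative scheme for maximizing tr(W^T S_b W) / tr(W^T S_w W) under the constraint W^T W = I. The function names `scatter_matrices` and `trace_ratio` are hypothetical.

```python
import numpy as np


def scatter_matrices(X, y):
    """Return LDA-style within-class (S_w) and between-class (S_b) scatter.

    Assumption: these stand in for the intrinsic and penalty graph
    matrices of TR-PAE, which the abstract does not define.
    """
    mean_all = X.mean(axis=0)
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_c = X[y == label]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += X_c.shape[0] * (diff @ diff.T)
    return S_w, S_b


def trace_ratio(S_b, S_w, d, max_iter=100, tol=1e-10):
    """Iteratively maximize tr(W^T S_b W) / tr(W^T S_w W) with W^T W = I.

    Assumes S_w is positive definite so the denominator is never zero.
    Returns the projection W (n_features x d) and the ratio at each step.
    """
    n_features = S_b.shape[0]
    # Start from an arbitrary orthonormal basis (first d coordinate axes).
    W = np.eye(n_features)[:, :d]
    ratio = np.trace(W.T @ S_b @ W) / np.trace(W.T @ S_w @ W)
    history = [ratio]
    for _ in range(max_iter):
        # eigh returns eigenvalues in ascending order, so the last d
        # eigenvectors span the subspace maximizing tr(W^T (S_b - ratio S_w) W).
        _, eigvecs = np.linalg.eigh(S_b - ratio * S_w)
        W = eigvecs[:, -d:]
        new_ratio = np.trace(W.T @ S_b @ W) / np.trace(W.T @ S_w @ W)
        history.append(new_ratio)
        if abs(new_ratio - ratio) < tol:
            break
        ratio = new_ratio
    return W, history
```

Given a style-marker matrix `X` (one row per document) and class labels `y`, calling `scatter_matrices(X, y)` followed by `trace_ratio(S_b, S_w, d=2)` would yield an orthonormal projection, and `X @ W` would give the low-dimensional representation. The ratio does not decrease from one iteration to the next, which is why the loop can stop once the change falls below `tol`.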
