Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

The current surge of interest in search and comparison tasks in natural language processing has brought with it a focus on vector space approaches and vector space dimensionality reduction techniques. Presenting data as points in hyperspace makes available a variety of well-developed tools suited to this representation. Dimensionality reduction allows data to be compressed and generalised. Eigen decomposition and related algorithms form one category of approaches to dimensionality reduction. They provide a principled way to reduce data dimensionality and have repeatedly proved capable of revealing powerful generalisations in the data. Issues with the approach, however, include computational complexity and limits on the size of dataset that can reasonably be processed in this way, and large datasets are a persistent feature of natural language processing tasks. This thesis focuses on two main questions. Firstly, in what ways can eigen decomposition and related techniques be extended to larger datasets? Secondly, this having been achieved, of what value is the resulting approach to information retrieval and to statistical language modelling at the n-gram level? The applicability of eigen decomposition is shown to be extendable through the use of an extant algorithm, the Generalized Hebbian Algorithm (GHA), and of a novel extension of this algorithm to paired data, the Asymmetric Generalized Hebbian Algorithm (AGHA). Several original extensions to these algorithms are also presented, improving their applicability in various domains. The applicability of GHA to Latent Semantic Analysis-style tasks is investigated. Finally, AGHA is used to investigate the value of singular value decomposition, an eigen decomposition variant, for n-gram language modelling, and a sizeable perplexity reduction is demonstrated.
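The Generalized Hebbian Algorithm mentioned above is Sanger's rule, which learns leading eigenvectors incrementally, one observation at a time, without ever forming the full correlation matrix. As an illustration only, the following is a minimal pure-Python sketch of the standard GHA update. It assumes zero-mean input vectors, a fixed learning rate and a fixed number of passes over the data. The function names `gha` and `cosine` and the toy dataset are hypothetical, and the sketch does not reproduce the thesis's own extensions or AGHA.

```python
import math
import random


def gha(data, k, lr=0.01, epochs=500, seed=0):
    """Sketch of Sanger's Generalized Hebbian Algorithm (GHA).

    Returns k weight vectors approximating the leading eigenvectors of the
    data's correlation matrix E[x x^T]. The data is assumed to be zero-mean.
    """
    rng = random.Random(seed)
    n = len(data[0])
    # Small random initial weights, one row per component.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(k)]
    for _ in range(epochs):
        for x in data:
            # Outputs y_i = w_i . x, computed with the current (old) weights.
            y = [sum(wj * xj for wj, xj in zip(w, x)) for w in W]
            residual = list(x)
            for i in range(k):
                # Sanger's rule: dw_i = lr * y_i * (x - sum_{j<=i} y_j w_j),
                # i.e. component i learns what components 0..i fail to explain.
                residual = [r - y[i] * wj for r, wj in zip(residual, W[i])]
                W[i] = [wj + lr * y[i] * r for wj, r in zip(W[i], residual)]
    return W


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


if __name__ == "__main__":
    # Toy zero-mean data: variance 5 along (1, 1) and 2 along (1, -1).
    data = [(a + b, a - b) for a in (-2, -1, 1, 2) for b in (-1, 1)]
    W = gha(data, k=2)
    s = 1 / math.sqrt(2)
    print(abs(cosine(W[0], (s, s))), abs(cosine(W[1], (s, -s))))
```

With a learning rate this small, the rows of `W` should settle near unit-length vectors along (1, 1)/√2 and (1, -1)/√2, up to sign. Because each update touches a single observation, memory use depends on the number of components and the vector dimensionality rather than on the size of the dataset, which is the property that makes GHA attractive for large corpora.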
