Concept indexing

Marking text in a document is a convenient way of identifying pieces of knowledge that are relevant to the reader, a colleague or a larger group. From such markings, one can build networks of concepts that hyperlink to their occurrences in a collection of documents. On the Internet, marked documents are easy to share, concepts can be constructed collaboratively, and the concept-document network can be used for navigation and direct access. Text marking, concepts grounded in the marked text, and the Internet as base technology characterize our tool for managing so-called “concept indexes”. We describe its current design and a new one, and outline some application scenarios: electronic help desks, information digests on the Web, teaching design in virtual classes, and planning under quality control in distributed teams.
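The idea of concepts grounded in marked text can be made concrete with a small data structure. The following is a minimal sketch, assuming that documents are plain strings keyed by an identifier and that a marking is a character span `[start, end)`. The names `ConceptIndex`, `Occurrence`, `mark`, `link`, `snippets` and `concepts_in` are hypothetical illustrations, not the tool's actual design or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """A marked text span: document id plus character offsets [start, end)."""
    doc_id: str
    start: int
    end: int


class ConceptIndex:
    """Hypothetical concept index: concepts grounded in marked text spans."""

    def __init__(self, documents):
        # Assumption for this sketch: documents maps doc_id -> full text.
        self.documents = dict(documents)
        self.occurrences = defaultdict(list)  # concept -> [Occurrence]
        self.by_document = defaultdict(set)   # doc_id -> {concept}
        self.related = defaultdict(set)       # concept -> {linked concept}

    def mark(self, concept, doc_id, start, end):
        """Mark text[start:end] in doc_id as an occurrence of concept."""
        text = self.documents[doc_id]
        if not 0 <= start < end <= len(text):
            raise ValueError("span outside document")
        occ = Occurrence(doc_id, start, end)
        if occ not in self.occurrences[concept]:
            self.occurrences[concept].append(occ)
        self.by_document[doc_id].add(concept)
        return occ

    def link(self, a, b):
        """Add an undirected link between two concepts in the network."""
        self.related[a].add(b)
        self.related[b].add(a)

    def snippets(self, concept):
        """Direct access: the marked passages that ground a concept."""
        return [self.documents[o.doc_id][o.start:o.end]
                for o in self.occurrences[concept]]

    def concepts_in(self, doc_id):
        """Navigation from a document back into the concept network."""
        return sorted(self.by_document[doc_id])


if __name__ == "__main__":
    docs = {
        "faq1": "Reset the printer before reinstalling the driver.",
        "faq2": "A corrupt driver often causes blank pages.",
    }
    index = ConceptIndex(docs)
    # Mark the word "driver" in each document as grounding one concept.
    for doc_id, text in docs.items():
        start = text.find("driver")
        index.mark("printer driver", doc_id, start, start + len("driver"))
    index.link("printer driver", "blank pages")
    print(index.snippets("printer driver"))  # ['driver', 'driver']
    print(index.concepts_in("faq2"))         # ['printer driver']
```

Keeping both directions, concept to occurrences and document to concepts, is what lets a reader move from a concept to its passages and back again. A shared, Web-based version would presumably persist these maps on a server rather than in memory.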
