Handling of Numeric Ranges for Graph-Based Knowledge Discovery

Current graph-based knowledge discovery algorithms do not consider numeric attributes: they either discard them in the preprocessing step or treat them as alphanumeric values under an exact-matching criterion. As a result, they are limited to domains without this type of attribute, or they find only patterns that contain no numeric attributes. In this work, we propose a new approach to handling numeric attributes in graph-based learning algorithms. Our approach shows that graph-based learning algorithms gain both classification accuracy and descriptive power when they can use nominal and numeric attributes together. We tested the approach with the Subdue system in the mutagenesis and PTC domains, where accuracy increased by around 16% compared to Subdue without our numeric attribute handling algorithm.

In research areas such as data mining and machine learning, the representation of the domain data is a fundamental aspect that largely determines the quality of the results of the discovery process. Depending on the domain, the data mining process analyzes a data collection (such as flat files, log files, or relational databases) to discover patterns, relationships, rules, associations, or useful exceptions that support decision making, the prediction of events, and/or concept discovery. Graph-based algorithms have been used for years to describe flat, sequential, and structural domains in a natural way, with acceptable results (Gonzalez, Holder, and Cook 2002), (Ketkar, Holder, and Cook 2005). Some of these domains contain important numeric attributes (attributes with continuous values). Graph-based knowledge discovery systems can represent such domains appropriately, but they do not manipulate the continuous values appropriately. To the best of our knowledge, no graph-based knowledge discovery algorithm deals with continuous-valued attributes.
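The limitation of exact matching on numeric labels can be illustrated with a minimal sketch. It is not the algorithm proposed in this work, whose details are not given in this excerpt. The attribute values, the range bounds, and the function names `exact_match` and `range_match` are all hypothetical, chosen only to contrast the two matching criteria.

```python
from typing import Union

Label = Union[str, float]


def exact_match(a: Label, b: Label) -> bool:
    """Exact matching: a numeric label is compared like any other symbol."""
    return str(a) == str(b)


def range_match(value: float, low: float, high: float) -> bool:
    """Range matching: a numeric label matches if it lies inside [low, high]."""
    return low <= value <= high


# Hypothetical numeric attribute values attached to one vertex in each of
# four example graphs (illustrative numbers, not taken from any dataset).
charges = [-0.117, -0.121, -0.119, 0.142]

# Under exact matching, count the pairs of values that agree.
exact_pairs = sum(
    exact_match(a, b)
    for i, a in enumerate(charges)
    for b in charges[i + 1:]
)

# Under a hypothetical range [-0.125, -0.115], collect the values that
# would match a single pattern vertex labeled with that range.
in_range = [c for c in charges if range_match(c, -0.125, -0.115)]

print(exact_pairs)  # 0
print(in_range)     # [-0.117, -0.121, -0.119]
```

With exact matching, no two values coincide, so a pattern containing this attribute would never be shared across graphs. With a range label, three of the four values map to the same pattern vertex.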
A solution proposed in the literature for this problem is to use discretization techniques as a preprocessing or post-processing step, but not during the knowledge discovery phase. However, we think that these techniques do not use all the available knowledge that can be taken ad

[1]  Lawrence B. Holder, et al. Discovering Structural Patterns in Telecommunications Data, 2000, FLAIRS.

[2]  Lawrence B. Holder, et al. Experimental Comparison of Graph-Based Relational Concept Learning with Inductive Logic Programming Systems, 2002, ILP.

[3]  Stephen Muggleton, et al. Machine Invention of First Order Predicates by Inverting Resolution, 1988, ML.

[4]  Padhraic Smyth, et al. From Data Mining to Knowledge Discovery: An Overview, 1996, Advances in Knowledge Discovery and Data Mining.

[5]  D. Cook, et al. Graph-Based Hierarchical Conceptual Clustering, 2002.

[6]  Ashwin Srinivasan, et al. Theories for Mutagenicity: A Study in First-Order and Feature-Based Induction, 1996, Artif. Intell.

[7]  Lawrence B. Holder, et al. Concept Formation Using Graph Grammars, 2002, KDD 2002.

[8]  Stephen Muggleton, et al. Inverse Entailment and Progol, 1995, New Generation Computing.

[9]  Istvan Jonyer, et al. Graph-Based Hierarchical Conceptual Clustering, 2000, Int. J. Artif. Intell. Tools.

[10]  Lawrence B. Holder, et al. Comparison of Graph-Based and Logic-Based Multi-Relational Data Mining, 2005, SKDD.

[11]  Ranga Raju Vatsavai, et al. Trends in Spatial Data Mining, 2022.

[13]  Alan Robinson, et al. Computational Logic - Essays in Honor of Alan Robinson, 1991, Computational Logic - Essays in Honor of Alan Robinson.

[14]  Ehud Shapiro, et al. Inductive Inference of Theories from Facts, 1991, Computational Logic - Essays in Honor of Alan Robinson.

[15]  J. Ross Quinlan, et al. Determinate Literals in Inductive Logic Programming, 1991, IJCAI.

[16]  Ian H. Witten, et al. Weka: Practical Machine Learning Tools and Techniques with Java Implementations, 1999.

[17]  Emden R. Gansner, et al. A Technique for Drawing Directed Graphs, 1993, IEEE Trans. Software Eng.

[18]  R. Suganya, et al. Data Mining Concepts and Techniques, 2010.