A novel similarity measure for dependency trees [query answer system example]

We propose a new tree similarity measure based on the connectivity-integrality (CI) principle. CI is a concept from graph theory under which tree similarity is measured by taking into account both the partial and the integral structures of the trees. We prove theoretically that the new measure is more flexible than two other typical similarity measures across a variety of common substructures shared by two trees. We apply the new measure to a domain-specific QA (query answer) system for the task of sentence-level disambiguation. Experimental results show that the new model effectively improves the rejection rate for irrelevant documents.
