Similarity-Based Estimation for Document Summarization using Fuzzy Sets

The volume of information grows every day, and thousands of documents are produced and made available on the Internet. The amount of information in these documents exceeds our capacity to read them, so readers need access to the right information without going through each document in full. Documents therefore need to be condensed into an overview so that they can be used effectively. We propose a similarity model that incorporates topic similarity, using fuzzy sets and probability theory, to extract the most representative sentences. Sentences with high weights are extracted to form a summary. On average, our model (known as MySum) produces summaries that are 60% similar to manually created summaries, whereas the tf.isf algorithm produces summaries that are 30% similar. Two human summarizers, P1 and P2, produce summaries that are 70% similar to each other using similar sets of documents obtained from TREC.
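
To make the tf.isf baseline concrete, the following is a minimal sketch of extractive summarization by sentence weight. It is not the MySum model and not necessarily the exact tf.isf variant used in the evaluation: the tokenizer, the scoring formula (mean of term frequency times log inverse sentence frequency), and the function names `tokenize`, `tf_isf_scores`, and `summarize` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def tf_isf_scores(sentences):
    """Score each sentence by the mean tf * isf of its tokens (assumed variant)."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Sentence frequency: number of sentences that contain each word.
    sf = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # isf(word) = log(n / sf[word]); words found in every sentence get 0.
        total = sum(count * math.log(n / sf[word]) for word, count in tf.items())
        scores.append(total / len(tokens))
    return scores


def summarize(sentences, k):
    """Return the k highest-weighted sentences, kept in original order."""
    scores = tf_isf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k)` on a list of sentence strings returns the `k` sentences with the highest weights. Dividing by sentence length is one design choice among several; it keeps long sentences from dominating simply because they contain more terms.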
