A New Process Model for Text String Searching

Investigations involving digital media (e.g., hard disks and USB thumb drives) rely heavily on text string searches. Traditional search approaches that use matching algorithms, or database technology and tree-based indexing algorithms, produce an overwhelming number of “hits,” a large percentage of which are irrelevant to investigative objectives. Furthermore, current approaches predominantly employ literal search techniques, which lead to poor recall with respect to investigative objectives. A better approach is needed, one that reduces information retrieval overhead and improves investigative recall. This paper proposes a new, high-level text string search process model that addresses some of the shortfalls in current text string search paradigms. We hope that this model will stimulate efforts to extend information retrieval and text mining research to digital forensic text string searching.
