SIM: A Search Engine by Correlating Scattered Data Sets for Cyber, Physical, and Social Systems

Effectively locating a file in emerging cyber, physical, and social systems is becoming an important challenge as everyday devices converge. Consider a typical scenario: you remember nothing searchable about a file (no keywords, no text from the document), only fragments of memory about what you are looking for. How can you locate the target among the many large folders in the system? One potential solution is to use whatever you do remember as a stepping stone along a path that leads to a small set of candidate targets. This paper proposes the design of a desktop search engine, Search Implicit Memories (SIM), that leverages both explicit and implicit memories. Explicit memories serve as stepping stones into the implicit memories, where the target is then located. The paper also proposes monitoring and correlating the existing data sets distributed across computer users' disk file systems to track user activities and construct correlation graphs. Experimental results show that SIM is both effective and lightweight. SIM can be used to complement existing keyword-based desktop search engines.
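The stepping-stone idea can be illustrated with a minimal sketch. The abstract does not specify how SIM builds or queries its correlation graphs, so everything below is an assumption made for illustration: the event format `(timestamp, path)`, the 300-second co-access window, and the function names `build_correlation_graph` and `candidates` are all hypothetical. Files accessed close together in time are linked, and a remembered file is then used to rank its neighbors as candidate targets.

```python
from collections import defaultdict


def build_correlation_graph(events, window=300):
    """Link files whose accesses fall within `window` seconds of each other.

    `events` is a list of (timestamp_seconds, path) tuples. Both the event
    format and the time-window rule are illustrative assumptions, not the
    method described in the paper.
    """
    graph = defaultdict(lambda: defaultdict(int))
    events = sorted(events)
    for i, (t_i, f_i) in enumerate(events):
        for t_j, f_j in events[i + 1:]:
            if t_j - t_i > window:
                break  # events are sorted, so later ones are even further away
            if f_i != f_j:
                graph[f_i][f_j] += 1
                graph[f_j][f_i] += 1
    return graph


def candidates(graph, remembered, top_k=5):
    """Rank files correlated with the remembered ones (the stepping stones)."""
    scores = defaultdict(int)
    for f in remembered:
        for neighbor, weight in graph.get(f, {}).items():
            if neighbor not in remembered:
                scores[neighbor] += weight
    # Highest co-access count first; ties broken alphabetically.
    return sorted(scores, key=lambda f: (-scores[f], f))[:top_k]


# Hypothetical activity log: two sessions about 80 minutes apart.
events = [
    (0, "trip.jpg"), (60, "budget.xlsx"), (120, "notes.txt"),
    (5000, "other.doc"), (5060, "trip.jpg"), (5100, "budget.xlsx"),
]
graph = build_correlation_graph(events)
print(candidates(graph, {"trip.jpg"}))
```

Here the user remembers only `trip.jpg`; because `budget.xlsx` was opened alongside it in both sessions, it ranks first among the candidates. A real system would draw events from existing logs and would likely weight edges by more than raw co-access counts.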
