Extending Web Mining to Digital Forensics Text Mining

As digital devices become increasingly integral to our daily lives, digital evidence in the form of unstructured, textual data becomes increasingly prevalent. Such data exists in both cyber and non-cyber crime cases. As a result, text mining is an important technique for digital forensic investigators. However, text mining in the digital forensics domain is a non-trivial task, as investigators must locate relevant search hits amongst millions of hits that are in fact responsive to the search query but not relevant to the investigation. This emergent research tackles the problem of exceedingly high information retrieval overhead by reviewing extant web mining ranking algorithms, explaining why they cannot simply be extended to digital forensic text mining, and proposing a new digital forensic text mining ranking algorithm that uses PageRank as its basis. Future work is ongoing and focuses on developing a lexical ontology and validating the proposed algorithm.