Web Mining for Open Source Intelligence

Web mining for open source intelligence is the retrieval, extraction and analysis of information from online Internet sites. This paper reviews two separate application areas: live news monitoring and targeted, topic-based data mining. Most newspapers and news agencies have Web sites that carry live updates on unfolding events, together with opinions and perspectives on world events. Most governments monitor news reports to feel the pulse of public opinion and to obtain early warning of emerging crises. The Joint Research Centre has developed significant experience in Internet content monitoring through its work on media monitoring (EMM) for the European Commission. EMM forms the core of the Commission's daily press monitoring service. Intelligence services and law enforcement agencies also require monitoring of specific sites and topics, and EMM technology has been applied to the wider Internet for this purpose. The software downloads all the textual content from monitored sites and applies information extraction techniques to it. These tools help analysts process large numbers of documents and derive structured data from them. Lastly, visualisation of the extracted data is important for helping analysts identify patterns and trends in both news reports and Web mining results.
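
As an illustration only, the step from downloaded page content to structured data can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the EMM implementation: the names `TextExtractor`, `extract_text`, `extract_record`, `WATCH_TERMS` and `DATE_PATTERN` are hypothetical, the watch list and the ISO date pattern merely stand in for real information extraction techniques, and the page is assumed to have been downloaded already.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from a page, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the textual content of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical watch list and date pattern, standing in for real extraction rules.
WATCH_TERMS = ("flood", "earthquake", "outbreak")
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_record(url, html):
    """Turn one downloaded page into a small structured record."""
    text = extract_text(html)
    lowered = text.lower()
    return {
        "url": url,
        "dates": DATE_PATTERN.findall(text),
        "terms": [term for term in WATCH_TERMS if term in lowered],
        "text": text,
    }


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{}</style></head><body>"
        "<h1>Flood warning</h1><p>Issued 2007-06-25 for the river valley.</p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(extract_record("http://example.org/news/1", sample))
```

Records of this kind, one per page, are the sort of structured data that can then be aggregated and visualised; a production system would replace the keyword and date rules with proper entity and event extraction.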