With the growing threat of leakage of sensitive information, such as classified military documents, information guards have recently attracted increased interest. An information guard is a filter that controls the content of information exchanged between two domains, one of which has a higher confidentiality level than the other. Its main role is to block leakage of sensitive information from the higher confidentiality domain to the lower confidentiality domain. A military network is an example of a higher confidentiality domain, while a subcontractor network is an example of a lower confidentiality domain. The common practice is to use an automatic information guard that relies on a predefined list of words, called a "dirty word list", to decide the security level of a document and consequently either release it to the lower confidentiality domain or block it. Such traditional guards are configured manually, and their classification logic is based on the occurrence of words from the dirty word list. In this paper, we advocate the use of machine learning as a cornerstone for building advanced information guards. Machine learning can also be used to supplement the decision obtained from dirty word list classification. Machine learning has hardly been analysed for this problem, and the analysis of topical classification presented here provides new knowledge and a basis for further work in this area. Ten different machine learning algorithms were applied to real-life data from a military context. The results are promising and demonstrate that machine learning can become a useful tool to assist humans in determining the appropriate security label of an information object.
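To make the traditional approach concrete, the following is a minimal sketch of a dirty word list guard. It is an illustration under stated assumptions, not the guard evaluated in this work: the word list, the tokenization rule, and the names `DIRTY_WORDS`, `tokenize`, and `guard_decision` are all hypothetical.

```python
import re

# Hypothetical dirty word list; a real list would be configured manually
# by security staff for the domains in question.
DIRTY_WORDS = {"secret", "classified", "noforn"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def guard_decision(text, dirty_words=DIRTY_WORDS):
    """Decide whether a document may leave the higher confidentiality domain.

    Returns ("block", hits) if any dirty word occurs in the text,
    otherwise ("release", []). `hits` lists the matched words, sorted.
    """
    hits = sorted(set(tokenize(text)) & set(dirty_words))
    return ("block", hits) if hits else ("release", [])
```

A guard of this kind only reacts to exact word occurrences, which is one reason a learned classifier could be useful as a supplement to its decision.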