Structured Machine Learning for Data Analytics and Modeling: Intelligent Security as an Example

Structured machine learning refers to learning a structured hypothesis from data with rich internal structure. We apply semantics-enabled (semi-)supervised learning, under both perfect and imperfect domain knowledge, to realize the vision of structured machine learning for big data analytics and modeling. First, domain knowledge is modeled as RDF(S) ontologies, and approximate SPARQL queries extract a type-labeled training dataset from these ontologies, selecting the feature combinations used for machine learning hypothesis testing. Next, the type-labeled instances are used to classify new, type-unlabeled instances, and the classifier is validated by its error on a testing dataset. Finally, the newly type-labeled instances are fed back into the structured ontologies to support ontology and rule learning. The proposed approach has been tested and verified for intelligent security using the real-world KDD Cup 1999 dataset.
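The pipeline described above (query type-labeled instances from the ontology, train a classifier, validate it on test-set error, then feed predicted types back into the ontology) can be sketched as one round of self-training, a common semi-supervised technique. This is a minimal sketch under stated assumptions, not the authors' implementation. A plain Python list of (subject, predicate, object) tuples stands in for an RDF(S) graph, and a filtering function stands in for an approximate SPARQL query. A nearest-centroid classifier is swapped in for whatever learner the paper actually uses. The `ex:` names, the three features (loosely modeled on KDD Cup 1999 fields such as duration, src_bytes, and count), the toy values, and the `max_error` threshold are all hypothetical.

```python
from math import dist

# Hypothetical ontology predicates; "ex:" is an illustrative namespace and the
# feature names only loosely mirror KDD Cup 1999 fields.
FEATURES = ("ex:duration", "ex:srcBytes", "ex:count")


def query_instances(graph, labeled):
    """Stand-in for a SPARQL SELECT over (subject, predicate, object) tuples.
    Returns (subject, feature_vector, type_or_None) for each instance that has
    every feature and is typed (labeled=True) or untyped (labeled=False)."""
    feats, types = {}, {}
    for s, p, o in graph:
        if p in FEATURES:
            feats.setdefault(s, {})[p] = float(o)
        elif p == "rdf:type":
            types[s] = o
    return [(s, tuple(f[name] for name in FEATURES), types.get(s))
            for s, f in feats.items()
            if len(f) == len(FEATURES) and (s in types) == labeled]


def train_centroids(rows):
    """Nearest-centroid model: the mean feature vector of each type label."""
    sums, counts = {}, {}
    for _, vec, label in rows:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(x / counts[lab] for x in acc) for lab, acc in sums.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))


def error_rate(centroids, rows):
    """Fraction of test rows whose predicted label differs from the true one."""
    wrong = sum(predict(centroids, vec) != label for _, vec, label in rows)
    return wrong / len(rows)


def self_train(graph, test_rows, max_error=0.2):
    """One self-training round: learn from typed instances, validate on a test
    set, and (only if the error is acceptable) add predicted rdf:type triples
    for the untyped instances back into the graph."""
    centroids = train_centroids(query_instances(graph, labeled=True))
    err = error_rate(centroids, test_rows)
    if err > max_error:
        return err, []  # model not trusted; graph left unchanged
    new = [(s, "rdf:type", predict(centroids, vec))
           for s, vec, _ in query_instances(graph, labeled=False)]
    graph.extend(new)
    return err, new


def connection(name, duration, src_bytes, count, label=None):
    """Build the triples for one (hypothetical) network connection."""
    triples = [(name, "ex:duration", duration),
               (name, "ex:srcBytes", src_bytes),
               (name, "ex:count", count)]
    if label is not None:
        triples.append((name, "rdf:type", label))
    return triples


graph = []
graph += connection("ex:c1", 0, 200, 2, "ex:Normal")
graph += connection("ex:c2", 1, 250, 3, "ex:Normal")
graph += connection("ex:c3", 0, 1000, 500, "ex:DoS")
graph += connection("ex:c4", 0, 1030, 480, "ex:DoS")
graph += connection("ex:u1", 0, 220, 1)     # untyped instance
graph += connection("ex:u2", 0, 1020, 510)  # untyped instance

test_rows = [("ex:t1", (0.0, 240.0, 2.0), "ex:Normal"),
             ("ex:t2", (0.0, 1010.0, 490.0), "ex:DoS")]

err, new = self_train(graph, test_rows)
print(err, new)
# 0.0 [('ex:u1', 'rdf:type', 'ex:Normal'), ('ex:u2', 'rdf:type', 'ex:DoS')]
```

Features are left unscaled for brevity. With real KDD Cup 1999 records, which mix large byte counts with small integer counts, standardizing each feature before computing distances would likely matter.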
