An Intelligent and Autoadaptive System of Virtual Identities Based on Deep Learning for the Analysis of Online Advertising Networks

Marketing is one of the areas that benefits most from web platforms. Over the years, several online marketing techniques have been developed to determine the most effective content to display to each target population. To analyze the advertisements shown to each population segment, a system based on Virtual Identities is proposed. In this system, each target population is represented by a virtual identity that navigates the internet according to a scripted behavior. Each visited webpage displays advertising content, which must be detected, located and subsequently analyzed to extract patterns. In this work, a Natural Language Processing model is presented in which advertising detection is treated as a binary classification problem: for each block composing the webpage, the model predicts whether or not it is commercial content. Two approaches are considered for the input, depending on whether or not the HTML markup is removed from the text. Furthermore, several text embedding and predictive models are evaluated in order to select the best model for the given input.
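As a minimal illustration of the two input approaches described above, the following Python sketch builds both representations of a webpage block: one that keeps the HTML markup and one with the markup removed. It uses only the standard library; the function names (`strip_markup`, `build_inputs`), the dictionary keys and the sample blocks are hypothetical and chosen for illustration, and the sketch does not reproduce the preprocessing, embeddings or classifiers evaluated in this work.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (tags are ignored)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(block_html: str) -> str:
    """Return the visible text of a block with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(block_html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def build_inputs(block_html: str) -> dict:
    """Build both candidate inputs for one block (hypothetical key names)."""
    return {
        "with_markup": " ".join(block_html.split()),  # markup kept
        "text_only": strip_markup(block_html),        # markup removed
    }


# Made-up example blocks; a label of 1 would mean "commercial content".
blocks = [
    '<div class="ad-banner"><a href="https://example.com/offer">'
    "Buy now, 50% off</a></div>",
    "<p>The council met on Tuesday to discuss the budget.</p>",
]
inputs = [build_inputs(b) for b in blocks]

for item in inputs:
    print(item["with_markup"])
    print(item["text_only"])
```

Either representation could then be passed to a text embedding and a binary classifier. Keeping the markup preserves cues such as class names and link targets, while removing it leaves only the visible text.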