Implementation of a Slogan Generator

Generation of slogans for companies, products or similar entities is a creative task that is difficult to automate. In this paper we describe our attempt at tackling this problem by combining computational linguistics, semantic resources and genetic algorithms.

Introduction

The use of computers for the support or automation of tasks in the creative industries is on the rise, and several such tools and methods have emerged in recent years for various problems. Slogan generation is one of the less supported problems in this field. There are some online tools available¹, which appear to use templating and produce correspondingly template-like results. To the best of our knowledge, there is only one scientific study dedicated particularly to the generation of slogans (and other creative sentences), namely the BRAINSUP framework (Özbal, Pighin, and Strapparava 2013). The BRAINSUP approach emphasises the user's control of the generation process: through user-provided keywords, domain, emotions and similar properties of the slogans, the user exerts considerable control over the generation. This is very useful in practice, as it shrinks the huge search space of slogans and improves the quality of the results. In our work, on the other hand, we aim at a completely autonomous approach, which is not influenced by the user in any way beyond the provision of a short textual description of the target entity.

In this paper, we present our current approach, which follows the BRAINSUP framework but also deviates from it with several modifications. At the core of our slogan generation procedure we use a genetic algorithm (GA) (Bäck 1996), which ensures good coverage of the search space, together with a collection of heuristic slogan evaluation functions.

∗ The authors are also affiliated with the Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000 Ljubljana, Slovenia. This research was partly funded by the European Union, European Social Fund, in the framework of the Operational Programme for Human Resources Development, by the Slovene Research Agency, and supported through EC funding for the projects ConCreTe (grant number 611733) and WHIM (grant number 611560), which acknowledge the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission.
¹ http://slogan4u.com ; http://www.sloganizer.net/en/

Resources

Our slogan generation method requires some specific resources, such as a collection of frequent grammatical relations. Here we list these resources, describe their acquisition methodology and provide some illustrative examples.

Database of existing slogans

The database of existing slogans serves as a basis for the generation of the initial population and for comparison with generated slogans. A large number of known slogans for different companies and products is available online, and there are specialized Web pages that contain collections of slogans. However, none of those sources contains all the necessary information, so we constructed our own database, in which each instance consists of: the slogan, the company/product name, the official Web site URL and the Wikipedia page URL. Currently the database contains 1041 slogans. Here is an example instance:

["Just do it.", "Nike", "http://www.nike.com/", "http://en.wikipedia.org/wiki/Nike"]
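As an illustration of this data structure, here is a minimal sketch of how such database instances could be represented and queried in Python. The JSON storage format, the file name and the field names are our assumptions and are not part of the described method.

```python
import json
from collections import namedtuple

# Hypothetical record type mirroring one database instance:
# slogan, company/product name, official Web site URL, Wikipedia page URL.
SloganEntry = namedtuple("SloganEntry", ["slogan", "name", "url", "wiki_url"])

def load_slogan_db(path="slogans.json"):
    """Load a JSON list of [slogan, name, url, wiki_url] quadruples (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return [SloganEntry(*item) for item in json.load(f)]

# Example usage: look up all stored slogans of a given company.
if __name__ == "__main__":
    db = load_slogan_db()
    print([e.slogan for e in db if e.name == "Nike"])  # e.g. ['Just do it.']
```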
Database of frequent grammatical relations

Frequent grammatical relations between words in sentences were used in some of our processes. For their acquisition we used the Stanford Dependencies Parser (Marneffe, MacCartney, and Manning 2006). Stanford dependencies are triplets containing two words, called the governor and the dependent, and the name of the relation between them. The parser also provides part-of-speech (POS) tags and phrase structure trees. To obtain representatives of frequent grammatical relations between words, we parsed 52,829 random Wikipedia pages, sentence by sentence, and obtained 4,861,717 different dependencies. Each dependency consists of: the name of the relation, the governor, the governor's POS tag, the dependent, the dependent's POS tag and the number of occurrences.

Database of slogan skeletons

All the gathered known slogans were parsed with the Stanford Dependencies Parser. The grammatical structure of each slogan, without the content words, was then stored in a database. Each skeleton contains, for every position in the sentence, its POS tag and all its dependency relations with the other words in the sentence. For example, the skeleton of the slogan "Just do it" is

[[['advmod', '***', 'VB', '***', 'RB'], ['2', '1']], [['dobj', '***', 'VB', '***', 'PRP'], ['2', '3']]]

Here the first part tells us that the first word (an RB adverb) is an adverbial modifier of the second word (a VB verb), and the second part indicates that the third word (a PRP pronoun) is a direct object of the second word.

Slogan generation

In this section we describe our slogan generation approach in terms of its inputs, outputs and algorithmic steps.

INPUT consists of two items: (1) a textual description of a company or a product, and (2) the algorithm parameters: evaluation function weights, mutation and crossover probabilities, the size of the initial population and the maximal number of genetic algorithm iterations.

OUTPUT is a set of generated slogans.

ALGORITHMIC STEPS are the following (a high-level sketch of this loop is given after the list):
1. Parse the input text for keywords and the main entity.
2. Generate the initial population from random skeletons.
3. Evaluate the slogans and select parents for reproduction.
4. Produce a new generation using crossover and mutations.
5. Repeat steps 3 and 4 until a predetermined quality of slogans or the maximal number of iterations is reached.
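As a rough illustration of these steps, the sketch below shows one possible shape of the genetic algorithm loop. It is a generic driver and not the authors' implementation: the evaluate, crossover and mutate callables are supplied by the caller, and the selection of the better-scoring half as parents as well as the default parameter values are our assumptions.

```python
import random

def run_ga(initial_population, evaluate, crossover, mutate,
           p_crossover=0.8, p_mutation=0.1, max_iter=100, target_score=0.9):
    """Generic GA loop corresponding to steps 3-5 above.

    `evaluate` maps a slogan to a score in [0, 1] (the aggregated evaluation
    function), `crossover` combines two slogans and `mutate` alters one.
    All three are supplied by the caller; the parameter defaults are illustrative.
    """
    population = list(initial_population)
    for _ in range(max_iter):
        # Step 3: evaluate the slogans and select parents for reproduction
        # (here simply the better-scoring half of the population).
        population.sort(key=evaluate, reverse=True)
        if evaluate(population[0]) >= target_score:
            break
        parents = population[:max(2, len(population) // 2)]

        # Step 4: produce a new generation using crossover and mutations.
        children = []
        while len(children) < len(population):
            a, b = random.sample(parents, 2)
            child = crossover(a, b) if random.random() < p_crossover else a
            if random.random() < p_mutation:
                child = mutate(child)
            children.append(child)
        population = children

    # OUTPUT: the generated slogans, best first.
    return sorted(population, key=evaluate, reverse=True)
```

Steps 1 and 2 (keyword extraction and construction of the initial population) are performed once, before this loop, and are described in the following subsections.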
Extraction of keywords and the main entity

This first step is achieved using the Nodebox English Linguistics library². The main entity is obtained by selecting the most frequent entity in the whole text, using the nltk library (Bird, Klein, and Loper 2009). An example of the keywords and the entity extracted from the Coca-Cola Wikipedia page:

keywords = ['win', 'produce', 'celebrate', 'using', 'marketing', 'north', 'likely', 'drink', 'century', 'diet', 'production', 'root', 'product', 'beverage', 'water', 'image', 'sugar', ...]
entity = 'Coke'

² http://nodebox.net/code/index.php/Linguistics

Generation of the initial population of slogans

The procedure for generating the initial population of slogans is based on the BRAINSUP framework (Özbal, Pighin, and Strapparava 2013), with some modifications and additions. It follows these steps:
1. Select a random slogan skeleton from the database.
2. Choose the empty position that has the largest number of dependency relations in the sentence, and find the set of all possible fillers for that position. Fillers are words from the database of grammatical relations between words and must satisfy all predefined dependencies and POS tags.
3. Find the intersection between the set of all possible fillers and the set of keywords. If the obtained set is not empty, choose a random word from it and fill the empty position. If the intersection is empty, choose a random word from the 20% most frequent possible fillers and fill the empty position.
4. Repeat steps 2 and 3 until all the empty positions are filled.
5. Check whether the generated slogan contains any entities. If it does, replace them with the company entity.
6. Repeat steps 1 to 5 until an initial population of the predetermined size is built.

Evaluation of slogans

To order the slogans by their quality, an aggregated evaluation function was constructed. It is composed of 10 different sub-functions, each assessing a particular feature of a slogan with a score in the interval [0,1]. The parameter of the aggregation function is a list of 10 weights that sum to 1; they define the proportions of the sub-functions in the overall score.

2-gram function

In order to work with 2-grams, we obtained the data set of the 1,000,000 most frequent 2-grams and the 5000 most frequent words in the Corpus of Contemporary American English (COCA)³. The 2-gram evaluation score should to some degree represent the relatedness between the words in a slogan; we assume that slogans containing many frequent 2-grams are more likely to make sense. The 2-gram evaluation score is computed in the following manner (a code sketch of this computation is given at the end of the section):
1. Assign a score to every 2-gram in the slogan:
• if the 2-gram is among the most frequent 2-grams: score = 1,
• else if one word is an entity and the other is among the 5000 most frequent words: score = 0.75,
• else if one word is among the 5000 most frequent words and the other is not: score = 0.5,
• else: score = 0.
2. Sum the scores of all 2-grams and divide the sum by the number of all 2-grams in the slogan.

Length function

This function assigns score 1 to slogans with fewer than 8 words, and score 0 to longer ones.

Diversity function

The diversity function evaluates a slogan by counting the number of repeated words. The highest score goes to a slogan with no repeated words. If a slogan contains identical consecutive words, it receives score 0.

Entity function

It returns 1 if the slogan contains the main entity, and 0 if it does not.

Keywords function

If between one and half of the words in a slogan belong to the set of keywords, the keywords function returns 1. If a slogan does not contain any keywords, the score is 0. If more than half of the words in the slogan are keywords, the score is 0.75.

Word frequency function

This function prefers slogans with many frequent words, as we assume that slogans containing many infrequent words are not good. The score is obtained by dividing the number of frequent words by the number of all words in the slogan. A word is considered frequent if it is among the 5000 most frequent words in COCA.

Polarity and subjectivity functions

To calculate the polarity and subjectivity scores based on the adjectives in the slogan, we used the sentiment function from the pattern package for Python (De Smedt and Daelemans 2012). We also integrated the weight score from SentiWordNet (Baccianella, Esuli, and Sebastiani 2010), which assigns to each word a positivity, negativity and objectivity score.
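To make the evaluation functions more concrete, here is a minimal sketch of the 2-gram sub-function described earlier in this section. The set-based resources (frequent_bigrams, frequent_words), the tokenisation and the lower-casing are our assumptions; the scoring rules follow the description above.

```python
def bigram_score(slogan_words, frequent_bigrams, frequent_words, entity):
    """2-gram sub-function: average the per-2-gram scores over the slogan.

    slogan_words     -- the slogan as a list of lower-cased tokens
    frequent_bigrams -- set of (word, word) pairs (most frequent COCA 2-grams)
    frequent_words   -- set of the 5000 most frequent COCA words
    entity           -- the main entity (lower-cased)
    """
    bigrams = list(zip(slogan_words, slogan_words[1:]))
    if not bigrams:
        return 0.0
    total = 0.0
    for w1, w2 in bigrams:
        if (w1, w2) in frequent_bigrams:
            total += 1.0
        elif (w1 == entity and w2 in frequent_words) or \
             (w2 == entity and w1 in frequent_words):
            total += 0.75
        elif (w1 in frequent_words) != (w2 in frequent_words):
            total += 0.5   # exactly one of the two words is frequent
        # otherwise the 2-gram contributes 0
    return total / len(bigrams)

# Toy example: one 2-gram with exactly one frequent word (0.5) and one
# frequent 2-gram (1.0) give (0.5 + 1.0) / 2 = 0.75.
print(bigram_score(["just", "do", "it"],
                   frequent_bigrams={("do", "it")},
                   frequent_words={"do"},
                   entity="coke"))
```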