Spotting and Discovering Terms Through Natural Language Processing

by content, whereby the k objects in the database that are most similar to a specific query or object are sought. Text-based internet searches are the most obvious application here, although searches over image and time series (movie) databases are also discussed, along with the clear problems of extracting content and context (“A picture may be worth a thousand words, but which thousand words to use is a non-trivial problem”). The later chapters contain enough one-line reminders of previously discussed results and ideas to allow them to be pursued in a modular fashion, and this is clearly what the authors intended. Regardless, there are plenty of references to earlier chapters should the reader have skipped ahead too enthusiastically. If more detail is required than this book provides, each chapter closes with a “Further Reading” section; in total these contain references to well over four hundred books and articles, including other texts on data mining with a different orientation, e.g., towards computer science or economics.
This text is aimed at senior undergraduates and junior graduates who wish to learn about the principles of data mining. An undergraduate background in a quantitative discipline (e.g., computer science, mathematics, engineering or economics) is assumed, and this is perhaps a fair assessment of the scientific level of the book. The necessary basics in the first four chapters should serve as a brief refresher and, perhaps more importantly, are examined from a data mining perspective; the remainder of the text manages to prevent unnecessary technical details from distracting the reader from the main concepts and concerns.
This is not to say that the text is overly general. To the authors' credit, detail is provided where it is needed, and there are a good number of specific examples and applications throughout the text to illustrate or clarify a point. Rather, a necessary distance is maintained in order to give a broad overview of the subject material. As a consequence, the discussion has a flowing nature that is both informative and easy to digest. While the text is probably better suited as a teaching aid than as a must-have reference for researchers, those in related fields, or those with merely a general interest in discovering more about the subject material, will find this a highly accessible volume with a sufficiently extensive range of references to serve as a good starting point for further reading.