Textual Data in Transportation Research: Techniques and Opportunities

Abstract: Transportation is rich in human communication. Travellers use information services and interact with each other and with transport providers through natural language. Transport providers and regulators (traffic managers, police, transit operators, etc.) in turn need to communicate constantly to coordinate, plan and inform each other. This produces substantial amounts of structured, semi-structured and unstructured data, originating both within and outside an organization's boundary. Such data are often rich in information that is lost or oversimplified (e.g., aggregated into dummy variables) in actual research or practical applications. If a researcher could transform these richer data into meaningful variables in her models, new opportunities would arise, for example for better coping with heterogeneity or random effects. Due to recent advances in computational text analysis, textual data have become usable to a much higher degree. The objective of this chapter is to introduce techniques for textual data processing and to present some recent application examples in transportation research.