Paper information - Identifying behavioural principles underlying activity patterns by means of Bayesian networks


Capturing the behavioral principles underlying activity-travel patterns is of vital importance for building adequate transportation planning models. No single model is perfectly capable of capturing all behavioral patterns, but certain techniques seem better suited than others. In this paper, Bayesian networks are introduced. Bayesian networks are potentially very strong representation and visualization techniques, since they are capable of capturing the multidimensional nature of complex decisions. Several arguments are presented which clarify why the approach is particularly suited to identifying behavioral patterns. The approach is illustrated in a study of transport mode choice. Several significant factors that influence transport mode choice decisions were extracted from a larger number of potentially influential factors drawn from activity diaries. Furthermore, the paper reports the findings of a detailed sensitivity analysis.

TRB 2003 Annual Meeting CD-ROM Paper revised from original submittal. Janssens, Wets, Brijs, Vanhoof and Timmermans

INTRODUCTION

A premise underlying activity-based transportation research is that activity-travel patterns are caused by behavioral mechanisms and principles that individuals and households use to organize activities in time and space (1). However, gaining insight into behavioral patterns is not an easy task, since individuals and households use a combination of scheduling principles to cope with the complexity of the decision problem (2, 3). Consequently, transportation planning models that fail to capture the complexity of the underlying decision-making process are bound to produce biased or even wrong results (4). No single model is perfectly capable of capturing all behavioral patterns, but certain techniques are better suited than others. Choosing the most appropriate technique may already significantly reduce the bias.
The goal of the present paper is to explore the potential value of Bayesian networks for identifying the complex relationships between a set of factors that cause particular behaviors. More specifically, this paper deals with the identification and interpretation, from activity diary data, of a set of interrelated factors that influence transport mode choice. The use of artificial intelligence techniques to predict transportation mode choice has been advocated before (e.g., 5), but to the best of our knowledge, the use of Bayesian networks in transportation research in general (6), and in activity-based modeling of transportation demand in particular, has been very limited. In contributing to this line of research, we will argue and demonstrate that Bayesian networks are potentially very powerful representation techniques. They are particularly valuable for capturing and visualizing the multidimensional nature of complex decisions. They allow one to take into account the many (inter)dependencies that typically exist in complex decision-making processes. Furthermore, the technique is not restricted to the identification of the significant variables but also allows one to quantitatively evaluate the strengths of the relationships and to predict choice probabilities.

The paper is organized as follows. First, the data used to illustrate the potential of Bayesian networks are described. Next, using these data, Bayesian networks are briefly introduced by explaining the difference between structural learning and parameter learning. This is followed by a detailed sensitivity analysis. The paper concludes with a summary of the findings and a discussion of topics for future research.

THE DATA

The activity diary data used in this study were collected in 1997 in the municipalities of Hendrik-Ido-Ambacht and Zwijndrecht in the Netherlands to develop the Albatross model system (7).
The data involve a full activity diary, meaning that both in-home and out-of-home activities were reported. The sample covered all seven days of the week, but individual respondents were requested to complete the diaries for two designated consecutive days. For each successive activity, respondents were asked to provide information about the nature of the activity, the day, the start and end times, the location where the activity took place, the transport mode, the travel time, accompanying individuals and whether the activity was planned or not. Open time intervals were used to report the start and end times of activities. A precoded scheme was used for activity reporting. Different administration modes were used. The response rates varied by mode of administration, ranging between 64% and 82%. There was some evidence of differential non-response. After cleaning, the data set included 2198 household-day diaries. At the individual level, the data set included 2974 person-day diaries.

As will be explained later, the strength of Bayesian networks is that they can compute the posterior probability distribution of the variable under consideration when evidence is entered in the network. In this study, the variable under consideration (the dependent variable) is transport mode choice. We distinguish between three transport modes: (N1) slow (walk, bike), (N2) car driver and (N3) car passenger or public transport (bus, train, taxi, etc.).

Footnote 1: When it is assumed that the state of one variable or of a combination of variables is known, they can be entered as “evidences” in the network. We will elaborate on this later in the text.

In order to build the Bayesian network, a large set of variables that may potentially influence transport mode choice was identified. These variables describe the cases at different levels of aggregation, including the household/person level, the trip level, the tour level and the activity pattern level. Continuous variables were discretized using the equal-frequency interval method. This method divides a continuous variable into n parts, each containing approximately the same number of cases. Table 1 gives an overview of the variables that were used.
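The equal-frequency interval method described above can be illustrated with a minimal Python sketch. The paper does not state how many intervals were used or how tied values were handled, so the number of bins, the tie rule (a value equal to a cut point goes to the upper bin), the function names and the travel-time values below are all assumptions made for illustration only.

```python
from bisect import bisect_right


def equal_frequency_cut_points(values, n_bins):
    """Return n_bins - 1 cut points splitting `values` into bins of
    approximately equal size."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    size = len(ordered)
    # The k-th cut point is the value found at rank k * size / n_bins.
    return [ordered[(k * size) // n_bins] for k in range(1, n_bins)]


def discretize(values, cut_points):
    """Map each value to the index of its bin (0 .. len(cut_points))."""
    # Assumed tie rule: a value equal to a cut point goes to the upper bin.
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical travel times in minutes (not taken from the diary data).
travel_times = [5, 7, 10, 12, 15, 20, 25, 30, 45, 60, 75, 90]
cuts = equal_frequency_cut_points(travel_times, 3)
bins = discretize(travel_times, cuts)
counts = [bins.count(b) for b in range(3)]
print(cuts, counts)
```

With many tied values the bins can only be approximately equal in size, which matches the "approximately the same number of cases" wording in the text.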