Inferring colocation and conversation networks from privacy-sensitive audio with implications for computational social science

New technologies have made it possible to collect information about social networks as they are enacted and observed in the wild, rather than as they are reported in retrospective surveys. These technologies open up many new research questions: How can meaningful information about social interaction be extracted from automatically recorded raw data on human behavior? What can we learn about social networks from such fine-grained behavioral data? And how can this be done while protecting privacy? To address these questions, this article presents new methods for inferring colocation and conversation networks from privacy-sensitive audio. These methods are applied in a study of face-to-face interactions among 24 students in a graduate school cohort over one academic year. The analysis shows that the networks derived from colocation inferences differ substantially from those derived from conversation inferences. This distinction can inform future research in computational social science, especially work that measures only colocation or uses colocation data as a proxy for conversation networks.
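The contrast between colocation and conversation networks can be made concrete with a small sketch. The Python below is not the article's inference method. It assumes a hypothetical input format: one `(person_a, person_b, colocated, conversing)` tuple per fixed-length time window, already produced by some upstream audio classifier. It aggregates those windows into two weighted, undirected networks. It treats a conversation window as also a colocation window, which is an assumption. It then compares the two networks by the Jaccard overlap of their edge sets and by the share of each pair's colocated time spent in conversation. The function names and the `min_weight` threshold are illustrative.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple

Edge = FrozenSet[str]
# Hypothetical input record: (person_a, person_b, colocated, conversing) for one time window.
Window = Tuple[str, str, bool, bool]


def build_networks(windows: Iterable[Window]) -> Tuple[Counter, Counter]:
    """Aggregate per-window pair inferences into two weighted, undirected networks.

    Edge weights count time windows. A conversation window is counted only if
    the pair is also colocated (assumption: conversation implies colocation).
    """
    colocation: Counter = Counter()
    conversation: Counter = Counter()
    for a, b, colocated, conversing in windows:
        edge = frozenset((a, b))  # unordered pair -> undirected edge
        if colocated:
            colocation[edge] += 1
            if conversing:
                conversation[edge] += 1
    return colocation, conversation


def edge_jaccard(w1: Counter, w2: Counter, min_weight: int = 1) -> float:
    """Jaccard overlap of the edge sets whose weight is at least min_weight."""
    e1 = {e for e, w in w1.items() if w >= min_weight}
    e2 = {e for e, w in w2.items() if w >= min_weight}
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 1.0


def conversation_share(colocation: Counter, conversation: Counter) -> Dict[Edge, float]:
    """Fraction of each pair's colocated windows that were also conversation windows."""
    return {e: conversation[e] / n for e, n in colocation.items() if n > 0}


if __name__ == "__main__":
    # Toy data, not from the study: A and C share space as often as A and B but never talk.
    sample = [
        ("A", "B", True, True),
        ("A", "B", True, False),
        ("A", "C", True, False),
        ("A", "C", True, False),
        ("B", "C", False, False),
    ]
    coloc, conv = build_networks(sample)
    print("edge Jaccard:", edge_jaccard(coloc, conv))
    for edge, share in sorted(conversation_share(coloc, conv).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(edge), round(share, 2))
```

In the toy data, the pairs A/B and A/C have the same colocation weight. Only A/B ever converses. A colocation-only network would therefore weight the two ties equally, while the conversation network keeps only one of them.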
