Bernstein Group for Computational Neuroscience

Multi-modal emotion recognition and blind source separation
(Michaelis (leader), Scheich, Wendemuth)

Scientific background and state-of-the-art

While speech recognition forms the basis for verbal communication, additional analysis of prosodic speech signals, facial expressions, and hand and body movements often proves highly informative, or even essential, for recognizing the focus of a speaker's attention and the nature of the speaker's intention. Emotions, prosody, and the associated intentions result from cortical processes, so registering them gives insight into ongoing brain activity. The working hypothesis is that specific brain structures are associated with emotions, prosody, and related intentions [Taylor 05, Marco 04, Heinzel 05].
Analyzing faces reliably across changes in pose, illumination, and expression has proved to be a difficult problem [Blackburn 00, Li 05, Philips 00]. Several approaches to the analysis of emotional expression have been proposed [Li 05, Khuwaja 02, Cohn 02, Kanade 00]. More recent approaches make use of independent component analysis (ICA) or similar techniques of blind source separation [Cao 03].
Prosodic signals include speech rhythm and accentuation and convey information about the emotional state of the speaker. Typically, both linguistic and prosodic information are processed by the same system [Shriberg 98, Taylor 98, Finke 98]. Common classification techniques include neural networks and hidden Markov models (HMMs). Our approach ignores words and phonemes, focusing instead on the prosodic units defined in the Magdeburg Prosody Corpus [Wendt 02].


