Bernstein Group for Computational Neuroscience

Multi-modal emotion recognition and blind source separation
(Michaelis (leader), Scheich, Wendemuth)

Scientific background and state-of-the-art

While speech recognition forms the basis for verbal communication, an additional analysis of prosodic speech signals, facial expressions, and hand/body movements often proves highly informative or even essential for recognizing the focus of speaker attention and the nature of speaker intention. Emotions, prosody, and the associated intentions result from cortical processes, so registering them gives insight into ongoing brain processes. The working hypothesis is that specific brain structures are associated with emotions, prosody, etc. [Taylor 05, Marco 04, Heinzel 05].
Analyzing faces reliably across changes in pose, illumination, and expression has proved to be a difficult problem [Blackburn 00, Li 05, Phillips 00]. Several approaches to the analysis of emotional expression have been proposed [Li 05, Khuwaja 02, Cohn 02, Kanade 00]. More recent approaches make use of independent component analysis (ICA) or similar techniques of blind source separation [Cao 03].
Prosodic signals include speech rhythm and accentuation and convey information about the emotional state of the speaker. Typically, both linguistic and prosodic information are processed by the same system [Shriberg 98, Taylor 98, Finke 98]. Common classification techniques include neural networks and hidden Markov models (HMMs). Our approach ignores words and phonemes, focusing instead on the prosodic units defined in the Magdeburg Prosody Corpus [Wendt 02].


Al-Hamadi A., Niese R., Panning A., Michaelis B.: Toward Robust Face Analysis Method of Non-Cooperative Persons in
Stereo Color Image Sequences, Special Issue of the International Journal "Machine Graphics and Vision", 2006, (accepted).

Al-Hamadi A., Panning A., Niese R., Michaelis B.: A model-based image analysis method for extraction and tracking of
facial features in video sequences. CSIT 2006, Vol.(3), pp. 499-509.

Blackburn D., Bone M., Phillips P.: Facial recognition vendor test 2000: Evaluation report, 2000

Cao Y., Faloutsos P., Pighin F.: Unsupervised Learning for Speech Motion Editing. Eurographics/SIGGRAPH Symposium on
Computer Animation (2003), D. Breen, M. Lin (Editors).

Cohn, J.F., Xiao, J., Moriyama, T., Ambadar, Z., & Kanade, T.: Automatic recognition of eye blinking in spontaneously
occurring behavior, 2002; Proc. of IEEE Conference on Automatic Face and Gesture Recognition.

Finke M, Lapata M, Lavie A, Levin L, Tomokiyo LM, Polzin T, Ries K, Waibel A, Zechner K (1998): Clarity: Inferring
Discourse Structure from Speech. AAAI Spring Symposium Series, Stanford University California.

Heinzel A, Bermpohl F, Niese R, Pfennig G, Pascual-Leone A, Schlaug G, Northoff G.: How do we modulate our emotions?
Parametric fMRI reveals cortical midline structures as region specifically involved in the processing of emotional valences.
Brain Res Cogn Brain Res. 2005;25:348-58.

Kanade, T., Cohn, J.F., and Tian, Y.: Comprehensive Database for Facial Expression Analysis, The 4th IEEE International
Conference on Automatic Face and Gesture Recognition (FG'00), France.

Katz M, Meier HG, Dolfing H, Klakow D (2002): Robustness of linear discriminant analysis in automatic speech
recognition. International Conference on Pattern Recognition (ICPR 2002), Québec Canada, vol. 3, pp. 371-374.

Khuwaja G.J., Laghari M.S.: A parameter-based combined classifier for invariant facial expression and gender recognition.
International Journal of Pattern Recognition and Artificial Intelligence, Vol. 16, No. 1 (2002), pp. 27-51.

Li S.Z., Jain A.K.: Handbook of Face Recognition, ISBN: 0-387-40595-X, 2005.

Marco, J., Grau, C., Ruffini, G.: Combined ICA-LORETA analysis of mismatch negativity, Neuroimage 2004.

Michaelis, B.: Zusammengesetzte Messgrößen und ihre Anwendung, Habilitationsschrift, TU Magdeburg, 1980.

Phillips P.J., Moon H., Rizvi S., Rauss P.: The FERET evaluation methodology for face-recognition algorithms. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 22(10):1090-1104, 2000.

Schafföner M., Katz M., Krüger SE., Wendemuth A. (2003): Improved robustness of automatic speech recognition using a
new class definition in linear discriminant analysis. Proc. EUROSPEECH, pp 2841-2844.

Shriberg E., Bates R., Stolcke A., Taylor P., Jurafsky D., Ries K., Coccaro N., Martins R., Meteer M., Van Ess-Dykema C.
(1998): Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech? Language And Speech, vol.
41(3-4), pp. 439-487.

Taylor J.G.; Scherer K.; Cowie R.: Neural networks, special issue Emotion and Brain. (ed) vol 18(4) May 2005.

Taylor P., King S., Isard S., Wright H. (1998): Intonation and Dialogue Context as Constraints for Speech Recognition.
Language And Speech, vol. 41(3-4), pp. 489-508.

Wendemuth A. (2001): Modelling Uncertainty of Data Observation, Proc. International Conference On Acoustics, Speech,
and Signal Processing (ICASSP 01), session Speech P12.1, pp. 296-299.

Wendemuth A. (editor) 2003: Proceedings of the Speech Processing Workshop in connection with DAGM (Speech-DAGM),
Magdeburg, 2003. Published by University of Magdeburg. ISBN 3-929757-59-1.

Wendt B. and Scheich H. (2002): The "Magdeburger Prosodie-Korpus". Proc. Speech Prosody,

Last modified: 30.01.2008