We present an audio-visual information analysis system for automatic emotion recognition. We propose an approach to the analysis of video sequences that combines visually observed facial expressions with acoustic features to automatically recognize five universal emotion classes: Anger, Disgust, Happiness, Sadness, and Surprise. The visual component of our system evaluates facial expressions using a bank of 20 Gabor filters that spatially sample the images. The audio analysis is based on global statistics of voice pitch and intensity, together with temporal features such as speech rate and Mel-Frequency Cepstral Coefficients (MFCCs). We combine the two modalities at both the feature level and the score level and compare the respective joint emotion recognition rates. Emotions are classified instantaneously using a Support Vector Machine, and temporal inference is drawn from the scores output by the classifier. The approach is validated on a posed audio-visual database and on a natural interactive database to test the robustness of our algorithm. Experiments on these databases provide encouraging results, with the best combined recognition rate reaching 82%.