We present an audio-visual information analysis system for automatic emotion recognition. We propose an approach to the analysis of video sequences that combines visually observed facial expressions with acoustic features to automatically recognize five universal emotion classes: Anger, Disgust, Happiness, Sadness, and Surprise. The visual component of our system evaluates facial expressions using a bank of 20 Gabor filters that spatially sample the images. The audio analysis is based on global statistics of voice pitch and intensity, together with temporal features such as speech rate and Mel-Frequency Cepstral Coefficients (MFCCs). We combine the two modalities at both the feature level and the score level and compare the respective joint emotion recognition rates. Emotions are classified instantaneously using a Support Vector Machine, and temporal inference is drawn from the scores output by the classifier. The approach is validated on a posed audio-visual database and on a natural interactive database to test the robustness of our algorithm. Experiments on these databases provide encouraging results, with the best combined recognition rate reaching 82%.