Mehrsan Javan Roshtkhari

PhD. CIM, McGill University


Simultaneous Dominant and Rare Event Detection in Videos


Abstract
We present a novel approach for video parsing and simultaneous online learning of dominant and anomalous behaviours in surveillance videos. Dominant behaviours are those that occur frequently in videos and hence usually do not attract much attention. They can be characterized by different complexities in space and time, ranging from a scene background to human activities. In contrast, an anomalous behaviour is defined as one with a low likelihood of occurrence. We do not employ any models of the entities in the scene to detect these two kinds of behaviours. In this paper, video events are learnt at each pixel without supervision using densely constructed spatio-temporal video volumes. Furthermore, the volumes are organized into large contextual graphs. These compositions are employed to construct a hierarchical codebook model for the dominant behaviours. By decomposing spatio-temporal contextual information into unique spatial and temporal contexts, the proposed framework learns the models of the dominant spatial and temporal events. Thus, it is ultimately capable of simultaneously modeling high-level behaviours as well as low-level spatial, temporal and spatio-temporal pixel-level changes.
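As a rough illustration of what "densely constructed spatio-temporal video volumes" can mean, the following sketch slides a small 3D window over a toy grayscale video and collects every volume that fits. The function name `dense_volumes`, the volume size, and the stride are hypothetical choices made for this example; the paper's actual sampling parameters and descriptors are not given on this page.

```python
from typing import Iterator, List, Tuple

Video = List[List[List[float]]]  # video[t][y][x], grayscale intensities


def dense_volumes(
    video: Video, size: Tuple[int, int, int] = (2, 2, 2), stride: int = 1
) -> Iterator[Tuple[Tuple[int, int, int], List[float]]]:
    """Yield ((t, y, x), flattened volume) for every volume that fits in the video.

    Assumption: raw intensities stand in for the descriptor; a real system
    would typically use a gradient-based or similar feature instead.
    """
    nt, ny, nx = size
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    for t in range(0, frames - nt + 1, stride):
        for y in range(0, height - ny + 1, stride):
            for x in range(0, width - nx + 1, stride):
                values = [
                    video[t + dt][y + dy][x + dx]
                    for dt in range(nt)
                    for dy in range(ny)
                    for dx in range(nx)
                ]
                yield (t, y, x), values


# Toy video: 3 frames of 3x3 pixels, where every pixel equals the frame index.
toy_video = [[[float(t)] * 3 for _ in range(3)] for t in range(3)]
volumes = list(dense_volumes(toy_video))
```

With a stride of 1, neighbouring volumes overlap heavily, which is what makes the sampling dense; a larger stride trades coverage for speed.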

Downloads: paper, poster (CVPR 2013)

Algorithm Overview

Figure 1: Video parsing. The input video is parsed into three meaningful components: background, dominant activities (walking pedestrians), and rare activities (the bicyclist).




Figure 2: Algorithm overview: behaviour understanding. Behaviours are learnt from local low-level visual information by constructing a hierarchical codebook of spatio-temporal video volumes (STVs). To capture the spatio-temporal configurations of video volumes, a probabilistic framework estimates probability density functions of the arrangements of video volumes. The uncertainty in the codeword construction of STVs and contextual regions is taken into account, which makes the final decision more reliable. The high-level output can be employed to simultaneously model normal and abnormal behaviours.
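The idea of an online codebook can be sketched in a few lines. The example below is a deliberately simplified, single-level version that makes hard assignments: a descriptor joins its nearest codeword if it is within a distance threshold, otherwise it starts a new one, and codewords that account for a small share of observations are treated as rare. The class name `OnlineCodebook`, the Euclidean distance, the threshold rule, and the rarity cutoff are all assumptions for illustration; the framework described above is hierarchical and models assignment uncertainty probabilistically, which this sketch does not attempt.

```python
import math
from typing import List


class OnlineCodebook:
    """Toy online codebook; the threshold rule is an illustrative assumption."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.codewords: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, descriptor: List[float]) -> int:
        """Assign the descriptor to a codeword, creating one if none is close."""
        best, best_dist = -1, math.inf
        for i, word in enumerate(self.codewords):
            d = math.dist(word, descriptor)
            if d < best_dist:
                best, best_dist = i, d
        if best == -1 or best_dist > self.threshold:
            self.codewords.append(list(descriptor))
            self.counts.append(1)
            return len(self.codewords) - 1
        n = self.counts[best]
        # Running mean keeps the codeword at the centre of its members.
        self.codewords[best] = [
            (w * n + x) / (n + 1) for w, x in zip(self.codewords[best], descriptor)
        ]
        self.counts[best] = n + 1
        return best

    def frequency(self, index: int) -> float:
        """Share of all observations assigned to the given codeword."""
        return self.counts[index] / sum(self.counts)


book = OnlineCodebook(threshold=1.0)
stream = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [0.1, 0.1]]
assigned = [book.update(d) for d in stream]
# Codewords seen in fewer than 25% of observations are flagged as rare.
rare = [i for i in range(len(book.counts)) if book.frequency(i) < 0.25]
```

In this toy stream, the four descriptors near the origin share one codeword (the dominant pattern), while the single outlier forms its own low-frequency codeword.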


Results: A video showing the results is available.

[Image grid: rows Belleview, Boat-Sea, Train; columns (a) sample frame, (b) detected anomalies, (c) precision/recall curve]
Figure 3: Dominant behaviour understanding and abnormality detection. Experiments with three videos are illustrated from top to bottom in the figure: Belleview, Boat-Sea and Train. The first experiment (first row) is concerned with detecting dominant and abnormal behaviour in a busy traffic scene. The second and third experiments were conducted on videos in which the abnormalities were defined as being rare but nevertheless acceptable foreground motions. The anomalous regions are highlighted in green. Column a) Sample frames from the three videos. Column b) The detected anomalous regions are cars moving from right to left (top), a boat moving to the right (middle), and a moving person (bottom). Column c) Precision/recall curves.
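For readers unfamiliar with how precision/recall curves such as those in column c are typically produced, the sketch below sweeps a decision threshold over per-region anomaly scores and compares the flagged regions with ground-truth labels. The function name `precision_recall_curve`, the scores, and the labels are made up for illustration and are not the paper's data or its evaluation code.

```python
from typing import List, Tuple


def precision_recall_curve(
    scores: List[float], labels: List[bool]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall), one point per distinct score.

    A region is flagged as anomalous when its score is >= the threshold.
    """
    positives = sum(labels)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        flagged = [label for s, label in zip(scores, labels) if s >= threshold]
        true_pos = sum(flagged)
        # flagged is never empty because each threshold is itself a score.
        precision = true_pos / len(flagged)
        recall = true_pos / positives if positives else 0.0
        points.append((threshold, precision, recall))
    return points


# Hypothetical anomaly scores and ground truth (True = truly anomalous).
scores = [0.9, 0.8, 0.4, 0.2]
labels = [True, False, True, False]
curve = precision_recall_curve(scores, labels)
```

Lowering the threshold flags more regions, so recall can only increase along the sweep, while precision may rise or fall depending on how many of the newly flagged regions are true anomalies.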



Supplemental Results:
In addition to the results presented in the paper, we have carried out experiments on another real-world video dataset, the subway surveillance dataset (A. Adam et al., PAMI, 2008). Two actual surveillance videos of a subway station, recorded by cameras at the entrance and exit gates, have been employed to measure the performance of the proposed algorithm. The exit gate surveillance video is 43 minutes long. The dominant activities in this scene are exiting the platform, coming up through the turnstiles, and turning left or right at the top of the stairs. The video also contains 19 anomalous events, mainly walking in the wrong direction and loitering near the exit gate area. Figure 4 shows some frames from this dataset together with the anomalies detected using our approach. The dominant behaviour is highlighted in purple and rare events are highlighted in green.


Video of these results: Simultaneous Dominant and Rare Event Detection

[Images: exit gate frames, panels (a) to (d)]
IEEE Xplore: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6619181
Figure 4: The subway surveillance video showing the exit gate. Dominant behaviours are highlighted in purple. Anomalous regions are highlighted in green. (a) A sample frame. The dominant behaviours are people exiting from the platform and moving toward the camera. The anomalies are as follows: (b) a person cleaning the walls, which is marked as anomalous since it is not observed regularly; (c, d) walking in the wrong direction. In (d), a person is entering through the exit gate.

The second surveillance video, shown below, was captured at the entrance gate and is 96 minutes long. The dominant activity is going down to the turnstiles and entering the platform. The video also contains 66 rare events, mainly people walking in the wrong direction, irregular interactions between people, and other events such as sudden stopping and running fast (A. Adam et al., PAMI, 2008). Figure 5 shows some frames from this dataset together with the anomalies detected using our approach.

[Images: entrance gate frames, panels (a) to (d)]
Figure 5: The subway surveillance video showing the entrance gate. Dominant behaviours are highlighted in purple. Anomalous regions are highlighted in green. (a) Sample frame of the scene along with the dominant behaviours. The anomalies are as follows: (b, c) a person exiting through the entrance gate; (d) two people trying to pass through the entrance gate without paying.




Related Publications:
  1. M. Javan Roshtkhari and M. D. Levine, “Online Dominant and Anomalous Behavior Detection in Videos”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), June 2013.
  2. M. Javan Roshtkhari and M. D. Levine, “An on-line, real-time learning method for detecting anomalies in videos using spatio-temporal compositions”, Computer Vision and Image Understanding, 2013.