|Simultaneous Dominant and Rare Event Detection in Videos|
We present a novel approach for video parsing and simultaneous online learning of dominant and anomalous behaviours in surveillance videos. Dominant behaviours are those that occur frequently and hence usually attract little attention. They span different complexities in space and time, ranging from scene background to human activities. In contrast, an anomalous behaviour is defined as one with a low likelihood of occurrence. We do not employ any models of the entities in the scene to detect these two kinds of behaviour. Instead, video events are learnt at each pixel, without supervision, from densely constructed spatio-temporal video volumes. The volumes are then organized into large contextual graphs, and these compositions are used to construct a hierarchical codebook model of the dominant behaviours. By decomposing spatio-temporal contextual information into separate spatial and temporal contexts, the proposed framework learns models of the dominant spatial and temporal events. It is thus capable of simultaneously modelling high-level behaviours as well as low-level spatial, temporal and spatio-temporal pixel-level changes.
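To make "densely constructed spatio-temporal video volumes" concrete, the following is a minimal sketch of dense volume sampling, assuming the video is a grayscale NumPy array of shape (frames, height, width); the volume size and sampling stride here are illustrative, not the paper's actual parameters.

```python
import numpy as np

def extract_stvs(video, size=(5, 5, 5), stride=2):
    """Densely sample spatio-temporal video volumes from a clip.

    video : array of shape (T, H, W), grayscale intensities.
    size  : (t, h, w) extent of each volume.
    Returns an array with one flattened volume per sampled location.
    """
    t, h, w = size
    T, H, W = video.shape
    vols = []
    for ti in range(0, T - t + 1, stride):
        for yi in range(0, H - h + 1, stride):
            for xi in range(0, W - w + 1, stride):
                vols.append(video[ti:ti + t, yi:yi + h, xi:xi + w].ravel())
    return np.asarray(vols)

video = np.random.rand(10, 20, 20)  # toy 10-frame clip
stvs = extract_stvs(video)
print(stvs.shape)                   # (192, 125): 192 volumes of 5x5x5 pixels
```

In the full method these descriptors would be built at every pixel and then grouped into contextual graphs; the grid sampling above is only meant to show the data layout.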
Download paper | Download poster
Figure 1: Video parsing. The input video is parsed into three meaningful components: background, dominant activities (walking pedestrians), and rare activities (the bicyclist).
Figure 2: Algorithm overview: behaviour understanding. Behaviours are learnt from local low-level visual information by constructing a hierarchical codebook of spatio-temporal volumes (STVs). To capture the spatio-temporal configurations of the video volumes, a probabilistic framework estimates probability density functions of their arrangements. The uncertainty in codeword assignment, for both STVs and contextual regions, is taken into account, which makes the final decision more reliable. The high-level output can be used to simultaneously model normal and abnormal behaviours.
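The codebook-and-likelihood idea behind Figure 2 can be sketched as follows. This is a toy online codebook, not the paper's hierarchical model: each descriptor is matched to its nearest codeword (or spawns a new one), and codeword frequencies serve as a crude likelihood, so low-likelihood observations are flagged as rare. The matching threshold and distance metric are assumptions for illustration.

```python
import numpy as np

class OnlineCodebook:
    """Toy online codebook: nearest-codeword matching with a running-mean
    update; unmatched descriptors spawn new codewords. Codeword frequencies
    approximate observation likelihood, so rare events score low."""

    def __init__(self, match_thresh=1.0):
        self.match_thresh = match_thresh
        self.words = []    # codeword centroids
        self.counts = []   # occurrence counts per codeword

    def observe(self, x):
        """Update the codebook with descriptor x; return its likelihood."""
        if self.words:
            d = [np.linalg.norm(x - w) for w in self.words]
            i = int(np.argmin(d))
            if d[i] < self.match_thresh:
                self.counts[i] += 1
                # running-mean update of the matched codeword
                self.words[i] += (x - self.words[i]) / self.counts[i]
                return self.counts[i] / sum(self.counts)
        self.words.append(np.array(x, dtype=float))
        self.counts.append(1)
        return 1 / sum(self.counts)

cb = OnlineCodebook(match_thresh=0.5)
rng = np.random.default_rng(0)
for _ in range(200):                  # dominant behaviour: one tight cluster
    cb.observe(rng.normal(0.0, 0.05, size=4))
rare = cb.observe(np.ones(4) * 5.0)   # far-away observation spawns a new word
print(rare < 0.05)                    # rare event has low likelihood
```

The actual framework goes further by modelling the arrangements of volumes (contextual regions) probabilistically and by keeping codeword-assignment uncertainty, rather than the hard nearest-word assignment used here.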
Results: Click HERE to watch a video showing the results.
In addition to the results presented in the paper, we have carried out experiments on another real-world dataset, the subway surveillance dataset (A. Adam, et al., PAMI, 2008). Two actual surveillance videos of a subway station, recorded by cameras at the entrance and exit gates, were used to measure the performance of the proposed algorithm. The exit-gate video is 43 minutes long. The dominant activities in this scene are people exiting the platform, coming up through the turnstiles, and turning left or right at the top of the stairs. The video also contains 19 anomalous events, mainly walking in the wrong direction and loitering near the exit-gate area. Figure 4 shows some frames from this dataset together with the anomalies detected by our approach; the dominant behaviour is highlighted in purple and rare events in green.
Click to watch a video showing the results.
The second surveillance video, shown below, was captured at the entrance gate and is 96 minutes long. The dominant activity is people going down to the turnstiles and entering the platform. The video also contains 66 rare events, mainly people walking in the wrong direction, irregular interactions between people, and other events such as sudden stopping and fast running (A. Adam, et al., PAMI, 2008). Figure 5 shows some frames from this dataset together with the anomalies detected by our approach.