Current Research

Detection of Penalties in Hockey Broadcast Videos
Farzaneh Askari (farzaneh.askari[at]mail.mcgill.ca)

In recent years, the computer vision community has shown growing interest in extracting rich information from broadcast video data and performing tasks such as object localization and recognition, semantic segmentation, action recognition, and event detection. An interesting line of study is applying these techniques to detect events in videos of sports matches. Our research group develops algorithms to address these problems in ice hockey broadcasts. Penalties are an important part of ice hockey: they occur frequently and lead to stoppages during the game. Most penalties are enforced by sending the offending player to the penalty box for a prescribed number of minutes, which generally causes both teams to change their style of play during that time. This project focuses on the detection of penalties in ice hockey broadcasts. More specifically, it uses information from different data modalities (i.e., audio, text, and video) to detect and analyze penalty scenes in a hockey game.
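As an illustration of combining modalities, one common baseline is late fusion: each modality produces its own penalty score, and the scores are merged. The sketch below is purely hypothetical (the modality names, weights, and score values are placeholders, not the project's actual model):

```python
# Hedged sketch: late fusion of per-modality penalty scores.
# The weights and score values here are hypothetical placeholders.

def fuse_penalty_scores(scores, weights=None):
    """Combine per-modality penalty probabilities with a weighted average.

    scores  -- dict mapping a modality name ('audio', 'text', 'video')
               to a probability in [0, 1] that a penalty occurred
    weights -- optional dict of modality weights; defaults to uniform
    """
    if weights is None:
        weights = {m: 1.0 for m in scores}
    total = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total

# Example: the whistle (audio) and a referee caption (text) both suggest
# a penalty, while the video model is less certain.
fused = fuse_penalty_scores({"audio": 0.9, "text": 0.8, "video": 0.4})
is_penalty = fused > 0.5
```

Late fusion is only one of several options; learned feature-level fusion is an equally plausible design for this task.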

Face-off Summarization in Hockey Broadcast Videos
Seby Jacob (seby.jacob[at]mail.mcgill.ca)

Face-offs are very important set-pieces in hockey. The team that gains possession after the face-off controls the game at that moment, which can ultimately lead to plays that end in a goal. Hence, it is important to understand how a team lines up for and conducts a face-off. This project aims to develop a system that summarizes the details of a face-off in a hockey game. In turn, it can aid in developing statistics and an understanding of the techniques a team employs to win a face-off. We will use deep learning, natural language processing, and computer vision to develop this system.

Multimodal Detection and Description of Events in Videos
Zahra Vaseqi (zahra.vaseqi[at]mail.mcgill.ca)

Although detecting an event, such as a goal-scoring event, may seem trivial for a human observer, it turns out to be an extremely challenging task for an automated process. An event can be defined as an abstract concept composed of definite actions and objects. For example, a goal-scoring event in hockey takes place when a player shoots the puck such that it crosses into the opponent's net. Using this definition of an event, we take a hierarchical approach to understanding the objects and actions that lead to various events. To this end, we utilize existing methods in semantic object recognition and action recognition. Semantic object recognition in still images has been heavily studied, and current algorithms achieve a high level of accuracy. Action recognition techniques have also been extensively researched in recent years, but they mostly address the task in constrained settings. In our project, we aim to extend action recognition techniques from still images to the spatio-temporal domain of untrimmed videos.
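The compositional definition above can be made concrete with a toy rule: an event fires when the detected objects and actions jointly satisfy a predicate. Everything in the sketch below (the one-dimensional rink, the goal-line coordinate, the boolean shot detector) is a simplified assumption for illustration, not the project's actual formulation:

```python
# Toy sketch of an event as a composition of objects and actions.
# The 1-D rink and the goal-line coordinate are illustrative assumptions.

GOAL_LINE = 100.0  # hypothetical x-coordinate of the opponent's goal line

def is_goal_event(puck_positions, shot_detected):
    """A 'goal' event: a shot action followed by the puck crossing the line.

    puck_positions -- sequence of puck x-coordinates over time (object track)
    shot_detected  -- whether an upstream action recognizer fired on a shot
    """
    crossed = any(x >= GOAL_LINE for x in puck_positions)
    return shot_detected and crossed

# A detected shot whose puck track crosses the goal line is a goal event.
is_goal_event([80.0, 95.0, 101.5], shot_detected=True)   # True
is_goal_event([80.0, 95.0, 99.0], shot_detected=True)    # False
```

In practice both predicates would be soft (probabilistic) rather than boolean, but the compositional structure is the same.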

The temporal dimension of videos provides valuable motion cues that may be used to improve action recognition performance; this, however, comes at a computational cost, as we need to process the individual frames of a video along with the temporal data that relates subsequent frames. This highlights the need for a compact yet effective video representation scheme in order to develop computationally efficient techniques for action recognition and event detection. As part of my research on event detection in videos, I also aim to understand how to optimally represent a video while examining the various trade-offs that affect task performance.
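One of the simplest instances of the trade-off described above is temporal subsampling: keeping every k-th frame shrinks the representation (and the compute) by roughly a factor of k, at the cost of discarding fine-grained motion between the retained frames. A minimal sketch, with frames stood in for by arbitrary objects:

```python
def subsample_frames(frames, stride):
    """Keep every `stride`-th frame of a video.

    Reduces processing cost roughly by a factor of `stride`, at the cost
    of losing motion detail between the retained frames.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return frames[::stride]

# A 30 fps clip subsampled with stride 6 yields an effective 5 fps.
kept = subsample_frames(list(range(30)), stride=6)  # [0, 6, 12, 18, 24]
```

Real representation schemes are of course richer (learned features, motion summaries), but they navigate the same compactness-versus-fidelity trade-off.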


Hockey Player Tracking
Grant Zhao (nan.zhao2[at]mail.mcgill.ca)

Ice hockey is one of the most popular sports in North America, and understanding the game better through sports analytics has become a major concern for the hockey industry. Thanks to rapid progress in the field of computer vision, analyzing the game from broadcast video has become an attractive option. The goal of this project is to use computer vision and deep learning to analyze hockey broadcast videos. More specifically, we aim to develop a system that takes a raw hockey broadcast video as input and outputs individual so-called player tracklets. A tracklet is defined as the collection of sequential bounding boxes of a single player on the ice, from the frame in which the player enters the field of view until the frame in which they disappear from it. Issues such as temporary player occlusion will also be considered.
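The tracklet defined above maps naturally onto a simple data structure; the field names and box format below are illustrative, not the project's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Tracklet:
    """Sequential bounding boxes of one player while visible on the ice.

    Field names are illustrative. Boxes are (x, y, width, height) tuples,
    one per consecutive frame starting at `start_frame`.
    """
    player_id: int
    start_frame: int
    boxes: list = field(default_factory=list)

    @property
    def end_frame(self):
        """Last frame in which the player is visible."""
        return self.start_frame + len(self.boxes) - 1

# A player tracked for three consecutive frames starting at frame 120.
t = Tracklet(player_id=7, start_frame=120,
             boxes=[(10, 20, 40, 80), (12, 20, 40, 80), (14, 21, 40, 80)])
t.end_frame  # 122
```

Handling temporary occlusion then amounts to deciding when two such tracklets, separated by a gap, belong to the same player and should be linked.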

High-speed Puck Detection and Tracking in Broadcast Videos
Neo Yang (xionghao.yang[at]mail.mcgill.ca)

Recent developments in deep learning techniques for computer vision have enabled a wide range of new applications, and computer vision is playing an increasingly important role in sports broadcasting. Hockey is one of the most popular sports in the world, and computer vision can provide comprehensive game data to coaches, commentators, and hockey fans. In hockey television broadcasts, however, the puck moves too fast for audiences to track it continuously. To address this problem using deep learning, a dataset containing views of the puck, players, sticks, and goal nets is being created. This project focuses on developing an algorithm combining video enhancement with small-object detection and tracking to capture the details of the puck's trajectory in a hockey broadcast video.
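One standard ingredient of such a tracker is motion prediction: when the puck moves too fast (or is briefly occluded) for the detector to fire in every frame, a motion model can estimate where to search next. The sketch below uses a constant-velocity assumption over one frame step, which is a simplification for illustration rather than the project's actual tracker:

```python
def predict_next_position(p_prev, p_curr):
    """Constant-velocity prediction of the puck's next (x, y) position.

    Assumes roughly linear motion between consecutive frames, which is a
    reasonable approximation for a fast puck over a short time step.
    """
    vx = p_curr[0] - p_prev[0]
    vy = p_curr[1] - p_prev[1]
    return (p_curr[0] + vx, p_curr[1] + vy)

# Puck seen at (100, 50) then (110, 54): predict (120, 58) next frame,
# giving the detector a small search window instead of the whole frame.
predict_next_position((100, 50), (110, 54))  # (120, 58)
```

A full tracker would typically replace this with a Kalman filter, which additionally models uncertainty in the prediction.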

Video Summarization in Hockey Broadcasts
Rick Wu (teng.vu[at]mail.mcgill.ca)

In the past decade, online video has become one of the most common ways of sharing and consuming media. Today, large platforms such as YouTube, Vimeo, and Twitch publicly distribute countless hours of video over the internet to millions of viewers. With so much video content readily available, browsing and processing these data is difficult, especially for long recordings such as a typical hockey broadcast. To address this problem, automatic video summarization has been investigated as an effective method of reducing the duration of a video while still retaining its most "important" components. Summarized videos let users quickly assess the content of a video, as well as enjoy the highlights without the time commitment. This project focuses on researching methods for automatically generating such video summaries. Specifically, our objective is to design a deep network capable of automatically generating highlight videos from full hockey broadcasts by utilizing video, audio, and text data.
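At its core, many summarization pipelines reduce to scoring segments and keeping the best ones in their original order. A deliberately simplified sketch; in the project, the scores would come from the learned network, whereas here they are made-up numbers:

```python
def select_highlights(scores, budget):
    """Pick the `budget` highest-scoring segments, kept in temporal order.

    scores -- list of per-segment importance scores (made-up values here;
              a trained model would produce them in practice)
    budget -- number of segments the summary may contain
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])  # restore temporal order

# Eight segments, keep the three most "important" ones.
select_highlights([0.1, 0.9, 0.2, 0.8, 0.3, 0.95, 0.1, 0.4], budget=3)
# [1, 3, 5]
```

Real systems often swap the greedy top-k step for a knapsack-style selection under a duration budget, but the score-then-select structure is the same.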