Multi-modal ChaLearn Gesture Recognition
Problem
Devices such as the Kinect can record movement, but identifying specific movements from those recordings is a classification problem. ChaLearn runs an annual challenge on multi-modal gesture recognition from audio and RGB-D video data. The focus of the challenge is "multiple instance, user independent learning" of gestures: learning to recognize gestures from several instances of each category, performed by different users, drawn from a gesture vocabulary of 20 categories. A gesture vocabulary is a set of unique gestures, generally related to a particular task; here it consists of 20 Italian cultural/anthropological signs. The challenge features a quantitative evaluation of automatic gesture recognition on a multi-modal dataset recorded with Kinect (providing RGB images of face and body, depth images of face and body, skeleton information, joint orientation, and audio), comprising around 14,000 Italian gestures performed by several users.
Background
This project focuses on multi-modal automatic learning of a vocabulary of 20 types of Italian anthropological/cultural gestures performed by different users, with the goal of user-independent continuous gesture recognition that combines the video modalities with audio information. More details and the training data are available at: http://gesture.chalearn.org/mmdata.
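The challenge does not prescribe a recognition method, but a common starting point for user-independent gesture classification from skeleton data is nearest-neighbor matching under dynamic time warping (DTW), which tolerates the speed differences between users. The sketch below illustrates this idea on synthetic 1-D joint trajectories; the function names and data shapes are assumptions for illustration, not part of the ChaLearn protocol, and real skeleton sequences would be frames-by-joint-coordinates arrays.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of
    frame-level feature vectors, each of shape (frames, features)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # skip a frame in a
                                 cost[i, j - 1],      # skip a frame in b
                                 cost[i - 1, j - 1])  # align the two frames
    return cost[n, m]

def classify(query, templates):
    """1-NN: label the query with the class of the closest template."""
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]

# Toy example: two synthetic "gestures" as 1-D joint trajectories.
wave = np.sin(np.linspace(0, 4 * np.pi, 30))[:, None]  # oscillating joint
push = np.linspace(0.0, 1.0, 30)[:, None]              # monotone joint
templates = [("wave", wave), ("push", push)]

# A time-stretched wave (45 frames instead of 30) should still match
# the wave template, because DTW warps the time axis.
query = np.sin(np.linspace(0, 4 * np.pi, 45))[:, None]
print(classify(query, templates))
```

A per-frame feature vector built from the Kinect skeleton (e.g. joint positions relative to the torso) would slot directly into `dtw_distance`; continuous recognition additionally requires segmenting the stream into candidate gestures before matching.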