11 JUNE 2020, 1:00 UTC (9:00 AM ET)


Recent progress in deep learning has significantly advanced the state of the art in understanding actions in videos. We start by presenting an approach for spatio-temporally localizing actions. We describe how action tubelets achieve state-of-the-art performance for action detection and show how modeling relations with objects and humans further improves performance. Next, we introduce an approach to behavior prediction for self-driving cars. We conclude by presenting results on the use of multi-modal information in video understanding.

Cordelia Schmid has held a permanent research position at Inria since 1997, where she is a research director. Since 2018 she has also held a joint appointment with Google Research. She has published more than 300 articles, mainly in computer vision. She has been editor-in-chief of IJCV (2013--2018), a program chair of IEEE CVPR 2005 and ECCV 2012, and a general chair of IEEE CVPR 2015, ECCV 2020 and ICCV 2023. In 2006, 2014 and 2016, she was awarded the Longuet-Higgins Prize for fundamental contributions in computer vision that have withstood the test of time. She is a fellow of the IEEE. She was awarded an ERC Advanced Grant in 2013, the Humboldt Research Award in 2015 and the Inria & French Academy of Sciences Grand Prix in 2016. She was elected to the German National Academy of Sciences, Leopoldina, in 2017. In 2018, she received the Koenderink Prize for fundamental contributions in computer vision. She received the Royal Society Milner Award in 2020.
