Tutorials

Abstract

Analyzing human behavior in videos is one of the fundamental problems of computer vision and multimedia understanding. The task is very challenging because video is an information-intensive medium with large variations and complexities in content. With the development of deep learning techniques, researchers have strived to push the limits of human behavior understanding in a wide variety of applications, from action recognition to event detection. This tutorial will present recent advances under the umbrella of human behavior understanding, ranging from the fundamental problem of how to learn "good" video representations, to the challenges of categorizing video content into human action classes, and finally to multimedia event detection and surveillance event detection in complex scenarios.

Tutorial content

Human Behavior Understanding: From Action Recognition to Complex Event Detection (3 Hours)

Instructors

Ting Yao, JD AI Research, Beijing, China

Jingen Liu, JD AI Research, Mountain View, CA, USA

Abstract

Intelligent image/video editing is a fundamental topic in image processing that has witnessed rapid progress in the last two decades. Because of various degradations in image and video capture, transmission, and storage, images and videos exhibit many undesirable effects, such as low resolution, low-light conditions, and occlusion by rain streaks and raindrops. Recovering from these degradations is an ill-posed problem. With the wealth of statistics-based and learning-based methods, this problem can be unified into cross-domain transfer, which also covers further tasks such as image stylization.

In our tutorial, we will discuss recent progress in image stylization, rain streak/drop removal, image/video super-resolution, and low-light image enhancement. The tutorial covers both traditional statistics-based and deep-learning-based methods, and includes both a biologically driven model, i.e. the Retinex model, and data-driven models. We provide an image processing viewpoint that interprets popular deep networks as traditional maximum a posteriori (MAP) estimation. Two kinds of priors are important in this framework: side priors, which are designed by researchers and learned through multi-task learning, and automatically learned priors, which are captured by adversarial learning. Three works under this framework are presented: single image super-resolution, low-light image enhancement, and single image raindrop removal.

Single image super-resolution is a classical problem in computer vision. It aims at recovering a high-resolution image from a single low-resolution image. This is an underdetermined inverse problem whose solution is not unique. In this tutorial, we will discuss how the problem can be solved by deep convolutional networks in a data-driven manner. We will review different model variants and important techniques, such as adversarial learning, for image super-resolution. We will then discuss recent work on hallucinating very low-resolution faces in unconstrained poses. Finally, the tutorial will discuss the challenges of implementing image super-resolution in real-world scenarios.
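To illustrate why the solution is not unique, the following minimal sketch shows two different high-resolution images that produce exactly the same low-resolution image. It assumes a simple 2x average-pooling degradation model, chosen here only for illustration; the `downsample` helper and the toy arrays are hypothetical and are not taken from the tutorial material.

```python
import numpy as np


def downsample(hr: np.ndarray, factor: int = 2) -> np.ndarray:
    """Average-pool a 2-D image by `factor` (an assumed, simplified degradation)."""
    h, w = hr.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    # Split the image into factor x factor blocks and average each block.
    blocks = hr.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


# Two different 4x4 "high-resolution" images (toy data).
hr_a = np.array([[0, 2, 4, 6],
                 [2, 0, 6, 4],
                 [1, 1, 3, 3],
                 [1, 1, 3, 3]], dtype=float)
hr_b = np.array([[1, 1, 5, 5],
                 [1, 1, 5, 5],
                 [1, 1, 3, 3],
                 [1, 1, 3, 3]], dtype=float)

lr_a = downsample(hr_a)
lr_b = downsample(hr_b)

# Both high-resolution images collapse to the same 2x2 low-resolution image,
# so the low-resolution observation alone cannot tell them apart.
same_lr = np.array_equal(lr_a, lr_b)
different_hr = not np.array_equal(hr_a, hr_b)
```

Because many high-resolution candidates are consistent with a single observation, a super-resolution method has to rely on some prior, hand-designed or learned from data, to choose among them.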

Tutorial content

Intelligent Image Enhancement and Restoration - from Prior Driven Model to Advanced Deep Learning (3 Hours)

Instructors

Jiaying Liu, Peking University, Beijing, China

Wenhan Yang, National University of Singapore, Singapore

Chen Change Loy, Nanyang Technological University, Singapore

Abstract

Personal photo and video data are being accumulated at an unprecedented speed. For example, 14 petabytes of personal photos and videos were uploaded to Google Photos by 200 million users in 2015, while a tremendous number of personal photos and videos are also uploaded to Flickr every day. How to efficiently search and organize such data presents a huge challenge to both academic research and industrial applications.

To attack this challenge, this tutorial will review research efforts in related subjects and showcase successful industrial systems. We will discuss traditional visual search methods and the improvements in visual representations brought by deep neural networks. The instructors will also share their experience of building large-scale fashion search and Flickr similarity search systems, and offer insights into the challenges of extending academic research to industrial applications.

This tutorial will discuss the queries and logs of search engines, and analyze how to address the characteristics of personal media search. By applying search techniques to visual question answering, this tutorial will introduce a new task named MemexQA: given a collection of photos or videos from a user, can we automatically answer questions that help the user recover memories of the events captured in the collection? New datasets and algorithms for MemexQA will be reviewed. We hope MemexQA will shed light on next-generation computer interfaces for the exploding amount of personal photos and videos.

Tutorial content

Intelligent Visual Search and Question Answering (3 Hours)

Instructors

Lu Jiang, Google Cloud AI, Sunnyvale, CA, USA

Yannis Kalantidis, Facebook Research, Oakland, CA, USA

Liangliang Cao, University of Massachusetts, Amherst, MA, USA

Abstract

Tutorial content

Instructors

Jieping Ye, DiDi Research, Beijing, China

Tutorial Chairs

Jiebo Luo
University of Rochester, USA
Zheng-Jun Zha
University of Science and Technology of China, China