Vittorio Ferrari
Authored publications
Semantic classes can be either things (objects with a well-defined shape, e.g. car, person) or stuff (amorphous background regions, e.g. grass, sky). While lots of classification and detection works focus on thing classes, less attention has been given to stuff classes. Nonetheless, stuff classes are important as they allow us to explain important aspects of an image, including (1) scene type; (2) which thing classes are likely to be present and their location (through contextual reasoning); (3) physical attributes, material types and geometric properties of the scene. To understand stuff and things in context we introduce COCO...
Holger Caesar, Jasper Uijlings, Vittorio Ferrari
CVPR (2018) (to appear)
Object class labeling is the task of annotating images with labels on the presence or absence of objects from a given class vocabulary. Simply asking one yes-no question per class, however, has a cost that is linear in the vocabulary size and is thus inefficient for large vocabularies. While modern approaches rely on a hierarchical organization of the vocabulary to reduce annotation time, they are still expensive (several minutes per image for the 200 classes in ILSVRC). Instead, we propose a new interface where classes are annotated via speech. Speaking is fast and allows for direct access to the class name, without searching through a lis...
We introduce Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every object and background region in an image. Fluid Annotation starts from the output of a strong neural network model, which the annotator can edit by correcting the labels of existing regions, adding new regions to cover missing objects, and removing incorrect regions. Fluid Annotation has several attractive properties: (a) it is very efficient in terms of human annotation time; (b) it supports full-image annotation in a single pass, as opposed to performing a series of small tasks in isolation, such as indicati...
Misha Andriluka, Jasper Uijlings, Vittorio Ferrari
ACM Multimedia (2018) (to appear)
We introduce Intelligent Annotation Dialogs for bounding box annotation. We train an agent to automatically choose a sequence of actions for a human annotator to produce a bounding box in a minimal amount of time. Specifically, we consider two actions: box verification [34], where the annotator verifies a box generated by an object detector, and manual box drawing. We explore two kinds of agents, one based on predicting the probability that a box will be positively verified, and the other based on reinforcement learning. We demonstrate that (1) our agents are able to learn efficient annotation strategies in several scenarios, ...
Ksenia Konyushkova, Jasper Uijlings, Chris Lampert, Vittorio Ferrari
CVPR (2018) (to appear)
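The trade-off the agent in this paper faces can be sketched with a simple expected-time rule. The per-action time costs and the hard decision rule below are illustrative assumptions, not the paper's learned agents:

```python
def choose_action(p_accept, t_verify=1.8, t_draw=7.0):
    """Pick the annotation action with the lower expected time.

    p_accept: predicted probability that the detector's box passes verification.
    t_verify, t_draw: assumed per-action costs in seconds (hypothetical values).
    """
    # If verification fails, the annotator still has to draw the box manually.
    expected_verify = t_verify + (1.0 - p_accept) * t_draw
    return "verify" if expected_verify < t_draw else "draw"

print(choose_action(0.9))  # confident detection: cheaper to verify first
print(choose_action(0.1))  # unlikely box: draw immediately
```

A learned agent would, in effect, estimate p_accept from the image and detector output rather than receive it as input.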
We present a unified framework tackling two problems: class-specific 3D reconstruction from a single image, and generation of new 3D shape samples. These tasks have received considerable attention recently; however, existing approaches rely on 3D supervision, annotation of 2D images with keypoints or poses, and/or training with multiple views of each object instance. Our framework is very general: it can be trained in similar settings to these existing approaches, while also supporting weaker supervision scenarios. Importantly, it can be trained purely from 2D images, without ground-truth pose annotations, and with a single view per in...
Paul Henderson, Vittorio Ferrari
British Machine Vision Conference proceedings (2018)
We propose to revisit knowledge transfer for training object detectors on target classes from weakly supervised training images, helped by a set of source classes with bounding-box annotations. We present a unified knowledge transfer framework based on training a single neural network multi-class object detector over all source classes, organized in a semantic hierarchy. This generates proposals with scores at multiple levels in the hierarchy, which we use to explore knowledge transfer over a broad range of generality, ranging from class-specific (bicycle to motorbike) to class-generic (objectness to any class). Experiments o...
Jasper Uijlings, Stefan Popov, Vittorio Ferrari
CVPR (2018) (to appear)
Manually annotating object bounding boxes is central to building computer vision datasets, and it is very time consuming (annotating ILSVRC [53] took 35s for one high-quality box [62]). It involves clicking on imaginary corners of a tight box around the object. This is difficult as these corners are often outside the actual object and several adjustments are required to obtain a tight box. We propose extreme clicking instead: we ask the annotator to click on four physical points on the object: the top, bottom, left- and right-most points. This task is more natural and these points are easy to find. We crowd-source extreme point annotations fo...
Dim Papadopoulos, Jasper Uijlings, Frank Keller, Vittorio Ferrari
ICCV (2017)
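The geometry behind extreme clicking is simple: the four extreme points directly determine the tight box, so no corner adjustment is needed. A minimal sketch, assuming points are given as (x, y) pairs:

```python
def box_from_extreme_points(top, bottom, left, right):
    """Derive a tight axis-aligned box (xmin, ymin, xmax, ymax) from
    four extreme clicks: top-, bottom-, left- and right-most points."""
    xmin = left[0]    # left-most point fixes the left edge
    xmax = right[0]   # right-most point fixes the right edge
    ymin = top[1]     # top-most point fixes the top edge
    ymax = bottom[1]  # bottom-most point fixes the bottom edge
    return xmin, ymin, xmax, ymax

# Example clicks on an object's outline (y grows downward, as in images).
print(box_from_extreme_points((40, 10), (45, 90), (5, 50), (80, 55)))
# -> (5, 10, 80, 90)
```

This is why the task is easier than corner clicking: every click lands on the object itself, and each click constrains exactly one box edge.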
Many machine vision applications require predictions for every pixel of the input image (for example semantic segmentation, boundary detection). Models for such problems usually consist of encoders, which decrease spatial resolution while learning a high-dimensional representation, followed by decoders, which recover the original input resolution and produce low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. Therefore this paper presents an extensive comparison of a variety of decoders for a variety of pixel-wise prediction tasks. Our contributions are: (1) Decoders matter...
Zbigniew Wojna, Vittorio Ferrari, Sergio Guadarrama, Nathan Silberman, Liang-chieh Chen, Alireza Fathi, Jasper Uijlings
BMVC (2017)
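As a toy illustration of the decoder side, nearest-neighbour upsampling is one of the simplest ways a decoder can recover spatial resolution. This is a generic sketch, not necessarily one of the decoder variants the paper compares:

```python
import numpy as np

def nearest_upsample(features, factor=2):
    """Recover spatial resolution by repeating each cell `factor` times
    along both spatial axes (nearest-neighbour interpolation)."""
    return features.repeat(factor, axis=0).repeat(factor, axis=1)

x = np.arange(4).reshape(2, 2)  # a tiny 2x2 "feature map"
print(nearest_upsample(x))      # a 4x4 map, each value duplicated in a 2x2 block
```

Learned decoders (e.g. transposed convolutions) replace this fixed duplication with trainable filters, which is exactly the design space such a comparison explores.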
Training object class detectors typically requires a large set of images with objects annotated by bounding boxes. However, manually drawing bounding boxes is very time consuming. In this paper we greatly reduce annotation time by proposing center-click annotations: we ask annotators to click on the center of an imaginary bounding box which tightly encloses the object instance. We then incorporate these clicks into existing Multiple Instance Learning techniques for weakly supervised object localization, to jointly localize object bounding boxes over all training images. Extensive experiments on PASCAL VOC 2007 and MS COCO s...
Dim Papadopoulos, Jasper Uijlings, Frank Keller, Vittorio Ferrari
CVPR (2017)
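One simple way to exploit a center click, in the spirit of this abstract though not the paper's exact MIL objective, is to rank candidate boxes by how close their centers fall to the click:

```python
def rank_by_center_click(proposals, click):
    """Sort candidate boxes (xmin, ymin, xmax, ymax) by the distance
    between their center and the annotator's click (x, y)."""
    def center_dist(box):
        cx = (box[0] + box[2]) / 2.0
        cy = (box[1] + box[3]) / 2.0
        return ((cx - click[0]) ** 2 + (cy - click[1]) ** 2) ** 0.5
    return sorted(proposals, key=center_dist)

boxes = [(0, 0, 10, 10), (40, 40, 60, 60)]
print(rank_by_center_click(boxes, click=(50, 50))[0])  # -> (40, 40, 60, 60)
```

In a weakly supervised setting, such a click-based score would be combined with the appearance-based score of the MIL objective rather than used alone.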
We propose a method to discover the physical parts of an articulated object class (e.g. tiger, horse) from multiple videos. Since the individual parts of an object can move independently of one another, we discover them as object regions that consistently move relative to the rest of the object across videos. We then learn a location model of the parts and segment them accurately in the individual videos using an energy function that also enforces temporal and spatial consistency in the motion of the parts. Traditional methods for motion segmentation or non-rigid structure from motion cannot discover parts unless they display ...