My interests include computer vision, computational linguistics, and machine learning, and how these areas can best work together.
- 07/2016: 3 ECCV papers: fully, semi-, and unsupervised grounding; Segmentation from Natural Language Expressions; and Generating Visual Explanations
- 06/2016: We won the Visual Question Answering (VQA) challenge using Multimodal Compact Bilinear Pooling. Try it with your own images and questions: demo.berkeleyvision.org
- 05/2016: Our video description research has been featured in the magazine MaxPlanckForschung: “Digital Storytellers” (in German: “Der digitale Bildreporter”)
- 03/2016: 3 CVPR orals: Neural Module Networks for VQA, Natural Language Object Retrieval, and Describing Novel Object Categories without Paired Training Data
- 03/2016: NAACL Best Paper Award: Learning to Compose Neural Networks for Question Answering
- 02/2016: We are considering holding a second edition of the challenge during the upcoming ECCV’16 and would like to get your short (1 minute) feedback: http://goo.gl/forms/jzRAKfUTYI
- 01/2016: Learning to Compose Neural Networks for Question Answering
- 11/2015: ArXiv season: Check out our work on Neural Module Networks for VQA, “unsupervised” Grounding of Textual Phrases in Images by Reconstruction, Natural Language Object Retrieval, and Describing Novel Object Categories without Paired Training Data
- 10/2015: I received the dissertation award for the best thesis in 2014 from the German Pattern Recognition Society (DAGM) [pdf of my thesis]
- 10/2015: Our paper “The Long-Short Story of Movie Description” got the GCPR 2015 Honorable Mention prize.
- 09/2015: Check out the arXiv versions of our three ICCV 2015 papers: one on video description, one on question answering (oral), and one on Spatial Semantic Regularisation for Large Scale Object Detection
- 08/2015: Paper accepted at IJCV and published online. See our project page for the dataset MPII Cooking 2.
- 07/2015: We will hold a workshop and challenge on video description @ ICCV 2015