November 20, 2019
Speakers: Aykut Erdem, Erkut Erdem
Title: Towards An Understanding of Procedural Commonsense Knowledge Through Images and Text
One of the long-standing goals of artificial intelligence is to build computers that have a deep understanding of the world and can reason about its various aspects. Towards this end, deep learning-based techniques have made significant progress on many fronts in computer vision and natural language processing. More recently, there has been growing interest in moving beyond conventional text understanding or visual recognition tasks to explore the interaction between vision and language. In this talk, I will highlight our recent efforts on understanding procedural commonsense knowledge in a multimodal setting. In particular, I will first present RecipeQA — a multimodal machine comprehension dataset based on cooking recipes collected from the web, which involves a number of reasoning tasks. For a machine to master these tasks, it should be able to make sense of both visual and textual data: identify key entities, keep track of their state changes, and understand temporal and causal relations. In the second part, I will talk about Procedural Reasoning Networks — a new entity-aware neural comprehension model augmented with external relational memory units. While reading the text instructions, our model learns to dynamically update entity states in relation to each other and exploits this information when reasoning about the recipe. We show that this greatly improves our previously reported results on RecipeQA.
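The core idea of tracking entity states while reading instructions can be illustrated with a minimal sketch. This is not the Procedural Reasoning Networks model itself — the memory size, the gating form, and the `update_entity_memory` function below are illustrative assumptions — but it shows the general pattern: each instruction step attends over a bank of entity memories, and the most relevant entities are updated the most.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def update_entity_memory(memory, step_vec, Wg):
    """One reading step over an external entity memory (illustrative sketch).

    memory:   (n_entities, d) current entity state vectors
    step_vec: (d,) encoding of the current instruction step
    Wg:       (d, d) hypothetical gate projection, not from the paper
    """
    # Attention weights: how relevant each entity is to this step.
    attn = softmax(memory @ step_vec)            # (n_entities,)
    # Candidate new states blend the step encoding into each entity.
    candidate = np.tanh(memory @ Wg + step_vec)  # (n_entities, d)
    # Gated interpolation: heavily-attended entities change the most.
    return (1 - attn[:, None]) * memory + attn[:, None] * candidate

rng = np.random.default_rng(0)
d, n_entities = 8, 3
memory = rng.normal(size=(n_entities, d))     # e.g. flour, eggs, butter
Wg = rng.normal(size=(d, d)) * 0.1
for step in rng.normal(size=(4, d)):          # four instruction steps
    memory = update_entity_memory(memory, step, Wg)
print(memory.shape)
```

In the actual model the step and entity encodings come from learned neural encoders and the updated memories feed a downstream reasoning module; this toy version only illustrates the attend-then-gate update loop.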
Aykut Erdem is an Associate Professor in the Department of Computer Engineering at Hacettepe University and a co-founder of the Hacettepe University Computer Vision Laboratory (HUCVL). The broad goal of his research is to explore better ways to understand, interpret and manipulate visual data, with interests centered on multimodal learning with vision and language. He received his BSc and MSc degrees from Middle East Technical University in 2001 and 2003, respectively. During his doctoral studies at the same institution, he was a guest researcher at Virginia Tech in the summer of 2004 and a visiting scholar at MIT in the fall of 2007. After completing his doctorate in 2008, he worked as a post-doctoral researcher at Ca' Foscari University of Venice under the EU-FP7 SIMBAD project from 2008 to 2010. For more information, visit his webpage at https://web.cs.hacettepe.edu.tr/~aykut
Title: Teaching Machines to Manipulate Natural Scene Images
In the past few years, much progress has been made in image synthesis. Specifically, Generative Adversarial Networks (GANs) have shown great promise in producing photo-realistic images. In this talk, I will describe our recent work on semantic manipulation of natural outdoor images, in which we utilize GANs as an image prior. In particular, we explore a two-stage framework that enables users to directly adjust high-level transient attributes of a natural image. The key component of our approach is a conditional GAN model that can hallucinate images of a scene as if they were taken in a different season, weather condition or time of day. Once a plausible scene is hallucinated with the given attributes, the corresponding look is transferred to the original input image while leaving its semantic details intact, giving a photo-realistic result.
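The two-stage pipeline can be sketched in simplified form. This is not the authors' implementation: the `hallucinate` wrapper is a placeholder for their conditional GAN, and the look-transfer stage is replaced here by a simple per-channel colour-statistics match (a common stand-in) so the example stays self-contained and runnable.

```python
import numpy as np

def hallucinate(generator, z, attributes):
    """Stage 1 (stand-in): a conditional generator maps noise plus target
    transient attributes (e.g. 'sunset', 'winter') to a plausible scene."""
    return generator(z, attributes)

def transfer_look(source, target):
    """Stage 2 (simplified stand-in): match the global colour statistics of
    `target` to those of the hallucinated `source`, channel by channel,
    leaving the spatial (semantic) structure of `target` untouched."""
    out = np.empty_like(target, dtype=np.float64)
    for c in range(target.shape[-1]):
        t, s = target[..., c], source[..., c]
        out[..., c] = (t - t.mean()) / (t.std() + 1e-8) * s.std() + s.mean()
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(1)
original = rng.random((32, 32, 3))          # toy "input photograph"
# Toy generator: ignores its inputs and returns a warm-tinted random scene.
toy_gen = lambda z, attrs: np.clip(rng.random((32, 32, 3)) * [1.0, 0.7, 0.4], 0, 1)
hallucinated = hallucinate(toy_gen, rng.normal(size=64), {"sunset": 1.0})
result = transfer_look(hallucinated, original)
print(result.shape)
```

In the actual work, stage 2 is a learned transfer that preserves fine semantic detail far better than a global colour match; the sketch only conveys the division of labour between the two stages.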
Erkut Erdem is an Associate Professor in the Department of Computer Engineering at Hacettepe University and one of the founders of the Hacettepe University Computer Vision Laboratory (HUCVL). He received his Master's and Ph.D. degrees in Computer Science from Middle East Technical University in 2003 and 2008, respectively. He pursued post-doctoral research at Télécom ParisTech, École Nationale Supérieure des Télécommunications, from 2009 to 2010. His research interests lie in computer vision and machine learning, with applications to image editing and integrated vision and language. For more information, visit his webpage at http://web.cs.hacettepe.edu.tr/~erkut/