Talk Title: On the Connection between Vision and Language
Speaker: Dr. Hanwang Zhang (Research Scientist, Department of Computer Science, Columbia University, USA)
We are experiencing an unprecedented evolution of deep learning techniques. As a holy grail of Artificial Intelligence, the task of connecting vision and language has perhaps gained the most appreciable benefits in recent years. For example, today's machines are able to outperform humans in large-scale visual recognition and to describe, or answer questions about, an image or video in natural language; none of this was thinkable just a decade ago. In this talk, I will first provide a brief retrospective on the progress of this connection in the pre-deep-learning and current deep learning eras. Then, I will introduce our recent work on addressing what is missing in the state-of-the-art paradigm, including the grounding of open vocabulary, social networks, and deeper scene understanding. Finally, I will discuss several interesting future research directions.
Dr. Hanwang Zhang is currently a research scientist at the Department of Computer Science, Columbia University, USA. He received the B.Eng. (Hons.) degree in computer science from Zhejiang University, Hangzhou, China, in 2009, and the Ph.D. degree in computer science from the National University of Singapore in 2014. His research interests include computer vision, multimedia, and social media. Dr. Zhang is the recipient of the Best Demo Runner-up Award at ACM MM 2012, the Best Student Paper Award at ACM MM 2013, and the Best Paper Honorable Mention at ACM SIGIR 2016. He is also the winner of the Best Ph.D. Thesis Award of the School of Computing, National University of Singapore, 2014. Dr. Zhang serves as an associate editor for Neurocomputing and MTAP, and as a reviewer for various journals and conferences such as CVPR, MM, TIP, TMM, TOMCCAP, TCSVT, and Neurocomputation.