The slides of the two talks are available here.
For many years digital multimedia data was scarce. Photos, video and audio recordings had to be digitized and took up a lot of data storage space, which was precious at the time. This has changed a lot over the last 10 years. Digital image and video recording equipment is available at reasonable prices, and many people can afford and use it. With the advent of broadband internet connections and the availability of multimedia sharing platforms like Facebook, Flickr, and YouTube, uploading, sharing and consuming multimedia data became easy and even ubiquitous. However, academic problems that were identified and discussed before this era of availability are now even more pressing and topical. Three of the most famous "gaps" are: (i) the semantic gap, i.e. computers are not able to interpret the semantics of visual information automatically in full detail, (ii) the use context gap, i.e. the context in which pictures are taken and used is not known and can hardly be inferred from the pictures' contents, and (iii) the intention (query) gap, i.e. in many cases the query a user issues at search time is short and does not sufficiently reflect the user's actual information need.
A lot of effort has been put into bridging those gaps, and many researchers have focused on the semantic gap. Still, an ultimate solution has not (yet) been found. In this talk, novel approaches for integrating users' context into multimedia information systems through user intentions - the goals or aims of users - are discussed. First, user motivation for image and video retrieval is investigated, and models for describing the aims of users on a higher level and their application to content-based image retrieval are discussed. Then user intentions for creating multimedia data are explored by presenting and discussing the results of a recent empirical study. Finally, an outlook on future possibilities and challenges is given.
Dr. Mathias Lux is Senior Assistant Professor at the Institute for Information Technology (ITEC) at Klagenfurt University. He works on user intentions in multimedia retrieval and production and on emergent semantics in social multimedia computing. In his scientific career he has (co-)authored more than 70 scientific publications, has served on multiple program committees and as a reviewer for international conferences, journals and magazines, and has organized several scientific events. He is also well known for developing the award-winning and popular open source tools Caliph & Emir and LIRe (http://www.semanticmetadata.net) for multimedia information retrieval.
The large success of online social platforms for creating, sharing and tagging user-generated media has led to a strong interest in the multimedia and computer vision communities in research on methods and techniques for annotating and searching social media. Visual content similarity, geo-tags and tag co-occurrence, together with social connections and comments, can be exploited to perform tag suggestion as well as content classification and clustering, and to enable more effective semantic indexing and retrieval of visual data. However, there is a need to compensate for the relatively low quality of these metadata - user-produced tags and annotations are known to be ambiguous, imprecise and/or incomplete, overly personalized and limited - and at the same time to take into account the ‘web-scale’ quantity of media and the fact that social network users continuously add new images and create new terms. We will review state-of-the-art approaches to automatic annotation and tag refinement for social images and discuss extensions to tag suggestion and localization in web video sequences.