The development of powerful deep learning technologies has also brought about negative effects, one of which is the emergence of deepfakes. While most work has focused on fake images and video alone, the multimodal, audiovisual aspect is crucial both to generating convincing fake multimedia content and to detecting it accurately. In addition to developing accurate and robust detection models, it is worthwhile to explore fake media generation methods as well, since content generation has many meaningful and beneficial applications, such as commercial advertising, education and privacy protection. Fake media generation therefore constitutes an integral component of the proposed workshop. The purpose of the workshop is to provide a platform for researchers and engineers to share their ideas and approaches, and to offer insights into fake media generation and detection to both academia and industry.


We invite submissions on a range of AI technologies and applications in the media forensics domain. Topics of interest include, but are not limited to, the following:

  • Fake image generation and/or detection
  • Fake voice generation and/or detection
  • Audiovisual deepfakes and adversarial attacks
  • Audiovisual deepfakes, fairness and ethics
  • Audiovisual deepfakes and data augmentation
  • Audiovisual deepfake datasets
Important Dates
  • Paper Submission Deadline: August 10th 2021
  • Notification of Acceptance: August 26th 2021
  • Camera-Ready: September 2nd 2021
  • Workshop: October 24th 2021

To participate in the workshop, please refer to the registration information here.

Contact Us

If you have any queries, please email us at

The CMT submission website is now available at

Please choose "Track: 1st Workshop on Synthetic Multimedia – Audiovisual Deepfake Generation and Detection" to submit your workshop paper.

Submissions should follow the ACM Multimedia 2021 format (see the template here) and comprise 6 to 8 pages, with up to two additional pages for references.

This year, ACM Multimedia will be held as a hybrid conference. Speakers and authors can choose to attend either online or in person.

9:00am – 9:10am | Opening Address
Deepfake Detection Track
9:10am – 9:55am | Keynote 1: Fighting AI-synthesized Fake Media (Prof Lyu Siwei)
9:55am – 10:40am | Keynote 2: Representations for Content Creation, Manipulation and Animation (Dr Sergey Tulyakov)
10:40am – 11:00am | Paper: Evaluation of an Audio-Video Multimodal Deepfake Dataset using Unimodal and Multimodal Detectors
11:00am – 11:20am | Paper: DmyT: Dummy Triplet Loss for Deepfake Detection
11:20am – 11:30am | Break
Deepfake Generation Track
11:30am – 12:15pm | Keynote 3: “Deepfake” Portrait Image Generation (Prof Cai Jianfei)
12:15pm – 12:35pm | Paper: Invertable Frowns: Modifying Affect in Existing Videos
Keynote Talks

Title: “Deepfake” Portrait Image Generation

Abstract: With the prevalence of deep learning technology, especially generative adversarial networks (GANs), photo-realistic facial image generation has made huge progress. In this talk, we will review a series of works from my group on high-quality portrait/facial image generation in recent years, which can be divided into 3D-based approaches and GAN-based 2D approaches. For the 3D-based approaches, we will explain how to model facial geometry, reflectance and lighting explicitly, and show how such 3D modelling knowledge can be used in portrait manipulation. For the GAN-based 2D approaches, we will present a framework for pluralistic facial image generation from a masked facial input, whereas previous approaches aim to produce only a single output. Finally, I will offer some suggestions for detecting deepfakes from a generation point of view.

Title: Fighting AI-synthesized Fake Media

Abstract: Recent years have witnessed an unexpected and astonishing rise of AI-synthesized fake media, driven by rapid technological advances and the omnipresence of social media. Together with other forms of online disinformation, AI-synthesized fake media are eroding our trust in online information and have already caused real damage. It is thus important to develop countermeasures to limit their negative impacts. In this presentation, Dr. Lyu will highlight recent technical developments in fighting AI-synthesized fake media, and discuss the future of such media and the technologies to counter them.

Title: Representations for Content Creation, Manipulation and Animation

Abstract: “What I cannot create, I do not understand” reads the famous note on Dr. Feynman’s blackboard. The ability to create or change objects requires us to understand their structure and factors of variation. For example, to draw a face, an artist needs to know its composition and have a good command of drawing skills (the latter is particularly challenging for the presenter). Animation additionally requires knowledge of the object’s rigid and non-rigid motion patterns. This talk shows that the generation, manipulation and animation skills of deep generative models benefit substantially from such understanding. Moreover, the better the models can explain the data they see during training, the higher the quality of the content they are able to generate. Understanding and generation form a loop in which improved understanding improves generation, which in turn improves understanding even further. To show this, I detail our work in three areas: video synthesis and prediction, image animation by motion retargeting. I will further introduce a new direction in video generation that allows users to play videos as they are generated. In each of these works, the internal representation was designed to facilitate better understanding of the task, resulting in improved generation abilities. Without a single labeled example, our models are able to understand factors of variation, object parts, and their shapes and motion patterns, and to perform creative manipulations previously available only to trained professionals equipped with specialized software and hardware.

Stefan Winkler is Senior Deputy Director at AI Singapore and Associate Professor (Practice) at the National University of Singapore. Prior to that he was Distinguished Scientist and Director of the Video & Analytics Program at the University of Illinois’ Advanced Digital Sciences Center (ADSC) in Singapore. He also co-founded two start-ups and worked for a Silicon Valley company. Dr. Winkler has a Ph.D. degree from the Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland, and a Dipl.-Ing. (M.Eng./B.Eng.) degree from the University of Technology Vienna, Austria. He is an IEEE Fellow and has published over 150 papers. He has also contributed to international standards in VQEG, ITU, ATIS, VSF, and SCTE.

Weiling Chen is a Senior AI Engineer at AI Singapore and the National University of Singapore. Prior to that, she was a Data Scientist at Lazada Group. She received her Ph.D. degree from Nanyang Technological University and her B.Eng. in Computer Science and Technology from Shandong University. She was the champion of the SeeTrue Workshop organized by the Defence Science and Technology Agency (DSTA).

Abhinav Dhall is a lecturer and co-director of the Human-Centred Artificial Intelligence lab at Monash University. He is also an Assistant Professor (on leave) at the Indian Institute of Technology Ropar. He received his Ph.D. degree from the Australian National University. His research has received awards at ACM ICMR, IEEE FG and IEEE ICME.

Dr. Pavel Korshunov is a research associate at the Idiap Research Institute (Martigny, Switzerland). He currently works on detection of deepfakes and audio-visual manipulations, age detection in images and voice, and speech anti-spoofing. He received his Ph.D. in Computer Science from the School of Computing, National University of Singapore in 2011 and was a postdoctoral researcher at EPFL (Lausanne, Switzerland) and the Idiap Research Institute. During his past tenures, he worked on problems related to high dynamic range (HDR) imaging, crowdsourcing, and visual privacy in video surveillance. He has published over 70 papers and received an ACM TOMM journal best paper award (2011), two top 10% best paper awards at MMSP 2014, and a top 10% best paper award at ICIP 2014. He is also a co-editor of the JPEG XT standard for HDR images.

Programme Committee

Ramanathan Subramanian, University of Canberra

Sunpreet Arora, Visa Research

Yisroel Mirsky, Ben Gurion University

Ruben Tolosana, Universidad Autónoma de Madrid

Jyoti Joshi, Kroop AI

Siwei Lyu, University at Buffalo

Rui Shao, Hong Kong Baptist University

Luisa Verdoliva, University Federico II of Naples

This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG-RP-2019-050).
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore.
© Copyright 2021, AI Singapore. All Rights Reserved.