Program
EDT | JST | CEST | Thursday | Friday | Saturday | CEST |
2.30 | 15.30 | 8.30 | Opening | | | 8.30 |
3.00 | 16.00 | 9.00 | Special Synthesis Problems | Articulation and Naturalness | Modeling and Evaluation | 9.00 |
5.00 | 18.00 | 11.00 | break | break | break | 11.00 |
5.10 | 18.10 | 11.10 | Lior Wolf - Keynote 1 | István Winkler - Keynote 2 | Thomas Drugman - Keynote 3 | 11.10 |
6.10 | 19.10 | 12.10 | Morning Session Discussion | Morning Session Discussion 2 | Morning Session Discussion 3 | 12.10 |
7.10 | 20.10 | 13.10 | Articulation and Speech Styles | Emotion, Singing and Voice Conversion | Synthesis and Context | 13.10 |
9.10 | 22.10 | 15.10 | break | break | Closing & SynSIG announcement | 15.10 |
9.20 | 22.20 | 15.20 | Expressive Synthesis | Multilingual and Evaluation | | 15.20 |
11.20 | 0.20 | 17.20 | Afternoon Session Discussion | Afternoon Session Discussion 2 | | 17.20 |
| Social event |
Papers at ISCA website
Each paper has its DOI number. https://www.isca-speech.org/archive/ssw_2021/index.html
August 26, Thursday
Opening, 8.30 - 9.00
Géza Németh, Chairman, BME, Hungary
Special Synthesis Problems, 9.00 - 11.00
Session Chair: Lior Wolf
Sai Sirisha Rallabandi, Babak Naderi and Sebastian Möller:
Tamás Gábor Csapó:
Martin Lenglet, Olivier Perrotin and Gérard Bailly:
Marc Illa, Bence Mark Halpern, Rob van Son, Laureano Moro-Velazquez and Odette Scharenborg:
Elijah Gutierrez, Pilar Oplustil-Gallegos and Catherine Lai: Location, Location: Enhancing the Evaluation of Text-to-Speech synthesis using the Rapid Prosody Transcription Paradigm
Keynote 1, 11.10 - 12.10
Session Chair: Erica Cooper
Lior Wolf, Facebook AI Research and Tel Aviv University, Israel
Lior Wolf is a research scientist at Facebook AI Research and a full professor in the School of Computer Science at Tel Aviv University, Israel. He conducted postdoctoral research at Prof. Poggio's lab at the Massachusetts Institute of Technology and received his PhD degree from the Hebrew University, under the supervision of Prof. Shashua. He is an ERC grantee and has won honorable mentions at ICCV 2001 and ICCV 2019, and best paper awards at ECCV 2000 and ICANN 2016. His research focuses on computer vision, audio synthesis, and deep learning.
Morning Session Discussion, 12.10 - 13.10
Articulation and Speech Styles, 13.10 - 15.10
Session Chair: Esther Klabbers
Tamás Gábor Csapó, Laszlo Toth, Gábor Gosztolya and Alexandra Markó:
Javier Latorre, Charlotte Bailleul, Tuuli Morrill, Alistair Conkie and Yannis Stylianou:
Christina Tånnander and Jens Edlund:
Joakim Gustafson, Jonas Beskow and Eva Szekely:
Csaba Zainkó, László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Alexandra Markó, Géza Németh and Tamás Gábor Csapó: Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging
Expressive Synthesis, 15.20 - 17.20
Session Chair: Gábor Olaszy
Bastian Schnell and Philip N. Garner:
Slava Shechtman and Avrech Ben-David:
Bastian Schnell, Goeric Huybrechts, Bartek Perz, Thomas Drugman and Jaime Lorenzo-Trueba:
Abdelhamid Ezzerg, Adam Gabrys, Bartosz Putrycz, Daniel Korzekwa, Daniel Saez-Trigueros, David McHardy, Kamil Pokora, Jakub Lachowicz, Jaime Lorenzo-Trueba and Viacheslav Klimkov:
Lucas H. Ueda, Paula D. P. Costa, Flavio O. Simoes and Mário U. Neto:
Afternoon Session Discussion, 17.20 - 18.20
August 27, Friday
Articulation and Naturalness, 9.00 - 11.00
Session Chair: Tamás Gábor Csapó
Debasish Ray Mohapatra, Pramit Saha, Yadong Liu, Bryan Gick and Sidney Fels:
Raahil Shah, Kamil Pokora, Abdelhamid Ezzerg, Viacheslav Klimkov, Goeric Huybrechts, Bartosz Putrycz, Daniel Korzekwa and Thomas Merritt:
Paul Konstantin Krug, Simon Stone and Peter Birkholz:
Ambika Kirkland, Marcin Włodarczak, Joakim Gustafson and Eva Szekely:
Alejandro Mottini, Jaime Lorenzo-Trueba, Sri Vishnu Kumar Karlapati and Thomas Drugman: Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments
Keynote 2, 11.10 - 12.10
Session Chair: Cassia Valentini Botinhao
István Winkler, Research Centre for Natural Sciences, Hungary
István Winkler, PhD, DSc, is an electrical engineer and psychologist. He received his PhD in 1993 at the University of Helsinki, studying auditory sensory memory by electroencephalographic measures. He defended his Doctor of Science thesis in 2005 at the Hungarian Academy of Sciences on auditory deviance detection. His current fields of interest are predictive processing in auditory deviance detection, auditory scene analysis, communication by sound, and the development of these functions in infancy. During his career, he has authored or coauthored over 250 publications, which have received over 11000 citations. He is currently the director of the Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary, and the head of the Sound and Speech Perception research group (http://www.ttk.hu/kpi/en/sound-and-speech-perception/).
Morning Session Discussion, 12.10 - 13.10
Emotion, Singing and Voice Conversion, 13.10 - 15.10
Session Chair: Simon King
Konstantinos Markopoulos, Nikolaos Ellinas, Alexandra Vioni, Myrsini Christidou, Panos Kakoulidis, Georgios Vamvoukakis, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis, Aimilios Chalamandaris and Georgia Maniati:
Jennifer Williams, Jason Fong, Erica Cooper and Junichi Yamagishi:
Erica Cooper, Xin Wang and Junichi Yamagishi:
Hieu-Thi Luong and Junichi Yamagishi:
Patrick Lumban Tobing and Tomoki Toda: Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction
Multilingual and Evaluation, 15.20 - 17.20
Session Chair: Junichi Yamagishi
Johannah O'Mahony, Pilar Oplustil-Gallegos, Catherine Lai and Simon King:
Arun Baby, Pranav Jawale, Saranya Vinnaitherthan, Sumukh Badam, Nagaraj Adiga and Sharath Adavane:
Dan Wells and Korin Richmond:
Ayushi Pandey, Sebastien Le Maguer, Julie Berndsen and Naomi Harte:
Jason Fong, Jilong Wu, Prabhav Agrawal, Andrew Gibiansky, Thilo Koehler and Qing He: Improving Polyglot Speech Synthesis through Multi-task and Adversarial Learning
Afternoon Session Discussion, 17.20 - 18.20
August 28, Saturday
Modeling and Evaluation, 9.00 - 11.00
Session Chair: Gérard Bailly
Ammar Abbas, Bajibabu Bollepalli, Alexis Moinet, Arnaud Joly, Penny Karanasou, Peter Makarov, Simon Slangens, Sri Karlapati and Thomas Drugman:
Erica Cooper and Junichi Yamagishi:
Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi and Hiroshi Saruwatari:
Jason Taylor, Sébastien Le Maguer and Korin Richmond:
Qiao Tian, Chao Liu, Zewang Zhang, Heng Lu, Linghui Chen, Bin Wei, Pujiang He and Shan Liu:
Keynote 3, 11.10 - 12.10
Session Chair: Gustav Eje Henter
Thomas Drugman, Amazon, Germany
Thomas Drugman is a Science Manager in the Amazon TTS Research team. He received his PhD in 2011 from the University of Mons, winning the IBM Belgium award for “Best Thesis in Computer Science”. His PhD thesis studied the use of glottal source analysis in Speech Processing. He then completed a 3-year post-doc on speech/audio analysis for two biomedical applications: tracheoesophageal speech reconstruction and cough detection in chronic respiratory diseases. In 2014, he joined Amazon as a Scientist in the Alexa ASR team. He transferred to the TTS team in 2016, where he has been Science Manager since 2017. He has contributed to making Amazon’s Neural TTS more natural and expressive, notably by enriching Alexa’s experience with different speaking styles: emotions, newscaster, whispering, etc. His current research interests lie in improving the naturalness and flow of longer synthetic speech interactions. He has about 125 publications in the field of Speech Processing. He received the Interspeech Best Student Paper awards in 2009 and 2014 (as supervisor). He has also been a member of the IEEE Speech and Language Technical Committee since 2019.
Morning Session Discussion, 12.10 - 13.10
Synthesis and Context, 13.10 - 15.10
Session Chair: Thomas Drugman
Pilar Oplustil-Gallegos, Johannah O'Mahony and Simon King:
Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Naoko Tanji, Yusuke Ijima, Ryo Masumura and Hiroshi Saruwatari:
Mano Ranjith Kumar M, Jom Kuriakose, Karthik Pandia D S and Hema A Murthy:
Marco Nicolis and Viacheslav Klimkov:
Jason Fong, Jennifer Williams and Simon King: Analysing Temporal Sensitivity of VQ-VAE Sub-Phone Codebooks
Closing & SynSIG announcement, 15.10 - 15.20