Suyoun Kim, Ph.D.

Research Scientist
Meta, AI Speech

Email: suyounkim [at] meta [dot] com

Download my CV »
LinkedIn · Google Scholar

I am a Research Scientist at Meta, AI Speech. I completed my Ph.D. in Electrical and Computer Engineering at Carnegie Mellon University, where I worked with Professors Richard M. Stern and Florian Metze. My research draws on speech recognition, deep learning, and machine learning to develop conversational AI and related technologies. I received my M.S. from the Language Technologies Institute, School of Computer Science, at Carnegie Mellon University.


Research Interests

    Speech Recognition, Spoken Dialog Systems, Deep Learning, Machine Learning, and Conversational AI

Papers

  • Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding
    Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer
    in INTERSPEECH, 2023
    [paper]
  • Introducing Semantics into Speech Encoders
    Derek Xu, Shuyan Dong, Changhan Wang*, Suyoun Kim*, Zhaojiang Lin*, Akshat Shrivastava, Shang-Wen Li, Liang-Hsuan Tseng, Alexei Baevski, Guan-Ting Lin, Hung-yi Lee, Yizhou Sun, Wei Wang
    in ACL, 2023
    [paper]
  • Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition
    Suyoun Kim, Ke Li, Lucas Kabela, Rongqing Huang, Jiedan Zhu, Ozlem Kalinli, Duc Le
    in Findings of EMNLP, 2022
    [paper]
  • Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric
    Suyoun Kim, Duc Le, Weiyi Zheng, Tarun Singh, Abhinav Arora, Xiaoyu Zhai, Christian Fuegen, Ozlem Kalinli, Michael L Seltzer
    in INTERSPEECH, 2022
    [paper]
  • Deliberation Model for On-Device Spoken Language Understanding
    Duc Le, Akshat Shrivastava, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L Seltzer
    in INTERSPEECH, 2022
    [paper]
  • Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding
    Suyoun Kim, Abhinav Arora, Duc Le, Ching-Feng Yeh, Christian Fuegen, Ozlem Kalinli, Michael L Seltzer
    in INTERSPEECH, 2021
    [paper]
  • Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
    Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L Seltzer
    in INTERSPEECH, 2021
    [paper]
  • Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
    Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier, Christian Fuegen, Michael L Seltzer, Duc Le
    in ICASSP, 2021
    [paper]
  • Improving RNN transducer based ASR with auxiliary tasks
    Chunxi Liu, Frank Zhang, Duc Le, Suyoun Kim, Yatharth Saraf, Geoffrey Zweig
    in SLT, 2021
    [paper]
  • End-to-End Speech Recognition on Conversations
    Suyoun Kim
    Ph.D. Dissertation, 2019
    [dissertation]
  • Cross-Attention End-to-End ASR for Two-Party Conversations
    Suyoun Kim, Siddharth Dalmia, Florian Metze
    in INTERSPEECH, 2019
    [paper]
  • Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion
    Suyoun Kim, Siddharth Dalmia, Florian Metze
    in ACL, 2019
    [paper]
  • Acoustic-to-Word Models with Conversational Context Information
    Suyoun Kim, Florian Metze
    in NAACL, 2019
    [paper]
  • Dialog-context aware end-to-end speech recognition
    Suyoun Kim, Florian Metze
    in SLT, 2018
    [paper]
  • Situation Informed End-to-End ASR for CHiME-5 Challenge
    Suyoun Kim*, Siddharth Dalmia*, Florian Metze
    in CHiME Workshop, 2018
    [paper]
  • Improved training for online end-to-end speech recognition systems
    Suyoun Kim, Michael L. Seltzer, Jinyu Li, Rui Zhao
    in INTERSPEECH, 2018
    [paper]
  • Towards Language-universal end-to-end speech recognition
    Suyoun Kim, Michael L. Seltzer
    in ICASSP, 2018 [selected for oral presentation]
    [paper]
  • Hybrid CTC/Attention Architecture for End-to-End Speech Recognition
    Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R Hershey, Tomoki Hayashi
    IEEE Journal of Selected Topics in Signal Processing, 2017
    [paper]
  • End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition
    Suyoun Kim, Ian Lane
    in INTERSPEECH, 2017
    [paper]
  • Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning
    Suyoun Kim, Takaaki Hori, Shinji Watanabe
    in ICASSP, 2017 [selected for oral presentation]
    [paper]
  • Multi-Channel Speech Recognition: LSTMs All the Way Through
    Hakan Erdogan, Tomoki Hayashi, John R. Hershey, Takaaki Hori, Chiori Hori, Wei-Ning Hsu, Suyoun Kim, Jonathan Le Roux, Zhong Meng, and Shinji Watanabe
    in CHiME Workshop, 2016
    [paper]
  • Recurrent Models for Auditory Attention in Multi-Microphone Distant Speech Recognition
    Suyoun Kim, Ian Lane
    in INTERSPEECH, 2016
    [paper]
  • Environmental Noise Embeddings for Robust Speech Recognition
    Suyoun Kim, Bhiksha Raj, Ian Lane
    arXiv preprint, 2016
    [paper]
  • Recurrent Models for Auditory Attention in Multi-Microphone Distant Speech Recognition (earlier version)
    Suyoun Kim, Ian Lane
    in ICLR workshop, 2016
    [paper]
  • Multimodal Transfer Deep Learning with an Application in Audio-Visual Recognition
    Seungwhan Moon, Suyoun Kim, Haohan Wang
    in NIPS workshop, 2015
    [paper]
  • Impact of nano-scale through-silicon vias on the quality of today and future 3D IC designs
    Dae Hyun Kim, Suyoun Kim, Sung Kyu Lim
    in System Level Interconnect Prediction (SLIP) Workshop, IEEE Press, 2011
    [paper]

Patent

  • Attention-based Neural Networks for Multi-Microphone Speech Recognition, Provisional Patent Application 2016-127

Professional Experience

  • Microsoft Research (MSR), Speech and Dialog Research Group, Summer 2017
    Research Intern, responsible for research on speech recognition
  • Carnegie Mellon University, Electrical and Computer Engineering, 2014 - 2019
    Research Assistant, responsible for research on speech recognition
  • Mitsubishi Electric Research Laboratories (MERL), Speech & Audio Lab., Summer 2016
    Research Intern, responsible for research on end-to-end speech recognition systems
    Collaboration with Shinji Watanabe and Takaaki Hori
  • Carnegie Mellon University, School of Computer Science, LTI, 2012 - 2014
    Research Assistant, responsible for research on computational biology, protein-protein interaction, and drug repositioning
  • Samsung Electronics, Visual Display Division, 2005 - 2012
    Software Engineer, responsible for development of Internet Protocol set-top box software on embedded Linux systems
  • Samsung Software Membership, 2004 - 2005
