- Website: https://pyf98.github.io/
I am a final-year Ph.D. student in the Department of Electrical and Computer Engineering at Carnegie Mellon University. I am fortunate to be supervised by Prof. Shinji Watanabe (Sep 2021 - present) and Prof. Ian Lane (Aug 2020 - Aug 2021; now at UC Santa Cruz). I received my bachelor's degree from the Department of Electronic Engineering at Tsinghua University in 2020.
In Summer 2024, I was an AI research intern at NVIDIA NeMo, where I worked on joint speech-text language models. In Summer 2023, I was a research scientist intern at Meta AI FAIR, where I worked on speech language models for voice-preserved textless speech-to-speech translation. In Summer 2022, I was a speech recognition intern at ASAPP, where I worked on speech model compression.
My research area is speech and language processing. My Ph.D. thesis focuses on developing effective and efficient open speech foundation models. I have led the Open Whisper-style Speech Models (OWSM) project at CMU WAVLab, developing the first large-scale, fully open speech foundation model from academia. Recently, I have also become interested in integrating speech capabilities into large language models.
I have published first-author papers at top-tier AI and speech conferences, such as ICML, ACL, ICASSP, and INTERSPEECH. Several projects have received notable recognition, including the Best Paper Award at SLT 2024, the Best Paper Award at EMNLP 2024, Top 3% Paper Recognition at ICASSP 2023 (3 papers), and Best Student Paper Award Finalist at SPIE Medical Imaging 2020. I also contribute to ESPnet, a widely used speech processing toolkit. Specifically, I have been the primary contributor to several major projects:
- Novel speech encoder architecture: Branchformer (ICML'22), E-Branchformer vs Conformer (INTERSPEECH'23)
- Speech model compression: I3D (ICASSP'23 Top 3%), HJ-Pruning (ICASSP'23 Top 3%), DPHuBERT (INTERSPEECH'23)
- Open speech foundation models: OWSM (ASRU'23), OWSM v3.1 (INTERSPEECH'24), OWSM-CTC (ACL'24)
- Speech language models: SpeechLM analysis, MSLM-S2ST, VoiceTextBlender, and more to follow