Title: I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Authors: Yifan Peng, Jaesong Lee, Shinji Watanabe
Published: 14 March 2023
Link: http://arxiv.org/abs/2303.07624v1

Abstract

Transformer-based end-to-end speech recognition has achieved great success. However, the large footprint and computational overhead make it difficult to deploy these models in some real-world applications. Model compression techniques can reduce the model size and speed up inference, but the compressed model has a fixed architecture which might be suboptimal. We propose a novel Transformer encoder with Input-Dependent Dynamic Depth (I3D) to achieve strong performance-efficiency trade-offs. With a similar number of layers at inference time, I3D-based models outperform the vanilla Transformer and the static pruned model obtained via iterative layer pruning. We also present an interesting analysis of the gate probabilities and the input dependency, which helps us better understand deep encoders.
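To illustrate the general idea of input-dependent dynamic depth, the sketch below shows one common way to gate a Transformer layer on its input: a small predictor produces a per-utterance execution probability, the layer output is blended with a skip connection during training, and layers whose gate falls below a threshold can be skipped at inference. This is a minimal, hypothetical sketch (the class names, the mean-pooled linear gate, and the 0.5 threshold are assumptions for illustration), not the paper's exact I3D formulation.

```python
import torch
import torch.nn as nn

class GatedEncoderLayer(nn.Module):
    """A Transformer encoder layer with a hypothetical input-dependent gate."""

    def __init__(self, d_model: int, nhead: int):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.gate = nn.Linear(d_model, 1)  # predicts one execution logit per utterance

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model). Pool over time to get an utterance summary,
        # then predict the probability of executing this layer.
        p = torch.sigmoid(self.gate(x.mean(dim=1)))      # (batch, 1)
        p = p.unsqueeze(-1)                              # (batch, 1, 1) for broadcasting
        if self.training:
            # Soft gate: interpolate between the layer output and the identity path.
            return p * self.layer(x) + (1 - p) * x
        # Hard gate at inference: skip the layer entirely when the gate is low,
        # which is where the computational savings come from.
        if p.mean() < 0.5:
            return x
        return self.layer(x)
```

A full encoder would stack several such layers, so the effective depth varies with the input while the parameter count stays fixed.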