Title: Data Efficient Reflow for Few Step Audio Generation
Authors: Lemeng Wu, Zhaoheng Ni, Bowen Shi, Gael Le Lan, Anurag Kumar, Varun Nagaraja, Xinhao Mei, Yunyang Xiong, Bilge Soran, Raghuraman Krishnamoorthi, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra
Published: 2024-12-02
Link: https://ieeexplore.ieee.org/abstract/document/10832165/

Abstract

Flow matching has been successfully applied to generative models, particularly in producing high-quality images and audio. However, the iterative sampling required by the ODE solver in flow matching-based approaches can be time-consuming. Reflow fine-tuning, a technique derived from Rectified Flow, offers a promising solution by straightening the ODE trajectory, thereby reducing the number of sampling steps. In this paper, we focus on developing data-efficient flow-based approaches for text-to-audio generation. We found that directly applying reflow to pre-trained flow matching-based audio generation models is typically computationally expensive: it requires over 50,000 training iterations and five times the amount of training data to achieve satisfactory results. To address this issue, we introduce a novel data-efficient reflow (DEreflow) method, which modifies the reflow data pairs and trajectory to align with the flow matching distribution. As a result of this alignment, our approach requires significantly fewer steps (8,000 compared to 50,000) and data pairs (0.5 times the scale of the training data compared to 5 times). Results show that the proposed DEreflow consistently outperforms the original reflow method on the text-to-audio generation task.
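
For context, the sketch below illustrates the standard reflow fine-tuning recipe from Rectified Flow that the abstract builds on: the pretrained velocity field is integrated from noise to produce (x0, x1) data pairs, and the model is then fine-tuned toward the constant velocity of the straight line between them. This is not the paper's DEreflow variant (whose pair and trajectory modifications are not specified in the abstract); the network, dimensions, solver, and hyperparameters here are illustrative placeholders rather than the authors' setup.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy stand-in for a pretrained flow-matching velocity field v_theta(x, t).
    In the text-to-audio setting this would be a large conditional model over
    audio latents; here it is a small MLP for illustration only."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 256), nn.SiLU(),
            nn.Linear(256, 256), nn.SiLU(),
            nn.Linear(256, dim),
        )

    def forward(self, x, t):
        t = t.expand(x.shape[0], 1)  # broadcast time to the batch
        return self.net(torch.cat([x, t], dim=-1))


@torch.no_grad()
def generate_reflow_pairs(model, n_pairs=1024, dim=64, n_steps=50):
    """Integrate the pretrained ODE dx/dt = v_theta(x, t) from noise x0 to an
    endpoint x1 with a simple Euler solver; the (x0, x1) pairs define the
    straight-line targets used by reflow."""
    x0 = torch.randn(n_pairs, dim)
    x = x0.clone()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((1, 1), i * dt)
        x = x + dt * model(x, t)
    return x0, x  # x is the model-generated endpoint x1


def reflow_finetune(model, x0, x1, iters=1000, lr=1e-4):
    """Fine-tune the velocity field to match the constant velocity (x1 - x0)
    along the straight interpolation x_t = (1 - t) x0 + t x1, which is what
    enables few-step (even one-step) sampling after reflow."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(iters):
        t = torch.rand(x0.shape[0], 1)
        xt = (1 - t) * x0 + t * x1      # point on the straight trajectory
        target = x1 - x0                # constant velocity along that line
        loss = ((model(xt, t) - target) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


if __name__ == "__main__":
    model = VelocityNet()
    x0, x1 = generate_reflow_pairs(model)
    reflow_finetune(model, x0, x1)
```

The abstract's contribution is orthogonal to this loop: DEreflow changes how the data pairs and trajectory are constructed so that they align with the flow matching distribution, which is what reduces the required iterations (8,000 vs. 50,000) and the amount of generated pair data (0.5x vs. 5x the training set).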