Title: PLACEHOLDER hertz-dev - Standard Intelligence
Authors: Standard Intelligence
Published: 2024-11-03
Link: https://si.inc/hertz-dev/

Abstract


This entry is a placeholder. See 👉 Introducing hertz-dev - Standard Intelligence.

Code and models are available on GitHub: https://github.com/Standard-Intelligence/hertz-dev?tab=readme-ov-file


Main releases

  • hertz-codec: convolutional audio autoencoder
    • encodes mono 16kHz speech into an 8 Hz latent representation at approx. 1 kbps
    • outperforms Soundstream and Encodec at 6kbps; on par with DAC at 8kbps in subjective evaluations
    • fewer tokens per second than any popular tokenizer (“critical for language modeling”)
    • 5 million encoder parameters; 95 million decoder parameters
  • hertz-vae: 1.8 billion parameter transformer decoder
    • acts as a learned prior for the audio VAE
    • context of 8192 sampled latent representations (about 17 minutes of audio)
    • predicts next encoded audio frame as a mixture of Gaussians
    • 15 bits of quantized information from the next token act as semantic scaffolding to steer the generation in a streamable manner
  • hertz-dev: 6.6 billion parameter transformer stack
    • primary checkpoint is partially initialized from weights of a pre-trained language model
      • question Which LM do they initialize with?
    • trained for a single epoch on 500B tokens with a 2048-token (4 minute) context length
    • “We’re also publishing an ablation of the language model initialization which is similarly trained on 500B tokens.”
    • hertz-dev has a theoretical latency of 65 ms and a real-world average latency of 120 ms on an RTX 4090
      • question What accounts for the difference in theoretical vs practical latency? I’ve heard these two figures reported in parallel quite frequently.
    • The authors claim this latency is roughly half that of any other publicly available model
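The codec figures above (mono 16kHz input, 8 Hz latents, ~1 kbps) imply a few numbers worth writing down. A quick sketch of the arithmetic; the 16-bit PCM baseline used for the compression ratio is my assumption, not something stated in the release:

```python
# Back-of-the-envelope arithmetic for the hertz-codec figures quoted above.
SAMPLE_RATE_HZ = 16_000   # mono speech input
LATENT_RATE_HZ = 8        # latent frames per second
BITRATE_BPS = 1_000       # ~1 kbps quoted bitrate

# Audio samples summarized by each latent frame.
samples_per_latent = SAMPLE_RATE_HZ // LATENT_RATE_HZ
print(samples_per_latent)        # 2000 samples per frame

# Bits carried by each latent frame at ~1 kbps.
bits_per_latent = BITRATE_BPS / LATENT_RATE_HZ
print(bits_per_latent)           # 125.0 bits per frame

# Compression vs. raw PCM (assumption: 16-bit samples).
raw_bps = SAMPLE_RATE_HZ * 16
print(raw_bps / BITRATE_BPS)     # 256.0x smaller than raw audio
```

The 8 Hz frame rate is what makes the "fewer tokens per second than any popular tokenizer" claim concrete: EnCodec-style codecs typically emit tens of frames per second, so an autoregressive model over hertz-codec latents has far fewer steps per second of audio.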
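hertz-vae predicts the next encoded audio frame as a mixture of Gaussians. A minimal NumPy sketch of sampling from such a predictive head; the component count, latent dimensionality, and diagonal-covariance parameterization here are all illustrative assumptions, not the published architecture:

```python
import numpy as np

def sample_mixture_of_gaussians(logits, means, log_stds, rng):
    """Sample one latent frame from a diagonal mixture of Gaussians.

    logits:   (K,)    unnormalized mixture weights
    means:    (K, D)  per-component means
    log_stds: (K, D)  per-component log standard deviations
    """
    # Softmax over mixture components (numerically stabilized).
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    k = rng.choice(len(weights), p=weights)     # pick a component
    eps = rng.standard_normal(means.shape[1])   # reparameterized noise
    return means[k] + np.exp(log_stds[k]) * eps # draw from N(mu_k, sigma_k^2)

rng = np.random.default_rng(0)
K, D = 8, 32  # made-up sizes for illustration
frame = sample_mixture_of_gaussians(
    rng.standard_normal(K), rng.standard_normal((K, D)),
    np.full((K, D), -1.0), rng)
print(frame.shape)  # (32,)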
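On the theoretical-vs-measured latency question flagged above: the note does not explain the gap, but one grounding observation is that at 8 Hz each latent frame spans 125 ms, so both quoted latencies (65 ms theoretical, 120 ms measured) fit within a single frame period, consistent with streaming generation one frame at a time. The arithmetic:

```python
# Frame-period arithmetic for the latency figures quoted above.
LATENT_RATE_HZ = 8
frame_period_ms = 1000 / LATENT_RATE_HZ
print(frame_period_ms)  # 125.0 ms between latent frames
# Quoted: 65 ms theoretical, 120 ms measured (RTX 4090); both < 125 ms.
```

Why the measured figure is nearly double the theoretical one is left open here; candidate contributors (kernel launch overhead, audio I/O buffering, scheduling) are speculation, not something the release states.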