🪴 Anil's Garden
Clippings
‘A place of joy’ why scientists are joining the rush to Bluesky
‘Monsters The Lyle and Erik Menendez Story’ Has One Great Episode, but Doesn’t Know What to Do With It Vanity Fair
🦅 Eagle 7B Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages
2.4 Scaling Laws AI Safety, Ethics, and Society Textbook
3 Montreal Forced Aligner Corpus Phonetics Tutorial
3. Data model — Python 3.12.4 documentation
3.10. Fundamental frequency (F0) — Introduction to Speech Processing
4.9. /usr/local : Local hierarchy
5 free monospaced fonts with coding ligatures Better Web Type
5.2 Model formulation and estimation Notes for Predictive Modeling
6.3 Rejection Sampling Advanced Statistical Computing
7.1 Background Advanced Statistical Computing
7.2 Metropolis-Hastings Advanced Statistical Computing
69 Best London AI Startups to Watch in 2024
90 Linux Commands frequently used by Linux Sysadmins
100M Token Context Windows — Magic
A Beginner's Guide to the proc File System in Linux
A complete guide to carbon offsetting Carbon offsetting The Guardian
A Comprehensive Guide to Building a Transformer Model with PyTorch DataCamp
A decoder-only foundation model for time-series forecasting – Google Research Blog
A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes
A Hitchhiker’s Guide to Speculative Decoding PyTorch
A Map of the Territory · Crafting Interpreters
A New Approach to the Data-Deletion Conundrum
A Practical Guide to fzf Shell Integration
A pyproject.toml Developer’s Cheat Sheet by Ricardo Mendes Better Programming
A Simplified Guide to Dynamic Programming
A Timeline of Large Transformer Models for Speech Jonathan Bgn
A Visual Guide to SSH Tunnels Local and Remote Port Forwarding
A Visual Guide to Vision Transformers MDTURP
about ICLR Blogposts 2024
About LAION
About Sofía Valdés
About - 152334H
About me - Wei-Ning Hsu (徐煒甯)
About workflows - GitHub Docs
Accelerating Generative AI with PyTorch II GPT, Fast PyTorch
Accelerating Generative AI with PyTorch IV Seamless M4T, fast PyTorch
Accelerating Generative AI with PyTorch Segment Anything, Fast PyTorch
Accuracy Benchmarking Speechmatics
ACL Policies for Review and Citation - Admin Wiki
Acoustic Word Embeddings for Low Resource Speech Processing with Herman Kamper - TWiML Talk 191 - YouTube
Adding Bluesky-powered comments to any website in five minutes Cory Zue
Advanced features — ocrmypdf 11.7.2 documentation
Advanced Iterators - Dive Into Python 3
Advanced Topics in Machine Learning
Advanced Topics in Machine Learning (COMP0083) UCL Module Catalogue - UCL – University College London
AI chip start-up Groq’s value rises to $2.8bn as it takes on Nvidia
AI Index Report 2024 – Artificial Intelligence Index
AI Is a Black Box. Anthropic Figured Out a Way to Look Inside WIRED
AI’s Walking Dog - Boston Review
AI2 Dolma 3 Trillion Token Open Corpus for LLMs AI2 Blog
Ai2 OpenScholar Scientific literature synthesis with retrieval-augmented language models Ai2
aiXplain Secures $6.5M pre-Series A to Universalize AI Agent Development - EIN Presswire
Aldo Lipani, PhD – CV
algorithm - Insertion Sort vs. Selection Sort - Stack Overflow
All Algorithms Sort Visualizations
All Nobel Prizes 2024 - NobelPrize.org
All Watched Over by Machines of Loving Grace (TV series) - Wikipedia
All Watched Over By Machines Of Loving Grace by Richard Brautigan - Famous poems, famous poets. - All Poetry
Allen Institute for AI - Wikipedia
An early peek at Dia, our second product A recruiting video - YouTube
An In-depth Guide to Benchmarking LLMs Symbl.ai
An Interactive Guide To The Fourier Transform – BetterExplained
An Introduction to LLM Benchmarking - Confident AI
An Introduction to the Mamba LLM Architecture A New Paradigm in Machine Learning DataCamp
An Intuitive Explanation of Connectionist Temporal Classification
An Intuitive Explanation of Connectionist Temporal Classification by Harald Scheidl Towards Data Science
An Opinionated Guide to ML Research
An Overview of Multi-Task Learning in Speech Recognition
Anaphora (linguistics) - Wikipedia
Anatomize Deep Learning with Information Theory Lil'Log
Anchoring Bias - The Decision Lab
André Martins was awarded an ERC Consolidator Grant to study artificial neural networks applied to natural language processing
Andrej Karpathy on X How is LLaMa.cpp possible great post by @finbarrtimbers https://t.co/yF43inlY87 llama.cpp surprised many people (myself included) with how quickly you can run large LLMs on small computers, e.g. 7B runs @ ~16 toks on a M
Andrej Karpathy on X Speculative execution for LLMs is an excellent inference-time optimization. It hinges on the following unintuitive observation forwarding an LLM on a single input token takes about as much time as forwarding an LLM o
Announcing Grok
Announcing Tower An Open Multilingual LLM for Translation-Related Tasks
ANOM – Darknet Diaries
Answer.AI - Lessons from history’s greatest R&D labs
Antisymmetric Matrix -- from Wolfram MathWorld
Antium - Wikipedia
Anyone can Access Deleted and Private Repository Data on GitHub ◆ Truffle Security Co.
Anyone looking for VMware Fusion Player for Mac? Found it r/vmware
Aperiodic Functions From Fourier Series to Fourier Transform
Apple AI researchers boast useful on-device model that ‘substantially outperforms’ GPT-4 - 9to5Mac
Apple’s MM1 AI Model Shows a Sleeping Giant Is Waking Up WIRED
Application binary interface - Wikipedia
apt update vs apt-get update Differences Explained!
Arbitrary-precision arithmetic - Wikipedia
Are remote workers more productive? That’s the wrong question. - Stack Overflow
Are there any licenses out there with LLM usage restrictions? r/opensource
argparse — Parser for command-line options, arguments and sub-commands — Python 3.12.6 documentation
Associative Learning - an overview ScienceDirect Topics
Async IO in Python A Complete Walkthrough – Real Python
Audio Language Models and Multimodal Architecture by Deepak Babu P R Mar, 2024 Medium
Audio sample rate converters comparison
Audio-based Machine Learning Model for Podcast Language Identification - Spotify Research Spotify Research
Autograd mechanics — PyTorch 2.5 documentation
AVSpeech Audio Visual Speech Dataset
Barack Obama on AI, free speech, and the future of the internet - The Verge
Base64 - Wikipedia
Baseline OpenAI end-to-end chat reference architecture - Azure Reference Architectures | Microsoft Learn
Baseline OpenAI end-to-end Chat Reference Architecture - InfoQ
bash - variable expansion in curly braces - Stack Overflow
Bash best practices cheat-sheets
Bash for NLP tutorial, advanced topics · John Hewitt
Bash for NLP tutorial, basics · John Hewitt
Bash Functions Linuxize
Bash Reference Manual
Basic Tutorial — Cython 3.0.11 documentation
BCEWithLogitsLoss — PyTorch 2.3 documentation
Beam Search Decoding in CTC-trained Neural Networks by Harald Scheidl Towards Data Science
BeEF - The Browser Exploitation Framework Project
Bellman equation - Wikipedia
Best practices for Dockerfile instructions Docker Docs
bfloat16 floating-point format - Wikipedia
BFloat16 The secret to high performance on Cloud TPUs Google Cloud Blog
BIG-bench/bigbench/benchmark_tasks/README.md at main · google/BIG-bench
Binary search tree - Wikipedia
Blog peterbloem.nl
Bluesky tops 20M users, narrowing gap with Instagram Threads TechCrunch
Boltzmann machine - Wikipedia
Branch Cut -- from Wolfram MathWorld
Browser-based vulnerabilities in web applications Infosec
Building an Audience Through Technical Writing Strategies and Mistakes – Answer.AI
BuildKit Docker Docs
Byte-Pair Encoding tokenization - Hugging Face NLP Course
C Preprocessor and Macros
C++ Best Practices Erik Rigtorp
C++ Coding Standards 101 Rules, Guidelines, and Best Practices
C++ Enumeration (enum)
C++ Introduction
C++ reference - cppreference.com
C++ Standard Library headers - cppreference.com
C++ String Splitting Utility - Claude
C++ tutorial for beginners ⚡️ - YouTube
C++ type system
Calculating the Cost of a Google Deepmind Paper - 152334H
Call for Proposals - ELLIS Units European Lab for Learning & Intelligent Systems
Campo Pequeno Shows & Events Schedule
Can Llama 3.2 Vision be used by researchers in Europe? r/LocalLLaMA
Can the audience dance to this - YouTube
Case study porting chardet to Python 3 - Dive Into Python 3
Categorical Deep Learning - Categorical Deep Learning
Category Theory (Stanford Encyclopedia of Philosophy)
CategoryCreative Commons-licensed films - Wikipedia
CEWithChunkedOutputLoss — torchtune 0.3 documentation
Champalimaud Foundation - Wikipedia
Chat Markup Language ChatML (Preview) - Azure
Chat Templates
Chat with Open Large Language Models
ChatGPT Defeated Doctors at Diagnosing Illness - The New York Times
chatml openai-python
Chemical Oscillations, Waves, and Turbulence SpringerLink
Chessprogramming wiki
Chinchilla’s Death
Chromium Docs - Chrome Security FAQ
Chromium Notes Ninja, a new build system
CIFAR – Convening extraordinary minds to address the most important questions facing science and humanity.
Clickjacking Attacks How to Detect and Prevent Ping Identity
Cloud GPUs The Best Servers, Services & Providers RANKED!
Codec SUPERB
CodeSearchNet by GitHub
Codestral Mamba Mistral AI Frontier AI in your hands
Collaborative filtering - Wikipedia
College students used Meta’s smart glasses to dox people in real time - The Verge
Command line interface - PyMuPDF 1.24.10 documentation
Command-R RAG at Production Scale
Communication Between Processes - Python Module of the Week
Computer Speed, CPU Cache, RAM Types - ChatGPT
Conda and the libmamba solver Roll-out plan 2023 conda.org
Conditional Probing and Usable Information · John Hewitt
Configuration - pytest documentation
Construct an envelope function for the acceptance-rejection method - The DO Loop
Continued musing on DPO – Kyunghyun Cho
Cookbook — ocrmypdf 16.5.0 documentation
Cookbook — ocrmypdf 16.5.1.dev1+g0e4cce2 documentation
Copy & Paste in Vim Vi
Copy-on-write - Wikipedia
Copyleft - Wikipedia
Coriolanus National Theatre
Coriolanus - Wikipedia
corte.si
CoVoST V2 Expanding the largest, most diverse multilingual speech-to-text translation dataset
Create and manage a repository
Cross-Attention in Transformer Architecture
CTC forced alignment API tutorial — Torchaudio 2.2.0.dev20240509 documentation
CUDA C++ Programming Guide
CUDA Cores vs. Tensor Cores – Which One is Right for Machine Learning
Cunningham's Law - Meta
Cython - an overview — Cython 3.0.11 documentation
[D] Mixed Precision Training Difference between BF16 and FP16 r/MachineLearning
DALL·E Mega - Training Journal dalle-mini – Weights & Biases
Daniel McNamee Champalimaud Foundation
Data files Configuration
Data Science Academic Engagement Programs Fellowships and Grants
Data Science Ph.D. Fellowship Bloomberg LP
DataCrunch wants to be Europe's first AI cloud hyperscaler — powered by renewable energy TechCrunch
Dataset features
DataStructures-Algorithms This repo contains links of resources, theory subjects content and DSA questions & their solution for interview preparation from different websites like geeksforgeeks, leetcode, etc.
Dates and Venues – ACL Rolling Review – An initiative of the Association for Computational Linguistics
Decoded GNU coreutils – MaiZure's Projects
Deep dive conda init and activate — conda 4.13.0 documentation
DEF CON 32 - Inside the FBI’s Secret Encrypted Phone Company ‘Anom’ - Joseph Cox - YouTube
Dependency Resolution - pip documentation v24.2
Derivation of the Least Squares Estimator for Beta in Matrix Notation Economic Theory Blog
Derivation of the Least Squares Estimator for Beta in Matrix Notation – Proof Nr. 1 Economic Theory Blog
Designing and Interpreting Probes · John Hewitt
DevOps observability What is it and how to implement it
Difference Between Makefile.am and Makefile.in Baeldung on Linux
Diffusion Meets Flow Matching
Disk Usage Guidelines for SARDINE Servers · deep-spin/wiki Wiki
docker buildx build Docker Docs
Docker Tips Install Package from a Private Git Repository - Siv Scripts
Dockerfile reference Docker Docs
Doing RAG Vector search is not enough
dotAI 2024 - Neil Zeghidour - Multimodal language models - YouTube
DOW 30
Download Llama
Download Llama - Meta Llama 2 Community License Agreement
Download Llama - Terms and Conditions - Meta Llama 3 Community License Agreement
Download Llama 3.2
DSA Hash Tables
DSA Stacks
Dynamic Sparsity in Machine Learning NeurIPS 2024 Tutorial
earth a global map of wind, weather, and ocean conditions
East West Street by Philippe Sands review – putting genocide into words Biography books The Guardian
Einops
Elaborative encoding - Wikipedia
ElevenLabs Releases New Voice AI Products and Raises $80M Series B
ELLIS Institutes Whitepaper European Lab for Learning & Intelligent Systems
Elon Musk has been in regular contact with Putin for two years, says report Elon Musk The Guardian
Embeddings - OpenAI API
Empathic Voice Interface (EVI) — Hume API
Encoding of speech in convolutional layers and the brain stem based on language experience Scientific Reports
End-to-End Workflow with torchtune — torchtune 0.3 documentation
Energy-based model - Wikipedia
Energy-based Models
Ensuring AI innovation in Europe Open letter to EU policymakers
European capital greenness evaluation
Evaluating speech features with the Minimal-Pair ABX task - Elicit Extraction
Evaluation in information retrieval
Everything about Distributed Training and Efficient Finetuning Sumanth's Personal Website
Exa Web API for AI
Explainable artificial intelligence - Wikipedia
Explained Multi-head Attention (Part 1)
Explaining Docker Image IDs · Adventures in a Cloud Native Landscape
Exploring Massively Multilingual, Massive Neural Machine Translation – Google Research Blog
Extracting Clear-Text Credentials Directly From Chromium’s Memory
facebook/wav2vec2-large-960h-lv60-self · Hugging Face
Factorial Funds Under The Hood How OpenAI's Sora Model Works
fairseq/examples/mms at main · facebookresearch/fairseq
fairseq/examples/wav2vec/README.md at main · facebookresearch/fairseq
FAQ LAION
Farfetch vs Revolut vs Capgemini - Compare career levels across companies with Levels.fyi
Fast and Expressive LLM Inference with RadixAttention and SGLang LMSYS Org
fast.ai – AdamW and Super-convergence is now the fastest way to train neural nets
fast.ai – fast.ai—Making neural nets uncool again
fd An Alternative to the Linux find Command Baeldung on Linux
Feature Visualization
Features Cursor - The AI-first Code Editor
Feedzai - Wikipedia
Festvox CMU_ARCTIC Databases
Finding Syntax with Structural Probes · John Hewitt
Fine-tuning How-to guides
FineWeb decanting the web for the finest text data at scale - a Hugging Face Space by HuggingFaceFW
Fixes to the ls . operation not permitted error message
Fixing DPO but I have a dinner reservation … – Kyunghyun Cho
Flash-Decoding for long-context inference PyTorch
FNV Hash
Forced alignment for multilingual data — Torchaudio 2.2.0.dev20240214 documentation
Forced Alignment with Wav2Vec2 — Torchaudio 0.10.0 documentation
Formats · tmux/tmux Wiki
From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease
From The Sky Swamps - YouTube
Full list of Booker Prize winners, shortlisted and longlisted authors and their books The Booker Prizes
Fused Softmax — Triton documentation
General Usage · deep-spin/wiki Wiki
Generalized Language Models Lil'Log
Geohashing – xkcd
Geomatics - Wikipedia
Get started with development Containers in Visual Studio Code
Get Started With Tmux - Sunaina Pai
Getting Started · tmux/tmux Wiki
Getting started with Vim The basics Opensource.com
GH Archive
Ghostscript
Git - Git Hooks
Git - Rerere
Git Large File Storage Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.
Git Reflog - How To Use Git Reflog W3Docs Online Git Tutorial
Git submodule Atlassian
GitHub Copilot in VS Code cheat sheet
GitHub Copilot overview
Glossary - HuggingFace Tokenizers
Glossary - Teach Me Audio
Glossary — Python 3.12.4 documentation
GLUE Benchmark
Gónô Tmutul Building A House Of Stories on Vimeo
Google AI PaLM 2 – Google AI
Google Announces 200M Parameter AI Forecasting Model TimesFM - InfoQ
Google Password Manager vs. 1Password r/1Password
google-research/tuning_playbook A playbook for systematically maximizing the performance of deep learning models.
GPT-4 architecture, datasets, costs and more leaked
GPT-4o mini advancing cost-efficient intelligence OpenAI
GPT-5 Everything You Need to Know - by Alberto Romero
Grokking Diffusion Models – Non_Interactive – Software & ML
Guide to Expectation Maximization Algorithm Built In
Gumbel Softmax Loss Function Guide + How to Implement it in PyTorch
Gumbel-Softmax - Niansong Zhang
Hamilton–Jacobi–Bellman equation - Wikipedia
Hamiltonian mechanics - Wikipedia
Handbook of Markov Chain Monte Carlo
Has anyone tried TalkPal AI? r/learnfrench
Heap Data Structure - GeeksforGeeks
Heap Data Structure Binary Heap, Time Complexity & Explanation
Hello OLMo A truly open LLM. As the world races to deploy AI models… by AI2 Feb, 2024 AI2 Blog
Hexagonal Grids
HfApi Client
Hidden Changes in GPT-4, Uncovered dmicz devblog
Highlights from Machine Translation and Multilinguality in December 2023 and January 2024 Jindřich’s blog
Highlights from Machine Translation and Multilinguality in February 2024 Jindřich’s blog
Hijacking Safetensors Conversion on Hugging Face HiddenLayer
Hilary Woods w Gabriel Ferrandini & Oliver Turvey ⟡ Tomé Silva - Galeria Zé dos Bois
Holistic Evaluation of Language Models (HELM)
Hollywood stars’ estates agree to the use of their voices with AI
Home - F6S Innovation
Homepage — Essentia 2.1-beta6-dev documentation
Horizon Europe - European Commission
How do I create a custom domain email? r/techsupport
How I learned to code in 3 months (and got several offers) - YouTube
How is LLaMa.cpp possible
How People Create and Destroy Value with Generative AI BCG
How Threads will integrate with the Fediverse – plasticbag.org
How to Become a Machine Learning Engineer Complete Career Path Glassdoor
How to Change Folder Color on Mac
How To Checkout Git Tags – devconnected
How to choose a career Prospects.ac.uk
How to Contribute to Open Source Open Source Guides
How To Cross-Compile Clang/LLVM using Clang/LLVM — LLVM 20.0.0git documentation
How to Disconnect After Running a nohup Command Over SSH Baeldung on Linux
How to prefetch data when processing with GPU - PyTorch Forums
How to save memory by fusing the optimizer step into the backward pass — PyTorch Tutorials 2.4.0+cu121 documentation
How to set up SSH Public-key Authentication to Connect to a Remote Server - SnapShooter Tutorials
How to Train Your Robot
How to use `git grep`
How to Use Command Line Arguments in a Bash Script Baeldung on Linux
How to Use PostgreSQL in Python
How to Use the less Command on Linux
HuBERT Explained by Miguel Aspis Dev Genius
HuBERT Speech representations for recognition & generation
Hugging Face Datasets Process
Hugging Face Evaluate - A quick tour
Hugging Face Transformers Weights & Biases Documentation
huggingface/diarizers
huggingface/distil-whisper Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
HUMAN VOICE FREQUENCY RANGE - SEA
Hunter Biden’s criminal conviction is good for nobody politically
I can now run a GPT-4 class model on my laptop
I-XRAY - Google Docs
Idea List - Ishan's Cafe
IEOR E4525 Machine Learning for OR & FE - Martin Haugh
Il cielo in una stanza (album) - Wikipedia
Illustrating Reinforcement Learning from Human Feedback (RLHF)
Imagen on Vertex AI (image Generative AI) overview Google Cloud
Imgur - Wikipedia
In “Triangle of Sadness,” the Crudity Is the Point The New Yorker
Inference Mode — PyTorch main documentation
InfiniBand - Wikipedia
Initializing New Word Embeddings for Pretrained Language Models · John Hewitt
Input Sequences - HuggingFace Tokenizers
Inside an AI Training for Doctors
Inside the U.S. Government-Bought Tool That Can Track Phones at Abortion Clinics
Insight. Selection Sort Vs Insertion Sort
Interfaces for Explaining Transformer Language Models – Jay Alammar – Visualizing machine learning one concept at a time.
Internal links - Obsidian Help
InternVL2
Introducing a foundational multimodal model for speech translation
Introducing Gemini Google’s most capable AI model yet
Introducing hertz-dev - Standard Intelligence
Introducing Jamba AI21's Groundbreaking SSM-Transformer Model
Introducing Llama 3.1 Our most capable models to date
Introducing Meta Llama 3 The most capable openly available LLM to date
Introducing the next generation of Claude Anthropic
Introducing the Sixth Cohort of Bloomberg Data Science Ph.D. Fellows (2023-2024) Bloomberg LP
Introducing Voicebox The first generative AI model for speech to generalize across tasks with state-of-the-art performance
Introducing Voicebox The Most Versatile AI for Speech Generation Meta
Introducing Whisper
Introducing Whisper OpenAI
Introduction W&B Weave
Introduction - The best open source AI powered answer engine.
Introduction to EM Gaussian Mixture Models
Introduction to fzf command Baeldung on Linux
Introduction to ggml
Introduction to gRPC gRPC
Introduction to Information Retrieval
Introduction to K-D Trees Baeldung on Computer Science
Introduction to the Binary Tree Data Structure Baeldung on Computer Science
Introduction to the Fourier Transform
Intuitive understanding of MFCCs. The mel frequency cepstral coefficients… by Emmanuel Deruty Medium
Is Google Password Manager Safe in 2024
Is your master’s degree useless
ISBN - Wikipedia
ISCA Archive
It Is Now Legal to Hack McFlurry Machines (and Medical Devices) to Fix Them
Italy blocks Gutenberg book publishing website OONI
Ivan Vulić
J. W. J. Williams - Wikipedia
Jack Parker-Holder
Jane Street Real-Time Market Data Forecasting Kaggle
Jobs are changing, so should education, Royal Society (2019) - MEI
Joe Biden abused a medieval power to pardon his son
JOREK non-linear MHD Code
Josh Meyer's Website
Jupyter Notebook Example
Just Be Bored, and You'll Level Up - YouTube
k-d tree - Wikipedia
K-PAX - Wikipedia
Kaldi Kaldi
Kaldi The build process (how Kaldi is compiled)
Katz's back-off model - Wikipedia
Keyboard shortcut to jump between words in iTerm2 - Upendar Gareri - Medium
Kyutai Open Sources Moshi A Real-Time Native Multimodal Foundation AI Model that can Listen and Speak - MarkTechPost
Laion coco 600M synthetic captions from Laion2B-en LAION
LAION2B Dataset
Lakh - Wikipedia
Language models for information retrieval
Lara Launch - The Power of Languages - Translated
Large Transformer Model Inference Optimization Lil'Log
Large-scale neurophysiology and single-cell profiling in human neuroscience Nature
Layer Normalization Explained Papers With Code
LayerNorm — PyTorch 2.4 documentation
Layoffs.fyi appears to show the tide is turning in the UK r/cscareerquestionsuk
Le Frecce - Wikipedia
Lee Kuan Yew - Wikipedia
Lernapparat - Machine Learning
Levels of Processing model - Wikipedia
Lexman Artificial Podcast
Libri-light
libsndfile
Linux Tutorial - Static, Shared Dynamic and Loadable Linux Libraries
Lisbon for Runners A Guide to Running in Lisbon - Portugalist
List 100 - Huyen Chip
List of Datasets for Automatic Speech Recognition (ASR) and Text To Speech Synthesis (TTS)
List of films in the public domain in the United States - Wikipedia
List Parquet files
Live Music In London, Karaoke Colours Nightclub
LLaMA 1 vs LLaMA 2 A Deep Dive into Meta’s LLMs
Llama 3 Model Cards and Prompt formats
Llama 3.2 Model Cards and Prompt formats
Llama 3.2 Acceptable Use Policy
Llama 3.2 Revolutionizing edge AI and vision with open, customizable models
LLaMA Now Goes Faster on CPUs
llama-models/models/llama3_2/MODEL_CARD.md at main · meta-llama/llama-models
LlamaIndex - LlamaIndex
llama/MODEL_CARD.md at main · meta-llama/llama
LLM Inference Series 3. KV caching explained by Pierre Lienhart Medium
LLM Inference Series 4. KV caching, a deeper look by Pierre Lienhart Medium
LLM Parameter Counting kipply's blog
LLM.int8() and Emergent Features — Tim Dettmers
LMSys Chatbot Arena Leaderboard - a Hugging Face Space by lmsys
Logistic regression - Wikipedia
LORA(Low Rank Adaptation) A Deeper Dive Rajan Ghimire
Losing it Film The Guardian
LxMLS 2024 - The 14th Lisbon Machine Learning Summer School
Lyapunov function - Wikipedia
Machine Learning Engineer Career Guide (2024) by Careervira Medium
Machine Learning LLMVLM Training and Engineering by by Stas Bekman
Machines of Caring Grace - Boston Review
Macros and its types in C - GeeksforGeeks
Main classes
Making Sense of Hexdump SUSE Communities
MAL software saved “Revolver” mix – The Daily Beatle
Mamba - a replacement for Transformers - YouTube
Managing ArXiv RSS Feeds in Emacs Chris Cundy
Mandatory Premarital HIV Testing Political Exploitation of the AIDS Epidemic — Tulane Law Review
Mass X-odus professionals desert Elon Musk’s network
Matching CUDA arch and CUDA gencode for various NVIDIA architectures - Arnon Shimoni
MathΣtral Mistral AI Frontier AI in your hands
matrices - How to rotate the positions of a matrix by 90 degrees - Mathematics Stack Exchange
Matrix decomposition - Wikipedia
Matrix decompositions and latent semantic indexing
mattdesl
Max-Heapify A Binary Tree Baeldung on Computer Science
Maximizing training throughput using PyTorch FSDP PyTorch
Maximum cut and related problems - Proofs, beliefs and algorithms through the lens of Sum of Squares
Maximum subarray problem - Wikipedia
Media type - Wikipedia
MediaPipe Holistic — Simultaneous Face, Hand and Pose Prediction, on Device – Google Research Blog
Medical Algorithms Are Failing Communities Of Color Health Affairs
Medusa Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads — Together AI
Meet Your New Assistant Meta AI, Built With Llama 3 Meta
Mel Frequency Cepstral Coefficient (MFCC) tutorial - Practical Cryptography
Memory-mapped files | .NET
Meta AI Research Topic - No Language Left Behind
Meta fires staff for abusing $25 meal credits
Meta is getting ready for post-quantum cryptography - Engineering at Meta
Meta lays off employees across multiple teams TechCrunch
Meta PyTorch Team 2024 H2 Roadmaps - PyTorch Developer Mailing List
Meta Rolls Out Multimodal Llama 3.2 — But Not in Europe - Slator
Meta won't bring future multimodal AI models to EU
meta-llama/Llama-3.2-11B-Vision · Why EXACTLY this model is not available in Europe
Microsoft joins OpenAI’s board with Sam Altman officially back as CEO - The Verge
MIME types (IANA media types) - HTTP MDN
Min Heap in Python - GeeksforGeeks
Minerva Solving Quantitative Reasoning Problems with Language Models
Mistral NeMo Mistral AI Frontier AI in your hands
MIT 6.S091 Introduction to Deep Reinforcement Learning (Deep RL) - YouTube
MIT CSAIL Spoken Language Systems Group - Publications
Mixtral of experts Mistral AI Open-weight models
Mixture of Experts Explained
MLOps Basics Week 3 Data Version Control - DVC – Raviraja's Blog
MLOps Basics Week 4 Model Packaging - ONNX – Raviraja's Blog
MLOps Basics Week 6 CICD - GitHub Actions – Raviraja's Blog
MLOps Basics Week 7 Container Registry - AWS ECR – Raviraja's Blog
MLOps guide
mmap — Memory-mapped file support — Python 3.12.7 documentation
Mnemonic - Wikipedia
Models and libraries - Meta AI
MosaicBERT Pretraining BERT from Scratch for $20 Databricks Blog
moshi.chat
Motivation & Vision - Thorsten Voice
Movie, Release date between 1993-01-23 and 2024-08-21, Number of votes at least 5000 (Sorted by User rating Descending)
Mozilla Foundation - Training Data for the Price of a Sandwich
Multimodal Mastery The Qwen Audio Foundation Models for Advanced Audio Understanding and Reasoning by Deepak Babu P R Medium
MultiNLI
Multiprocessing VS Threading VS AsyncIO in Python - Lei Mao's Log Book
MuST-C a multilingual corpus for speech translation by Mattia Di Gangi Machine Translation @ FBK Medium
Mutable vs Immutable Objects - ChatGPT
My deep learning rig – Non_Interactive – Software & ML
My French colleague used the word finitions. What could he be mistranslating? We speak Italian as well, so consider mistranslations from Italian too
Named entity recognition NLP-progress
Named entity recognition with Bert
Navigating the Challenges and Opportunities of Synthetic Voices
Nearly-Optimal Mergesorts Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs
Neural encoding of sound - Wikipedia
New embedding models and API updates OpenAI
New LLM Pre-training and Post-training Paradigms
New open source field of study classifier S2FOS AI2 Blog
Nick Bostrom - Wikipedia
NIGHTMARE ON ELM DRIVE Vanity Fair October 1990
Nike + Run Club Lisboa – NiT
Ninja, a small build system with a focus on speed
NLP’s word2vec Negative Sampling Explained Baeldung on Computer Science
NLTK Sample usage for wordnet
Noisy speech database for training speech enhancement algorithms and TTS models
Now and Then (Beatles song) - Wikipedia
NVIDIA CUDA Compiler Driver
NVIDIA RTX 3090 vs RTX A6000 Consumer vs. Professional
Nvidia-backed CoreWeave picks up $650 million credit line
NYU Computer Science Department
Okapi BM25 - Wikipedia
OLMo Open Language Model. A State-Of-The-Art, Truly Open LLM and… by AI2 Feb, 2024 AI2 Blog
OpenAI gets $4 billion revolving credit line on top of latest funding
OpenAI Introduced Chat Markup Language (ChatML) Based Input To Non-Chat Modes by Cobus Greyling Medium
OpenAI Platform
OpenAI raises at $157 billion valuation; Microsoft, Nvidia join round
OpenAI wants to make a walking, talking humanoid robot smarter Popular Science
OpenAI's board approached Anthropic CEO about top job and merger Reuters
openai/whisper-large-v3 · Hugging Face
Optimizing AI Inference at Character.AI
Optimizing builds with cache management Docker Docs
Opus (audio format) - Wikipedia
Orthonormal Basis -- from Wolfram MathWorld
OSINT Framework
ōtoro.net
Our next generation Meta Training and Inference Accelerator
Over 1.5 TB’s of Labeled Audio Datasets by Christopher Dossman Towards Data Science
P.862.2 Wideband extension to Recommendation P.862 for the assessment of wideband telephone networks and speech codecs
Paolo Sorrentino - Wikipedia
Paper review Hyena Hierarchy Towards Larger Convolutional Language Models by Andrew Lukyanenko Medium
Parallel Thread Execution - Wikipedia
Password Managers.
Patronus AI Introducing CopyrightCatcher, the first Copyright Detection API for LLMs
PEP 409 – Suppressing exception context peps.python.org
PEP 508 – Dependency specification for Python Software Packages peps.python.org
PEP 3104 – Access to Names in Outer Scopes peps.python.org
PEP 3134 – Exception Chaining and Embedded Tracebacks peps.python.org
For the immediate restoration of access to Project Gutenberg - AIB WEB
Percent-encoding - Wikipedia
Performance and Scalability How To Fit a Bigger Model and Train It Faster
PhD Students - InDeep - ILLC UvA
Phil Woodland Department of Engineering
Phone (phonetics) - Wikipedia
Phonetics - Wikipedia
Phonetics vs. Phonology
Picterra - Geospatial AI solutions for a sustainable future
Pigz – Compress And Decompress Files In Parallel In Linux
Play and Record Sound with Python — python-sounddevice, version 0.5.1
PortAudio - an Open-Source Cross-Platform Audio API
Postgres.app – the easiest way to get started with PostgreSQL on the Mac
Practical Cryptography
Pre-trained models for text-to-speech - Hugging Face Audio Course
Premarital medical examination - Wikipedia
Private use area (PUA) characters and End-user-defined characters (EUDCs)
Procedural Knowledge in Pretraining Drives LLM Reasoning Laura’s AI research blog
Proofs, beliefs and algorithms through the lens of Sum of Squares
Protocol Buffers Documentation
Publications Hosein Mohebbi
Purple Llama CyberSecEval A benchmark for evaluating the cybersecurity risks of large language models Research - AI at Meta
Pushing the frontiers of audio generation - Google DeepMind
pyannote/pyannote-audio Neural building blocks for speaker diarization speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
python - AdamW and Adam with weight decay - Stack Overflow
Python Dictionaries are Ordered now, but how…and why by Pavan Skipo Junior Dev Medium
Python Generated Code Guide Protocol Buffers Documentation
Python internals Arbitrary-precision integer implementation Artem Golubin
Python JSON load() and loads() for JSON Parsing
Python Linked List - GeeksforGeeks
PyTorch internals ezyang’s blog
Q-Former. The ability to seamlessly integrate and… by Abdulkader Helwan Dec, 2023 Medium
Q-Former. The ability to seamlessly integrate and… by Abdulkader Helwan Medium
rabbit failed to properly reset all keys emails can be sent from rabbit.tech domains
Rank (linear algebra) - Wikipedia
Ranking All 108 GNU/Linux Coreutils Commands - GNU Coreutils Tier List - YouTube
Read Spotify (SPOT) CEO Daniel Ek's full memo on latest layoffs
Redpajama-Data-v2 is Incredible r/LocalLLaMA
RedTeam Arena
Refactoring - Dive Into Python 3
Relevance in keyword search (BM25 scoring)
Repository limitations and recommendations
Republic of Venice - Wikipedia
Requirements File Format - pip documentation v23.3.1
Researchers Prove Rabbit AI Breach By Sending Email to Us as Admin
Retrieval Augmented Generation Streamlining the creation of intelligent natural language processing models
Reverse Engineering TicketMaster's Rotating Barcodes (SafeTix)
Revisiting Feature Prediction for Learning Visual Representations from Video Research - AI at Meta
Richard E. Bellman - Wikipedia
Richard Stallman - Wikipedia
Right to Left (R2L) Integer Tokenization
RLHF Reinforcement Learning from Human Feedback
rsrch space
RTX A6000 vs RTX 3090 Deep Learning Benchmarks Lambda
RWKV Open Source Development Blog Substack
Sadhika Malladi
Salary Benchmarking Carta
Sam Altman explains being fired and rehired by OpenAI - The Verge
Sampling for Text Generation
Sarmad Masud - Curtis Brown
Scaling ChatGPT Five Real-World Engineering Challenges
Scaling Monosemanticity Extracting Interpretable Features from Claude 3 Sonnet
Scalpers Reverse-Engineer Ticketmaster's 'Non-Transferrable' Tickets
Schema (psychology) - Wikipedia
Science’s genius complex Dirk Hovy
Scientists on Bluesky - Influential Members of the Science Community
Scientists on Bluesky - What is this
Scope (C++)
SeemlessM4T - Introducing a foundational multimodal model for speech translation
Selection (linguistics) - Wikipedia
Selection Sort Algorithm - GeeksforGeeks
Self-Supervised Representation Learning Lil'Log
Semantic Scholar - Academic Graph API
SemCor – sense-tagged English corpus Sketch Engine
SemEval-2007
SentencePiece Python binding structure - Codeium Chat - fcQnqWJZdoODeNAk78jYFqALIsPDcY20
SentencePiece README
Sentinel value - Wikipedia
Series Funding A, B, and C
Sha (Cyrillic) - Wikipedia
Share a dataset to the Hub
Shared space · deep-spin/wiki Wiki
ShareGPT lets you easily share your ChatGPT conversations TechCrunch
Sharing new research, models, and datasets from Meta FAIR
Sharpened Cosine Distance as an Alternative for Convolutions rpisoni.dev
Should I be leaving my MacBook plugged in at 100% to ensure battery health? r/macbookpro
Should I Open Source my Company
SHRDLU
Signal (IPC) - Wikipedia
Meaning of “Je disparais dans tes bras” by Christine and the Queens
Simplified Wrapper and Interface Generator
Slurm Usage Guidelines for SARDINE Servers · deep-spin/wiki Wiki
Slurm Workload Manager - Quality of Service (QOS)
Slurm Workload Manager - Quick Start User Guide
Sofía Valdés Flaunt Premiere “Little Did I Know”
Softmax function - Wikipedia
SolidGoldMagikarp (plus, prompt generation) — LessWrong
Something weird is happening with LLMs and chess
Sonal Sannigrahi
Sorting Algorithms Animations Toptal®
SoundStorm Efficient parallel audio generation – Google Research Blog
SoundStream An End-to-End Neural Audio Codec – Google Research Blog
Speculative Sampling Jay Mody
Speech disfluency - Wikipedia
SpeechBrain Open-Source Conversational AI for Everyone
Speeding Up gzip Compression Baeldung on Linux
Spoken Language Modeling - Task 4
Spoken Language Modeling - Task 4 - ZeroSpeech
Spotify’s AI Voice Translation Pilot Means Your Favorite Podcasters Might Be Heard in Your Native Language — Spotify
Stable Code 3B Coding on the Edge — Stability AI
Stable Diffusion 3 Research Paper — Stability AI
Stanford CRFM
Stanford CS236 Deep Generative Models I 2023 I Lecture 11 - Energy Based Models - YouTube
Startups to Follow
stas00/ml-engineering Machine Learning Engineering Open Book
stas00/the-art-of-debugging The Art of Debugging
State of startup compensation, H2 2023
Staying safe online with our updated Google Password Manager
Stephen Roberts' Home Page
STRIVER DSA SHEET DataStructures-Algorithms
struct — Interpret bytes as packed binary data — Python 3.12.7 documentation
StyleTTS2 – open-source Eleven-Labs-quality Text To Speech Hacker News
Submission Policy - Interspeech 2024
Submissions Transactions of the Association for Computational Linguistics
Summary of the tokenizers
SUPERB Benchmark
SUPERB Benchmark Leaderboard
SuperGLUE Benchmark
Supremum vs Maximum
SynthID - Google DeepMind
T-Shaped People and Academia
TaL Corpus - UltraSuite Repository
Talkpal Review Our Insider Tips and Verdict 2024
Taste the World How Our New Machine Translation Feature Transforms Your Ordering Experience by Ahmad Hamouda & Stefania Russo Medium The Glovo Tech Blog
Tearing Apart Google’s TPU 3.0 AI Coprocessor
Template (C++) - Wikipedia
Tensor Parallelism
Tensor Views — PyTorch 2.3 documentation
Terry Winograd - Wikipedia
Tesseract User Manual tessdoc
Text classification and Naive Bayes
Text-to-speech datasets - Hugging Face Audio Course
Textbooks - Ishan's Cafe
Textless NLP Generating expressive speech from raw audio
The 1% of scientific publishing Science AAAS
The AI Scientist Towards Fully Automated Open-Ended Scientific Discovery
The AI We Deserve - Boston Review
The Annotated Transformer
The AT Protocol Bluesky
The Basics - PyMuPDF 1.24.10 documentation
The Best GPUs for Deep Learning in 2023 — An In-depth Analysis
The best ways to help others with your career, compared
The Bitter Lesson
The Case for Free Online Books (FOBs) Experiences with Operating Systems Three Easy Pieces From A To RemZi
The Case for Pull Rebase
The Church-Turing Thesis (Stanford Encyclopedia of Philosophy)
The complete beginners guide to dynamic programming - Stack Overflow
The continuing rise in suspensions and exclusions - FFT Education Datalab
The Dark Net Jamie Bartlett Talks at Google - YouTube
The Discrete Cosine Transform in Action
The Editors Protecting Wikipedia from AI Hoaxes
The fastest and easiest way to install Ruby on a Mac in 2024
The first AI model based on Yann LeCun’s vision for more human-like AI
The Fraser Lab Method of Following the Scientific Literature
The Gumbel-Max Trick Explained. Softmax’s slicker sibling. by Leonard Tang The Startup Medium
The Gumbel-Softmax Distribution – Emma Benjaminson – Mechanical Engineering Graduate Student
The History of the Clothes Hanger
The Illustrated Retrieval Transformer – Jay Alammar – Visualizing machine learning one concept at a time.
The KV Cache Memory Usage in Transformers - YouTube
The MAESTRO Dataset
The MAESTRO Dataset and Wave2Midi2Wave
The Nine Books Sam Altman Recommends Everyone Should Read
The Power of Languages
The Problem with Reasoners
The Silence of the Lambs (novel) - Wikipedia
The Stanford Natural Language Inference (SNLI) Corpus
The subprocess Module Wrapping Programs With Python – Real Python
The Technology Behind BLOOM Training
The Textless NLP project
The tool where experts improve AI models
The Transformer Family Lil'Log
The Transformer Family Version 2.0 Lil'Log
The Ultimate Machine Learning Engineer Career Path for 2024
The Unreasonable Syntactic Expressivity of RNNs · John Hewitt
The Winograd Schema Challenge - Ernest Davis, Leora Morgenstern, and Charles Ortiz
The Zero Resource Speech Benchmark (series)
This Is the Data Facebook Gave Police to Prosecute a Teenager for Abortion
Thoughts on Google Password Manager r/Lastpass
Tiktokenizer
Timeline of the London Underground - Wikipedia
TIMIT Acoustic-Phonetic Continuous Speech Corpus - Linguistic Data Consortium
tmux shortcuts & cheatsheet
tmux(1) - Linux manual page
Tokenizer
Top 30 Cloud GPU Providers & the GPUs They Offer in 2024
torch.Tensor.view — PyTorch 2.3 documentation
torchaudio.pipelines — Torchaudio 2.2.0.dev20240418 documentation
torchtune Easy and Accessible Finetuning in Native PyTorch - Evan Smothers, Meta - YouTube
TorToiSe Architectural Design Doc – Non_Interactive – Software & ML
Toy Models of Superposition
Training a new tokenizer from an old one - Hugging Face NLP Course
Training and fine-tuning large language models - Borealis AI
Transformer Inference Arithmetic kipply's blog
Transformers from scratch peterbloem.nl
Transformers Illustrated!. I was greatly inspired by Jay Alammar’s… by Tamoghna Saha Medium
Transforming the future of music creation - Google DeepMind
Trie - Wikipedia
Truncation Sampling as Language Model Desmoothing · John Hewitt
UAX 44 Unicode Character Database
Ukraine is now struggling to survive, not to win
Understanding FAANG Leveling r/leetcode
Understanding GitHub Actions - GitHub Docs
Understanding GPU Memory 1 Visualizing All Allocations over Time PyTorch
Understanding Okapi BM25 A Guide to Modern Information Retrieval - Association of Data Scientists
Understanding SentencePiece (UnderStanding_SentencePiece) by Jacky Medium
Understanding the Python Traceback – Real Python
Unicode Glossary
Unified Transcription and Translation for Extended Reality UTTER
Unit testing - Dive Into Python 3
UnitedHealth uses AI model with 90% error rate to deny care, lawsuit alleges - Ars Technica
Universal Speech Model (USM) State-of-the-art speech AI for 100+ languages
Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate – Google Research Blog
unpaper-basic-concepts
Unsupervised speech-to-speech translation from monolingual data
Unsupervised speech-to-speech translation from monolingual data – Google Research Blog
Upload files to the Hub
Uploading datasets
URGENT Challenge
Use the Tools Available · C++ Best Practices
Using AI to find post-quantum cryptography’s vulnerabilities
Using Bluesky posts as blog comments
Using gzip and gunzip in Linux Baeldung on Linux
UTM Virtual machines for Mac
UTM parameters - Wikipedia
V-JEPA The next step toward advanced machine intelligence
Valentini Noisy Speech Database
VCTK
VDTTS Visually-Driven Text-To-Speech – Google Research Blog
Vector Basis -- from Wolfram MathWorld
Vector projection - Wikipedia
Vector space classification
Vector Space Projection -- from Wolfram MathWorld
Vector Space Span -- from Wolfram MathWorld
Versioning and formatting your Python code
Vicuna An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality LMSYS Org
Video generation models as world simulators
VizSeq
vLLM Easy, Fast, and Cheap LLM Serving with PagedAttention vLLM Blog
Volsci - Wikipedia
Wassermann Before Wedding Bells Premarital Examination Laws in the United States, 1937–1950 Social History of Medicine Oxford Academic
Wav2vec 2.0 Learning the structure of speech from raw audio
Wav2Vec2 - Model card - Hugging Face
What and Where Are the Memory Stack and Heap Baeldung on Computer Science
What Are Naïve Bayes Classifiers IBM
What can I do with SpeechBrain — SpeechBrain 0.5.0 documentation
What I learned from competing against a ConvNet on ImageNet
What I Wish I Knew When I Was Younger - YouTube
What Is a Machine Learning Engineer (+ How to Get Started) Coursera
What is a Makefile and how does it work Opensource.com
What Is BM25 (Best Match 25) Full Breakdown - Luigi's Box
What Is ChatGPT Doing … and Why Does It Work—Stephen Wolfram Writings
What is collaborative filtering - IBM
What is information retrieval IBM
What is pgvector, and How Can It Help You EDB
What is retrieval-augmented generation IBM Research Blog
What Is SwiGLU How to Implement It And Why Does it Work
What is the difference between FP16 and BF16? Here a good explanation for you by Furkan Gözükara - PhD Computer Engineer, SECourses Medium
What skills do employers want Prospects.ac.uk
What's the Best Language for App Development
who and w commands are not working - Red Hat Customer Portal
Why can TorToiSe be fine-tuned - 152334H
Why can't TorToiSe be fine-tuned - 152334H
Why does everyone sing it like THAT - YouTube
Why does the hostname on my Mac keep changing | Phind
Why I attack
Why Premature Optimization Is the Root of All Evil - Stackify
Why we want insurance executives dead - by Taylor Lorenz
Why your AI Code Completion tool needs to Fill in the Middle
Wideband audio - Wikipedia
Winograd schema challenge - Wikipedia
WinoGrande An Adversarial Winograd Schema Challenge at Scale
With 10x growth since 2023, Llama is the leading engine of AI innovation
With Bluesky, the social media echo chamber is back in vogue
wngloss(7WN) WordNet
Wolfram User Portal inc Activation Keys
Word Sense Disambiguation NLP-progress
WordNet
Workflow syntax for GitHub Actions - GitHub Docs
Write Pythonic and Clean Code With namedtuple – Real Python
Writing Nicholas Carlini
Writing Clean Shell Scripts • Dimitri Merejkowsky
WSTG - v4.1 OWASP Foundation
X-Frame-Options - HTTP MDN
XLM
XLM-R State-of-the-art cross-lingual understanding through self-supervision
XLS-R Self-supervised speech processing for 128 languages
XTTS v2 Notes - Machine Learns
XTTS-v1 technical notes - Machine Learns
YAP.
Yonatan Belinkov
Your Complete Guide to Spotify Wrapped, 2023 TIME
Zero-Shot Tokenizer Transfer for transferring LLMs to a new tokenizer without any training by SACHIN KUMAR Medium
Zero-Shot Translation with Google’s Multilingual Neural Machine Translation System – Google Research Blog
Home · ModelScope Community (魔搭社区)
Code
Acceleration
Bash
Build Systems
C
Compilers and Interpreters
Concurrency and Async
Conda
Copilot
CUDA
Data Structures
Databases
Debugging
Development Containers
Distributed and Multi-GPU Training
Git
GitHub Actions
Hugging Face
Linux and Unix
Make
Python
Python Best Practices
PyTorch
Questions
Security
Notes
AI in Society
Bluesky
Cinema
Coding Projects for Development
D3 Health Dashboard
Digital Garden
Electoral Systems
Flights
Graphs Spectral Clustering
Hidden Markov Models
Journocoders
Mental Anchors
Music
Music Understanding and Analysis
Privacy - Staying Secure Online
PyTorch's Transformer and Multi-Head Attention Implementation
Reading with a Motive vs Reading
Speech LLM-based Language Learning
Videography
Volts, Watts, Amps
Worth Following
YouTube Automated Uploader
Papers
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
[PLACEHOLDER] hertz-dev - Standard Intelligence
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
100,000 Podcasts: A Spoken English Document Corpus
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
A Brief Overview of Unsupervised Neural Speech Representation Learning
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
A Closer Look at Few-shot Classification
A Closer Look at Spatiotemporal Convolutions for Action Recognition
A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
A Comprehensive Survey of Machine Translation Approaches
A Cookbook of Self-Supervised Learning
A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories
A Diagnostic Study of Explainability Techniques for Text Classification
A Generalized EigenGame with Extensions to Multiview Representation Learning
A Kernel-Based View of Language Model Fine-Tuning
A Large-Scale Evaluation of Speech Foundation Models
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A method to convert neural signals into sound sequences
A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models
A Neural Algorithm of Artistic Style
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
A Primer on Causal Analysis
A Review of Deep Learning Techniques for Speech Processing
A Review of Sparse Expert Models in Deep Learning
A Simple Framework for Contrastive Learning of Visual Representations
A Suite for Acoustic Language Model Evaluation
A Survey of Large Language Models
A Survey of Mamba
A Survey of Visual Transformers
A Survey on Evaluation of Large Language Models
A Survey on In-context Learning
A Survey on Language Models for Code
A Survey on LLM-as-a-Judge
A Survey on Neural Speech Synthesis
A Survey on Subgraph Counting: Concepts, Algorithms and Applications to Network Motifs and Graphlets
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning
A unified architecture for natural language processing: deep neural networks with multitask learning
A Universal Law of Robustness via Isoperimetry
A Watermark for Large Language Models
Accelerating Large Language Model Decoding with Speculative Sampling
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need
Adam-mini: Use Fewer Learning Rates To Gain More
Adam: A Method for Stochastic Optimization
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Adapting Language Models to Compress Contexts
Adaptive Machine Translation with Large Language Models
Adaptive Prototype Learning and Allocation for Few-Shot Segmentation
Adaptive Semiparametric Language Models
Adaptively Sparse Transformers
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data
AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
AdaSpeech: Adaptive Text to Speech for Custom Voice
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
Advancing the State of the Art in Open Domain Dialog Systems through the Alexa Prize
Adversarial Attacks and Defences: A Survey
Adversarial Feature Learning
Adversarial NLI: A New Benchmark for Natural Language Understanding
AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages
Agent Skill Acquisition for Large Language Models via CycleQD
AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline
ALBA : Reinforcement Learning for Video Object Segmentation
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
Alice's Adventures in a Differentiable Wonderland -- Volume I, A Tour of the Land
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Aligning Speech to Languages to Enhance Code-switching Speech Recognition
Alpaca: A Strong, Replicable Instruction-Following Model
An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition
An Analysis of Energy Consumption and Carbon Footprints of Cryptocurrencies and Possible Solutions
An Attention Free Transformer
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
An Empirical Exploration of Curriculum Learning for Neural Machine Translation
An Empirical Study of Mamba-based Language Models
An Empirical Study of Translation Hypothesis Ensembling with Large Language Models
An Emulator for Fine-Tuning Large Language Models using Small Language Models
An Explanation of In-context Learning as Implicit Bayesian Inference
An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
An Information-Theoretic Analysis of Self-supervised Discrete Representations of Speech
An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition
An introduction to graph theory
Analyzing Context Contributions in LLM-based Machine Translation
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Apple Intelligence Foundation Language Models
Architectures of Topological Deep Learning: A Survey on Topological Neural Networks
Are All Good Word Vector Spaces Isomorphic?
Are discrete units necessary for Spoken Language Modeling?
Arithmetic coding for data compression
Artificial Kuramoto Oscillatory Neurons
ASIF: Coupled Data Turns Unimodal Models to Multimodal Without Training
Associative Embedding: End-to-End Learning for Joint Detection and Grouping
Attention as a Guide for Simultaneous Speech Translation
Attention Is All You Need
Attention-Based Models for Speech Recognition
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
AudioGen: Textually Guided Audio Generation
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
AudioLM: a Language Modeling Approach to Audio Generation
AudioPaLM: A Large Language Model That Can Speak and Listen
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling
Augmented Language Models: a Survey
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
Auto-Encoding Variational Bayes
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Bag of Tricks for Efficient Text Classification
Balancing, Regression, Difference-In-Differences and Synthetic Control Methods: A Synthesis
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Bayesian Measures of Model Complexity and Fit
Benchmarking Attacks on Learning with Errors
BERT Learns to Teach: Knowledge Distillation with Meta Learning
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERTScore: Evaluating Text Generation with BERT
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
Better & Faster Large Language Models via Multi-token Prediction
Better Instruction-Following Through Minimum Bayes Risk
Better speech synthesis through scaling
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Beyond Left and Right: The Role of System Trust in COVID-19 Attitudes and Behaviors
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
Big Bird: Transformers for Longer Sequences
Big Self-Supervised Models are Strong Semi-Supervised Learners
Billion-scale semi-supervised learning for image classification
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Blockwise Parallel Decoding for Deep Autoregressive Models
Boosting Distributed Training Performance of the Unpadded BERT Model
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Bootstrap your own latent: A new approach to self-supervised Learning
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition
Building Machine Translation Systems for the Next Thousand Languages
ByT5 model for massively multilingual grapheme-to-phoneme conversion
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Can Automatic Metrics Assess High-Quality Translations?
Can language models learn from explanations in context?
Can Large Language Models Reason and Plan?
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Can Whisper Perform Speech-Based In-Context Learning?
Canonical Capsules: Self-Supervised Capsules in Canonical Pose
Careless Whisper: Speech-to-Text Hallucination Harms
Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?
Categorical Reparameterization with Gumbel-Softmax
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
CDXFormer: Boosting Remote Sensing Change Detection with Extended Long Short-Term Memory
Cem Mil Podcasts: A Spoken Portuguese Document Corpus For Multi-modal, Multi-lingual and Multi-Dialect Information Access Research
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting for Speech Translation
Character-Aware Neural Language Models
ChatMusician: Understanding and Generating Music Intrinsically with LLM
ChipNeMo: Domain-Adapted LLMs for Chip Design
CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition
Clotho: An Audio Captioning Dataset
CMU's IWSLT 2024 Simultaneous Speech Translation System
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
CodeRAG-Bench: Can Retrieval Augment Code Generation?
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task
COMET: A Neural Framework for MT Evaluation
CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task
Common Voice: A Massively-Multilingual Speech Corpus
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Compact Speech Translation Models via Discrete Speech Units Pretraining
Comparative layer-wise analysis of self-supervised speech models
Comparing Discrete and Continuous Space LLMs for Speech Recognition
Competence-based Curriculum Learning for Neural Machine Translation
Computational Optimal Transport
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Condita: A state machine like architecture for multimodal task bots
Conditional Image Generation with PixelCNN Decoders
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Confidence-Aware Scheduled Sampling for Neural Machine Translation
Confident Adaptive Language Modeling
Conformal Prediction for Natural Language Processing: A Survey
Conformer: Convolution-augmented Transformer for Speech Recognition
Connecting Speech Encoder and Large Language Model for ASR
Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
ConSeC: Word Sense Disambiguation as Continuous Sense Comprehension
Context Encoders: Feature Learning by Inpainting
Context Encoding for Semantic Segmentation
Context-aware Neural Machine Translation for English-Japanese Business Scene Dialogues
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
Continuous Speech Tokenizer in Text To Speech
Contrastive language and vision learning of general fashion concepts
Contrastive Language-Image Pre-training for the Italian Language
Contrastive Learning with Hard Negative Samples
Contrastive Multiview Coding
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Contrastive Representation Learning: A Framework and Review
Controllable Speech Representation Learning Via Voice Conversion and AIC Loss
Controlling Neural Networks with Rule Representations
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought
Counterfactual harm
CoVoST 2 and Massively Multilingual Speech-to-Text Translation
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
CroissantLLM: A Truly Bilingual French-English Language Model
Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
Cross-lingual Language Model Pretraining
Cryptanalytic Extraction of Neural Network Models
CTC-based Compression for Direct Speech Translation
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Current Limitations of Language Models: What You Need is Retrieval
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
DASB - Discrete Audio and Speech Benchmark
Data Augmentation Approaches in Natural Language Processing: A Survey
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
Data Selection for Language Models via Importance Resampling
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Decoding speech perception from non-invasive brain recordings
Decoupled Weight Decay Regularization
Deep Biaffine Attention for Neural Dependency Parsing
Deep Clustering for Unsupervised Learning of Visual Features
Deep contextualized word representations
Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
Deep Learning with Differential Privacy
Deep Mask Memory Network with Semantic Dependency and Context Moment for Aspect Level Sentiment Classification
Deep Neural Networks and Tabular Data: A Survey
Deep reinforcement learning from human preferences
Deep Residual Learning for Image Recognition
Deep Voice 2: Multi-Speaker Neural Text-to-Speech
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Deep Voice: Real-time Neural Text-to-Speech
DeepGaze II: Reading fixations from deep features trained on object recognition
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation
DeepSpace: Dynamic Spatial and Source Cue Based Source Separation for Dialog Enhancement
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
DEMix Layers: Disentangling Domains for Modular Language Modeling
Dense Associative Memory for Pattern Recognition
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
Depthwise Convolution is All You Need for Learning Multiple Visual Domains
Describing Multimedia Content using Attention-based Encoder--Decoder Networks
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Did Translation Models Get More Robust Without Anyone Even Noticing?
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Direct speech-to-speech translation with a sequence-to-sequence model
Direct speech-to-speech translation with discrete units
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
Discrete Latent Structure in Neural Networks
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
Disentangling Textual and Acoustic Features of Neural Speech Representations
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
Distilling the Knowledge in a Neural Network
Distributed Representations of Words and Phrases and their Compositionality
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
DMDSpeech: Distilled Diffusion Model Surpassing The Teacher in Zero-shot Speech Synthesis via Direct Metric Optimization
dMel: Speech Tokenization made Simple
DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors
DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Do Context-Aware Translation Models Pay the Right Attention?
DOCE: Finding the Sweet Spot for Execution-Based Code Generation
Does Simultaneous Speech Translation need Simultaneous Models?
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation
Don't Decay the Learning Rate, Increase the Batch Size
Don't Discard Fixed-Window Audio Segmentation in Speech-to-Text Translation
Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
DRAW: A Recurrent Neural Network For Image Generation
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
DTrOCR: Decoder-only Transformer for Optical Character Recognition
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Ecco: An Open Source Library for the Explainability of Transformer Language Models
Effective Approaches to Attention-based Neural Machine Translation
Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform
Efficient Estimation of Word Representations in Vector Space
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Efficient Methods for Natural Language Processing: A Survey
Efficient Neural Audio Synthesis
Efficient Parallel Audio Generation using Group Masked Language Modeling
Efficient Pre-training for Localized Instruction Generation of Videos
Efficient Representation Learning via Adaptive Context Pooling
Efficient softmax approximation for GPUs
Efficient Stagewise Pretraining via Progressive Subnetworks
Efficient Tool Use with Chain-of-Abstraction Reasoning
Efficient Training of Language Models to Fill in the Middle
Efficient Transformers: A Survey
Efficient Visual Pretraining with Contrastive Detection
Efficiently Programming Large Language Models using SGLang
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Elucidating the Design Space of Diffusion-Based Generative Models
Embarrassingly Easy Document-Level MT Metrics: How to Convert Any Pretrained Metric Into a Document-Level Metric
Emergent and Predictable Memorization in Large Language Models
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
Emerging Properties in Self-Supervised Vision Transformers
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
EMMeTT: Efficient Multimodal Machine Translation Training
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
EmphAssess: a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Encoding of speech in convolutional layers and the brain stem based on language experience
Encoding sound in the cochlea: from receptor potential to afferent discharge
End-to-End Simultaneous Speech Translation with Differentiable Segmentation
End-to-End Speech Recognition: A Survey
End-to-End Speech-to-Text Translation: A Survey
Energy and Policy Considerations for Deep Learning in NLP
Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
Enriching Word Vectors with Subword Information
Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
ESPnet-ST: All-in-One Speech Translation Toolkit
Estimating the Completeness of Discrete Speech Units
Estimating Training Data Influence by Tracing Gradient Descent
Estimation of Non-Normalized Statistical Models by Score Matching
ETC: Encoding Long and Structured Inputs in Transformers
EuroLLM: Multilingual Language Models for Europe
Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization
Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates
Evaluating Frontier Models for Dangerous Capabilities
Evaluating Language Model Agency through Negotiations
Evaluating language models as risk scores
Evaluating Large Language Models Trained on Code
Evaluation data contamination in LLMs: how do we measure it and (when) does it matter?
Evolution through Large Models
Explainability Via Causal Self-Talk
Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
Exploring Simple Siamese Representation Learning
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Exploring the Limits of Language Modeling
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Extracting Training Data from Diffusion Models
Extracting Training Data from Large Language Models
Extraction of Salient Sentences from Labelled Documents
Extreme Masking for Learning Instance and Distributed Visual Representations
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Falcon2-11B Technical Report
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Fast and Vectorizable Alternative to Binary Search in O(1) Applicable to a Wide Domain of Sorted Arrays of Floating Point Numbers
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Fast Inference from Transformers via Speculative Decoding
Fast Model Editing at Scale
Fast Transformer Decoding: One Write-Head is All You Need
FastPitch: Parallel Text-to-speech with Pitch Prediction
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
FastSpeech: Fast, Robust and Controllable Text to Speech
Fauno: The Italian Large Language Model that will leave you senza parole!
Federated Learning: Strategies for Improving Communication Efficiency
FEVER: a large-scale dataset for Fact Extraction and VERification
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Fine-tuning Language Models for Factuality
Finetuned Language Models Are Zero-Shot Learners
Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
Flamingo: a Visual Language Model for Few-Shot Learning
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Flow Matching for Generative Modeling
FNet: Mixing Tokens with Fourier Transforms
Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
From Handcrafted Features to LLMs: A Brief Survey for Machine Translation Quality Estimation
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification
From Sparse to Soft Mixtures of Experts
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Fully Character-Level Neural Machine Translation without Explicit Segmentation
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
GEIC: Universal and Multilingual Named Entity Recognition with Large Language Models
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Gemini: A Family of Highly Capable Multimodal Models
Gemma 2: Improving Open Language Models at a Practical Size
Gemma: Open Models Based on Gemini Research and Technology
Gender Bias in Contextualized Word Embeddings
Gender Bias in Coreference Resolution
Generalization in diffusion models arises from geometry-adaptive harmonic representations
Generalization through Memorization: Nearest Neighbor Language Models
Generating Diverse High-Fidelity Images with VQ-VAE-2
Generating Long Sequences with Sparse Transformers
Generative Models: What do they know? Do they know things? Let's find out!
Generative Spoken Dialogue Language Modeling
Generative Spoken Language Modeling from Raw Audio
Generator Matching: Generative modeling with arbitrary Markov processes
Genie: Generative Interactive Environments
Geographic Adaptation of Pretrained Language Models
Geographic and Geopolitical Biases of Language Models
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
GFlowNet Foundations
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Git Re-Basin: Merging Models modulo Permutation Symmetries
Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Globally Normalized Transition-Based Neural Networks
GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Glow: Generative Flow with Invertible 1x1 Convolutions
GLU Variants Improve Transformer
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Good Night at 4 pm?! Time Expressions in Different Cultures
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Gorilla: Large Language Model Connected with Massive APIs
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Gradient Descent Converges to Minimizers
Grandmaster-Level Chess Without Search
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Group Normalization
HGRN2: Gated Linear RNNs with State Expansion
Hi-Fi Multi-Speaker English TTS Dataset
Hierarchical nucleation in deep neural networks
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
HiFi-GAN-2: Studio-Quality Speech Enhancement via Generative Adversarial Networks Conditioned on Acoustic Features
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
High Fidelity Neural Audio Compression
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
Highway Networks
Holistic Evaluation of Language Models
Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval
Houdini: Fooling Deep Structured Prediction Models
How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
How Does Batch Normalization Help Optimization?
How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
How to represent part-whole hierarchies in a neural network
How to Train Your Energy-Based Models
How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis
How well can VMEC predict the initial saturation of external kink modes in near circular tokamaks and $l=2$ stellarators?
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Hyena Hierarchy: Towards Larger Convolutional Language Models
HyperAttention: Long-context Attention in Near-Linear Time
Hyperbolic Active Learning for Semantic Segmentation under Domain Shift
Hypergraph Neural Networks through the Lens of Message Passing: A Common Perspective to Homophily and Architecture Design
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
ILLUME: Rationalizing Vision-Language Models through Human Interactions
ImageBind: One Embedding Space To Bind Them All
ImageNet Large Scale Visual Recognition Challenge
Imitation Learning as $f$-Divergence Minimization
Implicit Generation and Generalization in Energy-Based Models
Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
Improved Baselines with Momentum Contrastive Learning
Improved Baselines with Visual Instruction Tuning
Improving language models by retrieving from trillions of tokens
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
Improving Neural Language Models with a Continuous Cache
Improving Neural Machine Translation Models with Monolingual Data
Improving Personalized Explanation Generation through Visualization
Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
Improving Zero-Shot Translation by Disentangling Positional Information
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Inferring and Executing Programs for Visual Reasoning
InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization
Information Theory and Statistics: an overview
Information-Theoretic Probing for Linguistic Structure
Inseq: An Interpretability Toolkit for Sequence Generation Models
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Instruction Tuning for Large Language Models: A Survey
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Intrinsic dimension of data representations in deep neural networks
Intrusive And Non-Intrusive Perceptual Speech Quality Assessment Using A Convolutional Neural Network
Intuitive Multilingual Audio-Visual Speech Recognition
Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting
Investigating Backtranslation in Neural Machine Translation
Investigating Decoder-only Large Language Models for Speech-to-text Translation
Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Is Context Helpful for Chat Translation Evaluation?
Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
Is Training Data Quality or Quantity More Impactful to Small Language Model Performance?
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
ITU-T coders for wideband, superwideband, and fullband speech communication [Series Editorial]
Jamba: A Hybrid Transformer-Mamba Language Model
Johnson-Lindenstrauss Lemma, Linear and Nonlinear Random Projections, Random Fourier Features, and Random Kitchen Sinks: Tutorial and Survey
Joint-task Self-supervised Learning for Temporal Correspondence
JOREK3D: An extension of the JOREK nonlinear MHD code to stellarators
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
KAN: Kolmogorov-Arnold Networks
KIT's Multilingual Speech Translation System for IWSLT 2023
Knowledge Conflicts for LLMs: A Survey
Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
LAION-5B: An open large-scale dataset for training next generation image-text models
Language agents achieve superhuman synthesis of scientific knowledge
Language Contamination Helps Explain the Cross-lingual Capabilities of English Pretrained Models
Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
Language Model Can Listen While Speaking
Language Modeling with Gated Convolutional Networks
Language Models are Few-Shot Learners
Language Models are Multilingual Chain-of-Thought Reasoners
Language Models are Realistic Tabular Data Generators
Language Models Represent Space and Time
Language Models: A Guide for the Perplexed
Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Large Associative Memory Problem in Neurobiology and Machine Learning
Large Batch Training of Convolutional Networks
Large Language Model Influence on Diagnostic Reasoning A Randomized Clinical Trial
Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences
Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners
Large Language Models Are State-of-the-Art Evaluators of Translation Quality
Large Language Models for Compiler Optimization
Large Language Models for Data Annotation: A Survey
Large-Scale Automatic Audiobook Creation
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification
Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling
Layer Normalization
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Learned feature representations are biased by complexity, learning order, position, and more
Learning a similarity metric discriminatively, with application to face verification
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Learning Correspondence from the Cycle-Consistency of Time
Learning Differentially Private Recurrent Language Models
Learning Filterbanks from Raw Speech for Phone Recognition
Learning Interactive Real-World Simulators
Learning Language-Specific Layers for Multilingual Machine Translation
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Learning Source Disentanglement in Neural Audio Codec
Learning the Predictability of the Future
Learning to Compress Prompts with Gist Tokens
Learning to Generate Reviews and Discovering Sentiment
Learning to Merge Word Senses
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Learning to summarize from human feedback
Learning Transferable Visual Models From Natural Language Supervision
Learning with Fenchel-Young Losses
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
Leveraging Gloss Knowledge in Neural Word Sense Disambiguation by Hierarchical Co-Attention
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation
Libri-Light: A Benchmark for ASR with Limited or No Supervision
Librispeech: An ASR corpus based on public domain audio books
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Lifting the Curse of Multilinguality by Pre-training Modular Transformers
Lightweight and Efficient Spoken Language Identification of Long-form Audio
Lightweight Audio Segmentation for Long-form Speech Translation
Linear Connectivity Reveals Generalization Strategies
Linear-time Minimum Bayes Risk Decoding with Reference Aggregation
Linformer: Self-Attention with Linear Complexity
Linguini: A benchmark for language-agnostic linguistic reasoning
Listen, Think, and Understand
LiT: Zero-Shot Transfer with Locked-image text Tuning
Llama 2: Open Foundation and Fine-Tuned Chat Models
LLaMA: Open and Efficient Foundation Language Models
LLaSM: Large Language and Speech Model
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models
Localizing Objects with Self-Supervised Transformers and no Labels
Locating and Editing Factual Associations in GPT
Logits of API-Protected LLMs Leak Proprietary Information
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
Long-Context Language Modeling with Parallel Context Encoding
Longformer: The Long-Document Transformer
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation
LoRA: Low-Rank Adaptation of Large Language Models
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Lost in the Middle: How Language Models Use Long Contexts
LRS3-TED: a large-scale dataset for visual speech recognition
Lumiere: A Space-Time Diffusion Model for Video Generation
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Making AI Forget You: Data Deletion in Machine Learning
Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game
Making Pre-trained Language Models Better Few-shot Learners
Mamba in Speech: Towards an Alternative to Self-Attention
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Many-Shot In-Context Learning
Marian: Fast Neural Machine Translation in C++
Mask-Predict: Parallel Decoding of Conditional Masked Language Models
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders that Listen
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
MaskGIT: Masked Generative Image Transformer
MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages
Massively Multilingual Neural Grapheme-to-Phoneme Conversion
Massively Multilingual Neural Machine Translation
Matryoshka Diffusion Models
Matryoshka Representation Learning
MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information
MAWPS: A Math Word Problem Repository
Measuring and Increasing Context Usage in Context-Aware Machine Translation
Measuring Massive Multitask Language Understanding
Measuring the Mixing of Contextual Information in the Transformer
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Membership Inference Attacks on Machine Learning: A Survey
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory
Meta-Learning Online Adaptation of Language Models
Meta-Transformer: A Unified Framework for Multimodal Learning
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
MEXMA: Token-level objectives improve sentence representations
mHuBERT-147: A Compact Multilingual HuBERT Model
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Minimum Bayes-Risk Decoding for Statistical Machine Translation
MIO: A Foundation Model on Multimodal Tokens
Mistral 7B
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings
Mixtral of Experts
MLP-Mixer: An all-MLP Architecture for Vision
MLS: A Large-Scale Multilingual Dataset for Speech Research
MM-LLMs: Recent Advances in MultiModal Large Language Models
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Model Editing with Canonical Examples
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation
Modelling low-resource accents without accent-specific TTS frontend
Modelling of saturated external MHD instabilities in tokamaks: a comparison of 3D free boundary equilibria and nonlinear stability calculations
Modular Deep Learning
Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference
ModuleFormer: Modularity Emerges from Mixture-of-Experts
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Momentum Contrast for Unsupervised Visual Representation Learning
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
Moshi: a speech-text foundation model for real-time dialogue
MOSNet: Deep Learning based Objective Assessment for Voice Conversion
MouSi: Poly-Visual-Expert Vision-Language Models
Movie Gen: A Cast of Media Foundation Models
MovieNet: A Holistic Dataset for Movie Understanding
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
mSLAM: Massively multilingual joint pre-training for speech and text
MuLan: A Joint Embedding of Music Audio and Natural Language
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Multi-Scale Context Aggregation by Dilated Convolutions
Multi-sense embeddings through a word sense disambiguation process
Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
Multi-ToM: Evaluating Multilingual Theory of Mind Capabilities in Large Language Models
Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts
Multilingual Pretraining Using a Large Corpus Machine-Translated from a Single Source Language
Multilingual Speech Models for Automatic Speech Recognition Exhibit Gender Performance Gaps
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Multimodal Few-Shot Learning with Frozen Language Models
Multimodal Neural Databases
Multitask Prompted Training Enables Zero-Shot Task Generalization
MusicLM: Generating Music From Text
MuST-C: A multilingual corpus for end-to-end speech translation
MuST-C: a Multilingual Speech Translation Corpus
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Natural Language Processing (almost) from Scratch
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Neural Collaborative Filtering
Neural Combinatorial Optimization with Reinforcement Learning
Neural Discrete Representation Learning
Neural Grapheme-to-Phoneme Conversion with Pre-trained Grapheme Models
Neural Language Model Pruning for Automatic Speech Recognition
Neural Machine Translation by Jointly Learning to Align and Translate
Neural Machine Translation of Rare Words with Subword Units
Neural Machine Translation: A Review and Survey
Neural Machine Translation: Challenges, Progress and Future
Neural Network Acceptability Judgments
Neural Networks are Decision Trees
Neural Sequence Learning Models for Word Sense Disambiguation
Neural Speech Synthesis with Transformer Network
Neural Voice Cloning with a Few Samples
NeuralDEM - Real-time Simulation of Industrial Particulate Flows
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
No Language Left Behind: Scaling Human-Centered Machine Translation
Noise-contrastive estimation: A new estimation principle for unnormalized statistical models
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
Non-Autoregressive Neural Machine Translation
Non-Exchangeable Conformal Language Generation with Nearest Neighbors
Non-Exchangeable Conformal Risk Control
Non-intrusive Speech Quality Assessment Using Neural Networks
Nonlinear MHD modeling of soft $\beta$ limits in W7-AS
Nonlinear MHD simulations of external kinks in quasi-axisymmetric stellarators using an axisymmetric external rotational transform approximation
Normalization Techniques in Training DNNs: Methodology, Analysis and Application
Not Just a Black Box: Learning Important Features Through Propagating Activation Differences
Nougat: Neural Optical Understanding for Academic Documents
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers
NVLM: Open Frontier-Class Multimodal LLMs
OLMo: Accelerating the Science of Language Models
On Information and Sufficiency
On Instruction-Finetuning Neural Machine Translation Models
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
On Layer Normalization in the Transformer Architecture
On the cyclic nature of perception in vision versus audition
On the difficulty of training Recurrent Neural Networks
On the Implications of Verbose LLM Outputs: A Case Study in Translation Evaluation
On the Integration of Optical Flow and Action Recognition
On the Limitations of Compute Thresholds as a Governance Strategy
On the Measure of Intelligence
On the Number of Linear Regions of Deep Neural Networks
On the Opportunities and Risks of Foundation Models
On the Representation Collapse of Sparse Mixture of Experts
One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models
One TTS Alignment To Rule Them All
One Wide Feedforward is All You Need
One-To-Many Multilingual End-to-end Speech Translation
OneLLM: One Framework to Align All Modalities with Language
Only Time Can Tell: Discovering Temporal Data for Temporal Modeling
OpenAssistant Conversations -- Democratizing Large Language Model Alignment
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
OPT: Open Pre-trained Transformer Language Models
Optical Flow with Semantic Segmentation and Localized Layers
Optimization Methods for Large-Scale Machine Learning
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for Simultaneous Speech Translation
Overcoming catastrophic forgetting in neural networks
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
PaLM 2 Technical Report
PaLM: Scaling Language Modeling with Pathways
PALO: A Polyglot Large Multimodal Model for 5B People
Parakeet: A natural sounding, conversational text-to-speech model
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue
Parallel Scheduled Sampling
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Parameter-efficient fine-tuning of large-scale pre-trained language models
Parameter-Efficient Transfer Learning for NLP
Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation
Pay Attention to MLPs
Pengi: An Audio Language Model for Audio Tasks
Perceiver: General Perception with Iterative Attention
Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Phonetic Analysis of Self-supervised Representations of English Speech
Physician Detection of Clinical Harm in Machine Translation: Quality Estimation Aids in Reliance and Backtranslation Identifies Critical Errors
Playing Atari with Deep Reinforcement Learning
Playing Language Game with LLMs Leads to Jailbreaking
Poisoning Language Models During Instruction Tuning
Poisoning Web-Scale Training Datasets is Practical
PolyLM: An Open Source Polyglot Large Language Model
PolyVoice: Language Models for Speech to Speech Translation
Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Preliminary WMT24 Ranking of General MT Systems and LLMs
Principles of Visual Tokens for Efficient Video Understanding
Probing the phonetic and phonological knowledge of tones in Mandarin TTS models
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Progress Report: Towards European LLMs
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models
Prompting Large Language Models with Speech Recognition Abilities
Prompting with Phonemes: Enhancing LLM Multilinguality for non-Latin Script Languages
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases
Proximal Policy Optimization Algorithms
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
Pushing the Limits of Zero-shot End-to-End Speech Translation
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Quality-Aware Decoding for Neural Machine Translation
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
Quantifying Memorization Across Neural Language Models
Quantifying the Plausibility of Context Reliance in Neural Machine Translation
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Re-ranking Person Re-identification with k-reciprocal Encoding
Real Time Speech Enhancement in the Waveform Domain
ReALM: Reference Resolution As Language Modeling
Recent Advances in Direct Speech-to-text Translation
Recent Advances in Speech Language Models: A Survey
Recent Developments on ESPnet Toolkit Boosted by Conformer
RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation
Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors
Recurrent Memory Transformer
Reducing Activation Recomputation in Large Transformer Models
Reformer: The Efficient Transformer
Reframing Human-AI Collaboration for Generating Free-Text Explanations
Regularized Evolution for Image Classifier Architecture Search
Reinforcement Learning: An Overview
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Relative representations enable zero-shot latent space communication
Representation Learning with Contrastive Predictive Coding
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Reranking Laws for Language Generation: A Communication-Theoretic Perspective
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Retentive Network: A Successor to Transformer for Large Language Models
Rethinking and Improving Multi-task Learning for End-to-end Speech Translation
Rethinking Attention with Performers
Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora
Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Revisiting Acoustic Features for Robust ASR
Revisiting Feature Prediction for Learning Visual Representations from Video
Revisiting minimum description length complexity in overparameterized models
Revisiting Model Stitching to Compare Neural Representations
Revisiting Over-Smoothness in Text to Speech
Revisiting Self-Distillation
Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
Risks from Learned Optimization in Advanced Machine Learning Systems
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
Robust Speech Recognition via Large-Scale Weak Supervision
Robustness May Be at Odds with Accuracy
RoFormer: Enhanced Transformer with Rotary Position Embedding
RWKV: Reinventing RNNs for the Transformer Era
S2ORC: The Semantic Scholar Open Research Corpus
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Sample Efficient Adaptive Text-to-Speech
SaulLM-7B: A pioneering Large Language Model for Law
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain
Scalable Diffusion Models with Transformers
Scaling Instructable Agents Across Many Simulated Worlds
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Scaling Laws for Generative Mixed-Modal Language Models
Scaling Laws for Multilingual Neural Machine Translation
Scaling Laws for Neural Language Models
Scaling Laws for Transfer
Scaling Properties of Speech Language Models
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Scaling Speech Technology to 1,000+ Languages
Scaling Transformer to 1M tokens and beyond with RMT
Scaling Up Influence Functions
Scaling Vision with Sparse Mixture of Experts
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
Score-Based Generative Modeling through Stochastic Differential Equations
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Self-Alignment with Instruction Backtranslation
Self-Attention with Relative Position Representations
Self-critical Sequence Training for Image Captioning
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Self-labelling via simultaneous clustering and representation learning
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Self-Supervised Learning of Pretext-Invariant Representations
Self-Supervised Speech Representations are More Phonetic than Semantic
Self-supervised Video Object Segmentation by Motion Grouping
Self-Taught Evaluators
Sentence Length
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Sequence Level Training with Recurrent Neural Networks
SGDR: Stochastic Gradient Descent with Warm Restarts
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Shortcut Learning in Deep Neural Networks
Shortformer: Better Language Modeling using Shorter Inputs
Should You Mask 15% in Masked Language Modeling?
Simple and Controllable Music Generation
Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning
Simplifying Transformer Blocks
Skip-Thought Vectors
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
SLURP: A Spoken Language Understanding Resource Package
Soft Merging of Experts with Adaptive Routing
softmax is not enough (for sharp out-of-distribution)
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
SoundStorm: Efficient Parallel Audio Generation
SoundStream: An End-to-End Neural Audio Codec
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
Space-Time Correspondence as a Contrastive Random Walk
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sparse and Continuous Attention Mechanisms
Sparse and Structured Hopfield Networks
Sparse Attention with Linear Units
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Sparse Communication via Mixed Distributions
Sparse continuous distributions and Fenchel-Young losses
Sparse Sequence-to-Sequence Models
Sparse Text Generation
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
Speech Translation with Large Language Models: An Industrial Practice
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
Speech-to-Speech Translation For A Real-world Unwritten Language
SpeechAlign: Aligning Speech Generation to Human Preferences
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
SpeechVerse: A Large-scale Generalizable Audio Language Model
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Speed/accuracy trade-offs for modern convolutional object detectors
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
SpiRit-LM: Interleaved Spoken and Written Language Model
Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction
Spoken Language Corpora Augmentation with Domain-Specific Voice-Cloned Speech
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
ST-LLM: Large Language Models Are Effective Temporal Learners
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
State Spaces Aren't Enough: Machine Translation Needs Attention
Stealing Part of a Production Language Model
Stealing User Prompts from Mixture of Experts
Steerable CNNs
Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning
Step-by-Step Diffusion: An Elementary Tutorial
STLight: a Fully Convolutional Approach for Efficient Predictive Learning by Spatio-Temporal joint Processing
StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection
Structured Neural Summarization
Structured Pruning of Large Language Models
Structured Training for Neural Network Transition-Based Parsing
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
Super Tiny Language Models
SUPERB: Speech processing Universal PERformance Benchmark
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Supervised Contrastive Learning
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Surrogate Gradient Learning in Spiking Neural Networks
Survey of Automatic Metrics for Evaluating Machine Translation at the Document Level
Surveying the MLLM Landscape: A Meta-Review of Current Surveys
SWEb: A Large Web Dataset for the Scandinavian Languages
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Symbolic Discovery of Optimization Algorithms
T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation
Tacotron: Towards End-to-End Speech Synthesis
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Task Vectors are Cross-Modal
Task-aware Retrieval with Instructions
Task-Aware Unified Source Separation
Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training
Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting
Text and Code Embeddings by Contrastive Pre-Training
Text-Free Prosody-Aware Generative Spoken Language Modeling
Textbooks Are All You Need
Textless Speech-to-Speech Translation on Real Data
Textually Pretrained Speech Language Models
Texygen: A Benchmarking Platform for Text Generation Models
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
The Biological Basis of Audition
The boundary of neural network trainability is fractal
The Causal-Neural Connection: Expressiveness, Learnability, and Inference
The challenge of realistic music generation: modelling raw audio at scale
The Curious Case of Neural Text Degeneration
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
The Defeat of the Winograd Schema Challenge
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
The Emotions of the Crowd: Learning Image Sentiment from Tweets via Cross-modal Distillation
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction
The Hardware Lottery
The Inside Story: Towards Better Understanding of Machine Translation Neural Evaluation Metrics
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
The JOREK non-linear extended MHD code and applications to large-scale instabilities and their control in magnetically confined fusion plasmas
The Kinetics Human Action Video Dataset
The Llama 3 Herd of Models
The Matrix Calculus You Need For Deep Learning
The Metropolis-Hastings algorithm
The Modern Mathematics of Deep Learning
The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
The Power of Scale for Parameter-Efficient Prompt Tuning
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
The Spotify Podcast Dataset
The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval
The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
The Topological BERT: Transforming Attention into Topology for Natural Language Processing
The unreasonable effectiveness of few-shot learning for machine translation
The Winograd schema challenge
The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling
The Zero Resource Speech Challenge 2021: Spoken language modelling
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Time-Contrastive Networks: Self-Supervised Learning from Video
TinyLlama: An Open-Source Small Language Model
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
TLDR: Extreme Summarization of Scientific Documents
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs
Toolformer: Language Models Can Teach Themselves to Use Tools
TopoBenchmarkX: A Framework for Benchmarking Topological Deep Learning
Toward Joint Language Modeling for Speech Units and Text
Towards a definition of transcreation: a systematic literature review
Towards Deep Learning Models Resistant to Adversarial Attacks
Towards Expert-Level Medical Question Answering with Large Language Models
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR
Towards Robust Speech Representation Learning for Thousands of Languages
Towards Understanding Grokking: An Effective Theory of Representation Learning
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Tower: An Open Multilingual Large Language Model for Translation-Related Tasks
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Training Compute-Optimal Large Language Models
Training data-efficient image transformers & distillation through attention
Training Deep Nets with Sublinear Memory Cost
Training language models to follow instructions with human feedback
Training Language Models with Memory Augmentation
Training Neural Networks from Scratch with Parallel Low-Rank Adapters
Training Verifiers to Solve Math Word Problems
Transcendence: Generative Models Can Outperform The Experts That Train Them
Transferable speech-to-text large language model alignment module
Transformation of Mean Opinion Scores to Avoid Misleading of Ranked based Statistical Techniques
Transformer Feed-Forward Layers Are Key-Value Memories
Transformer Networks for Trajectory Forecasting
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
TransformerFAM: Feedback attention is working memory
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Transformers learn in-context by gradient descent
Transformers need glasses! Information over-squashing in language tasks
Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
Translatotron 3: Speech to Speech Translation with Monolingual Data
Transparent and Scrutable Recommendations Using Natural Language User Profiles
TruthfulQA: Measuring How Models Mimic Human Falsehoods
TuBA: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning
Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
U-Net: Convolutional Networks for Biomedical Image Segmentation
UL2: Unifying Language Learning Paradigms
Uncovering Latent Style Factors for Expressive Speech Synthesis
Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
Understanding Black-box Predictions via Influence Functions
Understanding deep learning requires rethinking generalization
Understanding Intra-Class Knowledge Inside CNN
Understanding natural language
Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control
Unified Language Model Pre-training for Natural Language Understanding and Generation
Unified Speech-Text Pretraining for Spoken Dialog Modeling
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
Unitary Evolution Recurrent Neural Networks
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
Universal Language Model Fine-tuning for Text Classification
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Unsupervised Cross-lingual Representation Learning at Scale
Unsupervised Deep Tracking
Unsupervised Dense Information Retrieval with Contrastive Learning
Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Unsupervised Neural Machine Translation
Unsupervised Source Separation via Bayesian Inference in the Latent Domain
Unsupervised Translation of Programming Languages
Unsupervised Visual Representation Learning by Context Prediction
Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism
Unveiling the Role of Pretraining in Direct Speech Translation
URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors
Using Forced Alignment for Phonetics Research
Using the Output Embedding to Improve Language Models
VALHALLA: Visual Hallucination for Machine Translation
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Variational Inference: A Review for Statisticians
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
VeLO: Training Versatile Learned Optimizers by Scaling Up
Video as the New Language for Real-World Decision Making
Video Swin Transformer
VideoPrism: A Foundational Visual Encoder for Video Understanding
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Vision Transformers Need Registers
ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric
Visual Instruction Tuning
Visualizing and Understanding Convolutional Networks
Visualizing Data using t-SNE
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
VoxCeleb2: Deep Speaker Recognition
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
Watt For What: Rethinking Deep Learning's Energy-Performance Relationship
wav2letter++: The Fastest Open-source Speech Recognition System
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
wav2vec: Unsupervised Pre-training for Speech Recognition
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
WaveGlow: A Flow-based Generative Network for Speech Synthesis
WaveNet: A Generative Model for Raw Audio
WavLLM: Towards Robust and Adaptive Speech Large Language Model
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition
What Are Tools Anyway? A Survey from the Language Model Perspective
What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
What Should Not Be Contrastive in Contrastive Learning
What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study
What's In My Big Data?
When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion
When Do Neural Networks Outperform Kernel Methods?
When Does Translation Require Context? A Data-driven, Multilingual Exploration
When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
Why should we add early exits to neural networks?
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
WinoGrande: An Adversarial Winograd Schema Challenge at Scale
WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge
Word Translation Without Parallel Data
word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method
WT5?! Training Text-to-Text Models to Explain their Predictions
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
XL-WSD: An Extra-Large and Cross-Lingual Evaluation Framework for Word Sense Disambiguation
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
xLSTM: Extended Long Short-Term Memory
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
xTower: A Multilingual LLM for Explaining and Correcting Translation Errors
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Yi: Open Foundation Models by 01.AI
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Zero-Shot Tokenizer Transfer
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
People
Afra Alishahi
Akari Asai
Alan Jeffares
Aldo Lipani
Alexandre Défossez
Alexei A. Efros
Alicia Curth
André F. T. Martins
Andrea Bacciu
Andrew K. Lampinen
António Farinhas
Armand Joulin
Beatrice Savoldi
Ben Peters
Beomseok Lee
Boris Ginsburg
Christian Szegedy
David Ha
David Silver
Dennis Fucci
Diederik P. Kingma
Edward Grefenstette
Emmanouil Zaranis
Emmanuel Dupoux
Eugene Kharitonov
Gabriele Sarti
Gautier Izacard
Graham Neubig
Grzegorz Chrupała
Hao Tang
Hector J. Levesque
Hosein Mohebbi
Itai Gat
James Chapman
Jay Alammar
Joshua Ainslie
Julia Kempe
Kevin Murphy
Kohei Saijo
Kushal Lakhotia
Kyunghyun Cho
Laura Ruis
Laura Sevilla-Lara
Laurent Besacier
Luca Soldaini
Luisa Bentivogli
Luke Zettlemoyer
Maarten Sap
Marco Gaido
Matteo Negri
Mauro Cettolo
Max Bartolo
Max Welling
Michael Hassid
Mihaela van der Schaar
Neil Zeghidour
Nuno M. Guerreiro
Oleksii Hrinchuk
Peter Holderrieth
Quoc Le
Ramón Fernandez Astudillo
Razvan Pascanu
Roberto Navigli
Rohan Ramasamy
Ronan Collobert
Sergey Ioffe
Shayne Longpre
Shinji Watanabe
Simone Conia
Tal Remez
Tatsunori B. Hashimoto
Telmo Pessoa Pires
Tim Rocktäschel
Tsz Kin Lam
Tu-Anh Nguyen
Vadim Borisov
Vlad Niculae
Wei-Ning Hsu
Xin Zhang
Yair Lakretz
Yann LeCun
Yonatan Belinkov
Yoshua Bengio
Yossi Adi
Zalán Borsos
Posts
An Evolutionary Perspective on Language
Animal Navigation Systems
Bayes: Conjugate Inference
CPC: Representation Learning with Contrastive Predictive Coding
Four Early Lessons from Working on Machine Learning Projects
Generalized Linear Models and the Exponential Family
Graphs: Community Structure
Graphs: Motifs, Graphlets and Structural Roles in Networks
Jabri, Owens and Efros (2020) Space-Time Correspondence as a Contrastive Random Walk
LSTMs + Grammar as a Foreign Language
Mean, Median and Mode as Representatives
Self-Supervised Visual Representation Learning
Some Information Theory
The Hierarchical Softmax
The Probability Distributions
The Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Research
Conferences
ICLR-2024
2024 Conference
Blogposts Track ICLR 2024 Announcing Accepted Blogposts – ICLR Blog
ICLR 2024 Outstanding Paper Awards – ICLR Blog
ICLR 2024 Papers
ICLR 2024 Test of Time Award – ICLR Blog
ICLR2024 Papers - a Hugging Face Space by ICLR2024
ICLR-2025
2025 Dates and Deadlines
NeurIPS-2024
Announcing the NeurIPS 2024 Test of Time Paper Awards – NeurIPS Blog
Dynamic Sparsity in Machine Learning NeurIPS 2024 Tutorial
NeurIPS 2024 Call for Papers
SIGIR
SIGIR 2024
Conferences Overview
ICLR
ICTIR 2024
International Conference on the Theory of Information Retrieval (ICTIR) - SIGIR
SIGdial – Special Interest Group on Discourse and Dialogue
Dataset Cards
RedPajama-Data-v2 An open dataset with 30 trillion tokens for training large language models
Landscape
Horizon Europe
Organisations
Linguistics
Anaphora
Selection
ML Notes
Cosine Similarity vs Pearson Moment Correlation Coefficient
Decaying Learning Rate Exponentially when Scaling Batch Size and Base Learning Rate
How many iterations will a training run last?
Multiclass vs multilabel classification
Sampling for Text Generation, Nucleus Sampling (top-$p$), the need for top-$k$ and Beam Search
Vector Projection
Vector Quantization
Weight Initialisation
Whitening, sharpening & smoothing
AI Regulation
Audio
Computer Vision
Datasets
Energy Based Models
eXplainability
Hardware
Implementation
Information Retrieval
Kernels 🍿
Language Models
Llamas 🦙
Machine Translation
Multimodality
Music
Natural Language Inference
Neuroscience
Optimisation
Phonetics
Recommendation Systems
Reinforcement Learning
Speech and Audio
Statistical Learning Theory
Theory of Deep Learning
Tokenisation
Variational Inference
Winograd and WinoGrande
Word Sense Disambiguation
Talks
Designing efficient and modular neural networks - Simone Scardapane
Efficient Transformers - Łukasz Kaiser
Hurdles at WMT - Keeping up with the MT progress - Tom Kocmi
Questions
READUS
Research Tools
Gabriele Sarti
Backlinks
Contrastive Language-Image Pre-training for the Italian Language
Inseq: An Interpretability Toolkit for Sequence Generation Models
Quantifying the Plausibility of Context Reliance in Neural Machine Translation
eXplainability