đŸȘŽ Anil's Garden

      • __main__ — Top-level code environment
      • -nomicon - Wiktionary, the free dictionary
      • ‘A place of joy’ why scientists are joining the rush to Bluesky
      • ‘Monsters The Lyle and Erik Menendez Story’ Has One Great Episode, but Doesn’t Know What to Do With It Vanity Fair
      • ‘Sputnik moment’: $1tn wiped off US stocks after Chinese firm unveils AI chatbot
      • "Unmasking the Godfather - Reverse Engineering the Latest Android Banking Trojan" by Laurie Kirk - YouTube
      ‱ A brief history of word embeddings (and some clarifications) | LinkedIn
      ‱ Can Deepseek continue to succeed? | LinkedIn
      • [NLVP seminar] 16 Jan 2025 (date updated) - Prof Roberto Navigli at the Sapienza University of Rome
      • 🩅 Eagle 7B Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages
      • €100B? €86.6B? A Brussels puzzle: How big is the new research budget?
      • 1. Getting started - csvkit 2.1.0 documentation
      • 1. Why CUDA Compatibility — CUDA Compatibility
      • 2.4 Scaling Laws AI Safety, Ethics, and Society Textbook
      • 3 Montreal Forced Aligner Corpus Phonetics Tutorial
      • 3. Data model
      • 3. Data model — Python 3.12.4 documentation
      • 3.10. Fundamental frequency (F0) — Introduction to Speech Processing
      ‱ 4.9. /usr/local : Local hierarchy
      • 5 free monospaced fonts with coding ligatures Better Web Type
      • 5.2 Model formulation and estimation Notes for Predictive Modeling
      • 6.3 Rejection Sampling Advanced Statistical Computing
      • 7.1 Background Advanced Statistical Computing
      • 7.2 Metropolis-Hastings Advanced Statistical Computing
      • 10 Tips for Research and a PhD
      • 10.4 Adversarial Examples | Interpretable Machine Learning
      • 11. Brief Tour of the Standard Library — Part II
      • 12 Obsidian Plug-ins I *Actually* Use
      • 15 Darkest Garfield Minus Garfield Strips (So Far)
      • 31 brilliant birthday ideas in London
      • 32 Bit Vs 64 Bit - What's the Difference?
      • 69 Best AI Startups in London to Watch in 2024
      • 69 Best London AI Startups to Watch in 2024
      • 90 Linux Commands frequently used by Linux Sysadmins
      • 100M Token Context Windows — Magic
      • 161 outstanding machine learning researchers accepted as new ELLIS Fellows and Scholars
      • A (Relatively Easy To Understand) Primer on Elliptic Curve Cryptography
      • A Beginner's Guide to the proc File System in Linux
      • A Beginner's Guide to Variational Methods: Mean-Field Approximation
      • A complete guide to carbon offsetting Carbon offsetting The Guardian
      • A Comprehensive Guide to Building a Transformer Model with PyTorch DataCamp
      • A crash course in compilers – Increment: Programming Languages
      • A decoder-only foundation model for time-series forecasting – Google Research Blog
      • A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes
      • A Guide To Parsing: Algorithms And Terminology
      • A half-hour to learn Rust
      • A Hitchhiker’s Guide to Speculative Decoding PyTorch
      • A Map of the Territory · Crafting Interpreters
      • A new AI-powered speech translation system for Hokkien pioneers a new approach for a primarily oral language
      • A New Approach to the Data-Deletion Conundrum
      • A new open data set for multilingual speech research
      • A Practical Guide to fzf Shell Integration
      • A pyproject.toml Developer’s Cheat Sheet
      • A pyproject.toml Developer’s Cheat Sheet by Ricardo Mendes Better Programming
      • A Recipe for Training Neural Networks
      • A Short Introduction to Optimal Transport and Wasserstein Distance · Its Neuronal
      • A Simplified Guide to Dynamic Programming
      • A state-of-the-art, self-supervised framework for video understanding
      • A time for truth and reconciliation
      • A Timeline of Large Transformer Models for Speech Jonathan Bgn
      • A Visual Exploration of Gaussian Processes
      • A Visual Git Reference
      • A Visual Guide to SSH Tunnels Local and Remote Port Forwarding
      • A Visual Guide to Using BERT for the First Time
      • A Visual Guide to Vision Transformers MDTURP
      • a-tour-of-pytorch-optimizers/a-tour-of-pytorch-optimizers.ipynb at main · bentrevett/a-tour-of-pytorch-optimizers
      • Abelian group - Wikipedia
      • Abjad - Wikipedia
      • about ICLR Blogposts 2024
      • About LAION
      • About Sofía Valdés
      • About - 152334H
      • About | Kagi's Docs
      • About CC Licenses
      • About me - Wei-Ning Hsu (ćŸç…’ç”Ż)
      • About workflows - GitHub Docs
      • About Y Combinator Y Combinator
      • Abugida - Wikipedia
      • Accelerating Generative AI with PyTorch II GPT, Fast PyTorch
      • Accelerating Generative AI with PyTorch IV Seamless M4T, fast PyTorch
      • Accelerating Generative AI with PyTorch Segment Anything, Fast PyTorch
      • Accuracy Benchmarking Speechmatics
      • Achille Castiglioni - Wikipedia
      • ACL Policies for Review and Citation - Admin Wiki
      • acl-presedential-response.md
      • Acoustic Word Embeddings for Low Resource Speech Processing with Herman Kamper - TWiML Talk 191 - YouTube
      • Adding Bluesky-powered comments to any website in five minutes Cory Zue
      • Advanced Encryption Standard - Wikipedia
      • Advanced features — ocrmypdf 11.7.2 documentation
      • Advanced Iterators - Dive Into Python 3
      • Advanced Topics in Machine Learning
      • Advanced Topics in Machine Learning (COMP0083) UCL Module Catalogue - UCL – University College London
      • Adversarial Attacks on Neural Networks: Exploring the Fast Gradient Sign Method
      • Adversarial attacks with FGSM (Fast Gradient Sign Method) - PyImageSearch
      • AI Accelerators — Part II: Transistors and Pizza (or: Why Do We Need Accelerators)?
      • AI and Memory Wall
      • AI chip start-up Groq’s value rises to $2.8bn as it takes on Nvidia
      • AI Index Report 2024 – Artificial Intelligence Index
      • AI Is a Black Box. Anthropic Figured Out a Way to Look Inside WIRED
      • AI paid for by Ads – the gpt-4o mini inflection point
      • AI’s Walking Dog - Boston Review
      • AI2 Dolma 3 Trillion Token Open Corpus for LLMs AI2 Blog
      • Ai2 OpenScholar Scientific literature synthesis with retrieval-augmented language models Ai2
      • AIcrowd | Spotify Million Playlist Dataset Challenge | Challenges
      • Airdrop (cryptocurrency) - Wikipedia
      • aiXplain Secures $6.5M pre-Series A to Universalize AI Agent Development - EIN Presswire
      • AKS primality test - Wikipedia
      • Alan Kay Did Not Invent Objects
      • Aldo Lipani, PhD – CV
      • Alex Hitchcock
      • Algebra over a field - Wikipedia
      • Algebraic structure - Wikipedia
      • algorithm - Insertion Sort vs. Selection Sort - Stack Overflow
      • All Algorithms Sort Visualizations
      • All Nobel Prizes 2024 - NobelPrize.org
      • All Watched Over by Machines of Loving Grace (TV series) - Wikipedia
      • All Watched Over By Machines Of Loving Grace by Richard Brautigan - Famous poems, famous poets. - All Poetry
      • Allen Institute for AI - Wikipedia
      • ALMANACH
      • AlphaFold
      • Alphonse Mucha - Wikipedia
      • american fuzzy lop
      • American Fuzzy Lop (software) - Wikipedia
      • Americans think "#SignalGate" is worse than Hillary's emails
      • An early peek at Dia, our second product A recruiting video - YouTube
      • An In-depth Guide to Benchmarking LLMs Symbl.ai
      • An Interactive Guide To The Fourier Transform – BetterExplained
      • An introduction to Codes of Practice for the AI Act EU Artificial Intelligence Act
      • An Introduction to Deep Reinforcement Learning
      • An introduction to Dynamic Time Warping
      • An Introduction to LLM Benchmarking - Confident AI
      • An Introduction to the Mamba LLM Architecture A New Paradigm in Machine Learning DataCamp
      • An Intuitive Discrete Fourier Transform Tutorial - Practical Cryptography
      • An Intuitive Explanation of Connectionist Temporal Classification
      • An Intuitive Explanation of Connectionist Temporal Classification by Harald Scheidl Towards Data Science
      • An Intuitive Explanation of Policy Gradient — Part 1 REINFORCE
      • An Opinionated Guide to ML Research
      • An Overview of Multi-Task Learning in Speech Recognition
      • Anaphora (linguistics) - Wikipedia
      • Anatomize Deep Learning with Information Theory Lil'Log
      • Anchoring Bias - The Decision Lab
      • AndrĂ© Martins was awarded an ERC Consolidator Grant to study artificial neural networks applied to natural language processing
      • Andrea Palladio - Wikipedia
      • Andreessen Horowitz - Wikipedia
      • Andrej Karpathy on X Speculative execution for LLMs is an excellent inference-time optimization. It hinges on the following unintuitive observation forwarding an LLM on a single input token takes about as much time as forwarding an LLM o
      • Ann Peebles - Wikipedia
      • Announcing Grok
      • Announcing Tower An Open Multilingual LLM for Translation-Related Tasks
      • ANOM – Darknet Diaries
      • Answer.AI - Lessons from history’s greatest R&D labs
      • Anthropic Education Report: How University Students Use Claude
      • Antisymmetric Matrix -- from Wolfram MathWorld
      • Antium - Wikipedia
      • Anyone can Access Deleted and Private Repository Data on GitHub ◆ Truffle Security Co.
      ‱ Anyone looking for VMware Fusion Player for Mac? Found it : r/vmware
      • Apache Arrow
      • Aperiodic Functions From Fourier Series to Fourier Transform
      • Apple AI researchers boast useful on-device model that ‘substantially outperforms’ GPT-4 - 9to5Mac
      • Apple loses more than $300bn in market value from Trump tariff hit
      • Apple M1 - Wikipedia
      • Apple M2 - Wikipedia
      • Apple Public Source License - Wikipedia
      • Apple’s MM1 AI Model Shows a Sleeping Giant Is Waking Up WIRED
      • Application binary interface - Wikipedia
      • apt update vs apt-get update Differences Explained!
      • Arbitrary-precision arithmetic - Wikipedia
      • Architecture of Windows NT - Wikipedia
      • ArchWiki
      • Arco (lamp) - Wikipedia
      • Are remote workers more productive That’s the wrong question. - Stack Overflow
      ‱ Are there any licenses out there with LLM usage restrictions? : r/opensource
      • argparse — Parser for command-line options, arguments and sub-commands — Python 3.12.6 documentation
      • ARM architecture family - Wikipedia
      • ASCII Table - ASCII Character Codes, HTML, Octal, Hex, Decimal
      • ASLized!
      • Associative Learning - an overview ScienceDirect Topics
      • Async IO in Python A Complete Walkthrough – Real Python
      • AT Protocol
      • Attention and Augmented Recurrent Neural Networks
      • Audio Language Models and Multimodal Architecture by Deepak Babu P R Mar, 2024 Medium
      • Audio sample rate converters comparison
      • Audio-based Machine Learning Model for Podcast Language Identification - Spotify Research Spotify Research
      • AudioCraft: A simple one-stop shop for audio modeling
      • Autograd mechanics — PyTorch 2.5 documentation
      • Automatic post-editing for machine translation: a look at the future
      • Avalon Project - Washington's Farewell Address 1796
      • AVSpeech Audio Visual Speech Dataset
      • AWK - Wikipedia
      • backdoor in US medical device calls out to chinese university
      • Backup Strategies: Why the 3-2-1 Backup Strategy is the Best
      • Barack Obama on AI, free speech, and the future of the internet - The Verge
      • Base64 - MDN Web Docs Glossary: Definitions of Web-related terms | MDN
      • Base64 - Wikipedia
      • Baseline OpenAI end-to-end chat reference architecture - Azure Reference Architectures | Microsoft Learn
      • Baseline OpenAI end-to-end Chat Reference Architecture - InfoQ
      • bash - variable expansion in curly braces - Stack Overflow
      • Bash best practices cheat-sheets
      • Bash Builtins (Bash Reference Manual)
      • Bash Colors - ShellHacks
      • Bash for NLP tutorial, advanced topics · John Hewitt
      • Bash for NLP tutorial, basics · John Hewitt
      • Bash Functions Linuxize
      • Bash Globbing Tutorial
      • Bash Reference Manual
      • Bash Strict Mode
      • BashPitfalls - Greg's Wiki
      • Basic Tutorial — Cython 3.0.11 documentation
      • Bayesian Neural Networks
      • Bazel (software) - Wikipedia
      • BCEWithLogitsLoss — PyTorch 2.3 documentation
      • Beachhead Strategy
      • Beam Search Decoding in CTC-trained Neural Networks by Harald Scheidl Towards Data Science
      • BeEF - The Browser Exploitation Framework Project
      • Beej's Guide to C Programming
      • before you code, learn how computers work
      • Bellman equation - Wikipedia
      • Berkeley sockets - Wikipedia
      • Best Computer Science Conferences Ranking Machine Learning & Artificial intelligence 2024 Research.com
      • Best practices for Dockerfile instructions Docker Docs
      • Better language models and their implications
      • BĂ©zout's identity - Wikipedia
      • bfloat16 floating-point format - Wikipedia
      • BFloat16 The secret to high performance on Cloud TPUs Google Cloud Blog
      ‱ BIG-bench/bigbench/benchmark_tasks/README.md at main · google/BIG-bench
      • Bijection, injection and surjection - Wikipedia
      • Bilinear interpolation - Wikipedia
      • Binary search tree - Wikipedia
      • Birthday problem - Wikipedia
      • Bitcoin Block Reward Halving Countdown
      • Bitcoin Dust: Overview, Disadvantages, and Example
      • Bitmap - Wikipedia
      • Block cipher mode of operation - Wikipedia
      • Blog peterbloem.nl
      • Bloom's taxonomy - Wikipedia
      • Bluesky tops 20M users, narrowing gap with Instagram Threads TechCrunch
      • Boltzmann machine - Wikipedia
      • Bourne Shell Builtins (Bash Reference Manual)
      • Box, stack and heap - Rust By Example
      • Branch Cut -- from Wolfram MathWorld
      • Brandon Rohrer
      • Brave browser explains Facebook whitelist to concerned users
      • Bridging AI and Cognitive Science (BAICS)
      • BrowseComp: a benchmark for browsing agents
      • Browser-based vulnerabilities in web applications Infosec
      • Browser, OS, Search Engine including Mobile Usage Share
      • Broyden–Fletcher–Goldfarb–Shanno algorithm - Wikipedia
      • Build a theme
      • Build Scripts - The Cargo Book
      • Building an Audience Through Technical Writing Strategies and Mistakes – Answer.AI
      • Building architectures that can handle the world’s data
      • BuildKit Docker Docs
      • Byte-Pair Encoding tokenization - Hugging Face NLP Course
      • bzip2 - Wikipedia
      • C data types - Wikipedia
      • C Preprocessor and Macros
      • C++ Best Practices Erik Rigtorp
      • C++ Coding Standards 101 Rules, Guidelines, and Best Practices
      • C++ Enumeration (enum)
      • C++ Introduction
      • C++ reference - cppreference.com
      • C++ Standard Library headers - cppreference.com
      • C++ String Splitting Utility - Claude
      • C++ tutorial for beginners âšĄïž - YouTube
      • C++ type system
      • Calculating the Cost of a Google Deepmind Paper - 152334H
      • Calculus on Computational Graphs: Backpropagation -- colah's blog
      • Call for Proposals - ELLIS Units European Lab for Learning & Intelligent Systems
      • Campo Pequeno EspectĂĄculos & Eventos Agenda
      ‱ Can Llama3.2 Vision be used by researchers in Europe? : r/LocalLLaMA
      • Can someone explain the terminology of "bouncing" to me please?
      • Can the audience dance to this - YouTube
      • Canadian Aboriginal syllabics - Wikipedia
      • Canvass White - Wikipedia
      • Capability Brown - Wikipedia
      • Carbon (programming language) - Wikipedia
      • Carl Gustav Jacob Jacobi - Wikipedia
      • Case study porting chardet to Python 3 - Dive Into Python 3
      • Casual Conversations Dataset
      • Categorical Deep Learning - Categorical Deep Learning
      • Category Theory (Stanford Encyclopedia of Philosophy)
      • CategoryCreative Commons-licensed films - Wikipedia
      • Cato the Elder - Wikipedia
      • Causal Bayesian Networks: A flexible tool to enable fairer machine learning
      • Causal inference 4: Causal Diagrams, Markov Factorization, Structural Equation Models
      • Causal Inference With Python Part 2 - Causal Graphical Models
      • Central Tano languages - Wikipedia
      • Centrality - Wikipedia
      • CEWithChunkedOutputLoss — torchtune 0.3 documentation
      • Champalimaud Foundation - Wikipedia
      • Change in Guidance on Committing Lockfiles | Rust Blog
      • Character encoding - Wikipedia
      • CHARMING PYTHON #B26: Python Elegance, Python Warts, Part 2 -- Properties, attributes, methods and custom access --
      • Chart of the Week: Trump is not popular
      • Chat Markup Language ChatML (Preview) - Azure
      • Chat Templates
      • Chat with Open Large Language Models
      • ChatGPT Defeated Doctors at Diagnosing Illness - The New York Times
      • chatml openai-python
      • Chemical Oscillations, Waves, and Turbulence SpringerLink
      • Chessprogramming wiki
      • Chinchilla’s Death
      • Chinese characters - Wikipedia
      • Chinese remainder theorem - Wikipedia
      • Chromium Docs - Chrome Security FAQ
      • Chromium Notes Ninja, a new build system
      • Church–Turing thesis - Wikipedia
      • CIFAR – Convening extraordinary minds to address the most important questions facing science and humanity.
      • Circuit Tracing: Revealing Computational Graphs in Language Models
      • CJK characters - Wikipedia
      • CJK Compatibility Ideographs - Wikipedia
      • CJK Unified Ideographs - Wikipedia
      • Class invariant - Wikipedia
      • Classes & Iterators - Dive Into Python 3
      • Claus P. Schnorr - Wikipedia
      • Clickjacking Attacks How to Detect and Prevent Ping Identity
      • Closure (computer programming) - Wikipedia
      • Cloud GPUs The Best Servers, Services & Providers RANKED!
      • CMU Portugal Inside Story Patrick Fernandes
      • Codec SUPERB
      • CodeSearchNet by GitHub
      • Codestral Mamba Mistral AI Frontier AI in your hands
      • Coldplay and Upsahl songs stolen by Luton cyber hacker
      • Collaborative filtering - Wikipedia
      • College students used Meta’s smart glasses to dox people in real time - The Verge
      • Command line interface - PyMuPDF 1.24.10 documentation
      • Command R: RAG at Production Scale
      • Command-R RAG at Production Scale
      • Communication Between Processes - Python Module of the Week
      • Community Calculus
      • Company Information – Verisign
      • Compilers and Interpreters | HackerNoon
      • Complex instruction set computer - Wikipedia
      • Complex number fundamentals | Ep. 3 Lockdown live math
      • Computer memory - Wikipedia
      • Computer Speed, CPU Cache, RAM Types - ChatGPT
      • Computer-Using Agent
      • Conda and the libmamba solver Roll-out plan 2023 conda.org
      • Conditional Probing and Usable Information · John Hewitt
      • Configuration - pytest documentation
      • Conformer: An interesting ML architecture that I'm abandoning | Knowing.NET
      • Construct an envelope function for the acceptance-rejection method - The DO Loop
      • Continued musing on DPO – Kyunghyun Cho
      • Cookbook — ocrmypdf 16.5.0 documentation
      • Cookbook — ocrmypdf 16.5.1.dev1+g0e4cce2 documentation
      • Copy & Paste in Vim Vi
      • Copy-on-write - Wikipedia
      • Copyleft - Wikipedia
      • Coriolanus National Theatre
      • Coriolanus - Wikipedia
      • Coroutine - Wikipedia
      • Coroutines and Tasks
      • Corporate America’s diversity wars are just getting started
      • corte.si
      • Covariance and contravariance (computer science) - Wikipedia
      • CoVoST V2 Expanding the largest, most diverse multilingual speech-to-text translation dataset
      • cracked - Wiktionary, the free dictionary
      • Create a dataset loading script
      • Create and manage a repository
      • Creative Coding Crafts Space meetup #4: Three.js Part 1
      • Cross-Attention in Transformer Architecture
      • Cross-lingual pretraining sets new state of the art for natural language understanding
      • Crossing the uncanny valley of conversational voice
      • Crush Your 2025 Reading Challenge Goals with These Tips
      • Crypto Dust and Dusting Attacks Explained
      • CS231n Convolutional Neural Networks for Visual Recognition
      • CTC forced alignment API tutorial — Torchaudio 2.2.0.dev20240509 documentation
      • CUDA C++ Programming Guide
      • CUDA Cores vs. Tensor Cores – Which One is Right for Machine Learning
      • CUDA semantics — PyTorch 2.6 documentation
      • Cunningham's Law - Meta
      • Cython - an overview — Cython 3.0.11 documentation
      ‱ [D] Mixed Precision Training: Difference between BF16 and FP16 : r/MachineLearning
      • D-Wave Systems - Wikipedia
      • DALL·E Mega - Training Journal dalle-mini – Weights & Biases
      • Daniel McNamee Champalimaud Foundation
      • Daring Fireball: Markdown Basics
      • Daring Fireball: Markdown Syntax Documentation
      • Dario Amodei — Machines of Loving Grace
      • Dario Amodei — On DeepSeek and Export Controls
      • Data files Configuration
      ‱ Data for A.I. Training Is Disappearing Fast, Study Shows - The New York Times
      • Data Science Academic Engagement Programs Fellowships and Grants
      • Data Science Ph.D. Fellowship Bloomberg LP
      • Data type - Wikipedia
      • Data-rate units - Wikipedia
      • DataCrunch wants to be Europe's first AI cloud hyperscaler — powered by renewable energy TechCrunch
      • Dataset features
      • Datasets đŸ€ Arrow
      • DataStructures-Algorithms This repo contains links of resources, theory subjects content and DSA questions & their solution for interview preparation from different websites like geeksforgeeks, leetcode, etc.
      • Dates and Venues – ACL Rolling Review – An initiative of the Association for Computational Linguistics
      • davepeck.org
      • dB: What is a decibel?
      • DeafVIDEO.TV - ASL Videos & Vlogs - Sign Language Entertainment
      • Declarative programming - Wikipedia
      • Decoded GNU coreutils – MaiZure's Projects
      • Deep dive conda init and activate — conda 4.13.0 documentation
      • Deep Implicit Layers - Neural ODEs, Deep Equilibirum Models, and Beyond
      • Deep learning transcends the edges of our imagination
      • Deep Neural Nets: 33 years ago and 33 years from now
      • DeepSeek advances could heighten safety risk, says ‘godfather’ of AI
      • DEF CON 32 - Inside the FBI’s Secret Encrypted Phone Company ‘Anom’ - Joseph Cox - YouTube
      • Demoscene - Wikipedia
      • Dependency injection - Wikipedia
      • Dependency Resolution - pip documentation v24.2
      • Derivation of the Least Squares Estimator for Beta in Matrix Notation Economic Theory Blog
      • Derivation of the Least Squares Estimator for Beta in Matrix Notation – Proof Nr. 1 Economic Theory Blog
      • Designated Market Maker (DMM): Definition, NYSE Role, Vs. Broker
      • Designing and Interpreting Probes · John Hewitt
      • DevOps observability What is it and how to implement it
      • Difference Between Makefile.am and Makefile.in Baeldung on Linux
      • Difference between single and double square brackets in Bash - BashFAQ031 - Greg's Wiki
      • Differences Between Single and Double Brackets in Bash Baeldung on Linux
      • Diffusion Meets Flow Matching
      • Digital hygiene
      • Digital object identifier - Wikipedia
      • Digital Signature Algorithm - Wikipedia
      • dill package documentation — dill 0.4.1.dev0 documentation
      • Dimensionality Reduction via JL Lemma and Random Projection
      • Disabling and Enabling System Integrity Protection
      • Discovering novel algorithms with AlphaTensor
      • Discrete logarithm - Wikipedia
      ‱ Disk Usage Guidelines for SARDINE Servers · deep-spin/wiki Wiki
      • Distributed communication package - torch.distributed — PyTorch 2.6 documentation
      • do any odd perfect numbers exist?
      • docker buildx build Docker Docs
      • Docker Tips Install Package from a Private Git Repository - Siv Scripts
      • Dockerfile reference Docker Docs
      • Does Goodreads support the use of APIs?
      • Doesn't the Deref trait go against everything Rust stands for?
      • Doing RAG Vector search is not enough
      • Domain Name System - Wikipedia
      • dotAI 2024 - Neil Zeghidour - Multimodal language models - YouTube
      ‱ Double-Dot “..” vs. Triple-Dot “
” in Git Commit Ranges | Baeldung on Ops
      • DOW 30
      • Download Llama
      • Download Llama - Meta Llama 2 Community License Agreement
      • Download Llama - Terms and Conditions - Meta Llama 3 Community License Agreement
      • Download Llama 3.2
      • DSA Hash Tables
      • DSA Stacks
      • Duality (optimization) - Wikipedia
      • Duck typing - Wikipedia
      • Dusting attack - Wikipedia
      • Dynamic Sparsity in Machine Learning NeurIPS 2024 Tutorial
      • Dynamic time warping - Wikipedia
      • earth a global map of wind, weather, and ocean conditions
      • East West Street by Philippe Sands review – putting genocide into words Biography books The Guardian
      • EdDSA - Wikipedia
      • Edinburgh MSc Speech & Language Processing
      • Edinburgh University to seek ÂŁ140m in savings
      • EDVAC - Wikipedia
      • Einops
      • Elaborative encoding - Wikipedia
      • elevated by Rgba & TBC | 4k intro (FullHD 1080p demoscene demo)
      • ElevenLabs Releases New Voice AI Products and Raises $80M Series B
      • ElGamal signature scheme - Wikipedia
      • Elliptic curve - Wikipedia
      • Elliptic Curve Cryptography: a gentle introduction
      • Elliptic Curve Digital Signature Algorithm - Wikipedia
      • Elliptic-curve cryptography - Wikipedia
      • Elliptic-curve Diffie–Hellman - Wikipedia
      • ELLIS Institutes Whitepaper European Lab for Learning & Intelligent Systems
      • Elon Musk attempts hostile takeover of OpenAI
      • Elon Musk has been in regular contact with Putin for two years, says report Elon Musk The Guardian
      • Elon Musk is shredding America’s government as he did Twitter
      • Elon Musk's curious fixation with Britain
      • Embeddings - OpenAI API
      • Emojify
      • Empathic Voice Interface (EVI) — Hume API
      • Empowering innovation: The next generation of the Phi family | Microsoft Azure Blog
      • Emu Video
      • Emu Video and Emu Edit: Our latest generative AI research milestones
      • Encoding of speech in convolutional layers and the brain stem based on language experience Scientific Reports
      • End-to-end delay - Wikipedia
      • End-to-end hardware implementation of Artificial Neural Networks for Edge Computing in Autonomous Vehicles | Hailo-8 Project | Fact Sheet | H2020 | CORDIS | European Commission
      • End-to-End Workflow with torchtune — torchtune 0.3 documentation
      • Energy-based model - Wikipedia
      • Energy-based Models
      • Ensuring AI innovation in Europe Open letter to EU policymakers
      • eSpeak - Wikipedia
      • Ethereum - Wikipedia
      • Euclid's Elements - Wikipedia
      • Euclid's lemma - Wikipedia
      • Euclidean algorithm - Wikipedia
      • EuroHPC Summit 2025 - KRAKÓW
      • European capital greenness evaluation
      • European High-Performance Computing Joint Undertaking - Wikipedia
      • European languages with an evidentiality system?
      • Evaluating speech features with the Minimal-Pair ABX task - Elicit Extraction
      • Evaluation in information retrieval
      • Evaluation measures (information retrieval) - Wikipedia
      • Every Wonder How Base64 Encoding Works to Send Data Over Email?
      • Everything about Distributed Training and Efficient Finetuning Sumanth's Personal Website
      • Everything you need to know about Linux man pages | TechTarget
      • Evidentiality - Wikipedia
      • Exa Web API for AI
      • Exclusive The $2 Per Hour Workers Who Made ChatGPT Safer
      • Experts urge caution over use of Chinese AI DeepSeek
      • Explainable artificial intelligence - Wikipedia
      • Explained Multi-head Attention (Part 1)
      • Explaining Docker Image IDs · Adventures in a Cloud Native Landscape
      • Exploring Massively Multilingual, Massive Neural Machine Translation
      • Exposing the Honey Influencer Scam
      • Extended Euclidean algorithm - Wikipedia
      • Extract, transform, load - Wikipedia
      • Extracting Clear-Text Credentials Directly From Chromium’s Memory
      • Ezra Collective - Wikipedia
      • Facebook, NYU expand available languages for natural language understanding systems
      ‱ facebook/wav2vec2-large-960h-lv60-self · Hugging Face
      • Factorial Funds Under The Hood How OpenAI's Sora Model Works
      • Fairness (machine learning) - Wikipedia
      ‱ fairseq/examples/mms at main · facebookresearch/fairseq
      ‱ fairseq/examples/wav2vec/README.md at main · facebookresearch/fairseq
      • FAQ
      • FAQ LAION
      • Farfetch vs Revolut vs Capgemini - Compare career levels across companies with Levels.fyi
      • Fast and Expressive LLM Inference with RadixAttention and SGLang LMSYS Org
      • Fast Fourier Transforms
      • fast.ai – AdamW and Super-convergence is now the fastest way to train neural nets
      • fast.ai – fast.ai—Making neural nets uncool again
      • fd An Alternative to the Linux find Command Baeldung on Linux
      • Feature Visualization
      • Features Cursor - The AI-first Code Editor
      • Federal Information Processing Standards - Wikipedia
      • Feedzai - Wikipedia
      • Feistel cipher - Wikipedia
      • Fermat's Last Theorem - Wikipedia
      • Festvox CMU_ARCTIC Databases
      • Fiedler Vector -- from Wolfram MathWorld
      • Field (mathematics) - Wikipedia
      • File descriptor - Wikipedia
      • Finding Syntax with Structural Probes · John Hewitt
      • Fine-tuning How-to guides
      • FineWeb decanting the web for the finest text data at scale - a Hugging Face Space by HuggingFaceFW
      • Finite field - Wikipedia
      • Firmware - Wikipedia
      • First Draft of a Report on the EDVAC - Wikipedia
      • FirstKernelPatch - Linux Kernel Newbies
      • Fisher information - Wikipedia
      • Fisher Information Matrix
      • Fisher transformation - Wikipedia
      • Fixes to the ls . operation not permitted error message
      ‱ Fixing DPO but I have a dinner reservation 
 – Kyunghyun Cho
      • Flame malware collision attack explained | MSRC Blog | Microsoft Security Response Center
      • Flash-Decoding for long-context inference PyTorch
      • Flexoki
      • Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation
      • Flow-based Deep Generative Models
      • Flow-based generative model - Wikipedia
      • Floyd–Warshall algorithm - Wikipedia
      • FNV Hash
      • Forced alignment for multilingual data — Torchaudio 2.2.0.dev20240214 documentation
      • Forced Alignment with Wav2Vec2 — Torchaudio 0.10.0 documentation
      • Ford–Fulkerson algorithm - Wikipedia
      • Foreign function interface - Wikipedia
      ‱ Formats · tmux/tmux Wiki
      • Forth (programming language) - Wikipedia
      • Forward secrecy - Wikipedia
      • From Autoencoder to Beta-VAE
      • From Monospace to Duospace In Search of the perfect writing font
      • From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease
      • Full list of Booker Prize winners, shortlisted and longlisted authors and their books The Booker Prizes
      • Function (computer programming) - Wikipedia
      • Functional programming - Wikipedia
      • Fundamental theorem of algebra - Wikipedia
      • Fundamental theorem of arithmetic - Wikipedia
      • Fused Softmax — Triton documentation
      • Fuzzing - Wikipedia
      • Galactic algorithm - Wikipedia
      • Garbage collection (computer science) - Wikipedia
      ‱ General Usage · deep-spin/wiki Wiki
      • Generalized Language Models Lil'Log
      • Generalized Visual Language Models
      • Generation Alpha - Wikipedia
      • Generations of Garbage Collection
      • Generative Flow Networks - Yoshua Bengio
      • Generic top-level domain - Wikipedia
      • Genie 2 A large-scale foundation world model
      • Geohashing – xkcd
      • Geomatics - Wikipedia
      • Germany’s Far-Right Comeback | NYT Opinion
      • Get started with development Containers in Visual Studio Code
      • Get Started With Tmux - Sunaina Pai
      • Getting silly with C, part (void*)2
      ‱ Getting Started · tmux/tmux Wiki
      • Getting started with Vim The basics Opensource.com
      • Getting to Know the Linux Kernel: A Beginner's Guide - Kelsey Steele & Nischala Yelchuri, Microsoft
      • GH Archive
      • Ghostscript
      • giampaolo/psutil: Cross-platform lib for process and system monitoring in Python
      • Gil Elbaz - Wikipedia
      • Gimbal - Wikipedia
      • Gimbal lock - Wikipedia
      • Git - Git Hooks
      • Git - Rerere
      ‱ Git Large File Storage Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise
      • Git Reflog - How To Use Git Reflog W3Docs Online Git Tutorial
      • Git Refs: What You Need to Know | Atlassian Git Tutorial
      • Git submodule Atlassian
      • GitHub Copilot in VS Code cheat sheet
      • GitHub Copilot overview
      • glibc - Wikipedia
      • Glossary - HuggingFace Tokenizers
      • Glossary - Python Packaging User Guide
      • Glossary - Teach Me Audio
      • Glossary — Python 3.12.4 documentation
      • GLUE Benchmark
      • Go smol or go home | Harm de Vries
      • Godescalc Evangelistary - Wikipedia
      • GĂłnĂŽ Tmutul Building A House Of Stories on Vimeo
      • GoodCourse: The TikTok style corporate training platform | Y Combinator
      • Google AI PaLM 2 – Google AI
      • Google Announces 200M Parameter AI Forecasting Model TimesFM - InfoQ
      ‱ Google Password Manager vs. 1Password : r/1Password
      • Google Search Is Dying
      ‱ google-research/tuning_playbook: A playbook for systematically maximizing the performance of deep learning models.
      • Google’s AI ‘learning companion’ takes chatbot answers a step further
      • Google's Illuminate vs. NotebookLM - Making Technical Papers Accessible - IzzyBreezyLife
      • GPT-4 architecture, datasets, costs and more leaked
      • GPT-4o mini advancing cost-efficient intelligence OpenAI
      • GPT-5 Everything You Need to Know - by Alberto Romero
      • GPU Architecture Explained | Cherry Servers
      • Gram-Schmidt process
      • Grammelot - Wikipedia
      • graydon2 | The Rust I Wanted Had No Future
      • Greenlandic language - Wikipedia
      • Grigori Perelman - Wikipedia
      • Grokking Diffusion Models – Non_Interactive – Software & ML
      • Group (mathematics) - Wikipedia
      • Guide to Expectation Maximization Algorithm Built In
      • Gumbel Softmax Loss Function Guide + How to Implement it in PyTorch
      • Gumbel-Softmax - Niansong Zhang
      • Hacker's guide to Neural Networks
      • HAI at Five Conference Godmothers of AI
      • Hamilton–Jacobi–Bellman equation - Wikipedia
      • Hamiltonian mechanics - Wikipedia
      • Handbook of Markov Chain Monte Carlo
      • Handle (computing) - Wikipedia
      • Harold W. Kuhn - Wikipedia
      • Has anyone found a good way to replicate Notion's databases in Obsidian, or offline databases with direct file storage like Obsidian?
      ‱ Has anyone tried TalkPal AI? : r/learnfrench
      • Hausa language - Wikipedia
      • Heap Data Structure - GeeksforGeeks
      • Heap Data Structure Binary Heap, Time Complexity & Explanation
      ‱ Hello OLMo A truly open LLM. As the world races to deploy AI models
 by AI2 Feb, 2024 AI2 Blog
      • Henri Cartier-Bresson ‱ Photographer Profile ‱ Magnum Photos Magnum Photos
      • Hessian and Curvatures in Machine Learning A Differential-Geometric View
      • Hexagonal Grids
      • HfApi Client
      • Hidden Changes in GPT-4, Uncovered dmicz devblog
      • High-performance self-supervised image classification with contrastive clustering
      • Highlights from Machine Translation and Multilinguality in December 2023 and January 2024 Jindƙich’s blog
      • Highlights from Machine Translation and Multilinguality in February 2024 Jindƙich’s blog
      • Hijacking Safetensors Conversion on Hugging Face HiddenLayer
      • Hilary Woods w Gabriel Ferrandini & Oliver Turvey ⟡ TomĂ© Silva - Galeria ZĂ© dos Bois
      • Hinge loss - Wikipedia
      • History of the International Phonetic Alphabet - Wikipedia
      • Hoisting - MDN Web Docs Glossary: Definitions of Web-related terms | MDN
      • Hölder condition - Wikipedia
      • Holistic Evaluation of Language Models (HELM)
      • Hollywood stars’ estates agree to the use of their voices with AI
      • Home
      • Home - F6S Innovation
      • Homepage — Essentia 2.1-beta6-dev documentation
      • Horizon Europe - European Commission
      • Horizon Europe - Wikipedia
      • How 4chan became the home of the elite reader
      • How a Single Bit Inside Your Processor Shields Your Operating System's Integrity - YouTube
      • How a stubborn computer scientist accidentally launched the deep learning boom
      • How Activation Checkpointing enables scaling up training deep learning models
      • How can a neural network be like the brain?
      • How David Lieb Turned a Failing Startup Into Google Photos | Backstory
      • How Do AI Models Actually Think?
      ‱ How do I create a custom domain email? : r/techsupport
      • How I Ditched Google Photos and Built My Own Photo Server
      • How I Got a Job at DeepMind as a Research Engineer (without a Machine Learning Degree!)
      • How I learned to code in 3 months (and got several offers) - YouTube
      • How I reduced 90% errors for my Cursor (+ any other AI IDE)
      • How I Used AI to Create a Working Exploit for CVE-2025-32433 Before Public PoCs Existed
      • How Imaginary Numbers Were Invented
      • How is LLaMa.cpp possible
      • How Many Players Compete in the Wimbledon Championships? | Green & Purple
      • How not to build an AI Institute
      • How People Create and Destroy Value with Generative AI BCG
      • How Threads will integrate with the Fediverse – plasticbag.org
      • How to Become a Machine Learning Engineer Complete Career Path Glassdoor
      • How to Change Folder Color on Mac
      • How To Checkout Git Tags – devconnected
      • How to choose a career Prospects.ac.uk
      • How to Contribute to Open Source Open Source Guides
      • How To Crack WEP and WPA Wireless Networks
      • How To Cross-Compile ClangLLVM using ClangLLVM — LLVM 20.0.0git documentation
      • How to Disconnect After Running a nohup Command Over SSH Baeldung on Linux
      • How to Extract (Unzip) Tar Bz2 File
      • How To Get The Most Out Of Vibe Coding | Startup School
      • How to prefetch data when processing with GPU - PyTorch Forums
      • How to reset NVRAM on your Mac
      • How to save memory by fusing the optimizer step into the backward pass — PyTorch Tutorials 2.4.0+cu121 documentation
      • How To Scale Your Model
      • How to Send GET Requests with cURL
      • How to Set Up a Cron Job on Mac
      • How to set up SSH Public-key Authentication to Connect to a Remote Server - SnapShooter Tutorials
      • How to Train Your Robot
      • How to undo (almost) anything with Git
      • How to use `git grep`
      • How to Use Command Line Arguments in a Bash Script Baeldung on Linux
      • How to Use PostgreSQL in Python
      • How to Use the less Command on Linux
      • How to write essays and feel like a spy
      • HowTo100M
      • HuBERT Explained by Miguel Aspis Dev Genius
      • HuBERT Speech representations for recognition & generation
      • Huffman Coding - W3Schools.com
      • Hugging Face Datasets Process
      • Hugging Face Evaluate - A quick tour
      • Hugging Face Transformers Weights & Biases Documentation
      ‱ huggingface/diarizers
      ‱ huggingface/distil-whisper: Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
      • HuggingFaceM4/idefics2-8b · Hugging Face
      • HUMAN VOICE FREQUENCY RANGE - SEA
      • Hummingbird - Wikipedia
      • Hungarian algorithm - Wikipedia
      • Hunter Biden’s criminal conviction is good for nobody politically
      • I am rich and have no idea what to do with my life
      • I built an entire SaaS to prove that Cursor is the only engineer you need
      • I can now run a GPT-4 class model on my laptop
      • I Investigate Meta’s Secret Plan to Make Zuckerberg Cool
      • I-XRAY - Google Docs
      • iA Writer has three custom made writing fonts that are available for download
      • IBM 801 - Wikipedia
      • Idea List - Ishan's Cafe
      • Ideogram - Wikipedia
      • IEOR E4525 Machine Learning for OR & FE - Martin Haugh
      • If correlation doesn’t imply causation, then what does? – DDI
      • Il cielo in una stanza (album) - Wikipedia
      • Illustrating Reinforcement Learning from Human Feedback (RLHF)
      • Ilya Sutskever full talk at NeurIPS 2024 Vancouver 15122024
      • Imagen on Vertex AI (image Generative AI) overview Google Cloud
      • ImageNet
      • Imgur - Wikipedia
      • Imperative programming - Wikipedia
      • Improving Education Outcomes by Empowering Parents, States, and Communities
      • In “Triangle of Sadness,” the Crudity Is the Point The New Yorker
      • In deep learning, is there no (good) alternative to CUDA available? Isn't it bad to focus only on NVIDIA?
      • Inception Labs
      • Index - Polars user guide
      • India Why a nation of 1.45 billion wants more children
      • India's Got Latent: Ranveer Allahbadia's 'dirty' comments spark massive row in India
      • Inference Mode — PyTorch main documentation
      • InfiniBand - Wikipedia
      • Information-theoretic security - Wikipedia
      • Initializing New Word Embeddings for Pretrained Language Models · John Hewitt
      • Injective function - Wikipedia
      • Inline In Rust
      • Input Sequences - HuggingFace Tokenizers
      • Inside an AI Training for Doctors
      • Inside the U.S. Government-Bought Tool That Can Track Phones at Abortion Clinics
      • Insight. Selection Sort Vs Insertion Sort
      • Instruction pipelining - Wikipedia
      • Instruction set architecture - Wikipedia
      • Intel 8086 - Wikipedia
      • Inter-process communication - Wikipedia
      • Interfaces for Explaining Transformer Language Models – Jay Alammar – Visualizing machine learning one concept at a time.
      • Internal links - Obsidian Help
      • International cooperation with the United Kingdom in research and innovation
      • Internet Control Message Protocol - Wikipedia
      • InternVL2
      • intro(2) - Linux manual page
      • Introducing a foundational multimodal model for speech translation
      • Introducing Ai2 Paper Finder | Ai2
      • Introducing ChatGPT and Whisper APIs
      • Introducing Command R+: A Scalable LLM Built for Business
      • Introducing Command R7B: Fast and efficient generative AI
      • Introducing deep research
      • Introducing EuroBERT: A High-Performance Multilingual Encoder Model
      • Introducing Gemini 2.0 our new AI model for the agentic era
      • Introducing Gemini Google’s most capable AI model yet
      • Introducing hertz-dev - Standard Intelligence
      • Introducing Idefics2: A Powerful 8B Vision-Language Model for the community
      • Introducing Jamba AI21's Groundbreaking SSM-Transformer Model
      • Introducing Llama 3.1 Our most capable models to date
      • Introducing Ludwig, a Code-Free Deep Learning Toolbox
      • Introducing Meta Llama 3 The most capable openly available LLM to date
      • Introducing Operator
      • Introducing Phi-3 Redefining what's possible with SLMs Microsoft Azure Blog
      • Introducing Phi-4 Microsoft’s Newest Small Language Model Specializing in Complex Reasoning Microsoft Community Hub
      • Introducing structured outputs with JSON response format
      • Introducing the Model Context Protocol
      • Introducing the next generation of Claude Anthropic
      • Introducing the Sixth Cohort of Bloomberg Data Science Ph.D. Fellows (2023-2024) Bloomberg LP
      • Introducing Voicebox The first generative AI model for speech to generalize across tasks with state-of-the-art performance
      • Introducing Voicebox The Most Versatile AI for Speech Generation Meta
      • Introducing Whisper
      • Introducing Whisper OpenAI
      • Introduction IBM Plex
      • Introduction W&B Weave
      • Introduction - The best open source AI powered answer engine.
      • Introduction — The Linux Kernel documentation
      • Introduction to EM Gaussian Mixture Models
      • Introduction to fzf command Baeldung on Linux
      • Introduction to ggml
      • Introduction to gRPC gRPC
      • Introduction to Information Retrieval
      • Introduction to K-D Trees Baeldung on Computer Science
      • Introduction to Loco: the “Rust on Rails”
      • Introduction to the Binary Tree Data Structure Baeldung on Computer Science
      • Introduction to the Fourier Transform
      ‱ Intuitive understanding of MFCCs. The mel frequency cepstral coefficients
 by Emmanuel Deruty Medium
      • Invariant (mathematics) - Wikipedia
      • Inversion of Control Containers and the Dependency Injection pattern
      • Investors think the Russia-Ukraine war will end soon
      • Is Google Password Manager Safe in 2024
      • Is Honey a scam? The popular money-saving browser extension touted by YouTubers like MrBeast is accused of ripping off customers and influencers
      • Is Japanese a Tonal Language? | ALTA Language Services
      • Is your master’s degree useless
      • Is your rent ever going to fall?
      • ISBN - Wikipedia
      • ISCA Archive
      • ISO - ISO 639 — Language code
      • ISO - ISO 3166 — Country Codes
      • Israel launches air strike on Beirut's southern suburbs
      • Israeli family mourns 'man of peace' as body returned from Gaza
      • It increasingly looks as if Lucy Letby’s conviction was unsafe
      • It Is Now Legal to Hack McFlurry Machines (and Medical Devices) to Fix Them
      • Italy blocks Gutenberg book publishing website OONI
      • Italy sends first data watchdog request to DeepSeek: 'The data of millions of Italians is at risk' | TechCrunch
      • Itamar Ben-Gvir - Wikipedia
      • ItĂŽ calculus - Wikipedia
      • Ivan Vulić
      • iWantHue
      • J. W. J. Williams - Wikipedia
      • Jack Parker-Holder
      • James Shore: Dependency Injection Demystified
      • Jane Street Real-Time Market Data Forecasting Kaggle
      • Joan Daemen - Wikipedia
      • Jobs are changing, so should education, Royal Society (2019) - MEI
      • Joe Biden abused a medieval power to pardon his son
      • John Gruber - Wikipedia
      • John Nash (architect) - Wikipedia
      • JOREK non-linear MHD Code
      • Joseph Paxton - Wikipedia
      • Josh Meyer's Website
      • Journal Club
      • Jupyter Notebook Example
      • Just Be Bored, and You'll Level Up - YouTube
      • k-d tree - Wikipedia
      • K-factor (marketing) - Wikipedia
      • K-PAX - Wikipedia
      • Kagi Small Web | Kagi Blog
      • Kaldi Kaldi
      • Kaldi The build process (how Kaldi is compiled)
      • Karush–Kuhn–Tucker conditions - Wikipedia
      • KaTeX – The fastest math typesetting library for the web
      • Katz's back-off model - Wikipedia
      • Kayaker swallowed by whale recalls feeling 'slimy texture' in its mouth
      • Kennedy Scholarship - Wikipedia
      • Kerckhoffs's principle - Wikipedia
      • Keyboard shortcut to jump between words in iTerm2 - Upendar Gareri - Medium
      • Khmer Empire - Wikipedia
      • Khmer language - Wikipedia
      • Klyne - Water Flow (Live Session)
      • KMAC: KECCAK Message Authentication Code
      • Kurtosis - Wikipedia
      • Kyutai Open Sources Moshi A Real-Time Native Multimodal Foundation AI Model that can Listen and Speak - MarkTechPost
      • Laion coco 600M synthetic captions from Laion2B-en LAION
      • LAION2B Dataset
      • Lakh - Wikipedia
      • Lambda calculus - Wikipedia
      • LamĂ©'s theorem - Wikipedia
      • Language models for information retrieval
      • Languages of India - Wikipedia
      • Languages of Sudan - Wikipedia
      • Lanyrd - Wikipedia
      • LAPD Publishes Crime Footage It Got From a Waymo Driverless Car
      • Lara Launch - The Power of Languages - Translated
      • Large language models aren't trained enough.
      • Large Transformer Model Inference Optimization
      • Large Transformer Model Inference Optimization Lil'Log
      • Large-scale neurophysiology and single-cell profiling in human neuroscience Nature
      • Latency vs Throughput vs Bandwidth: Unraveling the Complexities of Network Speed
      • Latent semantic analysis - Wikipedia
      • Layer Normalization Explained Papers With Code
      • LayerNorm — PyTorch 2.4 documentation
      ‱ Layoffs.fyi appears to show the tide is turning in the UK : r/cscareerquestionsuk
      • Lazy evaluation - Wikipedia
      • Le Frecce - Wikipedia
      • Lead Machine Learning Scientist, Personalisation
      • Learning Discrete Latent Structure
      • Learning Resources for pytest | The PyCharm Blog
      • Learning Word Embedding
      • Lecture Notes | ConLangs: How to Construct a Language | Linguistics and Philosophy | MIT OpenCourseWare
      • Lee Kuan Yew - Wikipedia
      • Lernapparat - Machine Learning
      • Lessons from the happiest countries in the world
      • Letizia Battaglia: Life, Love and Death in Sicily
      • Levels of Processing model - Wikipedia
      • Lexman Artificial Podcast
      • LibGuides: Wolfson College Academic Skills: Speed reading
      • Libraries
      • Libri-light
      • libsndfile
      • License Unsplash
      • Lie group - Wikipedia
      • Life of Pi - Wikipedia
      • Linear Algebra and Matrix Decompositions — Computational Statistics in Python 0.1 documentation
      • Linux Kernel Teaching — The Linux Kernel documentation
      • Linux Tutorial - Static, Shared Dynamic and Loadable Linux Libraries
      • Lipschitz continuity - Wikipedia
      • Liquid Foundation Models: Our First Series of Generative AI Models
      • Lisbon for Runners A Guide to Running in Lisbon - Portugalist
      • List 100 - Huyen Chip
      • List of build automation software - Wikipedia
      • List of Datasets for Automatic Speech Recognition (ASR) and Text To Speech Synthesis (TTS)
      • List of films in the public domain in the United States - Wikipedia
      • List Parquet files
      • Live Music In London, Karaoke Colours Nightclub
      • LLaMA 1 vs LLaMA 2 A Deep Dive into Meta’s LLMs
      • Llama 3 Model Cards and Prompt formats
      • Llama 3.2 Model Cards and Prompt formats
      • Llama 3.2 Acceptable Use Policy
      • Llama 3.2 Revolutionizing edge AI and vision with open, customizable models
      • LLaMA Now Goes Faster on CPUs
      ‱ llama-models/models/llama3_2/MODEL_CARD.md at main · meta-llama/llama-models
      • LlamaIndex - LlamaIndex
      ‱ llama/MODEL_CARD.md at main · meta-llama/llama
      • LLM Inference Series 3. KV caching explained by Pierre Lienhart Medium
      • LLM Inference Series 4. KV caching, a deeper look by Pierre Lienhart Medium
      • LLM Parameter Counting kipply's blog
      • LLM.int8() and Emergent Features — Tim Dettmers
      • LLMs as Graph Neural Networks | Petar Veličković @ GLOW
      • LMSys Chatbot Arena Leaderboard - a Hugging Face Space by lmsys
      • Load
      • Load a dataset from the Hub
      • Load–store architecture - Wikipedia
      • Lock (computer science) - Wikipedia
      • Locked-Image Tuning: Adding Language Understanding to Image Models
      • Logan Kilpatrick
      • Logistic regression - Wikipedia
      • Logogram - Wikipedia
      • London's #1 Jazz, Funk & Soul Festival
      • LORA(Low Rank Adaptation) A Deeper Dive Rajan Ghimire
      • Lord Kelvin - Wikipedia
      • Losing it Film The Guardian
      • Loss functions for classification - Wikipedia
      • LxMLS 2024 - The 14th Lisbon Machine Learning Summer School
      • Lyapunov function - Wikipedia
      • m4 (computer language) - Wikipedia
      • MAC and Key Derivation | Practical Cryptography for Developers
      • Mac transition to Intel processors - Wikipedia
      • Mach-O - Wikipedia
      • Machine Bias
      • Machine Learning Engineer Career Guide (2024) by Careervira Medium
      • Machine Learning LLMVLM Training and Engineering by by Stas Bekman
      • Machine Learning Systems
      • Machines of Caring Grace - Boston Review
      • macOS 15
      • macOS: How to run your Applications in a Mac OS X sandbox to enhance security
      • Macros and its types in C - GeeksforGeeks
      • Main classes
      • Make Beautiful Desktop Applications in C++
      • Make your own GUI apps in C++ (with ImGui and Vulkan) - YouTube
      • Maker's Schedule, Manager's Schedule
      • Making Flow – Interview with director Gints Zilbalodis
      • Making Sense of Hexdump SUSE Communities
      • MAL software saved “Revolver” mix – The Daily Beatle
      • Mamba - a replacement for Transformers - YouTube
      • Managing ArXiv RSS Feeds in Emacs Chris Cundy
      • Mandatory Premarital HIV Testing Political Exploitation of the AIDS Epidemic — Tulane Law Review
      • Mandela Effect: Examples and explanation
      • Manifold -- from Wolfram MathWorld
      • Maps in C++ (std::map and std::unordered_map) - YouTube
      • Marc Andreessen - Wikipedia
      • Marching cubes - Wikipedia
      • Market Maker Definition: What It Means and How They Make Money
      • Mass X-odus professionals desert Elon Musk’s network
      • Matching CUDA arch and CUDA gencode for various NVIDIA architectures - Arnon Shimoni
      • Math Behind CNNs for Image Processing | Svitla Systems
      • Mathematics for the adventurous self-learner
      • MathÎŁtral Mistral AI Frontier AI in your hands
      • matrices - How to rotate the positions of a matrix by 90 degrees - Mathematics Stack Exchange
      • Matrix decomposition - Wikipedia
      • Matrix decompositions and latent semantic indexing
      • mattdesl
      • Max-Heapify A Binary Tree Baeldung on Computer Science
      • Maximizing training throughput using PyTorch FSDP PyTorch
      • Maximum cut and related problems - Proofs, beliefs and algorithms through the lens of Sum of Squares
      • Maximum subarray problem - Wikipedia
      • MBROLA - Wikipedia
      • MCP Security Notification: Tool Poisoning Attacks
      • Measuring perception in AI models
      • Media type - Wikipedia
      • MediaPipe Holistic — Simultaneous Face, Hand and Pose Prediction, on Device – Google Research Blog
      • Medical Algorithms Are Failing Communities Of Color Health Affairs
      • Medusa Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads — Together AI
      • Medusa: Simple framework for accelerating LLM generation with multiple decoding heads
      • Meet Your New Assistant Meta AI, Built With Llama 3 Meta
      • Mel Frequency Cepstral Coefficient (MFCC) tutorial - Practical Cryptography
      • Memory-mapped files | .NET
      • Merkle–DamgĂ„rd construction - Wikipedia
      • Message Passing Interface - Wikipedia
      • Message Passing Interface :: High Performance Computing
      • Meta AI Research Topic - No Language Left Behind
      • Meta fires staff for abusing $25 meal credits
      • Meta is getting ready for post-quantum cryptography - Engineering at Meta
      • Meta lays off employees across multiple teams TechCrunch
      • Meta PyTorch Team 2024 H2 Roadmaps - PyTorch Developer Mailing List
      • Meta Rolls Out Multimodal Llama 3.2 — But Not in Europe - Slator
      • Meta won't bring future multimodal AI models to EU
      ‱ meta-llama/Llama-3.2-11B-Vision · Why EXACTLY this model is not available in Europe
      • Meteor Lake - Wikipedia
      • MichaƂ Zalewski - Wikipedia
      • Microsoft joins OpenAI’s board with Sam Altman officially back as CEO - The Verge
      • Microsoft TypeScript Devs Explain Why They Chose Go Over Rust, C#
      • Microsoft’s new chip looks like science fiction
      • microsoft/Phi-4-multimodal-instruct · Hugging Face
      • MIME types (IANA media types) - HTTP MDN
      • Min Heap in Python - GeeksforGeeks
      • Minerva Solving Quantitative Reasoning Problems with Language Models
      • MIPS architecture - Wikipedia
      • Mistral NeMo Mistral AI Frontier AI in your hands
      • MIT 6.S091 Introduction to Deep Reinforcement Learning (Deep RL) - YouTube
      • MIT CSAIL Spoken Language Systems Group - Publications
      • Mitchell and Webb at the Footlights 1995 - YouTube
      • Mixed-Precision Training of Deep Neural Networks | NVIDIA Technical Blog
      • Mixtral of experts Mistral AI Open-weight models
      • Mixture of Experts Explained
      • MLOps Basics Week 3 Data Version Control - DVC – Raviraja's Blog
      • MLOps Basics Week 4 Model Packaging - ONNX – Raviraja's Blog
      • MLOps Basics Week 6 CICD - GitHub Actions – Raviraja's Blog
      • MLOps Basics Week 7 Container Registry - AWS ECR – Raviraja's Blog
      • MLOps guide
      • mmap — Memory-mapped file support — Python 3.12.7 documentation
      • Mnemonic - Wikipedia
      • Model Context Protocol has prompt injection security problems
      • Model Spec (2024/05/08)
      • Models and libraries - Meta AI
      • Modular exponentiation - Wikipedia
      • Modular form - Wikipedia
      • Modular programming - Wikipedia
      • Modularity theorem - Wikipedia
      • Monk S8 Mr Monk Says Goodbye
      • Monoid - Wikipedia
      ‱ Monthly online ILFC Seminar | RĂ©seau thĂ©matique LIFT 2
      • Monticello - Wikipedia
      • Morphological typology - Wikipedia
      • MosaicBERT Pretraining BERT from Scratch for $20 Databricks Blog
      • moshi.chat
      • Motivation & Vision - Thorsten Voice
      • Movie, Release date between 1993-01-23 and 2024-08-21, Number of votes at least 5000 (Sorted by User rating Descending)
      • moviebarcode
      • Mozilla Foundation - Training Data for the Price of a Sandwich
      • MQM (Multidimensional Quality Metrics) – The place to go to learn about MQM
      • Multi node PyTorch Distributed Training Guide For People In A Hurry
      • Multimodal Mastery The Qwen Audio Foundation Models for Advanced Audio Understanding and Reasoning by Deepak Babu P R Medium
      • Multimodal Mastery: The Qwen Audio Foundation Models for Advanced Audio Understanding and Reasoning
      • Multimodal Neurons in Artificial Neural Networks
      • MultiNLI
      • Multiprocessing VS Threading VS AsyncIO in Python - Lei Mao's Log Book
      • Music Transformer: Generating Music with Long-Term Structure
      • musictheory.net - Lessons
      • Musk Inc is under serious threat
      • MuST-C a multilingual corpus for speech translation by Mattia Di Gangi Machine Translation @ FBK Medium
      • Mutable vs Immutable Objects - ChatGPT
      • Mutual exclusion - Wikipedia
      • MVC - MDN Web Docs Glossary: Definitions of Web-related terms | MDN
      • My deep learning rig – Non_Interactive – Software & ML
      ‱ My French colleague used the word finitions. What could he be mistranslating? We speak Italian as well, so consider mistranslations from Italian too
      • Named entity recognition NLP-progress
      • Named entity recognition with Bert
      • National Security Agency/Central Security Service Web Site
      • Navigating the Challenges and Opportunities of Synthetic Voices
      • Nearly-Optimal Mergesorts Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs
      • Neocities
      • Nerd Fonts - Iconic font aggregator, glyphsicons collection, & fonts patcher
      • Network
      • Network socket - Wikipedia
      • Neural Audio Codecs & (Residual) Vector Quantization | Francesco Cariaggi
      • Neural encoding of sound - Wikipedia
      • NeurIPS SAS 2020
      • Neuroverse
      • New embedding models and API updates OpenAI
      • New LLM Pre-training and Post-training Paradigms
      • New open source field of study classifier S2FOS AI2 Blog
      • newb question: why is println! a macro?
      • NeXT: Improved reasoning, OCR, and world knowledge
      • Nick Bostrom - Wikipedia
      • Nick Szabo - Wikipedia
      • Nigel Farage distances himself from Elon Musk on Tommy Robinson
      • NIGHTMARE ON ELM DRIVE Vanity Fair October 1990
      • Nike + Run Club Lisboa – NiT
      • Ninja, a small build system with a focus on speed
      • NLP From Scratch: Classifying Names with a Character-Level RNN — PyTorch Tutorials 2.6.0+cu124 documentation
      • NLP’s word2vec Negative Sampling Explained Baeldung on Computer Science
      • NLTK Sample usage for wordnet
      • Noisy speech database for training speech enhancement algorithms and TTS models
      • Norberto Lobo Trio - Galeria ZĂ© dos Bois
      • Not so fast, Mr. Fourier!
      • Now and Then (Beatles song) - Wikipedia
      • NP-hardness - Wikipedia
      • Null References: The Billion Dollar Mistake
      • NVIDIA CUDA Compiler Driver
      • Nvidia is in danger of losing its monopoly-like margins
      • NVIDIA RTX 3090 vs RTX A6000 Consumer vs. Professional
      • Nvidia-backed CoreWeave picks up $650 million credit line
      • NYU Computer Science Department
      • Observed information - Wikipedia
      • Obsidian Roadmap
      • Obsidian.md: The Good Parts [Part 2]
      • Odia language - Wikipedia
      • Official Top 250 Narrative Feature Films
      • Oh Shit, Git!?!
      • Okapi BM25 - Wikipedia
      ‱ OLMo Open Language Model. A State-Of-The-Art, Truly Open LLM and
 by AI2, Feb 2024, AI2 Blog
      • OmniParser V2: Turning Any LLM into a Computer Use Agent - Microsoft Research
      • On undoing, fixing, or removing commits in git
      • On-Demand Content on Twitch
      • One-time pad - Wikipedia
      • One-way compression function - Wikipedia
      • One-way function - Wikipedia
      • Online Safety Act: explainer
      • Opaque data type - Wikipedia
      • Open-source DeepResearch – Freeing our search agents
      • OpenAI gets $4 billion revolving credit line on top of latest funding
      • OpenAI Introduced Chat Markup Language (ChatML) Based Input To Non-Chat Modes by Cobus Greyling Medium
      • OpenAI Platform
      • OpenAI raises at $157 billion valuation; Microsoft, Nvidia join round
      • OpenAI reportedly developing new strategies to deal with AI improvement slowdown TechCrunch
      • OpenAI wants to make a walking, talking humanoid robot smarter Popular Science
      • OpenAI's board approached Anthropic CEO about top job and merger Reuters
      ‱ openai/whisper-large-v3 · Hugging Face
      • OpenStax | Free Textbooks Online with No Catch
      • Operating system - Wikipedia
      • Operation -- from Wolfram MathWorld
      • Operation (mathematics) - Wikipedia
      • Optimizing AI Inference at Character.AI
      • Optimizing builds with cache management Docker Docs
      • Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch
      • Option & unwrap - Rust By Example
      • Opus (audio format) - Wikipedia
      • Oracle Solaris - Wikipedia
      • Orthonormal Basis -- from Wolfram MathWorld
      • OSINT Framework
      • ƍtoro.net
      • Our next generation Meta Training and Inference Accelerator
      • Over 1.5 TB’s of Labeled Audio Datasets by Christopher Dossman Towards Data Science
      • OverOptimization | RLHF Book by Nathan Lambert
      • Overview
      • Overview of Pi-hole - Pi-hole documentation
      • P versus NP problem - Wikipedia
      • p-adic number - Wikipedia
      • P.862.2 Wideband extension to Recommendation P.862 for the assessment of wideband telephone networks and speech codecs
      • Package Formats - Python Packaging User Guide
      • Paolo Sorrentino - Wikipedia
      • Paper review Hyena Hierarchy Towards Larger Convolutional Language Models by Andrew Lukyanenko Medium
      • Papers Explained 166: Command R Models
      • Papers with Code - Focal Loss Explained
      • Parallel Thread Execution - Wikipedia
      • Password Hashing Competition
      • Password Managers.
      • Pasta alla genovese
      • Pasta alla Genovese (Pasta With Neapolitan Beef and Onion RagĂč)
      • Patronus AI Introducing CopyrightCatcher, the first Copyright Detection API for LLMs
      • PEP 409 – Suppressing exception context peps.python.org
      • PEP 508 – Dependency specification for Python Software Packages peps.python.org
      • PEP 3104 – Access to Names in Outer Scopes peps.python.org
      • PEP 3134 – Exception Chaining and Embedded Tracebacks peps.python.org
      ‱ For the immediate restoration of access to Project Gutenberg - AIB WEB
      • Percent-encoding - Wikipedia
      • Performance and Scalability How To Fit a Bigger Model and Train It Faster
      • Performance per watt - Wikipedia
      • Perlin Noise: A Procedural Generation Algorithm
      • PhD Students - InDeep - ILLC UvA
      • Phi-2 The surprising power of small language models
      • Phil Woodland Department of Engineering
      • Phone (phonetics) - Wikipedia
      • Phone bans in schools don't help grades or health, study suggests
      • Phonemic vs Phonetic Transcription | Phonetics
      • Phonetics - Wikipedia
      • Phonetics vs. Phonology
      • Photos — kennethreitz.org
      • Picterra - Geospatial AI solutions for a sustainable future
      • Pier Giacomo Castiglioni - Wikipedia
      • Pigz – Compress And Decompress Files In Parallel In Linux
      • Ping (networking utility) - Wikipedia
      • Pipelines & Prompt Optimization with DSPy Drew Breunig
      • Pipelining
      • pipx
      • Pitch accent (intonation) - Wikipedia
      • Pitch-accent language - Wikipedia
      • Play and Record Sound with Python — python-sounddevice, version 0.5.1
      • Pleias
      • Podcast | Neuroverse
      • PoincarĂ© conjecture - Wikipedia
      • Pointer (computer programming) - Wikipedia
      • Polars vs. pandas: What’s the Difference? | The PyCharm Blog
      • Policy Gradient Algorithms
      • Polymathic
      • PortAudio - an Open-Source Cross-Platform Audio API
      • Postgres.app – the easiest way to get started with PostgreSQL on the Mac
      • Pre-trained models for text-to-speech - Hugging Face Audio Course
      • Premarital medical examination - Wikipedia
      • Prepping for post-quantum: a beginner’s guide to lattice cryptography
      • Primality test - Wikipedia
      • Privacy in Statistics and Machine Learning - Adam Smith
      • Private use area (PUA) characters and End-user-defined characters (EUDCs)
      • Problems by Year
      • Procedural Knowledge in Pretraining Drives LLM Reasoning Laura’s AI research blog
      • Processing and narrating a video with GPT's visual capabilities and the TTS API | OpenAI Cookbook
      • Processor register - Wikipedia
      • Programming paradigm - Wikipedia
      • Proofs, beliefs and algorithms through the lens of Sum of Squares
      • Propagation delay - Wikipedia
      • Protocol Buffers Documentation
      • Publications Hosein Mohebbi
      • Purple Llama CyberSecEval A benchmark for evaluating the cybersecurity risks of large language models Research - AI at Meta
      • Purple prose - Wikipedia
      • Pushing the frontiers of audio generation - Google DeepMind
      • Putting the "You" in CPU
      ‱ pyannote/pyannote-audio Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
      • Pyre | Pyre
      • Python __sizeof__() Method – Be on the Right Side of Change
      • python - AdamW and Adam with weight decay - Stack Overflow
      ‱ Python Dictionaries are Ordered now, but how and why by Pavan Skipo Junior Dev Medium
      • Python Generated Code Guide Protocol Buffers Documentation
      • Python internals Arbitrary-precision integer implementation Artem Golubin
      • Python JSON load() and loads() for JSON Parsing
      • Python Linked List - GeeksforGeeks
      • Python Object Graphs — objgraph 3.6.2 documentation
      • PyTorch internals ezyang’s blog
      • PyTorch internals : ezyang’s blog
      • PyTorch Native Architecture Optimization torchao PyTorch
      ‱ Q-Former. The ability to seamlessly integrate and
 by Abdulkader Helwan Dec, 2023 Medium
      ‱ Q-Former. The ability to seamlessly integrate and
 by Abdulkader Helwan Medium
      • Qing dynasty - Wikipedia
      • Quechuan languages - Wikipedia
      ‱ Those who subtitled “Lost” in Italy
      • rabbit failed to properly reset all keys emails can be sent from rabbit.tech domains
      • RAID - Wikipedia
      • RAII - cppreference.com
      • RAII - Rust By Example
      • Random number generation
      • Random subspace method - Wikipedia
      • Rank (linear algebra) - Wikipedia
      ‱ Ranking All 108 GNU/Linux Coreutils Commands - GNU Coreutils Tier List - YouTube
      • Ray Tracing in One Weekend
      • Read Inside The Python Virtual Machine
      • Read Intermediate Python
      • Read Spotify (SPOT) CEO Daniel Ek's full memo on latest layoffs
      • Real-Time Messaging Protocol - Wikipedia
      • realis - Wiktionary, the free dictionary
      • Rebus - Wikipedia
      • Red Blob Games Hexagonal Grids
      ‱ Redpajama-Data-v2 is Incredible r/LocalLLaMA
      • RedTeam Arena
      • Reduced instruction set computer - Wikipedia
      • Refactoring - Dive Into Python 3
      • Regex engine internals as a library - Andrew Gallant's Blog
      • Regex Tutorial - POSIX Bracket Expressions
      • Regex Tutorial - Unicode Characters and Properties
      • Register-transfer level - Wikipedia
      • Rejection Sampling
      • Relevance in keyword search (BM25 scoring)
      • Repository limitations and recommendations
      • Republic of Rose Island - Wikipedia
      • Republic of Venice - Wikipedia
      • Requirements File Format - pip documentation v23.3.1
      • Research Papers in January 2024
      • Researchers Prove Rabbit AI Breach By Sending Email to Us as Admin
      • Resource acquisition is initialization - Wikipedia
      • Retrieval Augmented Generation Streamlining the creation of intelligent natural language processing models
      • Reverse Engineering TicketMaster's Rotating Barcodes (SafeTix)
      • Revisiting Feature Prediction for Learning Visual Representations from Video Research - AI at Meta
      • Richard E. Bellman - Wikipedia
      • Richard Stallman - Wikipedia
      • Riemann hypothesis - Wikipedia
      • Right to Left (R2L) Integer Tokenization
      • Ring (mathematics) - Wikipedia
      • RIPEMD - Wikipedia
      • ripgrep is faster than {grep, ag, git grep, ucg, pt, sift} - Andrew Gallant's Blog
      • RISC-V - Wikipedia
      • RLHF Reinforcement Learning from Human Feedback
      • ROCStories and the Story Cloze Test
      • ROS: Home
      • Round-trip delay - Wikipedia
      • Royal Game of Ur - Wikipedia
      • RPC vs REST - Difference Between API Architectures - AWS
      • RSA (cryptosystem) - Wikipedia
      • RSA vs ECC: which one is better, and why?
      • RSA: a simple and easy-to-read implementation « Python recipes « ActiveState Code
      • rsrch space
      • RTX A6000 vs RTX 3090 Deep Learning Benchmarks Lambda
      • Rubik's Cube group - Wikipedia
      • Run-length encoding - Wikipedia
      • Runes - Wikipedia
      • Runge-Kutta method — ESE Jupyter Material
      • Rust for ML?
      • RWKV Open Source Development Blog Substack
      • Sadhika Malladi
      • Sakana AI
      • Salary Benchmarking Carta
      • Sam Altman explains being fired and rehired by OpenAI - The Verge
      • Sampling for Text Generation
      • Sarmad Masud - Curtis Brown
      • Satoshi Nakamoto - Wikipedia
      • Scaling ChatGPT Five Real-World Engineering Challenges
      • Scaling Laws – O1 Pro Architecture, Reasoning Training Infrastructure, Orion and Claude 3.5 Opus “Failures”
      • Scaling Monosemanticity Extracting Interpretable Features from Claude 3 Sonnet
      • Scalpers Reverse-Engineer Ticketmaster's 'Non-Transferrable' Tickets
      • Schema (psychology) - Wikipedia
      • Schnorr signature - Wikipedia
      • Science’s genius complex Dirk Hovy
      • Scientists on Bluesky - Influential Members of the Science Community
      • Scientists on Bluesky - What is this
      • Scope (C++)
      • Scriptio continua - Wikipedia
      • Searchable Linux Syscall Table for x86_64
      ‱ SeamlessM4T - Introducing a foundational multimodal model for speech translation
      • Selection (linguistics) - Wikipedia
      • Selection Sort Algorithm - GeeksforGeeks
      • Self-Supervised Representation Learning Lil'Log
      • Semantic Scholar - Academic Graph API
      • Semantic security - Wikipedia
      • SemCor – sense-tagged English corpus Sketch Engine
      • SemEval-2007
      • Senator Hawley Proposes Jail Time for People Who Download DeepSeek
      • Senior Data Scientist
      • Senior Decision Scientist
      • SentencePiece Python binding structure - Codeium Chat - fcQnqWJZdoODeNAk78jYFqALIsPDcY20
      • SentencePiece README
      • Sentinel value - Wikipedia
      • Sequence Modeling with CTC
      • Series Funding A, B, and C
      • Sha (Cyrillic) - Wikipedia
      • SHA-1 - Wikipedia
      • SHA-2 - Wikipedia
      • SHA-3 - Wikipedia
      • Share a dataset to the Hub
      • Shared space · deep-spinwiki Wiki
      • ShareGPT lets you easily share your ChatGPT conversations TechCrunch
      • Sharing new research, models, and datasets from Meta FAIR
      • Sharpened Cosine Distance as an Alternative for Convolutions rpisoni.dev
      • Shell Ninja Mastering the Art of Shell Scripting Roland Huß
      ‱ Should I be leaving my MacBook plugged in at 100% to ensure battery health r/macbookpro
      • Should I Open Source my Company
      • SHRDLU
      • Sieve of Eratosthenes - Wikipedia
      • Signal (IPC) - Wikipedia
      ‱ Meaning of “Je disparais dans tes bras” by Christine and the Queens
      • Simplified Wrapper and Interface Generator
      • Slurm Usage Guidelines for SARDINE Servers · deep-spinwiki Wiki
      • Slurm Workload Manager - Quality of Service (QOS)
      • Slurm Workload Manager - Quick Start User Guide
      • Slurm Workload Manager - sbatch
      • Small Web
      • Smart contract - Wikipedia
      • Smashing The Stack For Fun And Profit
      • Snooze: a simpler cron
      • SOAP vs REST - Difference Between API Technologies - AWS
      • Sofía Valdés Flaunt Premiere “Little Did I Know”
      • Softmax function - Wikipedia
      • Software rot - Wikipedia
      • SOLID - Wikipedia
      • SolidGoldMagikarp (plus, prompt generation) — LessWrong
      • Solving Least-Squares Regression with Missing Data · Its Neuronal
      • Someday, by Spike Jonze | AirPods 4 with Active Noise Cancellation
      • Something Is Rotten in the State of Cupertino
      • Something weird is happening with LLMs and chess
      • Sonal Sannigrahi
      • Sorting Algorithms Animations ToptalÂź
      • SoundStorm Efficient parallel audio generation – Google Research Blog
      • SoundStream An End-to-End Neural Audio Codec – Google Research Blog
      • Spearman's rank correlation coefficient - Wikipedia
      • Special Builtins (Bash Reference Manual)
      • Special parameters and shell variables [Bash Hackers Wiki]
      • Speculative Sampling Jay Mody
      • Speech disfluency - Wikipedia
      • Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between
      • Speech Research
      • SpeechBrain Open-Source Conversational AI for Everyone
      • Speeding Up gzip Compression Baeldung on Linux
      • Splitting a subfolder out into a new repository - GitHub Docs
      • Spoken Language Modeling - Task 4
      • Spoken Language Modeling - Task 4 - ZeroSpeech
      • Sponge function - Wikipedia
      • Spotify for Creators Terms and Conditions of Use
      • Spotify’s AI Voice Translation Pilot Means Your Favorite Podcasters Might Be Heard in Your Native Language — Spotify
      • Square (cipher) - Wikipedia
      • SSH login without password
      • Stable Code 3B Coding on the Edge — Stability AI
      • Stable Diffusion 3 Research Paper — Stability AI
      • Stanford CRFM
      • Stanford CS236 Deep Generative Models I 2023 I Lecture 11 - Energy Based Models - YouTube
      • Starship Cross-Shell Prompt
      • Starter pack (meme) - Wikipedia
      • Startups to Follow - Ishan's Cafe
      ‱ stas00/ml-engineering Machine Learning Engineering Open Book
      ‱ stas00/the-art-of-debugging The Art of Debugging
      • State of startup compensation, H2 2023
      • State of the art in Voice Cloning: A review - Marvik
      • Staying safe online with our updated Google Password Manager
      • Steghide
      • Stephen Roberts' Home Page
      • Stippling pictures with Lloyd's algorithm
      • Stop using std::vector wrong
      • STRIVER DSA SHEET DataStructures-Algorithms
      • Stroustrup: C++ Style and Technique FAQ
      • struct — Interpret bytes as packed binary data — Python 3.12.7 documentation
      • Structural equation modeling - Wikipedia
      • StyleTTS2 – open-source Eleven-Labs-quality Text To Speech Hacker News
      • Submission Policy - Interspeech 2024
      • Submissions Transactions of the Association for Computational Linguistics
      • Subword Modeling
      • Sudan rap scene
      • Sum Types Are Coming: What You Should Know
      • Summary of the tokenizers
      • Sums-of-squares for dummies: a view from the Fourier domain – Machine Learning Research Blog
      • Sun Java TV Commercial
      • Sun Microsystems The Spy Java Commercial
      • SUPERB Benchmark
      • SUPERB Benchmark Leaderboard
      • SuperGLUE Benchmark
      • Superintelligence: Futurology vs. Science - Yoshua Bengio
      • Supremum vs Maximum
      • Syllabary - Wikipedia
      • Syncthing
      • Syntactic Structures - Wikipedia
      • SynthID - Google DeepMind
      • syscalls(2) - Linux manual page
      • System resource - Wikipedia
      • T-Shaped People and Academia
      • TaL Corpus - UltraSuite Repository
      • Talkpal Review Our Insider Tips and Verdict 2024
      • Tamasheq language - Wikipedia
      • tar (computing) - Wikipedia
      • Taste the World How Our New Machine Translation Feature Transforms Your Ordering Experience by Ahmad Hamouda & Stefania Russo Medium The Glovo Tech Blog
      ‱ Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu - Wikipedia
      • Tearing Apart Google’s TPU 3.0 AI Coprocessor
      • Template (C++) - Wikipedia
      • Tensor Parallelism
      • Tensor Views — PyTorch 2.3 documentation
      • Termux
      • Termux Wiki Getting Started
      • Terry Winograd - Wikipedia
      • TESCREAL - Wikipedia
      • Tesseract User Manual tessdoc
      • Test harness - Wikipedia
      • Text classification and Naive Bayes
      • Text-to-speech datasets - Hugging Face Audio Course
      • Textbooks - Ishan's Cafe
      • Textless NLP Generating expressive speech from raw audio
      • The "Basics"
      • The “S” in MCP Stands for Security
      ‱ The 1% of scientific publishing Science AAAS
      • The 21-Year-Old Who Destroyed LeetCode (Then Got Expelled)
      • The Age of PageRank is Over | Kagi Blog
      • The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition - Sakana AI
      • The AI Scientist Towards Fully Automated Open-Ended Scientific Discovery
      • The AI We Deserve - Boston Review
      • The Annotated Transformer
      • The Artificial Intelligence Top 50 UK 2024
      • The AT Protocol Bluesky
      • The Basics - PyMuPDF 1.24.10 documentation
      • The Best GPUs for Deep Learning in 2023 — An In-depth Analysis
      • The best ways to help others with your career, compared
      • The Bitter Lesson
      • The Case for Free Online Books (FOBs) Experiences with Operating Systems Three Easy Pieces From A To RemZi
      • The Case for Pull Rebase
      • The Church-Turing Thesis (Stanford Encyclopedia of Philosophy)
      • The complete beginners guide to dynamic programming - Stack Overflow
      • The Concept So Much of Modern Math is Built On | Compactness
      • The continuing rise in suspensions and exclusions - FFT Education Datalab
      • The Curvature of the Manifold of Gaussian Distributions
      • The Dark Net Jamie Bartlett Talks at Google - YouTube
      • The Design of C++ , lecture by Bjarne Stroustrup
      • The Discovery of Penicillin—New Insights After More Than 75 Years of Clinical Use
      • The Discrete Cosine Transform in Action
      • The dos and don’ts of show art The complete guide
      • The Economist’s country of the year for 2024
      • The Editors Protecting Wikipedia from AI Hoaxes
      • The Enduring Mystery of “Moldy Mary” | Tellus
      • The evidence lower bound (ELBO)
      • The fastest and easiest way to install Ruby on a Mac in 2024
      • The Final Barrier to (Nearly) Infinite Energy
      • The first AI model based on Yann LeCun’s vision for more human-like AI
      • The first-ever multilingual model to win WMT, beating out bilingual models
      • The Fourier Series—A Primer
      • The Fraser Lab Method of Following the Scientific Literature
      • The Future of Web Software Is HTML-over-WebSockets
      • The GFlowNet Tutorial
      • The GFlowNets and Amortized Marginalization Tutorial
      • The Government Knows AGI is Coming | The Ezra Klein Show
      • The government’s 80% employment rate target lessons from history and abroad Institute for Fiscal Studies
      • The great chain of being sure about things | The Economist
      • The Gumbel-Max Trick Explained. Softmax’s slicker sibling. by Leonard Tang The Startup Medium
      • The Gumbel-Softmax Distribution – Emma Benjaminson – Mechanical Engineering Graduate Student
      • The History of the Clothes Hanger
      • The Illustrated Retrieval Transformer – Jay Alammar – Visualizing machine learning one concept at a time.
      • The Illustrated Transformer
      • The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
      • The Introduction Of Chat Markup Language (ChatML) Is Important For A Number Of Reasons
      • The KV Cache Memory Usage in Transformers - YouTube
      • The Leopard - Wikipedia
      • The Letterboxd 2024 Year in Review
      • The Linux Kernel Archives
      • The MAESTRO Dataset
      • The MAESTRO Dataset and Wave2Midi2Wave
      • The Model is the Product | Vintage Data
      • The new home of podcasting on Spotify monetize, get discovered, and stand out with video
      • The Nine Books Sam Altman Recommends Everyone Should Read
      • The Official YAML Web Site
      • The Power of Languages
      • The Problem with Reasoners
      • The Rust RFC process does not seem as amazing as I initially thought
      • The Second Perception Test Challenge - ECCV Workshop 2024
      • The Settlers TV review — Louis Theroux returns to the West Bank in potent BBC documentary
      • The Silence of the Lambs (novel) - Wikipedia
      • The small web is beautiful
      • The Stanford Natural Language Inference (SNLI) Corpus
      • The subprocess Module Wrapping Programs With Python – Real Python
      • The Technology Behind BLOOM Training
      • The Textless NLP project
      • The Theoretical Minimum - Wikipedia
      • The tool where experts improve AI models
      • The Transformer Family
      • The Transformer Family Lil'Log
      • The Transformer Family Version 2.0 Lil'Log
      • The trial of Lucy Letby has shocked British statisticians
      • The Ultimate Machine Learning Engineer Career Path for 2024
      • The Unreasonable Effectiveness of Recurrent Neural Networks
      • The Unreasonable Syntactic Expressivity of RNNs · John Hewitt
      • The Warmup Trick for Training Deep Neural Networks
      • The Winograd Schema Challenge - Ernest Davis, Leora Morgenstern, and Charles Ortiz
      • The Wire - Wikipedia
      • The Zero Resource Speech Benchmark (series)
      • They Said It Couldn’t Be Done
      • This ‘College Protester’ Isn’t Real. It’s an AI-Powered Undercover Bot for Cops
      • This Is the Data Facebook Gave Police to Prosecute a Teenager for Abortion
      • This Is Why Vegetables Taste Better In Restaurants - YouTube
      ‱ Thoughts on Google Password Manager r/Lastpass
      • Thread by @karpathy
      • Tiktokenizer
      • Timeline of the London Underground - Wikipedia
      • Tiny but mighty The Phi-3 small language models with big potential
      • TinyChat: Large Language Model on the Edge
      • Tiredness and Diabetes
      • tmux shortcuts & cheatsheet
      • tmux(1) - Linux manual page
      • Tokenizer
      • TOML: English v1.0.0
      • Tony Blair - Why Political Leaders Keep Failing at Major Change
      • Too many adults are absolutely clueless
      • Top 30 Cloud GPU Providers & the GPUs They Offer in 2024
      • Top Artificial Intelligence Companies in London - Dec 2024 Reviews GoodFirms
      • Topic 34: Things You Need to Know About Inference
      • Topological Space -- from Wolfram MathWorld
      • torch.nn.functional.pad — PyTorch 2.6 documentation
      • torch.Tensor.view — PyTorch 2.3 documentation
      • torchaudio.pipelines — Torchaudio 2.2.0.dev20240418 documentation
      • torchtune Easy and Accessible Finetuning in Native PyTorch - Evan Smothers, Meta - YouTube
      • TorToiSe Architectural Design Doc – Non_Interactive – Software & ML
      • Touchstone (assaying tool) - Wikipedia
      • Toy Models of Superposition
      • traceroute - Wikipedia
      • Trae IDE
      • Training a new tokenizer from an old one - Hugging Face NLP Course
      • Training and fine-tuning large language models - Borealis AI
      • Transcreation - Wikipedia
      • Transformer Inference Arithmetic kipply's blog
      • Transformer: A Novel Neural Network Architecture for Language Understanding
      • Transformers from scratch peterbloem.nl
      ‱ Transformers Illustrated!. I was greatly inspired by Jay Alammar’s
 by Tamoghna Saha Medium
      • Transforming the future of music creation - Google DeepMind
      • Trapdoor function - Wikipedia
      • Trie - Wikipedia
      • Trump Can Keep America’s AI Advantage - WSJ
      • Trump echoes Russia as he flips US position on Ukraine
      • Truncation Sampling as Language Model Desmoothing · John Hewitt
      • Tsinghua University - Wikipedia
      • Turkish verb 'selamlamak' conjugated
      • Turn On or Off Color Syntax Highlighting In vim Editor
      • Turning Google smart speakers into wiretaps for $100k
      • Tutorial | Semantic Scholar Academic Graph API
      • Twisted Edwards curve - Wikipedia
      • Two interviews with the founder of DeepSeek — LessWrong
      • Type system - Wikipedia
      • UAX 44 Unicode Character Database
      • UK universities see drop in foreign student visa applications
      • Ukraine is now struggling to survive, not to win
      • Undergraduate Disproves 40-Year-Old Conjecture, Invents New Kind of Hash Table
      • Understanding AES Encryption Modes: AES-GCM, AES-CBC, AES-CTR
      • Understanding CUDA Memory Usage — PyTorch 2.5 documentation
      • Understanding DRAM | Tech Talk | Simms International
      • Understanding Dust Limits | Magic Eden Help Center
      ‱ Understanding FAANG Leveling r/leetcode
      • Understanding Git commit SHAs
      • Understanding GitHub Actions - GitHub Docs
      • Understanding GPU Memory 1 Visualizing All Allocations over Time
      • Understanding GPU Memory 2: Finding and Removing Reference Cycles
      • Understanding GRU Networks
      • Understanding LSTM Networks -- colah's blog
      • Understanding Okapi BM25 A Guide to Modern Information Retrieval - Association of Data Scientists
      • Understanding SentencePiece (UnderStanding_SentencePiece) by Jacky Medium
      • Understanding the Python Traceback – Real Python
      • Undertrained tokens in DeepSeek R1
      • Unicode - Wikipedia
      • Unicode – a brief introduction (advanced)
      • Unicode Basic Multilingual Plane (BMP)
      • Unicode character property - Wikipedia
      • Unicode Glossary
      • Unicode HOWTO
      • Unified Transcription and Translation for Extended Reality UTTER
      • Unit testing - Dive Into Python 3
      ‱ UnitedHealth uses AI model with 90% error rate to deny care, lawsuit alleges - Ars Technica
      • Universal Speech Model (USM) State-of-the-art speech AI for 100+ languages
      • Universally unique identifier - Wikipedia
      • UniversitĂ€t Hildesheim | Fachbereich 3: Sprach- und Informationswissenschaften | Institut fĂŒr Übersetzungswissenschaft & Fachkommunikation | UCCTS2025
      • Unix - Wikipedia
      • Unix domain socket - Wikipedia
      • Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate
      • Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate – Google Research Blog
      • unpaper-basic-concepts
      • Unsupervised Feature Learning and Deep Learning Tutorial
      • Unsupervised machine translation: A novel approach to provide fast, accurate translations for more languages
      • Unsupervised speech-to-speech translation from monolingual data
      • Unsupervised speech-to-speech translation from monolingual data – Google Research Blog
      • Upload files to the Hub
      • Uploading datasets
      • URGENT Challenge
      • US tech stocks partly recover after Trump says DeepSeek AI chatbot is ‘wake-up call’
      • Use the Tools Available · C++ Best Practices
      • User space and kernel space - Wikipedia
      • userland a book about the command line for humans
      • Using AI to compress audio files for quick and easy sharing
      • Using AI to find post-quantum cryptography’s vulnerabilities
      • Using Bluesky posts as blog comments
      • Using gzip and gunzip in Linux Baeldung on Linux
      • Using unwrap() in Rust is Okay - Andrew Gallant's Blog
      • UTF-8 - Wikipedia
      • UTF-8 Everywhere
      • UTM Virtual machines for Mac
      • UTM parameters - Wikipedia
      • V-JEPA The next step toward advanced machine intelligence
      • VĂĄclav Volhejn’s page
      • Valentini Noisy Speech Database
      • Valgrind: About
      • Vaporware - Wikipedia
      • Variadic function - Wikipedia
      • Variational autoencoder - Wikipedia
      • VCR: Visual Commonsense Reasoning
      • VDTTS Visually-Driven Text-To-Speech – Google Research Blog
      • Vector Basis -- from Wolfram MathWorld
      • Vector projection - Wikipedia
      • Vector space - Wikipedia
      • Vector space classification
      • Vector Space Projection -- from Wolfram MathWorld
      • Vector Space Span -- from Wolfram MathWorld
      • Versioning and formatting your Python code
      ‱ Vicuna An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality LMSYS Org
      • Video generation models as world simulators
      • Video Lectures | Structure and Interpretation of Computer Programs | Electrical Engineering and Computer Science
      • View Datasets provided by MLCommons
      • Violetear - Wikipedia
      • Vipul Ved Prakash - Wikipedia
      • Virtual memory - Wikipedia
      • Vision Transformer in pure JAX.
      • Vitalik Buterin - Wikipedia
      • VizSeq
      • vLLM Easy, Fast, and Cheap LLM Serving with PagedAttention vLLM Blog
      • vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention
      • VoiceBox
      • Volsci - Wikipedia
      • Von Neumann architecture - Wikipedia
      • VoxPopuli: The largest open multilingual speech corpus for AI translation and more
      • WALS Online - Chapter Coding of Evidentiality
      • WALS Online - Chapter Semantic Distinctions of Evidentiality
      • Wassermann Before Wedding Bells Premarital Examination Laws in the United States, 1937–1950 Social History of Medicine Oxford Academic
      • Wasserstein GAN implementation in TensorFlow and Pytorch
      • Wav2vec 2.0 Learning the structure of speech from raw audio
      • Wav2Vec2 - Model card - Hugging Face
      • Web Browser Market Share In 2025: 85+ Browser Usage Statistics
      • Weight units - Bitcoin Wiki
      • Weights & Biases
      • Welcome to Minicrypt
      • Welcome to the Orion User Guide! | Kagi's Docs
      • Wes McKinney - Apache Arrow and the “10 Things I Hate About pandas”
      • Wget Command in Linux with Examples
      • What and Where Are the Memory Stack and Heap Baeldung on Computer Science
      • What are build systems?
      • What are Linux System Calls and how hackers follow them to understand Executables - YouTube
      • What are Liquid Neural Networks? - viso.ai
      • What Are Naïve Bayes Classifiers IBM
      • What are the problems facing Scottish universities?
      • What can I do with SpeechBrain — SpeechBrain 0.5.0 documentation
      • What does optimizer.step() do in PyTorch?
      • What Every Programmer Absolutely, Positively Needs to Know About Encodings and Character Sets to Work With Text
      • what happens when your CPU has a bug? (GhostWrite) - YouTube
      • What I learned from competing against a ConvNet on ImageNet
      • What I Wish I Knew When I Was Younger - YouTube
      • What Is a Bid-Ask Spread, and How Does It Work in Trading?
      • What is a Makefile and how does it work Opensource.com
      • What is a REST API? IBM
      • What is a Variational Autoencoder? IBM
      • What is a woman? Britain’s Supreme Court gives its answer
      • What Is AI TOPS? How It Differs from TeraFLOPS.
      • What Is BM25 (Best Match 25) Full Breakdown - Luigi's Box
      ‱ What Is ChatGPT Doing 
 and Why Does It Work?—Stephen Wolfram Writings
      • What is chuchotage?
      • What is collaborative filtering - IBM
      • What is Elon Musk getting up to with America’s payment system?
      • What is Google Zanzibar?
      • What is information retrieval IBM
      • What is Ownership? - The Rust Programming Language
      • What is pgvector, and How Can It Help You EDB
      • What is Remote Code Execution (RCE)? | CrowdStrike
      • What is remote code execution?
      • What is retrieval-augmented generation IBM Research Blog
      • What Is SwiGLU How to Implement It And Why Does it Work
      • What is the difference between FP16 and BF16 Here a good explanation for you by Furkan GözĂŒkara - PhD Computer Engineer, SECourses Medium
      • What is the difference between webk.telegram.org and webz.telegram.org ?
      • What Is Unit Testing? (Definition, Benefits, How-To)
      • What skills do employers want Prospects.ac.uk
      • What was the Golden Age of Antibiotics, and how can we spark a new one?
      • What's the Best Language for App Development
      • What's the difference between a tonal language and a pitch accent language?
      • What’s the endgame of neuroAI?
      • WhatsApp MCP Exploited: Exfiltrating your message history via MCP
      • When does generative AI qualify for fair use?
      • When Nanoseconds Matter: Ultrafast Trading Systems in C++ - David Gross - CppCon 2024 - YouTube
      • Where Students Have Had Their Visas Revoked
      • who and w commands are not working - Red Hat Customer Portal
      • Why can TorToiSe be fine-tuned - 152334H
      • Why can't TorToiSe be fine-tuned - 152334H
      • Why Did Jamie Do It? Stephen Graham On the Devastating Adolescence Finale
      • Why does everyone sing it like THAT - YouTube
      • Why does the hostname on my Mac keep changing | Phind
      • Why I attack
      • Why I Don't Like Singletons
      • Why I Write The Orwell Foundation
      • Why Isn't Functional Programming the Norm? – Richard Feldman
      • Why LLMs Are Hitting a Wall The Low-Hanging Fruit Has Been Eaten
      • Why Premature Optimization Is the Root of All Evil - Stackify
      • Why rents are still rising too fast
      • why rust libraries may never exist.
      • Why Use InfoNCE Loss in Self-supervised Learning
      • Why we want insurance executives dead - by Taylor Lorenz
      • Why you should use `python -m pip`
      • Why your AI Code Completion tool needs to Fill in the Middle
      • Wideband audio - Wikipedia
      • Winograd schema challenge - Wikipedia
      • WinoGrande An Adversarial Winograd Schema Challenge at Scale
      • With 10x growth since 2023, Llama is the leading engine of AI innovation
      • With Bluesky, the social media echo chamber is back in vogue
      • wngloss(7WN) WordNet
      • Wolfram User Portal inc Activation Keys
      • Word divider - Wikipedia
      • Word Sense Disambiguation NLP-progress
      • WordNet
      • Workflow syntax for GitHub Actions - GitHub Docs
      • Write Pythonic and Clean Code With namedtuple – Real Python
      • Writing
      • Writing Nicholas Carlini
      • Writing a C Compiler, Part 1
      • Writing better code with pytorch and einops
      • Writing C for curl - daniel.haxx.se
      • Writing Clean Shell Scripts ‱ Dimitri Merejkowsky
      • Writing Distributed Applications with PyTorch — PyTorch Tutorials 2.6.0+cu124 documentation
      • Writing your pyproject.toml - Python Packaging User Guide
      • WSTG - v4.1 OWASP Foundation
      • X-Frame-Options - HTTP MDN
      • x86 - Wikipedia
      • XAI-KG@ESWC2025
      • XLM
      • XLM-R: State-of-the-art cross-lingual understanding through self-supervision
      • XLS-R Self-supervised speech processing for 128 languages
      • XTTS v2 Notes - Machine Learns
      • XTTS-v1 technical notes - Machine Learns
      • YAML Syntax — Ansible Community Documentation
      • YAP.
      • Yes you should understand backprop
      • Yevgeniy Nikulin - Wikipedia
      • Yonatan Belinkov
      • Your Bluesky Posts Are Probably In A Bunch of Datasets Now
      • Your Complete Guide to Spotify Wrapped, 2023 TIME
      • Your guide to the new anti-immigration argument
      • Zenodo - Wikipedia
      • Zero-copy - Wikipedia
      • Zero-Shot Tokenizer Transfer for transferring LLMs to a new tokenizer without any training by SACHIN KUMAR Medium
      • Zero-shot transfer across 93 languages: Open-sourcing enhanced LASER library
      • Zero-Shot Translation with Google’s Multilingual Neural Machine Translation System – Google Research Blog
      • Zeroing out gradients in PyTorch — PyTorch Tutorials 2.6.0+cu124 documentation
      • Zhuang languages - Wikipedia
      • Zig Language | Thoughts After 2 Years
      • zig will change programming forever
      ‱ Homepage · ModelScope Community (魔搭瀟ćŒș)
        • Assembly
        • AWK
        • Bash - Notes
        • Bash - Resources
        • Bash - Snippets
        • C
        • C++
        • Carbon
        • Dart
        • Erlang
        • Go
        • Haskell
        • Java
        • JavaScript
        • Lua
        • Python - Best Practices
        • Python - Internals
        • Python - Notes
        • Python - Resources
        • R
        • Rust
        • Scala
        • Swift, SwiftUI and Developing for macOS
        • TOML
        • TypeScript
        • WASM Web Assembly
        • YAML
        • Zig
      • Algorithms and Data Structures
      • Asynchronous Programming & Concurrency
      • Build Systems
      • Compilers, Interpreters and Binaries
      • Computer Architecture
      • Computer Science
      • Conda
      • Copilot (GitHub Copilot)
      • cron
      • Cryptography and Cybersecurity
      • CUDA
      • Databases and Data Interchange
      • Debugging
      • Development Containers
      • DevOps and MLOps
      • Distributed Computing, Distributed and Multi-GPU Training
      • Documentation (Maintaining Docs)
      • Fuzzing and Fuzzers
      • Git
      • GitHub Actions
      • Globbing
      • Graphs
      • Hardware Acceleration
      • Hugging Face
      • Make
      • MLX
      • Networking and Computer Networks
      • Operating Systems (OS), Kernels, Linux and Unix
      • PyTorch - Functions
      • PyTorch - Notes
      • PyTorch - Resources
      • Questions
      • Regex
      • Reverse Engineering
      • Software Development
      • Software Licences
      • Text Encoding, UTF, ASCII and more
      • tmux
      • Vim
            • Apple Media Services Terms and Conditions
            • Apple Xcode Terms
            • Ask Siri, Dictation & Privacy - macOS
            • iCloud Terms and Conditions - macOS
            • SOFTWARE LICENSE AGREEMENT FOR macOS Sequoia - macOS
            • Contact Forms
            • Object to your information being used for AI at Meta - Email Confirmation
          • Streamlabs Terms and Conditions
        • Advertising
        • Blender
        • Bluesky
        • Chess
        • Cinema
        • Coding Projects for Development
        • Commercial LLMs (inc APIs)
        • Creative Coding
        • Creative Coding Crafts Space (C3S)
        • D3 Health Dashboard
        • Darknet Diaries
        • Data Visualisation
        • Design
        • Diabetes
        • Digital Garden
        • DNS Server
        • Edinburgh Guide
        • Education
        • Effective Use of LLMs
        • Electoral Systems
        • Figma
        • Finance
        • Fitness
        • Flags of the World
        • Flights
        • Fonts
        • Food
        • Goodreads
        • Home Server
        • Housing and Rents
        • Investing
        • Israel-Palestine
        • Journocoders
        • Kagi
        • Kids
        • London Guide
        • MacBook and macOS
        • MacBook Setup Checklist
        • Mental Anchors
        • Model Context Protocol
        • Music
        • Music Theory
        • Music Understanding and Analysis, and Spotify Fun
        • NotebookLM and Automated Podcasting
        • Obsidian
        • Obsidian - Installing Plugins Manually
        • Overview of Company Valuation Methods
        • Palettes
        • Photography
        • Privacy - Staying Secure Online
        • PyTorch's Transformer and Multi-Head Attention Implementation
        • Reading
        • Reading with a Motive vs Reading
        • Semantic Querying of Obsidian
        • Small Web
        • Spaced Repetition Learning
        • Speech LLM-based Language Learning
        • Streaming, Twitch, YouTube, Videography
        • Time Tracking App - Single User, Native Swift
        • Vibe Coding
        • Volts, Watts, Amps
        • Web Browsers
        • Web Development and Building a Website
        • YouTube Automated Uploader
        • Base64 Encoding
        • Bilinear Interpolation
        • ChatML
        • Connectionist Temporal Classification
        • Content Addressability
        ‱ Cosine Similarity vs Pearson Moment Correlation Coefficient
        • Decaying Learning Rate Exponentially when Scaling Batch Size and Base Learning Rate
        • Differential Privacy in Machine Learning and Stats Lectures
        • EinOps
        • Exiting Early from Nested Functions - Case Study with Epoch and Batch-wise Training Loops
        • Expectation Maximisation Algorithm
        • Fisher Information
        • Generating from LLMs
        • Gibberlink
        • Gram Matrix and Linear Regression
        • Graphs Spectral Clustering
        • Hidden Markov Models
        • How many iterations will a training run last?
        • Kalman Filtering
        • Learning Rate Warmup
        • Multiclass vs multilabel classification
        • Sampling for Text Generation, Nucleus Sampling (top-$p$), the need for top-$k$ and Beam Search
        • Typing for PyTorch
        • Vector Projection
        • Vector Quantization
        • Weight Initialisation
        • What are the differences between a digital signature, a MAC and a hash?
        • Whitening, sharpening & smoothing
        • "Why Should I Trust You?": Explaining the Predictions of Any Classifier
        • $\infty$-former: Infinite Memory Transformer
        • $\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
        • $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
        • 100,000 Podcasts: A Spoken English Document Corpus
        • A Bayesian approach to translators' reliability assessment
        • A Bayesian Perspective on Generalization and Stochastic Gradient Descent
        • A Brief Overview of Unsupervised Neural Speech Representation Learning
        • A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
        • A Call for Clarity in Reporting BLEU Scores
        • A Causal Bayesian Networks Viewpoint on Fairness
        • A Closer Look at Few-shot Classification
        • A Closer Look at Spatiotemporal Convolutions for Action Recognition
        • A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos
        • A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models
        • A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
        • A Comprehensive Survey of Machine Translation Approaches
        • A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
        • A Cookbook of Self-Supervised Learning
        • A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories
        • A Diagnostic Study of Explainability Techniques for Text Classification
        • A Generalized EigenGame with Extensions to Multiview Representation Learning
        • A halo model approach for mock catalogs of time-variable strong gravitational lenses
        • A Kernel-Based View of Language Model Fine-Tuning
        • A Large-Scale Evaluation of Speech Foundation Models
        • A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
        • A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
        • A method to convert neural signals into sound sequences
        • A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops
        • A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models
        • A Neural Algorithm of Artistic Style
        • A Neural Probabilistic Language Model
        • A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
        • A practical tutorial on Variational Bayes
        • A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech
        • A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
        • A Primer on Bayesian Neural Networks: Review and Debates
        • A Primer on Causal Analysis
        • A Probabilistic Neuro-symbolic Layer for Algebraic Constraint Satisfaction
        • A Review of Deep Learning Techniques for Speech Processing
        • A Review of Sparse Expert Models in Deep Learning
        • A Simple and Effective $L_2$ Norm-Based Strategy for KV Cache Compression
        • A Simple Framework for Contrastive Learning of Visual Representations
        • A Suite for Acoustic Language Model Evaluation
        • A Survey of Large Language Models
        • A Survey of Mamba
        • A Survey of Visual Transformers
        • A Survey on Evaluation of Large Language Models
        • A Survey on In-context Learning
        • A Survey on Language Models for Code
        • A Survey on Large Language Models for Code Generation
        • A Survey on LLM-as-a-Judge
        • A Survey on Multimodal Large Language Models
        • A Survey on Neural Speech Synthesis
        • A Survey on Retrieval-Augmented Text Generation for Large Language Models
        • A Survey on Speech Large Language Models
        • A Survey on Subgraph Counting: Concepts, Algorithms and Applications to Network Motifs and Graphlets
        • A Tutorial on Fisher Information
        • A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
        • A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning
        • A unified architecture for natural language processing: deep neural networks with multitask learning
        • A unified view of entropy-regularized Markov decision processes
        • A Universal Law of Robustness via Isoperimetry
        • A Vulnerability in Implementations of SHA-3, SHAKE, EdDSA, and Other NIST-Approved Algorithms
        • A Watermark for Large Language Models
        • Accelerating Large Language Model Decoding with Speculative Sampling
        • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
        • Active Data Curation Effectively Distills Large-Scale Multimodal Models
        • Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need
        • Adam-mini: Use Fewer Learning Rates To Gain More
        • Adam: A Method for Stochastic Optimization
        • Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
        • Adapting Language Models to Compress Contexts
        • Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference
        • Adaptive Computation Time for Recurrent Neural Networks
        • Adaptive deconvolutional networks for mid and high level feature learning
        • Adaptive Machine Translation with Large Language Models
        • Adaptive Prototype Learning and Allocation for Few-Shot Segmentation
        • Adaptive Retrieval-Augmented Generation for Conversational Systems
        • Adaptive Semiparametric Language Models
        • Adaptively Sparse Transformers
        • AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data
        • AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style
        • AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
        • AdaSpeech: Adaptive Text to Speech for Custom Voice
        • AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
        • Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation
        • Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
        • Advancing the State of the Art in Open Domain Dialog Systems through the Alexa Prize
        • Adversarial Attacks and Defences: A Survey
        • Adversarial Feature Learning
        • Adversarial NLI: A New Benchmark for Natural Language Understanding
        • AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages
        • Agent Skill Acquisition for Large Language Models via CycleQD
        • AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
        • AI and Memory Wall
        • AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
        • AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline
        • AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale
        • ALBA : Reinforcement Learning for Video Object Segmentation
        • ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
        • Algorithmic Collective Action in Recommender Systems: Promoting Songs by Reordering Playlists
        • Alice's Adventures in a Differentiable Wonderland -- Volume I, A Tour of the Land
        • Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
        • Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
        • AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM
        • Aligning Speech to Languages to Enhance Code-switching Speech Recognition
        • Aligning to Adults Is Easy, Aligning to Children Is Hard: A Study of Linguistic Alignment in Dialogue Systems
        • Alpaca: A Strong, Replicable Instruction-Following Model
        • An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition
        • An Analysis of Energy Consumption and Carbon Footprints of Cryptocurrencies and Possible Solutions
        • An Attention Free Transformer
        • An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
        • An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
        • An Empirical Exploration of Curriculum Learning for Neural Machine Translation
        • An Empirical Study of Mamba-based Language Models
        • An Empirical Study of Translation Hypothesis Ensembling with Large Language Models
        • An Emulator for Fine-Tuning Large Language Models using Small Language Models
        • An End-to-End Transformer Model for 3D Object Detection
        • An engine not a camera: Measuring performative power of online search
        • An Evolved Universal Transformer Memory
        • An Explanation of In-context Learning as Implicit Bayesian Inference
        • An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing
        • An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
        • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
        • An Information-Theoretic Analysis of Self-supervised Discrete Representations of Speech
        • An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition
        • An introduction to graph theory
        • An Introduction to Variational Autoencoders
        • An Introduction to Vision-Language Modeling
        • Analyzing Context Contributions in LLM-based Machine Translation
        • Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing
        • AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
        • Apollo: An Exploration of Video Understanding in Large Multimodal Models
        • Apple Intelligence Foundation Language Models
        • Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
        • Architectures of Topological Deep Learning: A Survey on Topological Neural Networks
        • Are All Good Word Vector Spaces Isomorphic?
        • Are discrete units necessary for Spoken Language Modeling?
        • Are Sixteen Heads Really Better than One?
        • Are We Done with MMLU?
        • Areas of Attention for Image Captioning
        • Arithmetic coding for data compression
        • Artificial Kuramoto Oscillatory Neurons
        • ASIF: Coupled Data Turns Unimodal Models to Multimodal Without Training
        • Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
        • Associative Embedding: End-to-End Learning for Joint Detection and Grouping
        • AST: Audio Spectrogram Transformer
        • Attention as a Guide for Simultaneous Speech Translation
        • Attention Is All You Need
        • Attention-Based Models for Speech Recognition
        • Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
        • Audio-Language Models for Audio-Centric Tasks: A survey
        • AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
        • AudioGen: Textually Guided Audio Generation
        • AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
        • AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
        • AudioLM: a Language Modeling Approach to Audio Generation
        • AudioPaLM: A Large Language Model That Can Speak and Listen
        • AudioX: Diffusion Transformer for Anything-to-Audio Generation
        • Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling
        • Augmented Language Models: a Survey
        • Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
        • Auto-Encoding Variational Bayes
        • Autoregressive Image Generation using Residual Quantization
        • AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
        • Avocodo: Generative Adversarial Network for Artifact-free Vocoder
        • Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
        • Bag of Tricks for Efficient Text Classification
        • Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM
        • Balancing, Regression, Difference-In-Differences and Synthetic Control Methods: A Synthesis
        • BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
        • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
        • BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
        • Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
        • Bayesian Learning for Neural Networks: an algorithmic survey
        • Bayesian Measures of Model Complexity and Fit
        • Benchmarking Attacks on Learning with Errors
        • BERT Learns to Teach: Knowledge Distillation with Meta Learning
        • BERT Rediscovers the Classical NLP Pipeline
        • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
        • BERTScore: Evaluating Text Generation with BERT
        • BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
        • Better & Faster Large Language Models via Multi-token Prediction
        • Better Instruction-Following Through Minimum Bayes Risk
        • Better speech synthesis through scaling
        • Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
        • Beyond Left and Right: The Role of System Trust in COVID-19 Attitudes and Behaviors
        • Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
        • Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
        • Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
        • Big Bird: Transformers for Longer Sequences
        • Big Self-Supervised Models are Strong Semi-Supervised Learners
        • Big Transfer (BiT): General Visual Representation Learning
        • BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
        • Billion-scale semi-supervised learning for image classification
        • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
        • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
        • Blockwise Parallel Decoding for Deep Autoregressive Models
        • Boltzmann Exploration Done Right
        • Boosting Distributed Training Performance of the Unpadded BERT Model
        • Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
        • Bootstrap your own latent: A new approach to self-supervised Learning
        • Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
        • Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
        • BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation
        • Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
        • Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition
        • Building a Time-Aligned Cross-Linguistic Reference Corpus from Language Documentation Data (DoReCo)
        • Building Bridges between Regression, Clustering, and Classification
        • Building Machine Translation Systems for the Next Thousand Languages
        • Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech from Existing Podcast Recordings
        • BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems
        • ByT5 model for massively multilingual grapheme-to-phoneme conversion
        • Byte Latent Transformer: Patches Scale Better Than Tokens
        • Byte Pair Encoding is Suboptimal for Language Model Pretraining
        • Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits
        • Can Automatic Metrics Assess High-Quality Translations?
        • Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
        • Can language models learn from explanations in context?
        • Can Large Language Models Reason and Plan?
        • Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
        • Can Whisper Perform Speech-Based In-Context Learning?
        • Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
        • CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
        • Canonical Capsules: Self-Supervised Capsules in Canonical Pose
        • Careless Whisper: Speech-to-Text Hallucination Harms
        • Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?
        • CAT: Content-Adaptive Image Tokenization
        • Categorical Reparameterization with Gumbel-Softmax
        • Causal inference with Bayes rule
        • Causal Reasoning for Algorithmic Fairness
        • CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
        • CDXFormer: Boosting Remote Sensing Change Detection with Extended Long Short-Term Memory
        • Cem Mil Podcasts: A Spoken Portuguese Document Corpus For Multi-modal, Multi-lingual and Multi-Dialect Information Access Research
        • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
        • Chain-of-Thought Prompting for Speech Translation
        • Character-Aware Neural Language Models
        • Character-level Convolutional Networks for Text Classification
        • Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
        • ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
        • ChatMusician: Understanding and Generating Music Intrinsically with LLM
        • ChipNeMo: Domain-Adapted LLMs for Chip Design
        • CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition
        • Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering
        • Clotho: An Audio Captioning Dataset
        • CMU's IWSLT 2024 Simultaneous Speech Translation System
        • Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
        • CoCa: Contrastive Captioners are Image-Text Foundation Models
        • Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks
        • Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
        • CodeRAG-Bench: Can Retrieval Augment Code Generation?
        • CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
        • Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner
        • COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis
        • CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
        • COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task
        • COMET: A Neural Framework for MT Evaluation
        • CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task
        • Common Voice: A Massively-Multilingual Speech Corpus
        • CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
        • Compact Speech Translation Models via Discrete Speech Units Pretraining
        • Comparative layer-wise analysis of self-supervised speech models
        • Comparing Discrete and Continuous Space LLMs for Speech Recognition
        • Competence-based Curriculum Learning for Neural Machine Translation
        • Compositional Entailment Learning for Hyperbolic Vision-Language Models
        • Computational Optimal Transport
        • Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
        • Condita: A state machine like architecture for multimodal task bots
        • Conditional Image Generation with PixelCNN Decoders
        • Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
        • Confidence-Aware Scheduled Sampling for Neural Machine Translation
        • Confident Adaptive Language Modeling
        • Conformal Prediction for Natural Language Processing: A Survey
        • Conformer: Convolution-augmented Transformer for Speech Recognition
        • Connecting Speech Encoder and Large Language Model for ASR
        • Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game
        • Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
        • ConSeC: Word Sense Disambiguation as Continuous Sense Comprehension
        • Consent in Crisis: The Rapid Decline of the AI Data Commons
        • Context Encoders: Feature Learning by Inpainting
        • Context Encoding for Semantic Segmentation
        • Context-aware Neural Machine Translation for English-Japanese Business Scene Dialogues
        • ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
        • Continuous Learning from Human Post-Edits for Neural Machine Translation
        • Continuous Speech Tokenizer in Text To Speech
        • Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
        • Contrastive language and vision learning of general fashion concepts
        • Contrastive Language-Image Pre-training for the Italian Language
        • Contrastive Learning with Hard Negative Samples
        • Contrastive Multiview Coding
        • Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words
        • Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
        • Contrastive Representation Learning: A Framework and Review
        • Controllable Speech Representation Learning Via Voice Conversion and AIC Loss
        • Controlling Neural Networks with Rule Representations
        • ConvMLP: Hierarchical Convolutional MLPs for Vision
        • CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech
        • CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
        • CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought
        • Counterfactual Fairness
        • Counterfactual harm
        • Counterfactual Reasoning and Learning Systems
        • CoVoST 2 and Massively Multilingual Speech-to-Text Translation
        • CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
        • CroissantLLM: A Truly Bilingual French-English Language Model
        • CroMo: Cross-Modal Learning for Monocular Depth Estimation
        • Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
        • Cross-lingual Language Model Pretraining
        • Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training
        • Cross-task weakly supervised learning from instructional videos
        • Cryptanalytic Extraction of Neural Network Models
        • CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages
        • CTC-based Compression for Direct Speech Translation
        • CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
        • Current Limitations of Language Models: What You Need is Retrieval
        • CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
        • Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
        • DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
        • DASB - Discrete Audio and Speech Benchmark
        • DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
        • DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners
        • Data Augmentation Approaches in Natural Language Processing: A Survey
        • Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
        • Data Efficient Reflow for Few Step Audio Generation
        • Data Selection for Language Models via Importance Resampling
        • data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
        • Dataset Distillation: A Comprehensive Review
        • DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
        • DeBERTa: Decoding-enhanced BERT with Disentangled Attention
        • DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
        • Decoding speech perception from non-invasive brain recordings
        • Decoupled Weight Decay Regularization
        • Deep Biaffine Attention for Neural Dependency Parsing
        • Deep Clustering for Unsupervised Learning of Visual Features
        • Deep contextualized word representations
        • Deep Ensemble as a Gaussian Process Approximate Posterior
        • Deep Ensembles: A Loss Landscape Perspective
        • Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
        • Deep Learning with Differential Privacy
        • Deep Mask Memory Network with Semantic Dependency and Context Moment for Aspect Level Sentiment Classification
        • Deep Neural Networks and Tabular Data: A Survey
        • Deep reinforcement learning from human preferences
        • Deep Residual Learning for Image Recognition
        • Deep Voice 2: Multi-Speaker Neural Text-to-Speech
        • Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
        • Deep Voice: Real-time Neural Text-to-Speech
        • DeepGaze II: Reading fixations from deep features trained on object recognition
        • DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
        • DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation
        • DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
        • DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
        • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
        • DeepSeek-V3 Technical Report
        • DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
        • DeepSpace: Dynamic Spatial and Source Cue Based Source Separation for Dialog Enhancement
        • Defeating Prompt Injections by Design
        • Deformable DETR: Deformable Transformers for End-to-End Object Detection
        • Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
        • DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders
        • DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
        • DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
        • DEMix Layers: Disentangling Domains for Modular Language Modeling
        • Dense Associative Memory for Pattern Recognition
        • Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
        • DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
        • DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
        • Depthwise Convolution is All You Need for Learning Multiple Visual Domains
        • Describing Multimedia Content using Attention-based Encoder--Decoder Networks
        • Designing and Interpreting Probes with Control Tasks
        • DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
        • DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
        • Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
        • DETRs with Collaborative Hybrid Assignments Training
        • DeVAn: Dense Video Annotation for Video-Language Models
        • Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
        • Did Translation Models Get More Robust Without Anyone Even Noticing?
        • Difference-Masking: Choosing What to Mask in Continued Pretraining
        • Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme
        • Direct Preference Optimization: Your Language Model is Secretly a Reward Model
        • Direct speech-to-speech translation with a sequence-to-sequence model
        • Direct speech-to-speech translation with discrete units
        • Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
        • Discrete Latent Structure in Neural Networks
        • DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
        • Disentangling Textual and Acoustic Features of Neural Speech Representations
        • Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
        • DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
        • Distillation Scaling Laws
        • Distilling the Knowledge in a Neural Network
        • Distributed Representations of Words and Phrases and their Compositionality
        • Distribution Fields for Tracking
        • Distributional term representations: an experimental comparison
        • Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
        • DM-Codec: Distilling Multimodal Representations for Speech Tokenization
        • DMDSpeech: Distilled Diffusion Model Surpassing The Teacher in Zero-shot Speech Synthesis via Direct Metric Optimization
        • dMel: Speech Tokenization made Simple
        • DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors
        • DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors
        • Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
        • Do Context-Aware Translation Models Pay the Right Attention?
        • Do Multi-Sense Embeddings Improve Natural Language Understanding?
        • DOCE: Finding the Sweet Spot for Execution-Based Code Generation
        • Does Simultaneous Speech Translation need Simultaneous Models?
        • Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
        • Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation
        • Don't Decay the Learning Rate, Increase the Batch Size
        • Don't Discard Fixed-Window Audio Segmentation in Speech-to-Text Translation
        • Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
        • Don't Read Too Much into It: Adaptive Computation for Open-Domain Question Answering
        • DoWhy: An End-to-End Library for Causal Inference
        • DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
        • DRAW: A Recurrent Neural Network For Image Generation
        • Dropout: A Simple Way to Prevent Neural Networks from Overfitting
        • DTrOCR: Decoder-only Transformer for Optical Character Recognition
        • Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
        • E-Branchformer: Branchformer with Enhanced merging for speech recognition
        • Ecco: An Open Source Library for the Explainability of Transformer Language Models
        • Effective Approaches to Attention-based Neural Machine Translation
        • Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform
        • Efficient Compression of Multitask Multilingual Speech Models
        • Efficient Estimation of Word Representations in Vector Space
        • Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
        • Efficient Memory Management for Large Language Model Serving with PagedAttention
        • Efficient Methods for Natural Language Processing: A Survey
        • Efficient Neural Audio Synthesis
        • Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space
        • Efficient Parallel Audio Generation using Group Masked Language Modeling
        • Efficient Pre-training for Localized Instruction Generation of Videos
        • Efficient Representation Learning via Adaptive Context Pooling
        • Efficient softmax approximation for GPUs
        • Efficient Stagewise Pretraining via Progressive Subnetworks
        • Efficient Tool Use with Chain-of-Abstraction Reasoning
        • Efficient Training of Language Models to Fill in the Middle
        • Efficient Transformers: A Survey
        • Efficient Visual Pretraining with Contrastive Detection
        • Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset
        • Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
        • Efficiently Programming Large Language Models using SGLang
        • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
        • Elucidating the Design Space of Diffusion-Based Generative Models
        • Embarrassingly Easy Document-Level MT Metrics: How to Convert Any Pretrained Metric Into a Document-Level Metric
        • Emergent and Predictable Memorization in Large Language Models
        • Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
        • Emerging Properties in Self-Supervised Vision Transformers
        • Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
        • EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
        • EMMeTT: Efficient Multimodal Machine Translation Training
        ‱ EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
        • Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions
        • EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
        ‱ EmphAssess: a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models
        • Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
        • Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
        • Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
        • Encoding of speech in convolutional layers and the brain stem based on language experience
        • Encoding sound in the cochlea: from receptor potential to afferent discharge
        • End-to-End Dense Video Captioning with Parallel Decoding
        • End-to-End Learning of Visual Representations from Uncurated Instructional Videos
        • End-to-End Object Detection with Transformers
        • End-to-End Simultaneous Speech Translation with Differentiable Segmentation
        • End-to-End Speech Recognition: A Survey
        • End-to-End Speech-to-Text Translation: A Survey
        • End-to-end Temporal Action Detection with Transformer
        • End-to-End Text-Dependent Speaker Verification
        • Energy and Policy Considerations for Deep Learning in NLP
        • Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation
        • Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
        • EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges
        • Enriching Word Vectors with Subword Information
        • eP-ALM: Efficient Perceptual Augmentation of Language Models
        • Epitran: Precision G2P for Many Languages
        • Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation
        • Error detecting and error correcting codes
        • ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
        • ESPnet-ST: All-in-One Speech Translation Toolkit
        • ESPnet: End-to-End Speech Processing Toolkit
        • Estimating the Completeness of Discrete Speech Units
        • Estimating Training Data Influence by Tracing Gradient Descent
        • Estimation of Non-Normalized Statistical Models by Score Matching
        • ETC: Encoding Long and Structured Inputs in Transformers
        • Euclidean Embedding of Co-occurrence Data
        • EuroBERT: Scaling Multilingual Encoders for European Languages
        • EuroLLM: Multilingual Language Models for Europe
        • Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization
        • Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates
        • Evaluating deep learning architectures for speech emotion recognition
        • Evaluating Frontier Models for Dangerous Capabilities
        • Evaluating Language Model Agency through Negotiations
        • Evaluating language models as risk scores
        • Evaluating Large Language Models Trained on Code
        • Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation
        • Evaluating the Stability of Embedding-based Word Similarities
        • Evaluation data contamination in LLMs: how do we measure it and (when) does it matter?
        • EVE: Explainable Vector Based Embedding Technique Using Wikipedia
        • Evolution through Large Models
        • Explainability for Large Language Models: A Survey
        • Explainability for Speech Models: On the Challenges of Acoustic Feature Selection
        • Explainability Via Causal Self-Talk
        • Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features
        • Exploiting Similarities among Languages for Machine Translation
        • Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
        • Exploring Simple Siamese Representation Learning
        • Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
        • Exploring the Benefits of Tokenization of Discrete Acoustic Units
        • Exploring the Limits of Language Modeling
        • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
        • EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
        • Extracting Training Data from Diffusion Models
        • Extracting Training Data from Large Language Models
        • Extraction of Salient Sentences from Labelled Documents
        • Extreme Masking for Learning Instance and Distributed Visual Representations
        • F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
        • Facebook AI WMT21 News Translation Task Submission
        • fairseq S2T: Fast Speech-to-Text Modeling with fairseq
        • Faith and Fate: Limits of Transformers on Compositionality
        • Falcon2-11B Technical Report
        • Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
        • Fast and Vectorizable Alternative to Binary Search in O(1) Applicable to a Wide Domain of Sorted Arrays of Floating Point Numbers
        • Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
        • Fast Inference from Transformers via Speculative Decoding
        • Fast Model Editing at Scale
        • Fast Transformer Decoding: One Write-Head is All You Need
        • FastPitch: Parallel Text-to-speech with Pitch Prediction
        • FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
        • FastSpeech: Fast, Robust and Controllable Text to Speech
        • Fauno: The Italian Large Language Model that will leave you senza parole!
        • Federated Learning: Strategies for Improving Communication Efficiency
        • FEVER: a large-scale dataset for Fact Extraction and VERification
        • Few-Shot Keyword Spotting in Any Language
        • Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
        • Fine-tuning Language Models for Factuality
        • Finetuned Language Models Are Zero-Shot Learners
        • Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models
        • Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
        • Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
        • Flamingo: a Visual Language Model for Few-Shot Learning
        • FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
        • FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
        • FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
        • FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
        • Flow Matching for Generative Modeling
        • Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
        • Flying and swimming animals cruise at a Strouhal number tuned for high power efficiency
        • FNet: Mixing Tokens with Fourier Transforms
        • Focal Loss for Dense Object Detection
        • FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
        • Following the Human Thread in Social Navigation
        • Formal Limitations on the Measurement of Mutual Information
        • Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis
        • Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
        • Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
        • From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function
        • From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
        • From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
        • From Handcrafted Features to LLMs: A Brief Survey for Machine Translation Quality Estimation
        • From Recognition to Cognition: Visual Commonsense Reasoning
        • From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification
        • From Sparse to Soft Mixtures of Experts
        • From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
        • Full Parameter Fine-tuning for Large Language Models with Limited Resources
        • Fully Character-Level Neural Machine Translation without Explicit Segmentation
        • FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
        • FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
        • Fundamentals of Grammatology
        • GAIA: a benchmark for General AI Assistants
        • GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
        • GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
        • Gaussian Mixture Latent Vector Grammars
        • GEIC: Universal and Multilingual Named Entity Recognition with Large Language Models
        • Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
        • Gemini: A Family of Highly Capable Multimodal Models
        • Gemma 2: Improving Open Language Models at a Practical Size
        • Gemma: Open Models Based on Gemini Research and Technology
        • Gender Bias in Contextualized Word Embeddings
        • Gender Bias in Coreference Resolution
        • Generalization Ability of MOS Prediction Networks
        • Generalization in diffusion models arises from geometry-adaptive harmonic representations
        • Generalization through Memorization: Nearest Neighbor Language Models
        • Generalized Shape Metrics on Neural Representations
        • Generating Diverse High-Fidelity Images with VQ-VAE-2
        • Generating Long Sequences with Sparse Transformers
        • Generative Adversarial Networks
        • Generative Models: What do they know? Do they know things? Let's find out!
        • Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
        • Generative Spoken Dialogue Language Modeling
        • Generative Spoken Language Modeling from Raw Audio
        • Generator Matching: Generative modeling with arbitrary Markov processes
        • Genie: Generative Interactive Environments
        • Geographic Adaptation of Pretrained Language Models
        • Geographic and Geopolitical Biases of Language Models
        • Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
        • GFlowNet Foundations
        • GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
        • Git Re-Basin: Merging Models modulo Permutation Symmetries
        • Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models
        • GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot
        • Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
        • Globally Normalized Transition-Based Neural Networks
        • GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge
        • Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
        • Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
        • Glow: Generative Flow with Invertible 1x1 Convolutions
        • GLU Variants Improve Transformer
        • GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
        • Goku: Flow Based Video Generative Foundation Models
        • Good Night at 4 pm?! Time Expressions in Different Cultures
        • Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
        • Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
        • Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
        • Gorilla: Large Language Model Connected with Massive APIs
        • GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
        • Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
        • Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
        • Gradient Descent Converges to Minimizers
        • Grandmaster-Level Chess Without Search
        • Graph Pre-training for AMR Parsing and Generation
        • Grapheme-to-Phoneme Models for (Almost) Any Language
        • Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
        • Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
        • Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
        • Group Normalization
        • Group Robust Preference Optimization in Reward-free RLHF
        • GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
        • Guess What I Think: Streamlined EEG-to-Image Generation with Latent Diffusion Models
        • Guiding a Diffusion Model with a Bad Version of Itself
        • HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
        • HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis
        • Hands-on Bayesian Neural Networks -- a Tutorial for Deep Learning Users
        • HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
        • HGRN2: Gated Linear RNNs with State Expansion
        • Hi-Fi Multi-Speaker English TTS Dataset
        • Hierarchical nucleation in deep neural networks
        • HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
        • HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
        • HiFi-GAN-2: Studio-Quality Speech Enhancement via Generative Adversarial Networks Conditioned on Acoustic Features
        • HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
        • HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
        • High Fidelity Neural Audio Compression
        • High-Fidelity Audio Compression with Improved RVQGAN
        • High-Fidelity Simultaneous Speech-To-Speech Translation
        • High-speed high-security signatures
        • HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
        • Highly accurate protein structure prediction with AlphaFold
        • Highway Networks
        • Holistic Evaluation of Language Models
        • Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval
        • Houdini: Fooling Deep Structured Prediction Models
        • How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?
        • How (not) to do Phonological Typology: The Case of Pitch-Accent
        • How Context Affects Language Models' Factual Predictions
        • How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
        • How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations
        • How Does Batch Normalization Help Optimization?
        • How Effective are State Space Models for Machine Translation?
        • How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings
        • How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
        • How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
        • How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation
        • How many degrees of freedom do we need to train deep networks: a loss landscape perspective
        • How Much Knowledge Can You Pack Into the Parameters of a Language Model?
        • How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
        • How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
        • How to represent part-whole hierarchies in a neural network
        • How to Train Your Energy-Based Models
        • How transferable are features in deep neural networks?
        • How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis
        • How well can VMEC predict the initial saturation of external kink modes in near circular tokamaks and $l=2$ stellarators?
        • HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
        • HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
        • Human Action Localization with Sparse Spatial Supervision
        • Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets
        • Humanity's Last Exam
        • Hungry Hungry Hippos: Towards Language Modeling with State Space Models
        • Hyena Hierarchy: Towards Larger Convolutional Language Models
        • HyperAttention: Long-context Attention in Near-Linear Time
        • Hyperbolic Active Learning for Semantic Segmentation under Domain Shift
        • Hyperbolic Deep Neural Networks: A Survey
        • Hyperbolic Geometry
        • Hyperbolic Learning with Multimodal Large Language Models
        • Hyperbolic Neural Networks
        • HYperbolic Self-Paced Learning for Self-Supervised Skeleton-based Action Representations
        • HyperCLOVA X Technical Report
        • Hypergraph Neural Networks through the Lens of Message Passing: A Common Perspective to Homophily and Architecture Design
        • HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
        • I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
        • Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
        • ILLUME: Rationalizing Vision-Language Models through Human Interactions
        • Im2Text: Describing Images Using 1 Million Captioned Photographs
        • Image Captioning and Visual Question Answering Based on Attributes and External Knowledge
        • ImageBind: One Embedding Space To Bind Them All
        • ImageNet Large Scale Visual Recognition Challenge
        • Imitation Learning as $f$-Divergence Minimization
        • Impact of Tokenization on Language Models: An Analysis for Turkish
        • Implicit Generation and Generalization in Energy-Based Models
        • Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
        • Improved Baselines with Momentum Contrastive Learning
        • Improved Baselines with Visual Instruction Tuning
        • Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction
        • Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
        • Improving language models by retrieving from trillions of tokens
        • Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
        • Improving Neural Language Models with a Continuous Cache
        • Improving Neural Machine Translation Models with Monolingual Data
        • Improving neural networks by preventing co-adaptation of feature detectors
        • Improving Personalized Explanation Generation through Visualization
        • Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
        • Improving Word Representations via Global Context and Multiple Word Prototypes
        • Improving Zero-Shot Translation by Disentangling Positional Information
        • Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning
        • In Defense of Grid Features for Visual Question Answering
        • INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
        • Inferring and Executing Programs for Visual Reasoning
        • InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization
        • InfoNCE: Identifying the Gap Between Theory and Practice
        • Information Theory and Statistics: an overview
        • Information-Theoretic Probing for Linguistic Structure
        • InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
        • Inseq: An Interpretability Toolkit for Sequence Generation Models
        • Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
        • Instruction Tuning for Large Language Models: A Survey
        • InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
        • Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
        • Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM
        • InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
        • Interpolating Compressed Parameter Subspaces
        • Interpretable Convolutional Filters with SincNet
        • Interpretation of convolutional neural networks for speech spectrogram regression from intracranial recordings
        • Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations
        • Intrinsic dimension of data representations in deep neural networks
        • Intrusive And Non-Intrusive Perceptual Speech Quality Assessment Using A Convolutional Neural Network
        • Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model
        • Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting
        • Investigating Backtranslation in Neural Machine Translation
        • Investigating Decoder-only Large Language Models for Speech-to-text Translation
        • Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages
        • Investigating Multilingual NMT Representations at Scale
        • Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
        • Is Context Helpful for Chat Translation Evaluation?
        • Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning
        • Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
        • Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
        • Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
        • Is Training Data Quality or Quantity More Impactful to Small Language Model Performance?
        • iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
        • It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
        • ITALIC: An Italian Intent Classification Dataset
        • ITU-T coders for wideband, superwideband, and fullband speech communication [Series Editorial]
        • Jamba: A Hybrid Transformer-Mamba Language Model
        • JetFormer: An Autoregressive Generative Model of Raw Images and Text
        • Johnson-Lindenstrauss Lemma, Linear and Nonlinear Random Projections, Random Fourier Features, and Random Kitchen Sinks: Tutorial and Survey
        • Joint-task Self-supervised Learning for Temporal Correspondence
        • JOREK3D: An extension of the JOREK nonlinear MHD code to stellarators
        • JudgeBlender: Ensembling Judgments for Automatic Relevance Assessment
        • Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
        ‱ 'Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP
        • KAN: Kolmogorov-Arnold Networks
        • Kimi-Audio Technical Report
        • KIT's Multilingual Speech Translation System for IWSLT 2023
        • kNN For Whisper And Its Effect On Bias And Speaker Adaptation
        • Knowledge Conflicts for LLMs: A Survey
        • Knowledge distillation: A good teacher is patient and consistent
        • Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
        • LAION-5B: An open large-scale dataset for training next generation image-text models
        • LaMP: When Large Language Models Meet Personalization
        • Language agents achieve superhuman synthesis of scientific knowledge
        • Language Agnostic Speech Embeddings for Emotion Classification
        • Language Contamination Helps Explain the Cross-lingual Capabilities of English Pretrained Models
        • Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
        • Language Model Can Listen While Speaking
        • Language Modeling with Deep Transformers
        • Language Modeling with Gated Convolutional Networks
        • Language Models are Few-Shot Learners
        • Language Models are Multilingual Chain-of-Thought Reasoners
        • Language Models are Realistic Tabular Data Generators
        • Language Models as Knowledge Bases?
        • Language Models Represent Space and Time
        • Language Models: A Guide for the Perplexed
        • Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition
        • LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
        • Laplace Redux -- Effortless Bayesian Deep Learning
        • Large Associative Memory Problem in Neurobiology and Machine Learning
        • Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
        • Large Batch Training of Convolutional Networks
        • Large Concept Models: Language Modeling in a Sentence Representation Space
        • Large Language Diffusion Models
        ‱ Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial
        • Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences
        • Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners
        • Large Language Models Are State-of-the-Art Evaluators of Translation Quality
        • Large Language Models As Evolution Strategies
        • Large Language Models for Compiler Optimization
        • Large Language Models for Data Annotation: A Survey
        • Large Language Models: A Survey
        • Large-Scale Automatic Audiobook Creation
        • Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
        • Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
        • Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification
        • Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling
        • Layer Normalization
        • LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
        • Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
        • Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition
        • Learnability and the Vapnik-Chervonenkis dimension
        • Learned feature representations are biased by complexity, learning order, position, and more
        • Learning a similarity metric discriminatively, with application to face verification
        • Learning Action Changes by Measuring Verb-Adverb Textual Relationships
        • Learning and Evaluating General Linguistic Intelligence
        • Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting
        • Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
        • Learning Correspondence from the Cycle-Consistency of Time
        • Learning Differentially Private Recurrent Language Models
        • Learning Filterbanks from Raw Speech for Phone Recognition
        • Learning Interactive Real-World Simulators
        • Learning Language-Specific Layers for Multilingual Machine Translation
        • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
        • Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
        • Learning Source Disentanglement in Neural Audio Codec
        • Learning Speaker Representations with Mutual Information
        • Learning Temporal Dynamics from Cycles in Narrated Video
        • Learning Temporal Sentence Grounding From Narrated EgoVideos
        • Learning the Predictability of the Future
        • Learning to Compress Prompts with Gist Tokens
        • Learning to Generate Reviews and Discovering Sentiment
        • Learning to Merge Word Senses
        • Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
        • Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
        • Learning to summarize from human feedback
        • Learning Transferable Visual Models From Natural Language Supervision
        • Learning with Fenchel-Young Losses
        • Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
        • Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
        • Leveraging Content and Acoustic Representations for Speech Emotion Recognition
        • Leveraging Gloss Knowledge in Neural Word Sense Disambiguation by Hierarchical Co-Attention
        • Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation
        • Libri-Light: A Benchmark for ASR with Limited or No Supervision
        • Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
        ‱ Librispeech: An ASR corpus based on public domain audio books
        • LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models
        • LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
        • Lifting the Curse of Multilinguality by Pre-training Modular Transformers
        • Lightweight and Efficient Spoken Language Identification of Long-form Audio
        • Lightweight Audio Segmentation for Long-form Speech Translation
        • LIMO: Less is More for Reasoning
        • Linear Connectivity Reveals Generalization Strategies
        • Linear-time Minimum Bayes Risk Decoding with Reference Aggregation
        • Linformer: Self-Attention with Linear Complexity
        • Linguini: A benchmark for language-agnostic linguistic reasoning
        • Linguistic Regularities in Sparse and Explicit Word Representations
        • Liquid Time-constant Networks
        • Liquid: Language Models are Scalable Multi-modal Generators
        • Listen, Think, and Understand
        • Listenable Maps for Audio Classifiers
        • LiT: Zero-Shot Transfer with Locked-image text Tuning
        • Llama 2: Open Foundation and Fine-Tuned Chat Models
        • LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
        • LLaMA-Omni: Seamless Speech Interaction with Large Language Models
        • LLaMA: Open and Efficient Foundation Language Models
        • Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
        • LLaSM: Large Language and Speech Model
        • LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
        • LLaVA-OneVision: Easy Visual Task Transfer
        • LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
        • LLM Post-Training: A Deep Dive into Reasoning Large Language Models
        • LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
        • LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
        • LLM-as-a-Judge & Reward Model: What They Can and Cannot Do
        • LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
        • LLM4Eval: Large Language Model for Evaluation in IR
        • LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models
        • Localizing Objects with Self-Supervised Transformers and no Labels
        • LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding
        • Locating and Editing Factual Associations in GPT
        • Logits of API-Protected LLMs Leak Proprietary Information
        • Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
        • Long-Context Language Modeling with Parallel Context Encoding
        • Longformer: The Long-Document Transformer
        • LongNet: Scaling Transformers to 1,000,000,000 Tokens
        • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
        • Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
        • Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation
        • LoRA: Low-Rank Adaptation of Large Language Models
        • LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
        • Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
        • Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
        • Lost in the Middle: How Language Models Use Long Contexts
        • LRS3-TED: a large-scale dataset for visual speech recognition
        • Lumiere: A Space-Time Diffusion Model for Video Generation
        • LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models
        • M-Prometheus: A Suite of Open Multilingual LLM Judges
        • Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
        • Making AI Forget You: Data Deletion in Machine Learning
        • Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game
        • Making Pre-trained Language Models Better Few-shot Learners
        • Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
        • Mamba in Speech: Towards an Alternative to Self-Attention
        • Mamba: Linear-Time Sequence Modeling with Selective State Spaces
        • Many-Shot In-Context Learning
        • MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
        • Marian: Fast Neural Machine Translation in C++
        • Mask-Predict: Parallel Decoding of Conditional Masked Language Models
        • Masked Autoencoders Are Scalable Vision Learners
        • Masked Autoencoders that Listen
        • MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
        • MaskGIT: Masked Generative Image Transformer
        • MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages
        • Massively Multilingual Neural Grapheme-to-Phoneme Conversion
        • Massively Multilingual Neural Machine Translation
        • Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
        • Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
        • Matrix Decomposition and Applications
        • Matryoshka Diffusion Models
        • Matryoshka Quantization
        • Matryoshka Representation Learning
        • MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information
        • MAWPS: A Math Word Problem Repository
        • Measuring and Increasing Context Usage in Context-Aware Machine Translation
        • Measuring Massive Multitask Language Understanding
        • Measuring the Effects of Data Parallelism on Neural Network Training
        • Measuring the Intrinsic Dimension of Objective Landscapes
        • Measuring the Mixing of Contextual Information in the Transformer
        • MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
        • MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf
        • MEG-MASC: a high-quality magneto-encephalography dataset for evaluating natural speech processing
        • MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
        • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
        • MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
        • Membership Inference Attacks on Machine Learning: A Survey
        • MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory
        • Memory Layers at Scale
        ‱ Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems
        • MERaLiON-AudioLLM: Bridging Audio and Language with Large Language Models
        • MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond
        • MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
        • MERLOT: Multimodal Neural Script Knowledge Models
        • Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound
        • Meta-Learning Online Adaptation of Language Models
        • Meta-Transformer: A Unified Framework for Multimodal Learning
        • METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
        • MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement
        • MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
        • MEXMA: Token-level objectives improve sentence representations
        • MFPP: Morphological Fragmental Perturbation Pyramid for Black-Box Model Explanations
        • mGeNTE: A Multilingual Resource for Gender-Neutral Language and Translation
        • mHuBERT-147: A Compact Multilingual HuBERT Model
        • Microsoft COCO: Common Objects in Context
        • MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
        • Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
        • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
        • MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
        • Minimum Bayes-Risk Decoding for Statistical Machine Translation
        • MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
        • MIO: A Foundation Model on Multimodal Tokens
        • Mistral 7B
        • Mixed Precision Training
        • Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
        • Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings
        • Mixtral of Experts
        • Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection
        • ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
        • MLP-Mixer: An all-MLP Architecture for Vision
        • MLS: A Large-Scale Multilingual Dataset for Speech Research
        • MM-LLMs: Recent Advances in MultiModal Large Language Models
        • MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
        • MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
        • MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
        • MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
        • MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
        • Model Editing with Canonical Examples
        • Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning
        • Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
        • Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation
        • Modelling low-resource accents without accent-specific TTS frontend
        • Modelling of saturated external MHD instabilities in tokamaks: a comparison of 3D free boundary equilibria and nonlinear stability calculations
        • Modular Deep Learning
        • Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference
        • ModuleFormer: Modularity Emerges from Mixture-of-Experts
        • MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
        • Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
        • Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
        • Momentum Contrast for Unsupervised Visual Representation Learning
        • Monte Carlo Temperature: a robust sampling strategy for LLM's uncertainty quantification methods
        • MoonCast: High-Quality Zero-Shot Podcast Generation
        • More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
        • MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
        • Moshi: a speech-text foundation model for real-time dialogue
        • MOSNet: Deep Learning based Objective Assessment for Voice Conversion
        • MouSi: Poly-Visual-Expert Vision-Language Models
        • Movie Gen: A Cast of Media Foundation Models
        • MovieNet: A Holistic Dataset for Movie Understanding
        • mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
        • mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
        • mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
        • MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
        • mSLAM: Massively multilingual joint pre-training for speech and text
        • MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
        • MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
        • MSTS: A Multimodal Safety Test Suite for Vision-Language Models
        • MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
        • MuLan: A Joint Embedding of Music Audio and Natural Language
        • Multi-Prototype Vector-Space Models of Word Meaning
        • Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
        • Multi-Scale Context Aggregation by Dilated Convolutions
        • Multi-sense embeddings through a word sense disambiguation process
        • Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
        • Multi-task self-supervised learning for Robust Speech Recognition
        • Multi-ToM: Evaluating Multilingual Theory of Mind Capabilities in Large Language Models
        • Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts
        • Multilingual Pretraining Using a Large Corpus Machine-Translated from a Single Source Language
        • Multilingual Speech Models for Automatic Speech Recognition Exhibit Gender Performance Gaps
        • Multimodal and Multilingual Embeddings for Large-Scale Speech Mining
        • Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
        • Multimodal Few-Shot Learning with Frozen Language Models
        • Multimodal Machine Learning: A Survey and Taxonomy
        • Multimodal Neural Databases
        • Multiple Importance Sampling ELBO and Deep Ensembles of Variational Approximations
        • Multiple Object Recognition with Visual Attention
        • Multitask Prompted Training Enables Zero-Shot Task Generalization
        • Music Transformer
        • MusicLM: Generating Music From Text
        • MuST-C: A multilingual corpus for end-to-end speech translation
        • MuST-C: a Multilingual Speech Translation Corpus
        • MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
        • Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
        • Natural language guidance of high-fidelity text-to-speech with synthetic annotations
        • Natural Language Processing (almost) from Scratch
        • Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
        • NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
        • NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
        • Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics
        • NBDT: Neural-Backed Decision Trees
        • Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs
        • Needle In A Multimodal Haystack
        • Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
        • Neural Collaborative Filtering
        • Neural Combinatorial Optimization with Reinforcement Learning
        • Neural Discrete Representation Learning
        • Neural Grapheme-to-Phoneme Conversion with Pre-trained Grapheme Models
        • Neural Language Model Pruning for Automatic Speech Recognition
        • Neural Machine Translation by Jointly Learning to Align and Translate
        • Neural Machine Translation of Rare Words with Subword Units
        • Neural Machine Translation: A Review and Survey
        • Neural Machine Translation: Challenges, Progress and Future
        • Neural Motifs: Scene Graph Parsing with Global Context
        • Neural Network Acceptability Judgments
        • Neural Networks are Decision Trees
        • Neural Sequence Learning Models for Word Sense Disambiguation
        • Neural Speech Synthesis with Transformer Network
        • Neural Voice Cloning with a Few Samples
        • Neural Word Embedding as Implicit Matrix Factorization
        • NeuralDEM - Real-time Simulation of Industrial Particulate Flows
        • Neurosymbolic AI -- Why, What, and How
        • Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
        • No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
        • No Language Left Behind: Scaling Human-Centered Machine Translation
        • Noise-contrastive estimation: A new estimation principle for unnormalized statistical models
        • NoLiMa: Long-Context Evaluation Beyond Literal Matching
        • Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
        • Non-Autoregressive Neural Machine Translation
        • Non-Exchangeable Conformal Language Generation with Nearest Neighbors
        • Non-Exchangeable Conformal Risk Control
        • Non-intrusive Speech Quality Assessment Using Neural Networks
        • Nonlinear Dimensionality Reduction by Locally Linear Embedding
        ‱ Nonlinear MHD modeling of soft ÎČ limits in W7-AS
        • Nonlinear MHD simulations of external kinks in quasi-axisymmetric stellarators using an axisymmetric external rotational transform approximation
        • Normalization Techniques in Training DNNs: Methodology, Analysis and Application
        • Not Just a Black Box: Learning Important Features Through Propagating Activation Differences
        • Nougat: Neural Optical Understanding for Academic Documents
        • Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
        • Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers
        • NUTSHELL: A Dataset for Abstract Generation from Scientific Talks
        • NVLM: Open Frontier-Class Multimodal LLMs
        • OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
        • OLMo: Accelerating the Science of Language Models
        • OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
        • OmniParser for Pure Vision Based GUI Agent
        • On Compositions of Transformations in Contrastive Self-Supervised Learning
        • On Divergence Measures for Training GFlowNets
        • On Information and Sufficiency
        • On Instruction-Finetuning Neural Machine Translation Models
        • On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
        • On Layer Normalization in the Transformer Architecture
        • On the cyclic nature of perception in vision versus audition
        • On the difficulty of training Recurrent Neural Networks
        • On the Implications of Verbose LLM Outputs: A Case Study in Translation Evaluation
        • On the Integration of Optical Flow and Action Recognition
        • On the Limitations of Compute Thresholds as a Governance Strategy
        • On the Measure of Intelligence
        • On the Number of Linear Regions of Deep Neural Networks
        • On the Opportunities and Risks of Foundation Models
        • On the Out-of-distribution Generalization of Probabilistic Image Modelling
        • On the Representation Collapse of Sparse Mixture of Experts
        • One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models
        • One TTS Alignment To Rule Them All
        • One Wide Feedforward is All You Need
        • ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
        • One-Shot Open Affordance Learning with Foundation Models
        • One-To-Many Multilingual End-to-end Speech Translation
        • OneChart: Purify the Chart Structural Extraction via One Auxiliary Token
        • OneLLM: One Framework to Align All Modalities with Language
        • Only Time Can Tell: Discovering Temporal Data for Temporal Modeling
        • Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena
        • Open-Source Conversational AI with SpeechBrain 1.0
        • OpenAssistant Conversations -- Democratizing Large Language Model Alignment
        • OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
        • OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
        • OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
        • OpenVoice: Versatile Instant Voice Cloning
        • OPT: Open Pre-trained Transformer Language Models
        • Optical Flow with Semantic Segmentation and Localized Layers
        • Optimal Bounds for Open Addressing Without Reordering
        • Optimization Methods for Large-Scale Machine Learning
        • Otter: A Multi-Modal Model with In-Context Instruction Tuning
        • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
        • Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for Simultaneous Speech Translation
        • Overcoming catastrophic forgetting in neural networks
        • Ovis: Structural Embedding Alignment for Multimodal Large Language Model
        • OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
        • OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
        • OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
        • P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting
        • PaLI: A Jointly-Scaled Multilingual Language-Image Model
        • PaliGemma 2: A Family of Versatile VLMs for Transfer
        • PaliGemma: A versatile 3B VLM for transfer
        • PaLM 2 Technical Report
        • PaLM: Scaling Language Modeling with Pathways
        • PALO: A Polyglot Large Multimodal Model for 5B People
        • Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
        ‱ Parakeet: A natural sounding, conversational text-to-speech model
        • Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation
        • Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue
        • Parallel Scheduled Sampling
        • Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
        • Parallel Tacotron: Non-Autoregressive and Controllable TTS
        • Parallel WaveNet: Fast High-Fidelity Speech Synthesis
        • Parameter-efficient fine-tuning of large-scale pre-trained language models
        • Parameter-Efficient Transfer Learning for NLP
        • Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation
        • Parsing with Compositional Vector Grammars
        • Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
        • Pay Attention to MLPs
        • PCFGs Can Do Better: Inducing Probabilistic Context-Free Grammars with Many Symbols
        • Pengi: An Audio Language Model for Audio Tasks
        • Perceiver IO: A General Architecture for Structured Inputs & Outputs
        • Perceiver: General Perception with Iterative Attention
        • Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs
        • Perceptual Losses for Real-Time Style Transfer and Super-Resolution
        • Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines
        • Phase behavior of Cacio and Pepe sauce
        • Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
        • Phi-4 Technical Report
        • Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
        • Phonetic Analysis of Self-supervised Representations of English Speech
        • Physician Detection of Clinical Harm in Machine Translation: Quality Estimation Aids in Reliance and Backtranslation Identifies Critical Errors
        • Pitfalls and Outlooks in Using COMET
        • PIXAR: Auto-Regressive Language Modeling in Pixel Space
        ‱ hertz-dev - Standard Intelligence
        • Playing Atari with Deep Reinforcement Learning
        • Playing Language Game with LLMs Leads to Jailbreaking
        • Poisoning Language Models During Instruction Tuning
        • Poisoning Web-Scale Training Datasets is Practical
        • PolyLM: An Open Source Polyglot Large Language Model
        • PolyVoice: Language Models for Speech to Speech Translation
        • Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
        • Practical recommendations for gradient-based training of deep architectures
        • Prefix-Tuning: Optimizing Continuous Prompts for Generation
        • Preliminary WMT24 Ranking of General MT Systems and LLMs
        • Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
        • Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions
        • Principles of Visual Tokens for Efficient Video Understanding
        • Probabilistic Artificial Intelligence
        • Probabilistic encryption & how to play mental poker keeping secret all partial information
        • Probing the phonetic and phonological knowledge of tones in Mandarin TTS models
        • Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
        • Progress Report: Towards European LLMs
        • Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
        • Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
        • Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models
        • Prompting Large Language Models with Speech Recognition Abilities
        • Prompting with Phonemes: Enhancing LLM Multilinguality for non-Latin Script Languages
        • Property Neurons in Self-Supervised Speech Transformers
        • Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
        • Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases
        • Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features
        • Proximal Policy Optimization Algorithms
        • PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems
        • Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
        • Pushing the Limits of Zero-shot End-to-End Speech Translation
        • Pyramid Feature Attention Network for Saliency detection
        • Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
        • Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression
        • Qualitatively characterizing neural network optimization problems
        • Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
        • Quality-Aware Decoding for Neural Machine Translation
        • Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
        • Quantifying Memorization Across Neural Language Models
        • Quantifying the Plausibility of Context Reliance in Neural Machine Translation
        • Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
        • Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
        • Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
        • Qwen2 Technical Report
        • Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
        • Randomized Approximation of the Gram Matrix: Exact Computation and Probabilistic Bounds
        • Re-ranking Person Re-identification with k-reciprocal Encoding
        • Read, Look or Listen? What's Needed for Solving a Multimodal Dataset
        • Reading Digits in Natural Images with Unsupervised Feature Learning
        • Real Time Speech Enhancement in the Waveform Domain
        • ReALM: Reference Resolution As Language Modeling
        • Recent Advances in Direct Speech-to-text Translation
        • Recent Advances in Speech Language Models: A Survey
        • Recent Developments on ESPnet Toolkit Boosted by Conformer
        • RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation
        • Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors
        • Recurrent Memory Transformer
        • Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
        • Reducing Activation Recomputation in Large Transformer Models
        • Reducing the Dimensionality of Data with Neural Networks
        • Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
        • Reformer: The Efficient Transformer
        • Reframing Human-AI Collaboration for Generating Free-Text Explanations
        • Regularized Evolution for Image Classifier Architecture Search
        • Reinforcement Learning: An Overview
        • Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
        • Relative representations enable zero-shot latent space communication
        • Replacing the do-calculus with Bayes rule
        • Representation Learning with Contrastive Predictive Coding
        • Representational dissimilarity metric spaces for stochastic neural networks
        • Representational similarity analysis – connecting the branches of systems neuroscience
        • Representations of language in a model of visually grounded speech signal
        • Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
        • Reranking Laws for Language Generation: A Communication-Theoretic Perspective
        • ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
        • Residual Contrastive Learning for Image Reconstruction: Learning Transferable Representations from Noisy Images
        • Retentive Network: A Successor to Transformer for Large Language Models
        • Rethinking and Improving Multi-task Learning for End-to-end Speech Translation
        • Rethinking Attention with Performers
        • Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora
        • Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective
        • Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
        • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
        • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
        • Revisiting Acoustic Features for Robust ASR
        • Revisiting Feature Prediction for Learning Visual Representations from Video
        • Revisiting minimum description length complexity in overparameterized models
        • Revisiting Model Stitching to Compare Neural Representations
        • Revisiting Over-Smoothness in Text to Speech
        • Revisiting Self-Distillation
        • Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
        • Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
        • ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos
        • Rho-1: Not All Tokens Are What You Need
        • Risks from Learned Optimization in Advanced Machine Learning Systems
        • RoBERTa: A Robustly Optimized BERT Pretraining Approach
        • Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
        • Robust Speech Recognition via Large-Scale Weak Supervision
        • Robustness May Be at Odds with Accuracy
        • RoFormer: Enhanced Transformer with Rotary Position Embedding
        • Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts
        • RWKV: Reinventing RNNs for the Transformer Era
        • S2ORC: The Semantic Scholar Open Research Corpus
        • SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
        • SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
        • SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
        • SALMONN: Towards Generic Hearing Abilities for Large Language Models
        • Sample Efficient Adaptive Text-to-Speech
        • SaulLM-7B: A pioneering Large Language Model for Law
        • SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain
        • Scalable Diffusion Models with Transformers
        • Scalable Expectation Estimation with Subtractive Mixture Models
        • Scaling Analysis of Interleaved Speech-Text Language Models
        • Scaling Instructable Agents Across Many Simulated Worlds
        • Scaling Language Models: Methods, Analysis & Insights from Training Gopher
        • Scaling Laws for Generative Mixed-Modal Language Models
        • Scaling Laws for Multilingual Neural Machine Translation
        • Scaling Laws for Neural Language Models
        • Scaling Laws for Reward Model Overoptimization
        • Scaling Laws for Transfer
        • Scaling Properties of Speech Language Models
        • Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
        • Scaling Speech Technology to 1,000+ Languages
        • Scaling Transformer to 1M tokens and beyond with RMT
        • Scaling Transformers for Low-Bitrate High-Quality Speech Coding
        • Scaling Up Influence Functions
        • Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
        • Scaling Vision with Sparse Mixture of Experts
        • Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
        • SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation
        • Score-Based Generative Modeling through Stochastic Differential Equations
        • Seamless: Multilingual Expressive and Streaming Speech Translation
        • SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
        • SEANet: A Multi-modal Speech Enhancement Network
        • Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
        • Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
        • Selective State Space Model for Monaural Speech Enhancement
        • Self-Alignment with Instruction Backtranslation
        • Self-Attention with Relative Position Representations
        • Self-Chained Image-Language Model for Video Localization and Question Answering
        • Self-critical Sequence Training for Image Captioning
        • Self-Instruct: Aligning Language Model with Self Generated Instructions
        • Self-Instruct: Aligning Language Models with Self-Generated Instructions
        • Self-labelling via simultaneous clustering and representation learning
        • Self-Rewarding Language Models
        • Self-supervised Context-aware Style Representation for Expressive Speech Synthesis
        • Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation
        • Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
        • Self-Supervised Learning of Pretext-Invariant Representations
        • Self-Supervised Speech Representation Learning: A Review
        • Self-Supervised Speech Representations are More Phonetic than Semantic
        • Self-supervised Video Object Segmentation by Motion Grouping
        • Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey
        • Self-Taught Evaluators
        • SELM: Speech Enhancement Using Discrete Tokens and Language Models
        • Sentence Length
        • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
        • SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
        • Sequence Level Training with Recurrent Neural Networks
        • Sequence-Level Knowledge Distillation
        • SGDR: Stochastic Gradient Descent with Warm Restarts
        • Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
        • Shortcut Learning in Deep Neural Networks
        • Shortformer: Better Language Modeling using Shorter Inputs
        • Should You Mask 15% in Masked Language Modeling?
        • SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction
        • Sigmoid Loss for Language Image Pre-Training
        • Simple and Controllable Music Generation
        • Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
        • Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning
        • Simple, Scalable Adaptation for Neural Machine Translation
        • Simplifying Transformer Blocks
        • Skip-Thought Vectors
        • SLIC Superpixels Compared to State-of-the-Art Superpixel Methods
        • SliceGPT: Compress Large Language Models by Deleting Rows and Columns
        • SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
        • SLURP: A Spoken Language Understanding Resource Package
        • Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
        • Snapshot Ensembles: Train 1, get M for free
        • Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
        • SODA: Story Oriented Dense Video Captioning Evaluation Framework
        • Soft Merging of Experts with Adaptive Routing
        • softmax is not enough (for sharp out-of-distribution)
        • SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
        • SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
        • Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
        • SoundStorm: Efficient Parallel Audio Generation
        • SoundStream: An End-to-End Neural Audio Codec
        • Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
        • Space-Time Correspondence as a Contrastive Random Walk
        • SpanBERT: Improving Pre-training by Representing and Predicting Spans
        • Sparks of Artificial General Intelligence: Early experiments with GPT-4
        • Sparse and Continuous Attention Mechanisms
        • Sparse and Structured Hopfield Networks
        • Sparse Attention with Linear Units
        • Sparse Autoencoders Find Highly Interpretable Features in Language Models
        • Sparse Communication via Mixed Distributions
        • Sparse continuous distributions and Fenchel-Young losses
        • Sparse Sequence-to-Sequence Models
        • Sparse Text Generation
        • Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
        • Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
        • SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
        • SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities
        • Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
        • Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection
        • Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
        • Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
        • Speech Translation with Large Language Models: An Industrial Practice
        • Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
        • Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
        • Speech-to-Speech Translation For A Real-world Unwritten Language
        • SpeechAlign: Aligning Speech Generation to Human Preferences
        • SpeechBrain: A General-Purpose Speech Toolkit
        • SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
        • SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
        • SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
        • SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
        • SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
        • SpeechT: Findings of the First Mentorship in Speech Translation
        • SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
        • SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
        • SpeechVerse: A Large-scale Generalizable Audio Language Model
        • SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
        • Speed/accuracy trade-offs for modern convolutional object detectors
        • SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
        • SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
        • SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
        • SpiRit-LM: Interleaved Spoken and Written Language Model
        • Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction
        • Spoken Language Corpora Augmentation with Domain-Specific Voice-Cloned Speech
        • Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
        • Spread Flows for Manifold Modelling
        • SQ-GAN: Semantic Image Communications Using Masked Vector Quantization
        • SQuId: Measuring Speech Naturalness in Many Languages
        • ST-LLM: Large Language Models Are Effective Temporal Learners
        • Stabilising and accelerating light gated recurrent units for automatic speech recognition
        • Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
        • STAR: A Benchmark for Situated Reasoning in Real-World Videos
        • StarSpace: Embed All The Things!
        • State Spaces Aren't Enough: Machine Translation Needs Attention
        • Statistical Rejection Sampling Improves Preference Optimization
        • Stealing Part of a Production Language Model
        • Stealing User Prompts from Mixture of Experts
        • Steerable CNNs
        • Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning
        • Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
        • Step-by-Step Diffusion: An Elementary Tutorial
        • STLight: a Fully Convolutional Approach for Efficient Predictive Learning by Spatio-Temporal joint Processing
        ‱ Stochastic Average Gradient: A Simple Empirical Investigation
        • Stochastic Taylor Derivative Estimator: Efficient amortization for arbitrary differential operators
        • StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection
        • Structured Neural Summarization
        • Structured Pruning of Large Language Models
        • Structured Training for Neural Network Transition-Based Parsing
        • Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
        • Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
        • StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
        • Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
        • Super Tiny Language Models
        • SUPERB: Speech processing Universal PERformance Benchmark
        • SuperBPE: Space Travel for Language Models
        • SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
        • Supervised Contrastive Learning
        • Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
        • Surrogate Gradient Learning in Spiking Neural Networks
        • Survey of Automatic Metrics for Evaluating Machine Translation at the Document Level
        • Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
        • Surveying the MLLM Landscape: A Meta-Review of Current Surveys
        • SVLA: A Unified Speech-Vision-Language Assistant with Multimodal Reasoning and Speech Generation
        • SWEb: A Large Web Dataset for the Scandinavian Languages
        • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
        • SyllableLM: Learning Coarse Semantic Units for Speech Language Models
        • Symbolic Discovery of Optimization Algorithms
        • Synthetic DNA applications in information technology
        • T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining
        • T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation
        • Tacotron: Towards End-to-End Speech Synthesis
        • Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
        • Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
        • Task Singular Vectors: Reducing Task Interference in Model Merging
        • Task Vectors are Cross-Modal
        • Task-aware Retrieval with Instructions
        • Task-Aware Unified Source Separation
        • TASTY: A Transformer based Approach to Space and Time complexity
        • Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training
        • TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation
        • TED-LIUM: an Automatic Speech Recognition dedicated corpus
        • Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting
        • Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs
        • Text and Code Embeddings by Contrastive Pre-Training
        • Text-Free Prosody-Aware Generative Spoken Language Modeling
        • Textbooks Are All You Need
        • Textless Speech-to-Speech Translation on Real Data
        • Textually Pretrained Speech Language Models
        • Texygen: A Benchmarking Platform for Text Generation Models
        • TGIF: A New Dataset and Benchmark on Animated GIF Description
        • The "something something" video database for learning and evaluating visual common sense
        • The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition
        • The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
        • The AMI Meeting Corpus
        • The Anatomy of a Large-Scale Hypertextual Web Search Engine
        • The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation
        • The Biological Basis of Audition
        • The boundary of neural network trainability is fractal
        • The case for 4-bit precision: k-bit Inference Scaling Laws
        • The Causal-Neural Connection: Expressiveness, Learnability, and Inference
        • The challenge of realistic music generation: modelling raw audio at scale
        • The Curious Case of Neural Text Degeneration
        • The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
        • The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
        • The Defeat of the Winograd Schema Challenge
        • The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
        • The Elements of Differentiable Programming
        • The Emotions of the Crowd: Learning Image Sentiment from Tweets via Cross-modal Distillation
        • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
        • The first collision for full SHA-1
        • The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
        • The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
        • The Forward-Forward Algorithm: Some Preliminary Investigations
        • The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction
        • The Goldilocks zone: Towards better understanding of neural network loss landscapes
        • The Hardware Lottery
        • The Hungarian Method for the Assignment Problem
        • The Inside Story: Towards Better Understanding of Machine Translation Neural Evaluation Metrics
        • The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
        • The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results
        • The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
        • The JOREK non-linear extended MHD code and applications to large-scale instabilities and their control in magnetically confined fusion plasmas
        • The Kinetics Human Action Video Dataset
        • The Llama 3 Herd of Models
        • The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
        • The Marginal Value of Adaptive Gradient Methods in Machine Learning
        • The Matrix Calculus You Need For Deep Learning
        • The Metropolis-Hastings algorithm
        • The Modern Mathematics of Deep Learning
        • The Multimodal Universe: Enabling Large-Scale Machine Learning with 100TB of Astronomical Scientific Data
        • The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence
        • The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage
        • The Pile: An 800GB Dataset of Diverse Text for Language Modeling
        • The pitfalls of next-token prediction
        • The Power of Scale for Parameter-Efficient Prompt Tuning
        • The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
        • The Relativity of Causal Knowledge
        • The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
        • The Semantic Scholar Open Data Platform
        ‱ The semantics of the (so-called) clausal determiner nó in Akan (Kwa)
        • The Seven Tools of Causal Inference with Reflections on Machine Learning
        • The Spotify Podcast Dataset
        • The sun compass revisited
        • The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval
        • The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
        • The THUMOS Challenge on Action Recognition for Videos "in the Wild"
        • The Topological BERT: Transforming Attention into Topology for Natural Language Processing
        • The unreasonable effectiveness of few-shot learning for machine translation
        • The VoiceMOS Challenge 2022
        • The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning
        • The Winograd schema challenge
        • The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling
        • The Zero Resource Speech Challenge 2021: Spoken language modelling
        • Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
        • Three models for the description of language
        • Time-Contrastive Networks: Self-Supervised Learning from Video
        • Tiny Pointers
        • TinyLlama: An Open-Source Small Language Model
        • TinyLLaVA: A Framework of Small-scale Large Multimodal Models
        • Titans: Learning to Memorize at Test Time
        • TLDR: Extreme Summarization of Scientific Documents
        • TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
        • Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs
        • Toolformer: Language Models Can Teach Themselves to Use Tools
        • TopoBenchmarkX: A Framework for Benchmarking Topological Deep Learning
        • Toward Joint Language Modeling for Speech Units and Text
        • Towards a definition of transcreation: a systematic literature review
        • Towards audio language modeling -- an overview
        • Towards Automatic Learning of Procedures from Web Instructional Videos
        • Towards Causal Representation Learning
        • Towards Deep Learning Models Resistant to Adversarial Attacks
        • Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
        • Towards Expert-Level Medical Question Answering with Large Language Models
        • Towards Measuring Fairness in AI: the Casual Conversations Dataset
        • Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
        • Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR
        • Towards Robust Speech Representation Learning for Thousands of Languages
        • Towards Understanding Grokking: An Effective Theory of Representation Learning
        • Towards Understanding Sycophancy in Language Models
        • Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
        • Tower: An Open Multilingual Large Language Model for Translation-Related Tasks
        • Toxicity of the Commons: Curating Open-Source Pre-Training Data
        • Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
        • Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints
        • Training Compute-Optimal Large Language Models
        • Training data-efficient image transformers & distillation through attention
        • Training Deep Nets with Sublinear Memory Cost
        • Training language models to follow instructions with human feedback
        • Training Language Models with Memory Augmentation
        • Training Neural Networks from Scratch with Parallel Low-Rank Adapters
        • Training Verifiers to Solve Math Word Problems
        • Transcendence: Generative Models Can Outperform The Experts That Train Them
        • Transductive Active Learning: Theory and Applications
        • Transferable speech-to-text large language model alignment module
        • Transformation of Mean Opinion Scores to Avoid Misleading of Ranked based Statistical Techniques
        • Transformer Feed-Forward Layers Are Key-Value Memories
        • Transformer Networks for Trajectory Forecasting
        • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
        • TransformerFAM: Feedback attention is working memory
        • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
        • Transformers learn in-context by gradient descent
        • Transformers need glasses! Information over-squashing in language tasks
        • Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral
        • Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts
        ‱ Translation in the Hands of Many: Centering Lay Users in Machine Translation Interactions
        • Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
        • Translatotron 3: Speech to Speech Translation with Monolingual Data
        • Transparent and Scrutable Recommendations Using Natural Language User Profiles
        • TruthfulQA: Measuring How Models Mimic Human Falsehoods
        • TuBA: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning
        • Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring
        • TVQA: Localized, Compositional Video Question Answering
        • Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps
        • Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
        • U-Net: Convolutional Networks for Biomedical Image Segmentation
        • UL2: Unifying Language Learning Paradigms
        • UltraFeedback: Boosting Language Models with Scaled AI Feedback
        • UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition
        • Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation
        • Uncovering Latent Style Factors for Expressive Speech Synthesis
        • Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
        • Understanding Black-box Predictions via Influence Functions
        • Understanding deep learning requires rethinking generalization
        • Understanding Intra-Class Knowledge Inside CNN
        • Understanding natural language
        • Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation
        • UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
        • UniAudio: An Audio Foundation Model Toward Universal Audio Generation
        • UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control
        • Unified Language Model Pre-training for Natural Language Understanding and Generation
        • Unified Speech-Text Pretraining for Spoken Dialog Modeling
        • Unified Video-Language Pre-training with Synchronized Audio
        • Unified Vision-Language Pre-Training for Image Captioning and VQA
        • Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources
        • UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
        • Unitary Evolution Recurrent Neural Networks
        • UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
        • Universal Language Model Fine-tuning for Text Classification
        • Universal Transformers
        • UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation
        • Unlimiformer: Long-Range Transformers with Unlimited Length Input
        • Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
        • Unsupervised Cross-lingual Representation Learning at Scale
        • Unsupervised Deep Tracking
        • Unsupervised Dense Information Retrieval with Contrastive Learning
        • Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
        • Unsupervised Learning by Competing Hidden Units
        • Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
        • Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
        • Unsupervised Neural Machine Translation
        • Unsupervised Source Separation via Bayesian Inference in the Latent Domain
        • Unsupervised Translation of Programming Languages
        • Unsupervised Visual Representation Learning by Context Prediction
        • Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism
        • Unveiling the Role of Pretraining in Direct Speech Translation
        • URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors
        • Using Forced Alignment for Phonetics Research
        • Using the Output Embedding to Improve Language Models
        • UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022
        • VALHALLA: Visual Hallucination for Machine Translation
        • VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
        • VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
        • Variable-rate hierarchical CPC leads to acoustic unit discovery in speech
        • Variational Bayes: A report on approaches and applications
        • Variational Inference: A Review for Statisticians
        • VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
        • VCoder: Versatile Vision Encoders for Multimodal Large Language Models
        • Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
        • VeLO: Training Versatile Learned Optimizers by Scaling Up
        • Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
        • Video as the New Language for Real-World Decision Making
        • Video Instruction Tuning With Synthetic Data
        • Video Swin Transformer
        • Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
        • Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
        • Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
        • Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
        • VideoBERT: A Joint Model for Video and Language Representation Learning
        • VideoChat: Chat-Centric Video Understanding
        • VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
        • VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
        • VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
        • VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation
        • VideoPrism: A Foundational Visual Encoder for Video Understanding
        • VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
        • VIMA: General Robot Manipulation with Multimodal Prompts
        • VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
        • Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
        • Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
        • Vision Transformers Need Registers
        • Vision-Language Integration in Multimodal Video Transformers (Partially) Aligns with the Brain
        • Vision-Speech Models: Teaching Speech Models to Converse about Images
        • ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric
        • Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
        • Visual Instruction Tuning
        • Visual Prompt Tuning
        • Visualizing and Understanding Convolutional Networks
        • Visualizing Data using t-SNE
        • Visualizing the Loss Landscape of Neural Nets
        • VITA: Towards Open-Source Interactive Omni Multimodal LLM
        • Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
        • VoiceBench: Benchmarking LLM-Based Voice Assistants
        • Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
        • VoxCeleb2: Deep Speaker Recognition
        • VoxCommunis: A Corpus for Cross-linguistic Phonetic Analysis
        • VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
        • Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
        • W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
        • Wasserstein GAN
        • Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation
        • Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
        • Watt For What: Rethinking Deep Learning's Energy-Performance Relationship
        • wav2letter++: The Fastest Open-source Speech Recognition System
        • Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
        • wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
        • wav2vec: Unsupervised Pre-training for Speech Recognition
        • WavChat: A Survey of Spoken Dialogue Models
        • Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
        • WaveGlow: A Flow-based Generative Network for Speech Synthesis
        • WaveNet: A Generative Model for Raw Audio
        • WavLLM: Towards Robust and Adaptive Speech Large Language Model
        • WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
        • WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
        • Weighted Voronoi Stippling
        • WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition
        • What Are They Doing? Joint Audio-Speech Co-Reasoning
        • What Are Tools Anyway? A Survey from the Language Model Perspective
        • What Do Speech Foundation Models Not Learn About Speech?
        • What Does BERT Look At? An Analysis of BERT's Attention
        • What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
        • What matters when building vision-language models?
        • What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
        • What Should Not Be Contrastive in Contrastive Learning
        • What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study
        • What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
        • What's In My Big Data?
        • When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion
        • When Do Neural Networks Outperform Kernel Methods?
        • When Does Translation Require Context? A Data-driven, Multilingual Exploration
        • When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
        • When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
        • Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
        • Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
        • Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
        • Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
        • Why Larger Language Models Do In-context Learning Differently?
        • Why should we add early exits to neural networks?
        • Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
        • WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
        • WinoGrande: An Adversarial Winograd Schema Challenge at Scale
        • WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge
        • Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective
        ‱ Word Embeddings through Hellinger PCA
        • Word Translation Without Parallel Data
        • Word-prosodic typology
        • word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method
        • WT5?! Training Text-to-Text Models to Explain their Predictions
        • X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
        • xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection
        • XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
        • XGBoost: A Scalable Tree Boosting System
        • xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
        ‱ XL-WSD: An Extra-Large and Cross-Lingual Evaluation Framework for Word Sense Disambiguation
        • XLNet: Generalized Autoregressive Pretraining for Language Understanding
        • XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
        • xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement
        • xLSTM: Extended Long Short-Term Memory
        • XNLI: Evaluating Cross-lingual Sentence Representations
        • XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
        • xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages
        • XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
        • xTower: A Multilingual LLM for Explaining and Correcting Translation Errors
        • XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
        • Yet Another Algorithm for Pitch Tracking
        • Yi: Open Foundation Models by 01.AI
        • YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
        • YuE: Scaling Open Foundation Models for Long-Form Music Generation
        • Zephyr: Direct Distillation of LM Alignment
        • Zero-shot Speech Translation
        • Zero-Shot Tokenizer Transfer
        • Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
        • ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
        • Aaron van den Oord
        • Abdelrahman Mohamed
        • Adam Polyak
        • Adel Moumen
        • Afra Alishahi
        • Agustinus Kristiadi
        • Akari Asai
        • Alan Jeffares
        • Aldo Lipani
        • Aleksa Gordić
        • Alessio Devoto
        • Alex Graves
        • Alex H. Williams
        • Alex Krizhevsky
        • Alexander Kolesnikov
        • Alexander M. Rush
        • Alexandra Birch
        • Alexandre DĂ©fossez
        • Alexei A. Efros
        • Alexey Dosovitskiy
        • Alexis Conneau
        • Alicia Curth
        • AmĂ©lie Royer
        • AndrĂ© F. T. Martins
        • AndrĂ© Martins
        • Andrea Bacciu
        • Andrej Karpathy
        • Andrew K. Lampinen
        • Andrew Zisserman
        • Anil Batra
        • Anil Keshwani
        • Anna Rogers
        • AntĂłnio Farinhas
        • Antonio Vergari
        • Armand Joulin
        • Artem Ploujnikov
        • Badr M. Abdullah
        • Barry Haddow
        • Beatrice Savoldi
        • Belen Alastruey
        • Ben Peters
        • Benjamin Minixhofer
        • Beomseok Lee
        • Boris Ginsburg
        • Bruno Martins
        • Cagri Toraman
        • Carla Bombi
        • Celestine Mendler-DĂŒnner
        • Cem Subakan
        • Christian Szegedy
        • Christopher D. Manning
        • Chrysoula Zerva
        • Daniele Venturi
        • David Duvenaud
        • David Ha
        • David R. Mortensen
        • David Silver
        • Dennis Fucci
        • Diederik P. Kingma
        • Dietrich Klakow
        • Donato Crisostomi
        • Dong Zhang
        • Douwe Kiela
        • Duarte M. Alves
        • Edoardo Debenedetti
        • Edoardo Maria Ponti
        • Edouard Grave
        • Edward Grefenstette
        • Ekaterina Shutova
        • Eliezer de Souza da Silva
        • Emanuele RodolĂ 
        • Emine Yilmaz
        • Emmanouil Zaranis
        • Emmanuel Dupoux
        • Essam Sleiman
        • Eugene Kharitonov
        • Fabio Galasso
        • Fabrizio Silvestri
        • Felix Kreuk
        • Ferenc HuszĂĄr
        • Francesco Cariaggi
        • Frank Keller
        • Gabriel Synnaeve
        • Gabriele Sarti
        • Gautier Izacard
        • Geoffrey Hinton
        • Gergely Neu
        • Giuseppe Attanasio
        • Graham K. Taylor
        • Graham Neubig
        • Grzegorz ChrupaƂa
        • Guillaume Lample
        • H. W. Kuhn
        • Haibin Wu
        • Hao Tang
        • Haytham M Fayek
        • Hector J. Levesque
        • Holger Schwenk
        • Hosein Mohebbi
        • Hossein A. Rahmani
        • Hugo Pitorro
        • Ian J. Goodfellow
        • Ilya Sutskever
        • Ishan Misra
        • Itai Gat
        • Jade Copet
        • James Allen
        • James Chapman
        • Jan Niehues
        • Jarod Duret
        • Javier Iranzo-SĂĄnchez
        • Jay Alammar
        • Jean-Baptiste Alayrac
        • Jeremy Howard
        • Jonas HĂŒbotter
        • JosĂ© G. C. de Souza
        • JosĂ© Pombal
        • Joshua Ainslie
        • Judea Pearl
        • Julia Kempe
        • JĂŒrgen A. Schmidhuber
        • Karen Livescu
        • Kevin Flanagan
        • Kevin Murphy
        • Kohei Saijo
        • Kshitij Ambilduke
        • Kushal Lakhotia
        • Kyunghyun Cho
        • Larry M. Hyman
        • Laura Ruis
        • Laura Sevilla-Lara
        • Laurent Besacier
        • Laurent MazarĂ©
        • Lianmin Zheng
        • Lilian Weng
        • Luca Della Libera
        • Luca Franco
        • Luca Soldaini
        • Lucas Beyer
        • Luisa Bentivogli
        • Ɓukasz Kaiser
        • Luke Zettlemoyer
        • Maarten Sap
        • Marc Stevens
        • Marcely Zanon Boito
        • Marco Gaido
        • Marco Tagliasacchi
        • Marcos Treviso
        • Marcus Rohrbach
        • Maria Antoniak
        • Maria Sofia Bucarelli
        • Mark Mazumder
        • Martijn Bartelds
        • Mathilde Caron
        • Matteo Negri
        • Matthew D Zeiler
        • Mauro Cettolo
        • Max Bartolo
        • Max Welling
        • Michael Hassid
        • Michele Miranda
        • Mihaela van der Schaar
        • Miles Cranmer
        • Mirco Ravanelli
        • Moritz Böhle
        • Neil Zeghidour
        • Nicholas Carlini
        • Nils Reimers
        • Nina Miolane
        • Nuno M. Guerreiro
        • Oleksii Hrinchuk
        • Onur Mutlu
        • Oriol Vinyals
        • Paolo Mandica
        • Pasquale Minervini
        • Patrick Fernandes
        • Paul Röttger
        • Paul-Ambroise Duquenne
        • Pavlo Vasylenko
        • Petar Veličković
        • Peter Holderrieth
        • Pieter Abbeel
        • Pooneh Mousavi
        • Quoc Le
        • Quoc V. Le
        • RamĂłn Fernandez Astudillo
        • Razvan Pascanu
        • Ricardo Rei
        • Rico Sennrich
        • Roberto Navigli
        • Rohan Ramasamy
        • Ronan Collobert
        • Rongjie Huang
        • Rowan Zellers
        • Ruoming Pang
        • Salah Zaiem
        • Samuel R. Bowman
        • Sander Land
        • Sanyuan Chen
        • Sara Papi
        • Saul Santos
        • Sebastian Raschka
        • Sebastian Riedel
        • Sebastian Ruder
        • Sergey Ioffe
        • Shay B. Cohen
        • Shayne Longpre
        • Shinji Watanabe
        • Shital Shah
        • Shreyank N Gowda
        • Simon Willison
        • Simone Conia
        • Simone Scardapane
        • Sonal Sannigrahi
        • Stanislav Fort
        • Steven McDonagh
        • Taku Kudo
        • Tal Remez
        • Tatsunori B. Hashimoto
        • Telmo Pessoa Pires
        • Thomas Palmeira Ferraz
        • Tim Dettmers
        • Tim RocktĂ€schel
        • Titouan Parcollet
        • Tsz Kin Lam
        • Tu-Anh Nguyen
        • Vadim Borisov
        • Vaishnavh Nagarajan
        • Vijay Janapa Reddi
        • Vivek Iyer
        • Vlad Niculae
        • Wei-Ning Hsu
        • Xin Zhang
        • Xinyue Hao
        • Xipeng Qiu
        • Xubo Liu
        • Yair Lakretz
        • Yann LeCun
        • Yifan Peng
        • Yonatan Belinkov
        • Yoshua Bengio
        • Yossi Adi
        • ZalĂĄn Borsos
        • An Evolutionary Perspective on Language
        • Animal Navigation Systems
        • Bayes: Conjugate Inference
        • CPC: Representation Learning with Contrastive Predictive Coding
        • Four Early Lessons from Working on Machine Learning Projects
        • Generalized Linear Models and the Exponential Family
        • Graphs: Community Structure
        • Graphs: Motifs, Graphlets and Structural Roles in Networks
        • Jabri, Owens and Efros (2020) Space-Time Correspondence as a Contrastive Random Walk
        • LSTMs + Grammar as a Foreign Language
        • Mean, Median and Mode as Representatives
        • Self-Supervised Visual Representation Learning
        • Some Information Theory
        • The Hierarchical Softmax
        • The Probability Distributions
        • The Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
          • Audio Concept Zoo
          • Audio Formulas and Snippets
          • Audio Signal Processing
          • Audio, Speech and Music Tools
            • 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing – Celebrating Signal Processing
            • Author kit instructions – 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing
            • Important Dates – 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing
            • Publishing and Paper Presentation Options – 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing
            • 2024 Conference
            • Blogposts Track ICLR 2024 Announcing Accepted Blogposts – ICLR Blog
            • ICLR 2024 Outstanding Paper Awards – ICLR Blog
            • ICLR 2024 Papers
            • ICLR 2024 Test of Time Award – ICLR Blog
            • ICLR2024 Papers - a Hugging Face Space by ICLR2024
            • 2025 Dates and Deadlines
            • Announcing the NeurIPS 2024 Test of Time Paper Awards – NeurIPS Blog
            • Dynamic Sparsity in Machine Learning NeurIPS 2024 Tutorial
            • NeurIPS 2024 Call for Papers
          • ACAIN 2025 – Advanced Course & Symposium on Artificial Intelligence and Neuroscience
          • Conferences
          • I Can’t Believe It’s Not Better Initiative - ICLR Workshop 2025 - Call for Papers
          • ICLR
          • ICTIR 2024
          • International Conference on the Theory of Information Retrieval (ICTIR) - SIGIR
          • Interspeech (International Speech Communication Association)
          • Interspeech 2025 - Call for Papers
          • Interspeech 2025 - Challenges
          • Interspeech 2025 - Home
          • NLP4DH - NLP4DH & IWCLUL 2023
          • SIGdial – Special Interest Group on Discourse and Dialogue
          • SIGIR 2024
          • Buckeye Corpus Information
          • DoReCo - Homepage
          • HuggingFaceM4/the_cauldron · Datasets at Hugging Face
          ‱ iisys-hof/HUI-Audio-Corpus-German: official repository for the HUI-Audio-Corpus-German; includes a pipeline to automatically recreate the dataset and to add more speakers
          • imdatceleste/m-ailabs-dataset: This is the M-AILABS Speech Dataset
          • Multilingual Spoken Words Dataset | MLCommons Datasets
          • OpenMIC-2018
          • People's Speech Dataset | MLCommons Datasets
          • PleIAs/common_corpus · Datasets at Hugging Face
          • RecipeNLG
          • RedPajama-Data-v2 An open dataset with 30 trillion tokens for training large language models
          • The LJ Speech Dataset
          • TIMIT Acoustic-Phonetic Continuous Speech Corpus - Linguistic Data Consortium
          • VCTK
          • Companies
          • EU Grants and Initiatives
          • Grants
            • Japanese
            • Mandarin
            • Turkish
          • anaphora
          • clitic
          • evidentiality
          • Languages of the World
          • Linguistics Notes
          • Phonetics vs Phonemics (Phonology)
          • realis
          • selection
          • Writing Systems
          • ELIAS-ELLIS-VISMAC Winter School 2025 | elias-ai
          • ELLIS Winter School on Foundation Models - Amsterdam 2024
          • LxMLS 2024
        • Adversarial Attacks
        • AI in Society
        • Bayesian Neural Networks
        • Causal Inference
        • Datasets
        • Diffusion Models
        • Efficient Machine Learning
        • Embeddings
        • Energy Based Models
        • eXplainability
        • Fairness
        • Flow Networks
        • G2P
        • Gaussian Processes
        • Generative Adversarial Networks
        • Hardware
        • Implementation
        • Information Retrieval
        • Information Theory
        • ISO Standards
        • Language Identification
        • Language Models
        • Llamas 🩙
        • Machine Translation
        • Multimodality
        • Music
        • Natural Language Inference
        • Neuroscience
        • Optimisation
        • Optimisation - Loss Functions
        • Recommendation Systems
        • Reinforcement Learning
        • Robotics
        • Speech and Audio
        • Statistical Learning Theory
        • Theory of Deep Learning
        • Variational Autoencoders
        • Variational Inference
        • Vision
        • Winograd and WinoGrande
        • Word Sense Disambiguation
        • Jobs, Careers, Companies
        • Kernels 🍿 & Support Vector Machines
        • Machine Learning
        • Math
        • Natural Language Processing
        • Neural Networks
        • Physics
        • Read
        • Signal Processing
        • Statistics and Probability
        • Unsorted
          • Conversational AI Reading Group
          • Launchpad
          ‱ Monthly online Linguistique Informatique, Formelle et de Terrain (LIFT) Seminar
        • Analysing & Summarizing Movies via Turning Point Identification in Screenplays - Frank Keller
        • Designing efficient and modular neural networks - Simone Scardapane
        • Discrete Audio Tokens for Multimodal LLMs - Mirco Ravanelli
        • Efficient Transformers - Ɓukasz Kaiser
        • Hurdles at WMT - Keeping up with the MT progress - Tom Kocmi
        • Improving Universal Access to Modern Speech Technology - Martijn Bartelds
    Goodreads

    29 Apr 2025 · 1 min read

    • project

    Mirror Goodreads reviews and ratings on a personal webpage/blog/wiki.

    Goodreads seems to have discontinued support for their API on 2020-12-17, as described in "Does Goodreads support the use of APIs?".

    To import or export your books, go to My Books, then click on Import and export under Tools on the left.
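    The export produces a CSV of your library. As a minimal sketch (not part of the project repo, and assuming the usual default filename goodreads_library_export.csv and its typical columns Title, Author, My Rating, My Review; adjust to your actual export), the file could be turned into Markdown for mirroring on a page like this:

    ```python
    # Minimal sketch: convert a Goodreads library export (CSV) into a Markdown page.
    # Assumed: default export filename and typical column names
    # ("Title", "Author", "My Rating", "My Review"); adapt to your own export.
    import csv
    from pathlib import Path

    EXPORT_CSV = Path("goodreads_library_export.csv")  # assumed default filename
    OUTPUT_MD = Path("goodreads.md")

    def main() -> None:
        lines = ["# Goodreads", ""]
        with EXPORT_CSV.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                rating = (row.get("My Rating") or "0").strip()
                review = (row.get("My Review") or "").strip()
                if rating == "0" and not review:
                    continue  # skip books with neither a rating nor a review
                lines.append(f"## {row['Title']} by {row['Author']}")
                if rating != "0":
                    lines.append(f"Rating: {'★' * int(rating)}")
                if review:
                    lines.append(review)
                lines.append("")
        OUTPUT_MD.write_text("\n".join(lines), encoding="utf-8")

    if __name__ == "__main__":
        main()
    ```

    Run it in the directory containing the export to get a goodreads.md that a static site generator can pick up.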

    • Goodreads Scraper - funnily enough this is written by Maria Antoniak
    • Goodreads API Reference

    Project repo: https://github.com/anilkeshwani/goodreads


    Backlinks

    • Reading