Notable LLMs
See also 👉 Llamas 🦙
- EuroLLM Multilingual Language Models for Europe
- SaulLM-7B A pioneering Large Language Model for Law
- Tower An Open Multilingual Large Language Model for Translation-Related Tasks
- Finetuned Language Models Are Zero-Shot Learners - FLAN
- A Paradigm Shift in Machine Translation Boosting Translation Performance of Large Language Models - ALMA
- Contrastive Preference Optimization Pushing the Boundaries of LLM Performance in Machine Translation - ALMA-R
- Gemma Open Models Based on Gemini Research and Technology
- PaLM 2 Technical Report
- PaLM Scaling Language Modeling with Pathways
- Aya Model An Instruction Finetuned Open-Access Multilingual Language Model - skim - 118 pages
- Flamingo a Visual Language Model for Few-Shot Learning
- Efficient Training of Language Models to Fill in the Middle - Fill-in-the-Middle (FIM)
- Megatron-LM Training Multi-Billion Parameter Language Models Using Model Parallelism - Megatron
Instruction Tuning / Supervised Finetuning
Retrieval Augmented Generation (RAG)
- 👨🏫 Stanford CS25: V3 I Retrieval Augmented Language Models (December 5, 2023) - Douwe Kiela introduces the topic, surveys recent literature on retrieval-augmented language models, and finishes with some of the main open questions
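The basic retrieve-then-read loop behind RAG: score stored documents against the query, take the top-k, and prepend them to the prompt as context. A toy sketch of that idea, using word-overlap cosine similarity in place of a learned retriever (the documents and helper names are illustrative, not from any particular system):

```python
from collections import Counter
import math

# Toy document store; a real system would use a vector index over embeddings.
docs = [
    "Tower is an open multilingual LLM for translation tasks.",
    "Watermarking embeds a detectable signal in generated text.",
    "Retrieval augmented generation conditions an LLM on retrieved documents.",
]

def bow(text: str) -> Counter:
    """Bag-of-words counts; stands in for a learned embedding."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Prepend retrieved context so the LLM can ground its answer."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The prompt produced by `build_prompt` is what would actually be sent to the language model; the retriever and generator can be trained jointly or kept frozen, which is one of the design axes the lecture surveys.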
Chain of Thought
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners
- Language Models are Multilingual Chain-of-Thought Reasoners
- Chain-of-Thought Prompting for Speech Translation
Chain-of-Thought (CoT) prompting induces language models to perform reasoning by leveraging in-context learning. It was introduced (AFAIK) in Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, in the context of arithmetic reasoning, i.e. wordy numeracy problems like:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
Tokenisation
Tokenisation is filed under Tokenisation
Watermarking of (Large) Language Models
- A Watermark for Large Language Models
- Watermarks in the Sand Impossibility of Strong Watermarking for Generative Models
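The first paper above proposes, roughly, a "green list" scheme: before sampling each token, hash the previous token to seed an RNG, partition the vocabulary into green and red halves, and bias sampling toward green tokens; the detector then counts how many tokens fall in their predecessor's green list. A simplified sketch of that idea (vocabulary size, hash, and fraction are illustrative; the real method biases logits rather than hard-partitioning):

```python
import hashlib
import random

VOCAB_SIZE = 50_000   # illustrative vocabulary size
GREEN_FRACTION = 0.5  # fraction of the vocab marked "green" at each step

def green_list(prev_token: int) -> set[int]:
    """Seed an RNG from a hash of the previous token and sample the green set.

    Deterministic given prev_token, so the detector can recompute it
    without access to the model.
    """
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % 2**32
    rng = random.Random(seed)
    k = int(VOCAB_SIZE * GREEN_FRACTION)
    return set(rng.sample(range(VOCAB_SIZE), k))

def green_count(tokens: list[int]) -> int:
    """Detection statistic: tokens that land in their predecessor's green list.

    Watermarked text scores far above the ~GREEN_FRACTION baseline expected
    from unwatermarked text, which a z-test can flag.
    """
    return sum(1 for prev, tok in zip(tokens, tokens[1:]) if tok in green_list(prev))
```

The second paper argues that any such scheme with a public detector can be washed out by an attacker who perturbs the text while preserving quality, hence the impossibility framing.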