Reinforcement Learning from Human Feedback for LLMs moved to 👉 Reinforcement Learning

Tokenization

Regex is relevant, for example, in the implementation of pretokenizers, e.g. in tiktoken
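A minimal sketch of regex-based pretokenization in the GPT-2 style. The real tiktoken patterns use Unicode property classes like \p{L} (which require the third-party `regex` package); this ASCII-only stdlib version only illustrates the idea:

```python
import re

# Simplified, ASCII-only sketch of a GPT-2-style pretokenizer pattern.
# Alternatives are tried in order: contractions, words, digit runs,
# punctuation runs, remaining whitespace.
PRETOKENIZE = re.compile(
    r"'(?:s|t|re|ve|m|ll|d)"   # common English contractions
    r"| ?[A-Za-z]+"            # words, with an optional leading space
    r"| ?[0-9]+"               # digit runs
    r"| ?[^\sA-Za-z0-9]+"      # punctuation runs
    r"|\s+"                    # remaining whitespace
)

def pretokenize(text: str) -> list[str]:
    """Split text into pretokens; a BPE tokenizer would then merge within each."""
    return PRETOKENIZE.findall(text)

# pretokenize("Roger has 5 balls.") keeps the leading space attached to
# each word: ["Roger", " has", " 5", " balls", "."]
```

Attaching the leading space to the following word is a deliberate design choice: it keeps "ball" and " ball" as distinct pretokens, so BPE merges never cross word boundaries.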

Notable (L)LMs

Instruction Tuning & Supervised Fine-tuning

Retrieval Augmented Generation (RAG)

Reasoning & Adaptive Computation Time

See also the reasoning tag

Alignment

Chain of Thought

Chain-of-Thought (CoT) prompting elicits step-by-step reasoning from language models by leveraging in-context learning. It was introduced (AFAIK) in Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, in the context of arithmetic reasoning, i.e. wordy numeracy problems like:

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
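The few-shot setup above can be sketched as prompt assembly: prepend a worked exemplar so the model imitates its step-by-step reasoning before answering the new question. The `cot_prompt` helper below is hypothetical; it only shows the string construction, not a real model API.

```python
# The worked exemplar quoted above, including its reasoning chain.
EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
)

def cot_prompt(question: str) -> str:
    """Build a one-shot chain-of-thought prompt for a new question.

    The exemplar's explicit intermediate steps are what (via in-context
    learning) induce the model to emit its own reasoning before the answer.
    """
    return EXEMPLAR + f"Q: {question}\nA:"
```

The completion the model generates after the final "A:" is then parsed for the answer, typically by matching the "The answer is …" pattern the exemplar establishes.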

Agents

Attacks on and Defences for (L)LMs

Watermarking of (Large) Language Models

Scaling Laws

Evaluation and Leaderboards

Local LLMs

See also

See: