Resources
Need-to-know
- Brotli - Wikipedia
- Lempel–Ziv–Welch - Wikipedia
- LZMA - Wikipedia
- LZ77 and LZ78 - Wikipedia
- zstd is incredible, but just in case the thought hasn't occurred to someone here… - Hacker News
- Base64 Encoding
- uuencoding - Wikipedia
- yEnc - Wikipedia
- Huffman Coding - W3Schools.com
- Overview of Algorithms - The Hitchhiker's Guide to Compression
- Arithmetic coding - Wikipedia
- 8-bit clean - Wikipedia
- In-band signaling - Wikipedia
JPEG
Text Encoding, UTF, ASCII and more
- UTF-8 Everywhere
- UTF-8 - Wikipedia - Unicode Transformation Format – 8-bit
- Unicode - Wikipedia
- Unicode Basic Multilingual Plane (BMP) - Unicode is divided into 17 planes of 65,536 code points (16 bits) each; currently only about 10 percent of these are in use. The first and most important plane is the Basic Multilingual Plane (Plane 0, BMP), which contains nearly all commonly used writing systems and symbols. It covers the code points U+0000 to U+FFFF.
- ASCII Table - ASCII Character Codes, HTML, Octal, Hex, Decimal
- Character encoding - Wikipedia
- UTF-8, UTF-16, UTF-32: A Comprehensive Analysis by Grok 3
- UTF-8:
- Fully compatible with ASCII, meaning an ASCII file is also a valid UTF-8 file. This allows legacy programs, like C's printf, to handle UTF-8-encoded files containing only ASCII characters, as they look for ASCII-specific characters like '%'.
- UTF-16 and UTF-32:
- Not compatible with ASCII files, requiring Unicode-aware programs to display, print, or manipulate them, even if the file contains only ASCII characters.
- Contain many zero bytes, which can break common null-terminated string handling logic (e.g., in C/C++), making them incompatible with legacy systems that rely on such logic.
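The compatibility points above can be checked with a small Python sketch. It is only an illustration: the helper names `encoded_forms` and `c_string_view` and the sample string are made up for this note, and `c_string_view` merely imitates what a null-terminated C routine would read; it is not a real C API.

```python
def encoded_forms(text):
    """Encode the same text as ASCII, UTF-8, UTF-16 and UTF-32 (little-endian, no BOM)."""
    return {name: text.encode(name)
            for name in ("ascii", "utf-8", "utf-16-le", "utf-32-le")}


def c_string_view(data):
    """Imitate a null-terminated reader: keep only the bytes before the first zero byte."""
    return data.split(b"\x00", 1)[0]


sample = "A%d"  # hypothetical ASCII-only sample containing a printf-style '%'
forms = encoded_forms(sample)

# For ASCII-only text the UTF-8 bytes are identical to the ASCII bytes.
print(forms["utf-8"] == forms["ascii"])

# UTF-16 and UTF-32 pad every ASCII character with zero bytes.
for name in ("utf-8", "utf-16-le", "utf-32-le"):
    data = forms[name]
    print(name, len(data), "bytes,", data.count(b"\x00"), "zero bytes,",
          "C-style view:", c_string_view(data))
```

For this three-character sample the UTF-8 form has no zero bytes, while the UTF-16 form has three and the UTF-32 form nine, so a null-terminated reader stops after the first character of either wide encoding.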
- UTF-8: