Excerpt from Environment variables (v0.32.4)
HF_HUB_ENABLE_HF_TRANSFER
Set to True for faster uploads and downloads from the Hub using hf_transfer.
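This page writes the flag both as True and as 1. To our understanding, huggingface_hub treats either spelling as enabled: it upper-cases the value and compares it against "1", "ON", "YES" and "TRUE". The minimal sketch below illustrates that parsing with a hypothetical helper, env_flag. It is an illustration under that assumption, not the library's own code.

```python
import os

# Assumption: mirrors how huggingface_hub parses boolean env vars.
# env_flag is a hypothetical helper, not part of the library's API.
TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def env_flag(name: str) -> bool:
    """Return True if environment variable `name` holds a truthy value (case-insensitive)."""
    value = os.environ.get(name)
    if value is None:
        return False
    return value.upper() in TRUE_VALUES


# Try a few spellings, then restore the previous environment.
previous = os.environ.get("HF_HUB_ENABLE_HF_TRANSFER")
for raw in ["1", "True", "true", "yes", "0", "False", ""]:
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = raw
    print(f"{raw!r:>8} -> {env_flag('HF_HUB_ENABLE_HF_TRANSFER')}")
if previous is None:
    del os.environ["HF_HUB_ENABLE_HF_TRANSFER"]
else:
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = previous
```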
By default, huggingface_hub uses the Python-based requests.get and requests.post functions. Although these are reliable and versatile, they may not be the most efficient choice for machines with high bandwidth. hf_transfer is a Rust-based package developed to maximize the bandwidth used by dividing large files into smaller parts and transferring them simultaneously using multiple threads. This approach can potentially double the transfer speed. To use hf_transfer:
- Specify the hf_transfer extra when installing huggingface_hub (e.g. pip install huggingface_hub[hf_transfer]).
- Set HF_HUB_ENABLE_HF_TRANSFER=1 as an environment variable.
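The split-and-parallelize idea behind hf_transfer can be illustrated in pure Python. This is a minimal sketch, not hf_transfer's Rust implementation. The names split_ranges, fetch_range and parallel_download are hypothetical. fetch_range slices an in-memory byte string and stands in for an HTTP request carrying a Range header.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, size) into half-open byte ranges of at most chunk_size bytes."""
    return [(start, min(start + chunk_size, size)) for start in range(0, size, chunk_size)]


def fetch_range(source: bytes, start: int, end: int) -> bytes:
    # Hypothetical stand-in for an HTTP GET with "Range: bytes=start-(end-1)".
    return source[start:end]


def parallel_download(source: bytes, chunk_size: int = 4, workers: int = 4) -> bytes:
    """Fetch all ranges concurrently on a thread pool and reassemble them."""
    ranges = split_ranges(len(source), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the parts reassemble correctly.
        parts = pool.map(lambda r: fetch_range(source, *r), ranges)
        return b"".join(parts)


payload = b"a large file split into parts"
print(split_ranges(len(payload), 8))
assert parallel_download(payload, chunk_size=8) == payload
```

The ordered reassembly is the key design point. Parts may finish in any order, but each result is placed by its range, so the output matches the original file byte for byte.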
Please note that using hf_transfer comes with certain limitations. Since it is not purely Python-based, debugging errors may be challenging. Additionally, hf_transfer lacks several user-friendly features such as resumable downloads and proxies. These omissions are intentional to maintain the simplicity and speed of the Rust logic. Consequently, hf_transfer is not enabled by default in huggingface_hub.
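Because hf_transfer is opt-in, a script can enable it only when the package is actually installed and otherwise keep the default requests-based transfers. The minimal sketch below uses a hypothetical helper, enable_hf_transfer_if_available. It assumes, based on recent releases, that huggingface_hub reads HF_HUB_ENABLE_HF_TRANSFER when it is first imported. The helper should therefore run before that import, and it will have no effect in a process that has already imported huggingface_hub.

```python
import importlib.util
import os


def enable_hf_transfer_if_available() -> bool:
    """Set HF_HUB_ENABLE_HF_TRANSFER=1 only if the hf_transfer package is importable.

    Hypothetical helper; call it before importing huggingface_hub (assumption:
    the library reads the variable at import time).
    """
    if importlib.util.find_spec("hf_transfer") is None:
        return False  # keep the default Python-based transfers
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return True


enabled = enable_hf_transfer_if_available()
print("hf_transfer enabled:", enabled)
```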
hf_xet is an alternative to hf_transfer. It provides efficient file transfers through a chunk-based deduplication strategy, custom Xet storage (replacing Git LFS), and seamless integration with huggingface_hub.
Enable it with pip install "huggingface_hub[hf_xet]".
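Chunk-based deduplication means a file is split into chunks identified by a content hash, and only chunks the storage has not seen before are transferred. The minimal sketch below uses fixed-size chunks hashed with SHA-256. A dict stands in for remote storage, and upload and download are hypothetical names. This is a deliberate simplification, not hf_xet's algorithm. Real deduplicating systems typically use content-defined chunk boundaries, so that inserting bytes near the start of a file does not shift every later chunk.

```python
import hashlib


def upload(data: bytes, store: dict[str, bytes], chunk_size: int = 8) -> tuple[list[str], int]:
    """Store only unseen chunks; return the file's chunk manifest and the count of new chunks.

    `store` is a hypothetical stand-in for remote chunk storage, keyed by SHA-256 digest.
    """
    manifest, new_chunks = [], 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i : i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk
            new_chunks += 1
        manifest.append(digest)
    return manifest, new_chunks


def download(manifest: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble a file from its manifest of chunk digests."""
    return b"".join(store[d] for d in manifest)


store: dict[str, bytes] = {}
v1 = bytes(range(80))                           # 10 distinct 8-byte chunks
v2 = bytes(range(64)) + bytes(range(200, 216))  # same first 64 bytes, new 16-byte tail
m1, new1 = upload(v1, store)
m2, new2 = upload(v2, store)
print(f"v1 stored {new1} new chunks, v2 stored {new2} new chunks")
assert download(m2, store) == v2
```

In this example the second version only adds the two chunks that cover its changed tail. Its first eight chunks are reused from the first upload.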
HF_XET_HIGH_PERFORMANCE
Makes hf-xet operate with more aggressive settings that maximize use of the machine's network and disk resources. Enabling high-performance mode tries to saturate the machine's network bandwidth and uses all CPU cores for parallel upload/download activity. Consider it analogous to setting HF_HUB_ENABLE_HF_TRANSFER=True when uploading to or downloading from the Xet storage backend with hf-xet.
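Because high-performance mode tries to use all available bandwidth and CPU cores, you may want it only for one bulk transfer job rather than for every process on the machine. The minimal sketch below scopes the variable to a single child process by passing a modified copy of the environment to subprocess.run. It assumes that "1" is accepted as a truthy value, as it is for HF_HUB_ENABLE_HF_TRANSFER. The helper env_with_high_performance is hypothetical, and the child command only prints the variable, standing in for your real download or upload script.

```python
import os
import subprocess
import sys


def env_with_high_performance(enabled: bool = True) -> dict[str, str]:
    """Return a copy of the current environment with HF_XET_HIGH_PERFORMANCE set or removed."""
    env = dict(os.environ)
    if enabled:
        env["HF_XET_HIGH_PERFORMANCE"] = "1"  # assumption: "1" counts as enabled
    else:
        env.pop("HF_XET_HIGH_PERFORMANCE", None)
    return env


# The child process sees the flag; the parent's os.environ is left untouched.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('HF_XET_HIGH_PERFORMANCE'))"],
    env=env_with_high_performance(),
    capture_output=True,
    text=True,
    check=True,
)
print("child sees:", child.stdout.strip())
print("parent sees:", os.environ.get("HF_XET_HIGH_PERFORMANCE"))
```

In practice, the `-c` command would be replaced by the script or CLI invocation that performs the transfer, for example a `huggingface-cli download ...` call.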