download
Download functionality for Context-Fabric corpora.
This module provides the main download function for fetching corpora from Hugging Face Hub.
Functions
function
download(corpus_id: str, revision: str | None = None, force: bool = False, compiled_only: bool = False) → Path
Download a corpus from Hugging Face Hub.
Args:
corpus_id: Either a short name from the registry (e.g., 'bhsa')
or a full HF repo ID (e.g., 'etcbc/cfabric-bhsa').
revision: Specific version (tag, branch, or commit hash).
If None, downloads the latest version.
force: Re-download even if cached locally.
compiled_only: Only download .cfm files (faster load, skip .tf source).
Returns:
Path to the downloaded corpus directory.
Raises:
ValueError: If corpus_id is not found and doesn't look like a repo ID.
ImportError: If huggingface_hub is not installed.
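The ValueError condition above can be illustrated with a minimal sketch of how a corpus_id might be resolved. This is not the library's actual implementation: the `REGISTRY` dict, the `resolve_repo_id` helper, and the rule that a repo ID has the form 'owner/name' are all assumptions made for illustration. Only the 'bhsa' to 'etcbc/cfabric-bhsa' pairing comes from the documentation above.

```python
# Hypothetical registry; the single entry mirrors the example in the Args section.
REGISTRY = {"bhsa": "etcbc/cfabric-bhsa"}


def resolve_repo_id(corpus_id: str) -> str:
    """Map a short name or a full repo ID to a repo ID (illustrative only)."""
    # Short names are looked up in the registry first.
    if corpus_id in REGISTRY:
        return REGISTRY[corpus_id]
    # Assumption: "looks like a repo ID" means exactly one '/' with
    # a non-empty owner and a non-empty name on either side.
    owner, sep, name = corpus_id.partition("/")
    if sep and owner and name and "/" not in name:
        return corpus_id
    raise ValueError(
        f"Unknown corpus {corpus_id!r}: not in the registry "
        "and not of the form 'owner/name'"
    )
```

Under these assumptions, `resolve_repo_id('bhsa')` yields the registry entry, a string such as 'researcher/cfabric-my-corpus' passes through unchanged, and a bare unknown name raises ValueError.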
Example:
>>> import cfabric
>>> path = cfabric.download('bhsa')
>>> CF = cfabric.Fabric(locations=path)
>>> # Or with full repo ID for community corpora
>>> path = cfabric.download('researcher/cfabric-my-corpus')
>>> # Pin to specific version
>>> path = cfabric.download('bhsa', revision='v2023.1')
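Callers that prefer a fallback over an exception could wrap the call as sketched below. The `try_download` helper is hypothetical and not part of cfabric. It takes the download function as an argument so the block runs without cfabric installed, and it handles only the two exceptions listed under Raises.

```python
from pathlib import Path
from typing import Callable, Optional


def try_download(
    download: Callable[..., Path], corpus_id: str, **options
) -> Optional[Path]:
    """Call `download` and return None if a documented error is raised."""
    try:
        # Keyword options (revision, force, compiled_only) are passed through as-is.
        return download(corpus_id, **options)
    except ValueError as exc:
        print(f"Unknown corpus {corpus_id!r}: {exc}")
    except ImportError as exc:
        print(f"huggingface_hub appears to be missing: {exc}")
    return None
```

With cfabric installed, a call might look like `try_download(cfabric.download, 'bhsa', revision='v2023.1', compiled_only=True)`, which returns the corpus path on success and None on either documented error.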
Parameters
corpus_id: str
revision: str | None = None
force: bool = False
compiled_only: bool = False