download

Download functionality for Context-Fabric corpora.

This module provides the main download function for fetching corpora from Hugging Face Hub.

Functions

function
download(corpus_id: str, revision: str | None = None, force: bool = False, compiled_only: bool = False) -> Path

Download a corpus from Hugging Face Hub.

Args:
  corpus_id: Either a short name from the registry (e.g., 'bhsa') or a full HF repo ID (e.g., 'etcbc/cfabric-bhsa').
  revision: Specific version (tag, branch, or commit hash). If None, downloads the latest version.
  force: Re-download even if cached locally.
  compiled_only: Only download .cfm files (faster load, skip .tf source).

Returns:
  Path to the downloaded corpus directory.

Raises:
  ValueError: If corpus_id is not found and doesn't look like a repo ID.
  ImportError: If huggingface_hub is not installed.

Example:
  >>> import cfabric
  >>> path = cfabric.download('bhsa')
  >>> CF = cfabric.Fabric(locations=path)
  >>> # Or with full repo ID for community corpora
  >>> path = cfabric.download('researcher/cfabric-my-corpus')
  >>> # Pin to specific version
  >>> path = cfabric.download('bhsa', revision='v2023.1')

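The corpus_id rule described above (registry short name first, otherwise a full repo ID, otherwise ValueError) can be pictured with a small sketch. This is not Context-Fabric's actual code: the REGISTRY dict, the resolve_corpus_id helper, and the "owner/name" test for what "looks like a repo ID" are all hypothetical, and the mapping of 'bhsa' to 'etcbc/cfabric-bhsa' is only inferred from the two examples in the docstring.

```python
from __future__ import annotations

# Hypothetical registry: the real mapping lives inside Context-Fabric and may differ.
REGISTRY: dict[str, str] = {"bhsa": "etcbc/cfabric-bhsa"}


def resolve_corpus_id(corpus_id: str, registry: dict[str, str] = REGISTRY) -> str:
    """Return a Hugging Face repo ID for a short name or a full repo ID (illustrative only)."""
    # Short names from the registry take precedence.
    if corpus_id in registry:
        return registry[corpus_id]

    # Assumption: "looks like a repo ID" means exactly one slash with two non-empty parts.
    owner, sep, name = corpus_id.partition("/")
    if sep and owner and name and "/" not in name:
        return corpus_id

    raise ValueError(
        f"Unknown corpus {corpus_id!r}: not in the registry and not an 'owner/name' repo ID"
    )
```

Under these assumptions, resolve_corpus_id('bhsa') yields the registry entry, a community ID such as 'researcher/cfabric-my-corpus' passes through unchanged, and a bare unknown name raises ValueError.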
Parameters
  • corpus_id: str
  • revision: str | None = None
  • force: bool = False
  • compiled_only: bool = False
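
To make the parameters more concrete, here is one plausible way they could translate into a call to huggingface_hub.snapshot_download. This is a sketch under stated assumptions, not the library's implementation: build_snapshot_kwargs and fetch_snapshot are hypothetical helpers, the "*.cfm" pattern is inferred from the description of compiled_only, and the repo type is left at the Hub default because the documentation does not state it. The snapshot_download arguments used (repo_id, revision, allow_patterns, force_download) are part of the huggingface_hub API.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def build_snapshot_kwargs(
    repo_id: str,
    revision: str | None = None,
    force: bool = False,
    compiled_only: bool = False,
) -> dict[str, Any]:
    """Translate download()-style options into snapshot_download keyword arguments."""
    kwargs: dict[str, Any] = {
        "repo_id": repo_id,
        "revision": revision,     # None lets the Hub resolve the default branch
        "force_download": force,  # bypass the local cache when True
    }
    if compiled_only:
        # Assumption: compiled corpora are exactly the files ending in .cfm.
        kwargs["allow_patterns"] = ["*.cfm"]
    return kwargs


def fetch_snapshot(repo_id: str, **options: Any) -> Path:
    """Hypothetical wrapper: import huggingface_hub lazily and download a snapshot."""
    try:
        from huggingface_hub import snapshot_download
    except ImportError as exc:
        raise ImportError("huggingface_hub is required to download corpora") from exc
    return Path(snapshot_download(**build_snapshot_kwargs(repo_id, **options)))
```

The lazy import mirrors the documented ImportError: the dependency is only needed when a download is actually attempted, so the kwargs builder can be inspected or tested without network access.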