download
Download and organize Text-Fabric corpora for benchmarking.
Downloads 10 biblical studies corpora, keeping only .tf files. Each corpus is organized as: .corpora/{corpus}/tf/*.tf with a README.md
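The layout described above can be sketched as a small path helper. This is only an illustration: `corpus_layout` and the corpus name `example` are hypothetical, and placing `README.md` directly in the corpus directory is an assumption, since the description does not say where the README lives.

```python
from pathlib import Path


def corpus_layout(root: Path, corpus: str) -> dict:
    """Return the paths this sketch assumes for one corpus (hypothetical helper)."""
    corpus_dir = root / corpus
    return {
        "tf_dir": corpus_dir / "tf",         # holds the *.tf feature files
        "readme": corpus_dir / "README.md",  # assumed README location
    }


layout = corpus_layout(Path(".corpora"), "example")
print(layout["tf_dir"].as_posix())
print(layout["readme"].as_posix())
```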
Usage:
python benchmarks/download_corpora.py
Functions
function
copy_from_local(source: str, dest: Path) → int
Copy TF files from a local directory.
Parameters
source: str
dest: Path
function
copy_tf_files(src: Path, dst: Path) → int
Copy only .tf files from src to dst, excluding cache dirs.
Parameters
src: Path
dst: Path
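A minimal sketch of how such a filtered copy might work, not the script's actual implementation. Several things here are assumptions: the cache directory names in `CACHE_DIR_NAMES`, the flat copy into `dst`, and the reading of the `int` result as the number of files copied. The name `copy_tf_files_sketch` is hypothetical.

```python
import shutil
from pathlib import Path

# Assumed cache directory names; the real script may exclude a different set.
CACHE_DIR_NAMES = {".tf", "__pycache__", ".git"}


def copy_tf_files_sketch(src: Path, dst: Path) -> int:
    """Copy *.tf files found under src into dst, skipping cache directories.

    Returns the number of files copied (an assumed meaning of the int result).
    """
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for path in sorted(src.rglob("*.tf")):
        parent_parts = path.relative_to(src).parts[:-1]
        # Skip directories that merely match the pattern, and anything
        # located inside a directory that looks like a cache.
        if not path.is_file() or any(p in CACHE_DIR_NAMES for p in parent_parts):
            continue
        shutil.copy2(path, dst / path.name)  # flat copy; equal names would collide
        copied += 1
    return copied
```

Copying into a flat directory matches the `.corpora/{corpus}/tf/*.tf` layout, at the cost of overwriting files that share a name across subdirectories.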
function
download_from_github(repo: str, tf_path: str, dest: Path) → int
Clone a GitHub repo and copy TF files.
Parameters
repo: str
tf_path: str
dest: Path
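The clone-then-copy step could look roughly like the sketch below. It assumes that `repo` has the form `owner/name`, that `tf_path` is a directory path relative to the repository root, that a shallow clone is sufficient, and that the returned `int` is a file count. None of this is confirmed by the documentation above, and `clone_command` and `download_from_github_sketch` are hypothetical names.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def clone_command(repo: str, checkout: Path) -> list[str]:
    """Build a shallow `git clone` command for an assumed 'owner/name' repo string."""
    url = f"https://github.com/{repo}.git"
    return ["git", "clone", "--depth", "1", url, str(checkout)]


def download_from_github_sketch(repo: str, tf_path: str, dest: Path) -> int:
    """Clone repo into a temporary directory and copy tf_path/*.tf into dest."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "repo"
        # Requires git on PATH and network access; raises on a failed clone.
        subprocess.run(clone_command(repo, checkout), check=True)
        dest.mkdir(parents=True, exist_ok=True)
        copied = 0
        for path in sorted((checkout / tf_path).glob("*.tf")):
            if path.is_file():
                shutil.copy2(path, dest / path.name)
                copied += 1
    return copied
```

Cloning into a temporary directory means only the `.tf` files survive, which fits the stated goal of keeping nothing else from the repository.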
function
generate_readme(corpus_name: str, config: dict, tf_dir: Path) → str
Generate README.md content for a corpus.
Parameters
corpus_name: str
config: dict
tf_dir: Path
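As an illustration of the general shape, a README generator with this signature might assemble Markdown from the config and the files in `tf_dir`. The config keys `description` and `source` are assumptions, as is the choice to list feature names. The real function's output format is not documented here, and `generate_readme_sketch` is a hypothetical name.

```python
from pathlib import Path


def generate_readme_sketch(corpus_name: str, config: dict, tf_dir: Path) -> str:
    """Build README text from assumed config keys and the *.tf files in tf_dir."""
    # Feature names are taken from file stems, e.g. "otype.tf" -> "otype".
    features = sorted(p.stem for p in tf_dir.glob("*.tf") if p.is_file())
    lines = [
        f"# {corpus_name}",
        "",
        config.get("description", "No description provided."),  # assumed key
        "",
        f"Source: {config.get('source', 'unknown')}",  # assumed key
        "",
        f"Features ({len(features)}):",
        "",
    ]
    lines += [f"- {name}" for name in features]
    return "\n".join(lines) + "\n"
```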
function
main()
Download all corpora.