Documentation

download

Download and organize Text-Fabric corpora for benchmarking.

Downloads 10 biblical studies corpora and keeps only their .tf files. Each corpus is organized as .corpora/{corpus}/tf/*.tf, with a README.md.

Usage:

python benchmarks/download_corpora.py
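
Once the script has run, the resulting layout can be checked with a short helper. The following is a minimal sketch, not part of the module: the name `count_tf_files` is hypothetical, and it assumes only the `.corpora/{corpus}/tf/*.tf` layout described above.

```python
from pathlib import Path


def count_tf_files(root: Path) -> dict[str, int]:
    """Map each corpus directory under root to the number of .tf files in its tf/ folder.

    Hypothetical helper; assumes the layout root/{corpus}/tf/*.tf.
    """
    counts: dict[str, int] = {}
    for corpus_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        tf_dir = corpus_dir / "tf"
        if tf_dir.is_dir():
            # Count regular files only, so a directory named ".tf" is not counted.
            counts[corpus_dir.name] = sum(
                1 for p in tf_dir.glob("*.tf") if p.is_file()
            )
    return counts
```

Calling `count_tf_files(Path(".corpora"))` returns one entry per corpus that has a `tf` folder; directories without one are skipped.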

Functions

function
copy_from_local(source: str, dest: Path) -> int

Copy TF files from a local directory.

Parameters
  • source: str
  • dest: Path
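
A function with this signature might look roughly like the sketch below. It is an illustration rather than the module's code: the name `copy_from_local_sketch` is hypothetical, and three behaviors are assumptions, namely that `~` in `source` is expanded, that a missing source directory yields 0, and that the returned integer is the number of files copied.

```python
import shutil
from pathlib import Path


def copy_from_local_sketch(source: str, dest: Path) -> int:
    """Copy top-level .tf files from a local directory into dest.

    Hypothetical stand-in for copy_from_local; the return value is assumed
    to be the number of files copied.
    """
    src = Path(source).expanduser()
    if not src.is_dir():
        return 0  # assumption: a missing source is reported as zero files
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for path in sorted(src.glob("*.tf")):
        if path.is_file():
            shutil.copy2(path, dest / path.name)
            copied += 1
    return copied
```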

function
copy_tf_files(src: Path, dst: Path) -> int

Copy only .tf files from src to dst, excluding cache dirs.

Parameters
  • src: Path
  • dst: Path
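
The filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: `copy_tf_files_sketch` and `CACHE_DIR_NAMES` are hypothetical names, the set of cache directory names is a guess (Text-Fabric is assumed here to keep compiled data in a hidden `.tf` directory), files are copied flat into `dst`, and the return value is assumed to be the number of files copied.

```python
import shutil
from pathlib import Path

# Assumption: directory names treated as caches. The real script's list is not shown.
CACHE_DIR_NAMES = {".tf", "__pycache__", ".git"}


def copy_tf_files_sketch(src: Path, dst: Path) -> int:
    """Copy *.tf files found under src into dst, skipping cache directories."""
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for path in sorted(src.rglob("*.tf")):
        if not path.is_file():
            continue  # skips a directory that is itself named ".tf"
        parent_parts = path.relative_to(src).parts[:-1]
        if any(part in CACHE_DIR_NAMES for part in parent_parts):
            continue  # file lives inside a cache directory
        # Flat copy: files with the same name in different subfolders overwrite each other.
        shutil.copy2(path, dst / path.name)
        copied += 1
    return copied
```

Because the copy is flat, this sketch suits corpora whose feature files sit in a single directory; preserving subfolders would require copying to `dst / path.relative_to(src)` instead.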

function
download_from_github(repo: str, tf_path: str, dest: Path) -> int

Clone a GitHub repo and copy TF files.

Parameters
  • repo: str
  • tf_path: str
  • dest: Path
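
One way to implement a clone-then-copy step like this is a shallow `git clone` into a temporary directory. The sketch below is hypothetical and makes several assumptions: `repo` has the form `owner/name`, `tf_path` is a path relative to the repository root, `git` is available on the PATH, and the return value is the number of files copied. The names `clone_command` and `download_from_github_sketch` do not come from the module.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def clone_command(repo: str, target: Path) -> list[str]:
    """Build a shallow-clone command for a repo given as 'owner/name' (assumed format)."""
    return ["git", "clone", "--depth", "1",
            f"https://github.com/{repo}.git", str(target)]


def download_from_github_sketch(repo: str, tf_path: str, dest: Path) -> int:
    """Clone repo into a temporary directory and copy its .tf files into dest."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "repo"
        # check=True raises CalledProcessError if the clone fails.
        subprocess.run(clone_command(repo, checkout), check=True)
        tf_root = checkout / tf_path
        dest.mkdir(parents=True, exist_ok=True)
        copied = 0
        for path in sorted(tf_root.rglob("*.tf")):
            if path.is_file():
                shutil.copy2(path, dest / path.name)
                copied += 1
        return copied
```

The `--depth 1` option fetches only the latest commit, which keeps the download small when only the current .tf files are needed. The temporary checkout is removed when the `with` block exits.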

function
generate_readme(corpus_name: str, config: dict, tf_dir: Path) -> str

Generate README.md content for a corpus.

Parameters
  • corpus_name: str
  • config: dict
  • tf_dir: Path
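
The structure of `config` is not documented here, so the following sketch invents two keys, `description` and `source`, purely for illustration. `generate_readme_sketch` is a hypothetical name, and the README format it produces is an assumption rather than the module's actual output.

```python
from pathlib import Path


def generate_readme_sketch(corpus_name: str, config: dict, tf_dir: Path) -> str:
    """Build README.md text listing the .tf files found in tf_dir.

    The 'description' and 'source' keys are hypothetical; the real config
    schema is not shown in this reference.
    """
    tf_files = sorted(p.name for p in tf_dir.glob("*.tf") if p.is_file())
    lines = [
        f"# {corpus_name}",
        "",
        config.get("description", "No description provided."),
        "",
        f"Source: {config.get('source', 'unknown')}",
        f"Feature files: {len(tf_files)}",
        "",
    ]
    lines += [f"- {name}" for name in tf_files]
    return "\n".join(lines) + "\n"
```

Returning the text instead of writing the file keeps the function easy to test; the caller decides where README.md is saved.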
function
main()

Download all corpora.