Documentation

describe

Corpus description utilities for Context Fabric.

This module provides centralized utilities for describing corpora, features, and text representations. It also generates text samples chosen to exhaustively cover the characters used by each text format.

Usage:

>>> from cfabric.describe import describe_corpus, describe_feature
>>> result = describe_corpus(api, "BHSA")
>>> feature_info = describe_feature(api, "sp")

Classes

class

CorpusDescription

Complete corpus description.

Attributes

Name                  Type                    Description
edge_features         list[dict[str, str]]    List of edge feature metadata
features              list[dict[str, str]]    List of node feature metadata
name                  str                     Corpus name
node_types            list[dict[str, Any]]    List of node types with counts
sections              dict[str, Any]          Section hierarchy information
text_representations  TextRepresentationInfo  Text format information with samples

Methods

__init__(self, name: str, node_types: list[dict[str, Any]] = list(), sections: dict[str, Any] = dict(), text_representations: TextRepresentationInfo = TextRepresentationInfo(description=''), features: list[dict[str, str]] = list(), edge_features: list[dict[str, str]] = list()) -> None
Parameters
  • name: str
  • node_types: list[dict[str, Any]] = list()
  • sections: dict[str, Any] = dict()
  • text_representations: TextRepresentationInfo = TextRepresentationInfo(description='')
  • features: list[dict[str, str]] = list()
  • edge_features: list[dict[str, str]] = list()
to_dict(self) -> dict[str, Any]
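
Because every description class exposes to_dict(), a description can be written out as JSON for inspection or caching. The helper below is a minimal sketch, not part of cfabric: description_to_json is a hypothetical name, and it assumes that to_dict() returns only JSON-serializable values, which this page does not state explicitly.

```python
import json
from typing import Any


def description_to_json(description: Any, indent: int = 2) -> str:
    """Serialize any object that offers to_dict() (hypothetical helper).

    Assumption: to_dict() yields plain dicts, lists, strings, and numbers.
    ensure_ascii=False keeps original-script text readable in the output.
    """
    return json.dumps(
        description.to_dict(),
        indent=indent,
        ensure_ascii=False,
        sort_keys=True,
    )


# Possible usage, given a loaded Context Fabric API object named `api`:
#   from cfabric.describe import describe_corpus
#   print(description_to_json(describe_corpus(api, "BHSA")))
```

Sorting the keys keeps the output stable between runs, which helps when comparing descriptions of two corpus versions.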
class

CorpusOverview

Slim corpus overview (node types and sections only).

Use this for lightweight discovery. For full details including text representations and feature lists, use CorpusDescription.

Attributes

Name        Type
name        str
node_types  list[dict[str, Any]]
sections    dict[str, Any]

Methods

__init__(self, name: str, node_types: list[dict[str, Any]] = list(), sections: dict[str, Any] = dict()) -> None
Parameters
  • name: str
  • node_types: list[dict[str, Any]] = list()
  • sections: dict[str, Any] = dict()
to_dict(self) -> dict[str, Any]
class

FeatureCatalogEntry

Lightweight feature entry for catalog listing.

Attributes

Name         Type
description  str
kind         str
name         str
value_type   str

Methods

__init__(self, name: str, kind: str, value_type: str, description: str = '') -> None
Parameters
  • name: str
  • kind: str
  • value_type: str
  • description: str = ''
to_dict(self) -> dict[str, str]
class

FeatureDescription

Detailed description of a feature.

Attributes

Name           Type                  Description
description    str                   Feature description from metadata
error          str | None            Error message if feature not found
has_values     bool | None           For edge features, whether edges have values
kind           str                   'node' or 'edge'
name           str                   Feature name
node_types     list[str]             List of node types this feature applies to
sample_values  list[dict[str, Any]]  Top values by frequency
unique_values  int                   Number of unique values
value_type     str                   'str', 'int', or '' for edges without values

Methods

__init__(self, name: str, kind: str, value_type: str = '', description: str = '', node_types: list[str] = list(), unique_values: int = 0, sample_values: list[dict[str, Any]] = list(), has_values: bool | None = None, error: str | None = None) -> None
Parameters
  • name: str
  • kind: str
  • value_type: str = ''
  • description: str = ''
  • node_types: list[str] = list()
  • unique_values: int = 0
  • sample_values: list[dict[str, Any]] = list()
  • has_values: bool | None = None
  • error: str | None = None
from_api(cls, api: Api, feature: str, sample_limit: int = 20) -> FeatureDescription

Create FeatureDescription from API.

Parameters
  • cls
  • api: Api
  • feature: str
  • sample_limit: int = 20
to_dict(self) -> dict[str, Any]
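
Since a FeatureDescription carries an error field for features that could not be found, callers will usually want to check it before reading the other attributes. The following sketch shows one way to do that. feature_summary is a hypothetical helper, not a cfabric function, and it relies only on the attributes listed above.

```python
from typing import Any


def feature_summary(desc: Any) -> str:
    """Build a one-line summary of a FeatureDescription-like object.

    Hypothetical helper: reads only the documented attributes
    (name, kind, value_type, node_types, unique_values, error).
    """
    # A non-None error means the feature was not found, so the
    # remaining fields only hold their defaults.
    if desc.error is not None:
        return f"{desc.name}: unavailable ({desc.error})"

    # value_type is '' for edge features without values.
    value_type = desc.value_type or "no values"
    node_types = ", ".join(desc.node_types) or "none detected"
    return (
        f"{desc.name} [{desc.kind}, {value_type}]: "
        f"{desc.unique_values} unique values; node types: {node_types}"
    )


# Possible usage, given a loaded Context Fabric API object named `api`:
#   from cfabric.describe import describe_feature
#   print(feature_summary(describe_feature(api, "sp")))
```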
class

TextFormatInfo

Information about a text format pair (orig/trans).

Attributes

Name                  Type
name                  str
original_spec         str
samples               list[TextFormatSample]
total_samples         int
transliteration_spec  str
unique_characters     int

Methods

__init__(self, name: str, original_spec: str, transliteration_spec: str, samples: list[TextFormatSample] = list(), unique_characters: int = 0, total_samples: int = 0) -> None
Parameters
  • name: str
  • original_spec: str
  • transliteration_spec: str
  • samples: list[TextFormatSample] = list()
  • unique_characters: int = 0
  • total_samples: int = 0
to_dict(self) -> dict[str, Any]
class

TextFormatSample

A single text sample showing original and transliterated forms.

Attributes

Name            Type
original        str
transliterated  str

Methods

__init__(self, original: str, transliterated: str) -> None
Parameters
  • original: str
  • transliterated: str
to_dict(self) -> dict[str, str]
class

TextRepresentationInfo

Complete text representation info for a corpus.

Attributes

Name         Type
description  str
formats      list[TextFormatInfo]

Methods

__init__(self, description: str, formats: list[TextFormatInfo] = list()) -> None
Parameters
  • description: str
  • formats: list[TextFormatInfo] = list()
to_dict(self) -> dict[str, Any]
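
The three text classes nest: a TextRepresentationInfo holds TextFormatInfo entries, and each of those holds TextFormatSample pairs. A minimal sketch of walking that structure follows. iter_sample_pairs is a hypothetical helper that uses only the attributes documented above.

```python
from typing import Any, Iterator, Tuple


def iter_sample_pairs(text_info: Any) -> Iterator[Tuple[str, str, str]]:
    """Yield (format name, original, transliterated) for every sample.

    Hypothetical helper: expects an object shaped like
    TextRepresentationInfo (formats -> samples -> original/transliterated).
    """
    for fmt in text_info.formats:
        for sample in fmt.samples:
            yield fmt.name, sample.original, sample.transliterated


# Possible usage, given a loaded Context Fabric API object named `api`:
#   from cfabric.describe import describe_text_formats
#   for name, original, translit in iter_sample_pairs(describe_text_formats(api)):
#       print(name, original, translit)
```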

Functions

function
describe_corpus(api: Api, name: str = '') -> CorpusDescription

Get complete corpus description.

Returns node types, section structure, text representations with exhaustive character coverage, and feature catalogs.

Parameters
  • api: Api - Context Fabric API instance
  • name: str = '' - Corpus name for identification
Returns
  • CorpusDescription - Complete description including node types, sections, text representations, and feature lists
function
describe_corpus_overview(api: Api, name: str = '') -> CorpusOverview

Get slim corpus overview (node types and sections only).

Use this for lightweight discovery. For full details including text representations, use describe_corpus().

Parameters
  • api: Api - Context Fabric API instance
  • name: str = '' - Corpus name for identification
Returns
  • CorpusOverview - Slim overview with node types and sections
function
describe_feature(api: Api, feature: str, sample_limit: int = 20) -> FeatureDescription

Get detailed description of a single feature.

Parameters
  • api: Api - Context Fabric API instance
  • feature: str - Feature name
  • sample_limit: int = 20 - Maximum sample values to return
Returns
  • FeatureDescription - Feature details including samples and node types
function
describe_features(api: Api, features: list[str], sample_limit: int = 20) -> dict[str, FeatureDescription]

Get detailed descriptions for multiple features.

Parameters
  • api: Api - Context Fabric API instance
  • features: list[str] - Feature names
  • sample_limit: int = 20 - Maximum sample values per feature
Returns
  • dict[str, FeatureDescription] - Feature descriptions keyed by name
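
When several features are requested at once, some names may be misspelled or absent from the corpus. The sketch below separates usable descriptions from failed lookups. split_by_error is a hypothetical helper, and it assumes that describe_features() reports a missing feature through the error attribute of its entry rather than by raising, which this page suggests but does not confirm.

```python
from typing import Any, Dict, Tuple


def split_by_error(
    descriptions: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Partition a name -> FeatureDescription mapping (hypothetical helper).

    Returns (found, failed): `found` maps names to descriptions whose
    error is None, `failed` maps names to their error messages.
    """
    found = {name: d for name, d in descriptions.items() if d.error is None}
    failed = {
        name: d.error for name, d in descriptions.items() if d.error is not None
    }
    return found, failed


# Possible usage, given a loaded Context Fabric API object named `api`:
#   from cfabric.describe import describe_features
#   found, failed = split_by_error(describe_features(api, ["sp", "lex"]))
```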
function
describe_text_formats(api: Api) -> TextRepresentationInfo

Get text format descriptions with exhaustive character coverage.

Parameters
  • api: Api - Context Fabric API instance
Returns
  • TextRepresentationInfo - Text format information with samples
function
get_all_feature_otypes(api: Api, samples_per_type: int = 100) -> dict[str, list[str]]

Pre-compute otype mappings for all features.

Parameters
  • api: Api - Context Fabric API instance
  • samples_per_type: int = 100 - Number of samples to check per node type
Returns
  • dict[str, list[str]] - Feature name to list of applicable node types
function
get_feature_otypes(api: Api, feature: str, samples_per_type: int = 100) -> list[str]

Determine which node types a feature applies to.

Uses C.levels.data to efficiently sample each node type range and check for non-null values.

Parameters
  • api: Api - Context Fabric API instance
  • feature: str - Feature name
  • samples_per_type: int = 100 - Number of samples to check per node type
Returns
  • list[str] - List of node types that have this feature
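
The sampling idea behind get_feature_otypes() can be illustrated without a loaded corpus. The sketch below is not the library's implementation. It assumes that each entry of C.levels.data is a tuple of (node type, average slots, first node, last node), as in Text-Fabric, and that samples are spread evenly over each range; both points are assumptions. otypes_with_values and value_of are hypothetical names.

```python
from typing import Any, Callable, Iterable, List, Tuple

# Assumed row shape: (node type, average slots, first node, last node).
Level = Tuple[str, float, int, int]


def otypes_with_values(
    levels: Iterable[Level],
    value_of: Callable[[int], Any],
    samples_per_type: int = 100,
) -> List[str]:
    """Return node types for which a sampled node has a non-null value.

    `value_of` is a hypothetical stand-in for a feature lookup that
    returns None when a node carries no value.
    """
    found = []
    for otype, _avg_slots, first, last in levels:
        size = last - first + 1
        k = min(samples_per_type, size)  # never sample more nodes than exist
        if k <= 0:
            continue
        # k evenly spaced nodes inside [first, last].
        sampled = (first + (i * size) // k for i in range(k))
        if any(value_of(node) is not None for node in sampled):
            found.append(otype)
    return found
```

Sampling keeps the check cheap on large corpora, at the cost that a feature present on only a few nodes of a type can be missed. Raising samples_per_type trades speed for a lower chance of such a miss.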
function
list_features(api: Api, kind: str = 'all', node_types: list[str] | None = None) -> list[FeatureCatalogEntry]

List features with optional filtering.

Returns lightweight catalog for discovery. Use node_types to filter by object type. For full details with samples, use describe_feature().

Parameters
  • api: Api - Context Fabric API instance
  • kind: str = 'all' - Filter by "all", "node", or "edge"
  • node_types: list[str] | None = None - Filter to features for these types (e.g., ["word"])
Returns
  • list[FeatureCatalogEntry] - List of features with name, kind, value_type, description
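
A catalog returned by list_features() is a flat list, so a small grouping step makes it easier to scan. The sketch below groups feature names by their kind. group_by_kind is a hypothetical helper that reads only the name and kind attributes of each FeatureCatalogEntry.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_kind(entries: Iterable[Any]) -> Dict[str, List[str]]:
    """Group catalog entry names by kind (hypothetical helper).

    Each entry only needs `name` and `kind` attributes; names are
    sorted within each group for stable output.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        grouped[entry.kind].append(entry.name)
    return {kind: sorted(names) for kind, names in grouped.items()}


# Possible usage, given a loaded Context Fabric API object named `api`:
#   from cfabric.describe import list_features
#   for kind, names in group_by_kind(list_features(api)).items():
#       print(kind, len(names), names[:5])
```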