Documentation

statistics

Statistical analysis functions for benchmark results.

Functions

function
aggregate_runs(measurements: list[list[float]]) -> tuple[list[float], list[float]]

Aggregate multiple runs into mean and std for each step.

Args:
  measurements: List of runs, where each run is a list of step values
Returns:
  Tuple of (means, stds) for each step
Parameters
  • measurements: list[list[float]]
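
The per-step aggregation described above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: the name `aggregate_runs_sketch` is hypothetical, and using the sample standard deviation (rather than the population one) is an assumption.

```python
import statistics


def aggregate_runs_sketch(
    measurements: list[list[float]],
) -> tuple[list[float], list[float]]:
    """Hypothetical stand-in: per-step mean and sample std across runs."""
    means: list[float] = []
    stds: list[float] = []
    # zip(*measurements) groups the i-th step of every run together.
    # It silently truncates to the shortest run if lengths differ.
    for step_values in zip(*measurements):
        means.append(statistics.mean(step_values))
        # The sample std needs at least two runs; use 0.0 otherwise (assumption).
        stds.append(statistics.stdev(step_values) if len(step_values) > 1 else 0.0)
    return means, stds


runs = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
means, stds = aggregate_runs_sketch(runs)
print(means)  # per-step means across the two runs
print(stds)   # per-step sample standard deviations
```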
function
compare_implementations(tf_values: list[float], cf_values: list[float], metric_name: str, unit: str, metric_type: Literal['time', 'memory', 'other'] = 'other', significance_level: float = 0.05) -> ComparisonResult

Compare TF and CF measurements for a metric.

Args:
  tf_values: Text-Fabric measurements
  cf_values: Context-Fabric measurements
  metric_name: Name of the metric
  unit: Unit of measurement
  metric_type: Type of metric for computing comparison ratios
  significance_level: P-value threshold for statistical significance
Returns:
  ComparisonResult with statistics, comparison metrics, and effect size
Parameters
  • tf_values: list[float]
  • cf_values: list[float]
  • metric_name: str
  • unit: str
  • metric_type: Literal['time', 'memory', 'other'] = 'other'
  • significance_level: float = 0.05
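
The fields of `ComparisonResult` and the exact ratio and effect-size formulas are not documented on this page. As a rough sketch of the kind of quantities such a comparison usually reports, the example below computes a ratio of means and Cohen's d with a pooled standard deviation. Both helper names are hypothetical, and the choice of Cohen's d is an assumption, not a statement about what the module uses.

```python
import math
import statistics


def ratio_of_means_sketch(tf_values: list[float], cf_values: list[float]) -> float:
    """Hypothetical helper: how many times larger the TF mean is than the CF mean."""
    return statistics.mean(tf_values) / statistics.mean(cf_values)


def cohens_d_sketch(sample_a: list[float], sample_b: list[float]) -> float:
    """Hypothetical helper: Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd


tf_ms = [10.0, 12.0, 14.0]  # made-up Text-Fabric timings, for illustration only
cf_ms = [5.0, 6.0, 7.0]     # made-up Context-Fabric timings, for illustration only
ratio = ratio_of_means_sketch(tf_ms, cf_ms)
effect = cohens_d_sketch(tf_ms, cf_ms)
print(ratio, effect)
```

For a time or memory metric, a ratio above 1 in this sketch would mean the Text-Fabric measurements are larger on average; how `metric_type` changes the ratio in the real function is not specified here.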
function
compute_confidence_interval(values: | list[float], confidence: float = 0.95) -> tuple[float, float]

Compute confidence interval for the mean.

Args:
  values: Array of values
  confidence: Confidence level (default 0.95 for 95% CI)
Returns:
  Tuple of (lower_bound, upper_bound)
Parameters
  • values: | list[float]
  • confidence: float = 0.95
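
A confidence interval for the mean has the form mean ± critical value × standard error. The sketch below uses a normal (z) critical value from the standard library; this is an assumption made to keep the example dependency-free. The real function may well use Student's t distribution, which gives wider intervals for small samples. The name `confidence_interval_sketch` is hypothetical.

```python
import math
import statistics


def confidence_interval_sketch(
    values: list[float], confidence: float = 0.95
) -> tuple[float, float]:
    """Hypothetical stand-in: normal-approximation CI for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for confidence=0.95.
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)
    return mean - z * sem, mean + z * sem


lower, upper = confidence_interval_sketch([10.0, 12.0, 14.0, 16.0, 18.0])
print(lower, upper)
```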
function
compute_latency_percentiles(values: list[float]) -> dict[str, float]

Compute common latency percentiles.

Args:
  values: Latency measurements in milliseconds
Returns:
  Dictionary with p50, p95, p99 values
Parameters
  • values: list[float]
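
As an illustration of the returned shape, the sketch below derives p50, p95, and p99 with `statistics.quantiles`. The key names `"p50"`, `"p95"`, and `"p99"`, the interpolation method, and the function name `latency_percentiles_sketch` are assumptions; the page only states that the dictionary contains those three percentiles.

```python
import statistics


def latency_percentiles_sketch(values: list[float]) -> dict[str, float]:
    """Hypothetical stand-in: p50/p95/p99 via linear interpolation."""
    # n=100 yields 99 cut points; index i - 1 holds the i-th percentile.
    # method="inclusive" treats the data as the full population (min = p0, max = p100).
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


latencies_ms = [float(v) for v in range(1, 102)]  # 1.0 .. 101.0, made-up data
latency_stats = latency_percentiles_sketch(latencies_ms)
print(latency_stats)
```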
function
compute_percentiles(values: | list[float], percentiles: list[int]) -> dict[int, float]

Compute multiple percentiles.

Args:
  values: Array of values
  percentiles: List of percentiles to compute (e.g., [50, 90, 95, 99])
Returns:
  Dictionary mapping percentile to value
Parameters
  • values: | list[float]
  • percentiles: list[int]
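
The mapping from percentile to value can be sketched by hand with linear interpolation between the two nearest sorted values. This interpolation rule is an assumption (it is a common default, but the page does not say which one the module uses), and `percentiles_sketch` is a hypothetical name.

```python
def percentiles_sketch(values: list[float], percentiles: list[int]) -> dict[int, float]:
    """Hypothetical stand-in: percentiles by linear interpolation on sorted data."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    last = len(ordered) - 1
    result: dict[int, float] = {}
    for p in percentiles:
        # Fractional position of the p-th percentile in the sorted list.
        rank = (p / 100) * last
        lower = int(rank)              # index at or below the position
        upper = min(lower + 1, last)   # next index, clamped at the end
        frac = rank - lower
        result[p] = ordered[lower] + frac * (ordered[upper] - ordered[lower])
    return result


pcts = percentiles_sketch([50.0, 10.0, 40.0, 20.0, 30.0], [0, 25, 50, 90, 100])
print(pcts)
```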
function
compute_summary(values: list[float], metric_name: str, unit: str) -> StatisticalSummary

Compute comprehensive statistical summary for a set of values.

Args:
  values: List of measurement values
  metric_name: Name of the metric (e.g., "load_time", "memory")
  unit: Unit of measurement (e.g., "ms", "MB", "s")
Returns:
  StatisticalSummary with all computed statistics
Parameters
  • values: list[float]
  • metric_name: str
  • unit: str
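
The fields of `StatisticalSummary` are not listed on this page. The sketch below shows what a summary of this kind might hold, using a hypothetical `SummarySketch` dataclass; every field name in it is an assumption chosen for illustration.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SummarySketch:
    """Hypothetical container; the real StatisticalSummary may differ."""
    metric_name: str
    unit: str
    n: int
    mean: float
    median: float
    std: float
    minimum: float
    maximum: float


def summary_sketch(values: list[float], metric_name: str, unit: str) -> SummarySketch:
    """Hypothetical stand-in: basic descriptive statistics for one metric."""
    return SummarySketch(
        metric_name=metric_name,
        unit=unit,
        n=len(values),
        mean=statistics.mean(values),
        median=statistics.median(values),
        # Sample std needs two or more values; use 0.0 otherwise (assumption).
        std=statistics.stdev(values) if len(values) > 1 else 0.0,
        minimum=min(values),
        maximum=max(values),
    )


summary = summary_sketch([100.0, 110.0, 120.0], "load_time", "ms")
print(summary)
```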
function
linear_regression(x: list[float] | , y: list[float] | ) -> tuple[float, float, float]

Perform simple linear regression.

Args:
  x: Independent variable values
  y: Dependent variable values
Returns:
  Tuple of (slope, intercept, r_squared)
Parameters
  • x: list[float] |
  • y: list[float] |
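
Simple linear regression fits y ≈ slope · x + intercept by ordinary least squares. The sketch below computes the three returned quantities from the usual sums of squares. It is an illustrative re-derivation under that assumption, with the hypothetical name `linear_regression_sketch`; the module's own implementation is not shown on this page.

```python
def linear_regression_sketch(
    x: list[float], y: list[float]
) -> tuple[float, float, float]:
    """Hypothetical stand-in: ordinary least squares for one predictor."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x values must not all be identical")
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    # R^2 equals the squared correlation for a single predictor.
    # A constant y is fitted exactly, so report 1.0 in that case (assumption).
    r_squared = (s_xy ** 2) / (s_xx * s_yy) if s_yy != 0 else 1.0
    return slope, intercept, r_squared


slope, intercept, r_squared = linear_regression_sketch(
    [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
)
print(slope, intercept, r_squared)
```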
function
welch_t_test(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]

Perform Welch's t-test for independent samples.

Welch's t-test does not assume equal variances between groups.

Args:
  sample_a: First sample
  sample_b: Second sample
Returns:
  Tuple of (t_statistic, p_value)
Parameters
  • sample_a: list[float]
  • sample_b: list[float]
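
To show what "does not assume equal variances" means in practice, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom, keeping each sample's own variance instead of pooling them. It deliberately stops short of the p-value, which needs the Student's t distribution and is not available in the standard library; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` is one common way to obtain it. The name `welch_statistic_sketch` is hypothetical, and whether the module relies on SciPy is not stated here.

```python
import math
import statistics


def welch_statistic_sketch(
    sample_a: list[float], sample_b: list[float]
) -> tuple[float, float]:
    """Hypothetical helper: Welch t statistic and approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-sample variance of the mean; the two groups are never pooled.
    vn_a = statistics.variance(sample_a) / n_a
    vn_b = statistics.variance(sample_b) / n_b
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(
        vn_a + vn_b
    )
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (vn_a + vn_b) ** 2 / (vn_a ** 2 / (n_a - 1) + vn_b ** 2 / (n_b - 1))
    return t_stat, dof


t_stat, dof = welch_statistic_sketch([10.0, 12.0, 14.0], [5.0, 6.0, 7.0])
print(t_stat, dof)
```

Each sample needs at least two values with some spread; identical values in both samples would make the denominator zero.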