statistics

Statistical analysis functions for benchmark results.

Functions

function

aggregate_runs(measurements: list[list[float]]) → tuple[list[float], list[float]]

Aggregate multiple runs into the mean and standard deviation for each step.

Args:
    measurements: List of runs, where each run is a list of step values.

Returns:
    Tuple of (means, stds), with one entry per step.

Parameters

measurements: list[list[float]]
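A minimal stdlib-only sketch of the per-step aggregation. It assumes all runs have the same number of steps and that "std" means the population standard deviation (statistics.pstdev); the documented function may use the sample standard deviation or handle runs of different lengths differently.

```python
import statistics


def aggregate_runs(measurements: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-step mean and standard deviation across runs (sketch, not the module's code).

    Assumption: equal-length runs and population standard deviation.
    """
    if not measurements:
        raise ValueError("need at least one run")
    n_steps = len(measurements[0])
    if any(len(run) != n_steps for run in measurements):
        raise ValueError("all runs must have the same number of steps")
    means: list[float] = []
    stds: list[float] = []
    # zip(*runs) regroups the data step by step: one tuple per step, one value per run
    for step_values in zip(*measurements):
        means.append(statistics.fmean(step_values))
        stds.append(statistics.pstdev(step_values))
    return means, stds


means, stds = aggregate_runs([[1.0, 2.0], [3.0, 4.0]])
# means == [2.0, 3.0], stds == [1.0, 1.0]
```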

function

compare_implementations(tf_values: list[float], cf_values: list[float], metric_name: str, unit: str, metric_type: Literal['time', 'memory', 'other'] = 'other', significance_level: float = 0.05) → ComparisonResult

Compare Text-Fabric (TF) and Context-Fabric (CF) measurements for a metric.

Args:
    tf_values: Text-Fabric measurements.
    cf_values: Context-Fabric measurements.
    metric_name: Name of the metric.
    unit: Unit of measurement.
    metric_type: Type of metric, used when computing comparison ratios.
    significance_level: P-value threshold for statistical significance.

Returns:
    ComparisonResult with statistics, comparison metrics, and effect size.

Parameters

tf_values: list[float]
cf_values: list[float]
metric_name: str
unit: str
metric_type: Literal['time', 'memory', 'other'] = 'other'
significance_level: float = 0.05
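The docstring says the result includes an effect size but does not name the measure. One common effect size for two independent samples is Cohen's d with a pooled standard deviation; the hypothetical `cohens_d` helper below sketches that measure and is not necessarily what compare_implementations computes.

```python
import math
import statistics


def cohens_d(sample_a: list[float], sample_b: list[float]) -> float:
    """Cohen's d using the pooled sample standard deviation (hypothetical helper)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    if pooled_sd == 0:
        raise ValueError("effect size is undefined when both samples are constant")
    return (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / pooled_sd


# A positive d means sample_a (e.g. TF timings) has the larger mean.
print(cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # -1.0
```

By the usual rule of thumb, |d| around 0.2 is read as a small effect, 0.5 as medium, and 0.8 as large.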

function

compute_confidence_interval(values: | list[float], confidence: float = 0.95) → tuple[float, float]

Compute a confidence interval for the mean.

Args:
    values: Array of values.
    confidence: Confidence level (default 0.95 for a 95% CI).

Returns:
    Tuple of (lower_bound, upper_bound).

Parameters

values: | list[float]
confidence: float = 0.95
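A minimal sketch of a two-sided confidence interval for the mean, mean ± z · s / √n. It assumes a normal (z) approximation through statistics.NormalDist; the documented function may use Student's t instead, which gives a wider interval for small samples.

```python
import math
import statistics


def compute_confidence_interval(values: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation CI for the mean (sketch; not necessarily the module's method)."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate the spread")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    mean = statistics.fmean(values)
    # standard error of the mean, from the sample standard deviation
    sem = statistics.stdev(values) / math.sqrt(len(values))
    # two-sided critical value, about 1.96 for a 95% interval
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)
    return mean - z * sem, mean + z * sem


lower, upper = compute_confidence_interval([1.0, 2.0, 3.0, 4.0, 5.0])
```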

function

compute_latency_percentiles(values: list[float]) → dict[str, float]

Compute common latency percentiles.

Args:
    values: Latency measurements in milliseconds.

Returns:
    Dictionary with p50, p95, and p99 values.

Parameters

values: list[float]
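A sketch assuming the returned keys are the strings "p50", "p95", and "p99" and that each percentile is computed by linear interpolation between the closest ranks (the same rule as NumPy's default "linear" method). `_percentile` is a hypothetical helper.

```python
import math


def _percentile(sorted_values: list[float], p: float) -> float:
    """Linear interpolation between closest ranks (hypothetical helper)."""
    pos = (len(sorted_values) - 1) * p / 100  # fractional index into the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def compute_latency_percentiles(values: list[float]) -> dict[str, float]:
    """p50/p95/p99 of latency values in milliseconds (sketch; key names assumed)."""
    if not values:
        raise ValueError("need at least one latency measurement")
    ordered = sorted(values)
    return {f"p{p}": _percentile(ordered, p) for p in (50, 95, 99)}


latencies_ms = [12.0, 15.0, 11.0, 40.0, 13.0]
print(compute_latency_percentiles(latencies_ms))
```

With only five samples, p95 and p99 are interpolated between the two largest values, so tail percentiles are only meaningful with many measurements.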

function

compute_percentiles(values: | list[float], percentiles: list[int]) → dict[int, float]

Compute multiple percentiles.

Args:
    values: Array of values.
    percentiles: List of percentiles to compute (e.g., [50, 90, 95, 99]).

Returns:
    Dictionary mapping each percentile to its value.

Parameters

values: | list[float]
percentiles: list[int]
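A stdlib-only sketch of the percentile mapping, assuming linear interpolation between closest ranks (NumPy's default rule) and percentiles given as integers from 0 to 100. The module's own interpolation rule is not stated on this page.

```python
import math


def compute_percentiles(values: list[float], percentiles: list[int]) -> dict[int, float]:
    """Map each requested percentile to its value (sketch, linear interpolation)."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    last = len(ordered) - 1
    result: dict[int, float] = {}
    for p in percentiles:
        if not 0 <= p <= 100:
            raise ValueError(f"percentile out of range: {p}")
        pos = last * p / 100  # fractional rank into the sorted data
        lo = math.floor(pos)
        hi = min(lo + 1, last)
        result[p] = ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
    return result


print(compute_percentiles([10.0, 20.0, 30.0, 40.0], [0, 50, 100]))
# {0: 10.0, 50: 25.0, 100: 40.0}
```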

function

compute_summary(values: list[float], metric_name: str, unit: str) → StatisticalSummary

Compute a comprehensive statistical summary for a set of values.

Args:
    values: List of measurement values.
    metric_name: Name of the metric (e.g., "load_time", "memory").
    unit: Unit of measurement (e.g., "ms", "MB", "s").

Returns:
    StatisticalSummary with all computed statistics.

Parameters

values: list[float]
metric_name: str
unit: str
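The fields of StatisticalSummary are not listed on this page, so the sketch below uses a hypothetical SummarySketch dataclass and a hypothetical summarize function to show the kind of statistics such a summary typically holds. The field set, and the use of the sample standard deviation, are assumptions.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SummarySketch:
    """Hypothetical stand-in for StatisticalSummary; the real fields may differ."""

    metric_name: str
    unit: str
    n: int
    mean: float
    median: float
    std: float
    minimum: float
    maximum: float


def summarize(values: list[float], metric_name: str, unit: str) -> SummarySketch:
    """Hypothetical summary builder mirroring the compute_summary signature."""
    if not values:
        raise ValueError("need at least one value")
    return SummarySketch(
        metric_name=metric_name,
        unit=unit,
        n=len(values),
        mean=statistics.fmean(values),
        median=statistics.median(values),
        # sample standard deviation; set to 0.0 here when there is only one value
        std=statistics.stdev(values) if len(values) > 1 else 0.0,
        minimum=min(values),
        maximum=max(values),
    )


s = summarize([120.0, 100.0, 110.0], "load_time", "ms")
print(f"{s.metric_name}: {s.mean:.1f} ± {s.std:.1f} {s.unit} (n={s.n})")
# load_time: 110.0 ± 10.0 ms (n=3)
```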

function

linear_regression(x: list[float] | , y: list[float] | ) → tuple[float, float, float]

Perform simple linear regression.

Args:
    x: Independent variable values.
    y: Dependent variable values.

Returns:
    Tuple of (slope, intercept, r_squared).

Parameters

x: list[float] |
y: list[float] |
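A sketch of ordinary least squares with a single predictor using the textbook closed form: slope = Sxy / Sxx, intercept = ȳ − slope · x̄, and R² as the squared Pearson correlation. This is not necessarily how the module computes it, and returning 1.0 for a constant y is a convention chosen here.

```python
import statistics


def linear_regression(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least-squares fit y ≈ slope * x + intercept, plus R² (sketch)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    # centered sums of squares and cross-products
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # R² = squared Pearson correlation; a constant y is treated as a perfect fit (assumption)
    r_squared = sxy**2 / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


slope, intercept, r2 = linear_regression([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# slope == 2.0, intercept == 0.0, r2 == 1.0
```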

function

welch_t_test(sample_a: list[float], sample_b: list[float]) → tuple[float, float]

Perform Welch's t-test for independent samples.

Welch's t-test does not assume equal variances between the two groups.

Args:
    sample_a: First sample.
    sample_b: Second sample.

Returns:
    Tuple of (t_statistic, p_value).

Parameters

sample_a: list[float]
sample_b: list[float]
