statistics
Statistical analysis functions for benchmark results.
Functions
function
aggregate_runs(measurements: list[list[float]]) → tuple[list[float], list[float]]
Aggregate multiple runs into mean and std for each step.
Args:
measurements: List of runs, where each run is a list of step values
Returns:
Tuple of (means, stds) for each step
Parameters
measurements: list[list[float]]
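The per-step aggregation described above can be sketched with the standard library alone. This is an illustration, not the module's implementation: the name `aggregate_runs_sketch` is hypothetical, and the use of the sample standard deviation (with 0.0 for a single run) is an assumption, since the reference does not say which estimator is used.

```python
from statistics import mean, stdev

def aggregate_runs_sketch(measurements):
    """Return (means, stds) per step across several runs (hypothetical helper)."""
    # Transpose so that each tuple holds one step's values across all runs.
    # zip() stops at the shortest run, so runs are assumed to have equal length.
    steps = list(zip(*measurements))
    means = [mean(step) for step in steps]
    # Assumption: sample standard deviation; it needs at least two runs.
    stds = [stdev(step) if len(step) > 1 else 0.0 for step in steps]
    return means, stds

runs = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
means, stds = aggregate_runs_sketch(runs)
print(means, stds)
```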
function
compare_implementations(tf_values: list[float], cf_values: list[float], metric_name: str, unit: str, metric_type: Literal['time', 'memory', 'other'] = 'other', significance_level: float = 0.05) → ComparisonResult
Compare TF and CF measurements for a metric.
Args:
tf_values: Text-Fabric measurements
cf_values: Context-Fabric measurements
metric_name: Name of the metric
unit: Unit of measurement
metric_type: Type of metric for computing comparison ratios
significance_level: P-value threshold for statistical significance
Returns:
ComparisonResult with statistics, comparison metrics, and effect size
Parameters
tf_values: list[float]
cf_values: list[float]
metric_name: str
unit: str
metric_type: Literal['time', 'memory', 'other'] = 'other'
significance_level: float = 0.05
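The reference does not state which comparison ratio or effect size measure ComparisonResult carries. As a rough illustration only, the sketch below assumes a ratio of means for a time metric and Cohen's d with a pooled standard deviation, which is one common choice. The helper name `cohens_d` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(sample_a, sample_b):
    """Cohen's d using the pooled sample variance (assumed effect size measure)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)

# Made-up timings in milliseconds, for illustration only.
tf_ms = [10.0, 12.0, 14.0]
cf_ms = [4.0, 6.0, 8.0]

ratio = mean(tf_ms) / mean(cf_ms)  # assumed ratio for a 'time' metric
d = cohens_d(tf_ms, cf_ms)
print(ratio, d)
```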
function
compute_confidence_interval(values: | list[float], confidence: float = 0.95) → tuple[float, float]
Compute confidence interval for the mean.
Args:
values: Array of values
confidence: Confidence level (default 0.95 for 95% CI)
Returns:
Tuple of (lower_bound, upper_bound)
Parameters
values: | list[float]
confidence: float = 0.95
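A confidence interval for the mean is the sample mean plus or minus a critical value times the standard error. The sketch below uses a normal critical value so that it runs on the standard library alone. That is an assumption: for small samples a Student's t critical value is usually more appropriate, and the reference does not say which one the module uses. The name `confidence_interval_sketch` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval_sketch(values, confidence=0.95):
    """Normal-approximation CI for the mean (hypothetical helper)."""
    m = mean(values)
    sem = stdev(values) / sqrt(len(values))  # standard error of the mean
    # Two-sided critical value, about 1.96 for confidence=0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * sem, m + z * sem

low, high = confidence_interval_sketch([10.0, 12.0, 14.0, 16.0, 18.0])
print(low, high)
```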
function
compute_latency_percentiles(values: list[float]) → dict[str, float]
Compute common latency percentiles.
Args:
values: Latency measurements in milliseconds
Returns:
Dictionary with p50, p95, p99 values
Parameters
values: list[float]
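One way to obtain p50, p95, and p99 with the standard library is `statistics.quantiles` with `n=100`, which returns the 99 cut points between percentiles. The key names `"p50"`, `"p95"`, and `"p99"`, the inclusive interpolation method, and the name `latency_percentiles_sketch` are assumptions made for this sketch.

```python
from statistics import quantiles

def latency_percentiles_sketch(values_ms):
    """Return p50/p95/p99 of latencies in milliseconds (hypothetical helper)."""
    # n=100 yields 99 cut points; cuts[q - 1] is the q-th percentile.
    cuts = quantiles(values_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

latencies_ms = [float(i) for i in range(1, 102)]  # 1.0 .. 101.0
result = latency_percentiles_sketch(latencies_ms)
print(result)
```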
function
compute_percentiles(values: | list[float], percentiles: list[int]) → dict[int, float]
Compute multiple percentiles.
Args:
values: Array of values
percentiles: List of percentiles to compute (e.g., [50, 90, 95, 99])
Returns:
Dictionary mapping percentile to value
Parameters
values: | list[float]
percentiles: list[int]
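The mapping from percentile to value can be sketched with linear interpolation between the two closest ranks. The interpolation rule is an assumption, since the reference does not name one, and `percentiles_sketch` is a hypothetical name.

```python
def percentiles_sketch(values, percentiles):
    """Map each requested percentile (0-100) to its interpolated value."""
    ordered = sorted(values)
    last = len(ordered) - 1
    result = {}
    for q in percentiles:
        pos = last * q / 100          # fractional index into the sorted data
        lower = int(pos)
        upper = min(lower + 1, last)
        frac = pos - lower
        result[q] = ordered[lower] + (ordered[upper] - ordered[lower]) * frac
    return result

pcts = percentiles_sketch([5.0, 1.0, 3.0, 2.0, 4.0], [0, 25, 50, 100])
print(pcts)
```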
function
compute_summary(values: list[float], metric_name: str, unit: str) → StatisticalSummary
Compute comprehensive statistical summary for a set of values.
Args:
values: List of measurement values
metric_name: Name of the metric (e.g., "load_time", "memory")
unit: Unit of measurement (e.g., "ms", "MB", "s")
Returns:
StatisticalSummary with all computed statistics
Parameters
values: list[float]
metric_name: str
unit: str
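The fields of StatisticalSummary are not listed in this reference. The sketch below returns a plain dictionary with a plausible set of statistics to show the kind of summary involved. Every key name, and the function name `summary_sketch`, is an assumption.

```python
from statistics import mean, median, stdev

def summary_sketch(values, metric_name, unit):
    """Collect basic descriptive statistics (hypothetical field names)."""
    return {
        "metric_name": metric_name,
        "unit": unit,
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        # Sample standard deviation needs at least two values.
        "std": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }

summary = summary_sketch([120.0, 130.0, 125.0, 135.0], "load_time", "ms")
print(summary)
```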
function
linear_regression(x: list[float] | , y: list[float] | ) → tuple[float, float, float]
Perform simple linear regression.
Args:
x: Independent variable values
y: Dependent variable values
Returns:
Tuple of (slope, intercept, r_squared)
Parameters
x: list[float] |
y: list[float] |
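Simple linear regression by ordinary least squares can be written out directly, which makes the meaning of the returned (slope, intercept, r_squared) triple concrete. This is a sketch of the textbook formulas, not necessarily how the module computes them. The name `linear_regression_sketch` is hypothetical.

```python
from statistics import mean

def linear_regression_sketch(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_mean, y_mean = mean(x), mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx  # raises ZeroDivisionError if all x values are equal
    intercept = y_mean - slope * x_mean
    # r_squared = 1 - (residual sum of squares / total sum of squares)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared

slope, intercept, r2 = linear_regression_sketch(
    [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
)
print(slope, intercept, r2)
```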
function
welch_t_test(sample_a: list[float], sample_b: list[float]) → tuple[float, float]
Perform Welch's t-test for independent samples.
Welch's t-test does not assume equal variances between groups.
Args:
sample_a: First sample
sample_b: Second sample
Returns:
Tuple of (t_statistic, p_value)
Parameters
sample_a: list[float]
sample_b: list[float]
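Welch's statistic divides the difference in means by the combined standard error of the two samples, and the Welch-Satterthwaite formula gives the degrees of freedom. The sketch below computes both with the standard library. It stops short of the p-value, which needs the Student's t distribution CDF (available, for example, through `scipy.stats.ttest_ind` with `equal_var=False`). The name `welch_t_and_df` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_and_df(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors; the variances are not pooled.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

t_stat, df = welch_t_and_df([10.0, 12.0, 14.0], [4.0, 6.0, 8.0])
print(t_stat, df)
```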