evaluation

Evaluation helpers.

Classes

evaluation._arunner.AsyncExperimentResults(...)

Represents the results of an aevaluate() call.

evaluation._runner.ComparativeExperimentResults(results)

Represents the results of an evaluate_comparative() call.

evaluation._runner.ExperimentResultRow

A single row in the results of an experiment.

evaluation._runner.ExperimentResults(...[, ...])

Represents the results of an evaluate() call.

evaluation.evaluator.Category

A category for categorical feedback.

evaluation.evaluator.ComparisonEvaluationResult

Feedback scores for the results of comparative evaluations.

evaluation.evaluator.DynamicComparisonRunEvaluator(func)

Compare predictions (as traces) from 2 or more runs.

evaluation.evaluator.DynamicRunEvaluator(func)

A dynamic evaluator that wraps a function and transforms it into a RunEvaluator.

evaluation.evaluator.EvaluationResult

Evaluation result.

evaluation.evaluator.EvaluationResults

Batch evaluation results.

evaluation.evaluator.FeedbackConfig

Configuration to define a type of feedback.

evaluation.evaluator.RunEvaluator()

Evaluator interface class.

evaluation.llm_evaluator.CategoricalScoreConfig

Configuration for a categorical score.

evaluation.llm_evaluator.ContinuousScoreConfig

Configuration for a continuous score.

Functions

evaluation._arunner.aevaluate(target, /, data)

Evaluate an async target system or function on a given dataset.
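
The target passed to aevaluate() is an async callable. A minimal sketch of that shape, assuming the common convention that the target maps an example's inputs dict to an outputs dict; the field names ("question", "output") are illustrative assumptions, not part of this API:

```python
import asyncio

# Hedged sketch: the kind of async target aevaluate() accepts -- a
# coroutine function mapping an example's inputs dict to an outputs dict.
# A real target would await a model or chain call here.
async def target(inputs: dict) -> dict:
    await asyncio.sleep(0)  # stand-in for real async work
    return {"output": inputs["question"].strip().upper()}

result = asyncio.run(target({"question": "  hello  "}))
print(result)  # → {'output': 'HELLO'}
```

With credentials configured, such a target would be passed as the first argument of aevaluate() along with a dataset reference; that call is omitted here because it requires a live backend.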

evaluation._arunner.aevaluate_existing(...)

Evaluate existing experiment runs asynchronously.

evaluation._arunner.async_chain_from_iterable(...)

Chain multiple async iterables.

evaluation._runner.evaluate(target, /, data)

Evaluate an application on a given dataset.
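
A sketch of the synchronous counterpart: evaluate() takes a target callable that maps each example's inputs dict to an outputs dict. The function body, field names, and the dataset/evaluator names in the comment are illustrative assumptions:

```python
# Hedged sketch: a synchronous target for evaluate() maps the inputs
# dict of each dataset example to an outputs dict.
def target(inputs: dict) -> dict:
    return {"output": len(inputs["question"].split())}

# With credentials configured, the call would look roughly like
# (dataset name and evaluator list are assumptions, not from this page):
#
#   from langsmith.evaluation import evaluate
#   results = evaluate(target, data="qa-dataset", evaluators=[...])

print(target({"question": "how many words is this"}))  # → {'output': 5}
```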

evaluation._runner.evaluate_comparative(...)

Evaluate existing experiment runs against each other.
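
A comparison evaluator for evaluate_comparative() receives the runs under comparison and produces per-run scores, as suggested by ComparisonEvaluationResult above. A stdlib-only sketch of that contract, using stand-in run objects (real runs come from existing experiments); the feedback-dict field names are assumptions:

```python
from types import SimpleNamespace
from uuid import uuid4

# Hedged sketch of the comparison-evaluator contract: given the runs
# being compared, return a feedback key plus one score per run id.
def prefer_shorter(runs, example=None):
    scores = {}
    for run in runs:
        answer = (run.outputs or {}).get("output", "")
        scores[str(run.id)] = 1.0 / (1 + len(answer))
    return {"key": "prefer_shorter", "scores": scores}

# Stand-in runs for illustration only.
a = SimpleNamespace(id=uuid4(), outputs={"output": "short"})
b = SimpleNamespace(id=uuid4(), outputs={"output": "a much longer answer"})
result = prefer_shorter([a, b])
print(result["scores"][str(a.id)] > result["scores"][str(b.id)])  # → True
```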

evaluation._runner.evaluate_existing(...[, ...])

Evaluate existing experiment runs.

evaluation.evaluator.comparison_evaluator(func)

Create a comparison evaluator from a function.

evaluation.evaluator.run_evaluator(func)

Create a run evaluator from a function.
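
The function wrapped by run_evaluator() conventionally receives the run (carrying the target's outputs) and its reference example, and returns a feedback dict with a key and score. A stdlib sketch of that shape, with stand-in objects since real Run and Example instances come from the backend; the decorator itself is not applied here:

```python
from types import SimpleNamespace

# Hedged sketch: the function shape that run_evaluator() wraps. It
# compares the run's predicted output against the example's reference
# output and returns feedback as {"key": ..., "score": ...}.
def exact_match(run, example):
    predicted = (run.outputs or {}).get("output")
    expected = (example.outputs or {}).get("output")
    return {"key": "exact_match", "score": int(predicted == expected)}

run = SimpleNamespace(outputs={"output": "42"})
example = SimpleNamespace(outputs={"output": "42"})
print(exact_match(run, example))  # → {'key': 'exact_match', 'score': 1}
```

With the real decorator applied, such a function would typically be passed in the evaluators list of an evaluate() or aevaluate() call.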