evaluation

Evaluation helpers.

Classes

evaluation._arunner.AsyncExperimentResults(...)

Represents the results of an aevaluate() call.

evaluation._runner.ComparativeExperimentResults(results)

Represents the results of an evaluate_comparative() call.

evaluation._runner.ExperimentResultRow

A single row in the results of an experiment.

evaluation._runner.ExperimentResults(...[, ...])

Represents the results of an evaluate() call.

evaluation.evaluator.Category

A category for categorical feedback.

evaluation.evaluator.ComparisonEvaluationResult

Feedback scores for the results of comparative evaluations.

evaluation.evaluator.DynamicComparisonRunEvaluator(func)

Compare predictions (as traces) from 2 or more runs.

evaluation.evaluator.DynamicRunEvaluator(func)

A dynamic evaluator that wraps a function and transforms it into a RunEvaluator.

evaluation.evaluator.EvaluationResult

Evaluation result.

evaluation.evaluator.EvaluationResults

Batch evaluation results.

evaluation.evaluator.FeedbackConfig

Configuration to define a type of feedback.

evaluation.evaluator.RunEvaluator()

Evaluator interface class.

evaluation.llm_evaluator.CategoricalScoreConfig

Configuration for a categorical score.

evaluation.llm_evaluator.ContinuousScoreConfig

Configuration for a continuous score.

Functions

evaluation._arunner.aevaluate(target, /, data)

Evaluate an async target system or function on a given dataset.
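
The target passed to aevaluate() is an async callable. A minimal sketch of that shape, assuming the common convention that the target maps an example's inputs dict to an outputs dict; the field names ("question", "output") are illustrative assumptions, not part of this API:

```python
import asyncio

# Hedged sketch: the kind of async target aevaluate() accepts -- a
# coroutine function mapping an example's inputs dict to an outputs dict.
# A real target would await a model or chain call here.
async def target(inputs: dict) -> dict:
    await asyncio.sleep(0)  # stand-in for real async work
    return {"output": inputs["question"].strip().upper()}

result = asyncio.run(target({"question": "  hello  "}))
print(result)  # → {'output': 'HELLO'}
```

With credentials configured, such a target would be passed as the first argument of aevaluate() along with a dataset reference; that call is omitted here because it requires a live backend.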

evaluation._arunner.aevaluate_existing(...)

Evaluate existing experiment runs asynchronously.

evaluation._arunner.async_chain_from_iterable(...)

Chain multiple async iterables.

evaluation._runner.evaluate(target, /, data)

Evaluate an application on a given dataset.
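
A sketch of the synchronous counterpart: evaluate() takes a target callable that maps each example's inputs dict to an outputs dict. The function body, field names, and the dataset/evaluator names in the comment are illustrative assumptions:

```python
# Hedged sketch: a synchronous target for evaluate() maps the inputs
# dict of each dataset example to an outputs dict.
def target(inputs: dict) -> dict:
    return {"output": len(inputs["question"].split())}

# With credentials configured, the call would look roughly like
# (dataset name and evaluator list are assumptions, not from this page):
#
#   from langsmith.evaluation import evaluate
#   results = evaluate(target, data="qa-dataset", evaluators=[...])

print(target({"question": "how many words is this"}))  # → {'output': 5}
```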

evaluation._runner.evaluate_comparative(...)

Evaluate existing experiment runs against each other.
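
A comparison evaluator for evaluate_comparative() receives the runs under comparison and produces per-run scores, as suggested by ComparisonEvaluationResult above. A stdlib-only sketch of that contract, using stand-in run objects (real runs come from existing experiments); the feedback-dict field names are assumptions:

```python
from types import SimpleNamespace
from uuid import uuid4

# Hedged sketch of the comparison-evaluator contract: given the runs
# being compared, return a feedback key plus one score per run id.
def prefer_shorter(runs, example=None):
    scores = {}
    for run in runs:
        answer = (run.outputs or {}).get("output", "")
        scores[str(run.id)] = 1.0 / (1 + len(answer))
    return {"key": "prefer_shorter", "scores": scores}

# Stand-in runs for illustration only.
a = SimpleNamespace(id=uuid4(), outputs={"output": "short"})
b = SimpleNamespace(id=uuid4(), outputs={"output": "a much longer answer"})
result = prefer_shorter([a, b])
print(result["scores"][str(a.id)] > result["scores"][str(b.id)])  # → True
```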

evaluation._runner.evaluate_existing(...[, ...])

Evaluate existing experiment runs.

evaluation.evaluator.comparison_evaluator(func)

Create a comparison evaluator from a function.

evaluation.evaluator.run_evaluator(func)

Create a run evaluator from a function.
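
The function wrapped by run_evaluator() conventionally receives the run (carrying the target's outputs) and its reference example, and returns a feedback dict with a key and score. A stdlib sketch of that shape, with stand-in objects since real Run and Example instances come from the backend; the decorator itself is not applied here:

```python
from types import SimpleNamespace

# Hedged sketch: the function shape that run_evaluator() wraps. It
# compares the run's predicted output against the example's reference
# output and returns feedback as {"key": ..., "score": ...}.
def exact_match(run, example):
    predicted = (run.outputs or {}).get("output")
    expected = (example.outputs or {}).get("output")
    return {"key": "exact_match", "score": int(predicted == expected)}

run = SimpleNamespace(outputs={"output": "42"})
example = SimpleNamespace(outputs={"output": "42"})
print(exact_match(run, example))  # → {'key': 'exact_match', 'score': 1}
```

With the real decorator applied, such a function would typically be passed in the evaluators list of an evaluate() or aevaluate() call.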