gpt_engineer.benchmark.run.run

gpt_engineer.benchmark.run.run(agent: BaseAgent, benchmark: Benchmark, verbose=False) → List[TaskResult]

Runs the benchmark tasks using the provided agent and returns a list of TaskResult objects.

Parameters:
  • agent (BaseAgent) – The agent to use for running the benchmark tasks.

  • benchmark (Benchmark) – The benchmark containing the tasks to run.

  • verbose (bool, default=False) – If True, print verbose output while the benchmark runs.

Returns:

A list of TaskResult objects representing the results of the benchmark tasks.

Return type:

List[TaskResult]
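
Example:

The sketch below shows one way the returned list might be consumed. It does not import gpt_engineer: the TaskResult class here is a hypothetical stand-in, and its fields (task_name, assertion_results, duration) are assumptions rather than a statement of the library's actual definition. The summarize helper is also illustrative and not part of the library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskResult:
    """Hypothetical stand-in for the library's TaskResult (fields are assumed)."""

    task_name: str
    assertion_results: Dict[str, bool] = field(default_factory=dict)
    duration: float = 0.0


def summarize(results: List[TaskResult]) -> Dict[str, float]:
    """Map each task name to the fraction of its assertions that passed."""
    summary: Dict[str, float] = {}
    for result in results:
        total = len(result.assertion_results)
        passed = sum(result.assertion_results.values())
        # A task with no assertions is reported as 0.0 rather than dividing by zero.
        summary[result.task_name] = passed / total if total else 0.0
    return summary


# In real use, this list would be the value returned by run(agent, benchmark).
# It is built by hand here so the sketch runs without an agent or benchmark.
results = [
    TaskResult("task_a", {"compiles": True, "output_ok": False}, duration=1.5),
    TaskResult("task_b", {"compiles": True}, duration=0.7),
]

summary = summarize(results)
for name, pass_rate in summary.items():
    print(f"{name}: {pass_rate:.0%} of assertions passed")
```

Keeping the aggregation in a separate helper means the same summary can be applied to results from any agent and benchmark pair, provided the real TaskResult exposes comparable fields.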