A suite of cooperative programming benchmarks in which agents propose, critique, and refine solutions; a minimal sketch of that loop follows the list below. The environments shipped in LLM_Collab_Code_Generation cover:

  • MBPP – the Mostly Basic Python Problems set; short, entry-level tasks suited to rapid iteration.
  • HumanEval – 164 hand-written programming problems from OpenAI, graded by executing each candidate against unit tests (functional correctness) rather than by exact string match.
  • CoopHumanEval – variants of HumanEval designed to explicitly require collaboration between agents.
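
To make the interaction pattern concrete, here is a minimal sketch of one propose-critique-refine round trip. Every name in it (`Task`, `collaborate`, the stub agents) is a hypothetical illustration, not the repository's actual API; a real environment would call LLMs and execute hidden unit tests in a sandbox rather than the string check used here.

```python
# Hypothetical sketch of a propose-critique-refine loop over one task.
# None of these names come from the repository; they only illustrate
# the interaction pattern described above.
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Task:
    prompt: str                    # natural-language problem statement
    tests: Callable[[str], bool]   # runs the grader against a candidate solution

def collaborate(
    propose: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], str],
    task: Task,
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Alternate proposals and critiques until the grader passes or rounds run out."""
    solution = propose(task.prompt, None)
    for _ in range(max_rounds):
        if task.tests(solution):
            return solution, True
        feedback = critique(task.prompt, solution)   # critic reviews the draft
        solution = propose(task.prompt, feedback)    # proposer refines using feedback
    return solution, task.tests(solution)

# Toy run with stub agents standing in for LLM calls.
task = Task(prompt="Return the square of x.",
            tests=lambda code: "x * x" in code)
final, passed = collaborate(
    propose=lambda prompt, fb: "def f(x): return x * x" if fb else "def f(x): return x + x",
    critique=lambda prompt, draft: "x + x doubles the input instead of squaring it",
    task=task,
)
print(passed)  # True after one critique round
```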