A suite of cooperative programming benchmarks in which agents propose, critique, and refine solutions; a minimal sketch of that loop follows the list below. The environments shipped in LLM_Collab_Code_Generation cover:

  • MBPP – the Mostly Basic Python Problems set; short, entry-level tasks suited to rapid iteration.
  • HumanEval – 164 hand-written programming problems from OpenAI, graded by executing each candidate against unit tests (functional correctness) rather than by exact string match.
  • CoopHumanEval – variants of HumanEval designed to explicitly require collaboration between agents.
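
To make the interaction pattern concrete, here is a minimal sketch of one propose-critique-refine round trip. Every name in it (`Task`, `collaborate`, the stub agents) is a hypothetical illustration, not the repository's actual API; a real environment would call LLMs and execute hidden unit tests in a sandbox rather than the string check used here.

```python
# Hypothetical sketch of a propose-critique-refine loop over one task.
# None of these names come from the repository; they only illustrate
# the interaction pattern described above.
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Task:
    prompt: str                    # natural-language problem statement
    tests: Callable[[str], bool]   # runs the grader against a candidate solution

def collaborate(
    propose: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], str],
    task: Task,
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Alternate proposals and critiques until the grader passes or rounds run out."""
    solution = propose(task.prompt, None)
    for _ in range(max_rounds):
        if task.tests(solution):
            return solution, True
        feedback = critique(task.prompt, solution)   # critic reviews the draft
        solution = propose(task.prompt, feedback)    # proposer refines using feedback
    return solution, task.tests(solution)

# Toy run with stub agents standing in for LLM calls.
task = Task(prompt="Return the square of x.",
            tests=lambda code: "x * x" in code)
final, passed = collaborate(
    propose=lambda prompt, fb: "def f(x): return x * x" if fb else "def f(x): return x + x",
    critique=lambda prompt, draft: "x + x doubles the input instead of squaring it",
    task=task,
)
print(passed)  # True after one critique round
```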