A suite of cooperative programming benchmarks where agents propose, critique, and refine solutions. The environments shipped in LLM_Collab_Code_Generation cover:
- MBPP – the Mostly Basic Python Problems dataset; short tasks well suited to rapid iteration.
- HumanEval – handwritten programming tasks from OpenAI, graded by executing each task's unit tests.
- CoopHumanEval – HumanEval variants that explicitly require collaboration.
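
As a rough illustration of the propose-critique-refine loop these environments support, the sketch below runs one cooperative episode. The names here (`env.prompt`, `env.evaluate`, the `proposer`/`critic` callables) are hypothetical placeholders, not the package's actual API:

```python
def collaborate(env, proposer, critic, max_rounds=3):
    """Run one propose-critique-refine episode on a single task.

    Assumed interface (illustrative only): `env.prompt` is the task
    description and `env.evaluate` runs the task's unit tests,
    returning (passed, feedback).
    """
    solution = proposer(env.prompt)                # initial proposal
    for _ in range(max_rounds):
        passed, feedback = env.evaluate(solution)  # e.g. run the hidden tests
        if passed:
            break                                  # solution accepted
        critique = critic(env.prompt, solution, feedback)
        solution = proposer(env.prompt, critique)  # refine using the critique
    return solution
```

Each round lets the critic turn test feedback into actionable guidance before the proposer retries, the kind of collaboration the CoopHumanEval variants are built to require.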