Environments that simulate real-world tasks for training and evaluating LLM collaboration:
Writing Collaboration: Multiple LLM agents collaborate to process articles.
- TLDR - Summarizing Reddit posts.
- ArXiv - Expanding abstracts into introductions.
Code Generation: Generate code solutions for programming problems.
- MBPP - Mostly Basic Python Problems.
- HumanEval - Hand-written programming problems.
- CoopHumanEval - HumanEval adapted to a cooperative setting.
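HumanEval and MBPP both score generated code by functional correctness: the candidate solution is executed against the problem's unit tests, and it counts as correct only if every test passes. The sketch below illustrates that check under simplifying assumptions. `run_unit_tests` and `pass_rate` are hypothetical helpers, not this project's API, and the `add` problem is an invented example.

```python
from typing import List


def run_unit_tests(solution_code: str, test_asserts: List[str]) -> bool:
    """Return True if every assert statement passes against the candidate.

    Hypothetical helper. WARNING: exec() runs untrusted code in-process;
    a real harness would sandbox it and enforce a timeout.
    """
    namespace: dict = {}
    try:
        exec(solution_code, namespace)  # define the candidate function(s)
        for test in test_asserts:
            exec(test, namespace)  # raises AssertionError on failure
    except Exception:  # failed assert, syntax error, or runtime error
        return False
    return True


def pass_rate(candidates: List[str], test_asserts: List[str]) -> float:
    """Hypothetical helper: fraction of candidates that pass all tests."""
    if not candidates:
        return 0.0
    passed = sum(run_unit_tests(c, test_asserts) for c in candidates)
    return passed / len(candidates)


if __name__ == "__main__":
    tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
    good = "def add(a, b):\n    return a + b\n"
    bad = "def add(a, b):\n    return a - b\n"
    print(run_unit_tests(good, tests))    # True
    print(run_unit_tests(bad, tests))     # False
    print(pass_rate([good, bad], tests))  # 0.5
```

Any exception, not just a failed assertion, is treated as a failure, so code that does not even parse or that crashes at runtime is scored the same as code that returns a wrong answer.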
Code Completion: Complete code snippets based on given contexts.
- ClassEval - Complete class-level code based on method stubs and docstrings.