code-conductor-bench

Author: ZhuangYumin
Summary: A benchmark based on SWE-bench that evaluates the conceptual reasoning capabilities of LLMs on software engineering tasks.
Latest version: 0.1.6
Required dependencies: datasets | jinja2 | loguru | mini-swe-agent | openai | openai-agents | swebench | tenacity | tqdm | wandb

Downloads last day: 10
Downloads last week: 16
Downloads last month: 68