Author: ZhuangYumin
Summary: A benchmark based on SWE-bench that evaluates the conceptual reasoning capabilities of LLMs in the context of software engineering tasks.
Latest version: 0.1.6
Required dependencies: datasets, jinja2, loguru, mini-swe-agent, openai, openai-agents, swebench, tenacity, tqdm, wandb
Downloads last day: 21
Downloads last week: 31
Downloads last month: 57