In Progress
Which LLM Fixes It Best? A Comparative Study of Automated Program Repair
Comparative benchmark study evaluating leading LLMs on automated program repair across diverse bug types.
Started: January 2026

About This Project
A custom APR tool that accepts a codebase path, an error message, and a model selection, then automatically diagnoses and fixes software bugs. The tool leverages multiple state-of-the-art LLMs from Anthropic, OpenAI, Google, and xAI to generate and apply patches, then validates each fix by re-running the previously failing tests. Using this tool, I benchmark each model's repair success rate across diverse bug categories to provide actionable guidelines for model selection in APR workflows.
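The repair loop described above (prompt a model with the failing code and error, apply its patch, then validate by re-running) can be sketched roughly as follows. This is a simplified illustration, not the project's actual implementation: `generate` is a hypothetical stand-in for a provider-specific completion call, and validation here just re-executes the patched file rather than a full test suite.

```python
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RepairAttempt:
    model: str
    patch: str
    tests_pass: bool


def build_prompt(code: str, error: str) -> str:
    """Combine the failing source and error message into a repair prompt."""
    return (
        "The following code fails with this error:\n"
        f"--- error ---\n{error}\n"
        f"--- code ---\n{code}\n"
        "Return a corrected version of the code."
    )


def attempt_repair(source_file: Path, error: str, model: str, generate) -> RepairAttempt:
    """Ask one model for a patch, apply it, and re-run the failing program.

    `generate(model, prompt)` is injected so the loop stays
    provider-agnostic (it could wrap an OpenAI, Anthropic, Gemini,
    or xAI client in practice).
    """
    original = source_file.read_text()
    patch = generate(model, build_prompt(original, error))
    source_file.write_text(patch)  # apply the candidate fix
    result = subprocess.run(
        [sys.executable, str(source_file)], capture_output=True, text=True
    )
    if result.returncode != 0:
        source_file.write_text(original)  # revert if the fix didn't work
    return RepairAttempt(model, patch, tests_pass=result.returncode == 0)
```

Injecting the model call as a plain function keeps the benchmark harness identical across providers: only the `generate` adapter changes per API.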
Tech Stack
Python · OpenAI API · Anthropic API · Google Gemini API
academic