In Progress
Which LLM Fixes It Best? A Comparative Study of Automated Program Repair
Comparative benchmark study evaluating leading LLMs on automated program repair across diverse bug types.
Started: January 2026

About This Project
A custom APR tool that accepts a codebase path, an error message, and a model selection, then automatically diagnoses and fixes software bugs. The tool leverages multiple state-of-the-art LLMs from Anthropic, OpenAI, Google, and xAI to generate and apply patches, then validates each fix by re-running the previously failing tests. Using this tool, I benchmark each model's repair success rate across diverse bug categories to provide actionable guidelines for model selection in APR workflows.
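The repair loop described above (prompt a model with the failing code and error, apply its patch, then validate by re-running) can be sketched roughly as follows. This is a simplified illustration, not the project's actual implementation: `generate` is a hypothetical stand-in for a provider-specific completion call, and validation here just re-executes the patched file rather than a full test suite.

```python
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RepairAttempt:
    model: str
    patch: str
    tests_pass: bool


def build_prompt(code: str, error: str) -> str:
    """Combine the failing source and error message into a repair prompt."""
    return (
        "The following code fails with this error:\n"
        f"--- error ---\n{error}\n"
        f"--- code ---\n{code}\n"
        "Return a corrected version of the code."
    )


def attempt_repair(source_file: Path, error: str, model: str, generate) -> RepairAttempt:
    """Ask one model for a patch, apply it, and re-run the failing program.

    `generate(model, prompt)` is injected so the loop stays
    provider-agnostic (it could wrap an OpenAI, Anthropic, Gemini,
    or xAI client in practice).
    """
    original = source_file.read_text()
    patch = generate(model, build_prompt(original, error))
    source_file.write_text(patch)  # apply the candidate fix
    result = subprocess.run(
        [sys.executable, str(source_file)], capture_output=True, text=True
    )
    if result.returncode != 0:
        source_file.write_text(original)  # revert if the fix didn't work
    return RepairAttempt(model, patch, tests_pass=result.returncode == 0)
```

Injecting the model call as a plain function keeps the benchmark harness identical across providers: only the `generate` adapter changes per API.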
Tech Stack
Python · OpenAI API · Anthropic API · Google Gemini API
academic