Senior Software Engineer, Python for LLM Evaluation
from $40/hour
Remote: Mexico, India or Nigeria
Non-remote: unknown
Contract
Technology
Updated Jun 3, 2026
About this role
Role Overview: This position focuses on building LLM evaluation and training datasets grounded in realistic software engineering problems. You will create verifiable software engineering tasks from public repository histories, using a synthetic approach with a human in the loop, and broaden dataset coverage across programming languages and difficulty levels.
Key Responsibilities:
- Analyze and triage GitHub issues across trending open-source libraries.
- Set up and configure code repositories, including Dockerization and environment setup.
- Evaluate unit test coverage and quality.
- Modify and run codebases locally to assess LLM performance in bug-fixing scenarios.
- Collaborate with researchers to design and identify repositories and issues that present challenges for LLMs.
- Lead a team of junior engineers on collaborative projects.
Qualifications:
- At least 3 years of overall experience in software engineering.
- Strong experience with Python or similar programming languages.
- Proficiency with Git, Docker, and basic software pipeline setup.
- Ability to understand and navigate complex codebases.
- Comfortable running, modifying, and testing real-world projects locally.
- Experience contributing to or evaluating open-source projects is a plus.
Nice to Have:
- Previous participation in LLM research or evaluation projects.
- Experience building or testing developer tools or automation agents.
Work Terms:
- Commitment Required: At least 4 hours per day and 20 hours per week, including 4 hours of overlap with PST. Available options are 20, 30, or 40 hours per week.
- Employment Type: Contractor assignment (no medical/paid leave).
- Duration of Contract: 3 months; expected start date is next week.
- Location: Open to candidates in India, Pakistan, Nigeria, Kenya, Egypt, Ghana, Bangladesh, Turkey, and Mexico.
Compensation: Details regarding compensation will be discussed during the interview process.
Eligibility:
- Must be able to work within the specified time commitments and locations.
Evaluation Process:
- Two rounds of interviews: a 60-minute technical interview and a 30-minute technical and cultural discussion.