Senior Software Engineer for LLM Evaluation
from $50/hour
Remote (US only) | Contract | Technology | Updated Jun 3, 2026
About this role
This role focuses on creating advanced datasets for training and evaluating large language models, collaborating closely with researchers to enhance AI-driven coding solutions. As a Software Engineering evaluator, you will curate code examples, provide precise solutions, and make corrections primarily in Python, while also working with JavaScript (including ReactJS), C/C++, Java, Rust, and Go. Your contributions will ensure the efficiency, scalability, and reliability of AI-generated code.
Key Responsibilities:
- Curate code examples, build solutions, and correct code primarily in Python, with additional work in JavaScript (including ReactJS), C/C++, Java, Rust, and Go.
- Evaluate and refine AI-generated code for efficiency, scalability, and reliability.
- Collaborate with cross-functional teams to measure and improve AI-driven coding solutions against industry performance benchmarks.
- Build agents and automated verification tools in Python to verify code quality and identify error patterns.
- Form hypotheses about individual steps of the software engineering cycle and use them to evaluate model capabilities.
- Design verification mechanisms to automatically verify solutions to software engineering tasks.
Qualifications:
- 3 or more years of software engineering experience.
- Strong expertise in Python, including deep knowledge of frameworks, tooling, and best practices for production-grade software.
- Experience in building full-stack applications and deploying scalable software using modern languages and tools.
- Deep understanding of software architecture, design, development, debugging, and code quality/review assessment.
- Excellent oral and written communication skills for clear, structured evaluation rationales.
Work Terms:
- Flexible engagement, minimum 10 hours/week, up to 40 hours/week.
- Contractor status (no medical benefits or paid leave).
- Initial duration of 1 month with potential extensions based on performance and fit.
- Candidates must be based in the United States.
Compensation:
From $50/hour.
Eligibility:
- Completion of an AI video interview is required as part of the evaluation process.
- The application process takes 15 to 30 minutes.