SaidGig

Software Engineer for AI Model Evaluation

$220,000–$500,000/yr

Remote, Full-time, Technology

About this role

This role focuses on advancing the evaluation and development of cutting-edge coding agents. You will operate at the intersection of AI research, software engineering, and model evaluation, designing the benchmarks, methodologies, and data systems that shape how next-generation coding models are measured and improved.

Key Responsibilities
  • Design and own evaluation frameworks for coding agents, including benchmark specifications, scoring methodologies, rubrics, and quality standards.
  • Lead end-to-end research initiatives aimed at measuring and enhancing coding model performance across various software engineering tasks.
  • Develop high-quality datasets, golden examples, and evaluation protocols that facilitate reliable assessment of frontier coding systems.
  • Analyze model behavior and failure modes, identifying systematic weaknesses and translating findings into actionable improvements for training and evaluation.
  • Build tooling and infrastructure that support large-scale experimentation, data generation, review workflows, and evaluation pipelines.
  • Establish best practices for coding-agent assessment, ensuring methodological rigor, reproducibility, and measurement quality.
  • Collaborate closely with researchers, engineers, and applied AI teams to design experiments and evaluate emerging model capabilities.
  • Contribute to technical reports, benchmark studies, and client-facing research initiatives that communicate model performance and insights.
Qualifications
  • Strong software engineering background with expertise in Python, C++, or comparable programming languages.
  • 3+ years of experience in software engineering, machine learning, AI research, evaluation, or related technical disciplines.
  • Experience designing, reviewing, or validating technical assessments, benchmarks, coding tasks, or evaluation methodologies.
  • Familiarity with large language models, coding agents, reinforcement learning, model evaluation, or related AI systems.
  • Proven ability to build tooling, automate workflows, and enhance technical processes through systematic experimentation.
  • Strong analytical skills with the capacity to investigate model behavior and derive insights from complex technical systems.
  • Excellent written and verbal communication skills, with the ability to clearly articulate technical findings to diverse audiences.
  • Comfortable operating in fast-paced research environments with significant ambiguity and evolving priorities.
Work Terms

Full-time, remote position.

Compensation

Annual salary ranges from $220,000 to $500,000.

Eligibility

Open to candidates with the required skills and experience, regardless of location.
