SaidGig

Senior Python Engineer for LLM Evaluation

from $50/hour

Remote (US only) | Contract | Technology | Updated Jun 3, 2026

About this role

Role Overview

As a Senior Python Engineer focused on LLM evaluation, you will create datasets for training and benchmarking large language models. You will work closely with researchers to curate code examples, write precise solutions, and improve AI-driven coding tools, primarily in Python.

Key Responsibilities

  • Curate code examples, build solutions, and correct code primarily in Python, with additional work in JavaScript (including ReactJS), C/C++, Java, Rust, and Go.
  • Evaluate and refine AI-generated code to ensure efficiency, scalability, and reliability.
  • Collaborate with cross-functional teams to enhance AI-driven coding solutions against industry performance benchmarks.
  • Develop agents and automated tools in Python that check code quality and identify error patterns.
  • Form hypotheses about steps in the software engineering cycle and evaluate model capabilities.
  • Design mechanisms that automatically verify solutions to software engineering tasks.

Qualifications

  • Minimum of 3 years of software engineering experience.
  • Strong expertise in Python, including frameworks, tooling, and best practices for production-grade software.
  • Experience in building full-stack applications and deploying scalable software using modern languages and tools.
  • Deep understanding of software architecture, design, development, debugging, and code quality assessment.
  • Excellent oral and written communication skills, including the ability to write clear evaluation rationales.

Work Terms

  • Flexible engagement with a minimum commitment of 10 hours per week, up to 40 hours per week.
  • Contractor role (no medical benefits or paid leave).
  • Initial duration of 1 month, with potential extensions based on performance and fit.
  • Candidates must be based in the United States.

Compensation

Compensation details will be discussed during the interview process.

Eligibility

Applicants must be legally authorized to work in the United States.

Evaluation Process

  • The application process takes 15 to 30 minutes.
  • Completion of an AI video interview is required.
