SaidGig

Senior Software Engineer for LLM Evaluation

from $50/hour

Remote | Contract | Technology | Updated Jun 5, 2026

About this role

Role Overview

As a Software Engineering evaluator, you will create cutting-edge datasets for training, benchmarking, and advancing large language models, collaborating closely with researchers. This includes curating code examples, providing precise solutions, and making corrections in Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go; evaluating and refining AI-generated code for efficiency, scalability, and reliability; and working with cross-functional teams to enhance enterprise-level AI-driven coding solutions.

Key Responsibilities

  • Work on AI model training initiatives by curating code examples, building solutions, and correcting code in Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go.
  • Evaluate and refine AI-generated code to ensure it is efficient, scalable, and reliable.
  • Collaborate with cross-functional teams to improve AI-driven coding solutions and measure them against industry performance benchmarks.
  • Build agents that verify code quality and identify recurring error patterns.
  • Form hypotheses about model performance at each stage of the software engineering lifecycle (prototyping, architecture design, API design, production implementation, launch, experiments, monitoring, operations and maintenance) and evaluate model capabilities at each stage.
  • Design mechanisms that automatically verify a solution to a software engineering task.

Qualifications

  • Several years of software engineering experience, including 2+ years of continuous full-time experience at a top-tier product company (e.g., Google, Amazon, Apple, Meta, Netflix, Microsoft, Datadog, Dropbox, Shopify, PayPal, IBM Research).
  • Strong expertise in building full-stack applications and deploying scalable, production-grade software using modern languages and tools.
  • Deep understanding of software architecture, design, development, debugging, and code quality/review assessment.
  • Excellent oral and written communication skills for clear, structured evaluation rationales.

Work Terms

  • Commitment: flexible engagement, minimum 10 hrs/week, up to 40 hrs/week (partial PST overlap required)
  • Type: Contractor (no medical/paid leave)
  • Duration: 1 month (starting next week; potential extensions based on performance and fit)
