This role focuses on creating a benchmark dataset for evaluating AI models on professional document understanding and instruction following within the Technology domain. You will author complex, multi-step tasks grounded in real-world workspace files, including technical specifications, architecture documents, API references, and codebases. Tasks may also involve web searches and code execution, and each task is paired with a clearly defined ground truth output and an objective evaluation rubric.

Your primary responsibility will be to author tasks that assess an AI's capability to reason through technical documentation, adhere to precise instructions, and generate accurate, well-structured outputs.

A minimum commitment of 15 to 20 hours per week is expected.

Key Responsibilities

Develop complex, multi-step tasks based on real-world technical documents.
Conduct web searches and execute code as part of task creation.
Define ground truth outputs and evaluation rubrics for AI assessment.
Author tasks that evaluate AI reasoning and instruction-following capabilities.

Qualifications

3+ years of hands-on experience in software engineering, data science, or analytics.
Strong understanding of technical documentation and AI evaluation methodologies.

Work Terms

Employment type is hourly, with a commitment of 15 to 20 hours per week.

Compensation

Hourly rate ranges from $90 to $110.

Eligibility

This position is remote and open to candidates with the required experience.

Software Engineer for AI Model Evaluation
