About this role
This role focuses on creating a benchmark dataset for evaluating how well AI models understand professional documents and follow instructions in the Technology domain. You will build tasks around complex, multi-step requests grounded in real-world workspace files, including technical specifications, architecture documents, API references, and codebases. Tasks may also require web searches and code execution, and each task is paired with a clearly defined ground-truth output and an objective evaluation rubric.
Your primary responsibility will be to author tasks that assess an AI's ability to reason through technical documentation, follow precise instructions, and generate accurate, well-structured outputs.
A minimum commitment of 15 to 20 hours per week is expected.
Key Responsibilities
- Develop complex, multi-step tasks based on real-world technical documents.
- Conduct web searches and execute code as part of task creation.
- Define ground truth outputs and evaluation rubrics for AI assessment.
- Author tasks that evaluate AI reasoning and instruction-following capabilities.
Qualifications
- 3+ years of hands-on experience in software engineering, data science, or analytics.
- Strong understanding of technical documentation and AI evaluation methodologies.
Employment is hourly, with a commitment of 15 to 20 hours per week.
Compensation
The hourly rate ranges from $90 to $110.
Eligibility
This position is remote and open to candidates with the required experience.