SaidGig

Physics Researcher for AI Model Evaluation

$80–$140/hr

Remote · Contract · Science · Updated Jun 13, 2026

About this role

Role Overview

Join a team of expert physics researchers to create and validate golden reference solutions for CritPt, a research-level physics benchmark. The role involves solving complex physics problems end to end, auditing solutions from fellow experts, and selecting the best solutions, with the goal of producing fully human-verified reference data for evaluating large language models on advanced physics reasoning.

Physics Subdomains Covered
  • High Energy Physics & Mathematical Physics
  • Biophysics & Statistical Physics
  • Condensed Matter & AMO
  • Gravitation / Cosmology / Astrophysics
  • Quantum Information
  • Optical Properties of Materials
  • Magnetic Materials
  • Measurements in Quantum Mechanics
Key Responsibilities
  • Solve research-level physics challenges end-to-end with verifiable derivations, code, and peer-reviewed references.
  • Decompose challenges into standalone checkpoint sub-problems that require genuine physical reasoning.
  • Author Python answer templates with auto-grading functions for symbolic or numerical answers.
  • Audit submitted solutions for correctness, scope, and method soundness, providing actionable feedback across iterations.
  • Adjudicate between parallel solver attempts to determine which solution becomes the golden reference.
  • Document chain-of-thought reasoning, error tolerances, equivalent symbolic forms, and verification test cases.
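
As a rough illustration of the answer-template and tolerance items above, the sketch below shows one way an auto-grading function could look. It is not the benchmark's actual template format: the names `grade_numeric` and `grade_expression`, the default tolerances, and the sampling range are all hypothetical. The expression check is a numerical spot-check using only the standard library, so it can reject a wrong form but cannot prove equivalence; an exact symbolic comparison (for example with SymPy) would be a stronger test.

```python
import math
import random


def grade_numeric(submitted, reference, rel_tol=1e-6, abs_tol=0.0):
    """Return True if a numerical answer matches the reference within tolerance.

    The tolerance defaults are illustrative assumptions, not benchmark values.
    """
    return math.isclose(submitted, reference, rel_tol=rel_tol, abs_tol=abs_tol)


def grade_expression(submitted, reference, n_samples=50, rel_tol=1e-9, seed=0):
    """Spot-check that two single-variable expressions agree at random points.

    Both arguments are callables of one float. The sampling interval
    [0.1, 3.0] is an arbitrary choice for this sketch.
    """
    rng = random.Random(seed)  # fixed seed keeps grading reproducible
    for _ in range(n_samples):
        x = rng.uniform(0.1, 3.0)
        if not math.isclose(submitted(x), reference(x),
                            rel_tol=rel_tol, abs_tol=1e-12):
            return False
    return True


# Hypothetical verification cases: an equivalent form should pass,
# a different expression should fail.
def reference_form(x):
    return math.sin(x) ** 2


def equivalent_form(x):
    return (1.0 - math.cos(2.0 * x)) / 2.0


def wrong_form(x):
    return math.sin(2.0 * x) / 2.0


numeric_ok = grade_numeric(9.8100001, 9.81)
numeric_bad = grade_numeric(9.9, 9.81)
expr_ok = grade_expression(equivalent_form, reference_form)
expr_bad = grade_expression(wrong_form, reference_form)
```

Keeping the accepted and rejected cases next to the grader is one way to document the intended tolerance and the equivalent forms that should be accepted.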
Ideal Qualifications
  • Solver: PhD or postdoc in the relevant subfield (senior PhD student minimum).
  • Auditor: Postdoc or junior professor in the relevant subfield (PhD minimum).
  • Adjudicator: Full professor or industry research PI in the relevant subfield (senior postdoc or junior professor minimum).
  • Hands-on familiarity with at least two canonical methods of the target subfield, demonstrated through publications (broader coverage strongly preferred).
  • 3–5 representative publications (arXiv ID or DOI), ideally within the last ~5 years and in the target subfield.
  • Working proficiency with LaTeX, Python, Jupyter, and SymPy.
  • Strong written English skills (B2/C1/C2 minimum; native or near-native preferred).
More About the Opportunity
  • Expected commitment: ~10 hours/week, sustained across an 8–10 week window per task pool.
  • Pay range: $80–$140 per hour, based on role and demonstrated expertise.
  • Asynchronous work environment.
