Physics Researcher for AI Model Evaluation
$80–$140/hr
Remote | Contract | Science | Updated Jun 13, 2026
About this role
Role Overview
Join a team of expert physics researchers to create and validate golden reference solutions for the CritPt benchmark, a cutting-edge research-level physics benchmark. This role involves solving complex physics problems from start to finish, auditing solutions from fellow experts, and determining the best solutions to produce fully human-verified reference data for evaluating large language models in advanced physics reasoning.
Physics Subdomains Covered
- High Energy Physics & Mathematical Physics
- Biophysics & Statistical Physics
- Condensed Matter & AMO
- Gravitation / Cosmology / Astrophysics
- Quantum Information
- Optical Properties of Materials
- Magnetic Materials
- Measurements in Quantum Mechanics
Key Responsibilities
- Solve research-level physics challenges end-to-end with verifiable derivations, code, and peer-reviewed references.
- Decompose challenges into standalone checkpoint sub-problems that require genuine physical reasoning.
- Author Python answer templates with auto-grading functions for symbolic or numerical answers.
- Audit submitted solutions for correctness, scope, and method soundness, providing actionable feedback across iterations.
- Adjudicate between parallel solver attempts to determine which solution becomes the golden reference.
- Document chain-of-thought reasoning, error tolerances, equivalent symbolic forms, and verification test cases.
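As an illustration of the answer-template and auto-grading work described above, here is a minimal sketch in plain Python. It is not the CritPt template format: the function names (`answer`, `grade_numeric`, `grade_expression`), the tolerances, and the sample problem are all hypothetical, chosen only to show how an error tolerance and an equivalent symbolic form might be checked. Equivalence is spot-checked numerically at random points here; a SymPy-based symbolic comparison would be a natural alternative.

```python
import math
import random


def answer():
    """Hypothetical answer template: the solver returns the final numerical value."""
    # Assumed sample problem: ground-state energy of a 1D harmonic
    # oscillator, in units of hbar * omega.
    return 0.5


def grade_numeric(submitted, reference, rel_tol=1e-6, abs_tol=1e-12):
    """Accept a numerical answer that matches the reference within tolerance."""
    return math.isclose(submitted, reference, rel_tol=rel_tol, abs_tol=abs_tol)


def grade_expression(submitted, reference, n_samples=20, rel_tol=1e-9, seed=0):
    """Spot-check that two one-variable expressions agree at random points.

    Both arguments are callables of a single float. Agreement at the
    sampled points suggests, but does not prove, that the forms are equivalent.
    """
    rng = random.Random(seed)  # fixed seed keeps grading reproducible
    for _ in range(n_samples):
        x = rng.uniform(0.1, 3.0)  # assumed sampling interval
        if not math.isclose(submitted(x), reference(x),
                            rel_tol=rel_tol, abs_tol=1e-12):
            return False
    return True


if __name__ == "__main__":
    # Verification test cases: a correct value, and two equivalent forms
    # of the same expression, sin^2(x) and (1 - cos(2x)) / 2.
    print(grade_numeric(answer(), 0.5))
    print(grade_expression(lambda x: math.sin(x) ** 2,
                           lambda x: (1 - math.cos(2 * x)) / 2))
```

A fixed random seed is used so that the same submission always receives the same grade, which makes disagreements between solver and auditor easier to reproduce.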
Qualifications
- Solver: PhD or postdoc in the relevant subfield (senior PhD student minimum).
- Auditor: Postdoc or junior professor in the relevant subfield (PhD minimum).
- Adjudicator: Full professor or industry research PI in the relevant subfield (senior postdoc or junior professor minimum).
- Hands-on familiarity with at least two canonical methods of the target subfield, demonstrated through publications (broader coverage strongly preferred).
- 3–5 representative publications (arXiv ID or DOI), ideally within the last ~5 years and in the target subfield.
- Working proficiency with LaTeX, Python, Jupyter, and SymPy.
- Strong written English skills (B2/C1/C2 minimum; native or near-native preferred).
Commitment and Compensation
- Expected commitment: ~10 hours/week, sustained across an 8–10 week window per task pool.
- Pay range: $80–$140 per hour, based on role and demonstrated expertise.
- Asynchronous work environment.