Join a groundbreaking project focused on developing a large-scale benchmark to evaluate the capabilities of advanced AI systems in solving complex scientific and engineering challenges. As an expert in astrophysics and cosmology, you will design intricate computational problems that assess whether AI can effectively utilize real scientific software for research-level tasks, including running simulations, interpreting results, and designing experiments.

Key Responsibilities

Your primary responsibilities will include:

Creating original, graduate-level problems based on authentic scientific workflows.
Testing these problems against cutting-edge AI models and refining them to achieve the desired level of difficulty.
Designing problems that require the AI to perform complex, multi-step workflows and plan strategic queries or experiments.
Engaging in a testing loop to ensure each problem meets the target difficulty.

Domains & Tools We're Hiring For

We are particularly interested in candidates with extensive hands-on experience in:

Astrophysics & Cosmology, utilizing astropy and related tools for cosmological calculations, angular power spectra, galaxy survey analysis, and observational data reduction pipelines.

What Makes a Strong Candidate

The ideal candidate will possess:

Graduate-level expertise (MS or PhD preferred) in astrophysics or cosmology, with practical experience using relevant tools.
A track record of writing code with scientific libraries to address real research problems, along with an understanding of their limitations and complexities.
A mindset akin to a puzzle designer, capable of crafting challenges that require smart reasoning and analysis rather than mere computation.

Requirements

Graduate-level training in a relevant STEM field (MS, PhD, or equivalent research experience).
Proven proficiency with at least one scientific software library, demonstrated through research publications, open-source contributions, or professional experience.
Strong Python programming skills for writing problem setups, oracle functions, and solution validators.
Ability to work independently and iterate on problem designs based on feedback.
Comfort operating in a Linux/terminal environment with remote compute sandboxes.
Availability for at least 15 to 20 hours per week.

Nice to Have

Experience across multiple domains or tools listed.
Familiarity with benchmark or evaluation design.
Background in scientific teaching or exam/problem-set design.
Experience with computational reproducibility and containerized environments.

Please note that this application includes a coding assessment as part of the evaluation process.

Astrophysicist for AI Model Evaluation
