Join a groundbreaking project focused on developing a large-scale benchmark to evaluate the capabilities of advanced AI systems in solving complex scientific and engineering challenges. As a task designer, you will create original, graduate-level computational problems that assess whether AI can effectively utilize real scientific software for research tasks such as running simulations, interpreting results, designing experiments, and extracting insights from data.

Key Responsibilities

Design challenging problems that require the proficient use of specialized scientific software, ensuring they are suitable for testing AI capabilities.
Create problems that involve both straightforward computations and strategic planning, requiring the AI to uncover hidden information through a series of queries or experiments.
Engage in a testing loop with state-of-the-art AI models, refining problems to achieve the desired level of difficulty.
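
For illustration only, the sketch below shows one way a hidden-information task could be structured: an oracle that conceals a parameter behind a limited number of noisy queries, plus a validator that checks a submitted answer against a tolerance. The names NoisyOracle and validate, the exponential-decay model, the query budget, and the 5% tolerance are all hypothetical and are not taken from the project's actual framework.

```python
import math
import random


class NoisyOracle:
    """Hypothetical oracle hiding a decay rate behind a limited query budget."""

    def __init__(self, hidden_rate, noise_sd=0.01, budget=20, seed=0):
        self._rate = hidden_rate          # the value the solver must infer
        self._noise_sd = noise_sd         # standard deviation of measurement noise
        self.remaining = budget           # number of queries still allowed
        self._rng = random.Random(seed)   # seeded for reproducible tasks

    def query(self, t):
        """Return a noisy observation of exp(-rate * t), consuming one query."""
        if self.remaining <= 0:
            raise RuntimeError("query budget exhausted")
        self.remaining -= 1
        return math.exp(-self._rate * t) + self._rng.gauss(0.0, self._noise_sd)


def validate(answer, truth, rel_tol=0.05):
    """Accept an answer that is within a relative tolerance of the true value."""
    return math.isclose(answer, truth, rel_tol=rel_tol)
```

In a design like this, the difficulty comes from deciding where to spend the limited queries rather than from the arithmetic itself, which is the kind of strategic planning described above.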

Domains & Tools We're Hiring For

We are particularly interested in candidates with extensive, hands-on experience in:

Computational Bayesian Statistics and Applied Mathematics using libraries such as:

Bayesian statistics: PyMC, PyStan, PyJAGS, CmdStanPy
Applied mathematics and numerical PDEs: FEniCS, FEniCSx, DOLFINx, scikit-fem, FiPy, Devito, Dedalus
Computational topology: GUDHI
Differential algebra: DACEyPy

Experience with MCMC, Bayesian modeling, finite element or finite difference methods, mesh-based numerical modeling, computational topology, differential algebra, or other specialized Python-based math and statistics methods is valuable.
Familiarity with other specialized software in this domain will also be considered.

What Makes a Strong Candidate

The ideal candidate possesses graduate-level expertise (MS or PhD preferred) in the relevant domain, with practical experience using the specified tools. You should have a proven track record of writing code with these libraries to address real research problems, along with an understanding of their limitations and edge cases.

Strong candidates will also exhibit puzzle design thinking, crafting problems where the challenge arises from logical reasoning rather than mere computation, ensuring that multiple plausible approaches exist but only careful analysis leads to the correct solution.

Requirements

Graduate-level training in a relevant STEM field (MS, PhD, or equivalent research experience).
Proven proficiency with at least one of the listed scientific software libraries, demonstrated through research publications, open-source contributions, or professional work.
Strong Python programming skills for writing problem setups, oracle functions, and solution validators.
Ability to work independently and iteratively refine problem designs based on feedback.
Comfortable working in a Linux/terminal environment with remote compute sandboxes.
Availability for 15 to 20 hours per week.

Nice to Have

Experience across multiple listed domains or tools.
Familiarity with benchmark or evaluation design.
Background in scientific teaching or exam/problem-set design.
Experience with computational reproducibility and containerized environments.

Please note that this application includes a coding assessment as part of the evaluation process.

Computational Bayesian Statistics and Applied Mathematics Expert for AI Bench...
