Computational Bayesian Statistics and Applied Mathematics Expert for AI Bench...
$70–$100/hr
About this role
Join a groundbreaking project focused on developing a large-scale benchmark to evaluate the capabilities of advanced AI systems in solving complex scientific and engineering challenges. As a task designer, you will create original, graduate-level computational problems that assess whether AI can effectively utilize real scientific software for research tasks such as running simulations, interpreting results, designing experiments, and extracting insights from data.
Key Responsibilities
- Design challenging problems that require the proficient use of specialized scientific software, ensuring they are suitable for testing AI capabilities.
- Create problems that involve both straightforward computations and strategic planning, requiring the AI to uncover hidden information through a series of queries or experiments.
- Engage in a testing loop with state-of-the-art AI models, refining problems to achieve the desired level of difficulty.
We are particularly interested in candidates with extensive, hands-on experience in:
- Computational Bayesian Statistics and Applied Mathematics using libraries such as:
  - Bayesian statistics: PyMC, PyStan, PyJAGS, CmdStanPy
  - Applied mathematics and numerical PDEs: FEniCS, FEniCSx, DOLFINx, scikit-fem, FiPy, Devito, Dedalus
  - Computational topology: GUDHI
  - Differential algebra: DACEyPy
- Experience with MCMC, Bayesian modeling, finite element or finite difference methods, mesh-based numerical modeling, computational topology, differential algebra, or other specialized Python-based math and statistics methods is valuable.
- Familiarity with other specialized software in this domain will also be considered.
The ideal candidate possesses graduate-level expertise (MS or PhD preferred) in the relevant domain, with practical experience using the specified tools. You should have a proven track record of writing code with these libraries to address real research problems, along with an understanding of their limitations and edge cases.
Strong candidates will also exhibit puzzle design thinking, crafting problems where the challenge arises from logical reasoning rather than mere computation, ensuring that multiple plausible approaches exist but only careful analysis leads to the correct solution.
Requirements
- Graduate-level training in a relevant STEM field (MS, PhD, or equivalent research experience).
- Proven proficiency with at least one of the listed scientific software libraries, demonstrated through research publications, open-source contributions, or professional work.
- Strong Python programming skills for writing problem setups, oracle functions, and solution validators.
- Ability to work independently and iteratively refine problem designs based on feedback.
- Comfort working in a Linux/terminal environment with remote compute sandboxes.
- Availability for at least 15–20 hours per week.
Preferred Qualifications
- Experience across multiple listed domains or tools.
- Familiarity with benchmark or evaluation design.
- Background in scientific teaching or exam/problem-set design.
- Experience with computational reproducibility and containerized environments.
Please note that this application includes a coding assessment as part of the evaluation process.