Systems & Infrastructure Specialist for AI Model Training
$40–$70/hr
Remote · Contract · Technology · Updated Jun 9, 2026
About this role
As a Systems & Infrastructure Specialist, you will apply your technical expertise to help shape the future of AI systems. By providing high-quality, real-world input, you will help train next-generation AI models and improve their learning, reasoning, and performance. Your domain knowledge is what matters most; no prior experience in AI is required.
Key Responsibilities:
- Navigate, troubleshoot, and recover dynamic infrastructure and long-running processes in real time using command-line tools.
- Manage highly containerized environments, including orchestrating Dockerized sandboxes and CI/CD workflows.
- Build, maintain, and optimize systems for AI model training and high-throughput compute environments.
- Respond swiftly to system errors, replanning and recovering mid-operation as conditions change.
- Collaborate with engineering and AI teams to ensure seamless integration, reliability, and performance.
- Document system architectures, incident responses, and recovery protocols with meticulous clarity.
- Contribute expertise to evolving project needs, adapting to new technologies and scaling strategies as required.
Required Skills and Qualifications:
- Demonstrated expert proficiency working in terminal environments for system builds, server administration, and infrastructure management.
- Advanced problem-solving skills for multi-step troubleshooting, filesystem navigation, and process management within containerized settings.
- Hands-on experience with Python, Bash, JavaScript/TypeScript, Go, Rust, and/or C/C++.
- Deep familiarity with build systems, package managers, databases, web servers, ML frameworks, version control, and cryptography tools.
- Proven ability to execute dynamic infrastructure recovery and optimize long-running processes under pressure.
- Strong written and verbal communication skills, with a passion for precise technical documentation.
- Systems multilingualism: versatility across operating systems, languages, and emerging DevOps tools.
Preferred Qualifications:
- Prior experience in high-compute environments for AI/ML workloads.
- Background in Site Reliability Engineering or DevOps roles focused on mission-critical infrastructure.
- Familiarity with advanced container orchestration and distributed system design.
Contract position with remote work flexibility.
Compensation: Hourly rate ranging from $40 to $70.
Eligibility: Open to candidates with the required skills and qualifications.