SaidGig

Site Reliability Engineer for AI Systems

$40–$70/hr

Remote · Contract · Technology · Updated Jun 9, 2026

About this role

Role Overview

This position focuses on improving the reliability and performance of next-generation AI systems. You will apply your infrastructure expertise to shape how AI models learn and operate, drawing on real-world operational insight to improve how they function. No prior AI experience is necessary; your domain knowledge is the key asset.

Key Responsibilities
  • Lead the deployment, monitoring, and recovery of complex, containerized AI training environments, working primarily from the terminal.
  • Proactively identify, diagnose, and resolve infrastructure bottlenecks and failures in long-running processes.
  • Orchestrate resilient system builds and infrastructure management, ensuring stability and optimal resource utilization.
  • Collaborate closely with engineering teams to refine CI/CD pipelines and automate routine operational tasks.
  • Manage and optimize filesystem structures, networked storage, and process scheduling in Dockerized sandboxes.
  • Replan rapidly mid-execution when error states or unforeseen runtime issues arise.
  • Document best practices and emergent solutions, and contribute to knowledge transfer across the team.
Qualifications
  • Demonstrated expert proficiency with terminal-based problem solving and complex system administration.
  • Mastery of dynamic infrastructure recovery and of managing long-running operational processes.
  • Deep expertise in containerized environments (e.g., Docker, Kubernetes) and sandbox orchestration.
  • Strong Python skills, with the ability to script, automate, and debug real-world production systems.
  • Proficiency in Bash and familiarity with JavaScript/TypeScript, Go, Rust, and C/C++.
  • Experience with build systems, package managers, databases, version control, and cryptography tools.
  • Adept at troubleshooting, documenting, and replanning in high-velocity technical environments.
Preferred Qualifications
  • Background in machine learning operations or AI infrastructure.
  • Familiarity with ML frameworks and distributed computing.
  • Experience supporting multi-phase, high-intensity engineering projects.
Work Terms

Contract position with remote work flexibility.

Compensation

Hourly rate ranging from $40 to $70.

Eligibility

Open to candidates with relevant experience and skills.
