Work closely with backend and ML engineering teams to design, deploy, and maintain reliable, high-performance, and secure cloud infrastructure for our AI engine and Studio.
Facilitate a “you build it, you run it” culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services.
Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment.
Identify and implement opportunities to enhance engineering speed and efficiency.
Conduct root cause analysis of critical issues and develop automated solutions to prevent recurrence.
Develop and share best practices to improve automation and efficiency across our engineering teams.
Requirements & Skills:
7 years of experience in software engineering.
5 years of experience with infrastructure-as-code.
Proficiency in managing Kubernetes clusters and applications, including creating Kustomize manifests/Helm charts for new applications.
Experience in creating and maintaining CI/CD pipelines for both application and infrastructure deployments (using tools such as Terraform/Terragrunt, ArgoCD, GitHub Actions, and Ansible).
Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud).
Proficiency in at least one backend programming or scripting language, such as Golang, Python, or Bash.