| Company | AI21 Labs |
|---|---|
| Job title | DevOps Engineer |
| Job location | Tel Aviv-Yafo, Israel |
| Type | Full Time |
Responsibilities:
- Cloud & Kubernetes Expertise: Design and implement highly scalable multi-cluster Kubernetes environments across GCP & AWS.
- Developer Experience & Enablement: Lead the development of self-service tools and automation that improve efficiency for R&D teams.
- Incident & Reliability Engineering: Work with engineering teams to optimize cost, performance, and reliability of production infrastructure through monitoring, capacity planning, and scaling strategies.
- Security & Governance: Contribute to best practices for RBAC, IAM, cloud security, and compliance while ensuring infrastructure reliability.
- Automation & Infrastructure as Code: Drive adoption of GitOps workflows and Infrastructure as Code (Terraform, Helm, Crossplane) to enhance automation and consistency.
- Mentorship & Team Growth: Provide technical mentorship within the platform engineering team and contribute to knowledge-sharing across R&D.
- Cross-Team Collaboration: Work closely with engineering teams to align cloud infrastructure goals with business needs and reliability requirements.
Requirements & Skills:
- 5+ years of DevOps or SRE experience
- 3+ years working with public cloud platforms (AWS, GCP) at scale
- Deep Kubernetes expertise, including managing large-scale, multi-cluster, enterprise-grade environments
- Experience designing and managing Custom Resource Definitions (CRDs) and custom controllers
- Strong background in Infrastructure as Code (Terraform, Helm) and GitOps principles and tooling (ArgoCD, Crossplane, FluxCD, etc.)
- Hands-on experience in observability & monitoring (Prometheus, Grafana, Datadog, OpenTelemetry, etc.)
- Proficiency in scripting and programming (Python, Go, Bash) for infrastructure automation
- Expertise in cloud networking (VPC, load balancers, service meshes) and security best practices (RBAC, IAM, security groups, network policies, etc.)
- Experience building CI/CD pipelines and optimizing them for performance, security, and developer velocity
Nice-to-Have:
- Experience with self-hosted on-prem deployments and managed private VPC deployments (Bring Your Own Cloud models)
- Advanced expertise in Helm and Crossplane for Kubernetes resource management
- Experience with other cloud providers
- Experience in GenAI or large-scale SaaS platforms
- Familiarity with SQL/NoSQL databases and distributed systems
- DevSecOps experience, with a strong understanding of security automation and compliance frameworks