AI Infrastructure Engineer, Scale

Company: Scale
Job title: AI Infrastructure Engineer
Job location: San Francisco, CA; New York, NY
Type: Full Time

Responsibilities:

  • Build highly available, observable, performant, and cost-effective APIs for model training.
  • Participate in our team’s on-call process to ensure the availability of our services.
  • Own projects end-to-end, from requirements, scoping, and design, to implementation, in a highly collaborative and cross-functional environment.
  • Exercise good taste in building systems and tools and know when to make build vs. buy tradeoffs, with an eye for cost efficiency.

Requirements & Skills:

  • 4+ years of experience building machine learning training pipelines or inference services in a production setting.
  • Experience with distributed training frameworks and techniques such as DeepSpeed and FSDP.
  • Experience building, deploying, and monitoring complex microservice architectures.
  • Experience with Python, Docker, Kubernetes, and infrastructure as code (e.g., Terraform).
