
Company: Scale
Job title: AI Infrastructure Engineer
Job location: San Francisco, CA; New York, NY
Type: Full Time
Responsibilities:
- Build highly available, observable, performant, and cost-effective APIs for model training.
- Participate in our team’s on-call process to ensure the availability of our services.
- Own projects end-to-end, from requirements, scoping, and design, to implementation, in a highly collaborative and cross-functional environment.
- Exercise good judgment when building systems and tools, including knowing when to make build-vs.-buy tradeoffs, with an eye toward cost efficiency.
Requirements & Skills:
- 4+ years of experience building machine learning training pipelines or inference services in a production setting.
- Experience with distributed training frameworks and techniques such as DeepSpeed and FSDP.
- Experience building, deploying, and monitoring complex microservice architectures.
- Experience with Python, Docker, Kubernetes, and infrastructure as code (e.g., Terraform).
