Participate directly in every part of the development life cycle of ML models: conception, implementation, automated testing, deployment, monitoring, etc.
Investigate, analyze, and improve the performance of our models and systems – including meeting critical SLOs for training models at scale and low-latency inference.
Maintain and improve several of our most important client-facing product features.
Drive the adoption of ML platform and observability resources and guidelines to improve operational efficiency and service reliability.
Engage with your community of peers to challenge the status quo, improve our shared ways of working, and influence overall architecture decisions.
Learn, use, and evolve our data and tech stack, which includes Python, AWS, Kubernetes, PyTorch, Terraform, Snowflake, Honeycomb, and others.
Requirements & Skills:
You have 5+ years of Machine Learning industry experience.
You have operationalized, instrumented, and supported AI models in production at a non-trivial scale.
You are fluent in good data and software engineering practices, and you are able to develop the tools and culture that enable your team to deliver reliable production code efficiently.
You enjoy collaborating with scientists on a daily basis to understand their pain points and figure out how to improve their tools and increase their efficiency. You also have experience working in cross-functional teams.