Be part of a team building scalable infrastructure to train, evaluate, deploy, run inference on, and monitor our ML models
Build, deploy, and maintain generative AI services & applications
Create data systems to collect, clean, label, and store data used for model features
Deploy and manage various applications in our Kubernetes clusters
Collaborate with Machine Learning engineers to build & support state-of-the-art experimentation platforms, training frameworks, and associated tools
Work with stakeholders on requirements and solutions for ML infrastructure
And of course, you will be coding every day!
Requirements & Skills:
5+ years of industry experience building production-level ML platforms and infrastructure, including experience building ML systems/pipelines from the ground up.
Ability to write high-quality code in Python, Java, or Scala
Experience building production-ready RESTful APIs and scaling production platforms to a large number of users.
A desire to own large parts of an ML Platform, with a strong understanding of ML models & principles.
Experience working with containers and deploying applications to Kubernetes
Experience with LLMs and building infrastructure to support LLM applications
Experience with relational and low-latency databases
Experience with transforming data in both batch and streaming contexts
A desire to learn new technologies quickly and a proven track record of making quality vs. deadline tradeoffs in fast-paced environments.
Ability to scope out a large project and manage it through project delivery
Strong communication skills and ability to generate consensus and buy-in within the team
Organizational skills and the ability to simplify complex problems and prioritize what matters most for the sake of the team and the business
Experience working with highly sensitive data in a healthcare environment
Experience working with ML frameworks such as PyTorch, scikit-learn, and XGBoost
Experience working with MLOps tools such as MLflow, Kubeflow, and AWS SageMaker
Experience building solutions on cloud infrastructure, particularly AWS