Senior Machine Learning Engineer, HCA Healthcare

Company HCA Healthcare
Job title Senior Machine Learning Engineer
Job location Nashville, TN, United States
Type Full Time

Responsibilities:

  • Tool Development and Management: Build, manage, and maintain tools for system reliability, including dashboards, logging systems, and pager systems.
  • Infrastructure Maintenance: Help maintain and enhance our CI/CD pipelines, logging infrastructure, and other operational systems crucial for MLOps.
  • Monorepo Management: Keep the monorepo up to date with the latest dependency and security updates to ensure a secure and efficient development environment.
  • Vendor Collaboration: Assist in implementing and maintaining infrastructure and systems managed by external vendor teams.
  • Incident Management: Lead and participate in incident management processes, including troubleshooting, root cause analysis, and implementing corrective measures to prevent future occurrences.

Requirements & Skills:

  • AI/ML Knowledge: Solid understanding of AI/ML principles and technologies.
  • System Monitoring and Tools: Experience with system monitoring and observability tools. Knowledge of GCP, Vertex AI, or other cloud platforms is highly beneficial.
  • Programming and Scripting: Proficiency in programming languages such as Python, and in scripting for automation.
  • Problem-Solving Skills: Strong analytical and problem-solving skills, with the ability to work under pressure.
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
  • 5+ years of experience in the technology field.
  • Proven experience in a reliability engineering role, preferably with a focus on AI/ML systems.
  • Experience in incident management and performance optimization.
  • Excellent communication and teamwork skills.
