ML Engineer – Model Training Infrastructure, ByteDance

Company: ByteDance
Job title: Machine Learning Engineer – Model Training Infrastructure
Job location: San Jose, CA, US
Type: Full Time

Responsibilities:

  • Responsible for the design and implementation of a global-scale machine learning system for feeds, ads, and search ranking models.
  • Responsible for improving usability and flexibility of the machine learning infrastructure.
  • Responsible for improving the workflows for model training and serving, data pipelines, storage systems, and resource management in multi-tenant machine learning systems.
  • Responsible for designing and developing key components of ML infrastructure and mentoring interns.

Requirements & Skills:

  • At least 5 years of experience in developing and deploying large-scale systems.
  • Proficient in C/C++/CUDA/Python, with solid programming skills.
  • Familiar with deep learning frameworks (TensorFlow/PyTorch).
  • Experience improving core machine learning infrastructure (TensorFlow, PyTorch, and JAX).
  • Experience contributing to an open-source machine learning framework (TensorFlow/PyTorch).
  • Experience using or designing open-source machine learning lifecycle management systems such as TFX.
