ML Engineer - Model Training Infrastructure, ByteDance

Company	ByteDance
Job title	Machine Learning Engineer – Model Training Infrastructure
Job location	San Jose, CA, US
Type	Full Time

Responsibilities:

Responsible for the design and implementation of a global-scale machine learning system for feeds, ads, and search ranking models.
Responsible for improving usability and flexibility of the machine learning infrastructure.
Responsible for improving the workflow of model training and serving, data pipelines, storage systems, and resource management for multi-tenant machine learning systems.
Responsible for designing and developing key components of ML infrastructure and mentoring interns.

Qualifications:

At least 5 years of experience in developing and deploying large-scale systems.
Proficient in C/C++, CUDA, and Python, with solid programming skills.
Familiar with deep learning frameworks (TensorFlow/PyTorch).
Experience in improving core machine learning infrastructure (TensorFlow, PyTorch, and JAX).
Experience contributing to an open-source machine learning framework (TensorFlow/PyTorch).
Experience in using or designing open-source machine learning lifecycle management systems (e.g., TFX).