Machine Learning Engineer – Model Training Infrastructure
Job location: San Jose, CA, US
Type: Full Time
Responsibilities:
Responsible for the design and implementation of a global-scale machine learning system for feeds, ads, and search ranking models.
Responsible for improving usability and flexibility of the machine learning infrastructure.
Responsible for improving the workflows for model training and serving, data pipelines, storage systems, and resource management in multi-tenant machine learning systems.
Responsible for designing and developing key components of ML infrastructure and mentoring interns.
Requirements & Skills:
At least 5 years of experience in developing and deploying large-scale systems.
Proficient in C/C++, CUDA, and Python, with solid programming skills.
Familiar with deep learning frameworks (TensorFlow/PyTorch).
Experience in improving core machine learning infrastructure (TensorFlow, PyTorch, and JAX).
Experience contributing to an open-source machine learning framework (TensorFlow/PyTorch).
Experience in using or designing open-source machine learning lifecycle management systems such as TFX.