Research Scientist Intern – ML, ByteDance

Company: ByteDance
Job title: Research Scientist Intern – Machine Learning System
Job location: San Jose, California, US
Type: Full Time

Requirements & Skills:

  • Currently enrolled in a PhD program, with an understanding of distributed and parallel computing principles and knowledge of recent advances in computing, storage, networking, and hardware technologies.
  • Familiar with machine learning algorithms, platforms, and frameworks such as PyTorch and JAX.
  • Have a basic understanding of how GPUs and/or ASICs work.
  • Expert in at least one or two programming languages in a Linux environment: C/C++, CUDA, Python.
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.
  • GPU-based high-performance computing and RDMA high-performance networking (MPI, NCCL, ibverbs).
  • Distributed training framework optimizations such as DeepSpeed, FSDP, Megatron, and GSPMD.
  • AI compiler stacks such as torch.fx, XLA, and MLIR.
  • Large-scale data processing and parallel computing.
  • Experience designing and operating large-scale systems in cloud computing or machine learning.
  • In-depth experience with CUDA programming and performance tuning (CUTLASS, Triton).
