Design, develop, and optimize high-performance numerical and data-manipulation kernels and operators for AI workloads on GPUs.
Achieve state-of-the-art performance by leveraging software and micro-architectural features of GPUs.
Work with compiler, framework, runtime, and serving teams to deliver end-to-end performance that fully utilizes GPU workstations and servers.
Collaborate with machine learning researchers to steer system development toward future ML trends.
Requirements & Skills:
Deep understanding of computer architecture (memory hierarchies, caching, etc.) and its impact on algorithm design.
3+ years of relevant experience working on complex code and software systems.
Self-motivated and independent, with the ability to execute on agreed-upon specifications.
Experience with GPU programming languages such as CUDA or OpenCL.
Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture.