HPC AI Technologist, Cambridge Computer

Company: Cambridge Computer
Job title: HPC AI Technologist
Job location: Waltham, MA, USA
Type: Full Time

Responsibilities:

  • Gather client requirements and design optimized solutions, sometimes using a single vendor’s portfolio and more often using a broad variety of vendors and technologies.
  • Deploy new HPC/AI solutions from the ground up or augment existing ones.
  • Consult on and assist with day-to-day management of clients’ research compute infrastructure environments.
  • Maintain HPC/AI infrastructure in Linux-based environments for new and existing clients.
  • Lead technical discussions and serve as the primary client-facing contact in preparation for and during engagements.
  • Validate that solution designs meet client requirements and are technically feasible and deployable.
  • Ensure solutions are simple and easy to understand while taking into account the client’s overall capabilities/skills.
  • Scope out and detail professional services deliverables, setting clear client expectations.
  • Build documentation and provide knowledge transfer required for clients to support their environments.
  • Display expertise in storage, networking, data protection, digital archiving, and other infrastructure technologies.
  • Gain advanced expertise and certifications from the vendors Cambridge uses in our solution stack.

Requirements & Skills:

  • Candidates must have at least 5 years of experience providing deployment services or cluster administration.
  • University undergraduate degree in Computer Science, Computer Engineering, or a science-related field required.
  • Candidates must also display a solid knowledge of GPU-focused hardware/software and Linux system administration (package management, IP networking, troubleshooting, etc.). They must also have solid fundamentals in cluster design/management technologies (Bright, Warewulf, xCAT, etc.), a background with storage technologies and parallel filesystems (Lustre, GPFS, BeeGFS, etc.), experience with networking and configuring network switches (Ethernet and InfiniBand), acquaintance with HPC schedulers (Slurm, UGE, LSF, etc.) and programming/libraries (MPI, CUDA, etc.), and proficiency with scripting (Bash, Python, etc.).
  • Have deep knowledge of tech industry leaders including AMD, DDN, Dell, HPE, IBM, Intel, Juniper, Lenovo, Microsoft, NVIDIA, Oracle, VAST, VMware, WEKA, and others.
  • As this is a field-based role, the employee must be able to work remotely, independently, and unsupervised. Travel is expected approximately 50% of the time, including short day trips.
  • Candidates must have impeccable communication skills, an ability to multitask, and high attention to detail. They must be effective problem solvers who are organized, creative, intellectually curious, comfortable with ambiguity, and able to work with different types of personalities.
  • Authorization to work in the United States on a full-time basis is required.
