Staff Deep Learning Systems Engineer (CUDA Specialist)
Nov 09, 2023
Santa Clara, CA
XPeng Motors is one of China’s leading smart electric vehicle (EV) companies. We design, develop, and manufacture smart EVs that are seamlessly integrated with advanced Internet, AI and autonomous driving technologies. We are committed to in-house R&D and intelligent manufacturing to create a better mobility experience for our customers. We strive to transform smart electric vehicles with technology and data, shaping the mobility experience of the future.
We are seeking a highly skilled and motivated Deep Learning Systems Engineer with a strong background in CUDA programming to join our team. The successful candidate will be responsible for implementing and optimizing our large distributed system, a globally distributed scheduling service for efficient and reliable execution of deep learning workloads.
Job Responsibilities:
Implement and optimize a large distributed system, focusing on CUDA programming and GPU optimization.
Develop custom servers for CUDA kernel launches and manage JIT kernels.
Implement and manage a hardware abstraction layer for device-specific APIs.
Optimize GPU calls and manage memory allocation APIs.
Handle device synchronization APIs and ensure correct and efficient time-slicing and replica splicing.
Collaborate with the team to design and implement new features and improvements.
Troubleshoot and resolve issues related to CUDA programming and GPU optimization.
Minimum Skill Requirements:
Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
Proven experience in CUDA programming and GPU optimization.
Strong knowledge of deep learning workloads and distributed systems.
Experience with NVIDIA GPUs and related tooling, such as the cuobjdump binary utility and the NVRTC runtime compilation library (e.g., nvrtcCompileProgram).
Familiarity with memory allocation and device synchronization APIs.
Strong problem-solving skills and ability to troubleshoot complex software issues.
Excellent communication and teamwork skills.
Preferred Skill Requirements:
Experience with a large distributed system or similar distributed scheduling services.
Knowledge of deep learning frameworks like PyTorch or TensorFlow.
Experience with NCCL or similar collective communication libraries.
An introduction to our system: https://www.techgoing.com/xpeng-built-the-largest-self-driving-intelligent-computing-center-in-china-fuyao-with-six-times-more-computing-power-than-chinas-first-supercomputing/
To apply, please submit your resume detailing your experience with CUDA programming and deep learning systems. We look forward to reviewing your application.
The base salary range for this full-time position is $180,000 - $300,000, in addition to bonus, equity and benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.
We are an Equal Opportunity Employer. It is our policy to provide equal employment opportunities to all qualified persons without regard to race, age, color, sex, sexual orientation, religion, national origin, disability, veteran status, marital status, or any other prescribed category set forth in federal or state regulations.