AI HPC Systems Performance Jobs in Santa Clara, CA

76 jobs (page 1)

AI / HPC Systems…

Meta (Menlo Park, CA)

…fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expects a loss-less fabric interconnect. To improve performance of these systems we constantly look… more

Meta (06/18/25)
Production Systems Engineer, Sustaining

Meta (Menlo Park, CA)

…hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in… more

Meta (06/25/25)
Senior AI - HPC Cluster Engineer

NVIDIA (Santa Clara, CA)

…to work effectively with diverse teams and individuals. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Passion for ... GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek a...storage systems like Lustre and GPFS for AI / HPC workloads + Familiarity with deep learning… more

NVIDIA (07/12/25)
Senior AI - HPC Storage Engineer

NVIDIA (Santa Clara, CA)

…designing and operating large scale storage infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Experience ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...solutions to enable runs of demanding deep learning, high performance computing, and computationally intensive workloads. We seek an… more

NVIDIA (05/07/25)
Senior Observability Architect, AI…

NVIDIA (Santa Clara, CA)

…looking for a technical leader to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and ... and visualization to spectacularly improve efficiency, performance, and productivity of AI and HPC workloads. You will lead technical teams to develop,… more

NVIDIA (05/15/25)
Senior HPC and AI Networking…

NVIDIA (Santa Clara, CA)

…fit for you, we'd love to hear from you! NVIDIA is seeking a Senior High Performance Computing (HPC) and AI Networking Performance Research and Analysis ... In this exciting role, you will profile and analyze AI workloads on large GPUs and CPUs scale clusters...and platforms, such as HCAs, Switches, CPUs, GPUs, and Systems. You will develop performance analysis tools… more

NVIDIA (07/11/25)
Senior Solution Architect, HPC…

NVIDIA (Santa Clara, CA)

…Be Doing: + Primary responsibilities will include building and enabling robust AI / HPC infrastructure for customers + Support operational and reliability aspects ... of large-scale AI clusters, focusing on performance at scale,...in working with customers + Expertise with parallel file systems (eg Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects… more

NVIDIA (06/18/25)
Sr. Worldwide Specialist Solutions Architect,…

Amazon (Santa Clara, CA)

…computing and its potential to overcome some of the biggest challenges in High Performance Computing (HPC)? Do you have a unique combination of deep technical ... C++, Python, CUDA, Bash - Deep GPU knowledge in HPC and/or AI/ML frameworks. Preferred Qualifications -...life sciences or related discipline. - Working knowledge of HPC schedulers and distributed/parallel file systems, underlying… more

Amazon (06/12/25)
Senior Software Architect - Deep Learning…

NVIDIA (Santa Clara, CA)

…vision? What you will be doing: + Investigate opportunities to improve communication performance by identifying bottlenecks in today's systems. + Design and ... implement new communication technologies to accelerate AI and HPC workloads. + Explore innovative solutions in HW and SW for our next generation platforms as… more

NVIDIA (05/05/25)
Senior HPC Engineer, Infrastructure…

NVIDIA (Santa Clara, CA)

…and to power data centers. Join the team building many of the largest and fastest AI / HPC systems in the world! NVIDIA is looking for someone with the ... and internal teams to analyze, define, and implement large-scale AI / HPC projects. These efforts include a combination...they begin rolling out some of the most sophisticated systems in the world! + Provide feedback to internal… more

NVIDIA (06/12/25)
Sr. Software Development Engineer, HPC/ML…

Amazon (Cupertino, CA)

Description We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations - the fundamental ... operations that enable AI to scale across multiple accelerators & servers. Most...building networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS. We… more

Amazon (05/14/25)
Senior Software Engineer - HPC

NVIDIA (Santa Clara, CA)

…long term maintenance strategy. What you'll be doing: + Design highly available and scalable systems to meet the demands of our HPC clusters + Evaluate new and ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a "learning… more

NVIDIA (05/28/25)
Senior HPC Architect, Networking

NVIDIA (Santa Clara, CA)

…improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialist to architect, develop ... parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is...looking for an outstanding hands-on architect/engineer for a Senior HPC architect role to support deployment and bringup of… more

NVIDIA (07/09/25)
Senior Site Reliability Engineer, HPC…

NVIDIA (Santa Clara, CA)

…experience. Ways to stand out from the crowd: + Experience analyzing and tuning performance for a variety of HPC or EDA workloads. + Solid understanding ... NVIDIA is the leader in AI, machine learning and datacenter acceleration. NVIDIA is...and operate these clusters at high reliability, efficiency, and performance and drive foundational improvements and automation to improve… more

NVIDIA (07/03/25)
HPC Middleware Developer

NVIDIA (Santa Clara, CA)

…Networking Protocols InfiniBand, Ethernet + Knowledge in computer architecture and operating systems + Experience in performance optimizations + MSc or ... We are now looking for a senior HPC software engineer. As a member of our High Performance Computing Software development team, you will be responsible for… more

NVIDIA (06/30/25)
Research Scientist, AI & Systems…

Meta (Menlo Park, CA)

…on existing accelerator systems and guiding the future of models and AI HW at Meta. This drives improved performance, new model architectures and ... the following areas: Accelerators/GPU architectures, High Performance Computing (HPC), Machine Learning Compilers, Training/Inference ML Systems, Model… more

Meta (04/14/25)
Senior Site Reliability Engineer - AI…

NVIDIA (Santa Clara, CA)

…infrastructure. + Passion for solving complex technical challenges and optimizing system performance. + Experience with AI / HPC advanced job schedulers, and ... support operational and reliability aspects of large scale distributed systems with focus on performance at scale,...storage systems like Lustre and GPFS for AI / HPC workloads. + Familiarity with deep learning… more

NVIDIA (06/25/25)
Software Manager, AI Infrastructure System

NVIDIA (Santa Clara, CA)

…maintain infrastructure and large-scale applications for LLM-based solutions. Optimize these systems for performance, scalability, reliability, and secure data ... Strong technical background in cloud/distributed infrastructure + Experience debugging functional and performance issues in HPC GPU clusters + Background in… more

NVIDIA (07/01/25)
Software Engineering Manager - AI…

Meta (Menlo Park, CA)

…Qualifications: 7. Experience in leading teams working on high performance computing (HPC) and AI/ML systems, including: 8. Communication libraries (eg, ... of Meta AI infrastructure! **Required Skills:** Software Engineering Manager - AI Systems Co-Design Responsibilities: 1. Lead and support the communications… more

Meta (07/02/25)
Distinguished Engineer, AI Resiliency

NVIDIA (Santa Clara, CA)

…or AI, ideally across the entire lifecycle-from design to deployment-of large-scale High-Performance Computing (HPC) systems. Ways to Stand Out from the ... experience in software architecture or related fields, with a deep understanding of AI-optimized systems. + Excellent and proven ability to collaborate and… more

NVIDIA (07/12/25)