AI HPC Systems Performance Jobs in Milpitas, CA

62 jobs (page 1)

AI / HPC Systems…

Meta (Menlo Park, CA)

…fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially ... a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look…

Meta (04/20/25)

AI / HPC Systems…

Meta (Menlo Park, CA)

…fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially ... workloads that expect a loss-less fabric interconnect. To improve performance of these systems we constantly look…

Meta (06/18/25)

Production Systems Engineer, Sustaining

Meta (Menlo Park, CA)

…hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in…

Meta (05/20/25)

Senior AI - HPC Cluster Engineer

NVIDIA (Santa Clara, CA)

…to work effectively with diverse teams and individuals. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Passion for ... GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek a ... storage systems like Lustre and GPFS for AI / HPC workloads. + Familiarity with deep learning…

NVIDIA (04/02/25)

Senior AI - HPC Storage Engineer

NVIDIA (Santa Clara, CA)

…designing and operating large-scale storage infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Experience ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership ... solutions to enable runs of demanding deep learning, high performance computing, and computationally intensive workloads. We seek an…

NVIDIA (05/07/25)

Senior Observability Architect, AI…

NVIDIA (Santa Clara, CA)

…looking for a technical leader to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and ... and visualization to spectacularly improve efficiency, performance, and productivity of AI and HPC workloads. You will lead technical teams to develop,…

NVIDIA (05/15/25)

AI / HPC Network Engineer

Meta (Menlo Park, CA)

…requirements of RDMA workloads that expect a loss-less fabric interconnect. To improve performance of these systems we constantly look for opportunities across ... fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Network Engineer Responsibilities: 1. Design, develop, test and…

Meta (05/08/25)

Senior Solution Architect, HPC…

NVIDIA (Santa Clara, CA)

…Be Doing: + Primary responsibilities will include building and enabling robust AI / HPC infrastructure for customers + Support operational and reliability aspects ... of large-scale AI clusters, focusing on performance at scale, ... in working with customers + Expertise with parallel file systems (e.g., Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects…

NVIDIA (06/18/25)

Sr. Worldwide Specialist Solutions Architect,…

Amazon (Santa Clara, CA)

…computing and its potential to overcome some of the biggest challenges in High Performance Computing (HPC)? Do you have a unique combination of deep technical ... C++, Python, CUDA, Bash - Deep GPU knowledge in HPC and/or AI/ML frameworks. Preferred Qualifications - ... life sciences or related discipline. - Working knowledge of HPC schedulers and distributed/parallel file systems, underlying…

Amazon (06/12/25)

Senior Software Architect - Deep Learning…

NVIDIA (Santa Clara, CA)

…vision? What you will be doing: + Investigate opportunities to improve communication performance by identifying bottlenecks in today's systems. + Design and ... implement new communication technologies to accelerate AI and HPC workloads. + Explore innovative solutions in HW and SW for our next-generation platforms as…

NVIDIA (05/05/25)

Senior HPC Engineer, Infrastructure…

NVIDIA (Santa Clara, CA)

…and to power data centers. Join the team building many of the largest and fastest AI / HPC systems in the world! NVIDIA is looking for someone with the ... and internal teams to analyze, define, and implement large-scale AI / HPC projects. These efforts include a combination ... they begin rolling out some of the most sophisticated systems in the world! + Provide feedback to internal…

NVIDIA (06/12/25)

Sr. Software Development Engineer, HPC/ML…

Amazon (Cupertino, CA)

Description: We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations, the fundamental ... operations that enable AI to scale across multiple accelerators & servers. Most ... building networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS. We…

Amazon (05/14/25)

Senior Software Engineer - HPC

NVIDIA (Santa Clara, CA)

…long-term maintenance strategy. What you'll be doing: + Design highly available and scalable systems to meet the demands of our HPC clusters + Evaluate new and ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a "learning…

NVIDIA (05/28/25)

Senior Site Reliability Engineer, HPC…

NVIDIA (Santa Clara, CA)

…experience. Ways to stand out from the crowd: + Experience analyzing and tuning performance for a variety of HPC or EDA workloads. + Solid understanding ... NVIDIA is the leader in AI, machine learning and datacenter acceleration. NVIDIA is ... and operate these clusters at high reliability, efficiency, and performance and drive foundational improvements and automation to improve…

NVIDIA (04/04/25)

HPC Middleware Developer

NVIDIA (Santa Clara, CA)

…Protocols InfiniBand, Ethernet + Deep knowledge in computer architecture and operating systems + Experience in performance optimizations + MSc or equivalent ... We are now looking for a senior HPC software engineer. As a member of our High Performance Computing Software development team, you will be responsible for…

NVIDIA (05/17/25)

Senior Solutions Architect, HPC…

NVIDIA (Santa Clara, CA)

NVIDIA is looking for an experienced GPU and network systems Solutions Architect & Engineer. Do you want to be part of a team that brings new Artificial Intelligence ... (AI) hardware and software technologies to production in customer ... Demonstrate subject matter expertise in advanced GPU & network systems and be a trusted technical advisor to NVIDIA's…

NVIDIA (06/05/25)

Research Scientist, AI & Systems…

Meta (Menlo Park, CA)

…on existing accelerator systems and guiding the future of models and AI HW at Meta. This drives improved performance, new model architectures and ... the following areas: Accelerators/GPU architectures, High Performance Computing (HPC), Machine Learning Compilers, Training/Inference ML Systems, Model…

Meta (04/14/25)

Senior Site Reliability Engineer - AI…

NVIDIA (Santa Clara, CA)

…infrastructure. + Passion for solving complex technical challenges and optimizing system performance. + Experience with AI / HPC advanced job schedulers, and ... support operational and reliability aspects of large-scale distributed systems with focus on performance at scale, ... storage systems like Lustre and GPFS for AI / HPC workloads. + Familiarity with deep learning…

NVIDIA (03/26/25)

Software Engineering Manager - AI…

Meta (Menlo Park, CA)

…Qualifications: 7. Experience in leading teams working on high performance computing (HPC) and AI/ML systems, including: 8. Communication libraries (e.g., ... of Meta AI infrastructure! **Required Skills:** Software Engineering Manager - AI Systems Co-Design Responsibilities: 1. Lead and support the communications…

Meta (04/02/25)

Systems Development Eng (AWS Generative…

Amazon (Cupertino, CA)

…and operating AWS cloud offerings that enable high performance and scalability in AI/ML and HPC workloads. You are intrigued by the continuous release of ... Want to do industry-leading work delivering continuous price-performance improvements in the cloud for AI ... have tremendous interest in cloud scale and are curious how systems and software decisions impact the user. You insist…

Amazon (06/11/25)