  • AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look…
    Meta (04/20/25)
  • AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expect a loss-less fabric interconnect. To improve performance of these systems we constantly look…
    Meta (06/18/25)
  • Production Systems Engineer, Sustaining

    Meta (Menlo Park, CA)
    …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in…
    Meta (05/20/25)
  • Senior AI - HPC Cluster Engineer

    NVIDIA (Santa Clara, CA)
    …to work effectively with diverse teams and individuals. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Passion for ... GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek a...storage systems like Lustre and GPFS for AI / HPC workloads + Familiarity with deep learning…
    NVIDIA (04/02/25)
  • Senior AI - HPC Storage Engineer

    NVIDIA (Santa Clara, CA)
    …designing and operating large scale storage infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Experience ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...solutions to enable runs of demanding deep learning, high performance computing, and computationally intensive workloads. We seek an…
    NVIDIA (05/07/25)
  • Senior Observability Architect, AI

    NVIDIA (Santa Clara, CA)
    …looking for a technical leader to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and ... and visualization to spectacularly improve efficiency, performance, and productivity of AI and HPC workloads. You will lead technical teams to develop,…
    NVIDIA (05/15/25)
  • AI / HPC Network Engineer

    Meta (Menlo Park, CA)
    …requirements of RDMA workloads that expect a loss-less fabric interconnect. To improve performance of these systems we constantly look for opportunities across ... fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Network Engineer Responsibilities: 1. Design, develop, test and…
    Meta (05/08/25)
  • Senior Solution Architect, HPC

    NVIDIA (Santa Clara, CA)
    …Be Doing: + Primary responsibilities will include building and enabling robust AI / HPC infrastructure for customers + Support operational and reliability aspects ... of large-scale AI clusters, focusing on performance at scale,...in working with customers + Expertise with parallel file systems (eg Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects…
    NVIDIA (06/18/25)
  • Sr. Worldwide Specialist Solutions Architect,…

    Amazon (Santa Clara, CA)
    …computing and its potential to overcome some of the biggest challenges in High Performance Computing (HPC)? Do you have a unique combination of deep technical ... C++, Python, CUDA, Bash - Deep GPU knowledge in HPC and/or AI/ML frameworks. Preferred Qualifications -...life sciences or related discipline. - Working knowledge of HPC schedulers and distributed/parallel file systems, underlying…
    Amazon (06/12/25)
  • Senior Software Architect - Deep Learning…

    NVIDIA (Santa Clara, CA)
    …vision? What you will be doing: + Investigate opportunities to improve communication performance by identifying bottlenecks in today's systems. + Design and ... implement new communication technologies to accelerate AI and HPC workloads. + Explore innovative solutions in HW and SW for our next generation platforms as…
    NVIDIA (05/05/25)
  • Senior HPC Engineer, Infrastructure…

    NVIDIA (Santa Clara, CA)
    …and to power data centers. Join the team building many of the largest and fastest AI / HPC systems in the world! NVIDIA is looking for someone with the ... and internal teams to analyze, define, and implement large-scale AI / HPC projects. These efforts include a combination...they begin rolling out some of the most sophisticated systems in the world! + Provide feedback to internal…
    NVIDIA (06/12/25)
  • Senior Software Engineer - HPC

    NVIDIA (Santa Clara, CA)
    …long term maintenance strategy. What you'll be doing: + Design highly available and scalable systems to meet the demands of our HPC clusters + Evaluate new and ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a "learning…
    NVIDIA (05/28/25)
  • Senior Site Reliability Engineer, HPC

    NVIDIA (Santa Clara, CA)
    …experience. Ways to stand out from the crowd: + Experience analyzing and tuning performance for a variety of HPC or EDA workloads. + Solid understanding ... NVIDIA is the leader in AI, machine learning and datacenter acceleration. NVIDIA is...and operate these clusters at high reliability, efficiency, and performance and drive foundational improvements and automation to improve…
    NVIDIA (04/04/25)
  • HPC Middleware Developer

    NVIDIA (Santa Clara, CA)
    …Protocols InfiniBand, Ethernet + Deep knowledge in computer architecture and operating systems + Experience in performance optimizations + MSc or equivalent ... We are now looking for a senior HPC software engineer. As a member of our High Performance Computing Software development team, you will be responsible for…
    NVIDIA (05/17/25)
  • Senior Solutions Architect, HPC

    NVIDIA (Santa Clara, CA)
    NVIDIA is looking for an experienced GPU and network systems Solutions Architect & Engineer. Do you want to be part of a team that brings new Artificial Intelligence ... (AI) hardware and software technologies to production in customer...Demonstrate subject matter expertise in advanced GPU & network systems and be a trusted technical advisor to NVIDIA's…
    NVIDIA (06/05/25)
  • Research Scientist, AI & Systems

    Meta (Menlo Park, CA)
    …on existing accelerator systems and guiding the future of models and AI HW at Meta. This drives improved performance, new model architectures and ... the following areas: Accelerators/GPU architectures, High Performance Computing (HPC), Machine Learning Compilers, Training/Inference ML Systems, Model…
    Meta (04/14/25)
  • Senior Site Reliability Engineer - AI

    NVIDIA (Santa Clara, CA)
    …infrastructure. + Passion for solving complex technical challenges and optimizing system performance. + Experience with AI / HPC advanced job schedulers, and ... support operational and reliability aspects of large scale distributed systems with focus on performance at scale,...storage systems like Lustre and GPFS for AI / HPC workloads. + Familiarity with deep learning…
    NVIDIA (03/26/25)
  • Software Engineering Manager - AI

    Meta (Menlo Park, CA)
    …Qualifications: 7. Experience in leading teams working on high performance computing (HPC) and AI/ML systems, including: 8. Communication libraries (eg, ... of Meta AI infrastructure! **Required Skills:** Software Engineering Manager - AI Systems Co-Design Responsibilities: 1. Lead and support the communications…
    Meta (04/02/25)
  • Hardware Systems Engineer, AI

    Meta (Menlo Park, CA)
    …7+ years of experience with one subset of the following AI systems: Accelerator (GPU/ASIC), Performance characterization/optimization/tracing/debugging (eg, ... exploring, developing and productizing high-performance software and hardware technologies for AI at datacenter scale. Hardware Systems Engineer in RTP work…
    Meta (05/24/25)
  • Technical Program Manager, AI Network Infra

    Meta (Menlo Park, CA)
    AI product introductions and AI operations initiatives supporting Meta's growing AI / HPC infrastructure for our Family of Apps. They will be responsible ... deliver on shared goals. 10. The ideal candidate will have experience in AI / HPC product development and operations, demonstrated experience in the Network…
    Meta (05/08/25)