• AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expects a loss-less fabric interconnect. To improve performance of these systems we constantly look…
    Meta (08/22/25)
  • Senior Partner Solutions Architect - High…

    Amazon (San Francisco, CA)
    …computing and its potential to overcome some of the biggest challenges in High Performance Computing (HPC)? Do you enjoy tackling large analytical problems as ... - helping them envision and build the future of high-performance computing. Your technical solutions and insights will shape how partners transform their HPC approaches for the AI era. AWS…
    Amazon (09/11/25)
  • Research Scientist, AI & Systems

    Meta (Menlo Park, CA)
    …on existing accelerator systems and guiding the future of models and AI HW at Meta. This drives improved performance, new model architectures and ... the following areas: Accelerators/GPU architectures, High Performance Computing (HPC), Machine Learning Compilers, Training/Inference ML Systems, Model…
    Meta (08/08/25)
  • Software Engineering Manager - AI

    Meta (Menlo Park, CA)
    …Qualifications: 7. Experience in leading teams working on high performance computing (HPC) and AI/ML systems, including: 8. Communication libraries (e.g., ... of Meta AI infrastructure! **Required Skills:** Software Engineering Manager - AI Systems Co-Design Responsibilities: 1. Lead and support the communications…
    Meta (08/01/25)
  • Technical Program Manager, AI Network Infra

    Meta (Menlo Park, CA)
    AI product introductions and AI operations initiatives supporting Meta's growing AI / HPC infrastructure for our Family of Apps. They will be responsible ... deliver on shared goals 10. The ideal candidate will have experience in AI / HPC product development and operations, demonstrated experience in the Network…
    Meta (08/01/25)
  • Technical Sourcing Manager, Advanced Thermal…

    Meta (Fremont, CA)
    …and associated system design trade-offs, particularly for AI and High Performance Computing (HPC) systems 21. Experience interfacing with internal ... chain organizations related to data center products, infrastructure, rack design, AI, Compute Hardware, or Mechanical Engineering 12. Proven experience building and…
    Meta (08/01/25)
  • Software Engineer, SystemML - AI Networking

    Meta (Menlo Park, CA)
    …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
    Meta (08/01/25)
  • AI Applications Engineer

    quadric.io, Inc (Burlingame, CA)
    …wide variety of edge and endpoint devices, ranging from battery operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. ... Candidates must demonstrate deep technical mastery of Quadric's product ecosystem including HPC Hardware (IP, Chips, Boards), SDK, and various algorithms (NN, DSP,…
    quadric.io, Inc (08/26/25)
  • Software Engineer, Accelerator Solutions…

    Meta (Menlo Park, CA)
    …tools, libraries, and frameworks (e.g., PyTorch, CUDA) 19. Full-stack experience and understanding of AI / HPC systems, with a focus on the application layer and ... of Meta's accelerators collective communications software library and optimizing distributed AI/ML workloads' performance. This is an opportunity to work…
    Meta (10/04/25)
  • Head of Scientific Computing

    Roche (South San Francisco, CA)
    …intensive models, understand AI/ML workflows, and can balance cost, performance, and reliability in high-demand systems. + You have exceptional ... operate cutting-edge scientific platforms and infrastructure, you will enable high-performance computing, AI/ML, and large-scale data processing. Collaborating…
    Roche (09/06/25)
  • Network Engineering Manager

    Meta (Menlo Park, CA)
    …These workloads expect a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look for opportunities across ... that the network is running smoothly and meets stringent performance and availability requirements of RDMA (Remote Direct Memory...responsible for design, model, develop, test, deploy and operate AI / HPC Networks at scale 2. Provide continual…
    Meta (09/17/25)
  • Research Associate - Machine Learning…

    SLAC National Accelerator Laboratory (Menlo Park, CA)
    …labs. Our team also works on advanced computational workflows linking high performance computing (HPC) with experiments in real-time, in collaboration with ... fill open positions to work with our team on AI/ML solutions for challenging problems in modeling and control...unprecedented capabilities in modeling and control of complex, nonlinear systems in particle accelerators, which in turn enables new…
    SLAC National Accelerator Laboratory (08/13/25)
  • Software Engineering Manager, MTIA

    Meta (Menlo Park, CA)
    …10. Experience in leading teams working on high performance computing (HPC) and AI/ML systems, including: GPU/ASIC-based kernel development and ... ROCm), distributed systems for large scale training and serving, and systems architecture and performance 11. Accelerator (GPU/ASIC) kernel development and…
    Meta (09/06/25)