  • AI / HPC Network

    Meta (Menlo Park, CA)
    … fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Network Engineer Responsibilities: 1. Design, develop, ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...daily basis. We need to build and evolve our network infrastructure that connects myriads of GPUs together. In… more
    Meta (05/08/25)
  • AI / HPC Systems Performance…

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. Lead ... 5. Work with cross functional teams and provide guidance on the AI network architecture including topologies, transport, congestion control techniques. **Minimum… more
    Meta (04/20/25)
  • Senior HPC Engineer , Infrastructure…

    NVIDIA (Santa Clara, CA)
    NVIDIA is looking for a Senior HPC Engineer to join its Infrastructure Specialists team. Academic, commercial and government groups around the world are using ... and to power data centers. Join the team building many of the largest and fastest AI / HPC systems in the world! NVIDIA is looking for someone with the ability to… more
    NVIDIA (06/12/25)
  • Senior Solutions Architect, HPC Systems…

    NVIDIA (Santa Clara, CA)
    NVIDIA is looking for an experienced GPU and network systems Solutions Architect & Engineer . Do you want to be part of a team that brings new Artificial ... Intelligence ( AI ) hardware and software technologies to production in customer...GPU server and networking system deployments as Solution Architect Engineer . Guide customer discussions on network design,… more
    NVIDIA (06/05/25)
  • Software Engineer , SystemML - AI

    Meta (Menlo Park, CA)
    …space of GenAI/LLM scaling reliability and performance. **Required Skills:** Software Engineer , SystemML - AI Networking Responsibilities: 1. Tech-leading the ... this role, you will be a member of the AI Networking Software team and part of the bigger...Library), which enables multi-GPU and multi-node data communication through HPC -style collectives. NCCL has been integrated into PyTorch and… more
    Meta (04/22/25)
  • Hardware Systems Engineer , AI

    Meta (Menlo Park, CA)
    …in exploring, developing and productizing high-performance software and hardware technologies for AI at datacenter scale. Hardware Systems Engineer in RTP work ... and optimize these systems in production. **Required Skills:** Hardware Systems Engineer , AI Systems Responsibilities: 1. Interface with external vendors… more
    Meta (05/24/25)
  • Sr. Hardware Dev Engineer (AWS Generative…

    Amazon (Cupertino, CA)
    …and operating AWS cloud offerings that enable high performance and scalability in AI /ML and HPC workloads. AWS Infrastructure Services owns the design, planning, ... Do you want to build the backbone of Generative AI cloud at AWS? Do you want to build...You'll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers,… more
    Amazon (03/25/25)
  • AI Applications Engineer

    quadric.io, Inc (Burlingame, CA)
    …(GPNPU) architecture. Quadric's co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint ... or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today that can only...C++ DSP and control code. Role: The Corporate Applications Engineer is the key bridge between development engineering and… more
    quadric.io, Inc (06/13/25)
  • Production Systems Engineer , Sustaining

    Meta (Menlo Park, CA)
    …hardware requirements and specifications (eg, configuring hardware components, GPU, memory, network for AI / HPC workloads) **Public Compensation:** ... **Summary:** Meta is seeking an experienced Production Systems Engineer to join our Release to Production (RTP)...Responsibilities: 1. Develop robust, industry leading practices for supporting AI / HPC infrastructure at scale 2. Interface with… more
    Meta (05/20/25)
  • R&D Applications Engineer

    Broadcom (San Jose, CA)
    …the latest Broadcom switch platforms and emerging network technologies optimized for AI and HPC workloads. + Contribute to hardware and low-level software ... Broadcom high-speed Ethernet switch solutions, specifically designed to accelerate AI /ML and High-Performance Computing ( HPC ) workloads. Our products… more
    Broadcom (05/21/25)
  • Software Engineer , Accelerator Systems…

    Meta (Menlo Park, CA)
    HPC hardware requirements and specifications (eg, configuring hardware components, GPU, memory, network for AI / HPC workloads). 14. Understanding of the ... Qualifications:** Preferred Qualifications: 11. Full-stack experience and understanding of AI / HPC systems, from HW/infrastructure through the application layer,… more
    Meta (05/01/25)
  • Senior Software Development Engineer

    Amazon (Cupertino, CA)
    …have extensive experience in low-latency networking and collective operations, such as HPC network fabric or machine learning accelerator cluster systems. Also ... on building networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS. We are seeking an experienced engineer… more
    Amazon (06/04/25)
  • Sr Staff Engineer , ML Infrastructure…

    LinkedIn (Mountain View, CA)
    LinkedIn is the world's largest professional network , built to create economic opportunity for every member of the global workforce. Our products help people make ... About the Role We are seeking a Senior Staff Engineer to design, build, and maintain our large-scale GPU...our large-scale GPU infrastructure for machine learning (ML) and AI workloads. In this role, you will be the… more
    LinkedIn (04/18/25)
  • Software Engineer , SystemML - Scaling…

    Meta (Menlo Park, CA)
    **Summary:** In this role, you will be a member of the Network . AI Software team and part of the bigger DC networking organization. The team develops and owns the ... Communications Library), which enables multi-GPU and multi-node data communication through HPC -style collectives. NCCL has been integrated into PyTorch and is on… more
    Meta (04/18/25)
  • Senior Software QA Test Development…

    NVIDIA (Santa Clara, CA)
    …GPU Computing. We are passionate about markets include gaming, automotive, vision, HPC , datacenters and networking in addition to our traditional OEM business. ... NVIDIA is also well positioned as the ' AI Computing Company', and NVIDIA GPUs are the brains...test cases automation + Strong experience in FW, BMC/OpenBMC, Network protocol, internal/external enterprise storage devices, PCIe buses and… more
    NVIDIA (06/10/25)
  • Senior System Software Engineer - Cloud…

    NVIDIA (Santa Clara, CA)
    …Contributions to open-source projects as well as experience supporting large scale AI / HPC compute with hardware acceleration (GPU, DPU) and strong communication ... We are looking for a Senior System Software Engineer , Cloud Networking to design, prototype, implement and...gaming market. More recently, GPU deep learning ignited modern AI - the next era of computing - with… more
    NVIDIA (06/13/25)
  • Signal and Power Integrity Engineer

    Google (Sunnyvale, CA)
    …unparalleled performance, efficiency, and integration. As a Signal Integrity/Power Integrity Engineer , you will lead chip and package design, ensuring optimal Signal ... chip and advanced packages. The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages...from developing our latest TPUs to running a global network , while driving towards shaping the future of hyperscale… more
    Google (06/11/25)
  • Sr Staff FAE

    Broadcom (San Jose, CA)
    …End-2-End congestion techniques, working experience on debugging Embedded Software, knowledge of HPC and AI /ML data center operational models, deep knowledge of ... please Sign-In before you apply.** **Job Description:** Software Field Applications Engineer (FAE) is software technical lead for Broadcom ethernet controllers/… more
    Broadcom (05/13/25)
  • Principal Product Manager Tech, dbrown Team

    Amazon (Sunnyvale, CA)
    …digital transformation across several customer workloads including AI /ML, generative AI , databases, Big Data analytics, SAP, HPC , Edge, and more. ... across a range of EC2 products across compute, storage, network and accelerated computing. You will be responsible for...will help each team member develop into a better-rounded engineer and enable them to take on more complex… more
    Amazon (05/16/25)