• AI / HPC Network

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Network Engineer Responsibilities: 1. Design, develop, ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...daily basis. We need to build and evolve our network infrastructure that connects myriads of training accelerators like…
    Meta (04/16/24)
  • AI / HPC Systems Performance…

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. Lead ... 5. Work with cross functional teams and provide guidance on the AI network architecture including topologies, transport, congestion control techniques. **Minimum…
    Meta (04/25/24)
  • AI / HPC Systems Performance…

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. Active ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...daily basis. We need to build and evolve our network infrastructure that connects myriads of training accelerators like…
    Meta (05/11/24)
  • Senior HPC Performance Engineer

    NVIDIA (Santa Clara, CA)
    …UCX for Deep Learning and HPC. We are looking for a motivated Performance engineer to influence the roadmap of our communication libraries. The DL and HPC ... are even higher at huge scales! This is an outstanding opportunity for someone with HPC and performance background to advance the state of the art in this space. Are…
    NVIDIA (05/04/24)
  • HPC Operations Manager - Hardware…

    NVIDIA (Santa Clara, CA)
    …to support their future chip design needs, understand their workflow characteristics, and engineer an efficient HPC environment. Work with IT and engineering ... intelligence to autonomous cars. We are now looking for a highly motivated HPC Operations Manager to join this multifaceted and innovative infrastructure team to…
    NVIDIA (03/13/24)
  • Sr. Systems Development Engineer (AWS…

    Amazon (Cupertino, CA)
    …delivering and operating AWS cloud offerings that enable high performance and scalability in AI/ML and HPC workloads. You are intrigued by the continuous release ... Do you want to build the backbone of Generative AI cloud at AWS? Do you want to build...You'll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers,…
    Amazon (04/12/24)
  • GPU Computing Capacity Optimization…

    NVIDIA (Santa Clara, CA)
    …intelligence. Make the choice to join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership in the design and implementation ... usage of all datacenter resources including compute, storage, network and power. You will help build methodologies, tools...Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Working knowledge of cluster…
    NVIDIA (05/04/24)
  • Field Application Engineer - CSP

    Lenovo (San Jose, CA)
    …strong technical background in datacenter infrastructure and proven history of success selling HPC, AI, and GPU solutions. This candidate should also have ... Field Application Engineer - CSP **General Information** Req # WD00063015...Strong problem-solving and analytical abilities + Strong knowledge of AI / HPC, including both software frameworks and hardware.…
    Lenovo (04/13/24)
  • Software Engineer, SystemML - Scaling…

    Meta (Menlo Park, CA)
    **Summary:** In this role, you will be a member of the Network.AI Software team and part of the bigger DC networking organization. The team develops and owns the ... Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch and is on…
    Meta (04/12/24)
  • Software Engineer, SystemML - Scaling…

    Meta (Menlo Park, CA)
    **Summary:** In this role, you will be a member of the Network.AI Software team and part of the bigger DC networking organization. The team develops and owns the ... Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch and is on…
    Meta (04/23/24)
  • Senior Software Engineer, GPU…

    NVIDIA (Santa Clara, CA)
    …wave of artificial intelligence. We are looking for a highly motivated senior software engineer for an exciting role in our communication libraries and network ... for Deep Learning frameworks (eg NCCL for TensorFlow/PyTorch) and HPC programming interfaces (eg UCX for MPI/OpenSHMEM) on GPU...are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to…
    NVIDIA (04/16/24)
  • Senior Infrastructure Performance and Development…

    NVIDIA (Santa Clara, CA)
    Joining NVIDIA's AI Efficiency Team means contributing to the infrastructure that powers our leading-edge AI research. This team focuses on optimizing efficiency ... and resiliency of ML workloads, as well as developing scalable AI infrastructure tools and services. Our objective is to deliver a stable, scalable environment for…
    NVIDIA (04/16/24)
  • Senior Solution Engineer, Networking

    NVIDIA (Santa Clara, CA)
    We are looking for an experienced software engineer who excels solving customer problems with DGX clusters (GPU-based supercomputers interconnected with InfiniBand ... network fabric). The ideal candidate also has experience developing...MOFED, RDMA, RoCE and GPU Technology + Clustering or HPC Data-Center technologies including Upper Layer Protocols (ie, NCCL,…
    NVIDIA (05/04/24)
  • Senior Math Libraries Engineer, Iterative…

    NVIDIA (Santa Clara, CA)
    We are looking for a software engineer for our Sparse Linear Algebra team which develops key technologies and libraries such as cuSOLVER, cuSPARSE, cuDSS, and AmgX, ... come and join our team! What you will be doing: + developing scalable HPC math library software for various numerical methods including but not limited to sparse…
    NVIDIA (03/05/24)
  • Senior CPU Tooling and Design Automation…

    NVIDIA (Santa Clara, CA)
    …the development of CPU technology for architectures used for artificial intelligence (AI) / deep learning (DL), high-performance computing (HPC), cloud service ... for the CPU/Fabric teams to improve efficiency + Develop Network-on-Chip (NoC)/SoC tooling, working closely with architects, chip leads...challenges no one else can solve. Our work in AI and the metaverse is transforming the world's largest…
    NVIDIA (04/16/24)