• Network Engineer, HPC

    Meta (Menlo Park, CA)
    …our network engineering teams is for you! **Required Skills:** Network Engineer, HPC Systems Network Strategy Responsibilities: 1. Design, ... you will be responsible for conceiving, developing, and deploying software, hardware and network systems and tools that improve reliability and efficiency in our… more
    Meta (04/27/24)
  • Senior HPC Performance Engineer

    NVIDIA (Santa Clara, CA)
    …UCX for Deep Learning and HPC. We are looking for a motivated Performance engineer to influence the roadmap of our communication libraries. The DL and HPC ... are even higher at huge scales! This is an outstanding opportunity for someone with HPC and performance background to advance the state of the art in this space. Are… more
    NVIDIA (05/04/24)
  • Sr. Software Development Engineer

    Amazon (Cupertino, CA)
    …to improve network speed and performance so that customers can scale their network intensive ML and HPC workloads over thousands of CPUs, GPUs and TPUs ... blocks for EC2 infrastructure. We specialize in designing software, systems and chips that optimize the AWS customer experience....experience. More and more customers run their ML and HPC workloads on AWS to reap the benefits of… more
    Amazon (04/08/24)
  • AI/HPC Systems Performance…

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Active ... daily basis. We need to build and evolve our network infrastructure that connects myriads of training accelerators like...a loss-less fabric interconnect. To improve performance of these systems we constantly look for opportunities across stack: … more
    Meta (05/11/24)
  • AI/HPC Systems Performance…

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Lead ... deal with on a daily basis. We need to build and evolve our network infrastructure that connects myriads of training accelerators like GPUs together. In addition, we… more
    Meta (04/25/24)
  • IT InfiniBand/GPU - Sr Staff Systems

    Cadence Design Systems, Inc. (San Jose, CA)
    …impact on the world of technology. Cadence is looking for a Sr Staff Systems Engineer who accelerates strategic customer deployments and ensures on-time bring-up ... and deployment of HPC infrastructure and troubleshooting...tasks such as shell scripting. Role: IT - Sr Staff Systems Engineer Location on-site (not remote): San… more
    Cadence Design Systems, Inc. (03/01/24)
  • GPU Computing Capacity Optimization…

    NVIDIA (Santa Clara, CA)
    …InfiniBand with IBOP and RDMA as well as understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads + Familiarity with deep ... join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the...usage of all datacenter resources including compute, storage, network and power. You will help build methodologies, tools… more
    NVIDIA (05/04/24)
  • Sr. Systems Development Engineer

    Amazon (Cupertino, CA)
    …SRE (Site Reliability Engineering), or Resilience Engineering - 7+ years of SysDE (Systems Development Engineer) or equivalent experience - 7+ years of server ... that enable high performance and scalability in AI/ML and HPC workloads. You are intrigued by the continuous release...have tremendous interest in cloud scale and curious how systems and software decisions impact the user. You insist… more
    Amazon (04/12/24)
  • Research Data Center Engineer

    Stanford University (Stanford, CA)
    …Stanford and SLAC organizations. The primary data center facilities that hosts the HPC systems themselves are the Stanford Research Computing Facility (SRCF1). ... Research Data Center Engineer **Business Affairs: University IT (UIT), Stanford, California,...Center (Research Computing). Research Computing offers High Performance Computing (HPC) hosting services, computational and data systems,… more
    Stanford University (05/01/24)
  • Senior Software Engineer, GPU…

    NVIDIA (Santa Clara, CA)
    …wave of artificial intelligence. We are looking for a highly motivated senior software engineer for an exciting role in our communication libraries and network ... crew that develops and maintains software for complex heterogeneous computing systems that power disruptive products in High Performance Computing and Deep… more
    NVIDIA (04/16/24)
  • Software Engineer, SystemML - Scaling…

    Meta (Menlo Park, CA)
    **Summary:** In this role, you will be a member of the Network.AI Software team and part of the bigger DC networking organization. The team develops and owns the ... Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch and is on… more
    Meta (04/12/24)
  • Software Engineer - Data Center Networking

    Meta (Menlo Park, CA)
    …Meta's global data center networks. Our work covers the entire network lifecycle, including hardware development, capacity planning, distributed and centralized ... control systems, modeling/provisioning/automation, monitoring/troubleshooting/analytics, and simulation/design/failure analysis. We are actively seeking Software… more
    Meta (05/01/24)
  • Senior Solution Engineer, Networking

    NVIDIA (Santa Clara, CA)
    …protocols (ARP, STP, LACP, MLAG, IGMP, PIM, BGP, OSPF) within drivers/network operating systems + Experience analyzing vendor interoperability problems ... We are looking for an experienced software engineer who excels solving customer problems with DGX clusters (GPU-based supercomputers interconnected with InfiniBand … more
    NVIDIA (05/04/24)
  • Software Engineer - Networking

    Meta (Menlo Park, CA)
    …Meta's global data center networks. Our work covers the entire network lifecycle, including hardware development, capacity planning, distributed and centralized ... control systems, modeling/provisioning/automation, monitoring/troubleshooting/analytics, and simulation/design/failure analysis. We are actively seeking Software… more
    Meta (04/28/24)
  • Senior Math Libraries Engineer, Iterative…

    NVIDIA (Santa Clara, CA)
    We are looking for a software engineer for our Sparse Linear Algebra team which develops key technologies and libraries such as cuSOLVER, cuSPARSE, cuDSS, and AmgX, ... come and join our team! What you will be doing: + developing scalable HPC math library software for various numerical methods including but not limited to sparse… more
    NVIDIA (03/05/24)
  • Senior Infrastructure Performance and Development…

    NVIDIA (Santa Clara, CA)
    …+ Adapt and enhance communication libraries to seamlessly support innovative network topologies and system architectures. + Design or adapt optimized storage ... environments. + Strong background in parallel programming and distributed systems + Experience analyzing and optimizing large scale distributed applications.… more
    NVIDIA (04/16/24)