• Network Engineer, HPC

    Meta (Menlo Park, CA)
    …our network engineering teams is for you! **Required Skills:** Network Engineer, HPC Systems Network Strategy Responsibilities: 1. Design, ... you will be responsible for conceiving, developing, and deploying software, hardware and network systems and tools that improve reliability and efficiency in our… more
    Meta (04/27/24)
  • AI/HPC Network Engineer

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Network Engineer Responsibilities: 1. Design, develop, test ... networking and its associated infrastructure 4. Define and develop optimized network monitoring systems 5. Be oncall to learn from real world production… more
    Meta (04/16/24)
  • Senior HPC Performance Engineer

    NVIDIA (Santa Clara, CA)
    …UCX for Deep Learning and HPC. We are looking for a motivated Performance engineer to influence the roadmap of our communication libraries. The DL and HPC ... are even higher at huge scales! This is an outstanding opportunity for someone with HPC and performance background to advance the state of the art in this space. Are… more
    NVIDIA (05/04/24)
  • Software Development Engineer, HPC

    Amazon (Cupertino, CA)
    …to improve network speed and performance so that customers can scale their network intensive ML and HPC workloads over thousands of CPUs, GPUs and TPUs ... blocks for EC2 infrastructure. We specialize in designing software, systems and chips that optimize the AWS customer experience....experience. More and more customers run their ML and HPC workloads on AWS to reap the benefits of… more
    Amazon (05/10/24)
  • AI/HPC Systems Performance…

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Active ... daily basis. We need to build and evolve our network infrastructure that connects myriads of training accelerators like...a loss-less fabric interconnect. To improve performance of these systems we constantly look for opportunities across stack: … more
    Meta (05/11/24)
  • AI/HPC Systems Performance…

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Lead ... deal with on a daily basis. We need to build and evolve our network infrastructure that connects myriads of training accelerators like GPUs together. In addition, we… more
    Meta (04/25/24)
  • IT InfiniBand/GPU - Sr Staff Systems

    Cadence Design Systems, Inc. (San Jose, CA)
    …impact on the world of technology. Cadence is looking for a Sr Staff Systems Engineer who accelerates strategic customer deployments and ensures on-time bring-up ... and deployment of HPC infrastructure and troubleshooting...tasks such as shell scripting. Role: IT - Sr Staff Systems Engineer Location on-site (not remote): San… more
    Cadence Design Systems, Inc. (03/01/24)
  • GPU Computing Capacity Optimization…

    NVIDIA (Santa Clara, CA)
    …InfiniBand with IBOP and RDMA as well as understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads + Familiarity with deep ... join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the...usage of all datacenter resources including compute, storage, network and power. You will help build methodologies, tools… more
    NVIDIA (05/04/24)
  • Sr. Systems Development Engineer

    Amazon (Cupertino, CA)
    …SRE (Site Reliability Engineering), or Resilience Engineering - 7+ years of SysDE (Systems Development Engineer) or equivalent experience - 7+ years of server ... that enable high performance and scalability in AI/ML and HPC workloads. You are intrigued by the continuous release...have tremendous interest in cloud scale and curious how systems and software decisions impact the user. You insist… more
    Amazon (04/12/24)
  • Research Data Center Engineer

    Stanford University (Stanford, CA)
    …Stanford and SLAC organizations. The primary data center facilities that hosts the HPC systems themselves are the Stanford Research Computing Facility (SRCF1). ... Research Data Center Engineer **Business Affairs: University IT (UIT), Stanford, California,...Center (Research Computing). Research Computing offers High Performance Computing (HPC) hosting services, computational and data systems,… more
    Stanford University (05/01/24)
  • Senior Software Engineer, GPU…

    NVIDIA (Santa Clara, CA)
    …wave of artificial intelligence. We are looking for a highly motivated senior software engineer for an exciting role in our communication libraries and network ... crew that develops and maintains software for complex heterogeneous computing systems that power disruptive products in High Performance Computing and Deep… more
    NVIDIA (04/16/24)
  • Software Engineer, SystemML - Scaling…

    Meta (Menlo Park, CA)
    **Summary:** In this role, you will be a member of the Network.AI Software team and part of the bigger DC networking organization. The team develops and owns the ... Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch and is on… more
    Meta (04/12/24)
  • Software Engineer - Data Center Networking

    Meta (Menlo Park, CA)
    …Meta's global data center networks. Our work covers the entire network lifecycle, including hardware development, capacity planning, distributed and centralized ... control systems, modeling/provisioning/automation, monitoring/troubleshooting/analytics, and simulation/design/failure analysis. We are actively seeking Software… more
    Meta (05/01/24)
  • Senior Solution Engineer, Networking

    NVIDIA (Santa Clara, CA)
    …protocols (ARP, STP, LACP, MLAG, IGMP, PIM, BGP, OSPF) within drivers/network operating systems + Experience analyzing vendor interoperability problems ... We are looking for an experienced software engineer who excels solving customer problems with DGX clusters (GPU-based supercomputers interconnected with InfiniBand … more
    NVIDIA (05/04/24)
  • Software Engineer - Networking

    Meta (Menlo Park, CA)
    …Meta's global data center networks. Our work covers the entire network lifecycle, including hardware development, capacity planning, distributed and centralized ... control systems, modeling/provisioning/automation, monitoring/troubleshooting/analytics, and simulation/design/failure analysis. We are actively seeking Software… more
    Meta (04/28/24)
  • Field Application Engineer (Machine…

    quadric.io, Inc (Burlingame, CA)
    …battery operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the ... co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of...C++ DSP and control code. Role: The Field Application Engineer (FAE) will work closely with Business Development, Product,… more
    quadric.io, Inc (05/07/24)
  • Principal Service Engineer

    Microsoft Corporation (Mountain View, CA)
    …or related field AND 4+ years technical experience in software engineering, network engineering, service engineering, or systems engineering o OR equivalent ... and sustainability related to Microsoft cloud hardware. We are looking for an engineer who is customer-focused with the insight and industry knowledge to envision… more
    Microsoft Corporation (05/11/24)
  • Senior Software Engineer, Annapurna Labs

    Amazon (Cupertino, CA)
    …or any application where networking performance matters greatly. Knowledge of embedded systems, transport protocols, network fabrics and kernel/device drivers is ... critical building blocks for EC2 infrastructure. We specialize in designing software, systems and chips that optimize the AWS customer experience. This team works… more
    Amazon (04/03/24)
  • ML Compiler Engineer, XLA TPU Compiler…

    Google (Sunnyvale, CA)
    …or algorithms. + 2 years of experience with performance optimization, systems data analysis, visualization tools, or debugging. + Experience using compilers ... PhD in Computer Science or related technical fields. + Experience as an engineer in compilers. + Machine Learning, Parallel Computing and High Performance Computing… more
    Google (04/27/24)
  • Senior Math Libraries Engineer, Iterative…

    NVIDIA (Santa Clara, CA)
    We are looking for a software engineer for our Sparse Linear Algebra team which develops key technologies and libraries such as cuSOLVER, cuSPARSE, cuDSS, and AmgX, ... come and join our team! What you will be doing: + developing scalable HPC math library software for various numerical methods including but not limited to sparse… more
    NVIDIA (03/05/24)