• Senior AI - HPC

    NVIDIA (Santa Clara, CA)
    …intelligence. Make the choice to join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership in the design and implementation ... years of experience designing and operating large scale compute infrastructure + Experience with AI / HPC advanced job schedulers, such as Slurm, K8s, RTDA or LSF… more
    NVIDIA (04/02/25)
  • Senior AI - HPC Storage…

    NVIDIA (Santa Clara, CA)
    …solutions on any of the leading Cloud environments [AWS, Azure or GCP] + Experience with AI / HPC cluster job schedulers such as SLURM, LSF + In depth ... InfiniBand with IPoIB and RDMA + Background with Software Defined Networking and AI / HPC cluster networking + Familiarity with deep learning frameworks like… more
    NVIDIA (05/07/25)
  • Senior Observability Architect, AI

    NVIDIA (Santa Clara, CA)
    …, HW, and SW engineering and research teams to define a vision and roadmap for AI / HPC cluster observability. + Architect and lead teams to develop, test, and ... NVIDIA's Hardware Infrastructure organization is seeking a Senior or Principal Data and Observability Architect....vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and… more
    NVIDIA (05/15/25)
  • Senior HPC Engineer, Infrastructure…

    NVIDIA (Santa Clara, CA)
    NVIDIA is looking for a Senior HPC Engineer to join its...the team building many of the largest and fastest AI / HPC systems in the world! NVIDIA is ... customers, partners and internal teams to analyze, define, and implement large-scale AI / HPC projects. These efforts include a combination of networking, system… more
    NVIDIA (06/12/25)
  • Senior Site Reliability Engineer,…

    NVIDIA (Santa Clara, CA)
    …a variety of HPC or EDA workloads. + Solid understanding of cluster configuration management tools such as Ansible. + Proficiency in Perl for maintaining legacy ... NVIDIA is the leader in AI, machine learning and datacenter acceleration. NVIDIA is...and support workload and resource schedulers in a large-scale HPC environment. + Automate Everything: Develop automation scripts to… more
    NVIDIA (04/04/25)
  • Senior Solutions Architect, HPC

    NVIDIA (Santa Clara, CA)
    …Do you want to be part of a team that brings new Artificial Intelligence (AI) hardware and software technologies to production in customer data centers? As part of ... What you will be doing: + Working with NVIDIA AI Native, Consumer Internet and Enterprise customers on large...on network design, compute/storage and support bring up of server/network/cluster deployments. You will need to visit customer data… more
    NVIDIA (06/05/25)
  • Senior Site Reliability Engineer…

    NVIDIA (Santa Clara, CA)
    …Make the choice, join our diverse team today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership in the design and implementation of ... You will also be maintaining and building deep learning AI - HPC GPU clusters at scale and supporting...cluster. + Deep understanding of GPU computing and AI infrastructure. + Passion for solving complex technical challenges… more
    NVIDIA (03/26/25)
  • Senior Software Development Engineer,…

    Amazon (Cupertino, CA)
    …have extensive experience in low-latency networking and collective operations, such as HPC network fabric or machine learning accelerator cluster systems. Also ... solutions that for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS. We are seeking an experienced...and TPUs. This role is on the forefront of AI/ML, we spend a good deal of the day… more
    Amazon (06/04/25)
  • Senior Research Engineer, Foundation Model…

    NVIDIA (Santa Clara, CA)
    NVIDIA is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training in the ... works on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation. Our past projects include Eureka… more
    NVIDIA (06/07/25)
  • Senior DevOps and Automation Engineer,…

    NVIDIA (Santa Clara, CA)
    …clusters connected via NVLink and InfiniBand, supporting some of the most advanced workloads in HPC and AI. What You'll Be Doing: + Drive robust CI/CD workflows ... DevOps technologies to automate software updates, perform system maintenance, and monitor cluster health and availability. + Own and resolve daily operational issues… more
    NVIDIA (06/11/25)
  • Senior Software QA Test Development…

    NVIDIA (Santa Clara, CA)
    …telemetries, scale out cluster, test plan development, track record in developing AI tools and NLP, DevOps, CI/CD experience to join our platform SWQA team. What ... We are passionate about markets include gaming, automotive, vision, HPC, datacenters and networking in addition to our traditional...OEM business. NVIDIA is also well positioned as the 'AI Computing Company', and NVIDIA GPUs are the brains… more
    NVIDIA (06/10/25)