  • Cluster Deployment Operations…

    NVIDIA (Santa Clara, CA)
    …people to make them operational in production? We are seeking a dedicated Cluster Deployment Operations Engineer to support product deployments and issues by ... years of experience in at least two of the following: HPC/large-scale cluster administration, Linux systems engineering, infrastructure automation (e.g., Ansible,…
    NVIDIA (12/18/25)
  • Senior Platform and EngOps Engineer

    NVIDIA (Santa Clara, CA)
    …DevOps tools to automate software updates, perform maintenance tasks, and monitor cluster availability, ensuring seamless operations. + Take ownership of daily ... cluster failures and issues, troubleshooting them promptly to maintain ... in deploying and administrating clusters, servers, switches, and related infrastructure. + Automation expert with hands-on skills in…
    NVIDIA (11/01/25)
  • AI and ML HPC Cluster Engineer

    NVIDIA (Santa Clara, CA)
    …the world's most advanced computing workloads. NVIDIA is looking for an AI/ML HPC Cluster Engineer to join our MARS team. You will provide technical engagement ... mission, our team, Managed AI Superclusters (MARS) builds and scales the infrastructure, platforms, and tools that enable researchers and engineers to develop the…
    NVIDIA (01/10/26)
  • Senior HPC Cluster Engineer - EDA

    NVIDIA (Santa Clara, CA)
    …lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for EDA and ... of 5 years of proven experience crafting and operating large scale compute infrastructure, including cluster configuration management tools such as BCM or…
    NVIDIA (12/10/25)
  • Senior AI and ML HPC Cluster

    NVIDIA (Santa Clara, CA)
    …Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of groundbreaking ... + Minimum 5+ years of experience designing and operating large scale compute infrastructure + Experience with AI/HPC advanced job schedulers, such as Slurm, K8s,…
    NVIDIA (01/10/26)
  • Senior AI-HPC Cluster Engineer

    NVIDIA (Santa Clara, CA)
    …Minimum of 6 years of experience crafting and operating large scale compute infrastructure. + Experience with AI/HPC job schedulers and orchestrators, such as Slurm, ... staying ahead of new technologies and effective approaches in the HPC and AI/ML infrastructure fields. Ways to stand out from the crowd: + Experience with NVIDIA…
    NVIDIA (10/30/25)
  • Senior GPU and HPC Infrastructure

    NVIDIA (Santa Clara, CA)
    NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, knowledge of datacenter hardware, operations, ... help advance NVIDIA's capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications ... multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Work on software that…
    NVIDIA (01/08/26)
  • Staff Software Engineer, AI…

    Google (Sunnyvale, CA)
    Staff Software Engineer, AI Infrastructure (Google; Kirkland, WA, USA; Sunnyvale, CA, USA) **Advanced** Experience owning outcomes and ... on and is growing every day. As a software engineer, you will work on a specific project critical ... cluster interconnects and networking. We're building the AI infrastructure for the future, so if you are interested…
    Google (01/10/26)
  • Senior Infrastructure Software…

    NVIDIA (Santa Clara, CA)
    We are now looking for a Senior Infrastructure Software Engineer for Deep Learning Libraries! NVIDIA's Deep Learning Libraries Group is seeking excellent ... kernel libraries. The mission is to design and develop scalable, modular infrastructure that streamlines development, builds, and tests across NVIDIA's diverse set…
    NVIDIA (12/13/25)
  • Senior Research Engineer, Foundation Model…

    NVIDIA (Santa Clara, CA)
    NVIDIA is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training in the ... to support multi-modal foundation models for robotics. + Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets. +…
    NVIDIA (12/05/25)
  • Staff Software Engineer, AI/ML…

    Google (Sunnyvale, CA)
    Staff Software Engineer, AI/ML Infrastructure (Google; Kirkland, WA, USA; Sunnyvale, CA, USA) **Advanced** Experience owning outcomes and ... on and is growing every day. As a software engineer, you will work on a specific project critical ... clusters using the latest technologies for AI acceleration and cluster interconnects and networking. The AI and Infrastructure…
    Google (01/10/26)
  • (Senior) Software Engineer

    pony.ai (Fremont, CA)
    …public at NASDAQ in November 2024. Responsibilities As a (Senior) Kubernetes Engineer, you will: + Design, operate, and optimize Kubernetes clusters across hybrid ... CRDs, APIs) to automate and productize internal use cases. + Own cluster lifecycle management including upgrades, patching, configuration, and governance. + Define…
    pony.ai (12/16/25)
  • Staff Software Engineer - Compute…

    LinkedIn (Mountain View, CA)
    …needs of the team. Job Description As a staff member of the Compute Infrastructure team at LinkedIn, you will be charged with building the next-generation ... infrastructure and platforms for LinkedIn. This is a unique ... multi-cluster solutions, automate upgrades, and intelligently detect and remediate cluster health, etc. In this role, you will be…
    LinkedIn (11/21/25)
  • Staff Software Engineer, Emerging On-prem…

    Google (Sunnyvale, CA)
    Staff Software Engineer, Emerging On-prem AI Infrastructure (Google; Kirkland, WA, USA; Sunnyvale, CA, USA) **Advanced** Experience owning ... + 5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute ... on and is growing every day. As a software engineer, you will work on a specific project critical…
    Google (12/18/25)
  • Sr. Staff Software Engineer, Compute…

    LinkedIn (Mountain View, CA)
    …days, as determined by the business needs of the team. As a Sr. Staff Software Engineer of the Compute Infrastructure team at LinkedIn, you will play a crucial ... role in our ongoing efforts to re-architect our compute infrastructure stack. This is a high-profile, high-impact project that will touch every aspect of our…
    LinkedIn (11/19/25)
  • Principal Staff Software Engineer - Compute…

    LinkedIn (Sunnyvale, CA)
    …as determined by the business needs of the team. As a Principal Staff Software Engineer of the Compute Infrastructure team at LinkedIn, you will play a crucial ... role in our ongoing efforts to re-architect our compute infrastructure stack. This is a high-profile, high-impact project that touches every aspect of our…
    LinkedIn (12/10/25)
  • Senior Software Engineer, Compute…

    NVIDIA (Santa Clara, CA)
    We are seeking a Senior Software Engineer to join a new team building the foundational infrastructure for Robotics Research. This new team will work very closely ... technology for humanoid robots. This particular position will focus on compute infrastructure. You will work with an amazing and collaborative research team that…
    NVIDIA (01/10/26)
  • Senior System Software Engineer, EDA…

    NVIDIA (Santa Clara, CA)
    …telemetry and data systems that provide real-time understandings of our sophisticated, distributed infrastructure. As an engineer on our team, you will play a ... data into actionable insights. You will architect, develop, and maintain infrastructure that supervises workload health, performance, and usage in critical…
    NVIDIA (01/10/26)
  • Principal, Software Engineer - Cloud…

    Walmart (Sunnyvale, CA)
    **Position Summary** We are seeking a highly skilled Principal Engineer (Ceph/Scale-Out Storage) with 10+ years of deep technical experience in distributed storage ... intersection of distributed storage systems, Linux internals, networking, and cloud infrastructure, solving some of the toughest technical challenges in scalability,…
    Walmart (11/20/25)
  • Senior Site Reliability Engineer, BCM…

    NVIDIA (Santa Clara, CA)
    …varying from a few to several thousands of nodes, and streamlines cluster provisioning, workload management, and infrastructure monitoring. It provides all ... in providing excellent, comprehensive support to our customers! Sr Site Reliability Engineer in this role will significantly impact and contribute to the overall…
    NVIDIA (10/28/25)