- NVIDIA (Santa Clara, CA)
- …people to make them operational in production? We are seeking a dedicated Cluster Deployment Operations Engineer to support product deployments and issues by ... years of experience in at least two of the following: HPC/large-scale cluster administration, Linux systems engineering, infrastructure automation (e.g., Ansible,… more
- NVIDIA (Santa Clara, CA)
- …DevOps tools to automate software updates, perform maintenance tasks, and monitor cluster availability, ensuring seamless operations. + Take ownership of daily ... cluster failures and issues, troubleshooting them promptly to maintain...in deploying and administrating clusters, servers, switches, and related infrastructure. + Automation expert with hands-on skills in… more
- NVIDIA (Santa Clara, CA)
- …the world's most advanced computing workloads. NVIDIA is looking for an AI/ML HPC Cluster Engineer to join our MARS team. You will provide technical engagement ... mission, our team, Managed AI Superclusters (MARS) builds and scales the infrastructure, platforms, and tools that enable researchers and engineers to develop the… more
- NVIDIA (Santa Clara, CA)
- …lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for EDA and ... of 5 years of proven experience crafting and operating large scale compute infrastructure, including cluster configuration management tools such as BCM or… more
- Broadcom (Bellevue, WA)
- …you apply.** **Job Description:** **About Broadcom** Broadcom Inc. is a global infrastructure technology leader built on 50 years of innovation, collaboration, and ... We design, develop, and supply a broad range of semiconductor and infrastructure software solutions. Our category-leading product portfolios serve the world's most… more
- Oracle (Seattle, WA)
- **Job Description** OCI AI Infrastructure is at the forefront of building cutting-edge GPU supercomputers that scale to tens of thousands of GPUs without ... team strives to be the go-to experts on RDMA cluster architecture and its relationship to AI/ML/HPC performance. We...+ Troubleshoot performance problems on RDMA clusters and perform cluster performance validation, including on very novel and not… more
- Bloomberg (New York, NY)
- Senior Software Engineer - Market Data Platform, Cluster Management Location New York Business Area Engineering and CTO Ref # 10046371 **Description & ... in it for you:** As a Market Data Platform engineer, you will: + Get hands-on experience working on...and diagnosing unexpected issues in production. The market data infrastructure you'll help build and improve is mission-critical for… more
- NVIDIA (Santa Clara, CA)
- …Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of groundbreaking ... + Minimum 5+ years of experience designing and operating large scale compute infrastructure + Experience with AI/HPC advanced job schedulers, such as Slurm, K8s,… more
- NVIDIA (Santa Clara, CA)
- …Minimum of 6 years of experience crafting and operating large scale compute infrastructure. + Experience with AI/HPC job schedulers and orchestrators, such as Slurm, ... staying ahead of new technologies and effective approaches in the HPC and AI/ML infrastructure fields. Ways to stand out from the crowd: + Experience with NVIDIA… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, knowledge of datacenter hardware, operations, ... help advance NVIDIA's capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications...multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Work on software that… more
- TEKsystems (Hillsboro, OR)
- Description We're looking for a Senior DevOps and Infrastructure Engineer to work in IPP's (Infrastructure, Planning and Process) Cloud Infrastructure ... Deep Learning, Artificial Intelligence and Driverless Cars to cater to their infrastructure needs. These cloud services provide almost half a million automated jobs… more
- Google (Sunnyvale, CA)
- Staff Software Engineer, AI Infrastructure Google Kirkland, WA, USA; Sunnyvale, CA, USA **Advanced** Experience owning outcomes and ... on and is growing every day. As a software engineer, you will work on a specific project critical...cluster interconnects and networking. We're building the AI infrastructure for the future, so if you are interested… more
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior Infrastructure Software Engineer for Deep Learning Libraries! NVIDIA's Deep Learning Libraries Group is seeking excellent ... kernel libraries. The mission is to design and develop scalable, modular infrastructure that streamlines development, builds, and tests across NVIDIA's diverse set… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training in the ... to support multi-modal foundation models for robotics. + Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets. +… more
- Google (Sunnyvale, CA)
- Staff Software Engineer, AI/ML Infrastructure Google Kirkland, WA, USA; Sunnyvale, CA, USA **Advanced** Experience owning outcomes and ... on and is growing every day. As a software engineer, you will work on a specific project critical...clusters using the latest technologies for AI acceleration and cluster interconnects and networking. The AI and Infrastructure… more
- Oracle (Montgomery, AL)
- …Oracle Cloud Infrastructure (OCI) is looking for a Principal Software Engineer to lead the development of scalable, resilient, and secure infrastructure ... efficiency at cloud hyperscale + Contribute to the evolution of OCI's infrastructure into next-gen cluster and automation frameworks Disclaimer: **Certain US… more
- pony.ai (Fremont, CA)
- …public at NASDAQ in November 2024. Responsibilities As a (Senior) Kubernetes Engineer, you will: + Design, operate, and optimize Kubernetes clusters across hybrid ... CRDs, APIs) to automate and productize internal use cases. + Own cluster lifecycle management including upgrades, patching, configuration, and governance. + Define… more
- LinkedIn (Mountain View, CA)
- …needs of the team. Job Description As a staff member of the Compute Infrastructure team at LinkedIn, you will be charged with building the next-generation ... infrastructure and platforms for LinkedIn. This is a unique...multi-clusters solutions, automate upgrades, and intelligently detect and remediate cluster health, etc. In this role, you will be… more
- Google (Sunnyvale, CA)
- Staff Software Engineer, Emerging On-prem AI Infrastructure Google Kirkland, WA, USA; Sunnyvale, CA, USA **Advanced** Experience owning ... + 5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute...on and is growing every day. As a software engineer, you will work on a specific project critical… more
- Ochsner Health (New Orleans, LA)
- …provides advanced engineering, administration, and operational support for core infrastructure platforms with a focus on Nutanix, virtualization technologies, Active ... services, and enterprise DNS, DHCP, IPAM, and NTP through InfoBlox. The engineer ensures platform stability, optimizes performance, strengthens identity and network… more