- NVIDIA (Santa Clara, CA)
- …the first people to make them operational in production? We are seeking a dedicated Cluster Deployment Operations Engineer to support product deployments ... team, acting as the link between engineering and the NVIS field team for cluster deployment and management solutions! We bridge the gap between product roadmaps… more
- NVIDIA (Santa Clara, CA)
- …the world's most advanced computing workloads. NVIDIA is looking for an AI/ML HPC Cluster Engineer to join our MARS team. You will provide technical engagement ... and problem solving on the management of large-scale HPC systems including the deployment of compute, networking, and storage. You will be working with a team of… more
- Walmart (Sunnyvale, CA)
- …remediation. **Automation & Observability** + Build and standardize automation for cluster deployment, expansion, and monitoring using Ansible, Terraform, and ... **Position Summary** We are seeking a highly skilled Principal Engineer (Ceph/Scale-Out Storage) with 10+ years of deep technical experience in distributed storage… more
- Cadence Design Systems, Inc. (San Jose, CA)
- …world of technology. We are seeking a highly skilled and experienced AI Systems Engineer to join our team. This is a hands-on, senior individual contributor role ... that will be pivotal in leading the development, operations, and support of our entire AI infrastructure. You ... services on both GCP and Azure. + Hands-on GPU Cluster Management: Take a leadership role in the configuration,… more
- NVIDIA (Santa Clara, CA)
- …operations, and networking, familiarity with software testing and deployment, familiarity with distributed systems, and excellent communication and planning ... management systems (Kubernetes, SLURM). Hands-on experience in Machine Learning Operations. Hands-on experience with Bright Cluster Manager. + Hands-on… more
- NVIDIA (Santa Clara, CA)
- …Artifactory, Jira) in hybrid on-premise and cloud environments. + Assist with cluster operations and system administration (managing: servers, team accounts, ... dedicated and motivated senior build and continuous integration (CI/CD) engineer for its GenAI Frameworks (Megatron-LM (https://github.com/NVIDIA/Megatron-LM) and NeMo… more
- NVIDIA (Santa Clara, CA)
- We are looking for a Senior Software Development Engineer in Test (SDET) to join our New GPU Integration (NPI) team for NVIDIA's Enterprise Compute SWQA team. Are you ... to have your skills on the team! As an engineer on this New Platform GPU Integration team, you ... tools to significantly enhance our testing capabilities and streamline operations for more efficient and accurate results. + Improve… more
- Google (Sunnyvale, CA)
- Staff Software Engineer, AI/ML Infrastructure. Google. Kirkland, WA, USA; Sunnyvale, CA, USA. **Advanced** Experience owning outcomes and ... experience with ML design and ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine tuning). **Preferred ... on and is growing every day. As a software engineer, you will work on a specific project critical… more
- Microsoft Corporation (Mountain View, CA)
- …metrics visualization to support operational efficiency. + Manage GPU cluster operations (scheduling, isolation, utilization), high-performance computing (HPC), ... build the infrastructure that powers training, evaluation, and data platforms for reliable deployment of world-class foundational AI models. We are on a mission to… more
- Google (Sunnyvale, CA)
- Staff Software Engineer, AI Infrastructure. Google. Kirkland, WA, USA; Sunnyvale, CA, USA. **Advanced** Experience owning outcomes and decision ... on and is growing every day. As a software engineer, you will work on a specific project critical ... clusters using the latest technologies for AI acceleration and cluster interconnects and networking. We're building the AI infrastructure… more
- NVIDIA (Santa Clara, CA)
- …experience. Ways to stand out from the crowd: + Knowledge of cloud and cluster level deployment and management systems. + Experience with GPU computing (CUDA), ... NVIDIA is seeking a Senior Firmware Engineer to join our CSP Engagements team, focusing ... hardware and software, driving technical solutions from concept through deployment. What you will be doing: + Design and… more
- Google (Sunnyvale, CA)
- Senior Staff Software Engineer, SRE, ML Fleet Systems. Google. Sunnyvale, CA, USA; Kirkland, WA, USA; +2 more. **Advanced** Experience ... of resource management systems (e.g., compute infrastructure, Kubernetes, Flex), cluster management, and scheduling algorithms. + Familiarity with Machine Learning… more
- Insight Global (San Jose, CA)
- Job Description A large software and networking company is looking for a Site Reliability Engineer to join a growing team (ideally Hybrid in San Jose or RTP). This ... supporting Shell scripting - Build tools and automation to reduce manual operations and improve team efficiency - Integrate and maintain secrets management using… more
- NVIDIA (Santa Clara, CA)
- …of artificial intelligence. Join our team at NVIDIA as a Senior Site Reliability Engineer focused on HPC storage and play a crucial role in designing, implementing, ... deploying distributed storage solutions, building automation tools, and ensuring the efficient operations of our growing IT ecosystem. You will collaborate closely with… more
- Meta (Fremont, CA)
- …Introduction (NPI) Roadmap success. As the Network is essential to every mega cluster and Meta's data center success, its roadmap is rapidly expanding in volume, ... Engineering, Legal, Finance, Accounting, Compliance) across multiple domains. The NPI Operations Organization supports complex roadmaps to deliver NPI programs on… more
- Google (Sunnyvale, CA)
- …**About the job** Like Google's own ambitions, the work of a Software Engineer goes beyond just Search. Software Engineering Managers have not only the technical ... multiple teams and locations, a large product budget and oversee the deployment of large-scale projects across multiple sites internationally. Google is looking to… more