- IBM (San Jose, CA)
- …thrive. **Your role and responsibilities** We are seeking a Sr Customer Support / SRE to join our team who is responsible for delivering Astra Streaming (Apache ... **Required technical and professional expertise** * 5+ years of experience in SRE, DevOps, or Production Engineering for large-scale distributed systems. * Deep… more
- Abbott (Pleasanton, CA)
- …and compliant with healthcare regulations, this is the role for you. As a Senior SRE, you'll work closely with engineering, QA, cybersecurity, and regulatory ... and scientists. **The Opportunity** We're looking for a strong **Senior Site Reliability Engineer (SRE)** who's ready to roll up their sleeves and make… more
- Insight Global (Santa Clara, CA)
- …fast-paced Infrastructure, Planning and Processes organization where you will be working as a Senior SRE Engineer. The position will be part of a fast-paced crew ... Job Description Insight Global is looking for a seasoned SRE to join one of our largest technology clients' ... cater to their infrastructure & systems needs. As an SRE, you'll also be working in conjunction with various… more
- Palo Alto Networks (Santa Clara, CA)
- …we all win with precision. **Your Career** We are actively seeking a highly motivated DevOps/SRE Senior Manager to lead our Global InfoSec SRE team, based ... and resolving high-priority production maintenance issues and incidents. As the Senior Manager for the Infosec SRE Group, you will lead a team in a cutting-edge… more
- NVIDIA (Santa Clara, CA)
- …robust, automated, and secure production environments. We are seeking a deeply skilled Senior Staff Site Reliability Engineer (SRE) to advance our enterprise ... experience (or equivalent experience). + 10+ years of software engineering/DevOps/SRE experience, with a significant focus on operational security, automation,… more
- ServiceNow, Inc. (Pleasanton, CA)
- …next-generation analytics to support ServiceNow's Cloud and AI growth. As our Senior Staff DevOps Engineer for Cloud Analytics & FinOps Engineering Platform, you ... SLIs/SLOs/SLAs for data platform services with error budget management, establish SRE practices including toil reduction and capacity planning, and create… more
- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and ... open source cloud enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal and external ... while keeping an eye on capacity, latency and performance. SRE is also a mindset and a set of… more
- NVIDIA (Santa Clara, CA)
- …the world's most powerful GPU systems. Join our top team and apply your SRE and software engineering skills to craft robust, user-friendly platforms for seamless ML ... reproducibility and scalability across large-scale, distributed GPU clusters. + Apply SRE principles to diagnose, troubleshoot, and resolve complex system issues… more
- NVIDIA (Santa Clara, CA)
- GeForce Now is looking for a Manager, Network Site Reliability Engineer (SRE) to enhance our network infrastructure and operations. We are looking for a leader who ... ensuring a smooth user experience. The position focuses on managing Network SRE to streamline network operations, minimize manual tasks, and achieve service level… more
- pony.ai (Fremont, CA)
- …Pony.ai went public at NASDAQ in November 2024. Responsibilities As a (Senior) Kubernetes Engineer, you will: + Design, operate, and optimize Kubernetes clusters ... security policies, and operational guidelines. + Contribute to observability and SRE practices to ensure reliability at scale (SLOs, incident reviews, metrics-driven… more
- ServiceNow, Inc. (Santa Clara, CA)
- …AI technologies that unlock new work experiences in the future. **As a Senior Staff Machine Learning Engineer you will:** + Contribute to the design, development ... well, and remain reliable. + Contribute to the continuous improvement of the SRE practice by turning operational use cases into requirements for software tooling. +… more
- NVIDIA (Santa Clara, CA)
- …well as managing vendor relationships. You will partner with engineering, SRE, product, and third-party infrastructure providers to achieve operational excellence. ... operational excellence best practices across all infrastructure providers, partnering with SRE, infra, product, and security teams + Define and operationalize… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is seeking a Senior Technical Program Manager to lead the Infrastructure and Product Security and Compliance program for DGX Cloud. In this role, you will ... highest standards of trust, resilience, and governance. As a Senior TPM focused on Cloud Security, you will own ... and processes, establishing security KPIs, dashboards, and "run safe" SRE practices. + Partner with the CISO organization to… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is looking for an outstanding, passionate, and talented Senior AI Infrastructure Engineer to join our DGX Cloud group. This engineering role will design, ... and open source cloud enabling technologies like Kubernetes and OpenStack. DGX Cloud SRE at NVIDIA ensures that our internal and external facing GPU cloud services… more
- NVIDIA (Santa Clara, CA)
- …and periodic health checks for hardware components. We are looking for a Senior Software Engineer with experience in building highly scalable and robust enterprise ... collaboration with teams across various departments with the goal of reducing SRE toil and improving hardware utilization + Collaborating with various organizations… more
- NVIDIA (Santa Clara, CA)
- We are looking for a Senior AI Infrastructure Engineer (AI Tooling) to design and build the backend systems and infrastructure powering our internal AI tools and ... workflows + Collaborate closely with Incident Commanders, incident response, and SRE teams to integrate AI-driven automation and analytics into operational workflows… more
- Palo Alto Networks (Santa Clara, CA)
- …Alto Networks is disrupting the Cyber Security industry! We are looking for a Senior Cloud and firewall Engineer to join our Infosec team that owns, securing and ... Center, including Vulnerability Management, Network Engineering, OS Engineering, and product SRE. + Prioritize and respond to critical vulnerabilities and data… more
- Palo Alto Networks (Santa Clara, CA)
- …**Your Career** We are seeking a highly skilled and motivated Senior Manager, Performance Engineering to lead our global performance engineering organization. ... to define scalability goals and capacity models. + Work with DevOps/SRE to ensure production readiness, observability, and monitoring align with performance… more
- NVIDIA (Santa Clara, CA)
- …upon which every new AI-powered application is built. We are seeking a Senior Software Engineer focused on container and cloud infrastructure. You will help design ... dependency management, and artifact/registry topology. + Collaborate across research, backend, SRE, and product teams to ensure day-0 availability of new models.… more
- Palo Alto Networks (Santa Clara, CA)
- …of security products. We are looking for a highly skilled and experienced Senior Staff Engineer to join our dynamic backend engineering team, specifically focused on ... cross-functional teams, including product management, frontend engineers, security researchers, and SRE, to define, design, and ship new features and platform… more