• DataRobot (San Francisco, CA)
    …We are the internal equivalent of a hyperscale cloud provider's core compute service, obsessed with performance, efficiency, and enabling the future of AI. As a ... Senior Backend Engineer in our AI Compute team, your...to once-per-day to once-per-hour without sacrificing performance, security, or reliability + Work with Product, Legal and Security to… more
    DirectEmployers Association (10/23/25)
  • Automation Anywhere, Inc. (San Jose, CA)
    …focus on incident management, root cause analysis (RCA), and continuous service reliability improvement. + Excellent communication and stakeholder management ... human potential. **Our Opportunity:** We are seeking a dynamic Senior Manager of Cloud Engineering & Operations ...monitoring, alerting, and incident management processes to ensure high service uptime and reliability. + Ensure robust… more
    DirectEmployers Association (09/30/25)
  • SLAC National Accelerator Laboratory (Menlo Park, CA)
    …for multiple SLAC projects and accelerator facilities especially the design and operations of SLAC's superconducting and normal Linear Coherent Light Source (LCLS). ... Division Director is a key member of the AD's Senior Management team and contributes to the vision and...in support of AD's and SLAC's mission in accelerator operations and development projects in the disciplines including accelerator… more
    DirectEmployers Association (09/17/25)
  • NetApp (San Jose, CA)
    **Job Summary** We are seeking an exceptional Senior Software Engineer to join our team and drive innovation in our cloud-based test platform used in manufacturing ... Architect scalable cloud solutions that support high-volume manufacturing test operations + Collaborate with cross-functional teams including manufacturing, quality… more
    DirectEmployers Association (10/10/25)
  • Senior Service Reliability

    NVIDIA (Santa Clara, CA)
    …Administrator/DevOps engineers to design, develop and implement a global, dynamic, innovative Service Reliability Operations Center, to provide extraordinary ... you will partner with other key members of our organization including Site Reliability Engineering, Security Operations Center, DevOps teams, and other partners… more
    NVIDIA (10/06/25)
  • Senior Manager, Network Site…

    NVIDIA (Santa Clara, CA)
    …reduce network disruptions and decrease Mean Time to Recovery (MTTR), improving overall service reliability and user satisfaction. + Work closely with Security ... GeForce Now is looking for a Manager, Network Site Reliability Engineer (SRE) to enhance our network infrastructure and...position focuses on managing Network SRE to streamline network operations, minimize manual tasks, and achieve service… more
    NVIDIA (08/08/25)
  • Senior Site Reliability Engineer…

    NVIDIA (Santa Clara, CA)
    …accelerate the next wave of artificial intelligence. Join our team at NVIDIA as a Senior Site reliability engineer focused on HPC storage and play a crucial role ... deploying distributed storage solutions, build automation tools, and ensuring the efficient operations of our growing IT ecosystem. You will collaborate closely with… more
    NVIDIA (08/21/25)
  • Site Reliability Engineer ( Senior

    MongoDB (San Francisco, CA)
    …and from the public internet. Their responsibilities encompass network architecture, service mesh, and edge load balancing, ensuring customer data remains safe ... region. **Role Overview** We are seeking a talented Site Reliability Engineer (SRE) with a strong networking background to...that benefit end-users + Value efficiency in processes and operations, and display a strong preference for automation over… more
    MongoDB (10/07/25)
  • Sr. Quality & Reliability Engineer,…

    Amazon (Cupertino, CA)
    …cutting AI platforms for the world's largest Cloud Services provider. As a Senior Reliability Engineer you will engage with an experienced cross-disciplinary ... Description The Trainium Manufacturing, Quality and Reliability (MQR) Team is part of AWS Annapurna...with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws… more
    Amazon (09/05/25)
  • Sr Manager, Site Reliability Engineering…

    Palo Alto Networks (Santa Clara, CA)
    …+ Experience in navigating the complexities of business requirements and ensuring high service reliability in a dynamic environment. + US citizenship for FedRAMP ... Career** We are actively seeking a highly motivated DevOps/SRE Senior Manager to lead our Global InfoSec SRE team,...The InfoSec SRE group is fundamental to ensuring the reliability and availability of the production environment that hosts… more
    Palo Alto Networks (09/17/25)
  • Operations Technology Consultant-Specialist…

    Deloitte (San Jose, CA)
    Operations Technology Consultant-Specialist Senior AI & Engineering Join our AI & Engineering team in transforming technology platforms, driving innovation, and ... reliable, and optimized for OT. + Oversee edge and AI infrastructure operations, including monitoring, reliability engineering, and AI-driven operations… more
    Deloitte (10/23/25)
  • Senior Director - IT Operations

    PagerDuty (San Francisco, CA)
    PagerDuty, Inc. (NYSE:PD) is a global leader in digital operations management. Trusted by nearly half of both the Fortune 500 and the Forbes AI 50, as well as ... is growing, and we are looking for an experienced Senior Director of Corporate IT (End User Services/AI, Systems...Systems Engineering, and Enterprise Security) to lead our IT Operations at scale. In this role, you will architect,… more
    PagerDuty (09/13/25)
  • Layer 4-7 F5 Nginx - Senior Network…

    ServiceNow, Inc. (Santa Clara, CA)
    …24/7 on-call rotation, including weekends, as part of the Network Operations Engineering team. + Maintain software-defined, declarative infrastructure at scale using ... latency, and availability to detect and prevent outages. + Support network operations in private and hybrid multi-cloud environments (e.g., Azure, AWS, GCP). +… more
    ServiceNow, Inc. (10/17/25)
  • Senior Manager, Facilities Data & Analytics…

    Roche (South San Francisco, CA)
    **The Position** **We are seeking a strategic and visionary Senior Manager, Facilities Data & Analytics to build and lead the data foundation for our Facilities & ... leader will transform how we leverage data to enhance infrastructure reliability, optimize capital planning, and drive operational excellence across our building… more
    Roche (10/03/25)
  • Senior, Software Engineer

    Walmart (Sunnyvale, CA)
    …through our high-performance checkout services running in Edge and Cloud. As a Site Reliability Engineer in the CPC Team, you will work with L2, Other dependent ... tools, and processes that will ensure the highest levels of availability and reliability of CPC applications. About Team: Our team works closely with our US… more
    Walmart (08/15/25)
  • Senior Service Engineer

    Saildrone (Alameda, CA)
    …designs to drive reliability, safety and cost improvements. Work with Fleet Service Managers to define complexity of service and diagnostic work in support ... below the surface to support border security, law enforcement, naval operations, and undersea infrastructure protection. Headquartered in Alameda, CA, with offices… more
    Saildrone (09/11/25)
  • Senior Manager, Machine Learning Engineer

    Cisco (San Jose, CA)
    …for our Generative AI Assistant products. Your skills in understanding service operations and architecting/building effective interfaces between service ... Senior Manager, Machine Learning Engineer Apply (https://jobs.cisco.com/jobs/Login?projectId=1449940) +...software development engineer **Preferred Qualifications** + Experience in AI service operations, including Generative AI, and the… more
    Cisco (10/15/25)
  • Senior Software Engineer - Azure Storage

    Microsoft Corporation (Mountain View, CA)
    …as a Service), containers, serverless, BareMetal, and more. As a Senior Software Engineer - Azure Storage, you will design and implement protocols and ... in designing, developing and building large-scale distributed backend systems including operations, performance, reliability, resilience, and scale-out. + 4+… more
    Microsoft Corporation (10/29/25)
  • Senior Software Engineer

    Microsoft Corporation (Mountain View, CA)
    …and delivering high-quality solutions. + 2 years of experience with service reliability engineering and incident management for mission-critical systems. ... elevating existing standards and introducing innovations that set a new benchmark for reliability and resilience. As a Senior Software Engineer, you will design… more
    Microsoft Corporation (10/24/25)
  • Senior PCBA Manufacturing Engineer, Project…

    Amazon (San Jose, CA)
    …- Drive closure of all operational issues during pre-production builds. Work within Operations, Supply Chain, R&D, and Reliability Engineering to implement and ... prioritize issues for whole program. You use data to drive manufacturability, reliability, and quality objectives in design and development teams. Key job… more
    Amazon (08/08/25)