• Production Engineer

    Meta (Sunnyvale, CA)
    …efforts underway in the company. Relevant industry experience is important (Software Engineer, Site Reliability Engineer (SRE), Systems Engineer, ... **Summary:** Meta is seeking an experienced engineer to join our Production Engineering team. Production Engineers at Meta are hybrid software/systems/infrastructure… more
    Meta (07/31/24)
  • Production Engineer

    Meta (Sunnyvale, CA)
    …new grads to industry experts. Relevant industry experience is important (Software Engineer, Site Reliability Engineer (SRE), Systems Engineer, ... we are always learning. This position is full-time. **Required Skills:** Production Engineer Responsibilities: 1. Own back-end services which handle fleet management,… more
    Meta (07/19/24)
  • DevOps Engineer

    Zoom (San Jose, CA)
    …processes adhere to regulatory requirements. What we're looking for: + 5+ years in a Site Reliability Engineer (SRE) or DevOps Engineer role + ... Experience deploying and managing Web/Java-based applications through EKS/Kubernetes, ArgoCD/GIT/Ansible, NGINX, Redhat/Ubuntu, AWS, Python/Shell scripting. + Experience with containerization and orchestration tools (e.g., Docker, Kubernetes) + Have proficiency… more
    Zoom (09/17/24)
  • Senior Production SRE Engineer

    NVIDIA (Santa Clara, CA)
    Site Reliability Engineering (SRE) is an engineering discipline that involves designing, building, and maintaining large-scale production systems with high ... software and systems engineering practices, storage, data management, and services. SRE professionals are highly specialized and possess expertise in different… more
    NVIDIA (08/10/24)
  • Senior Network SRE and Automation…

    NVIDIA (Santa Clara, CA)
    …to Deep Learning frameworks. To achieve this goal, we are looking for an engineer who has a deep understanding of L3 underlay and overlay networks, outstanding ... troubleshoots, and analyzes system disruptions and develops solutions for improved reliability + Owning and driving integrations with various service APIs such… more
    NVIDIA (07/21/24)
  • Manager, Site Reliability

    NVIDIA (Santa Clara, CA)
    …everyone is inspired to do their best work. We are looking for a Manager for Site Reliability Engineering to help build and lead its cloud service team for ... SLOs. We partner with Service Owners to drive the reliability of the service. What you will be doing: ... availability and performance of critically meaningful services in a live-site production environment, either as an SRE… more
    NVIDIA (08/10/24)
  • Senior Site Reliability

    NVIDIA (Santa Clara, CA)
    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large-scale production systems with high efficiency ... open source cloud enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal and external ... internal and external facing GPU cloud services run with maximum reliability and uptime as promised to the users and… more
    NVIDIA (08/21/24)
  • Senior Engineer, Applications, Site

    General Motors (Mountain View, CA)
    …to pioneer next-generation data solutions to support all our business units. The site reliability engineering team has a mission to relentlessly pursue the ... availability, and operations for our applications. As a senior engineer, you'll be building patterns for reliability, ... experiences to life. As a key member of our SRE team, you'll have the opportunity to shape the… more
    General Motors (09/19/24)
  • Principal Site Reliability

    Palo Alto Networks (Santa Clara, CA)
    …is the market leader in this space. We are seeking development-heavy Site Reliability Engineers to design, build, maintain, and scale production services ... to improve scalability in networking like BGP, OSPF, service reliability, capacity, and performance + Collaborate with development teams ... the product **The Team** As a member of the SRE team, you will work on producing mission-critical platforms,… more
    Palo Alto Networks (09/17/24)
  • Site Reliability Engineer

    TEKsystems (Cupertino, CA)
    …in proprietary tool boxes and it's predicted to reach 100% in 3 months, SRE would help detect that and procure and provision additional hardware. Another example is ... when the C* SRE communicates to the dev team that the storage of ... the Workflow Platform. In the case of K8s workloads, SRE will work with Platform team to automate the… more
    TEKsystems (09/14/24)
  • Site Reliability Engineer

    JPMorgan Chase (Palo Alto, CA)
    …field of study plus 5 years of experience in the job offered or as Site Reliability Engineer, Systems Administrator, Application Engineer, Technical ... engineering and operating JPMC's cloud infrastructure and platforms ensuring reliability, resiliency, security, availability, and performance. Diagnose and repair… more
    JPMorgan Chase (09/11/24)
  • Site Reliability Engineer L5…

    Netflix (Los Gatos, CA)
    …pipeline team and day-to-day live-streaming operations for Netflix. As a Live Streaming Pipeline SRE, you will be responsible for the reliability of our live ... the world is a hard challenge, demanding exceptional levels of stability and reliability from dozens of services and systems between camera and device screens. About… more
    Netflix (07/25/24)
  • Lead Site Reliability

    EPAM Systems (San Jose, CA)
    We are looking for a candidate to join a multi-functional SRE team with the focus on Google Cloud Platform. You should have cloud engineering experience in such ... areas acting as the SME on operation automation and monitoring, identifying TOIL within the team's existing systems and processes, and recommending and implementing automated solutions to reduce TOIL and improve the efficiency and effectiveness of the team.… more
    EPAM Systems (08/29/24)
  • Senior DevOps Engineer - DGX Cloud

    NVIDIA (Santa Clara, CA)
    …up its AI Infrastructure. We expect you to have significant experience with site reliability principles and techniques including reliability assessments, ... solutions for a broad range of AI-based applications. If you're creative, passionate about SRE, and love having fun, please apply today! For two decades, we have… more
    NVIDIA (08/29/24)
  • Staff Engineer - Cloudera-Hadoop - Big Data…

    ServiceNow, Inc. (Santa Clara, CA)
    …mitigating or minimizing any impact on Big Data applications. Collaborate closely with Site Reliability Engineers (SRE), Customer Support (CS), Developers, ... sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how ... green card, will be considered. As a **Staff DevOps Engineer - Hadoop Admin** on our **Big Data Federal Team** you… more
    ServiceNow, Inc. (08/27/24)
  • Staff Hadoop Administrator - Cloudera / BigData

    ServiceNow, Inc. (Santa Clara, CA)
    …mitigating or minimizing any impact on Big Data applications. Collaborate closely with Site Reliability Engineers (SRE), Customer Support (CS), Developers, ... Any employment is contingent upon passing the screening. As a **Staff DevOps Engineer - Hadoop Admin** on our **Big Data Federal Team** you will help deliver 24x7… more
    ServiceNow, Inc. (09/19/24)