- NVIDIA (Santa Clara, CA)
- …and drive foundational improvements and automation to improve researchers' productivity. As a Site Reliability Engineer, you are responsible for the big ... and operating large-scale compute infrastructure + Proven experience in site reliability engineering for high-performance computing environments with operational…
- NVIDIA (Santa Clara, CA)
- …drive foundational improvements and automation to improve engineers' productivity. As a Site Reliability Engineer, you are responsible for the big ... comprehensive troubleshooting from bare metal to application level, ensuring system reliability and efficiency. + Develop, define and document standard methodologies…
- ServiceNow, Inc. (Santa Clara, CA)
- …in the future. **As a Senior Staff Machine Learning Engineer - Site Reliability Engineer, you will:** + Contribute to the design, development and ... It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today…
- NVIDIA (Santa Clara, CA)
- Join our team in Santa Clara, CA, USA as a Senior Site Reliability Engineer. At NVIDIA, you'll be part of the team shaping the future of computing and ... guaranteeing the smooth operation of our brand-new technologies. Our mission is to leverage AI's power to build outstanding and pioneering solutions that have a significant impact on the world. What you'll be doing: + Own the solutions you build, collaborating…
- Palo Alto Networks (Santa Clara, CA)
- …configuration management with a framework such as Terraform, Helm + Experience in Site Reliability Engineering, Production Engineering, or DevOps + Passion for ... Networks runs a large infrastructure and is one of the largest GCP customers. As a Senior Staff DevOps Engineer for the App Services team, you will be part of…
- Palo Alto Networks (Santa Clara, CA)
- …and Alerts Management - Clear understanding of incident and alerts management in Site Reliability Engineering + DevOps/SRE Expertise - 4+ years of experience ... our systems' performance and health. **Your Impact** As a Senior SRE with the Cortex Cloud Security Posture Management...influence the operability of the product and ensure the reliability and availability of our services **Your Experience** +…
- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large-scale production systems with high efficiency and ... internal- and external-facing GPU cloud services run with maximum reliability and uptime as promised to the users and...be doing: + Design, implement and support operational and reliability aspects of large-scale Kubernetes clusters with focus…
- Palo Alto Networks (Santa Clara, CA)
- …actionable insights into our systems' performance and health. **Your Impact** As a Senior Staff SRE with the Cortex Observability team, you will: + Cloud Expertise: ... influence the operability of the product and ensure the reliability and availability of our services. **Your Experience** +...DevOps/SRE Expertise: 5+ years of experience as a DevOps/SRE engineer with a passion for technology and a strong…
- NVIDIA (Santa Clara, CA)
- …large-scale systems supporting critical use cases for AI Infrastructure, driving reliability, operability, and scalability across global public and private clouds. + ... + Build tools and frameworks to improve observability, define actionable reliability metrics, and enable fast issue resolution, driving continuous improvement in…
- Rubrik (Palo Alto, CA)
- …and services with the objective of achieving and exceeding availability and reliability goals * Manage and streamline monitoring systems to enhance observability and ... visibility * Perform Production Readiness Assessments of new services to identify reliability needs and surface potential gaps * Develop and maintain documentation…
- Rubrik (Palo Alto, CA)
- …public cloud technologies + Minimum 1-3 years of experience as a Development, DevOps or Site Reliability Engineer + Willing to provide 24/7 coverage + Strong ... and like to win, we want to talk to you! **About The Role:** Sr. Site Reliability Engineers at Rubrik are systems/software engineers who ensure that Rubrik's…
- Walmart (Sunnyvale, CA)
- …of orders daily through our high-performance checkout services running in Edge and Cloud. As a Site Reliability Engineer in the CPC Team, you will work with ... criteria (for example, probability of failure, frequency of failure) to measure site reliability. Monitors site reliability conditions and new…
- General Motors (Mountain View, CA)
- …+ Participate in on-call engineering duty to support production. + Instill Site Reliability best practice through automation, data insights, and observability ... future for generations to come. In this SRE SW Engineer role, you will develop and maintain key elements...and maintain key elements of the infrastructure health and reliability monitoring for GM's commercial fleet. We are an…
- Amazon (Cupertino, CA)
- …designs cutting-edge AI platforms for the world's largest Cloud Services provider. As a Senior Reliability Engineer, you will engage with an experienced ... Description The Trainium Manufacturing, Quality and Reliability (MQR) Team is part of AWS Annapurna Labs focused on Machine Learning products that…
- Amazon (Sunnyvale, CA)
- Description As a Sr. Embedded SW Engineer, you will collaborate with internal cross-functional teams to deliver software solutions for a variety of customer ... and maintainability. A day in the life As a Senior Embedded Software Development Engineer, you will:...in ways we can't even imagine yet. As a Sr. Embedded Software Development Engineer, you will…
- Amazon (Sunnyvale, CA)
- …General Intelligence (AGI) team is looking for a passionate, talented, and inventive Sr. Software Development Engineer (Sr. SDE)/Machine Learning Engineer ... multi-modal and multi-lingual Large Language Models (LLM). As our Sr. SDE/MLE superstar, you'll have the power to lead...5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience…
- Amazon (Sunnyvale, CA)
- Description As a Sr. Embedded SW Engineer, you will collaborate with internal cross-functional teams to deliver software solutions for a variety of customer ... peer environment. Key job responsibilities Seeking a highly motivated senior software engineer who will help drive...in ways we can't even imagine yet. As a Sr. Software Development Engineer, you will be…
- Amazon (Sunnyvale, CA)
- Description We are looking for an experienced and motivated Sr. Software Development Engineer to join Amazon's Smart Eyewear initiative on the Emerging Devices ... of systems, components, make design decisions, and contribute to the scalability, reliability, and performance of our systems. You'll also have opportunities to work…
- J&J Family of Companies (Santa Clara, CA)
- …and implementing changes to improve manufacturing quality and product performance. The Senior Engineer will assist in the identification and pursuit of ... - External** Johnson & Johnson is hiring for a **Sr. Manufacturing Engineer, Shockwave Medical** to join...for the treatment of calcified plaque. **Position Overview** The Senior Manufacturing Mechanical Engineer will be responsible…
- Amazon (Palo Alto, CA)
- Description Sr. Software Development Engineer - Sponsored Products Amazon is investing heavily in building a world-class advertising business and we are ... relentlessly obsessed with making connections with customers. As a Sr. software engineer, you will be responsible...5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience…