- Rivian (Palo Alto, CA)
- …more sustainable for everyone. Role Summary: We are seeking a Senior Site Reliability Engineer (SRE) specializing in Observability to join RivianVW's Data ... Computer Science, Engineering, or equivalent practical experience. Experience: 5+ years in Site Reliability Engineering or a related role with a strong emphasis…
- NVIDIA Corporation (Santa Clara, CA)
- Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and ... you'll be doing: Design, implement and support operational and reliability aspects of large scale Observability & Telemetry collection platform with a focus on…
- Rivian (Palo Alto, CA)
- A leading automotive technology firm is seeking a Senior Site Reliability Engineer specializing in Observability to enhance their Data Platform. This ... role involves designing observability systems, collaborating with cross-functional teams, and ensuring the reliability of production environments. The ideal candidate will have…
- Promote Project (Santa Clara, CA)
- Senior Site Reliability Engineer ML Platforms Location 60000 - 135000 a year (s) Description Are you passionate about building and maintaining large-scale ... culture? If so, we have a great opportunity for you! NVIDIA is seeking a Senior Site Reliability Engineer (SRE) for the Data Science & ML Platform(s) team.…
- NVIDIA Corporation (Santa Clara, CA)
- …NTP/PTP, DHCP, and LDAP. This includes building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity ... optimizations (SR-IOV/DPU) * Experience with technologies like eBPF and XDP for Observability & DDoS mitigation * Collect and review system data for capacity and…
- NVIDIA Corporation (Santa Clara, CA)
- A leading technology company is seeking a Site Reliability Engineer to design and maintain large-scale production systems with a focus on observability ... and reliability. The ideal candidate will have 5+ years of experience in infrastructure automation and distributed systems, along with strong skills in Python and cloud technologies like Kubernetes. This position offers a competitive salary based on…
- IBM Computing (San Jose, CA)
- A leading global technology firm is seeking a Site Reliability Engineer to join their team in San Jose, California. This role involves monitoring production ... systems, troubleshooting issues, and utilizing CI/CD tools for deployment. Candidates should have 1-3 years of experience in system monitoring and automation, as well as proficiency in Linux. The company offers a collaborative environment focused on innovation…
- Kognitos (Mountain View, CA)
- …this hybrid role, you'll operate at the intersection of Developer Productivity and Site Reliability Engineering (SRE). You'll design, implement, and maintain the ... Lead Infrastructure Engineer We're looking for a Lead Infrastructure ... scale the core systems powering developer velocity and platform reliability at Kognitos. If you're passionate about Terraform, scalable…
- Cisco Systems (San Jose, CA)
- Meet the Team The Splunk Observability Application Platform Team is a dynamic group of engineers responsible for the core platform powering Splunk Observability ... Cloud. Our platform is the foundation for advanced observability capabilities that enable our customers to thrive. We operate in small, high-performing teams,…
- jobr.pro (Mountain View, CA)
- …engineers. Drive adoption of best practices in distributed system design, observability, reliability engineering, and modern DevOps tooling. Cross-Functional ... per week) About the Role ID.me is seeking a Staff Software Development Engineer to help design and build the next-generation SaaS Commerce Platform and Developer…
- Kloudfuse (Cupertino, CA)
- …and led by the leading investors in Silicon Valley. Kloudfuse AI takes observability data to the next level. Our HawkEye and BullsEye analytics engine generates ... architecting and developing production web-scale systems (monitoring, telemetry, performance, reliability, triage and debug) Experience building and maintaining large…
- Cisco Systems (San Jose, CA)
- …and Cisco's global engineering capabilities. Our work spans networking, security, observability, and customer experience - designing and deploying foundation models ... that enhance reliability, strengthen security, prevent downtime, and deliver predictive insights across Splunk Observability, Security, and Platform at enterprise scale. You'll be…
- Develop Health Inc. (Menlo Park, CA)
- …rapidly following a major funding round. About The Role: We're hiring an AI Engineer to take models from prototype to production and drive real clinical impact ... optimizing inputs, retrieval, and context to deliver cutting‑edge performance and reliability. Design and maintain evaluation pipelines that rigorously measure model…
- Develop Health Inc. (Menlo Park, CA)
- …to bring frontier LLM capabilities into real‑world workflows, ensuring performance, usability, and reliability. Working on‑site in Menlo Park at least three days ... a major funding round. About The Role: We're hiring a Senior Full Stack Engineer to design and ship the systems and interfaces that power our AI‑driven healthcare…
- Athena LLC. (Palo Alto, CA)
- …with expertise in Google Cloud Platform (GCP), development tools, distributed systems, Site Reliability Engineering (SRE), observability, and DevOps ... **SRE and Observability:** Implement SRE best practices to enhance system reliability and performance, and build observability into the infrastructure to…
- Pantera Capital (Palo Alto, CA)
- …and accurately share knowledge with their teammates. About the Role As a Data Center Site Reliability Engineer (SRE) at xAI, you will play a pivotal ... or a related technical field (or equivalent experience). 5+ years in site reliability engineering, data center operations, or large‑scale infrastructure…
- SayPro (Palo Alto, CA)
- …(physical, virtual and cloud) (A, I) Proven track record delivering high‑availability, multi‑site architectures (A, I) Monitoring and observability using Azure ... summary About the Role We're looking for a highly skilled Senior Infrastructure Engineer to design, implement, and support the core systems and services that drive…
- Athena LLC. (Palo Alto, CA)
- …in cloud platforms along with a deep understanding of distributed systems, site reliability engineering (SRE), observability, and DevOps ... trained assistants leveraging highly trained AI. **Role Overview** The Infrastructure Staff Engineer will spearhead Athena's infrastructure initiatives, focusing on creating…
- Neara (Palo Alto, CA)
- Job type: Full Time Department: Backend Engineer Work type: On-Site About Archetype AI Archetype AI is developing the world's first AI platform to bring AI into ... jobsarchetypeaiio. About the Role We're looking for a highly motivated backend engineer with a passion for building performant, scalable, and resilient distributed…
- Neara (Palo Alto, CA)
- Job type: Full Time. Department: Backend Engineer. Work type: On-Site About Archetype AI Archetype AI is developing the world's first AI platform to bring AI ... io. About the Role We're looking for a highly motivated backend engineer with a passion for building performant, scalable, and resilient distributed systems.…