- Rubrik (Palo Alto, CA)
- …infrastructure services run smoothly and have the capacity for future growth. As a Senior Site Reliability Engineer, you will be responsible for: + Ensure we ... technologies + Minimum 3-5 years of experience as a Development, DevOps or Site Reliability Engineer + Willing to provide 24/7 coverage + Strong Documentation…
- Cisco (San Jose, CA)
- …improve our infrastructure and customer experience. Who You Are: You are an innovative Site Reliability Engineer with experience in managing large-scale SaaS ... * Bachelor's degree with a minimum of 3 years of experience as a Site Reliability Engineer or DevOps Engineer * 7+ years' experience working…
- JPMorgan Chase (Palo Alto, CA)
- …field of study plus 5 years of experience in the job offered or as a Site Reliability Engineer, Systems Administrator, Application Engineer, Technical ... engineering and operating JPMC's cloud infrastructure and platforms, ensuring reliability, resiliency, security, availability, and performance. Diagnose and repair…
- NVIDIA (Santa Clara, CA)
- …architectural changes and/or completely new approaches for our GPU Compute Clusters. As a Site Reliability Engineer, you will help us with the strategic ... role, you will design, implement and support operational and reliability aspects of large-scale distributed systems...are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to…
- Tarana Wireless (Milpitas, CA)
- As a Senior Site Reliability Engineer, you will help us manage software that runs in the cloud and remotely manages millions of radio devices. You will work ... on a team and be a main point of contact during offshore hours and responsible for all aspects of cloud operations, such as: + Infrastructure as Code + Manage environments in AWS + Monitoring and alerting + Automation for scaling, failure recovery, disaster…
- NVIDIA (Santa Clara, CA)
- …accelerate the next wave of artificial intelligence. Join our team at NVIDIA as a Senior Site Reliability Engineer focused on HPC storage and play a crucial ... role in designing, implementing, and optimizing on-prem High-Performance Computing (HPC) storage solutions while harnessing the power of cloud computing. You will be responsible for crafting and deploying distributed storage solutions, building automation tools,…
- NVIDIA (Santa Clara, CA)
- NVIDIA is looking for a Senior Site Reliability Engineer to work in IPP (Infrastructure, Planning and Process). IPP is a global organization within NVIDIA. ... This group works with various other groups within NVIDIA Software such as Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence and Driverless Cars to cater to their infrastructure needs. These cloud services provide almost half a…
- LinkedIn (Sunnyvale, CA)
- …based in our Sunnyvale, CA office location. LinkedIn is looking to hire a Staff Site Reliability Engineer within the production Storage Engineering group. The ... Storage Engineering group is defining the strategy for LinkedIn's storage infrastructure and is responsible for the related architecture, design, tooling, and automation. The candidate will have wide latitude in making contributions in several areas. In…
- NVIDIA (Santa Clara, CA)
- …everyone is inspired to do their best work. We are looking for a Manager for Site Reliability Engineering to help build and lead its cloud service team for ... SLOs. We partner with Service Owners to drive the reliability of the service. What you will be doing:...availability and performance of critically meaningful services in a live-site production environment, either as an SRE or Service…
- Palo Alto Networks (Santa Clara, CA)
- …is the market leader in this space. We are seeking development-heavy Site Reliability Engineers to design, build, maintain, and scale production services ... architecture to improve scalability in networking (e.g., BGP, OSPF), service reliability, capacity, and performance + Collaborate with development teams to ensure…
- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large-scale production systems with high efficiency and ... internal and external-facing GPU cloud services run with maximum reliability and uptime as promised to the users and...be doing: + Design, implement and support operational and reliability aspects of large-scale Observability & Telemetry collection…
- Netflix (Los Gatos, CA)
- …the world is a hard challenge, demanding exceptional levels of stability and reliability from dozens of services and systems between camera and device screens. About ... a Live Streaming Pipeline SRE, you will be responsible for the reliability of our live streaming pipeline (transmission, encoding, packaging, origin). Instrumenting…
- Splunk (San Jose, CA)
- …AWS). Your previous job titles might be something close to systems admin, network engineer, or DevOps engineer. + You're passionate about your work. Our customers ... a plus. **WHAT WE PROVIDE** + Opportunities to develop and grow as an engineer. We are always expanding into new areas, working with open-source projects and…
- Siemens Digital Industries Software (Fremont, CA)
- …of internal policies and procedures while maintaining security procedures at the site. Communicating clearly, both verbally and in writing, regarding issues and/or ... office automation, and other enterprise services are working properly at your site. Provide required updates and maintenance to the local high performance…
- EPAM Systems (San Jose, CA)
- …of all other expected benefits and compensation for the position. EPAM Systems, Inc. is an equal opportunity employer. We recognize the value of diversity and ... We are looking for a candidate to join a multi-functional SRE team focused on Google Cloud Platform. You should have cloud engineering experience in areas such as acting as the SME on operations automation and monitoring, identifying TOIL within the team's…
- TEKsystems (Cupertino, CA)
- …The main objective of this role will be to support the ramp-up of Task queue and Pulsar for a platform. The role will eventually support the upgrade and migration of ... said tool with Pulsar to enhance features for internal users. WHAT Proactive capacity monitoring: periodic checks on growth and utilization metrics to predict when additional resources will be needed, or when existing resources can be decommissioned or shrunk.…
- Palo Alto Networks (Santa Clara, CA)
- …time zones to provide 24x7 coverage. **Your Experience** + 4+ years as a DevOps Engineer (or equivalent role) with a passion for technology and strong motivation and ... responsibility for high reliability and service levels + Proficiency in a programming language (Python or Go preferred) + High proficiency with Linux + Proficiency in the…
- Meta (Sunnyvale, CA)
- …engineering efforts underway in the company. Relevant industry experience is important (Software Engineer, Site Reliability Engineer (SRE), Systems ... **Summary:** Meta is seeking an experienced engineer to join our Production Engineering team. Production... Engineer, DevOps Engineer, Network Engineer, or similar role), but ultimately less so than…
- Meta (Sunnyvale, CA)
- …backgrounds, from new grads to industry experts. Relevant industry experience is important (Software Engineer, Site Reliability Engineer (SRE), Systems ... Engineer, DevOps Engineer, Network Engineer, or similar role), but ultimately less so than your demonstrated abilities and attitude. We sail into uncharted…
- Zoom (San Jose, CA)
- …and processes adhere to regulatory requirements. What we're looking for: + 5+ years in a Site Reliability Engineer (SRE) or DevOps Engineer role + ... Experience deploying and managing Web/Java-based applications through EKS/Kubernetes, Argo CD/Git/Ansible, NGINX, Red Hat/Ubuntu, AWS, Python/Shell scripting. + Experience with containerization and orchestration tools (e.g., Docker, Kubernetes) + Have proficiency…