- Merck & Co. (Rahway, NJ)
- Job Description: We are looking for an experienced and enthusiastic Senior Site Reliability Engineer to join our Agile Planning Product Team. As part of the DevXOps ... and implement automation to enhance reliability and efficiency. As a Senior Reliability Engineer, you will: Become a member of a high-performing team to deliver…
- Ford Motor Company (Dearborn, MI)
- We are seeking a highly skilled and motivated HPC SRE Systems Engineer to join our growing team. You will be responsible for designing, building, and ... maintaining our HPC and SRE infrastructure that our platform... Troubleshoot and resolve complex technical issues related to Linux systems, networking, storage, and HPC applications…
- NVIDIA (Santa Clara, CA)
- …efficiency, and performance and drive foundational improvements and automation to improve engineers' productivity. As a Site Reliability Engineer, you are ... responsible for the big picture of how our systems relate to each other, we use a breadth... to both product quality and interesting dynamic day-to-day work. SRE's culture of diversity, intellectual curiosity, problem solving and…
- NVIDIA (Santa Clara, CA)
- …and tools to enable NVIDIA's AV program. We are seeking a motivated Senior Engineer to join our team in building and scaling our cloud-native infrastructure, which ... powers hundreds of microservices and large-scale HPC clusters (15k+ GPUs). You'll play a critical role... will have strong software development as well as operational (SRE) skills. What you'll be doing: + Develop, operate…
- NVIDIA (Santa Clara, CA)
- …Experience with Cloud Deployment, BCM, Terraform. + Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads. + Familiarity ... diverse team today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the... automation to improve researchers' productivity. As a Site Reliability Engineer, you are responsible for the big picture of…
- Synergy ECP (Fort Meade, MD)
- Software Integration Engineer 3, Ft. Meade, MD (http://maps.google.com/maps?q=Ft.+Meade+MD+USA+20146). Job Type: Full-time. Description: Founded in 2007 and headquartered ... Maryland, Synergy ECP is a leading provider of cybersecurity, software and systems engineering, and IT services to the US intelligence and defense communities.
- SOS International LLC (Reston, VA)
- …Secret Security Clearance with SCI eligibility. Experience as a Site Reliability Engineer (SRE). Experience with Cloud-Native Services. Experience with MLOps ... Overview: SOSi is seeking a Senior Data Engineer to assist the MLOps Team in creating,... security missions interacting with petabyte-scale data using a High-Performance Computing (HPC) environment. We are hiring for the following locations:…
- Massachusetts Institute of Technology (Cambridge, MA)
- Lead Site Reliability Engineer. Job Number: 24909. Functional Area: Information Technology. Department: Office of Research Computing and Data. School Area: VP ... Posting Description: LEAD SITE RELIABILITY ENGINEER, Office of Research Computing and Data (ORCD), to... Research Computing and Data (ORCD), to build and advance SRE functions in collaboration with a diverse team of…
- Merck (Rahway, NJ)
- …Description: We are looking for an experienced and enthusiastic Senior Site Reliability Engineer to join our Agile Planning Product Team. As part of the DevXOps ... to enhance reliability and efficiency. **As a Senior Reliability Engineer, you will:** + Become a member of a... for platform issues + Set up monitoring and alerting systems to proactively detect conditions and remediate before outages…