- NVIDIA (Santa Clara, CA)
- …in any of the leading cloud environments [AWS, Azure or GCP] + Experience with AI/HPC cluster job schedulers such as SLURM, LSF + In-depth understanding of ... and RDMA + Background with Software Defined Networking and AI/HPC cluster networking + Familiarity with deep ... are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to… more
- General Dynamics Information Technology (Fairfax, VA)
- …Yes **Job Description:** At GDIT, people are our differentiator. Our work depends on a Senior HPC Systems Engineer joining our team to support the National ... Obtain:** None **Job Family:** Systems Engineering **Skills:** High-Performance Computing (HPC) Systems, Linux System Administration, Systems Management **Certifications:** None - N/A… more
- NVIDIA (Santa Clara, CA)
- …+ Background in running and instrumenting distributed LLM training on a multi-GPU HPC cluster + Knowledge of LLM training features and libraries - Checkpointing, ... working with distributed system software architecture + Basic understanding of HPC GPU cluster, Slurm + Basic understanding of machine learning concepts and… more
- Federal Reserve Bank (Kansas City, MO)
- …skills and experience. + Experience supporting and administering a High Performance Compute (HPC) cluster and its components: Red Hat Linux, SLURM, IBM Spectrum ... Federal Reserve System. Our services include multiple high performance computing (HPC) environments, research data warehousing and curating services, and endpoint… more
- NVIDIA (CA)
- NVIDIA is looking for a Senior HPC Engineer to join its Professional Services team. Academic, commercial and government groups around the world are using ... the team building many of the largest and fastest AI/HPC systems in the world! NVIDIA is looking for ... equivalent experience. Ways to stand out from the crowd: + Cluster management technologies knowledge (bonus credit for BCM (Base… more
- Penn Medicine (Philadelphia, PA)
- …performance parallel file systems. + Assist end users running applications on the HPC cluster. + Provide leadership and solutions for complex research computing ... Location: Remote Hours: (must live reasonable distance from location) M-F, Daylight The Senior Scientific Computing Support Engineer position will work in… more
- NVIDIA (Santa Clara, CA)
- …the choice, join our diverse team today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of ground ... for our GPU Compute Clusters. As a Site Reliability Engineer, you will help us with the strategic challenges ... fixing problems before they occur + Building automation for Cluster bring up and scaled up operation. + Improving… more
- NVIDIA (Santa Clara, CA)
- …artificial intelligence. Join our team at NVIDIA as a Senior Site Reliability Engineer focused on HPC storage and play a crucial role in designing, ... Familiarity with newer and emerging monitoring products. + Prior experience with HPC cluster management tools such as Slurm, PBS, LSF, etc. + Experience with… more
- Dana-Farber Cancer Institute (Boston, MA)
- …work with amazing partners, including other Harvard Medical School-affiliated hospitals. The HPC Sr. Engineer, Storage will serve the Dana-Farber Cancer ... + Researches, finds and implements optimal runtime conditions for HPC workloads. + Acts as a cluster scheduler power user and works with system administration to… more
- Dell Technologies (Austin, TX)
- **Senior Principal Systems Development Engineer** Our customers' system requirements are usually highly complex. Bringing together hardware and software systems ... the best work of your career and make a profound social impact as a Senior Principal Systems Development Engineer on our Systems Development Engineering Team in… more
- NVIDIA (Santa Clara, CA)
- …seeking a distributed software engineer to join our team! As a Senior engineer, you'll be instrumental in developing and optimizing AI infrastructure ... engineers. What You Will Be Doing: As a software engineer specializing in backend development, you'll work in a ... well as container technologies like Docker and Kubernetes, and HPC/AI platforms such as Slurm. Ways to stand out… more
- NVIDIA (Santa Clara, CA)
- …how you can make a lasting impact on the world. We are looking for a Senior Linux Software Engineer to join the NVIDIA Applied Systems Engineering group. The ... runtimes, drivers+containers, and containerization of various high performance computing cluster software elements within a variety of environments. + Crafting,… more
- Amazon (Cupertino, CA)
- …of peer teams? We want to talk to you! We seek a Software Development Engineer for the Machine Learning (ML) Infrastructure team to build the tools that are used ... top performance of AWS ML and High Performance Computing (HPC) technologies developed by our organization. Bring your exceptional ... Fabric Adapter (EFA). Key job responsibilities Be an autonomous engineer on a team that builds and maintains the… more
- NVIDIA (Santa Clara, CA)
- We are seeking a Sr System Software Engineer to help us build out our scientific computing platform on NVIDIA DGX Cloud. We are building a cloud-based accelerated ... shared memory and distributed memory architecture, message passing (MPI, NCCL), cluster scalability and performance. + Hands-on debugging skills with Process,… more
- NVIDIA (Santa Clara, CA)
- …GPU Computing. We are passionate about markets include gaming, automotive, vision, HPC, datacenters and networking in addition to our traditional OEM business. ... integration, strong OS experience, reliability testing with various telemetries, scale out cluster, test plan development, CI/CD and DevOps experience to join our… more
- SpaceX (Hawthorne, CA)
- …+ Work with SpaceX HPC team to monitor and maintain a 4000+ thread HPC cluster + Closely collaborate with GNC software engineers to create highly operable ... appropriate to the responsibilities COMPENSATION AND BENEFITS: Pay Range: GNC DevOps Engineer/Senior: $160,000.00 - $220,000.00 per year Your actual level and… more
- Amazon (Cupertino, CA)
- …have extensive experience in low-latency networking and collective operations, such as HPC network fabric or machine learning accelerator cluster systems. Also ... Description We are seeking an experienced software engineer with low-level latency networking or interconnect expertise to optimize customer experience by designing… more
- Capital One (McLean, VA)
- …to love the products and services we build. We are looking for an experienced Senior Distinguished Engineer, AI Systems, to help us build the foundations of our ... VA - McLean, United States of America, McLean, Virginia Sr Distinguished Engineer, Generative AI Systems - (Remote-Eligible) Sr Distinguished Engineer,… more