- NVIDIA (Santa Clara, CA)
- …make a lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters ... + Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop… more
- NVIDIA (Santa Clara, CA)
- …+ Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop ... and operating large scale compute infrastructure. + Experience with AI/HPC job schedulers and orchestrators, such as Slurm, K8s or LSF. Applied experience with AI/HPC workflows that use MPI and NCCL. + Proficient… more
- NVIDIA (Santa Clara, CA)
- …Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of ground ... + Provide leadership and strategic guidance on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop… more
- NVIDIA (Westford, MA)
- …that will be integrated with diverse quantum computing platforms. The lead HPC Engineer will offer technical mentorship, system administration, optimizing ... As the HPC Operations Engineer for the new Accelerated Quantum Center (https://www.nvidia.com/en-us/solutions/quantum-computing/accelerated-quantum-center/)… more
- Texas A&M University System (College Station, TX)
- Job Title Senior HPC Engineer Agency Texas A&M University Department Technology Services - IT Enterprise Operations Proposed Minimum Salary Commensurate Job ... members' faculty and staff providing cutting-edge research and super computing needs. As a Senior High Performance Computing Engineer (HPC), you will provide… more
- University of Pennsylvania (Philadelphia, PA)
- …programs and resources, and much more. Posted Job Title HPC Systems Engineer Job Profile Title Systems Administrator Senior Job Description Summary The Penn ... and motivated High Performance Computing (HPC) Systems Engineer to join the team. PARCC's main cluster...systems team. Job Description Job Responsibilities + Collaborate with senior staff to design, plan, test, and implement advanced… more
- NVIDIA (Santa Clara, CA)
- …and planning abilities. Experience working with High Performance Computing (HPC), GPUs, and high-performance networking (RDMA, InfiniBand, RoCE) are strongly ... will be harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Work on software that manages NVLINK topography… more
- NVIDIA (TX)
- NVIDIA is looking for an experienced GPU and network systems Solutions Architect & Engineer. Do you want to be part of a team that brings new Artificial Intelligence ... center GPU server and networking system deployments as Solution Architect Engineer. Guide customer discussions on network design, compute/storage and support bring… more
- NVIDIA (Santa Clara, CA)
- Join the NVIDIA Deep Learning Frameworks Infrastructure team as a Senior Systems Engineer focusing on High-Performance AI & Networking Applications, committed to ... for internal teams and external partners on standard methodologies in HPC networking deployments. + Share insights on improving networking strategies for… more
- NVIDIA (MA)
- …with an interest in advancing artificial intelligence (AI) and high-performance computing (HPC) in academic and research environments? We are looking for a Solutions ... background in building and deploying research computing clusters, deploying AI and HPC workloads, and optimizing system performance at scale. What you'll be doing:… more
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior Software Engineer for AI Resiliency. At NVIDIA, we are pushing the boundaries of what's possible in AI. We are currently seeking ... a Senior Software Engineer to lead the development...GPUs. Your expertise will be crucial in driving down cluster downtime towards zero, ensuring that our AI systems… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training in the ... to support multi-modal foundation models for robotics. + Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets. +… more
- NVIDIA (Santa Clara, CA)
- …The data center platforms like GB200 NVL72 by NVIDIA are redefining AI, HPC, and cloud computing. To accommodate leading workloads globally, our diagnostic systems ... hardware technologies. We're in search of a visionary technical leader to engineer and propel innovation in diagnostics for NVIDIA's partner ecosystem. This role… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is seeking a Senior Firmware Engineer to join our CSP Engagements team, focusing on system software for Datacenter products such as GB200. This role ... see: + Deep expertise in data center server architectures, HPC systems, and hardware-software co-design. + Deep expertise in...out from the crowd: + Knowledge of cloud and cluster level deployment and management systems. + Experience with… more
- NVIDIA (Santa Clara, CA)
- …for AVs capable of running on thousands of GPUs; + Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets; + Implement ... curriculum learning. + Deep understanding of GPU acceleration, CUDA programming, and cluster management tools like Kubernetes. + Strong programming skills in Python… more
- NVIDIA (Santa Clara, CA)
- …GPU Computing. We are passionate about markets include gaming, automotive, vision, HPC, datacenters and networking in addition to our traditional OEM business. ... Linux experience, reliability testing with various telemetries, scale out cluster, test plan development, track record in developing AI...are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want… more
- NVIDIA (CO)
- …server architecture. In-depth understanding of the different deployment models for GPUs (e.g., HPC, AI cluster, single- or multi-GPU servers). + Experience in Data ... NVIDIA is searching for a highly motivated, creative engineer with experience in system software security to join the Data Center Systems Software team. In this… more
- Mount Sinai Health System (New York, NY)
- …and implements backup policies. + Assist in the management and maintenance of HPC cluster and data center work, including troubleshooting for resolving system ... data warehouse team and a research data services team. The **_Senior Systems Administrator/Engineer,_** as a member of the Scientific Computing and Data group, is… more
- Honeywell (Phoenix, AZ)
- You will report directly to the Senior Engineering Manager and you'll work at our Plymouth, MN location on a Hybrid work schedule. (Other allowed Honeywell Aerospace ... **KEY RESPONSIBILITIES** + Work with IC Design EDA Applications, High Performance Compute cluster staff, and IC Design engineers to craft and maintain optimized EDA… more
- NVIDIA (NY)
- …other Engineering fields (or equivalent experience) + 12+ years experience as an ML/Software Engineer with a proven track record in writing code in Python, C++ + ... models at scale on public cloud computing and/or on-prem HPC clusters in production Ways To Stand Out From...of MLOps technologies such as containers, data center deployments, cluster management software, etc. + Experience working with enterprise… more