- Meta (New York, NY)
- …fabric and host networking, communications lib and scheduling infrastructure. **Required Skills:** AI/HPC System Performance Engineer Responsibilities: ... a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look ... teamwork and close collaboration 3. Responsible for the overall performance of the communication system, including …
- NVIDIA (Santa Clara, CA)
- …fit for you, we'd love to hear from you! NVIDIA is seeking a Senior High Performance Computing (HPC) and AI Networking Performance Research and Analysis ... In this exciting role, you will profile and analyze AI workloads on large GPUs and CPUs scale clusters ... and platforms, such as HCAs, Switches, CPUs, GPUs, and Systems. You will develop performance analysis tools…
- Oracle (Seattle, WA)
- …running HPC/AI workloads, or maintaining an HPC/AI system. + Experience troubleshooting or tuning performance on distributed systems. + ... experts on RDMA cluster architecture and its relationship to AI/ML/HPC performance. We apply our ... experience + Experience with benchmarking and troubleshooting or optimizing performance of a system. + Experience with…
- Lilly (Indianapolis, IN)
- …infrastructure! The Cloud and Connectivity organization is seeking experts and leaders in AI and High-Performance Computing (HPC), and Nvidia DGX server ... of advanced Linux platforms supporting AI and HPC workloads, managing Nvidia DGX systems using ... expert. **What You Should Bring** + Expertise in Linux system administration, HPC environments, and Nvidia DGX…
- NVIDIA (Santa Clara, CA)
- …you'll be doing: + Support day-to-day operations of production on-premises and multi-cloud AI/HPC clusters, ensuring system health, user satisfaction, and ... the GPU and driving breakthroughs in gaming, computer graphics, high-performance computing, and artificial intelligence. Our technology powers everything ... systems such as Lustre and GPFS for AI/HPC workloads + Applied knowledge in …
- NVIDIA (Santa Clara, CA)
- …with AI/HPC workflows that use MPI + Experience analyzing and tuning performance for a variety of AI/HPC workloads. + Passion for continual learning ... GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek a ... storage systems like Lustre and GPFS for AI/HPC workloads + Familiarity with deep learning…
- NVIDIA (Santa Clara, CA)
- …analyzing and tuning performance for a variety of AI/HPC workloads. Excellent problem-solving to analyze complex systems, identify bottlenecks, and ... and implement GPU compute clusters for deep learning and high-performance computing. What you'll be doing: + Provide leadership ... storage systems like Lustre and GPFS for AI/HPC workload. Experience working with deep learning…
- NVIDIA (Santa Clara, CA)
- …, time-series databases, and large-scale monitoring systems. + Familiarity with AI/ML pipelines, GPU-based workloads, and HPC environments. + Experience ... teams to optimize observability for model training, inference workloads, and HPC performance. + Leverage machine learning and statistical techniques…
- Bloomberg (New York, NY)
- …and maintenance of our HPC/AI clusters, ensuring peak performance and reliability + Drive system upgrades, customization, and seamless integration ... enables communication between GPUs, CPUs, and storage in scale-out AI and HPC systems. This ... overseeing the ongoing monitoring, support, and maintenance of our HPC/AI clusters, ensuring peak performance …
- Deloitte (Costa Mesa, CA)
- …and building secure networks and modern data centers, to enabling the adoption of AI or high-performance computing (HPC), you'll gain firsthand experience ... organizations through Data Center and infrastructure transformation journeys, such as adopting AI, deploying high-performance computing (HPC) or edge…
- Federal Reserve Bank (Kansas City, MO)
- …deploy, configure, and administer medium scale HPC clusters and associated storage systems. + Monitor system health, performance metrics, and resource ... Bank of Kansas City and across the Federal Reserve System. Our services include multiple high performance ... technologies (CUDA, OpenACC). + Experience supporting machine learning and AI workloads on HPC systems. …
- NVIDIA (Santa Clara, CA)
- …architecture group at NVIDIA has openings for software architects in the field of AI and high-performance networking and system software. We research, ... and usable. + Creating proofs-of-concept to evaluate and motivate extensions in AI Frameworks (PyTorch/NEMO), HPC programming models (MPI, OpenSHMEM, PGAS), new…
- General Dynamics Information Technology (Dayton, OH)
- …expectations. Familiarity with commonly used HPC services (i.e., high performance file systems, modules for installing applications, compilers, MPI, OpenMP, ... Required:** None **Job Family:** Science and Research **Skills:** High Performance Computing (HPC), Linux, Python (Programming Language), Supercomputing **Experience:** 3…
- Meta (Menlo Park, CA)
- …These workloads expect a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look for opportunities across ... host networking, communications lib and scheduling infrastructure. **Required Skills:** AI/HPC Network Engineering Manager Responsibilities: 1. Manage engineers…
- Texas A&M University System (College Station, TX)
- …patching, and performance tuning. * Oversee networking, security, and infrastructure for HPC systems. * Lead the development of specialized HPC computing ... expertise and consultation for the design and deployment of HPC systems. Get in on the ground ... the following: * Experience with High Performance Computing (HPC) environments * Advanced Linux system administration skills * …
- Johns Hopkins University (Baltimore, MD)
- …Deployment and Design** + Develop and refine deployment strategies for scientific software on HPC and AI systems. + Design computational workflows, selecting ... Utilize CUDA, DNN, TensorRT, and Intel Compilers to enhance system performance. _HPC Scientific Software Support_ + ... Ensure compliance with security and regulatory standards for all HPC and AI systems. _In…
- Johns Hopkins University (Baltimore, MD)
- …Deployment and Design_ + Develop and refine deployment strategies for scientific software on HPC and AI systems. + Design computational workflows, selecting ... and project status. + Ensure compliance with security and regulatory standards for all HPC and AI systems. **Minimum Qualifications** + Master's Degree in…
- NVIDIA (Westford, MA)
- …workflow orchestration. + Comfort working across system boundaries: control systems, APIs, networking, and performance tooling. + Strong communication skills ... role sits at the intersection of data center infrastructure, high-performance computing, and emerging quantum systems, and ... computation, quantum execution, and data movement across tightly coupled systems. HPC Systems & Operations…
- NVIDIA (Santa Clara, CA)
- …Be Doing: + Primary responsibilities will include building and enabling robust AI/HPC infrastructure for customers + Support operational and reliability aspects ... of large-scale AI clusters, focusing on performance at scale, ... in working with customers + Expertise with parallel file systems (e.g., Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects…
- Massachusetts Institute of Technology (Cambridge, MA)
- …SENIOR HPC SYSTEMS ENGINEER, The Massachusetts Green High Performance Computing Center (MGHPCC ... be responsible for deploying, maintaining, and optimizing HPC clusters, storage systems, and networking for AI/ML workloads. Join a collaborative, fast-paced…