Senior Engineer


We are looking for a highly motivated HPC Network Engineer to support the deployment and management of large-scale data center infrastructure. This role places you at the forefront of high-performance computing technologies, with a strong focus on InfiniBand (IB) systems. You'll work closely with global teams and customers to ensure optimal system performance, reliability, and scalability while driving continuous improvement through automation and operational excellence.
Key Responsibilities
Deploy and manage InfiniBand network systems in high-scale data center environments.
Oversee the end-to-end lifecycle of IB hardware and software components.
Develop and maintain tools to automate deployment, monitoring, and troubleshooting workflows.
Analyze and resolve complex network performance, connectivity, and stability issues.
Collaborate cross-functionally with internal teams and external customers around the globe to ensure high-quality outcomes.
Participate in a rotating on-call schedule to provide 24/7 operational support.
Qualifications
Hands-on experience with InfiniBand networking technologies.
Strong foundational knowledge of networking concepts and protocols.
Experience using Grafana and PromQL for monitoring and visualization.
Proficient in Linux systems administration.
Experience with infrastructure-as-code tools such as Ansible and Helm.
Proficiency in at least one scripting or programming language (e.g., Python, Go).
Strong problem-solving skills with a proven track record in troubleshooting complex infrastructure and application issues.
Familiarity with diagnosing issues at the network and server hardware component level.
Excellent verbal and written communication skills, with a collaborative mindset.
Experience operating large-scale environments with 1,000+ switches or nodes.
Background in data center networking.
Exposure to HPC-specific workloads and architectures.
Familiarity with NVIDIA UFM (Unified Fabric Manager).
Experience with NCCL (NVIDIA Collective Communications Library).
CoreWeave is The Essential Cloud for AI. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.
We’re proud to be a Living Wage accredited Employer.
The base pay for this position ranges from $165,000 to $220,000, and the target total cash ranges from $190,000 to $253,000. Pay is based on a number of factors, including market location, and may vary depending on job-related knowledge, skills, and experience. This position includes a discretionary bonus, equity, and a comprehensive benefits package.
To fulfil our obligation to protect client data, successful applicants offered employment with CoreWeave will be required to complete a basic criminal record check, conducted in compliance with GDPR. Employment offers are conditional upon receiving satisfactory check results.
What We Offer
Family-level Medical Insurance
Family-level Dental Insurance
Generous Pension Contribution
Life Assurance at 4x Salary
Critical Illness Cover
Employee Assistance Programme
Tuition Reimbursement
Work culture focused on innovative disruption
Location: City of Westminster
Job Type: Full-time
