System Engineer

Systems Programmer

Remote

January 30, 2026

Full Time

Application ends: April 29, 2026

Job Description

About the Role

As a System Software Engineer focused on data center infrastructure, you will be responsible for the reliability, scalability, and performance of extremely large compute environments supporting advanced AI and machine learning workloads.
You’ll work at the intersection of software reliability, distributed systems, and physical infrastructure – helping operate and evolve high-density compute clusters used for large-scale model training and experimentation. This is a deeply technical, hands-on role in a fast-moving environment, collaborating closely with teams across hardware, networking, and software.

Key Responsibilities

Own the reliability, availability, and performance of on-premises and cloud-based compute environments, including GPU-accelerated clusters
Design, build, and operate monitoring, logging, and alerting systems to ensure high observability and fast incident response
Develop and maintain infrastructure-as-code and automated deployment pipelines
Participate in on-call rotations, incident response, root cause analysis, and post-incident improvements
Analyse system performance, forecast capacity, and optimise resource utilisation for large-scale distributed workloads
Partner with hardware, networking, and platform teams to design resilient and scalable systems
Create and maintain technical documentation and operational runbooks
Identify and remove bottlenecks across compute, storage, and networking layers to improve overall system efficiency

Required Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent professional experience)
5+ years of experience in site reliability engineering, infrastructure engineering, or large-scale systems operations
Strong expertise in Kubernetes (on-prem and/or cloud), infrastructure-as-code tools, and CI/CD systems
Proficiency in at least one systems programming language (e.g. Go, Rust, C++) and strong automation/scripting skills
Deep understanding of monitoring, alerting, and observability practices
Proven ability to troubleshoot complex issues spanning hardware, networking, and distributed software systems
Hands-on experience with incident response, post-mortems, and preventative engineering
Clear written and verbal communication skills

Call Us

+447414629102
