This is a fully remote position, open to applicants in Saudi Arabia.

📋 Description

• Provide expert advice and assist in the upkeep of extensive computational and AI infrastructures, with a focus on monitoring, logging, and workload orchestration utilizing Kubernetes and Linux job schedulers.

• Deliver consultative support and engage in practical problem-solving across the entire stack—from bare metal and operating systems to the software stack, container platforms, networking, and storage.

• Evaluate customer environments and suggest optimized, production-ready Kubernetes-based container platforms that are integrated with enterprise-grade networking and storage solutions.

• Act as a vital technical resource: create, enhance, and document standard practices and operational guidelines for dissemination among internal teams and customer partners.

• Assist in Research & Development efforts and participate in POCs/POVs to validate new features, architectures, and upgrade strategies.

• Generate and provide high-quality documentation, including runbooks, onboarding materials, and best-practice guides for both customers and internal teams.

• Serve as the technical leader for designated customer accounts, offering strategic advice on DevOps and platform architecture, and influencing long-term decisions regarding infrastructure and operations.

⛳️ Requirements

• Bachelor’s, Master’s, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or a related discipline (or equivalent experience), plus more than 8 years of professional experience managing scalable cloud environments and working in automation engineering roles.

• Demonstrated understanding of networking principles and data center architectures, with hands-on experience leading the deployment, optimization, and troubleshooting of HPC/AI clusters.

• Proven hands-on experience in deploying, configuring, and optimizing NVIDIA GPU-accelerated infrastructures, covering driver management, CUDA toolkit integration, and GPU workload profiling.

• Extensive experience with Kubernetes for container orchestration, resource scheduling, scaling, and integration with GPU-accelerated and HPC environments.

• Strong familiarity with HPC and AI technologies (CPUs, GPUs, high-speed interconnects) along with the supporting software stacks.

• In-depth knowledge of Linux (RedHat, Ubuntu), OS-level security, and relevant protocols.

• Experience with storage solutions such as Lustre, GPFS, ZFS, XFS, and emerging Kubernetes storage technologies.

• Proficient in Python and Bash scripting, configuration management, and Infrastructure-as-Code tools (e.g., Ansible, Terraform).

• Experience with observability stacks (Grafana, Loki, Prometheus) for monitoring, logging, and building resilient systems.

• Solid background in designing scalable solutions and providing consultative support to customers, including leading architectural reviews and presenting to executive partners.

🏝️ Benefits

• Comprehensive health, dental, and vision insurance.

• Flexible work hours and remote work options.

• Professional development opportunities and continuous learning support.

• Generous paid time off and holiday schedule.

• Collaborative and innovative work environment.

Senior Solutions Architect, Cloud Infrastructure – DevOps
