
Solutions Architect, DevOps
Posted Jun 12

This is a fully remote position, open to applicants in Poland.
Responsibilities:
• Provide guidance and assistance in the upkeep of extensive computational and AI infrastructure, which includes monitoring, logging, and workload orchestration using Kubernetes and Linux job schedulers.
• Offer consultative support and perform hands-on troubleshooting across the entire stack, from bare metal and operating systems to the software stack, container platforms, networking, and storage.
• Evaluate customer environments and suggest optimized, production-ready Kubernetes-based container platforms that are integrated with enterprise-grade networking and storage solutions.
• Act as a principal technical resource: develop, enhance, and document standard practices and operational guidelines to be disseminated among internal teams and customer stakeholders.
• Assist in development activities and participate in proofs of concept (POCs) and proofs of value (POVs) to validate new features, architectures, and upgrade strategies.
• Produce and present high-quality documentation, including runbooks, onboarding materials, and best-practice guides for both customers and internal teams.
• Serve as the technical lead for designated customer accounts, offering strategic insights on DevOps and platform architecture while influencing long-term infrastructure and operations decisions.
Requirements:
• A BS/MS/PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related disciplines.
• More than 5 years of professional experience managing scalable cloud environments and working in automation engineering roles.
• Proven expertise in networking fundamentals, data center architectures, and practical experience managing HPC/AI clusters, including deployment, optimization, and troubleshooting.
• Demonstrated hands-on experience in deploying, configuring, and optimizing NVIDIA GPU-accelerated infrastructure, which includes driver management, CUDA toolkit integration, and GPU workload profiling.
• Extensive knowledge of Kubernetes for container orchestration, resource scheduling, scaling, and integration with GPU-accelerated and HPC environments.
• Strong familiarity with HPC and AI technologies (CPUs, GPUs, high-speed interconnects) and their supporting software stacks.
• Comprehensive knowledge of Linux (Red Hat, Ubuntu), OS-level security, and protocols.
• Proficiency in Python and Bash scripting, as well as configuration management and Infrastructure-as-Code tools (e.g., Ansible, Terraform).
• Experience with observability stacks (Grafana, Loki, Prometheus) for monitoring, logging, and creating fault-tolerant systems.
• Strong background in developing scalable solutions and delivering consultative support to customers, including leading architectural reviews and presenting to executive stakeholders.
Benefits:
• Competitive salary and performance-based incentives.
• Comprehensive health, dental, and vision insurance.
• Flexible work hours and remote work options.
• Opportunities for professional development and continuous learning.
• Collaborative and innovative work environment.
NVIDIA