This is a fully remote position, open to applicants in Bulgaria.

📋 Description

• Take ownership of the edge proxy platform: Maintain, upgrade, and enhance a high-performance reverse proxy, including the proxy binary and its configuration tooling; build automation in Go and Python; own the full container image lifecycle on hardened Linux base images; and collaborate across the broader edge layer, which includes CDN, WAF, and traffic management.

• Build and maintain cloud infrastructure as code: Design and implement Terraform/Terragrunt modules and live environment configurations that manage EKS clusters, load balancers, IAM roles, VPC networking, ECR registries, and related AWS services across multiple regions, including GovCloud.

• Operate Kubernetes clusters efficiently: Administer multi-region, multi-cluster EKS deployments using FluxCD GitOps workflows and Helm charts, including node AMI rotations, add-on lifecycle management, and horizontal pod autoscaling.

• Construct and oversee CI/CD pipelines: Design, maintain, and enhance shared GitLab CI/CD pipeline templates utilized across all team repositories; establish and manage alternative pipeline workflows for dedicated government cloud environments.

• Streamline operational work: Build and maintain tooling for tasks such as container image patching, EKS AMI rotation, air-gapped ECR image synchronization to GovCloud, and automated merge request (MR) generation for monthly version-bump patching cycles.

• Own observability and on-call duties: Provision and maintain Datadog SLOs, monitors, and dashboards via Terraform; participate in the team's on-call rotation, responding to edge proxy incidents in both production and GovCloud environments.

• Support FedRAMP/GovCloud operations: Manage the GovCloud environment while adhering to its distinct constraints — including air-gapped image distribution, infrastructure automation in isolated networks, and alert management with compliance-aware data handling.

• Assess and implement internal developer tools: Investigate, prototype, and promote the adoption of internal tools that enhance engineering productivity throughout the company — such as developer portals, platform self-service capabilities, and other tools that elevate the developer experience at Smartsheet.

• Mentor and collaborate with the team: Share expertise through code reviews, architecture discussions, and the creation of runbooks; cultivate a culture of engineering excellence and operational rigor.

• Strategically adopt AI tools: Proactively apply and advocate for AI tools within your team's domain to improve project execution, infrastructure design, quality, and debugging, and lead the adoption of AI best practices.

⛳️ Requirements

• 5+ years of experience in DevOps, platform engineering, or site reliability engineering.

• A BS or MS in Computer Science, Engineering, or a related field, or equivalent industry experience.

• Extensive expertise with Terraform and Terragrunt for managing production cloud infrastructure at scale across multiple environments and regions.

• Strong knowledge of Kubernetes, specifically EKS cluster operations and Helm chart authoring.

• Practical experience with AWS networking and container workload services: EKS, ALB/NLB, VPC, IAM, ECR, Route53, CloudWatch, and EventBridge.

• Proficiency in at least one general-purpose programming language — Go or Python preferred — for developing operational tools and automation.

• Solid understanding of reverse proxies, API gateways, or load balancers (NGINX, HAProxy, or equivalent).

• Experience in designing and maintaining CI/CD pipelines (preferably GitLab CI), including shared template libraries across multiple repositories.

• Familiarity with container image security practices: hardened base images, vulnerability scanning, and image promotion workflows.

• Strong operational instincts: comfort with on-call responsibilities, incident response, runbook creation, and conducting postmortems in production environments.

• At least 1 year of professional experience using AI-based workflows to author, maintain, review, and deploy infrastructure or code.

• Fluency in English is required.

• Legal authorization to work in Bulgaria on a continuous basis.

🏝️ Benefits

• Health insurance

• Retirement plans

• Paid time off

• Flexible work arrangements

• Professional development

Senior DevOps Engineer

People also viewed

DevOps Reliability Engineer

Senior Site Reliability Engineer – Network

Staff Site Reliability Engineer

DevOps Engineer, Mid Level

DevOps Engineer, Azure

DevOps Engineer, mk8s
