This is a fully remote position, open to applicants in India.

📋 Description

• Take ownership of the reliability, performance, and operability of intricate, business-critical production services and workflows.

• Manage complex production issues escalated from support, driving long-term resolutions in partnership with engineering through code, configuration, and architecture changes.

• Actively identify and mitigate systemic risks uncovered during problem-solving, transforming them into lasting engineering enhancements.

• Spearhead initiatives aimed at improving production efficiency, while defining, developing, and maintaining processes, runbooks, and knowledge base integrity across various services or domains.

• Establish, construct, and sustain production monitoring systems for essential services, ensuring comprehensive visibility into system health and user experience.

• Continuously enhance alerting mechanisms to reduce noise, ensuring every alert is actionable and backed by a well-documented runbook with clearly defined responses.

• Define and uphold SLIs/SLOs for key services, utilizing error budgets to guide operational and product decisions, and influencing priorities as needed.

• Transform manual processes into resilient automation, advocating for the adoption of automation patterns and tools across teams.

• Oversee and propel the post-mortem review process and actions stemming from incident analyses, ensuring thorough follow-up and measurable improvements in reliability.

• Work alongside the support organization as a senior escalation contact, systematically relaying knowledge, enhancements to tooling, and recommendations for improvements.

• Collaborate with developers throughout the entire change lifecycle, from design to rollout and patch delivery, ensuring safe deployments and efficient incident resolution.

• Lead or significantly contribute to design reviews, ensuring services are operable with minimal manual intervention in production (including automation, safe deployments, clear runbooks, and resilience patterns), while sharing insights through documentation and feedback.

• Mentor and guide other engineers in production engineering practices (such as observability, incident management, automation, and designing for failure), raising operational standards across the organization.

⛳️ Requirements

• 5–8+ years of experience in software engineering, site reliability, production engineering, or senior technical support roles involving distributed systems.

• Proficient in log analysis and advanced troubleshooting within complex production environments.

• Strong programming skills (e.g., JavaScript, Go, TypeScript, Java, or C#).

• Experience deploying and troubleshooting systems on public cloud platforms (Azure preferred).

• Well-versed in observability tools (e.g., Elastic, Prometheus, Grafana, OpenTelemetry).

• Comprehensive understanding of distributed systems, networking, automation, and CI/CD.

🏝️ Benefits

• 18 paid vacation days, along with 4 additional global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares.

• Private medical coverage for yourself and up to four dependents.

• Life, accident, and disability insurance with enhanced coverage options.

• Annual flexible wellbeing allowance dedicated to physical and mental wellness.

• Free confidential counseling and coaching through the Employee Assistance Program (EAP), which includes legal and financial advice.

• Meal, fuel, and transportation benefits based on your work arrangement.

• Daycare reimbursement and safe cab services for eligible employees.

• Opportunities for personal and professional growth through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events such as our annual Global Day of Learning.

Senior Production Engineer
