
Senior Production Engineer
Posted Jun 12

This is a fully remote position, open to applicants in India.
• Take ownership of the reliability, performance, and operability of intricate, business-critical production services and workflows.
• Manage complex production issues escalated from support, driving long-term resolutions in partnership with engineering through code, configuration, and architecture changes.
• Actively identify and mitigate systemic risks uncovered during problem-solving, transforming them into lasting engineering enhancements.
• Spearhead initiatives to improve production efficiency, while defining, developing, and maintaining processes, runbooks, and knowledge base integrity across various services or domains.
• Establish, construct, and sustain production monitoring systems for essential services, ensuring comprehensive visibility into system health and user experience.
• Continuously refine alerting to reduce noise, ensuring every alert is actionable and backed by a well-documented runbook with a clearly defined response.
• Define and uphold SLIs/SLOs for key services, utilizing error budgets to guide operational and product decisions, and influencing priorities as needed.
• Transform manual processes into resilient automation, advocating for the adoption of automation patterns and tools across teams.
• Oversee and propel the post-mortem review process and actions stemming from incident analyses, ensuring thorough follow-up and measurable improvements in reliability.
• Work alongside the support organization as a senior escalation contact, systematically relaying knowledge, enhancements to tooling, and recommendations for improvements.
• Collaborate with developers throughout the entire change lifecycle, from design to rollout and patch delivery, ensuring safe deployments and efficient incident resolution.
• Lead or significantly contribute to design reviews, ensuring services are operable with minimal manual intervention in production (including automation, safe deployments, clear runbooks, and resilience patterns), while sharing insights through documentation and feedback.
• Mentor and guide other engineers in production engineering practices (such as observability, incident management, automation, and designing for failure), raising operational standards across the organization.
• 5–8+ years of experience in software engineering, site reliability, production engineering, or senior technical support roles involving distributed systems.
• Proficient in log analysis and advanced troubleshooting within complex production environments.
• Strong programming skills (e.g., JavaScript, Go, TypeScript, Java, or C#).
• Experience deploying and troubleshooting systems on public cloud platforms (Azure preferred).
• Well-versed in observability tools (e.g., Elastic, Prometheus, Grafana, OpenTelemetry).
• Comprehensive understanding of distributed systems, networking, automation, and CI/CD.
• 18 paid vacation days, along with 4 additional global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares.
• Private medical coverage for yourself and up to four dependents.
• Life, accident, and disability insurance with enhanced coverage options.
• Annual flexible wellbeing allowance dedicated to physical and mental wellness.
• Free confidential counseling and coaching through the Employee Assistance Program (EAP), which includes legal and financial advice.
• Meal, fuel, and transportation benefits based on your work arrangement.
• Daycare reimbursement and safe cab services for eligible employees.
• Opportunities for personal and professional growth through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events such as our annual Global Day of Learning.