
Senior Software Engineer – Reliability Engineering
Posted 23 hours ago

Responsibilities:
• Responsible for the development, testing, deployment, and maintenance of software, with a strong focus on the value it is intended to deliver.
• Embraces new opportunities and challenging tasks with urgency, high energy, and enthusiasm.
• Consistently delivers results, even in difficult situations.
• Develops comprehensive test suites (functional, destructive, etc.) to facilitate successful and rapid deployment of code to production.
• Approaches issues from a broad, global perspective.
• Learns from both successful and unsuccessful experiments when tackling new challenges.
• Actively seeks opportunities to grow and take on challenges through both formal and informal development avenues.
• Collaborates with team members using agile methodologies.
• Develops new and improved strategies for organizational success.
• Works alongside the Product Team to ensure user stories are valuable, developer-ready, easy to comprehend, and testable.
• Communicates effectively in multiple formats, tailoring each message to the distinct needs of its audience.
• Adjusts approach and demeanor in real time to meet the changing demands of different situations.
• Engages openly and comfortably with diverse groups of individuals.
• Mentors junior engineers by providing insights into modern software development frameworks and leading technical discussions.
Requirements:
• Must be at least eighteen years of age.
• Must have legal authorization to work in the United States.
• Proficiency in GCP cloud infrastructure, including BigQuery analytics, Application Default Credentials (ADC) authentication, and cloud-native services.
• Experience with observability tools such as Grafana, Prometheus, Kibana/Elasticsearch (WES logs), and OCP Health Dashboards.
• Knowledge of Terraform Enterprise for Infrastructure as Code.
• Familiarity with GitHub for source code management.
• Experience with GitHub Copilot and AI agents for AI-accelerated incident analysis, automated remediation workflows, and prompt-engineered operational tools.
• Understanding of SRE practices including Production Readiness Review, Capacity Planning, Change Validation, Production Support, Post-Mortems, and SLO Definition & Tracking.
• Proficient in ServiceNow for incident, problem, and change management; trend analysis; and RCA grouping.
• Experience with BigQuery for incident analytics, problem candidate identification, and operational reporting.
• Knowledge of PagerDuty for on-call scheduling, escalation paths, and push-button paging.
• Familiarity with Rundeck for self-heal automation and push-button remediation tasks.
• Experience with Atlassian tools (Jira/Confluence) for RCA documentation, runbooks, architecture diagrams, and onboarding processes.
• Knowledge of CyberArk for privileged access management related to WMS/DFC log pulls and node access.
• Experience in Manhattan WMS operations, including RF/UI/LM node support.
• Proficiency in Python automation for operational scripting, BigQuery (BQ) pipelines, alert correlation, and report generation.
Benefits:
• Health insurance.
• 401(k) matching.
• Flexible work hours.
• Paid time off.
• Options for remote work.
Arctiq