
Staff Software Engineer
Posted 1 day ago

This is a fully remote position, open to applicants in the United States.
• Lead and Build: Design and architect scalable, reliable, and efficient machine learning infrastructure and services, enabling the organization to manage extensive training, vast datasets, and an expanding clientele with varied needs.
• Scale Systems: Ensure that the machine learning platform can accommodate large-scale data processing and the development of machine learning models on a global scale.
• Agility & Cost Efficiency: Optimize the platform's implementation for flexibility and cost-effectiveness while maintaining development agility, allowing for rapid adaptation to changing requirements.
• User-Centric Design: Develop systems that are convenient, reliable, and user-friendly for both internal teams (data scientists, engineers) and external customers.
• End-to-End Ownership: Assume complete responsibility for the platform's lifecycle, from initial concept through design and implementation to deployment and monitoring.
• Collaboration: Partner closely with cross-functional teams, including data science, product, and operations, to ensure smooth integration of machine learning capabilities into critical business services.
• Innovation: Stay abreast of the latest industry trends and integrate cutting-edge technologies into our platform to sustain a competitive advantage.
• Problem-Solving: Address complex technical challenges directly, from infrastructure optimization to providing scalable solutions for data processing, storage, and access.
• Mentorship: Guide and mentor engineers, cultivating a culture of excellence and high performance within the engineering team.
• Proven experience in constructing and scaling cloud-based infrastructure and services, particularly in machine learning or data-intensive settings.
• Expert-level knowledge of distributed systems, cloud platforms (AWS, GCP, Azure), and technologies such as Kubernetes, Docker, and microservices architecture.
• Extensive experience in building large-scale services, along with an understanding of the challenges associated with performance, reliability, and cost management at scale.
• Demonstrated ability to design and implement cost-effective solutions, balancing performance, security, and scalability effectively.
• Strong foundation in machine learning concepts, including model training and validation.
• Excellent leadership and collaboration skills, with prior experience in an agile development environment and mentoring high-performing teams.
• Ability to navigate ambiguity and excel in a dynamic, fast-paced environment where priorities frequently change.
• Passionate problem solver dedicated to creating practical, reliable, and efficient systems that can scale in real-world production scenarios.
• Competitive health insurance options
• 401K plan management
• Remote-friendly and flexible team culture
• Free lunch and fully-stocked kitchen in our South Bay office
• Additional perks: monthly wellness stipend, office setup allowance, company retreats, and more to come as we scale
• The opportunity to work on one of the most interesting, impactful problems of the decade