
Senior Software Engineer, Data Processing
Posted 6 days ago

This is a fully remote position, open to applicants in Brazil.
• Design, develop, and manage the ingestion systems that handle substantial amounts of multimodal data, transforming it into structured, usable datasets.
• Take ownership of the entire ingestion process, from data acquisition to validation, processing, tracking, and making it accessible for downstream use.
• Create specific processing steps tailored for real-world data sources, including medical imaging processing, audio and video metadata extraction, quality validation, and notes processing.
• Develop parsers, validators, and normalization logic capable of systematically addressing messy, non-standard, and highly variable source formats.
• Turn recurring one-off data handling tasks into reusable processing patterns, internal tools, and platform features.
• Design systems for high volume and throughput, ensuring optimization for reliability, cost-effectiveness, and speed.
• Work across distributed and parallel computing systems to handle workloads that do not fit on a single machine.
• Select the appropriate execution model for each workload, including batch processing, distributed execution, and modern compute patterns for unstructured data and inference-heavy processing.
• Identify and resolve bottlenecks within the ingestion and processing systems, maintaining performance as volume and complexity of modalities increase.
• Implement validation and quality checks that detect poor, incomplete, or incorrectly formatted data before it propagates downstream.
• Handle sensitive and regulated data, including PHI, with appropriate security and care, applying de-identification when required.
• Track provenance, metadata, and usage constraints throughout the ingestion process so that downstream usage remains compliant and auditable.
• Improve observability, debuggability, and operational reliability throughout the ingestion layer.
• Collaborate with product and Data Lab teams to accommodate new modalities, partner requirements, and non-standard data sources.
• Work directly with partner engineering teams as needed to translate source-system realities into effective ingestion and processing designs.
• Identify recurring patterns that can be standardized into reusable transforms, validators, and internal tools.
• Contribute to shaping how Protege manages new data types as the platform evolves into more sophisticated data environments.
• 5+ years of experience in building and operating production backend or data systems, with hands-on experience in large-scale data processing.
• Proven expertise in designing and managing large-scale data pipelines.
• Strong programming capabilities in Python.
• Experience with distributed data processing systems.
• High proficiency with AWS services.
• Ability to navigate messy, diverse, high-volume data and ambiguity, with a talent for identifying patterns in complex scenarios.
• Meticulous attention to detail without sacrificing speed, and a bias toward action.
• Enthusiastic about working on products focused on managing and processing large data volumes.
• Inquisitive, persistent, and self-motivated.
• Health insurance
• Professional development opportunities
• Flexible working hours