
Backend Engineer, AI & Data Pipeline
Posted 1 day ago

Responsibilities
• Design and maintain three distinct processing pipelines: scheduled job ingestion, event-driven course processing, and a periodic knowledge graph builder, each with its own trigger logic and cost controls.
• Generate and manage semantic embeddings with Amazon Bedrock (Titan v2), index them in MongoDB Atlas Vector Search, and tune similarity thresholds to keep match precision high (see Sketch 1 after this list).
• Build and maintain a knowledge graph linking jobs, courses, skills, and industries through FP-Growth association rules and archetype-to-SOC-code mapping (Sketch 2).
• Build and evolve a two-stage discovery and matching API on AWS Lambda: vector retrieval first, then deep eligibility scoring with LLM re-ranking (Sketch 3).
• Optimize Fargate Spot usage and design resumable processing loops that survive interruptions, keeping infrastructure costs manageable as data volume grows (Sketch 4).
• Maintain and extend daily job scrapers across multiple sources, and build institution data scrapers with robust HTML cleaning pipelines (Sketch 5).
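Sketch 1 — the embedding and retrieval flow from the second bullet. A minimal sketch, assuming boto3 and pymongo; the connection string, database and collection names (careers.jobs), and the Atlas index name job_embedding_index are placeholders, and the min_score floor stands in for the similarity threshold the role tunes:
```python
import json

import boto3
from pymongo import MongoClient

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
jobs = MongoClient("mongodb+srv://<cluster-uri>")["careers"]["jobs"]  # hypothetical names

def embed(text: str) -> list[float]:
    """Titan Text Embeddings v2 via Bedrock."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text, "dimensions": 1024}),
    )
    return json.loads(resp["body"].read())["embedding"]

def vector_candidates(query: str, k: int = 50, min_score: float = 0.75) -> list[dict]:
    """Stage one: approximate nearest-neighbour retrieval with a tunable score floor."""
    pipeline = [
        {"$vectorSearch": {
            "index": "job_embedding_index",   # hypothetical Atlas index name
            "path": "embedding",
            "queryVector": embed(query),
            "numCandidates": k * 10,          # oversample for recall
            "limit": k,
        }},
        {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
        {"$match": {"score": {"$gte": min_score}}},  # the similarity threshold to tune
    ]
    return list(jobs.aggregate(pipeline))
```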
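Sketch 2 — FP-Growth association rules of the kind the knowledge graph bullet describes. A toy sketch using mlxtend (one common Python implementation; the posting does not name a library), with three invented skill transactions:
```python
import pandas as pd
from mlxtend.frequent_patterns import fpgrowth, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Invented transactions: the skill set attached to each job posting.
transactions = [
    ["python", "aws", "mongodb"],
    ["python", "pandas", "etl"],
    ["python", "aws", "lambda"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# Frequent skill itemsets, then rules such as {aws} -> {python}.
itemsets = fpgrowth(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```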
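Sketch 3 — stage two of the matching API: LLM re-ranking of the candidates returned by Sketch 1. A sketch assuming a Bedrock chat model (Claude Haiku here, purely as an example) and a deliberately simple prompt; parsing the model's reply as JSON is the fragile part a production scorer would harden:
```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def rerank(profile: str, candidates: list[dict]) -> list[dict]:
    """Ask an LLM to score eligibility for each vector-retrieved candidate."""
    prompt = (
        "Score 0-10 how well this candidate fits each job. "
        'Reply with a JSON list of {"id": ..., "score": ...} objects only.\n\n'
        f"Candidate profile:\n{profile}\n\nJobs:\n"
        + "\n".join(f'{c["_id"]}: {c["title"]}' for c in candidates)
    )
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model id
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    reply = json.loads(resp["body"].read())["content"][0]["text"]
    scores = {str(s["id"]): s["score"] for s in json.loads(reply)}
    return sorted(candidates, key=lambda c: scores.get(str(c["_id"]), 0), reverse=True)
```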
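Sketch 4 — a resumable loop for Fargate Spot, which receives SIGTERM roughly two minutes before a Spot task is reclaimed. The checkpoints collection and per-record checkpoint granularity are illustrative choices:
```python
import signal

from pymongo import MongoClient

db = MongoClient("mongodb+srv://<cluster-uri>")["careers"]  # hypothetical names
jobs, checkpoints = db["jobs"], db["checkpoints"]

stopping = False

def on_sigterm(signum, frame):
    global stopping
    stopping = True  # finish the current record, then stop

# Fargate Spot sends SIGTERM ahead of reclaiming the task.
signal.signal(signal.SIGTERM, on_sigterm)

def process(doc: dict) -> None:
    ...  # hypothetical per-record work, e.g. embedding generation

def run() -> None:
    state = checkpoints.find_one({"_id": "embed-jobs"}) or {}
    query = {"_id": {"$gt": state["last_id"]}} if state.get("last_id") else {}
    for doc in jobs.find(query).sort("_id", 1):
        if stopping:
            break  # exit cleanly; the replacement task resumes from the checkpoint
        process(doc)
        checkpoints.update_one(
            {"_id": "embed-jobs"}, {"$set": {"last_id": doc["_id"]}}, upsert=True
        )
```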
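Sketch 5 — the sort of HTML cleaning step the scraper bullet implies, sketched with BeautifulSoup; the tags stripped here are an assumption about what counts as boilerplate:
```python
from bs4 import BeautifulSoup

def clean_html(raw: str) -> str:
    """Strip markup, scripts, and boilerplate whitespace from a scraped page."""
    soup = BeautifulSoup(raw, "html.parser")
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()  # drop non-content elements entirely
    text = soup.get_text(separator=" ")
    return " ".join(text.split())  # collapse runs of whitespace
```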
Requirements
• 1+ years of backend engineering experience with a focus on data pipelines, ML infrastructure, or search systems.
• Practical experience with AWS serverless and container services: Lambda, ECS Fargate, EventBridge, and Step Functions.
• Proficiency in Python, including Pandas, asynchronous processing, bulk database operations, and text cleaning.
• Familiarity with vector databases and semantic similarity search; experience with MongoDB Atlas Vector Search is a strong plus.
• A cost-aware infrastructure mindset: you think in per-record compute costs, free tiers, Spot resilience, and right-sizing.
• Ability to document and clearly communicate complex architecture to both technical and non-technical stakeholders.
• Nice to have: Experience with knowledge graphs or association rule mining (FP-Growth, Apriori).
• Nice to have: Experience utilizing LLMs for re-ranking or eligibility assessment based on vector retrieval results.
• A degree, or equivalent proven experience.
What we offer
• Fully remote / work-from-home position.
• Some flexibility in working hours, subject to team needs and deliverables.
• Hands-on work on impactful backend, data pipeline, and AI systems.
• Opportunity to contribute to a growing platform with genuine product and engineering challenges.
• Professional growth in a dynamic, fast-paced environment.
• Strong potential for long-term progression based on performance, irrespective of location.