
Data Engineer
Posted Jun 20

This is a fully remote position, open to applicants in Germany.
• Design, construct, and enhance comprehensive ETL pipelines for legal data across various jurisdictions, encompassing tasks such as cleaning, transformation, chunking, validation, embedding, and ingestion into vector databases.
• Engage extensively with XML-based legal data feeds: parse, validate, normalize, and convert XML structures into scalable internal schemas and unified document formats.
• Create and sustain data models and storage schemas that accommodate continuously updated datasets, ensuring consistency, scalability, and accuracy across diverse datasets and substantial volumes of data.
• Oversee the data handover and integration process from multiple internal and external data providers, including official sources, APIs, and web scraping pipelines, ensuring dependability and timely updates.
• Execute and continually enhance metadata enrichment strategies to optimize searchability, ranking quality, and relevance of legal information within vector databases.
• Establish and maintain a high-performance search and retrieval infrastructure that enables agent-based systems to invoke search functions and efficiently retrieve the most pertinent legal information.
• Collaborate with product, AI, and legal domain specialists to deliver high-quality, dependable data solutions.
• Take complete ownership of the data integration for one jurisdiction from start to finish.
• A minimum of 2 years of professional experience in data engineering, with involvement in successfully deployed projects.
• Proficient in Python, with experience in designing robust data pipelines.
• Experience in constructing and maintaining reliable ETL and RAG pipelines, along with a solid understanding of data modeling, quality, filtering, validation, and consistency.
• Familiarity with containerization (Docker), CI/CD pipelines, and version control systems (Git).
• Strong understanding of data structures, algorithms, system design principles, and software engineering best practices.
• Expertise in working with graph databases and familiarity with developing and deploying NLP models are an advantage.
• Proficiency in English at the C2 level.
• Remote: 100% remote work available (with a German residence); other countries can be considered upon request.
• Working hours: Flexible working hours.
• Vacation: 26 days plus December 24th & 31st off, and an additional vacation day for each year of employment (capped at 30 days).
• Discounts: e.g., Urban Sports Club Membership, dependent on location.
• Equipment: Laptop (Lenovo or Mac), along with a €1,000 net home office setup budget (disbursed with your first salary).