
Data Engineer – Infrastructure Advocate
Posted Jun 3

This is a fully remote position, open to applicants in France.
• Cultivate and enhance the open-source data and infrastructure community by initiating projects, collaborating with data-centric organizations, and organizing events or challenges. Engage with communities such as Apache Parquet, Open Table Formats, and data engineering forums to advocate for best practices and Hugging Face tools.
• Advocate for the Hugging Face Hub as the premier platform for data storage, versioning, and collaboration, curating and highlighting datasets, benchmarks, and tools such as Xet.
• Illustrate use cases including efficient updates for large datasets, Parquet editing, and deduplication to showcase the Hub's significance for data workflows.
• Develop demonstrations, benchmarks, and tools (for instance, Colab notebooks) that exemplify best practices in data storage and versioning while experimenting with Xet, Parquet, and other formats.
• Create high-quality tutorials, blog posts, and videos that simplify complex topics for broader accessibility.
• Provide insights on optimizing storage, dataset versioning, and deduplication to empower developers.
• Engage actively in online communities (Discord, GitHub, forums) to showcase contributions, respond to inquiries, and encourage collaboration.
• Ensure that datasets and tools released on the Hub are thoroughly documented with clear examples, benchmarks, and use cases.
• Over 3 years of experience in developer relations or developer advocacy, preferably within data engineering, infrastructure, or ML tools and platforms.
• A well-established public presence as a technical voice, with a history of regularly publishing content related to data, infrastructure, or ML, and a demonstrable, engaged audience on LinkedIn and X (Twitter).
• A portfolio of developer-oriented content including tutorials, blog posts, videos, demos, benchmarks, or conference presentations.
• Practical experience in building and engaging open-source or developer communities on platforms such as Discord, GitHub, and forums.
• Proficiency in Python.
• Hands-on experience with data libraries like pandas, pyarrow, and huggingface/datasets.
• Practical knowledge of storage systems and formats, including Parquet, Open Table Formats, and S3.
• Familiarity with dataset versioning, deduplication, and compression techniques.
• Ability to clearly articulate complex technical concepts through written content, demonstrations, or presentations.
• Proficiency in both written and spoken English.
• Comprehensive health, dental, and vision benefits for employees and their dependents.
• Parental leave.
• Flexible paid time off policies.
• Flexible working hours.
• Remote work options available.
• Reimbursement for relevant conferences, training, and educational pursuits.
• Company equity included as part of the compensation package.