
Data Advocate Engineer
Posted Jun 21

This is a fully remote position, open to applicants in New York.
• Cultivate and support the open-source data and infrastructure community by initiating projects, collaborating with data-oriented groups, and organizing events or challenges.
• Advocate for the Hugging Face Hub as the premier platform for data storage, versioning, and collaboration, while curating and promoting datasets, benchmarks, and tools such as Xet.
• Showcase practical applications, including efficient updates to large datasets, Parquet editing, and deduplication, to illustrate the Hub's advantages for data workflows.
• Develop demonstrations, benchmarks, and tools (such as Colab notebooks) that highlight best practices for data storage and versioning, and explore formats like Xet and Parquet.
• Create high-quality tutorials, blog articles, and videos that simplify complex subjects.
• Provide insights on optimizing storage, versioning datasets, and deduplication to empower developers.
• Engage actively in online communities (Discord, GitHub, forums) to showcase contributions, respond to inquiries, and promote collaboration.
• Ensure that datasets and tools published on the Hub are thoroughly documented, featuring clear examples, benchmarks, and use cases.
• Over 3 years of experience in developer relations or advocacy, preferably within data engineering, infrastructure, or machine learning tools and platforms.
• A recognized public profile as a technical expert, with a track record of regularly publishing content on data, infrastructure, or machine learning, and an engaged audience on LinkedIn and X (Twitter).
• A portfolio showcasing developer-oriented content, including tutorials, blog posts, videos, demos, benchmarks, or conference presentations.
• Practical experience in building and nurturing open-source or developer communities (Discord, GitHub, forums).
• Proficiency in Python.
• Hands-on experience with data libraries such as pandas, pyarrow, and huggingface/datasets.
• Familiarity with storage systems and formats, including Parquet, Open Table Formats, and S3.
• Knowledge of dataset versioning, deduplication, and compression techniques.
• Ability to explain complex technical subjects clearly in writing, demonstrations, or presentations.
• Proficiency in written and spoken English.
• Comprehensive health, dental, and vision benefits for employees and their dependents.
• Parental leave policy.
• Flexible paid time off.
• Reimbursement for relevant conferences, training, and educational opportunities.
• Flexible working hours with remote work options available.