
Senior AI Engineer, NLP, Training Data
Posted Jun 5

This is a fully remote position, open to applicants in India.
• Design and develop pipelines for generating training data, including the creation of synthetic data.
• Establish data labeling and annotation processes with mechanisms for quality validation.
• Transform enterprise data into formats that are appropriate for model training (instruction-tuning pairs, embeddings).
• Employ active learning methods to pinpoint high-value training examples.
• Collaborate with domain experts to ensure the quality and relevance of training data.
• Create automated checks for data quality, focusing on coverage, balance, and consistency.
• Devise strategies for training data versioning and lineage tracking.
• Analyze model evaluation results to identify gaps in training data.
• A minimum of 5 years of experience in software engineering, including at least 2 years in NLP, data science, or ML data engineering.
• Proficiency in text processing, tokenization, and NLP pipelines.
• Practical experience with data labeling tools and annotation processes.
• Proven experience in generating synthetic training data using language model APIs.
• Familiarity with instruction-tuning and training data quality metrics.
• Strong proficiency in Python (including libraries such as pandas and PySpark).
• Experience with data versioning tools is advantageous.
• Bachelor's or Master's degree in Computer Science, NLP, or a related field.
• Pioneering Technology: At Coupa, we're at the forefront of innovation, leveraging the latest technology to empower our customers with greater efficiency and visibility into their spend.
• Collaborative Culture: We value collaboration and teamwork, and our culture is driven by transparency, openness, and a shared commitment to excellence.
• Global Impact: Join a company where your work has a global, measurable impact on our clients, the business, and each other.