
AI Engineer, Data Pipeline
Posted May 7

This is a fully remote position, open to applicants in India.
• Develop data ingestion pipelines to extract and transform enterprise data.
• Execute data cleansing and normalization processes.
• Create and manage ETL jobs utilizing Spark/PySpark on cloud platforms.
• Enforce data validation and quality assurance measures at every stage of the pipeline.
• Construct automated data export jobs for datasets used in model training.
• Assist with feature extraction from enterprise schemas.
• Oversee pipeline health, diagnose failures, and enhance performance.
• Maintain thorough documentation of data lineage, schemas, and transformation logic.
• Minimum of 3 years of experience in software engineering.
• Proficiency in Python and data processing tools (such as pandas, PySpark, or equivalents).
• Knowledge of SQL and relational databases (including MySQL, PostgreSQL).
• Familiarity with cloud data services (such as object storage, managed Spark, managed ETL, or similar).
• Comprehension of ETL/ELT methodologies and data pipeline architecture.
• Experience with various data formats (including Parquet, JSON, Avro).
• Strong focus on data quality and testing practices.
• Bachelor’s degree in Computer Science or equivalent experience.
• Pioneering Technology: At Coupa, we are leading the way in innovation, utilizing advanced technology to provide our customers with enhanced efficiency and visibility in their spending.
• Collaborative Culture: We emphasize teamwork and collaboration, fostering a culture characterized by transparency, openness, and a collective commitment to excellence.
• Global Impact: Become part of an organization where your contributions have a worldwide, measurable effect on our clients, the business, and one another.