
Senior Data Engineer
Posted 1 day ago

This is a fully remote position, open to applicants in New York.
• Design, construct, and manage low-latency streaming pipelines (Kafka, Spark Structured Streaming) alongside robust batch ETL/ELT processes on the Databricks Lakehouse platform.
• Implement reliable orchestration and dependency management (Airflow), ensuring strong SLAs and readiness for on-call support for critical business data flows.
• Model, optimize, and document curated datasets and interfaces that cater to analytics, product features, and AI workloads.
• Establish data quality checks, observability, and backfill processes; lead root-cause analyses and incident prevention efforts.
• Collaborate with application teams (Go/Java), analytics, and ML/AI to deploy data products into production environments.
• Create and sustain datasets and services that drive RAG pipelines and agentic AI workflows (tool-use/function calling).
• In scenarios where Spark/Databricks is not ideal, design and manage custom processors/services in Go to fulfill strict latency or specialized transformation needs.
• Instrument prompt/response and token usage telemetry to support LLMOps evaluation and cost optimization; provide datasets for labeling and golden sets.
• Optimize performance and cost (storage/compute), conduct code reviews, and raise engineering standards.
• Over 6 years of experience in building production-grade data pipelines at scale (both streaming and batch).
• Extensive expertise in Python and SQL; significant experience with Spark on Databricks (or a similar platform).
• Advanced SQL skills, including window functions, CTEs, partitioning/z-ordering, and query planning and tuning in lakehouse environments.
• Practical experience with Kafka (or equivalent) and an orchestration tool (Airflow preferred).
• Strong skills in data modeling and performance tuning for low latency and high throughput scenarios.
• Production-oriented mindset: SLAs, monitoring, alerting, CI/CD, and participation in on-call rotations.
• Proficient in using AI coding assistants (Cursor, Claude Code) in day-to-day development work.
• Competence in building data services/processors in Go (or a willingness to learn quickly); familiarity with alternative frameworks (e.g., Flink/Beam) is a plus.
• Performance-related bonus
• Benefits