
Research Crawling Engineer
Posted 1 day ago

This is a fully remote position open to applicants in the United States.
• Develop and sustain large-scale web crawlers spanning various domains.
• Create high-throughput, fault-tolerant systems designed for extensive data collection (millions to billions of URLs daily).
• Handle anti-bot systems, rate limits, and dynamic or JavaScript-heavy websites.
• Build pipelines for data cleaning, deduplication, filtering, and normalization.
• Create and uphold datasets for research purposes and model training.
• Monitor crawl performance, coverage, and data quality, and adapt quickly when they change.
• Work in tandem with research teams to synchronize data collection with modeling requirements.
• Enhance infrastructure for cost efficiency, latency reduction, and reliability.
• Proficiency in one or more of the following languages: Go, Rust, Python, Java, or C++.
• Experience in building web crawlers or large-scale data processing pipelines.
• Strong grasp of HTTP, networking principles, and browser functionality.
• Knowledge of distributed systems and parallel processing techniques.
• Proven experience working with sizable datasets (preferably in the TB–PB range).
• Ability to troubleshoot in unstable or adversarial conditions.
• Competitive salary.
• Comprehensive benefits and equity package.