
Web Scraping Specialist
Posted 6 days ago

This is a fully remote position, open to applicants in New York.
• Code Development: Develop, test, and enhance high-performance code to extract data from diverse online sources, ensuring optimal reliability and efficiency.
• Data Retrieval: Handle complex data retrieval operations, including pagination and content loaded dynamically via AJAX.
• Data Quality: Cleanse and format the extracted data to guarantee it adheres to stringent quality standards for subsequent analysis and processing.
• Database Management: Store and oversee scraped data in suitable databases, optimizing for both access speed and long-term data integrity.
• Monitoring and Maintenance: Continuously supervise scraping processes and infrastructure to detect and rectify issues, ensuring a steady and uninterrupted data flow.
• Extraction Expertise: Proven capability to extract data from complex websites with minimal oversight, backed by a portfolio of previous projects.
• Technical Proficiency: Advanced knowledge of Python or JavaScript, particularly with libraries and frameworks such as BeautifulSoup, Scrapy, or Selenium.
• Advanced Programming: Strong understanding of asynchronous programming, multithreading, and distributed scraping architectures.
• Web Fundamentals: Comprehensive knowledge of HTML, CSS, JavaScript, and the Document Object Model (DOM).
• Data Storage: Familiarity with NoSQL databases (e.g., MongoDB, Cassandra), including the ability to design efficient storage solutions.
• Cloud Infrastructure: Experience in deploying and managing large-scale scraping tasks using cloud services such as AWS, Google Cloud, or Azure.
• Preferred Skills: Capability to implement machine learning algorithms for data cleaning, categorization, or predictive analysis; active involvement in relevant open-source projects.
• Competitive Compensation: A salary ranging from $75,000 to $100,000, complemented by a comprehensive benefits and equity package.
• Impactful Work: The opportunity to work at the forefront of AI development and web-scale knowledge graph creation.
• High-Output Culture: A professional environment that prioritizes low ego, technical autonomy, and rapid execution.
• Remote Flexibility: This is a remote position requiring a 6-hour overlap with the core team's schedule.