
Mid-Level Data Scientist
Posted Jun 20

This is a fully remote position, open to applicants in the United States.
• Build and maintain knowledge bases, vector stores, and Retrieval Augmented Generation (RAG) pipelines using Amazon Bedrock and Amazon OpenSearch Service to make financial and regulatory datasets AI-ready for advanced analytics and machine learning applications.
• Assist in the development, validation, and operational implementation of statistical outputs and derived data products; collaborate with the agency's data science team and subject matter expert data scientists to deploy Airflow DAGs and AWS Glue jobs that guarantee automated, recurring updates.
• Facilitate the transition of data science outputs into production by confirming accuracy, completeness, and reporting readiness; ensure that all production data products are integrated into the agency's ETL load and gap reporting framework.
• Design and validate machine learning models and analytical pipelines using extensive financial and regulatory datasets stored in the data lake.
• Utilize AI-assisted development tools for code generation, debugging, and performance optimization; comply with agency security standards and relevant federal AI governance regulations.
• Write Python 3.10 code in accordance with PEP 8; connect analytical pipelines with the agency's ETL metadata framework and generate necessary load and gap reporting outputs.
• Aid in entity resolution efforts to ensure consistent identification and linkage of records across high-volume financial datasets.
• Generate required documentation for all analytical models and pipelines, including methodology, data lineage, model assumptions, refresh schedules, and Independent Verification & Validation (IV&V) Questionnaires.
• Develop automated tests to achieve a minimum code coverage threshold of 90%; conduct security scans at least once per sprint as part of the Definition of Done in accordance with OWASP ASVS Level 2.
• Engage in 2-week sprint ceremonies, quarterly Program Increment (PI) planning, backlog refinement, and agile delivery utilizing JIRA and GitHub.
• US Citizenship is mandatory.
• A Bachelor's Degree is required.
• 3-5 years of relevant experience is required.
• Proficiency in Python 3.10 (PEP 8), including libraries such as pandas, NumPy, scikit-learn, and others.
• Practical experience with Amazon Bedrock, knowledge bases, vector stores, and RAG pipeline design on AWS.
• Familiarity with Amazon OpenSearch Service or equivalent vector/search infrastructure.
• Experience with Apache Airflow (MWAA) for orchestrating DAG-based pipelines.
• Understanding of AWS Glue, S3, and Apache Spark for large-scale data processing.
• Proficiency in SQL and query tools such as Trino, Athena, or Redshift.
• Experience with large-scale financial or regulatory datasets is highly preferred.
• Knowledge of federal AI governance requirements and responsible AI practices within a government context.
• Experience in agile development, CI/CD pipelines, GitHub, and sprint-based delivery methodologies.
• Awareness of FISMA, NIST 800-53, and Zero Trust principles.
• Must be available to work from 8 AM to 5 PM Eastern Time, regardless of the home location.
• An active federal public trust suitability determination or the ability to obtain one is required.
• Special incentives for team members residing in qualified HUBZones.
• Flexibility to support personal and professional growth.