Mid-Level Data Scientist

This is a fully remote position, open to applicants in the United States.

📋 Description

• Build and maintain knowledge bases, vector stores, and Retrieval Augmented Generation (RAG) pipelines using Amazon Bedrock and Amazon OpenSearch Service to make financial and regulatory datasets AI-ready for advanced analytics and machine learning applications.

• Assist in the development, validation, and operationalization of statistical outputs and derived data products; collaborate with the agency's data science team and subject matter expert data scientists to deploy Airflow DAGs and AWS Glue jobs that automate recurring updates.

• Facilitate the transition of data science outputs into production by confirming accuracy, completeness, and reporting readiness; ensure that all production data products are integrated into the agency's ETL load and gap reporting framework.

• Design and validate machine learning models and analytical pipelines using extensive financial and regulatory datasets stored in the data lake.

• Utilize AI-assisted development tools for code generation, debugging, and performance optimization; comply with agency security standards and relevant federal AI governance regulations.

• Write Python 3.10 code that follows PEP 8; integrate analytical pipelines with the agency's ETL metadata framework and generate the required load and gap reporting outputs.

• Aid in entity resolution efforts to ensure consistent identification and linkage of records across high-volume financial datasets.

• Generate required documentation for all analytical models and pipelines, including methodology, data lineage, model assumptions, refresh schedules, and Independent Verification & Validation (IV&V) Questionnaires.

• Develop automated tests that achieve at least 90% code coverage; run security scans at least once per sprint, in line with OWASP ASVS Level 2, as part of the Definition of Done.

• Participate in two-week sprint ceremonies, quarterly Program Increment (PI) planning, and backlog refinement, delivering work in an agile fashion with Jira and GitHub.

⛳️ Requirements

• US Citizenship is mandatory.

• A Bachelor's Degree is required.

• 3 to 5 years of relevant experience is required.

• Proficiency in Python 3.10 (PEP 8), including libraries such as pandas, NumPy, and scikit-learn.

• Practical experience with Amazon Bedrock, knowledge bases, vector stores, and RAG pipeline design on AWS.

• Familiarity with Amazon OpenSearch Service or equivalent vector/search infrastructure.

• Experience with Apache Airflow, including Amazon Managed Workflows for Apache Airflow (MWAA), for orchestrating DAG-based pipelines.

• Understanding of AWS Glue, S3, and Apache Spark for large-scale data processing.

• Proficiency in SQL and query engines such as Trino, Athena, or Redshift.

• Experience with large-scale financial or regulatory datasets is highly preferred.

• Knowledge of federal AI governance requirements and responsible AI practices within a government context.

• Experience in agile development, CI/CD pipelines, GitHub, and sprint-based delivery methodologies.

• Awareness of FISMA, NIST 800-53, and Zero Trust principles.

• Must be available to work from 8 AM to 5 PM Eastern Time, regardless of home location.

• An active federal public trust suitability determination or the ability to obtain one is required.

🏝️ Benefits

• Special incentives for team members residing in qualified HUBZones.

• Flexibility to support personal and professional growth.
