Remotery

Master's-Level Research Fellow – Data Scientist, NLP, Semantic Search Systems

at Sistema Fibra | Brazil | Full-time | Data Scientist | Mid-level, Senior | R$9,000/month

Posted 6 days ago

This is a fully remote position, open to applicants in Brazil.

📋 Description

• Text pre-processing and enrichment: Cleaning, tokenization, lemmatization, and noise removal for text documents.

• Generation and management of embeddings: Create document and query embeddings using models such as Sentence-BERT, OpenAI Ada, or similar.

• Development of semantic search, information retrieval, and RAG pipelines: Build pipelines that chain query → embedding → vector search → re-ranking (optional) → use of the retrieved context in RAG applications.

• Adaptation and experimentation with models in PyTorch or TensorFlow: Employ pre-trained models and tailor them for specific tasks.

• Documentation and version control: Document pipelines, technical decisions, and experiment results.


⛳️ Requirements

• Master’s degree in progress or completed.

• Fields of study include Computer Science, Computer/Software Engineering, Information Systems, Statistics, Applied Mathematics, Electrical Engineering, Data Science, or related areas within Exact Sciences and Engineering.

• Proficiency in Python, including type hints, data manipulation (pandas/polars), and the use of virtual environments.

• Practical experience with NLP, including tokenization, stemming/lemmatization, stopword removal, and vectorization (TF-IDF, word2vec, or embeddings).

• Applied knowledge of embeddings, semantic search, and RAG techniques (e.g., Sentence-BERT or other contemporary embedding models), as well as familiarity with vector databases (e.g., FAISS, ChromaDB, Pinecone, Qdrant).

• Experience with at least one deep learning framework (PyTorch or TensorFlow).

• Understanding of analytical pipelines and experimentation in data science applied to NLP (e.g., extraction → pre-processing → embedding → search/classification → context generation) and version control using Git.

• Experience with generative models (LLMs), prompt engineering, and response evaluation will be a significant advantage.

• Knowledge of relational databases (PostgreSQL) and NoSQL databases (MongoDB, Redis).

• Experience in real-world projects (academic or professional) with documentation and testing.


🏝️ Benefits

• Comprehensive benefits package including health insurance, retirement plans, and professional development opportunities.

• Flexible working hours and remote work options.

• Collaborative and innovative work environment fostering creativity and growth.
