This is a fully remote position, open to applicants in Portugal.

📋 Description

• Design and build end-to-end NLP pipelines, from traditional text processing to cutting-edge LLM-driven architectures.

• Develop and support systems for intent detection, named entity recognition (NER), entity extraction, and text classification, both as independent solutions and as integrated components within larger LLM workflows.

• Design and refine Retrieval-Augmented Generation (RAG) systems, including chunking strategies, vector storage architecture, hybrid search methods (dense and sparse), and re-ranking processes.

• Work with embedding models for semantic search, document retrieval, and intent classification in contact center settings.

• Create and implement agentic architectures that incorporate tool usage, function calling, multi-step reasoning, and orchestration using frameworks like LangChain, LlamaIndex, or tailored solutions.

• Develop strategies for memory and context management, focusing on short-term conversational memory, long-term user context, and optimizing context windows for multi-turn interactions.

• Rigorously evaluate and benchmark models, focusing on hallucination detection, faithfulness assessment, latency/token cost trade-offs, and ongoing performance monitoring.

• Integrate AI components into scalable, production-ready microservices, emphasizing low-latency inference pipelines.

• Work with product and engineering teams to design AI-powered features and drive innovation across the platform.

• Conduct applied research with scientific rigor while prioritizing delivery; publications are encouraged.

⛳️ Requirements

• 1-3 years of experience in a role related to Data Science, AI, or NLP Engineering.

• Proficiency in Python and core data science and machine learning libraries (Pandas, scikit-learn, NLTK, spaCy, Gensim).

• Strong grasp of NLP fundamentals, including word embeddings, NER, information extraction, intent classification, and text similarity.

• Proven experience in building and deploying machine learning products in production settings.

• Practical experience with LLMs in production environments (OpenAI, Anthropic, Mistral, LLaMA, Gemini, or similar).

• Familiarity with Retrieval-Augmented Generation (RAG) pipelines.

• Experience with vector databases (Pinecone, Weaviate, Qdrant, pgvector, etc.) and contemporary embedding models.

• Knowledge of context window management, token budgeting, and prompt design for multi-turn conversations.

• Experience with LLM observability and monitoring.

• Familiarity with LLM frameworks such as LangChain, LlamaIndex, or Hugging Face Transformers.

🏝️ Benefits

• Competitive compensation package.

• Health insurance coverage.

• Opportunities for career advancement.

• Access to training programs, events, and conferences.

• Remote-first working model.

Data Scientist
