
Data Scientist
Posted May 19

This is a fully remote position, open to applicants in Portugal.
• Design and create comprehensive NLP pipelines, encompassing everything from traditional text processing to cutting-edge LLM-driven architectures.
• Develop and support systems for intent detection, named entity recognition (NER), entity extraction, and text classification, both as independent solutions and as integrated components within larger LLM workflows.
• Design and refine Retrieval-Augmented Generation (RAG) systems, including chunking strategies, vector storage architecture, hybrid search methods (dense and sparse), and re-ranking processes.
• Work with embedding models for semantic search, document retrieval, and intent classification in contact center settings.
• Create and implement agentic architectures that incorporate tool usage, function calling, multi-step reasoning, and orchestration using frameworks like LangChain, LlamaIndex, or tailored solutions.
• Develop strategies for memory and context management, focusing on short-term conversational memory, long-term user context, and optimizing context windows for multi-turn interactions.
• Rigorously evaluate and benchmark models, focusing on hallucination detection, faithfulness assessment, latency/token cost trade-offs, and ongoing performance monitoring.
• Integrate AI components into scalable, production-ready microservices, emphasizing low-latency inference pipelines.
• Collaborate with product and engineering teams to design AI-powered features and drive innovation across the platform.
• Conduct applied research with scientific rigor while prioritizing delivery; publications are encouraged.
• 1-3 years of experience in a role related to Data Science, AI, or NLP Engineering.
• Proficiency in Python and core Data Science and machine learning libraries (Pandas, scikit-learn, NLTK, spaCy, Gensim).
• Strong grasp of NLP fundamentals, including word embeddings, NER, information extraction, intent classification, and text similarity.
• Proven experience in building and deploying machine learning products in production settings.
• Practical experience with LLMs in production environments (OpenAI, Anthropic, Mistral, LLaMA, Gemini, or similar).
• Familiarity with Retrieval-Augmented Generation (RAG) pipelines.
• Experience with vector databases (Pinecone, Weaviate, Qdrant, pgvector, etc.) and contemporary embedding models.
• Knowledge of context window management, token budgeting, and prompt design for multi-turn conversations.
• Experience in LLM observability and monitoring.
• Familiarity with LLM frameworks such as LangChain, LlamaIndex, or Hugging Face Transformers.
• Competitive compensation package.
• Health insurance coverage.
• Opportunities for career advancement.
• Access to training programs, events, and conferences.
• Remote-first working model.