Data Scientist, S&P Global

Company: S&P Global
Job title: Data Scientist – NLP, LLM and GenAI
Job location: New York/Canada/Colorado/New Jersey
Type: Full Time

Responsibilities:

  • ML, Gen AI, NLP, LLM Model Development: Design and develop custom ML, Gen AI, NLP, and LLM models for batch and stream processing-based AI/ML pipelines. Model components will include data ingestion, preprocessing, search and retrieval, Retrieval Augmented Generation (RAG), NLP/LLM model development, fine-tuning, and prompt engineering. Ensure the solution meets all technical and business requirements. Work closely with other members of the data science, MLOps, and technology teams on the design, development, and implementation of ML model solutions.
  • ML, NLP, LLM Model Evaluation: Work closely with other data science team members to develop, validate, and maintain robust solutions and tools for evaluating model performance, accuracy, consistency, and reliability during development and UAT. Implement model optimizations to improve system efficiency.
  • NLP, LLM, Gen AI Model Deployment: Work closely with the MLOps team to deploy machine learning models into production environments, ensuring reliability and scalability.
  • Internal Collaboration: Collaborate closely with product teams, business stakeholders, MLOps, machine learning engineers, and software engineers to ensure smooth integration of machine learning models into production systems.
  • Documentation: Write and maintain comprehensive documentation of ML modeling processes and procedures for reference and knowledge sharing.
  • Develop Models Based on Standards and Best Practices: Ensure that models are designed and developed in line with the standards, governance, and best practices for ML model development specified by senior Data Science and MLOps leads.
  • Assist in Problem Solving: Troubleshoot complex issues related to machine learning model development and data pipelines and develop innovative solutions.

Requirements & Skills:

  • Bachelor’s or Master’s degree in Computer Science, Mathematics, Statistics, Computational Linguistics, Engineering, or a related field.
  • Hands-on experience leveraging large sets of structured and unstructured data to develop data-driven tactical and strategic analytics and insights using ML, NLP, and computer vision solutions.
  • Demonstrated hands-on experience with Python, Hugging Face, TensorFlow, Keras, PyTorch, Spark, or similar tools. Expert in Python programming.
  • Hands-on experience developing natural language processing (NLP) models, ideally with transformer architectures.
  • Knowledge of information search and retrieval at scale, using solutions ranging from keyword search to semantic search with embeddings.
  • Knowledge of developing or tuning Large Language Models (LLMs) and Generative AI (GAI).
  • Knowledge of NLP, LLMs (extractive and generative), fine-tuning, and LLM development. Familiar with higher-level trends in LLMs and open-source platforms.
  • Nice to have: Experience contributing to GitHub and open-source initiatives, involvement in research projects, and/or participation in Kaggle competitions.
