Responsible for the transition of machine learning algorithms to a production environment and integration with the enterprise ecosystem
Design, create, maintain, troubleshoot, and optimize the complete end-to-end machine learning life cycle, which includes:
machine-learning model optimization
data preparation
feature extraction
model performance monitoring
A/B, canary, blue-green, etc. testing
Integration with the enterprise ecosystem, IoT devices, and mobile devices
Write specifications, documentation, and user guides for developed solutions
Build frameworks for data scientists to accelerate the development of production-grade machine learning models
Collaborate with data scientists and the engineering team to optimize the performance of the ML pipeline
Aid in the improvement of SDLC practices
Explore new tools and techniques and propose improvements
Establish and configure CI/CD/CT processes
Design and maintain continuous training of ML models
Provide capabilities for early detection of various drifts (data, concept, schema, etc.)
Continuously identify technical risks and gaps, devise mitigation strategies
Identify and eliminate technical debt in machine learning systems
Requirements & Skills:
5+ years of experience in enterprise software development
3+ years of solid background in machine learning
Expertise in NLP/LLM, RecSys, Time Series
Experience with designing, building, and deploying production applications and data pipelines
Experience in the development of highly available, highly scalable, ML-driven applications and systems
Able to work closely with customers and other stakeholders
Strong knowledge and experience in Python development
Practical experience with one or more Cloud-native services (GCP, AWS, Azure) and Apache Spark Ecosystem (Spark SQL, MLlib/Spark ML)
Deep understanding of Python ML ecosystem (PyTorch, TensorFlow, numpy, pandas, sklearn, XGBoost)
Hands-on experience in the implementation of Data Products
Deep understanding of data preparation and feature engineering
Deep hands-on experience with the implementation of SDLC best practices in complex IT projects
Experience with automated data pipeline and workflow management tools (Airflow)
Knowledge and experience in computer science disciplines such as data structures, algorithms, and software design patterns
Hands-on experience in different data processing paradigms (batch, micro-batch, streaming)
Deep understanding of MLOps concepts and best practices
Experience with some of the MLOps-related platforms/technologies such as AWS SageMaker, Azure ML, GCP Vertex AI / AI Platform, Databricks MLflow, Kubeflow, Airflow, Argo Workflows, TensorFlow Extended (TFX), etc.
Production experience in integrating ML models into complex data-driven systems, IoT devices, and mobile devices
Experience with basic software engineering tools (CI/CD environments such as Jenkins or Buildkite, PyPI, Docker, Kubernetes)
Experience with one of the infrastructure-as-code (IaC) frameworks (Terraform / CDKTF, Ansible, AWS CloudFormation / AWS CDK)