Data Scientist, Fidelity

Responsibilities:

Performs exploratory analysis, data cleaning, preparation and annotation, ML pipeline design and development, model evaluation, and validation.
Develops models using supervised and unsupervised ML algorithms.
Develops models using algorithms such as decision trees, isolation forests, autoencoders/neural networks, linear/logistic regression, and clustering.
Develops models that operate on both structured data and unstructured text data (Natural Language Processing).
Analyzes and preprocesses features for model training.
Collaborates with the team to assess project scope, define data requirements, prioritize tasks, and share research findings and updates.
Researches new techniques and technologies to improve team knowledge and enhance solutions.
Creates presentations to provide team updates on project progress, research, and new findings.
Participates in code reviews to enable learning, collaboration, and mentoring of other team members.

Requirements:

Master’s degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and no experience.
Demonstrated Expertise (“DE”) performing complex SQL queries to extract features from SQL databases; and using Python for typical data science (DS) workflow steps: data preprocessing, regression, decision trees/random forests, neural networks, feature selection/reduction, clustering, and parameter tuning.
DE developing data pipelines on Amazon Web Services (AWS) using S3 storage services and EC2 cloud computing services; performing supervised and unsupervised modeling on tabular data in Python, using DS libraries (pandas, NumPy, SciPy, and scikit-learn); training models on imbalanced datasets using Python libraries (imbalanced-learn, imported as imblearn); and creating data visualizations to analyze and evaluate model results, using Python libraries (Matplotlib and Seaborn).
DE developing classification models on text data, using spaCy, NLTK, TensorFlow, PyTorch, and BERT-based models.
DE communicating and collaborating across teams to break down complex business problems, translate them into ML projects, and deliver data products and insights for productization, using collaboration tools (JIRA).