
Company: Clinia
Job title: Data Quality Developer
Job location: Montreal, QC, Canada
Type: Full Time
Responsibilities:
- Collaborate closely with web scraping developers and internal stakeholders to understand data requirements.
- Design secure and efficient processes for ingesting healthcare data from private sources and crawled websites.
- Develop and maintain robust data cleaning and transformation procedures to ensure data quality and consistency.
- Utilize Spark and Ray for scalable, high-performance distributed data processing optimized for large healthcare datasets.
- Implement and manage Apache Airflow workflows for scheduling and automating routine healthcare data processing tasks.
- Work with the GRC Lead to ensure data processing aligns with regulations and certifications (e.g., GDPR, HIPAA).
- Implement security measures such as data encryption, access controls, and anonymization techniques to safeguard sensitive data.
- Maintain comprehensive documentation for data processing pipelines, including design decisions, configurations, and workflow dependencies.
- Facilitate knowledge-sharing sessions with the data engineering team to disseminate best practices, new techniques, and updates.
Requirements & Skills:
- Solid Python programming skills, as well as proficiency with SQL.
- Knowledge of techniques and methodologies around data cleaning and data quality.
- Knowledge of different data processing paradigms (ETL, ELT, etc.).
- Experience working with parallel and distributed data computing (Ray, Spark, Dask, Hadoop, etc.).
- Experience working with versioned data lakes (Apache Iceberg).
- Experience working with containers and cloud computing.
- Experience working with sensitive and clinical data (nice to have).
