Design and implement the best data pipeline for our text-based products (ingestion, processing, serving):
Test and design state-of-the-art data ingestion pipelines
Implement efficient streaming services
Take part in the acquisition of new data sources
For each new data source, assess its feasibility and potential
Create and maintain data collection and centralization pipelines
Integrate data enrichment modules created by Data Scientists
Develop data-querying tooling for technical teams
Simplify the use of data-querying engines
Optimize architecture and data pipelines
Implement and maintain critical data systems
Process and integrate data in our systems
Ensure maintainability and efficiency
Requirements & Skills:
Degree from an engineering school or university with a specialization in IT, software engineering, or data science; other profiles are welcome to apply provided they have significant IT experience
At least 3 years of experience in data engineering, including the successful implementation of a cloud-based data-processing pipeline
Good understanding of different databases and data storage technologies
Very good knowledge of distributed computing systems, such as Spark
Good knowledge of cloud platforms, such as AWS, GCP, or Azure
Development: proficient and at ease with Python
Good communication skills, including the ability to explain technical topics to non-specialists: understand technical teams' needs and issues, and collaborate with several internal teams. Team player.
A plus: a strong interest in Data Science / Natural Language Processing.