Collect, clean, and pre-process structured and unstructured data, including images, designs, and drawings, from various sources within the construction domain
Design and manage workflows for data ingestion pipelines across cloud and Databricks services, with a focus on the ingestion and embedding of images, designs, and drawings using both text and image embeddings
Monitor and debug continuous ingestion requests from a user-facing application, implementing alerts and error handling
Develop novel methods and approaches to enable large vision models to effectively understand and interpret images such as architectural designs and engineering drawings
Create innovative approaches to data ingestion and arrangement for unstructured data, including various architectural formats such as images, PDFs, written reports, coordinate-based data, GIS data, geometries, CAD files, BIM models, and other visual elements
Use vector databases to store, persist, and retrieve embeddings from large collections of visual and textual data
Perform exploratory data analysis to identify patterns, trends, and anomalies within visual datasets
Lead the development and implementation of large vision models and computer vision techniques to address challenges related to construction and asset/facilities management
Collaborate with senior data scientists to build pipelines for visual data processing, including retrieval-augmented generation (RAG) for image-based applications, and fine-tune models to meet internal and client needs
Work closely with cross-functional teams, including engineers, project managers, and domain experts, to understand business requirements and objectives
Collaborate with other data scientists and analysts to share knowledge and insights
Work closely with Meinhardt DTS’s software engineering team on the data analytics and AI aspects of developing and delivering DTS’s products
Research and test new developments in generative AI and their potential applications for Meinhardt DTS products
Stay updated on industry trends, emerging technologies, and best practices in data science
Proactively seek opportunities for professional development and skill enhancement
Proactively seek opportunities to expand domain knowledge in construction, infrastructure, urban planning, and related fields
Requirements & Skills:
Bachelor’s degree in Data Science, Computer Science, Statistics, Applied Mathematics, or a related field, or equivalent experience
At least 1 year of experience in computer vision, AI model development, and image processing
Strong programming skills in Python
Knowledge of image data engineering and image pre-processing techniques
Knowledge of vector stores and of relational and non-relational databases
Familiarity with data engineering workflows and tools, particularly in cloud environments (Azure, Databricks)
Ability to propose and test creative solutions to natural language processing and computer vision problems
Effective communication skills to convey technical concepts to non-technical stakeholders