Design and Development: Build robust, scalable software for observability and AI operations, applying current technologies and best practices.
System Integration: Collaborate with various teams to integrate observability and AI tools with existing infrastructure, ensuring smooth and efficient operations.
Monitoring and Optimization: Develop and implement monitoring strategies to proactively detect and resolve issues, enhancing system reliability and performance.
AI and Machine Learning: Leverage AI and machine learning techniques to automate operational tasks, predict system behavior, and improve overall efficiency.
Collaboration: Work closely with developers, engineers, and stakeholders to understand requirements and deliver solutions that meet business needs.
Documentation: Maintain comprehensive documentation of developed solutions, including design specifications, implementation details, and operational procedures.
Continuous Improvement: Stay current with industry trends and advancements, and continually improve observability and AI operations practices.
Requirements & Skills:
Education: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
Experience: Proven experience as a software developer, with a focus on observability and AI operations.
Technical Skills: Proficiency in programming languages such as Python, Java, or Go; experience with observability tools such as Prometheus, Grafana, and the ELK stack; knowledge of AI and machine learning frameworks.
Problem Solving: Strong analytical and problem-solving skills, with the ability to troubleshoot complex system issues.
Communication: Excellent communication and collaboration skills, with the ability to work effectively in a team environment.
Adaptability: Ability to thrive in a fast-paced, dynamic environment while managing multiple priorities and deadlines.