Engineering Lead Manager, Index Exchange

Company: Index Exchange
Job title: Engineering Lead Manager, Production Operations
Job location: London, England, United Kingdom
Type: Full Time

Responsibilities:

  • Team Building: Recruit, hire, and train a skilled team of ProdOps Engineers to establish a fully operational ProdOps team from scratch. Develop team goals, performance metrics, and performance evaluation processes. Foster a collaborative and inclusive team culture that promotes teamwork, innovation, and excellence.
  • Infrastructure Monitoring and Troubleshooting: Oversee the monitoring of our company’s global infrastructure using management tools to detect and resolve infrastructure issues proactively. Lead the troubleshooting efforts to resolve incidents in a timely manner, and escalate issues to appropriate teams for further investigation and resolution when necessary.
  • Incident Management: Establish and enforce incident management procedures, including incident reporting, escalation, and resolution processes. Coordinate with other engineering teams and external vendors to resolve technology incidents, minimize downtime, and ensure service level agreements (SLAs) are met.
  • Infrastructure Automation & Optimization: Continuously analyze infrastructure performance data, identify trends, and develop and implement strategies to optimize performance and minimize disruptions. Collaborate with other engineering teams to automate and optimize for improved performance, and to implement upgrades and changes that enhance infrastructure reliability, capacity, and security.
  • Documentation and Reporting: Develop and maintain comprehensive documentation, including network diagrams, runbooks, standard operating procedures (SOPs), and incident reports. Generate regular reports on performance, incidents, and trends for senior management and stakeholders.
  • Vendor Management: Establish and maintain relationships with technology equipment vendors, service providers, and other relevant stakeholders. Coordinate with vendors to resolve technical issues, manage maintenance contracts, and ensure timely delivery of services and equipment.
  • Training and Development: Provide ongoing training and professional development opportunities to the ProdOps team to enhance their technical skills, industry knowledge, and job performance. Mentor and coach team members to foster their growth and career advancement.

Requirements & Skills:

  • Proven experience in building and managing a 24×7 Production Operations team, working with peers and colleagues in a distributed global operation
  • Strong leadership skills with the ability to motivate, mentor, and develop a high-performing team
  • In-depth knowledge of on-premise and cloud technology concepts, protocols, and procedures
  • Strong understanding of monitoring tools, incident management processes, automation, and optimization strategies
  • Ability to analyze complex technical issues, develop effective solutions, and make informed decisions in a fast-paced environment
  • Excellent communication skills, both written and verbal, with the ability to communicate technical concepts to non-technical stakeholders. 
  • In-depth understanding of the Linux operating environment: kernel tuning, network stack tuning, system observability & instrumentation, and security & access management.
  • Solid understanding of layer 2-7 networking fundamentals and the relationship between servers & services, and the transit of their packets through network hardware.
  • In-depth experience engineering and maintaining a private-cloud infrastructure: Bare-metal, vSphere, KVM, Kubernetes.
  • Experience with tools like Ansible, Terraform, Docker, Kafka, Nexus
  • Experience with observability platforms: Prometheus, ELK, Jaeger, Grafana, Nagios, Zabbix
  • Familiarity with Big Data tools: Hadoop, HDFS, Spark, HBase
  • Ability to write code in Go, Python, Bash, or Perl for automation.