
Company | Databricks
Job title | Senior Cloud Network Operations Engineer
Job location | Heredia, Costa Rica
Type | Full Time
Responsibilities:
- Monitor critical infrastructure, triage alerts to proactively identify incidents, and work with stakeholders to resolve incidents.
- Investigate incidents and propose solutions to improve platform reliability and stability.
- Perform root cause analysis for recurring incidents and provide proactive solutions.
- Develop tooling or automate processes to improve platform monitoring and alerting.
- Contribute to software development efforts to improve overall service reliability and stability.
- Communicate with internal stakeholders, including executive staff, to provide incident analysis.
- Participate in war rooms and temporary communication channels during outages.
- Demonstrate cross-functional leadership and establish ownership of incidents and outages.
- Multitask on several incidents and/or projects at once.
Requirements & Skills:
- 3 years of experience as a NOC, SRE, or DevOps engineer
- Knowledge of cloud technologies such as Azure, AWS, and GCP
- Hands-on experience with monitoring, logging, and alerting tools
- Hands-on experience with containers and orchestration technologies
- Automation and scripting skills
- Linux systems administration skills
- Knowledge of incident management
- Excellent communication skills
- Technical degree or equivalent experience
- Willingness to learn the Databricks products
