DevOps Lead
The DevOps Lead ensures the reliability, scalability, and performance of critical systems and services. This role bridges development and operations, fostering a culture of automation, resilience, and continuous improvement. The DevOps Lead manages a team of Site Reliability Engineers (SREs) who apply best practices, manage incidents, and drive operational excellence.
Qualifications, Skills and Experience:
Bachelor’s degree in Computer Science, Engineering, or related field
Proven experience in SRE or DevOps leadership roles
Strong knowledge of:
Cloud platforms (AWS, Azure, GCP)
Container orchestration (Kubernetes, Docker)
Infrastructure automation (Terraform, Ansible, Jenkins, Lava)
Expertise in programming languages (Python, Java)
Proficiency with source control systems (GitHub Enterprise)
Familiarity with monitoring tools: Prometheus, Grafana, PRTG
Excellent communication and stakeholder management skills
Experience with distributed systems and high-availability architectures
Knowledge of security and compliance frameworks (ISO27001, SOC 2)
Certifications in cloud technologies or ITIL
Experience with Agile, Scrum, and Atlassian Jira
Familiarity with Google Cloud AI & ML services, including:
Vertex AI (end-to-end ML platform)
AutoML (custom model training)
BigQuery ML (machine learning in SQL)
Cloud AI APIs (Vision, Natural Language, Translation)
TensorFlow on Google Cloud
Strategic thinker with strong problem-solving skills
Ability to thrive in a fast-paced, evolving environment
Collaborative and empathetic leadership style
Key Responsibilities:
Be a hands-on leader who connects with direct reports, peers, and partners both operationally and strategically
Provide technical leadership and coaching, maintaining credibility in systems engineering, tools, and DevOps
Promote a culture of learning, collaboration, and continuous improvement through Agile and Scrum
Ensure the team has development pathways, meaningful objectives, and KPIs aligned to a clear technology roadmap
Manage, optimise, and deliver Systems, DevOps, and MLOps as a service to internal stakeholders
Define, publish, and measure Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
Oversee incident response, service request fulfilment, change management, optimisation backlogs, and post-implementation and post-incident reviews
Deliver efficiencies through problem management, release management, and continuous improvement
Leverage Google Cloud AI and other tools for predictive analytics and anomaly detection
Manage consumption and cost-to-serve through demand shaping, capacity planning, and environment governance
Automation & Efficiency
Develop and propagate frameworks, pipelines, and system engineering templates across platforms
Evangelise modern engineering practices, including microservices, CI/CD, infrastructure-as-code, and security-by-design
Partner with Technology, Delivery, and Support teams to ensure alignment between software development and platform engineering
Drive automation initiatives that enable self-service and reduce manual effort
Build strong relationships with stakeholders across Technology, Engineering, Architecture, and Seeing Machines support services
Work closely with development teams to design scalable and resilient systems
Align priorities across engineering, product, and operations teams
Influence architecture and governance standards to balance innovation, scalability, and compliance
Establish cloud governance policies, access controls, and compliance standards
Ensure systems are aligned with Seeing Machines' disaster recovery (DR) and business continuity planning (BCP) expectations
Implement monitoring systems, standards, and services that support both predictive and reactive responses
Develop and publish dashboards showing system health across Seeing Machines
Deliver information and reporting on an agreed cadence
Technology Division
Enterprise Systems & Services Department
Project Leads
All Seeing Machines senior stakeholders
Product Vendors
Service Providers