Site Reliability Manager

Full-time
Melbourne VIC, Australia
7 months ago

The AKQA Managed Services team provides customer experience platform management, cloud environment support solutions, DevOps engineering, and consulting services that support our clients' ecommerce platforms and associated business operations. There is now an exciting opportunity to join the AKQA Managed Services team in a newly created role and play an integral part in continuing to expand its core services and solutions.

As a Site Reliability Engineering Manager (SREM), you will play a pivotal role in ensuring the reliability, scalability, and performance of our clients’ digital platforms and CX applications. This position requires a unique blend of software engineering skills, operational acumen, and a commitment to fostering a culture of site reliability within the organization. You will support development teams and customer platforms while championing the Site Reliability mindset through knowledge transfer, continuous learning and training, and technical certification support.

At AKQA Melbourne, you’ll work in an innovative and inclusive culture, surrounded by some of the brightest minds in their fields. You will have the opportunity to learn and grow within a creative and technically advanced team and have access to ongoing personal and professional development. 

ROLE REQUIREMENTS

  • Design, implement and support the reliability, scalability, and performance of our systems and applications across all phases of the SDLC
  • Provide support during operations such as deployments and general production and non-production testing 
  • Provide technical support and manage tasks from multiple agile teams across the region 
  • Support the Delivery team with processes, frameworks and tool sets that champion environment health engineering 
  • Be on the 24/7 service desk roster supporting key clients across the region 
  • Perform root cause analysis for environment and application performance and uptime issues 
  • Contribute to both Incident Reports and Monthly Operational Reports 
  • Identify and perform analysis to provide recommendations for improvements 
  • Contribute to the planning and coordination of platform environment updates 
  • Contribute to the definition of DevSecOps best-practice and operational standards 
  • Collaborate with developers to ensure new environments meet client requirements and conform to defined standards and compliance requirements
  • Build strong interpersonal relationships with key staff members across studios, clients, partners and teams
  • Champion a vibrant and diverse engineering culture through internal presentations and knowledge transfer

QUALITIES AND CHARACTERISTICS

  • 3-5 years' experience in a similar role 
  • Sound experience with Azure, AWS, DevOps, AWS CloudFormation, Terraform, Helm, etc.
  • Knowledge of and hands-on expertise with tools such as New Relic, Dynatrace, Splunk, etc.
  • Exposure to Git, Bitbucket, Jira/Confluence, New Relic and Cloudflare
  • Experience in deploying resources using Python and PowerShell scripting
  • Self-motivated and willing to do what is needed to get the job done efficiently and effectively

 

AKQA is an Equal Opportunities Employer. We believe that diversity is vital to AKQA’s ability to provide our clients with the best recommendations, and we are committed to fostering a varied and inclusive work environment. Your race, colour, ancestry, religion, gender, gender identity, national origin, sexual orientation, age, marital status, disability or veteran status have no bearing on our hiring decisions. If you have a disability or special need that requires accommodation, please let us know. Aboriginal, Torres Strait Islander and Indigenous people are encouraged to apply for this role.