Our client is an exciting startup formed by like-minded individuals who believe in combining human principles with machine intelligence. Using automation, the company helps customers protect their data and business, engage better with their own customers digitally, and monitor business risk.
 
As a Site Reliability Engineer, you will debug complex issues, dissect and resolve problems quickly, and respond to and repair incidents.
 
All in a day’s work
  • Build new tools and techniques to automate human-intensive tasks.
  • Immerse yourself in a culture of continuous integration (CI) by delivering scalable workflow automation solutions.
  • Write code and configuration to build and maintain tools for your team.
  • Change the code and configuration of services to improve application reliability.
  • Maintain common components (such as CI/CD, monitoring, IAM or VPC configurations) built on top of AWS, Azure and GCP.
  • Operate distributed systems at scale and improve the reliability of applications and of the business.
  • Maintain shared services (such as Kubernetes clusters).
  • Plan and support systems with a reliability-oriented feature set.
  • Respond to and stabilize incidents when they occur, and prevent recurring issues.
  • Share best practices, tools and tips.
 
Our ideal candidate
  • A strong desire to develop and apply your knowledge to solve real-world problems.
  • Passionate about building and running some of the largest and most complex platforms and systems.
  • Deep systems knowledge across all infrastructure components, plus related skills such as software development, workflow optimisation and system administration.
  • Prior experience debugging code, operating a network, building hardware, etc.
  • Has designed and coded systems that can distribute a package to N servers in parallel.
  • Has worked at scale with AWS, GCP, Azure, Elastic Stack, Kafka, Kubernetes, Calico, Terraform, GitHub, GitLab, Argo CD, Docker, Jenkins X or Thanos.
  • Ability to set up service monitoring that identifies problems before they escalate, and alerting that catches issues before a wave of customer support tickets arrives.
  • Strong awareness and the ability to think “outside the box”.
  • Pays attention to detail while not losing focus on the bigger picture.
  • Strong communication skills; enjoys working in a fast-paced environment with a tight-knit team.

JOB ID: 1650