Our purpose is to make great financial decision making a breeze for everyone, and that purpose drives us every day.
It’s why we’re on a mission to create an automated quoting engine, with the simplest of experiences, wrapped in a brand everyone loves!
We change lives by making it simple to switch and save money. So, when it comes to getting a better deal, it’s never been more blindingly obvious why you would choose Compare the Market.
We’d love you to be part of our journey.
As the Site Reliability Engineer, you will ensure the highest levels of system uptime and performance, contributing directly to the trust and reliability of our services and infrastructure. You will work collaboratively with engineering teams to design, implement, and maintain robust systems that can withstand the challenges of a rapidly evolving technology landscape. You will bridge the gap between development and operations, focusing on automation, scalability, and system stability.
Everyone is welcome. Be you.
We have a culture of creativity. We approach our work passionately, improve constantly and celebrate our wins at every turn. We are an inclusive workplace, and our employees are comfortable bringing their authentic, whole selves to work.
This means we’re excited to hear from people with a range of skills, experiences, and ideas. We don’t expect you to tick all the boxes but would love to hear what makes you great for this role.
Some of the great things you’ll be doing:
• System Reliability - Ensure the uptime and reliability of critical systems and applications, minimizing downtime and service disruptions.
• Automation - Develop and maintain automated processes for deployment, configuration, and scaling of infrastructure and applications.
• Observability - Ensure that teams and their services are making the most of our observability stack and that relevant information is accessible to them to effectively manage their estate.
• Incident Response - Respond to and resolve incidents, conducting post-mortem analysis to identify root causes, implementing preventative measures, and ensuring any learnings are shared.
• Capacity Planning - Collaborate with teams to forecast resource requirements, ensuring that systems can handle current and future workloads, and provide guidance where services are over-provisioned.
• Service Resilience - Ensure that our services are tolerant of failure and use appropriate strategies when faced with intermittent failures.
• Documentation - Create and maintain detailed documentation of system configurations, procedures, and best practices.
• Continuous Improvement - Identify areas for improvement in processes and technologies, staying current with industry trends and best practices.
What we’d like to see from you:
• Strong knowledge of Linux, networking, and a Cloud provider – we mostly use AWS
• Proficiency with at least one high-level programming language (Python, Go, C#, JavaScript/Node)
• Experience with infrastructure automation tools (Terraform, Ansible, CloudFormation)
• Good working knowledge of containers and associated orchestration technologies (Kubernetes)
• Good debugging skills across a diverse tech stack
• Experience with common monitoring and alerting tools
There’s something for everyone.
We’re a place of opportunity. You’ll have the tools and autonomy to drive your own career, supported by a team of amazingly talented people.
And then there are our benefits. For us, it’s not just about a competitive salary and hybrid working; we care about what matters to you. From a generous holiday allowance and private healthcare to an electric car scheme and paid CSR days, we’ve pretty much got you covered!