Site Reliability Engineer
We are a London fintech building a problem resolution network for the financial industry. In our first two years since going live, we have signed over 200 banks in 50 countries. Our clients' back-office operations use us to connect with the right people with the right skills in the right firms, so that queries and issues are resolved quickly and efficiently.
We're looking for someone who has a passion for technology and DevOps, a hunger to learn, and the professionalism required in a mission-critical role. Our close-knit team of operations engineers automate everything: system administration, continuous integration, deployments, monitoring, metrics and tooling. They enjoy working on complex problems with others and are not afraid to span the stack, from the network all the way to building and extending tools.
As an engineer in operations, your mission is to define, innovate and improve our CI/CD process so that an idea flows from a developer's workstation to production in a clean, predictable and automated fashion. Once it's in production, it's in front of users all over the world. You must have an analytical mind, a dislike for the unorganised and strong Linux skills. Ideally, you will have experience in production operations and be looking for the next step; however, less experienced candidates with strong skills and aptitude will be considered.
Our relaxed, informal office is in central London.
- Building, monitoring and maintaining a 100% cloud environment (AWS)
- Enabling us to keep creating secure, reliable, repeatable production roll-outs of the platform
- Developing tooling that enhances the lives and happiness of the Dev and Ops team alike
- Investigating new technologies that advance the observability and performance of our platform
- Taking ownership of tasks, communicating ideas and decisions throughout the team and ensuring tasks are fully completed with high quality
- Strong Unix/Linux administration skills, including an understanding of TCP/IP networking and scripting (Bash or Python)
- A dislike of ad hoc or manual processes, enjoyment of automating them away, and experience of doing this using one or more of Ansible, Puppet, Chef, Salt etc.
- Experience of, or the ability to be part of, an operations on-call rotation responsible for mission-critical systems
- Strong teamwork skills and strong written and oral communication skills
- A degree in a relevant discipline
- An understanding of modern infrastructure design & administration, with experience of using AWS, Google Cloud, Microsoft Azure or one of the other IaaS providers (Linode, DigitalOcean, OpenStack, VMware, Xen etc.) and of using Terraform, CloudFormation, boto or other orchestration tools
- An appreciation for security, both in design & operation
- Ability to work in a small team that is part of a larger organisation, both independently and with distributed team members
- Experience with Continuous Integration/Delivery (tooling and approach)
- Experience of being, or a strong desire to be, part of a larger engineering organisation
- Experience building, operating, maintaining & scaling RabbitMQ/AMQP & Postgres