Senior Site Reliability Engineer
Hello!
Valtech is looking for a Site Reliability Engineer (SRE). Are you passionate about Site Reliability Engineering? Do you have an eye for SLIs, SLOs, and automation? Do you hate toil and intend to do something about it? Does it excite you to get things done in close collaboration with people around the globe? Would you like the freedom to work from the comfort of your home while also having the opportunity to visit any of our offices close to you? Then you might be the person we’re looking for! Keep reading to find out.
Valtech and Site Reliability Engineering
Valtech is a leading global agency in the business of digital transformation. We help our clients transform their business into a true digital experience. In this mission we design, build and run large-scale global experience and commerce platforms in co-creation and co-operation with our clients. Experience and commerce platforms have evolved drastically in recent years into complex ecosystems that tie together multiple services from multiple vendors, also known as MACH or composable architecture. As a founding member of the MACH Alliance, a group that educates enterprises on best-of-breed Microservices, APIs, Cloud, and Headless (MACH) technology, Valtech is a pioneer in how to properly build and manage these complex ecosystems. Site reliability engineering is at the core of our vision of how this modern-day distributed ecosystem should and can be managed.
A day in the life of a Site Reliability Engineer (SRE)
As a Site Reliability Engineer (SRE), you are the bridge between software development and operations. You help us deliver speed with reliability to our clients, allowing them to leverage the benefits of continuous deployment without losing their grip on customer experience. You will work with our multidisciplinary teams in an essentially DevOps way of working, where your main responsibility is to keep everyone focused on production while creating the facilities to do so.
Your responsibilities will be:
- Work with teams to define SLIs and SLOs.
- Create systems for observability.
- Work with teams to analyze failure scenarios and possible mitigations.
- Create, or assist in creating, runbooks to remediate or prevent failure scenarios.
- Reduce work that does not add value.
- Participate in and facilitate incident management, including on-call duty.
You and the role
You are someone with 5 years of experience in software engineering, DevOps engineering, QA engineering and/or cloud engineering, of which at least the last 2 years were spent as a dedicated Site Reliability Engineer. You feel comfortable taking the lead and making decisions, and you know how to mobilize and motivate people to set things in motion. In your current role, people come to you for advice on what to look for to determine the robustness of their production environments, advice on reliable deployment procedures, assistance in analyzing failure scenarios, and ideas on how to mitigate or remediate them.
We would love to talk to you if:
- You are assertive with good communication skills, capable of taking the lead and coaching a development team to make the right choices.
- You have experience with incident management in a 24x7 production environment for a public-facing online service with high business value and, preferably, high traffic.
- You have experience in working in corporate environments.
- You have experience with programming and scripting.
- You have at least basic knowledge of serverless services in one or more public cloud providers (AWS, Azure, GCP).
- You have extensive knowledge of and experience with various monitoring systems, including APM and observability tools such as Datadog, New Relic, Dynatrace, Prometheus, and Grafana.
- You have knowledge of and experience with various pipeline tools, such as GitHub, Azure DevOps, GitLab, and Jenkins.
- You have knowledge of and experience with microservices-related technology: Docker, Kubernetes.
- You have a good conceptual understanding of software architecture and system thinking.
- You have worked as an engineer in a DevOps context.
- You have an excellent command of English (C1 or above).
- You are familiar with the following technologies:
  - Datadog (or APM equivalent)
  - Argo CD
  - Java / Spring Boot
  - Kafka
  - Kubernetes / EKS
  - AWS
- You have worked within the context of publicly accessible, highly available eCommerce platforms.
- You have experience working in an international context with on- and off-shore teams.
What do we offer in return?
A sunny terrace to enjoy a few drinks 🥂 with your colleagues and our BBQs
Regular online and onsite events 🤙
Remote work-friendly 👩‍💻
Mingling with your colleagues at Café Valtech ☕
A team that is serious about getting things done while not taking itself too seriously 😁
Personal study budget and time – we take learning very seriously 👨‍🏫 and want you to be able to improve your skills
Knowledge-sharing sessions where you can talk about different topics – some of them might not even be related to tech at all 🎾
Health insurance 👩‍⚕️ with the option to add family members
Partnerships with Kushi Minds and Urban Sports Club – we care deeply about your health, safety, and mental well-being 🍀
Home office budget to improve your workstation and get you ready to shine 🤩
A Coverflex budget to allocate to whatever suits your preferences 💸
Our recruitment process:
- Introduction video call with HR.
- Technical video interview with a Hiring Manager and a team member.
- Final video interview with a Program Manager and a People Partner.
Diversity and Inclusion at Valtech
At Valtech, we’re here to engineer experiences that work and reach every single person. To do this, we are proactive about creating workplaces that work for every person at Valtech. Our goal is to create an equitable workplace that gives people from all backgrounds the support they need to thrive, grow and meet their goals (whatever they may be). You can find out more about what we’re doing to create a Valtech for everyone here.
- Department: Technology
- Role: Site Reliability Engineer
- Locations: Lisboa
- Remote status: Hybrid Remote
About Valtech Portugal
Whether it's one of our B2B solutions for Henkel, Westcon, or Dot Foods or a new customer experience for Dolby, D'Addario, or Profoto, together we design, build and deliver transformative digital solutions for the world's best-known brands.
Whatever brought you to us, whether it was work, play, or something in between, as a multi-award-winning agency we build intuitive, frictionless, and connected experiences that improve human lives and make our clients' businesses grow.