Be part of our PayU team in Latam! We are looking for our next Head of SRE Latam.
About the role
PayU is a leading financial services provider in global growth markets. We are building our next generation end-to-end PSP solution. Our solutions are based on the most advanced technology that empowers billions of people and millions of merchants to buy and sell online, extending the reach of financial services.
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems.
SRE ensures that PayU’s services—both our internally critical and our externally-visible systems—have reliability and uptime appropriate to users’ needs and a fast rate of improvement while keeping an eye on capacity and performance.
SREs are responsible for the big picture of how our systems relate to each other, and we use a wide range of tools and approaches to solve a broad spectrum of problems. Practices such as limiting time spent on operational work, running blameless postmortems and deliberately breaking things to proactively identify potential outages are our bread and butter.
SRE’s culture of diversity, intellectual curiosity, problem-solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment.
As a tech lead, you are responsible for building and leading an agile team of SREs. You are expected to teach them and other engineers to understand and implement SRE practices, plan and prioritize the team's tasks, and enable team members' growth. You will also collaborate with other SRE teams around the world to build our global SRE tools and culture.
What you will do
- Plan and lead day-to-day and future tasks.
- Train, educate and grow SRE engineers.
- Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation, and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
- Help to maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Teach and practice sustainable incident response and blameless postmortems.
- Join an on-call rotation to respond to PayU system availability incidents and support service engineers with customer incidents.
- Use your on-call shift to prevent incidents from ever happening.
- Make monitoring and alerting fire on symptoms rather than on outages.
- Design, build and maintain core infrastructure pieces that allow us to scale to support hundreds of thousands of concurrent payments.
- Educate engineers on how to approach and debug production issues across services and levels of the stack.
- Break things on purpose to identify potential outages.
What you will need to succeed
- Have fluent English.
- Think about systems – edge cases, failure modes, behaviors, specific implementations.
- When you see a manual process you get the itch to automate it.
- Know your way around Linux and the Unix Shell.
- Know what configuration management systems such as Ansible (the one we use) are for.
- Have strong programming skills – Go/Node.js/Java/Python
- Have experience with Docker, Kubernetes, Terraform, or similar technologies.
- Want to take part in both software and system engineering tasks.
Projects you could work on
- Migrating PayU platforms to the cloud (AWS)
- Coding infrastructure automation with Ansible and Terraform
- Improving our Prometheus monitoring or building new metrics
- Helping deploy and fix new versions of PayU platforms
- Building new ways to prevent production failures and testing PayU platforms for resilience and reliability by implementing chaos engineering practices
PayU is the payments and fintech business of Prosus, a global consumer internet group and one of the largest technology investors in the world. PayU’s local operations in Asia, Central and Eastern Europe, Latin America, the Middle East, Africa and South East Asia enable us to combine the expertise of high growth companies with our own unique local knowledge and technology to ensure that our customers have access to the best financial services. We support 350,000+ merchants and millions of consumers making payments online with over 250 payment methods and 1,800+ payment specialists. The markets in which PayU operates represent a potential consumer base of nearly 2.3 billion people and a huge growth potential for merchants.
Where do you want to take your career?
At PayU, we can help you get there.