We are seeking a seasoned Site Reliability Engineer to join our team! Working with our existing SRE team, you’ll improve application reliability by using a software engineering approach to operations. You’ll develop internal tools and systems for all engineering teams to use. Using site reliability principles and a robust approach to observability, you will not only fix problems when things go wrong but also solve the issues that contributed to them.
This position works closely with Release Engineering and other engineering teams in our Production Engineering Zone to develop and maintain the tools and systems that support all of Zapier engineering. This role calls upon a broad range of experience and technologies. You’ll get to interact with every engineering team in the organization. Maintaining excellent relationships and communicating effectively with those teams regularly is key to success.
Zapier is rapidly scaling and growing, and you will work directly on the applications that support over 5 million customers. When bad things happen, you’ll have the support of your team to solve contributing causes, to learn from failures, and to build a robust and resilient system for our customers.
Building new features and services is a big part of this role. We are continually developing and implementing new ways to support our teams, understanding our customers’ needs, and becoming experts in site reliability.
If you’re interested in taking your career to the next level at a fast-growing and profitable startup, then read on.
Zapier is proud to be an equal opportunity workplace dedicated to pursuing and hiring a diverse workforce.
We’re looking for an experienced engineer who is eager to apply software development approaches to operations. You should have broad experience in software development and operations, and be actively practicing site reliability principles. There is a lot to learn, and we’re continually improving our approaches to SRE, so there are plenty of learning opportunities. We don’t expect you to know it all.
Ideally, you’ll have several years of experience practicing infrastructure as code, including tools like Ansible and Terraform and platforms like Kubernetes. Well-honed experience with the fundamentals of software development goes a long way here. Python and Go, we do it all. Generalists thrive in this role.
Writing is our primary means of communication, from pull requests and team chat to knowledge sharing and communicating changes. Excellent writing skills are crucial to success here at Zapier. We are 100% remote and commonly work asynchronously. We even wrote a book on it.
You should feel comfortable taking a default to action. Most decisions are changeable. It’s better to deliver something real today than something maybe better later. Sharing context, goals, objectives, and in-progress work in public helps us all achieve a common goal.
Things We’ve Done Recently
- Developed new methods for retaining task history
- Migrated applications and services from EC2 to Kubernetes
- Wrote custom Kubernetes controllers to improve resilience
- Created deployment pipelines in ArgoCD
- Developed autoscaling strategies to handle bursts in workloads
- Implemented OPA to enforce policies across our Kubernetes clusters
- Deployed ProxySQL for pooling connections against MySQL databases
Zapier helps people across the world automate the boring and tedious parts of their job. We do that by helping everyone connect the web applications they already use and love.
We believe that there are jobs a computer is best at doing and that there are jobs a human is best at doing. We want to empower businesses to create processes and systems that let computers do what they are best at doing and let humans do what they are best at doing.
We believe that with the right tools, you can have a big impact with less hassle.
We believe in small teams. Small teams are fast and nimble. Small teams mean less bureaucracy and less management and more getting things done.
We believe in a safe, welcoming, and inclusive environment. All teammates at Zapier agree to a code of conduct.
The Whole Package
We’re currently hiring for the following locations:
- North America
- South America
Competitive salary (we don’t use remote as an excuse to pay less)
Great healthcare + dental + vision coverage*
Retirement plan with 4% company match*
2 annual company retreats to awesome places
14 weeks paid leave for new parents of biological or adopted children
Pick your own equipment. We’ll set you up with whatever Apple laptop + monitor combo you want plus any software you need.
Unlimited vacation policy. Plus we require you to take at least 2 weeks off each year. We see most employees take 4-5 weeks off per year. This isn’t a vague policy where unlimited vacation means no vacation.
Work with awesome companies around the world. We partner with great software companies all over the world, and you’ll constantly get to interact with people from these companies.
*Currently, healthcare and retirement plans are only available to US, UK, and Canadian employees.