11 days ago

Site Reliability Engineer at Sketch

40 hours / week, Worldwide (Remote)
Gym membership or wellness allowance
Flexible working hours
Paid health insurance
Paid parental leave
Employee training program
Unlimited paid holidays

Over a million designers use Sketch to transform their ideas into incredible products, every day. Would you like to join us and help take the infrastructure that supports this leading design tool to the next level? We’re looking to expand our team with a full-time Site Reliability Engineer.

The job

As a Site Reliability Engineer at Sketch, you will focus on shaping our cloud infrastructure and making sure all the pieces work well together: development environments, metrics processing and observability, security policies, network design, deployment strategies, high availability, etc.

You will work closely with backend, frontend and Mac developers, as well as product managers, to guarantee platform stability, and you will actively participate in the architecture and design of new projects.

The stack

At Sketch, we work with a unique technology blend: a deeply interconnected platform consisting of a Linux-based cloud platform and our award-winning macOS application.

Our cloud backend is a mix of containerised and serverless services built on Elixir and Go, exposing GraphQL and REST APIs, with most pieces deployed on AWS and automated through Terraform. Our backend services persist data in PostgreSQL databases and other minor services.

We use Chef for configuration management whenever we need to configure instances for non-cloud services, and Python for small programs or scripts, e.g. to migrate data, run recurring jobs or automate operations.

Our monitoring, metrics and alerting stack includes Thanos, ELK and Grafana.

For CI/CD and testing, we mostly use CircleCI, but also our fully automated, defined-in-code Jenkins instance, which, among other tasks, spawns ephemeral ECS workers for running jobs.

Essentials for the job

  • Professional experience managing Linux-based and cloud-native distributed systems
  • Experience coding with high-level programming languages like Python for technical operations tasks and services automation
  • Experience with Infrastructure as Code tools such as Terraform, and configuration management tools to automate manual operations
  • A good understanding of the HTTP protocol and the behavior of production web services
  • Excellent communication skills and good written and spoken English
  • You’re based in a European or African time zone

We care about your well-being and your professional success, so we offer you

  • Full-time employment, with a flexible schedule
  • As many vacation days as you need
  • Whatever training you need to develop in your job
  • Private healthcare and gym reimbursements
  • The laptop you need
  • The option to work anywhere in European or African time zones
  • Company equity
  • Paid family leave
  • An annual company meetup