Site Reliability Engineer at Sketch

Posted on: 02/06/2021

Location: (REMOTE)

Tags: aws ecs sketch python jenkins chef grafana serverless elixir postgresql graphql circleci terraform

Over a million designers use Sketch to transform their ideas into incredible products, every day. Would you like to join us and help take the infrastructure that supports this leading design tool to the next level? We're looking to expand our team with a full-time **Site Reliability Engineer.**

**The job**

As a Site Reliability Engineer at Sketch, you will focus on **shaping our cloud infrastructure** and make sure all the pieces work well together: development environments, metrics processing and observability, security policies, network design, deployment strategies, high availability, etc. You will **work closely with backend, frontend and Mac developers and product managers** to guarantee platform stability, and actively participate in the architecture and design of new projects.

**The stack**

At Sketch, we work with a unique technology blend: a deeply interconnected platform consisting of a Linux-based cloud platform and our award-winning macOS application. Our cloud stack backend is based on a mix of containerised and serverless services built on **Elixir** and **Go** that expose **GraphQL** and **REST** APIs, with most pieces deployed on **AWS** and automated through **Terraform.** Our backend services persist data in **PostgreSQL** databases and other minor services. We use **Chef** for configuration management each time we need to configure instances for non-cloud services, and **Python** for small programs or scripts, e.g. to migrate data, run recurring jobs or automate operations. Our monitoring, metrics and alerting stack includes **Thanos**, **ELK** and **Grafana.** For CI/CD and testing, we mostly use **CircleCI**, but also our fully automated, defined-in-code **Jenkins** instance, which, among other tasks, spawns ephemeral ECS workers for running jobs.

**The challenge**

Due to our unique technology blend, you will find plenty of interesting challenges when working as an SRE at Sketch, such as:

* Managing and helping autoscale our Renderfarm, which currently processes more than 70k documents daily (soon [**using the new Apple Silicon Mac Minis**](https://www.sketch.com/blog/2020/11/24/how-sketch-performs-on-apple-M1-silicon/) 😍)
* Helping to design, maintain and battle-test our [**real-time collaboration features**](https://www.sketch.com/collab/), making sure we offer a world-class experience to our users by extracting each ounce of performance from all layers of the stack: from HTTP requests, caching and WebSockets, to backend autoscaling and database performance
* Improving our continuous deployment pipeline by designing and setting up fully automated ephemeral test environments comprising all the different application and cloud pieces
* Setting up, debugging and owning enterprise-oriented features such as Single Sign-On and full-featured Sketch document embedding
* Working towards achieving full platform observability through curated metrics and actionable alerts, using the best open-source tools for the job

**About you**

You care about security, code quality, scalability, performance, and simplicity. Above all, you seek **operational excellence** and apply the best engineering practices possible. Not everything that you or your team do can be perfect, but you make sure that you always **know the trade-offs**. You back your decisions with **arguments**. You **don't care for hype** and always try to **find the best solution and technology for the job** and its context.

**Essentials for the job**

* Professional experience managing Linux-based and cloud-native distributed systems
* Experience coding with high-level programming languages like Python for technical operations tasks and services automation
* Experience with Infrastructure as Code tools such as Terraform, and configuration management tools to automate manual operations
* A good understanding of the HTTP protocol and the behavior of production web services
* Excellent communication skills and good written and spoken English
* You're based in European / African timezones.

**We care about your well-being and your professional success, so we offer you**

* Flexibility to organize your own time, no set hours
* As many vacation days as you need
* Whatever training you need to develop in your job
* The laptop you need
* The option to work anywhere in European/African timezones
* Company equity
* Paid family leave
* An annual company meetup