Senior Site Reliability Engineer at Kaiko

Posted on: 05/26/2022

Location: Paris, France (REMOTE)

full time


Tags: haproxy grpc karma pagerduty consul loki traefik kafka nomad postgresql containers ansible terraform

The Challenge
-------------

You will be joining a fast-paced engineering team made up of people with significant experience working with terabytes of data. We believe that everybody has something to bring to the table, and therefore put collaborative effort and teamwork above all else (and not just from an engineering perspective). You will be able to work autonomously as an equally trusted member of the team, and participate in efforts such as:

* Addressing high availability problems: cross-region data replication, disaster recovery, etc.
* Addressing “big data” problems: 200+ million messages/day, 160B data points since 2010 (currently growing at a rate of 10B per month).
* Improving our development workflow, continuous integration, continuous delivery and, in a broader sense, our team practices
* Expanding our platform’s observability through monitoring, logging, alerting and tracing

What you’ll be doing
--------------------

* Deploy, maintain and evolve our infrastructure (we have 2 autonomous regions) for optimum data consistency and availability while keeping costs down
* Automate what is not yet automated, fix what needs fixing, contribute ideas
* Adapt fast

Our tech stack
--------------

* **Alerting:** AlertManager, Karma, PagerDuty
* **Logging:** Vector, Loki
* **Caching:** FoundationDB
* **Secrets management and PKI:** Vault
* **Configuration management and provisioning:** Terraform, Ansible
* **Service discovery:** Consul
* **Messaging:** Kafka
* **Proxying:** HAProxy, Traefik
* **Service deployment:** Terraform, Nomad (plugged into Consul and Vault)
* **Database systems:** ClickHouse (main datastore), FoundationDB (caching, deduplication), replicated PostgreSQL
* **Operating System:** Ubuntu 20.04
* **Protocols:** gRPC, HTTP (being phased out in favor of gRPC), WebSocket (being phased out in favor of gRPC)
* **Platform:** containers

About You
---------

* Significant experience as a DevOps/System Engineer
* Experienced in Linux system administration and automation (Ansible at a minimum)
* Worked with, in no particular order: troubleshooting crashes & performance issues, load-balancing, VIPs/fail-over IPs, RAID

You’ll notice that we don’t have any “hard” requirements in terms of development platforms or technologies: this is because we are primarily interested in people capable of adapting to an ever-changing landscape of technical requirements, who learn fast and are not afraid to constantly push our technical boundaries. It is not uncommon for us to benchmark new technologies for a specific feature, or to change our infrastructure in a big way to better suit our needs.

The most important skills for us revolve around two things:

* What we like to call “core” knowledge: what a software process is, how it interacts with a machine’s or the network’s resources, what kind of constraints we can expect for certain workloads, etc.
* How fast you can adapt to a technology you didn’t know existed 10 minutes ago

In short, we are looking for someone who can spot early on that, when the goal is to improve performance in the long term, spending 10 days migrating data to a more efficient schema is a better solution than scaling out a database cluster in a matter of minutes.


Nice to have
------------

* Experience with HashiCorp tools (Terraform, Vault, Consul, Nomad)
* Experience with orchestrating containers and micro-services
* Experience with recent Ubuntu releases and systemd
* Knowledgeable about networking, routing (BGP, static, …) and tunneling
* Knowledge of encryption (PGP/TLS/SSH/WireGuard/…)
* Basic knowledge of crypto-currencies

Personal Skills
---------------

* Honest: receiving and giving feedback is very important to you
* Humble: making new errors is an essential part of your journey
* Empathetic: you feel a sense of responsibility for all the team’s endeavors rather than focusing on individual contributions
* Committed: as an equally important member of the team, you want to make yourself heard while respecting everybody’s point of view
* Fluent in written and spoken English
* You have the utmost respect for legacy code and infrastructure, with some occasional and perfectly understandable respectful complaints

What we offer
-------------

* An entrepreneurial environment with a lot of autonomy and responsibilities
* Opportunity to work with an internationally diverse team
* Hardware of your choice
* Perks: meal vouchers, multiple team events and staff surprises

Process
-------

* Introduction call (30 min)
* Meeting with members of the team for a technical/product RPG: you read that right, no written test, no whiteboard quicksort implementation (1h30)
* Cross-team interviews (2-3 people, 2 x 45 min)
* Meeting with VP of Engineering (20 min)

As our working language is English, we would appreciate it if you sent us your application and any accompanying documents in English.

Location
--------

On-site in our Paris office, or fully remote (within ±2h of CET).

Diversity & Inclusion
---------------------

At Kaiko, we believe in diversity of thought because it makes us stronger. We therefore encourage applications from everyone who can bring their unique experience to our collective achievements.