Site Reliability Engineer at D2iQ

Posted on: 05/27/2021

Location: Germany (REMOTE)

Tags: azure rkt gcp cassandra mesos chef ansible terraform docker kubernetes hadoop aws kafka python

We are looking for a Software Engineer focused on operations who will be responsible for running our internal infrastructure services. We prefer DevOps engineers with a strong slant towards operations and tool building. All of us come from a strong operations or site reliability background, and we heavily dog-food our Kubernetes distribution in everything we do. We do not look for workarounds that keep our services up by any means necessary; what we care about most is finding the best solutions and improving our products. We don't mind getting into the weeds with hard-to-diagnose networking issues, and we troubleshoot such problems by drawing on our years of frontline firefighting experience in large-scale web operations. Some of us had experience with Kubernetes before coming on board, and some of us didn't. Either way, a strong understanding of distributed systems and systems engineering is key to our success. We take pride in creating software that people rely on and enjoy using. We prefer candidates who can work from our office in Hamburg, Germany (when it reopens), but we are also open to remote candidates based in either Germany or the United Kingdom.

### **Responsibilities**

* Architect, build, and maintain systems that our engineering team and customers rely on
* Contribute to documentation for both our customers and other engineers
* Make Kubernetes the easiest framework to deploy, manage, and monitor at scale
* Take responsibility for third-party services and the production infrastructure that Kubernetes runs on
* Partner with other engineers to design, build, and maintain critical systems
* Consistently work to make our software simpler
* Effectively estimate the time needed to implement designs
* Challenge yourself and your peers to always improve

### **Basic Qualifications**

* Expert-level knowledge of at least one high-level programming language, such as Python or Go
* Technical understanding of one or more of Terraform, Ansible, or Chef
* 3+ years of experience with production infrastructure
* Experience designing and operating large-scale infrastructure on AWS, GCP, Azure, or other cloud providers
* Able to debug, troubleshoot, and resolve complex technical issues
* Background in system administration, operations, or site reliability
* Understanding of network protocols and networking in general
* Deep knowledge of Linux fundamentals
* Currently residing in, and eligible to work in, either Germany or the UK; preference for candidates based in Hamburg

### **Preferred Qualifications**

* Production experience with service-oriented architectures and distributed systems such as Mesos, Kafka, Cassandra, Hadoop, ZooKeeper, etc.
* An extremely clear, concise, and effective communicator
* Experience running container systems such as Docker or rkt in production
* Strong sense of ownership, urgency, and drive
* Self-driven and motivated, with a strong work ethic and a passion for problem solving