Data Engineer/Analyst at Veryfi, Inc.

Posted on: 04/18/2022

Location: (ON-SITE)

Full-time

Tags: memcache mongo etl neo4j redis s3 sql unix mysql apache kafka postgres numpy python pandas ml

Veryfi is looking for our next great data engineer who will build out and scale our analytics platform and corresponding data pipelines. You will be responsible for building and scaling a robust platform that delivers our ML/AI-driven insights, and for coordinating with the data visualization team to create engaging and insightful content.

Responsibilities:
-----------------
* Craft data engineering components, applications and entities to empower self-service of our big data.
* Develop and implement ETL best practices for data movement, data quality and data cleansing.
* Optimize and tune ETL processes, applying reusability, parameterization, workflow design, caching, parallel processing and other performance-tuning techniques.

Qualifications:
---------------
* Knowledgeable about data engineering best practices; comfortable in a fast-paced startup.
* Experience with data warehousing, streaming data and supporting architectures: pub/sub, stream processor/data aggregator, real-time analytics, data lake and cluster computing frameworks.
* Mastery of the components needed to architect solutions for complex data platforms and large-scale CI/CD data pipelines using a variety of technologies (REST APIs, advanced SQL, Amazon S3, Apache Kafka, data lakes, etc.), from relational SQL databases (e.g. MySQL, Postgres) and newer stores (e.g. Mongo, Neo4j) to in-memory caches (e.g. Redis, Memcache).
* Working knowledge of distributed computing and data modeling principles.
* Experience with object-oriented design, coding and testing patterns, including experience with engineering software platforms and data infrastructures.
* Experience with big data, PySpark and streaming data.
* Knowledge of data management standards, data governance practices and data quality dimensions.
* Experience with UNIX systems, writing shell scripts and programming in Python.
* Hands-on experience in Python using libraries such as NumPy, Pandas and PySpark (see the illustrative sketch after this list).
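To give a concrete flavor of the kind of ETL work the bullets above describe, here is a minimal PySpark sketch of a single extract-transform-load step: read raw CSV data, apply basic cleansing, and write partitioned Parquet to S3. It is purely illustrative; the bucket, paths and column names are hypothetical and are not taken from the posting or from any actual Veryfi pipeline.

    # Illustrative ETL sketch only -- bucket, paths and column names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("receipts-etl-example")  # hypothetical application name
        .getOrCreate()
    )

    # Extract: read raw CSV data from a (hypothetical) S3 location.
    raw = spark.read.csv(
        "s3a://example-bucket/raw/receipts/", header=True, inferSchema=True
    )

    # Transform: basic data-quality / cleansing steps of the sort the posting mentions.
    clean = (
        raw
        .dropDuplicates(["receipt_id"])                       # remove duplicate records
        .na.drop(subset=["receipt_id", "total"])              # drop rows missing key fields
        .withColumn("total", F.col("total").cast("double"))   # enforce a numeric type
        .withColumn("issued_date", F.to_date("issued_date", "yyyy-MM-dd"))
    )

    # Load: write partitioned Parquet back to the (hypothetical) data lake.
    (
        clean.write
        .mode("overwrite")
        .partitionBy("issued_date")
        .parquet("s3a://example-bucket/curated/receipts/")
    )

    spark.stop()

Enforcing types during the transform and partitioning the output by date are examples of the data-quality and performance-tuning practices referred to in the Responsibilities section; the real pipelines at Veryfi may of course be structured quite differently.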