Responsible for designing and delivering high-quality, highly scalable, extensible, well-tested real-time analytics and high-throughput data pipelines using tools and languages prevalent in the big data and cloud ecosystem. Researches, evaluates, and adopts new technologies, tools, and frameworks centered on high-volume data processing. Develops analytics solutions and assembles large, complex data sets that meet functional and analytics business requirements using Scala, Python, SQL, and machine learning algorithms on Amazon Elastic MapReduce (EMR), Databricks, AWS, DynamoDB, Redshift, Airflow, and Jenkins as necessary. Takes ownership of building data solutions that provide actionable insights into key business metrics. Utilizes and advances continuous integration and deployment frameworks. Builds data validation testing frameworks to ensure high data quality and integrity. Works with the technical project manager to define project timelines and deliverables for each release cycle. Works on fast incremental release cycles with tight deadlines using an Agile/Scrum methodology. May work remotely.
Requires a Bachelor's degree in Computer Science, Information Systems, or a related field, plus 5 years of data engineering experience (or a Master's degree in Computer Science, Information Systems, or a related field, plus 3 years of data engineering experience). All qualifying experience must include: ETL pipelines; AWS; Apache Spark; SQL; Python; DynamoDB; Airflow; and GitLab.
Please copy and paste your resume into the body of an email to candidates at (link removed), with reference #025893 in the subject line. Do not send attachments, as we cannot open them.
Thank you.