Position: Senior Data Engineer
Location: Downtown Toronto, Canada (Onsite)
Mode of Hire: Contract
Job Description:
- An undergraduate or Master's degree in Computer Science, or equivalent engineering experience
- 6 years of professional software engineering and programming experience (Java, Python), with a focus on designing and developing complex, data-intensive applications
- 3 years of architecture and design (patterns, reliability, scalability, quality) of complex systems
- Advanced coding skills and practices (concurrency, distributed systems, functional principles, performance optimization)
- Professional experience working in an agile environment
- Strong analytical and problem-solving ability
- Strong written and verbal communication skills
- Experience in operating and maintaining production-grade software
- Comfortable tackling loosely defined problems; thrives on a team with autonomy in its day-to-day decisions
Preferred Skills:
- In-depth knowledge of software and data engineering best practices
- Experience in mentoring and leading junior engineers
- Experience in serving as the technical lead for complex software development projects
- Experience with large-scale distributed data technologies and tools
- Strong experience with multiple database models (relational, document, in-memory, search, etc.)
- Strong experience with data streaming architectures (Kafka, Spark, Airflow, SQL, NoSQL, CDC, etc.)
- Strong knowledge of cloud data platforms and technologies such as GCS, BigQuery, Cloud Composer, Pub/Sub, Dataflow, Dataproc, Looker, and other cloud-native offerings
- Strong knowledge of Infrastructure as Code (IaC) and associated tools (Terraform, Ansible, etc.)
- Experience pulling data from a variety of source types, including mainframe (EBCDIC), fixed-length and delimited files, and databases (SQL, NoSQL, time-series)
- Strong coding skills for analytics and data engineering (Java, Python, and Scala)
- Experience performing analysis with large datasets in a cloud-based environment, preferably with an understanding of Google Cloud Platform (GCP)
- Understands how to translate business requirements into technical architectures and designs
- Comfortable communicating with various stakeholders (technical and non-technical)
- Experience with Airflow and Spark:
- Airflow: Proven experience using Apache Airflow to orchestrate and schedule workflows. Ability to design, implement, and manage complex data pipelines. Understanding of DAGs (including how to create them dynamically), task dependencies, and error handling within Airflow.
- Spark: Hands-on experience with Apache Spark for large-scale data processing and analytics. Proficiency in writing Spark jobs in Java (PySpark is also fine, as we are moving in that direction). Ability to optimize performance and handle data transformations and aggregations at scale.
Familiarity with GCP Services:
- BigQuery: Experience with Google BigQuery for running SQL queries on large datasets, optimizing queries for performance, and managing data warehousing solutions.
- Composer: Knowledge of Google Cloud Composer for managing and orchestrating workflows.
- Dataproc: Experience with Dataproc for managing and scaling Spark clusters, including configuring clusters, running jobs, and integrating with other GCP services.
- Proficiency in Python, Java, and SQL:
- Python: Strong foundation in Python, with experience writing clean, efficient code and using libraries such as Pandas and NumPy for data manipulation. Proficient in debugging, testing, and using Python for API interactions and external service integration.
- Java: Proficiency in Java, especially for integrating with data processing frameworks. Experience with Java-based libraries and tools relevant to data engineering is a plus.
- SQL: Experience in writing and optimizing complex SQL queries for data extraction, transformation, and analysis.
- Knowledge of Terraform (optional but preferred):
- Terraform: Familiarity with Terraform for automating the provisioning and management of cloud resources. Ability to write and maintain Terraform configurations that define and deploy GCP resources, ensuring infrastructure consistency and scalability.
Nice-to-have Skills (though not required):
- Exposure to data science or machine learning packages (Pandas, PyTorch, Keras, TensorFlow, etc.)
- Contributions to open-source software (code, docs, or mailing-list posts)
- GCP Professional Data Engineer Certification
- Minimum of 7 years of software engineering experience
- 5 years of experience with Java and the Spring Boot framework
- Experience with REST concepts
- Experience with XML and JSON data formats
- Experience with large-team development in integrated development environments (e.g., IntelliJ) using managed source control systems (e.g., Git)
- Evidence of design skills and a good understanding of design patterns, including why using them is good practice
- Good experience with Test-Driven Development (TDD) and unit testing frameworks
- Agile program experience with a continuous delivery approach
Desirable:
- Microservices architecture
- Development of cloud native apps
- Experience with the twelve-factor app methodology
- JIRA / Confluence
Personal Skills & Qualities:
- Self-motivated with strong team spirit
- Strong work ethic
- Ability to work independently with little supervision, as well as within a team
- Excellent multitasking skills
- Ability to communicate well with both technical and non-technical staff