This is a remote position.
We are looking for a Data Engineer to join our dynamic team.
- Data Pipeline Development: Designing, building, and maintaining scalable data pipelines to collect, process, and store data from various sources.
- Data Integration: Integrating data from different sources, including databases, APIs, and third-party services, to create a unified view of the data.
- Database Management: Managing and optimizing relational and non-relational databases, ensuring they are performant, secure, and reliable.
- Data Modeling: Designing and implementing data models to support efficient querying and analysis, often involving creating data warehouses or data lakes.
- ETL Processes: Developing and maintaining Extract, Transform, Load (ETL) processes to convert raw data into a usable format for analytics and reporting (a minimal sketch follows this list).
- Performance Optimization: Monitoring and tuning database performance, optimizing queries, and ensuring that data systems handle high loads efficiently.
- Data Quality Assurance: Implementing processes and tools to ensure data accuracy, consistency, and reliability, including data validation and cleansing.
- Collaboration: Working closely with data scientists, analysts, and other stakeholders to understand data needs and provide the necessary data infrastructure and support.
- Security and Compliance: Ensuring that data is stored and processed in compliance with relevant regulations and industry standards, including implementing data encryption and access controls.
- Documentation and Reporting: Documenting data architecture, pipeline processes, and system configurations, and providing reports or dashboards to monitor system health and data usage.
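To give a rough sense of the pipeline and ETL work described above, the sketch below extracts records from a REST endpoint, cleanses them with pandas, and loads them into a PostgreSQL staging table via SQLAlchemy. It is a minimal illustration only: the API URL, connection string, column names, and table name are hypothetical placeholders, not part of this role's actual stack.

```python
# Minimal ETL sketch: extract from a REST API, transform with pandas,
# load into PostgreSQL. All endpoints, credentials, column names, and
# table names below are hypothetical placeholders.
import requests
import pandas as pd
from sqlalchemy import create_engine

API_URL = "https://api.example.com/v1/orders"  # placeholder source
DB_URI = "postgresql://etl_user:secret@localhost:5432/warehouse"  # placeholder target


def extract() -> list[dict]:
    """Pull raw records from the source API."""
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    return response.json()


def transform(records: list[dict]) -> pd.DataFrame:
    """Basic cleansing: drop incomplete rows, normalize types, deduplicate."""
    df = pd.DataFrame(records)
    df = df.dropna(subset=["order_id", "amount"])
    df["amount"] = df["amount"].astype(float)
    df["created_at"] = pd.to_datetime(df["created_at"])
    return df.drop_duplicates(subset=["order_id"])


def load(df: pd.DataFrame) -> None:
    """Append the cleaned frame to a warehouse staging table."""
    engine = create_engine(DB_URI)
    df.to_sql("stg_orders", engine, if_exists="append", index=False)


if __name__ == "__main__":
    load(transform(extract()))
```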
Requirements
- Programming Languages: Proficiency in languages like Python, Java, or Scala for scripting, data manipulation, and building data pipelines.
- SQL and Database Management: Expertise in SQL for querying databases and managing relational databases such as PostgreSQL, MySQL, or Microsoft SQL Server, as well as knowledge of NoSQL databases like MongoDB or Cassandra.
- Data Warehousing Solutions: Experience with data warehousing technologies like Amazon Redshift, Google BigQuery, Snowflake, or traditional systems like Teradata.
- ETL Tools: Familiarity with ETL (Extract, Transform, Load) tools and frameworks such as Apache Airflow, Apache NiFi, Talend, or Informatica for building and managing data pipelines (an illustrative Airflow sketch follows this list).
- Big Data Technologies: Knowledge of big data frameworks and tools like Hadoop, Apache Spark, or Apache Flink for handling large-scale data processing.
- Cloud Platforms: Proficiency in cloud computing platforms like AWS, Google Cloud Platform (GCP), or Microsoft Azure, including their data services and tools.
- Data Modeling: Skills in designing and implementing data models, including an understanding of dimensional modeling, normalization, and denormalization.
- Data Integration: Ability to integrate data from diverse sources, including APIs, third-party services, and various data formats like JSON, XML, or CSV.
- Version Control: Experience with version control systems like Git for managing code changes and collaborating with other team members.
- Problem-Solving and Analytical Thinking: Strong problem-solving skills to troubleshoot and resolve data issues, optimize performance, and develop efficient solutions.
- Apache: 3 years (Preferred)
- SQL: 4 years (Preferred)
- Data warehouse: 3 years (Preferred)
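To make the orchestration side of the requirements concrete, here is a minimal Apache Airflow DAG sketch (assuming Airflow 2.4 or later, where the `schedule` argument is available) that wires extract, transform, and load steps into a daily run. The dag_id, schedule, and task callables are illustrative placeholders, not a prescribed implementation for this role.

```python
# Minimal Apache Airflow DAG sketch for a daily ETL run (Airflow 2.4+).
# Task bodies are stubs; dag_id, schedule, and callables are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull raw data from source systems (APIs, databases, files).
    pass


def transform():
    # Placeholder: cleanse, validate, and reshape the extracted data.
    pass


def load():
    # Placeholder: write the transformed data to the warehouse.
    pass


with DAG(
    dag_id="daily_etl_example",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Enforce run order: extract -> transform -> load.
    extract_task >> transform_task >> load_task
```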
Benefits
- Work Location: Remote
- 5-day work week