Python/AWS Data Engineer
Remote
Long Term
Desired Experience:
- Python 3.8 - 3.10 (Advanced):
- Experience creating & using Python classes, lambdas, context managers (i.e., `with` statements), decorators, generators, and list & dict comprehensions.
- Experience with the asyncio, threading & multiprocessing libraries a plus.
- Familiarity with Python type annotations (i.e., typing) a plus.
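For illustration, a compact snippet touching several of the Python features above in one place (all names here are invented for the example, not tied to any real codebase):

```python
from contextlib import contextmanager
from functools import wraps
from typing import Iterator

def logged(func):
    """Decorator: records the name of each call on the wrapper."""
    calls = []
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append(func.__name__)
        return func(*args, **kwargs)
    wrapper.calls = calls
    return wrapper

@contextmanager
def resource(name: str) -> Iterator[str]:
    # Context manager usable in a `with` statement;
    # setup/teardown would bracket this yield.
    yield name

@logged
def squares(n: int) -> Iterator[int]:
    # Generator: lazily yields squares one at a time.
    for i in range(n):
        yield i * i

with resource("demo") as r:
    sq_list = [s for s in squares(4)]        # list comprehension
    sq_map = {i: i * i for i in range(4)}    # dict comprehension
```

The decorator, generator, and comprehensions compose without any third-party dependencies, which is roughly the level of fluency the bullet points above describe.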
- SQL (Advanced):
- Should know how to perform various kinds of JOINs, use sub-queries, and use GROUP BY & aggregate functions.
- Experience with relational database(s) (e.g., PostgreSQL, Oracle, SQL Server, MySQL, or MariaDB, etc.).
- Familiarity with DB table CREATE & ALTER (SQL DDL) statements, or experience designing DB table schemas a plus.
- Understanding of VIEWs (logical & materialized) would be beneficial.
- Experience with Common Table Expressions (CTEs), window functions & analytical functions a plus.
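As a self-contained sketch of several of the SQL constructs above (a CTE, GROUP BY aggregation, and a window function), run here through Python's stdlib sqlite3; the table and data are invented for the example, and window functions assume a SQLite build of 3.25 or newer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES
        ('east', 10.0), ('east', 30.0), ('west', 20.0);
""")

# CTE + GROUP BY aggregation + window function in one query.
rows = conn.execute("""
    WITH totals AS (
        SELECT region, SUM(amount) AS total
        FROM orders
        GROUP BY region
    )
    SELECT region, total,
           RANK() OVER (ORDER BY total DESC) AS rnk
    FROM totals
    ORDER BY rnk
""").fetchall()
# rows -> [('east', 40.0, 1), ('west', 20.0, 2)]
```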
- Cloud-hosted and/or Cloud-native development experience (AWS preferred, but GCP or Azure experience welcome):
- AWS Lambda or GCP Cloud Functions, containerized deployments (e.g., Docker, Kubernetes, AWS ECS/Fargate), AWS EC2, AWS Simple Queue Service (SQS) queues, AWS Simple Storage Service (S3), etc.
- Experience with the AWS SDK for Python (boto3) and/or the AWS CDK a plus.
- Experience developing Batch-oriented or Near Real-time Stream Processing (data or event) pipelines, Event-Driven Architectures (EDA), Event-Sourcing, or related Reactive design patterns (e.g., Actor model-based architectures).
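One minimal sketch of the near real-time consumer pattern described above, using only asyncio; the in-memory queue stands in for an external broker such as SQS, and all event fields are invented for the example:

```python
import asyncio

async def producer(queue: asyncio.Queue) -> None:
    # Emit a small stream of events, then a sentinel to stop the consumer.
    for i in range(3):
        await queue.put({"event_id": i})
    await queue.put(None)

async def consumer(queue: asyncio.Queue, processed: list) -> None:
    # Pull events off the queue as they arrive (broker-polling style loop).
    while True:
        event = await queue.get()
        if event is None:
            break
        processed.append(event["event_id"])

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)
    processed: list = []
    await asyncio.gather(producer(queue), consumer(queue, processed))
    return processed

processed = asyncio.run(main())
# processed -> [0, 1, 2]
```

The bounded queue gives natural backpressure, which is the main design concern the stream-processing bullet points imply.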
Experience in the following is a plus:
- Experience with Extract-Transform-Load (ETL) data workflows (e.g., using Informatica, SAS or Microsoft SSIS).
- Apache Spark (Python or Scala API) experience and/or AWS EMR experience.
- Familiarity with User-Defined Functions (SQL, Python, etc.) and/or Stored Procedures.
- Apache Spark UDFs (Python or Scala) experience a bonus.
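By way of illustration, registering a plain Python function as a SQL UDF can be done with stdlib sqlite3 (the function, table, and data here are invented for the example; Spark exposes an analogous mechanism via `pyspark.sql.functions.udf`):

```python
import sqlite3

def mask_email(addr: str) -> str:
    # Python UDF: hide the local part of an e-mail address.
    local, _, domain = addr.partition("@")
    return local[0] + "***@" + domain

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a one-argument UDF.
conn.create_function("mask_email", 1, mask_email)
conn.execute("CREATE TABLE users (email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice@example.com')")

masked = conn.execute("SELECT mask_email(email) FROM users").fetchone()[0]
# masked -> 'a***@example.com'
```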
- Experience with the Python SQLAlchemy framework (SQLAlchemy Core and/or ORM usage).
- Experience with the Python Pandas library (e.g., DataFrame usage).
Bonus Experience:
- Experience with NoSQL databases (e.g., Apache Cassandra or DynamoDB) and/or distributed key-value stores or distributed caches (e.g., Redis).
- Familiarity with data warehouses (e.g., Amazon Redshift, IBM Netezza, Snowflake, etc.).
- Experience with Apache Kafka (and/or Kafka Streams-based stream processing).
- Experience with the Python NumPy or SciPy libraries.
- Scala or Kotlin experience a bonus.
Bachelor's degree in Computer Science or a related field, or an equivalent combination of industry-related professional experience and education.