A. Core Data Engineering Concepts
SQL (joins, window functions, performance tuning)

Data Modeling (star vs snowflake, normalization)

ETL/ELT pipelines (batch vs streaming, orchestration tools like Airflow)
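
As an illustration of the window-function topic above, here is a minimal sketch that picks each customer's largest order with `ROW_NUMBER()`. The `orders` table, its columns, and the helper name `top_order_per_customer` are hypothetical. It uses Python's built-in `sqlite3` module only so the query can run without a database server, and it assumes the bundled SQLite is version 3.25 or newer, where window functions are available.

```python
import sqlite3


def top_order_per_customer(rows):
    """Return the highest-amount order per customer using ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, used only for this illustration.
    conn.execute(
        "CREATE TABLE orders (customer TEXT, order_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    query = """
        SELECT customer, order_id, amount
        FROM (
            SELECT customer, order_id, amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer
                       ORDER BY amount DESC, order_id
                   ) AS rn
            FROM orders
        )
        WHERE rn = 1
        ORDER BY customer
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

The same `PARTITION BY ... ORDER BY ...` pattern is what deduplication and "latest record per key" questions usually come down to.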

B. Apache Spark / PySpark
Catalyst Optimizer & Tungsten

Narrow vs Wide transformations

Joins (broadcast, sort-merge), Skew handling

AQE (Adaptive Query Execution)

Partitioning, Predicate Pushdown

Execution Plan (DAG → Stage → Tasks)

Spark UI and Job Debugging

SCD Type 2 Implementation in PySpark
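
The SCD Type 2 topic above names PySpark, but the merge logic can be sketched in plain Python so it runs without a cluster. This is not a PySpark implementation. The record layout (`key`, `attrs`, `valid_from`, `valid_to`, `is_current`), the `OPEN_END` sentinel, and the function name `apply_scd2` are all assumptions made for illustration. In PySpark the same steps would typically be expressed as a join between the current dimension rows and the incoming batch, followed by a union of the expired and new rows.

```python
from datetime import date

# Hypothetical "open-ended" end date marking the current version of a row.
OPEN_END = date(9999, 12, 31)


def apply_scd2(dimension, incoming, load_date):
    """Apply one SCD Type 2 load.

    dimension: list of dicts with key, attrs, valid_from, valid_to, is_current
    incoming:  dict mapping key -> latest attrs from the source system
    Returns a new list; the input rows are not mutated.
    """
    result = []
    current_keys = set()
    for row in dimension:
        if not row["is_current"]:
            result.append(dict(row))  # history rows pass through untouched
            continue
        current_keys.add(row["key"])
        new_attrs = incoming.get(row["key"])
        if new_attrs is None or new_attrs == row["attrs"]:
            result.append(dict(row))  # no change: keep the current version
        else:
            # Changed: expire the old version, then open a new one.
            result.append({**row, "valid_to": load_date, "is_current": False})
            result.append({
                "key": row["key"], "attrs": new_attrs,
                "valid_from": load_date, "valid_to": OPEN_END,
                "is_current": True,
            })
    # Keys never seen before become brand-new current rows.
    for key, attrs in incoming.items():
        if key not in current_keys:
            result.append({
                "key": key, "attrs": attrs,
                "valid_from": load_date, "valid_to": OPEN_END,
                "is_current": True,
            })
    return result
```

The three branches (unchanged, changed, new) are the cases an interviewer usually expects to be handled explicitly.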

C. AWS
S3, Glue, Athena, Lambda, EMR, Redshift

Event-driven design (S3 → EventBridge → Lambda)

Security: IAM roles, bucket policies, encryption

CI/CD in AWS (CodePipeline, CloudFormation)
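
For the event-driven design topic above (S3 → EventBridge → Lambda), a minimal handler sketch might look like the following. It assumes the bucket has EventBridge notifications enabled and that the event arrives in the EventBridge "Object Created" shape, with the bucket name under `detail.bucket.name` and the object key under `detail.object.key`. The return payload is a hypothetical convention for this example, not something AWS requires.

```python
def handler(event, context):
    """Lambda entry point: extract the S3 object location from an EventBridge event."""
    detail = event.get("detail", {})
    bucket = detail.get("bucket", {}).get("name")
    key = detail.get("object", {}).get("key")

    # Ignore anything that is not a well-formed "Object Created" notification.
    if event.get("detail-type") != "Object Created" or not bucket or not key:
        return {"status": "ignored"}

    # A real pipeline would start a Glue job, write to a queue, etc. here.
    return {"status": "accepted", "uri": f"s3://{bucket}/{key}"}
```

Keeping the parsing free of AWS SDK calls makes the handler easy to unit test with a plain dictionary.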

D. Python
Writing modular, reusable code

Working with Pandas, Boto3 (for AWS interaction)

Exception handling, logging

Lambda functions and decorators
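
The exception handling, logging, and decorator topics above combine naturally in a retry decorator. The sketch below is one possible shape. The name `retry` and its parameters are hypothetical, and it deliberately omits backoff delays to stay short.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def retry(times=3, exceptions=(Exception,)):
    """Retry the wrapped function up to `times` attempts, logging each failure."""
    def decorator(func):
        @functools.wraps(func)  # preserve the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    logger.warning(
                        "%s failed on attempt %d/%d: %s",
                        func.__name__, attempt, times, exc,
                    )
                    if attempt == times:
                        raise  # out of attempts: surface the last error
        return wrapper
    return decorator
```

A function is wrapped with `@retry(times=3, exceptions=(ConnectionError,))`, so only the listed exception types trigger another attempt and anything else propagates immediately.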

E. Kafka / Streaming
Kafka topic partitioning, consumer groups

Offset management

Integration with Spark Structured Streaming
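
To make the topic partitioning and consumer group items above concrete, here is a simplified model of how keyed records map to partitions and how partitions are spread across the members of one group. It is not Kafka's actual algorithm: Kafka's default partitioner hashes keys with murmur2, and the group coordinator supports several assignment strategies. Here `zlib.crc32` and a plain round-robin are swapped in for illustration, and both function names are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition; the same key always lands on the same partition."""
    # crc32 stands in for Kafka's murmur2 hash in this sketch.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_round_robin(num_partitions: int, consumers):
    """Distribute partitions across the consumers of a single group."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for partition in range(num_partitions):
        assignment[members[partition % len(members)]].append(partition)
    return assignment
```

Two points follow from the model: ordering is only guaranteed per key (per partition), and a group with more consumers than partitions leaves the extra consumers idle.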

EPAM Systems