AWS Data Engineering Course — Basic to Advanced


Complete AI Data Engineering With AWS ( 6.0 )
Basic To Advance

By Shashank In Grow Data Skills

Live Classes (In Progress)
English
Completion Certificate

Validity: 1 Year (from the date of enrollment)
Duration: 180 Hours
Sessions: 40
Projects: 15

INR 13000

📢 Admissions Open


This course includes

🎁 FREE Complete Python Course
Content Duration: 180 Hours
Total Video Sessions: 40
Total Industry Projects: 15
AWS Cloud
Quality Assignments & quizzes after each module
Interview preparation guide
Dedicated placement assistance
Resume & LinkedIn profile building
Doubt solving in live classes and offline doubt support in a private Discord group
Certificate of completion

Course Content

Module 1 - SQL

✅ Class - 1

What is Database?
What is DBMS?
Transactional Databases vs NoSQL Databases
What is RDBMS?
What is SQL?
MySQL Workbench Setup and Connection
DDL, DML, DQL, DCL Commands
SHOW Databases
CREATE Database
DROP Database
CREATE Table
MySQL DataTypes
INSERT Command Syntaxes
Integrity Constraints
NOT NULL
DEFAULT
UNIQUE
CHECK
PRIMARY Key
Foreign Key
Apply NOT NULL, DEFAULT, UNIQUE, CHECK constraints and test violation
Customized name for Integrity Constraints

✅ Class - 2

ALTER Command
Add Column
Modify Column
Drop Column
Rename Column
Add Constraint
Drop Constraint
Primary Key vs Foreign Key
Create Primary Key
Referential Integrity With Foreign Key Setup
SELECT Command
COUNT Function
DISTINCT Keyword
Aliases and Derived Columns in Select Command
WHERE Clause
Comparison Operators
Logical Operators
BETWEEN Operator
UPDATE Command
Conditional UPDATE
Multicolumn UPDATE
DELETE Command
Conditional DELETE
DROP Command
TRUNCATE Command
DELETE vs DROP vs TRUNCATE
LIMIT
ORDER BY Clause
Multicolumn ORDER BY
LIKE Operation

✅ Class - 3

IS NULL / IS NOT NULL
GROUP BY, HAVING
GROUP_CONCAT, GROUP BY WITH ROLLUP
Subqueries, IN and NOT IN
CASE WHEN
SQL Joins

✅ Class - 4

EXISTS and NOT EXISTS
Window Functions
Frame Clause
COALESCE Function
Common Table Expressions — Iterative and Recursive

Module 2 - Big Data Fundamentals & Hadoop ( Recorded )

✅ Class - 1

Big Data Fundamentals
5 V’s of Big Data
Distributed Computation
Distributed Storage
Cluster and Commodity Hardware
File Formats - CSV, JSON, Parquet, ORC, Avro
Types of Data - Structured, Semi Structured, Unstructured
History of Hadoop
Hadoop Architecture and Components

✅ Class - 2

Map-Reduce Architecture
YARN Architecture

Module 3 - Confluent Kafka

✅ Class - 1

( Theory Part Recorded )

Kafka Architecture
Brokers, Topics, Partitions
Producer, Consumer
Offset Management
Replicas
Commits
Sync and Async Commits
How Does Consumer Group Work?
Rebalancing

✅ Class - 2

Confluent Kafka Setup
Topic Creation
Schema Registry
Key-Value Messages, Message Ordering
Random Key vs Constant Key Messages
Kafka Producer with Avro Serialization
Kafka Consumer with Avro Deserialization
Practical Implementation Of Consumer Groups
ksqlDB in Confluent Kafka
Streams and Tables in ksqlDB
Persistent Queries
JOIN Queries on ksqlDB Streams
Window-Based JOIN Streams using ksqlDB
GCP Pub/Sub Setup
Producer and Consumer for GCP Pub/Sub

Module 4 - NoSQL Database: MongoDB

✅ Class - 1

CAP Theorem
MongoDB and MongoDB Atlas
MongoDB vs Relational Database
MongoDB Features, Use Cases and Architecture
Node, Data Center, Cluster
Data Replication
Write and Read Operations
Indexing
MongoDB Atlas Setup
MongoDB Cluster Creation
MongoDB Compass Setup
Database and Collection
Connect MongoDB Cluster from Compass
Import JSON Data
Python Queries on MongoDB Collection
McDonald’s Payments Stream Data Ingestion from Kafka to MongoDB
Orders and Payments Stream Setup using ksqlDB
MongoDB Sink Connector

Module 5 - Apache Spark (PySpark)

✅ Class 1 - Spark Fundamentals & Internals

( Theory Part Recorded )

Problems With Hadoop MapReduce
What is Apache Spark?
Spark Features and Ecosystem
RDD and RDD Properties
Data Partitioning in Spark
Transformations and Actions
Narrow vs Wide Transformations
Read/Write Operations in Spark
Lazy Evaluation
Lineage Graph (DAG)
Spark Web UI Walkthrough & DAG Understanding
Creation Of Job, Stage and Tasks
Spark Architecture And Components
Standalone and YARN Cluster Manager
Deployment Modes - Cluster & Client
Spark Job Internals

✅ Class 2 - Optimization & Resource Planning

( Theory Part Recorded )

Persist and Caching
Storage Levels
Data Skewness
Techniques To Handle Data Skewness
Repartition vs Coalesce
Key Salting
RDD vs DataFrame vs Dataset
Spark-Submit Utility
Spark Memory Management
Executor Memory Components
Dynamic Occupancy Mechanism
Processing 1 TB Data with Spark
Resource Allocation Case Studies & Infra Capacity Planning
Broadcast Variables and Accumulators
Spark Planning Breakdown - Analyzed Plan, Logical Plan, Physical Plan
Spark Failure Scenarios and Resolutions
Out-of-Memory Failures & Fixes
Code and Resource-Level Optimizations
Best Practices For Spark Applications

✅ Class 3 - PySpark Practical

Spark Cluster Setup on AWS EMR
SparkSession Creation
DataFrame with Custom Schema
Read CSV from HDFS
Partitions and Partition Size
Select, WithColumn, WithColumnRenamed
Filter, Drop, DropDuplicates
OrderBy, GroupBy
Accumulator
Case-When
Window Functions
Different Joins In Spark - Broadcast Join, Shuffle Hash Join, Sort Merge Join
Spark SQL
Register DataFrame as Table
Write CSV and Parquet
Partitioned Write
Coalesce Write
JSON Data Operations

✅ Class 4 - Spark Structured Streaming

Spark-submit Execution
Spark Web UI Walkthrough & Debugging Points
Stream Processing
Spark Structured Streaming
Word Count Problem in Spark Structured Streaming
Output Modes in writeStream
State Management in Spark Structured Streaming
DStream vs Structured Streaming
File Source Streaming
Triggers in Spark Structured Streaming

✅ Class 5 - Stateful Streaming

Checkpointing
Exactly-once in Spark Structured Streaming
Stateless vs Stateful Processing
Global and Windowed Aggregations
Windowing
Sliding Window
Tumbling Window vs Sliding Window
Windowed Aggregations
Arbitrary Stateful Transformations
Watermarking for Delayed Event Handling
Kafka to Spark Structured Streaming
Stateless and Stateful Streaming with Kafka
Kafka to MongoDB Streaming Pipeline
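
To make the tumbling vs sliding window topic above concrete, here is a minimal pure-Python sketch. It is not Spark code and is not taken from the course material. It assumes integer event timestamps in seconds and windows aligned to time 0, and the function names (`tumbling_window`, `sliding_windows`, `windowed_counts`) are hypothetical.

```python
from collections import Counter


def tumbling_window(ts, size):
    """Return the single [start, end) window of length `size` that contains ts."""
    start = (ts // size) * size
    return (start, start + size)


def sliding_windows(ts, size, slide):
    """Return every [start, end) window of length `size` that contains ts.

    Assumption: window starts are multiples of `slide`.
    """
    windows = []
    start = (ts // slide) * slide  # latest window start that is <= ts
    while start > ts - size:       # window still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def windowed_counts(timestamps, assign):
    """Count events per window; `assign` maps a timestamp to a list of windows."""
    counts = Counter()
    for ts in timestamps:
        for window in assign(ts):
            counts[window] += 1
    return dict(counts)


events = [1, 4, 7, 12]  # made-up event times in seconds
tumbling = windowed_counts(events, lambda ts: [tumbling_window(ts, 10)])
sliding = windowed_counts(events, lambda ts: sliding_windows(ts, 10, 5))
```

With a tumbling window each event lands in exactly one window. With a sliding window of size 10 and slide 5, each event is counted in two overlapping windows, which is why the sliding totals add up to twice the number of events.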

Module 6 - Apache Airflow

✅ Class - 1

( Theory Part Recorded )

Orchestration in Big Data
Dependency Management in Data Pipeline Design
Airflow Fundamentals
Airflow Architecture and Components
Operators in Airflow
Writing Airflow DAG Scripts
Attribute Description

✅ Class - 2

Amazon Managed Workflows for Apache Airflow (MWAA) Setup
Sequential DAG Execution using BashOperator and PythonOperator
Parallel DAG Execution using BashOperator and PythonOperator
Spark Cluster Creation, PySpark Job Execution and Cluster Deletion Orchestration via Airflow Job
Data Backfilling using Parameterized Inputs and Airflow Variables
Project: Aviation Data Processing Pipeline
Tech Stack: GitHub, GitHub Actions, AWS S3, PySpark, EMR Serverless, Airflow, Redshift, CI/CD setup

Module 7 - AI Fundamentals for Data Engineering

✅ Class - 1

AI vs ML vs Deep Learning vs Generative AI
Why AI Matters for Data Engineers
What is an LLM?
How LLMs Work at a High Level
Tokens, Embeddings and Context Window
Why LLMs Hallucinate
AI Assistant Tools for Data Engineers
Claude
Codex-style Coding Agents
Cursor
How to use AI tools for:
Data Pipeline code generation
SQL query generation
Airflow DAG creation
Data quality rule generation
Documentation generation

✅ Class - 2

Transformer architecture simplified
Attention mechanism simplified
Tokenization and embeddings
Prompt, completion and inference
Temperature, top-p and max tokens
What is RAG?
Why RAG is needed in enterprise data systems
RAG vs fine-tuning
RAG architecture:
Data ingestion
Chunking
Embedding generation
Vector database
Similarity search
Context retrieval
LLM response generation
GenAI data pipeline architecture
Data Engineering use cases of RAG:
Data catalog assistant
SQL assistant
Pipeline documentation assistant
Incident debugging assistant
Business metrics Q&A assistant

✅ Class - 3

What is an AI agent?
LLM vs AI agent
Agent reasoning loop:
Plan
Act
Observe
Respond
Tool calling and function calling
Memory in agents
What is a multi-agent system?
Planner agent, coding agent, testing agent, reviewer agent
What is MCP?
MCP client vs MCP server
How MCP connects AI agents with external tools
MCP use cases in Data Engineering:
Querying databases
Reading pipeline logs
Accessing metadata
Triggering validation checks
Reading files and documentation
Local AI agents for Data Engineering
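
The Plan, Act, Observe, Respond loop listed above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the course's implementation: the planning step is a keyword rule standing in for an LLM, and `row_count_tool`, `TOOLS`, `plan` and `run_agent` are hypothetical names with made-up data.

```python
def row_count_tool(table):
    """Hypothetical tool: look up a row count in an in-memory catalog."""
    catalog = {"orders": 1200, "payments": 1175}  # made-up numbers
    return catalog.get(table)


# Registry of tools the agent is allowed to call.
TOOLS = {"row_count": row_count_tool}


def plan(question):
    """Stand-in for the LLM planning step: pick a tool and its argument."""
    for table in ("orders", "payments"):
        if table in question.lower():
            return {"tool": "row_count", "arg": table}
    return None


def run_agent(question):
    """Run one pass of the Plan -> Act -> Observe -> Respond loop."""
    step = plan(question)                              # Plan
    if step is None:
        return "I do not have a tool for that question."
    result = TOOLS[step["tool"]](step["arg"])          # Act (tool call)
    observation = f"{step['arg']} has {result} rows"   # Observe
    return f"Answer: {observation}."                   # Respond


answer = run_agent("How many rows are in the orders table?")
```

A real agent would typically repeat the plan, act and observe steps several times and let the model decide when to respond; the single pass here only shows the order of the steps.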

Module 8 - Databricks & AI

✅ Class - 1

Databricks Fundamentals
Unity Catalog
Delta Lake and Delta Tables
Databricks Free Tier Account Setup
Workspace Setup
Metastore Setup
Managed and External Catalog Setup
Volumes
Cluster Setup
PySpark Notebook Setup
Read/Write from Databricks Volume
Create Delta Table using DeltaTable Python API
Partitioned Delta Writes
Read from Delta Table
Time Travel
GitHub Integration
Delta Sharing
Create Shares and Recipients
Access Shared Data Locally

✅ Class - 2

Introduction to AI capabilities on Databricks
Spark Declarative Pipelines overview
Lakeflow Connect for managed ingestion
Lakeflow Designer for low-code pipeline development
Building GenAI data pipelines on Databricks
Mosaic AI overview
Vector Search and embeddings on Databricks
Use of AI_QUERY
RAG pipeline design using Delta Tables, Unity Catalog and Mosaic AI
Databricks Genie for natural language analytics
Agent Bricks overview for building AI agents on enterprise data
Governance and security for AI workloads using Unity Catalog

✅ Class - 3

Project 1: E-commerce Event-Driven Data Pipeline (Industrial Project)
Tech Stack: Databricks, PySpark, Delta Lake, Databricks Volumes, Databricks Workflows, GitHub

Project 2: Travel & Hospitality Data Platform (Industrial Project)
Tech Stack: Databricks, PySpark, Delta Lake (SCD2), Unity Catalog, PyDeequ (For Data Quality Checks), Databricks Volumes, Databricks Workflows

Project 3: DLT Pipeline for Healthcare Domain (Industrial Project)
Tech Stack: Medallion Architecture, Databricks DLT, Delta Lake, SQL, Unity Catalog, Expectations, Databricks Workflows

Project 4: UPI Transactions CDC Streaming Analytics (Industrial Project)
Tech Stack: Databricks, PySpark Structured Streaming, Delta Lake Change Data Feed, Unity Catalog

Exclusive GenAI & RAG Pipeline Projects On Databricks

Module 9 - Data Warehousing & Data Modelling ( Theory Part Recorded )

✅ Class - 1

OLAP vs OLTP
What is a Data Warehouse?
Data Warehouse vs Data Lake vs Data Mart
Fact Tables
Dimension Tables
Slowly Changing Dimensions
Types of SCDs
Star Schema
Snowflake Schema
Galaxy Schema

✅ Class - 2

Case Study 1: Expedia Advanced Data Warehousing & Modeling
Galaxy schema for travel domain
Hotels, flights, cars, payments and customer dimensions
Fact tables: search sessions, impressions, clicks, bookings, cancellations, payments, refunds
Conformed dimensions
Business metrics: GBV, take rate, cancellations, refund rates, average booking value, occupancy
SQL queries for KPIs
Case Study 2: Swiggy Advanced Data Warehousing & Modeling
Food delivery domain model
Facts: sessions, impressions, add-to-cart, order header, order items, delivery trips, cancellations, refunds, payments, rider shifts
Conformed dimensions
SCD2 handling for customers, restaurants, menu items and riders
Metrics: AOV, take rate, on-time delivery, cancellation, refund, repeat customers, rider utilization, SLA adherence

Module 10 - Snowflake & AI

✅ Class - 1

Snowflake Free-Tier Account Setup
Snowflake UI Walkthrough
Load Data and Create Table
Event-Driven Ingestion using Snowpipe
S3 Bucket + AWS SQS + Snowflake Integration
Storage Integration
Notification Integration
External Stage
Snowpipe
Snowflake Tasks
Introduction to Snowflake Cortex AI
Cortex AI Functions Overview
Using Cortex AI Functions with SQL
Text Summarization, Sentiment Analysis, Classification and Entity Extraction using Cortex
Introduction to Cortex Search for RAG-style Retrieval
Introduction to Cortex Analyst for Natural Language Analytics

✅ Class - 2

Project - 1: Event-Driven Incremental News Data Analysis (Industrial Project)
Tech Stack: NewsAPI, AWS Airflow, S3, Snowflake, Python (Requests, Pandas), SQL

Project - 2: Car Rental Data Platform (Industrial Project)
Tech Stack: Python, PySpark, AWS EMR, AWS Airflow, Snowflake (SCD2)

Project - 3: Realtime Entertainment Data Analysis (Industrial Project)
Tech Stack: Python, Snowflake Dynamic Table (CDC), Snowflake Stream, Snowflake Tasks, Streamlit

Project - 4: Customer Support Intelligence using Snowflake Cortex AI (Industrial Project)
Tech Stack: Snowflake, Cortex AI Functions, Cortex Search, Cortex Analyst, Snowflake Tasks, SQL, Streamlit

Module 11 - Apache Iceberg ( Theory Part Recorded )

✅ Class - 1

Challenges with Traditional Data Lake Storage
Open Table Formats
Small File Problem
Iceberg Overview and Architecture
Iceberg Catalog
Metadata Files, Manifest Lists, Manifest Files
Data Files
Backend Representation after CRUD Operations
Create Table, Insert, Merge/Upsert
Copy-on-Write
Delete Files
Positional Delete Files
Equality Delete Files
Merge-on-Read
Choosing between CoW and MoR
Select Query Internals
Create, Insert, Delete, Update, Alter, Merge
Time Travel
Compaction
Iceberg on AWS and Snowflake Case Study
Medallion Architecture with Iceberg

Module 12 - AWS Cloud

AWS Services Covered

AWS IAM, AWS S3, AWS Lambda, AWS CodeBuild, AWS CloudFormation, AWS SNS, AWS SQS

AWS EC2, AWS EventBridge, AWS CloudWatch, AWS RDS, AWS Aurora, AWS Secrets Manager

AWS Glue, AWS Athena, AWS Redshift, Amazon MWAA (Managed Airflow), AWS EMR, AWS Kinesis Streams, AWS DynamoDB, AWS Kinesis Firehose, AWS Step Functions, Amazon Bedrock, Amazon Q, AWS OpenSearch

✅ Class - 1

AWS Free Tier Account Setup
Setup for Billing Alerts
IAM Users, Roles, Policies and Access Credentials
Amazon Q AI Assistant Usage
AWS S3 — Simple Storage Service Fundamentals
AWS S3 Storage Classes
Role of S3 in Data Engineering and Pipeline Design
General Purpose Buckets
Directory Buckets
Table Buckets — Apache Iceberg and its Fundamentals
Vector Buckets — Vector Embedding and RAG Fundamentals
Create S3 Bucket from AWS Console
Properties and Configuration of S3 Bucket
Create Folder Object in S3 Bucket from AWS Console
Upload, Download and Delete Files from S3 Bucket from AWS Console
AWS CLI Setup
AWS CLI Secrets Configuration
Interact with AWS S3 from Command Line — Create, Delete, Copy and Make Bucket
Access AWS S3 from Python using boto3 Library
AWS Lambda Architecture and Fundamentals
Where Lambda Fits in Data Engineering and Pipeline Architecture
Create First Lambda Function from AWS Console
Lambda Configurations and IAM Role Permissions
Testing Lambda Function with Test Event
Versions and Aliases for AWS Lambda
Weighted Aliases for Production Deployment
Setup S3 Notification Event Trigger for Lambda
Process S3 Notification Trigger Event in Lambda
Lambda Logs in CloudWatch
Packaging and Deployment of Lambda Function in ZIP Format
Concept of Layers in AWS Lambda
Creation and Deployment of Custom Layer
Lambda Function Execution with Custom Layer Package
Fundamentals of AWS Lambda Durable Function Pattern
Setup AWS Lambda Durable Function to Trigger Another Lambda as Worker
Execution of AWS Lambda Durable Function
Setup Lambda Function with External Python Dependency
Package Lambda Function and External Python Dependency for Deployment
CI/CD Fundamentals for Data Engineering
Create GitHub Repository for Code Versioning
Design AWS Lambda Function to Read CSV File from S3 based on S3 Create Event Trigger
Setup Project in GitHub Repository with Branching
Create AWS CodeBuild Project for CI/CD of AWS Lambda Function Deployment
AWS CodeBuild Environment Setup for Deployment using buildspec.yml, AWS CLI and Shell Script
Testing CI/CD Job in AWS CodeBuild through Feature Branch and Main Branch of GitHub

✅ Class - 2

AWS SNS Fundamentals
Real-world Pipeline Example with SNS : E-commerce Data Platform
Standard SNS vs FIFO SNS
SNS Configuration Properties
Setup AWS SNS from Console
Create SNS Topic
Publish Message on SNS Topic from Console
Subscription Protocols in SNS
Email Subscriber Setup for SNS Topic
Create AWS Lambda to Send Customized Messages on SNS Topic using boto3
AWS EC2 Fundamentals
Setup EC2 Instance from AWS Console
SSH Access of EC2 Instance
AWS SQS Fundamentals
Standard Queue vs FIFO Queue
Core SQS Parameters and Configurations — Visibility Timeout, Message Retention Period, Maximum Message Size, Delivery Delay, Receive Message Wait Time
Dead Letter Queue Fundamentals
Difference between AWS SQS and Kafka
AWS SQS Setup from Console
Publish and Consume Messages from SQS via Console
Create Lambda Function for Mock Data Generation
Create EventBridge Scheduler to Invoke Lambda Every Minute
Create Consumer Lambda Function and Set SQS as Trigger to Showcase Auto Polling / Event Consumption
Create Consumer Lambda Function to Programmatically Poll Messages from SQS and Showcase Long Polling
Dead Letter Queue Redrive Task
AWS EventBridge Pipe Fundamentals
Setup SQS to Store Logistics-domain Event Data
Create AWS Lambda to Generate Mock Logistics Data and Publish into SQS
Setup AWS EventBridge Pipe with SQS as Source
Apply Filter and Enrichment in EventBridge Pipe
Write Lambda Function for Enrichment and Transformation Logic
Add Target Lambda in EventBridge Pipe

✅ Class - 3

AWS Glue fundamentals
Glue Data Catalog fundamentals
How Glue Data Catalog works with Glue jobs, Crawlers and Athena
Glue Crawler fundamentals
Schema detection and schema evolution using Glue Crawler
Create Glue database and crawl CSV, JSON and partitioned data from S3
Register S3 datasets as Glue Catalog tables
Create Glue jobs from scratch to read data using Glue Catalog tables
Working with Glue DynamicFrames and DataFrames
Glue job configurations and properties
Data quality checks in Glue jobs
Publish Glue Data Quality results
Job Bookmarking for incremental data processing
Visual ETL job with Job Bookmarking to process incremental S3 data and write partitioned output
AWS RDS fundamentals
AWS Aurora architecture
Setup Aurora MySQL
Writer instance and read replica concepts
Aurora VPC, subnet and security group configuration
Connect Aurora writer and read replicas from DBeaver
Store Aurora credentials in AWS Secrets Manager
Create S3, Glue and STS VPC endpoints for Glue-Aurora connectivity
Create JDBC connection for Aurora in AWS Glue
Publish mock IoT data into Aurora MySQL table
Crawl Aurora tables using Glue Crawler and register them in Glue Data Catalog
Create Glue job with Job Bookmarking to process incremental Aurora data
Build medallion architecture using AWS Glue, Triggers and Workflows — Telecom data domain
Generate and upload mock telecom CSV data into S3
Create Bronze layer by reading raw data from Glue Catalog, applying filters and data quality checks, and writing Parquet to S3
Create Silver layer by transforming Bronze data and writing curated Parquet to S3
Create Gold layer by transforming Silver data for analytics-ready output
Create conditional Glue Triggers based on crawler and job success status
Create Glue Workflow with on-demand trigger, conditional triggers and Glue jobs
AWS Athena fundamentals and architecture
Where Athena fits in data pipelines
Athena Query Editor walkthrough
Glue Data Catalog and Athena integration
Configure Athena query results in S3
Create external databases and external tables in Athena
Query CSV, JSON and partitioned S3 data using Athena
Use MSCK REPAIR TABLE to refresh latest partitions
S3 Table Buckets and Apache Iceberg fundamentals
Create, ingest and query Iceberg tables in Athena
Event-driven Shopkart data pipeline using S3, Lambda, Glue and Athena
Trigger Lambda on S3 object creation
Invoke Glue job from Lambda
Transform, validate and write Parquet output to S3 using Glue
Query processed S3 data using Athena for analytics and reporting

✅ Class - 4

AWS Redshift fundamentals and architecture
MPP architecture, leader node and compute nodes
Columnar storage, compression encoding, sort keys and distribution styles — KEY, ALL, EVEN
Setup provisioned Redshift cluster and Redshift Serverless
Redshift Serverless network, security, namespace, workgroup and Query Editor walkthrough
Create database, schema and internal tables with encoding and distribution strategy
Load CSV data into Redshift using COPY command
UNLOAD data from Redshift into CSV and Parquet with manifest file
Load Parquet data into Redshift using manifest file
Create external database, external schema and external tables using Redshift Spectrum
Create and refresh materialized views
AWS EMR fundamentals and architecture
Setup EMR cluster, SSH access and SSH tunneling for application endpoints
Enable Apache Iceberg on EMR
AWS Managed Airflow fundamentals and architecture
Setup Airflow environment and UI walkthrough
Create and deploy Airflow DAGs with Bash operators
Insurance data pipeline orchestrated via Airflow and EMR
Build parameterized Airflow DAG using S3KeySensor, EmrAddStepsOperator and EmrStepSensor
Submit Spark job from Airflow to EMR to process daily JSON files from S3 and write into Iceberg tables
EMR Serverless fundamentals
EMR Studio and EMR Serverless runtime application setup
Run Airflow-orchestrated Spark batch job on EMR Serverless
AWS DynamoDB fundamentals
Create DynamoDB table and perform insert, update and delete operations from Console and Python
DynamoDB Streams fundamentals for CDC
AWS Kinesis Data Streams fundamentals and setup
AWS Data Firehose fundamentals
Ride-hailing CDC pipeline using DynamoDB Streams, Kinesis Data Streams, Firehose, S3 and Athena
Create Athena table on CDC data and run analytical queries
AWS Step Functions fundamentals
Core components and state types in Step Functions
Where Step Functions fit in data pipelines
FreshCart quick-commerce data pipeline orchestrated using AWS Step Functions

Module 13 - Practical AI Data Engineering With AWS

✅ Class - 1

RAG pipeline design from scratch
Selecting Data Engineering use cases for RAG
Ingesting documents, metadata, SQL files and pipeline documentation
Text extraction and cleaning
Chunking strategies
Embedding generation
Vector database setup
Similarity search and context retrieval
LLM response generation
FastAPI-based RAG service
Dockerizing the RAG application
Deploying the RAG service
Monitoring, evaluation and cost optimization
What is AWS Bedrock?
Base Models & The Converse API
Where Bedrock Fits in the Data Engineering Pipeline
Real-World Use Cases for Data Engineers
Retail Support Intelligence Pipeline Using AWS Bedrock
AWS Step Functions-Orchestrated Data Pipeline for Customer Support Analytics
AWS Glue job to process bronze layer data, standardize it and write it into silver layer
AWS Lambda Function that uses the Amazon Nova Lite LLM to process each customer support ticket row. The model classifies the issue, assesses sentiment and urgency, redacts PII, generates a summary, and recommends a next action
AWS Lambda Function that uses the Amazon Titan Embeddings model to generate embeddings for enriched support ticket data
AWS Lambda Function to compute cosine similarity between pairs of support tickets and keep the top 3
AWS Glue job to process LLM enriched, embedded and cosine similarity data along with silver layer data to prepare final fact table
AWS Lambda Function to perform Athena queries and prepare Operational summary for leadership using Amazon Nova Lite LLM model
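
The cosine similarity step in the pipeline above can be illustrated with a small standalone Python sketch. It assumes "keep top-3" means the three most similar other tickets for each ticket, which the outline does not spell out. The ticket ids and three-dimensional vectors are made up, and `cosine_similarity` and `top_k_similar` are hypothetical helper names, not part of any AWS SDK.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k_similar(embeddings, k=3):
    """For each ticket id, return the k most similar other tickets."""
    result = {}
    for ticket_id, vector in embeddings.items():
        scored = [
            (other_id, cosine_similarity(vector, other_vector))
            for other_id, other_vector in embeddings.items()
            if other_id != ticket_id  # skip comparing a ticket with itself
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[ticket_id] = scored[:k]
    return result


# Made-up 3-dimensional vectors; real embeddings have many more dimensions.
tickets = {
    "T1": [1.0, 0.0, 0.0],
    "T2": [0.9, 0.1, 0.0],
    "T3": [0.0, 1.0, 0.0],
    "T4": [0.0, 0.0, 1.0],
    "T5": [0.7, 0.7, 0.0],
}
neighbours = top_k_similar(tickets, k=3)
```

Comparing every pair is quadratic in the number of tickets, which is fine for a sketch; a larger dataset would usually call for a vector index instead.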

✅ Class - 2

MCP architecture deep dive
Creating local MCP servers
Creating MCP tools for Data Engineering:
Read files
Query databases
Fetch schema metadata
Read pipeline logs
Run SQL validation
Run data quality checks
Connecting MCP servers with AI assistants
Building AI agents using LangChain / LangGraph-style frameworks
Creating pipeline debugging, SQL assistant and data quality agents
Adding memory, tool calling and human approval before execution
Securing MCP and agent-based workflows
Taking business requirements as input
Using AI to design data pipeline architecture
Generating PySpark pipeline code using AI
Generating SQL transformation logic
Generating Airflow / dbt-style orchestration logic
Adding data quality checks, logging and exception handling
Generating unit and integration test cases
Code coverage analysis
Multi-agent workflow for pipeline development:
Requirement analysis agent
Architecture agent
Code generation agent
Testing agent
Data quality agent
Documentation agent
Review agent
Final review of AI-generated code from a production lens

Module 14 - Industrial Projects (15 Projects)

✅ Project - 1: Aviation Data Processing Pipeline (Covered In Module 6)
Tech Stack: GitHub, GitHub Actions, S3, PySpark, EMR Serverless, AWS Airflow, Redshift, CI/CD setup

✅ Project - 2: E-commerce Event-Driven Data Pipeline (Covered In Module 8)
Tech Stack: Databricks, PySpark, Delta Lake, Databricks Volumes, Databricks Workflows, GitHub

✅ Project - 3: Travel & Hospitality Data Platform (Covered In Module 8)
Tech Stack: Databricks, PySpark, Delta Lake (SCD2), Unity Catalog, PyDeequ (For Data Quality Checks), Databricks Volumes, Databricks Workflows

✅ Project - 4: DLT Pipeline for Healthcare Domain (Covered In Module 8)
Tech Stack: Medallion Architecture, Databricks DLT, Delta Lake, SQL, Unity Catalog, Expectations, Databricks Workflows

✅ Project - 5: UPI Transactions CDC Streaming Analytics (Covered In Module 8)
Tech Stack: Databricks, PySpark Structured Streaming, Delta Lake (Change Data Feed), Unity Catalog

✅ Project - 6: Event-Driven Incremental News Data Analysis (Covered In Module 10)
Tech Stack: NewsAPI, AWS Airflow, S3, Snowflake, Python (Requests, Pandas), SQL

✅ Project - 7: Car Rental Data Platform (Covered In Module 10)
Tech Stack: Python, PySpark, AWS EMR, AWS Airflow, Snowflake (SCD2)

✅ Project - 8: Realtime Entertainment Data Analysis (Covered In Module 10)
Tech Stack: Python, Snowflake Dynamic Table (CDC), Snowflake Stream, Snowflake Tasks, Streamlit

✅ Project - 9: Weather Forecast Data Processing
Tech Stack: Python, OpenWeather API, AWS Airflow, PySpark, EMR Serverless, Redshift, S3, GitHub, GitHub Actions

✅ Project - 10: Unified Stock Trading Analytics Platform
Tech Stack: MongoDB Atlas, Redshift, S3, PySpark, AWS Airflow, EMR, Hive Metastore, Python, SQL

✅ Project - 11: Industrial Maintenance RAG Assistant
Tech Stack: Python, AWS S3, AWS S3 Vectors, AWS Bedrock, AWS Lambda, AWS IAM, AWS DynamoDB, Static Frontend (HTML/CSS/JS), AWS CLI, Text LLM, Embedding LLM

✅ Project - 12: Retail Brokerage Data Processing Pipeline
Tech Stack: Python, AWS Glue (PySpark), AWS Step Functions, AWS EventBridge, AWS S3, AWS Redshift Serverless, AWS CloudFormation, AWS CodeBuild, Amazon SNS, PyTest, GitHub

✅ Project - 13: Retail Banking Analytics Data Platform - Transaction Processing, Risk Scoring & Portfolio Intelligence
Tech Stack: Python, PySpark, EMR Serverless, Apache Airflow (MWAA), AWS S3, Apache Iceberg, AWS Glue Data Catalog, AWS Athena, AWS IAM

✅ Project - 14: Real-Time Ticket Booking & Payment Analytics Platform
Tech Stack: Python, AWS Kinesis Data Streams, AWS Glue Streaming ETL, Spark Structured Streaming, AWS SQS, AWS Redshift Serverless, AWS S3, AWS CloudFormation, AWS Secrets Manager, Amazon VPC, n8n, OpenRouter (Frontier LLM), JavaScript

✅ Project - 15: Ride-Sharing Operations & Driver Performance Analytics Platform
Tech Stack: Python, AWS S3, EMR Serverless (PySpark), AWS Step Functions, AWS EventBridge, AWS Glue Data Catalog, AWS Athena, FastAPI, AWS Lambda Function URL, AWS CloudFormation, AWS CodeBuild, AWS SNS, n8n, Telegram Bot, JavaScript, GitHub

✅ New Exclusive Project of RAG & GenAI for Data Engineering On AWS

Module 15 - Resume, LinkedIn & Interview Strategies

Attention-Grabbing Resume Preparation and Interview Strategies
Strategies To Crack Tech Interviews
LinkedIn Profile Building
How To Expand Your Professional Network On LinkedIn
How To Use Various Job Portals
How To Approach For Referrals

Course Schedule

Mode Of The Course:

Online Live Classes

Started On:

20-June-2026 (In Progress)

Course Duration:

180 Hours

Total Sessions:

40

Total Projects:

15

Validity:

1 Year (Starting From The Date Of Enrollment)

Class Timing:

Saturday & Sunday [9:00 AM - 12:00 PM Live Teaching, 12:00 PM to 1:00 PM Live Doubt Session] (IST)

Class Duration:

3 Hours Live Teaching, 60 minutes Doubt Solving

Class Recording Provided:

Yes

Programming Language Used:

Python

Prerequisite:

No prerequisites; no prior knowledge required

⚠️ Important Notice :

The video may not work on Linux due to DRM restrictions. It is only accessible on Chrome when using Windows or macOS.

Workaround: To access the video on Linux, you can create a Windows virtual machine (VM) and watch the video through the VM. Alternatively, you can use our Android or iOS application to view the video on your mobile device.

Instructor

Shashank Mishra

Staff AI Data Engineer; ex-Expedia, Amazon, Paytm, Prophecy, and McKinsey & Company

Shashank Mishra is a Staff AI Data Engineer with 9+ years of experience building scalable data platforms, real-time pipelines, cloud-native systems, and AI-powered data solutions for leading global organizations.

An MCA graduate from NIT Allahabad, he specializes in modern Data Engineering and its evolving intersection with AI, including RAG pipelines, LLM-ready data systems, intelligent workflows, and production-grade AI applications.

He is also the educator behind the "E-Learning Bridge" YouTube channel, with 184K+ YouTube subscribers and a LinkedIn community of 189K+ professionals. Having mentored 25,000+ learners, Shashank is known for practical, project-driven teaching that helps professionals build industry-ready skills and prepare for top tech careers.

Course Content

Module 1 - SQL

Module 2 - Big Data Fundamentals & Hadoop ( Recorded )

Module 3 - Confluent Kafka

Module 4 - NoSQL Database: MongoDB

Module 5 - Apache Spark (PySpark)

Module 6 - Apache Airflow

Module 7 - AI Fundamentals for Data Engineering

Module 8 - Databricks & AI

Module 9 - Data Warehousing & Data Modelling ( Theory Part Recorded )

Module 10 - Snowflake & AI

Module 11 - Apache Iceberg ( Theory Part Recorded )

Module 12 - AWS Cloud

Module 13 - Practical AI Data Engineering With AWS

Module 14 - Industrial Projects (15 Projects)

Module 15 - Resume, LinkedIn & Interview Strategies
