Structure
HADOOP
What is Hadoop [me] [data-flair]
Hadoop Components [me] [data-flair]
Hadoop Architecture Overview [me] [data-flair]
Hadoop Installation Modes
HDP Overview [HDP tutorial]
questions. [data-flair] [me] todo!
quiz [data-flair]
HDFS
HDFS Overview [data-flair]
HDFS Architecture and Components
Concepts: Data Locality, Checkpointing, …
Namespaces
Access (CLI, Web, API)
Security overview
Debugging and Testing
HDFS Modes
HDFS Debugging and Troubleshooting
Common File Types
HDFS Alternatives
questions. [data-flair] [me] todo!
quiz [data-flair]
YARN AND MAPREDUCE
YARN [me] [data-flair]
YARN vs Mesos [data-flair]
MapReduce [me]
quiz [data-flair]
HIVE
What is Hive [data-flair]
History of Hive
Hive Architecture and Components
Hive Data Units & File Formats
Hive data types & HiveQL
HiveQL Examples
Hive Functions
questions. [data-flair] [me]
quiz [data-flair]
DOCKER, JUPYTER, DLAB
Jupyter notebooks
Docker basics
DEVOPS INTRODUCTION
Infrastructure as Code [me]
CI / CD
Git, branching, merging
Jenkins
Ansible high level overview
Puppet / Chef high level overview
topics to check. # todo!
SPARK [data-flair]
CORE
What is Spark
Spark Architecture and Components
Spark Installation
Spark RDD. Transformations
Spark Actions
Pair RDD. Shared variables
SQL
Datasets and DataFrames
Optimizers
Spark Deployment
Spark WebUI
PySpark [data-flair]
questions. [data-flair] [data-flair pyspark] [me]
quiz [data-flair]
KAFKA
Kafka. Overview & Architecture [data-flair]
Kafka Topics. Partitions & Replicas
Kafka Producers & Consumers
Kafka Connect. Configuration & Monitoring
questions. [data-flair] [me]
quiz # todo!
STREAMING BASICS
World of Streaming
Kafka Streams
Spark Streaming
DStream Hands-On
Structured Streaming
topics to check. # todo!
ELASTIC BASICS
What is ELK
Elasticsearch and its key concepts
What is Logstash
What is the Beats platform
What is Kibana and how to query data
How to set up Elasticsearch with sensible predefined configs
Setting up Metricbeat + Kibana
topics to check. # todo!
NOSQL
NoSQL Overview
HBase
Cassandra
MongoDB
topics to check. # todo!
ORCHESTRATION & SCHEDULING
Apache Oozie
Apache Airflow
topics to check. # todo!