Structure

  • INTRODUCTION TO BIG DATA

    • What is Big Data

    • Data Characteristics [me]

    • Big Data Solutions [me]

    • questions. [me] todo!

  • HADOOP

  • HDFS

    • HDFS Overview [data-flair]

    • HDFS Architecture and Components

    • Concepts: Data Locality, Checkpointing, …

    • Namespaces

    • Access (CLI, Web, API)

    • Security overview

    • Debugging and Testing

    • HDFS Modes

    • HDFS Debugging and Troubleshooting

    • Common File Types

    • HDFS Alternatives

    • questions. [data-flair] [me] todo!

    • quiz [data-flair]

  • YARN AND MAPREDUCE

  • HIVE

    • What is Hive [data-flair]

    • History of Hive

    • Hive Architecture and Components

    • Hive Data Units & File Formats

    • Hive data types & HiveQL

    • HiveQL Examples

    • Hive Functions

    • questions. [data-flair] [me]

    • quiz [data-flair]

  • DOCKER, JUPYTER, DLAB

    • Jupyter notebooks

    • Docker basics

  • DEVOPS INTRODUCTION

    • Infrastructure as code [me]

    • CI / CD

    • Git, branching, merging

    • Jenkins

    • Ansible high level overview

    • Puppet / Chef high level overview

    • topics to check. # todo!

  • SPARK. [data-flair]

    • CORE

      • What is Spark

      • Spark Architecture and Components

      • Spark Installation

      • Spark RDD. Transformations

      • Spark Actions

      • Pair RDD. Shared variables

    • SQL

      • Datasets and DataFrames

      • Optimizers

      • Spark Deployment

      • Spark WebUI

    • PySpark [data-flair]

    • quiz [data-flair]

  • KAFKA

    • Kafka. Overview & Architecture [data-flair]

    • Kafka Topics. Partitions & Replicas

    • Kafka Producers & Consumers

    • Kafka Connect. Configuration & Monitoring

    • questions. [data-flair] [me]

    • quiz # todo!

  • STREAMING BASICS

    • World of Streaming

    • Kafka Streams

    • Spark Streaming

      • DStream Hands-On

      • Structured Streaming

    • topics to check. # todo!

  • ELASTIC BASICS

    • What is ELK

    • Elasticsearch and its key concepts

    • What is Logstash

    • What is Beats platform

    • What is Kibana and how to query data

    • How to set up basic Elasticsearch with good predefined configs

    • Setting up Metricbeat + Kibana

    • topics to check. # todo!

  • NOSQL

    • NoSQL Overview

    • HBase

    • Cassandra

    • MongoDB

    • topics to check. # todo!

  • ORCHESTRATION & SCHEDULING

    • Apache Oozie

    • Apache Airflow

    • topics to check. # todo!

  • DATA FLOW. PIPELINING

    • Data flow [me]

    • NiFi Flow [me]

    • StreamSets Data Collector [me]

    • topics to check. # todo!
