

  • What kind of task can be considered a Big Data task?

  • What is Hadoop?

  • Pros & Cons of using Hadoop.

    • Pros: Scalable, Cost-Effective, Flexible, High Computing Power, Fault-Tolerant

    • Cons: Security Concerns, Potential Stability Issues, Not a Good Fit for Small Data

  • Describe types of data in terms of structure. (Structured, Semi-Structured, Unstructured)

  • Schema on read vs schema on write
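A minimal Python sketch of schema on write versus schema on read, assuming a hypothetical `Event` schema and a few made-up JSON lines. Schema on write validates records at load time, so nonconforming rows never reach the table; schema on read keeps the raw lines untouched and applies the schema only when a query runs, so another reader could apply a different schema to the same data.

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical target schema used by both approaches."""
    user_id: int
    action: str


RAW_LINES = [
    '{"user_id": 1, "action": "click"}',
    '{"user_id": "oops", "action": "view"}',          # wrong type for user_id
    '{"user_id": 2, "action": "buy", "extra": true}',  # extra field, ignored by this schema
]


def to_event(record: dict) -> Event:
    """Apply the schema: raise if a required field is missing or mistyped."""
    user_id = record["user_id"]  # KeyError if missing
    action = record["action"]
    if not isinstance(user_id, int) or not isinstance(action, str):
        raise ValueError(f"record does not match schema: {record}")
    return Event(user_id=user_id, action=action)


def schema_on_write(lines):
    """Validate at load time; only conforming rows are stored."""
    table, rejected = [], []
    for line in lines:
        try:
            table.append(to_event(json.loads(line)))
        except (KeyError, ValueError):  # JSONDecodeError is a ValueError
            rejected.append(line)
    return table, rejected


def schema_on_read(raw_store):
    """Raw text is stored as-is; the schema is applied only while reading."""
    for line in raw_store:
        try:
            yield to_event(json.loads(line))
        except (KeyError, ValueError):
            continue  # this particular reader skips nonconforming rows
```

Both paths end up with the same two valid events here, but in the schema-on-read case all three raw lines are still in storage, while schema on write has discarded the bad one at ingest.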


  • Name HDFS node types. (NameNode and DataNode)

  • What is the purpose of each of them?

  • How does HDFS achieve fault tolerance? (Data Replication)

  • Block size. Physical space allocation of blocks.

  • Small files problem and its solutions.

  • What are a rack and rack awareness? What is data locality?

  • How does the NameNode manage metadata? (fsimage, edit logs, Secondary NameNode)

  • Describe at least one way to achieve High Availability. (Standby NameNode, QJM, NFS, ZooKeeper, etc.)

    • Active NameNode

    • Standby NameNode

    • JournalNode

  • What are the differences between the Checkpoint Node and the Backup Node?

  • File formats: describe and compare ORC, SequenceFile, Avro, and JSON.
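The block-size and small-files questions come down to simple arithmetic. The sketch below assumes the Hadoop 2.x and later defaults of a 128 MB block (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). The helper names are hypothetical, and the figure of roughly 150 bytes of NameNode heap per file or block object is a commonly quoted rule of thumb, not an exact value.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed dfs.blocksize default (Hadoop 2.x+)
REPLICATION = 3        # assumed dfs.replication default
BYTES_PER_NAMESPACE_OBJECT = 150  # rough rule of thumb, an assumption


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each HDFS block of a file.

    Every block except the last is full; the last block holds only the
    remainder and occupies only that much space on the DataNode's disk.
    """
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def raw_disk_usage(file_size: int, replication: int = REPLICATION) -> int:
    """Physical bytes across the cluster: each byte is stored `replication` times."""
    return file_size * replication


def namenode_objects(file_sizes) -> int:
    """Namespace objects held in NameNode memory: one per file plus one per block."""
    return sum(1 + len(split_into_blocks(size)) for size in file_sizes)


def namenode_heap_estimate(file_sizes) -> int:
    """Very rough NameNode heap estimate based on the rule of thumb above."""
    return namenode_objects(file_sizes) * BYTES_PER_NAMESPACE_OBJECT
```

Under these assumptions a 300 MB file becomes two full blocks plus a 44 MB block, and that last block does not reserve a full 128 MB on disk. Storing 1 GB as a single file costs 9 namespace objects, while storing the same 1 GB as 8,192 files of 128 KB costs 16,384, which is why many small files put pressure on NameNode memory.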


  • What is YARN?

  • Types of schedulers (FIFO, Fair, Capacity) and how they work

  • Describe how resources are allocated when a distributed application is launched.
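A toy model contrasting FIFO and Fair scheduling. It makes simplifying assumptions that real YARN does not: all jobs are submitted at time 0, each job can use any number of containers, and work is measured in container-seconds. The function names are hypothetical, and the Capacity scheduler (queues with guaranteed shares) is not modeled.

```python
def fifo(jobs, capacity):
    """FIFO: jobs run one after another, each taking the whole cluster."""
    t = 0.0
    done = {}
    for name, work in jobs:
        t += work / capacity
        done[name] = t
    return done


def fair(jobs, capacity):
    """Fair: all unfinished jobs share the cluster equally (processor sharing)."""
    remaining = {name: float(work) for name, work in jobs}
    t = 0.0
    done = {}
    while remaining:
        share = capacity / len(remaining)  # containers per running job
        # The job with the least remaining work finishes next.
        name = min(remaining, key=remaining.get)
        dt = remaining[name] / share
        t += dt
        for other in remaining:
            remaining[other] -= share * dt
        del remaining[name]
        done[name] = t
    return done
```

With 4 containers and a 40-unit job A submitted ahead of an 8-unit job B, FIFO makes B wait until A finishes at t = 10, so B completes at t = 12. Under Fair sharing each job gets 2 containers, B completes at t = 4, and A still completes at t = 12.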


  • Continuous integration, continuous delivery, continuous deployment: what are the benefits of each?

  • What is Infrastructure as Code? [max 1 point]

  • Docker image, Docker container, Docker Hub.

  • Describe these DevOps tools:

    • Jenkins

    • Sonar

    • Puppet

    • Chef

    • Ansible
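Infrastructure as Code, and configuration-management tools such as Puppet, Chef, and Ansible, rest on declaring a desired state and applying it idempotently. The sketch below imitates that idea with a hypothetical in-memory `Host` and a hypothetical `converge` function; it does not use any real tool's API.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical in-memory stand-in for a managed server."""
    packages: set = field(default_factory=set)
    files: dict = field(default_factory=dict)


# Desired state declared as data (the "code" in Infrastructure as Code).
DESIRED = {
    "packages": {"nginx", "openjdk"},
    "files": {"/etc/motd": "Managed by config tool\n"},
}


def converge(host: Host, desired: dict) -> list[str]:
    """Bring the host to the desired state and return the changes made.

    Each step compares current and desired state first, so running it
    again on an already-converged host changes nothing (idempotence).
    """
    changes = []
    for pkg in sorted(desired["packages"] - host.packages):
        host.packages.add(pkg)
        changes.append(f"install {pkg}")
    for path, content in desired["files"].items():
        if host.files.get(path) != content:
            host.files[path] = content
            changes.append(f"write {path}")
    return changes
```

Running `converge` on a host that already has `openjdk` installs `nginx` and writes `/etc/motd`; running it a second time reports no changes.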
