junior
Hadoop
What kind of task can be considered a Big Data task?
What is Hadoop?
Pros & Cons of using Hadoop.
Pros: Scalable, Cost Effective, Flexible, Computing Power, Fault Tolerant
Cons: Security Concern, Potential Stability Issues, Not Fit for Small Data
Describe types of data in terms of structure. (Structured, Semi-Structured, Unstructured)
Schema on read vs schema on write
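A minimal sketch for the "schema on read vs schema on write" question, assuming a toy in-memory store and JSON lines as the raw format. The `SCHEMA` dictionary and the function names are hypothetical and exist only for illustration; real engines (an RDBMS for schema on write, Hive or Spark over raw files for schema on read) behave in far richer ways.

```python
import json

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"id": int, "name": str}


def validate(record, schema):
    """Raise ValueError if a record does not match the schema."""
    for field, expected in schema.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"field {field!r} is not {expected.__name__}")
    return record


def write_with_schema(store, record):
    """Schema on write: reject bad records before they are stored."""
    store.append(validate(record, SCHEMA))


def read_with_schema(raw_lines):
    """Schema on read: store anything, apply the schema only when reading."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        try:
            rows.append(validate(record, SCHEMA))
        except ValueError:
            continue  # assumption: skip bad rows; an engine might return nulls
    return rows
```

With schema on write the cost of a bad record is paid at load time; with schema on read loading is cheap and the same raw data can later be read under a different schema.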
HDFS
Name HDFS node types. (NameNode and DataNode)
What is the purpose of each of them?
How does HDFS achieve fault tolerance? (Data replication)
Block size. Physical space allocation of blocks.
Small files problem and its solutions.
What is rack and rack awareness? Data locality.
How does the NameNode manage metadata? (fsimage, edit logs, SecondaryNameNode)
Describe at least one way to achieve High Availability. (Standby NameNode, QJM, NFS, ZooKeeper etc.)
Active node
Standby node
Journal Node
What are the differences between the Checkpoint Node and the Backup Node?
File Formats. Describe and compare these file formats: ORC, SequenceFile, Avro, JSON
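A rough worked example for the block size and small files questions above. It assumes the common defaults of a 128 MiB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the helper names are hypothetical, and the object count is a simplification that ignores directories and other NameNode bookkeeping.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed default dfs.blocksize (128 MiB)
REPLICATION = 3                 # assumed default dfs.replication

MIB = 1024 * 1024
GIB = 1024 * MIB


def block_count(file_size):
    """Number of HDFS blocks needed for a file of file_size bytes."""
    return (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # integer ceiling


def raw_disk_usage(file_size):
    """Bytes used across DataNodes: a partial last block only takes
    its actual length on disk, so usage is size times replication."""
    return file_size * REPLICATION


def namenode_objects(file_sizes):
    """Simplified count of file and block objects held in NameNode memory."""
    return sum(1 + block_count(size) for size in file_sizes)


one_big_file = [1 * GIB]
many_small_files = [1 * MIB] * 1024  # the same 1 GiB of data

print(namenode_objects(one_big_file))      # 1 file + 8 blocks
print(namenode_objects(many_small_files))  # 1024 files + 1024 blocks
```

Both layouts occupy the same raw disk space, but the small-file layout makes the NameNode track far more objects, which is the core of the small files problem.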
YARN
What is YARN?
Types of schedulers (FIFO, Fair, Capacity) and how they work
Describe how resources are allocated when a distributed application is launched.
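A toy model for the scheduler question above, contrasting FIFO with fair sharing. It assumes resources are a single pool of identical containers and that each application has a fixed demand; the function names are hypothetical. The real YARN schedulers also involve queues, weights, preemption and locality, and the Capacity scheduler is not modeled here.

```python
def fifo_allocate(total, demands):
    """FIFO: serve applications in submission order until capacity runs out."""
    allocation = []
    remaining = total
    for demand in demands:
        granted = min(demand, remaining)
        allocation.append(granted)
        remaining -= granted
    return allocation


def fair_allocate(total, demands):
    """Fair: split capacity evenly; share an application does not need
    is redistributed among the ones that still want more."""
    allocation = [0] * len(demands)
    remaining = total
    active = [i for i, d in enumerate(demands) if d > 0]
    while active and remaining > 0:
        share = remaining // len(active)
        if share == 0:
            break  # leftover containers too few to split evenly
        for i in list(active):
            granted = min(share, demands[i] - allocation[i])
            allocation[i] += granted
            remaining -= granted
            if allocation[i] == demands[i]:
                active.remove(i)
    return allocation


print(fifo_allocate(12, [8, 4, 4]))  # first app gets everything it asks for
print(fair_allocate(12, [8, 4, 4]))  # every app gets an equal share
```

Under FIFO a large early application can starve later ones, while fair sharing lets small applications make progress at the cost of slowing the large one.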
DevOps
Continuous integration, continuous delivery, continuous deployment: what are the benefits of each?
What is Infrastructure as Code? [max 1 point]
Docker image; Docker container; Docker Hub.
Describe these DevOps tools:
Jenkins
Sonar
Puppet
Chef
Ansible