HADOOP

prestige
Programming
  • 10-week duration

HADOOP COURSE CONTENT – (HADOOP-1.X, 2.X & 3.X)

(Development, Administration & Real-Time Project Implementation)

 

  • Introduction to Big Data and Hadoop
    • What is Big Data?
    • What is Hadoop?
    • Relation between Big Data and Hadoop
    • Why use Hadoop?
    • Scenarios where Hadoop fits in real-time projects
    • Challenges with Big Data
      • Storage
      • Processing
    • How Hadoop addresses Big Data challenges

  • Comparison with Other Technologies
    • RDBMS
  • Different Components of the Hadoop Ecosystem
    • Storage Components
    • Processing Components
  • Types of Big Data Projects
    • On-Premises Project
    • Cloud Integrated Project
    • Differences between On-Premises & Cloud Integrated Projects
  • Hadoop Installation (Two methods)
  • HDFS (Hadoop Distributed File System)
    • Significance of HDFS in Hadoop
    • Features of HDFS
    • Storage aspects of HDFS
      • Block – the basic storage unit in Hadoop
      • How to configure block size
      • Default vs configurable block size
      • Why is the HDFS block size so large?
  • HDFS Architecture
    • NameNode and its functionality
    • DataNode and its functionality
  • Replication in Hadoop – Failover Mechanism
  • Accessing HDFS
    • CLI (Command Line Interface) and HDFS Commands
  • File read operation
  • File write operation
  • Rack Awareness
  • Hadoop Archives
  • Configuration files in a Hadoop installation and their purpose
  • Differences between Hadoop 1.X.X, 2.X.X & 3.X.X versions
  • MapReduce
    • Why is MapReduce essential in Hadoop?
    • Processing Daemons of Hadoop
    • Node Manager
    • Resource Manager
    • Job
    • Task
    • Keys and Values
    • MapReduce Flow
    • Wordcount example
    • Map abstraction
    • Mapper
    • Reduce abstraction
    • Reducer
    • Map-only job
    • Combiner
    • Data locality
    • Anatomy of MapReduce
    • Hadoop Data types
    • Input files
    • HDFS Blocks
    • Input format
    • Input split
    • InputSplits and Records
    • InputSplits and Blocks
    • Record reader
    • Partitioner
    • Shuffling
    • Sorting
    • OutputFormat
  • Apache PIG
    • Introduction to Apache Pig
    • MapReduce vs Apache Pig
    • SQL vs Apache Pig
    • Different data types in Pig
    • Where to use MapReduce and Pig in real-time Hadoop projects
    • Modes of Execution in Pig
      • Local Mode
      • MapReduce (Distributed) Mode
    • Execution Mechanism
      • Grunt Shell
      • Script
      • Embedded
    • How to write a simple pig script
    • Bags, Tuples and Fields in Pig
    • UDFs in Pig
  • HIVE
    • Hive Introduction
    • Need for Apache Hive in Hadoop
    • When to choose MapReduce, Pig & Hive in real-time projects
    • Hive Architecture
      • Driver
      • Compiler (including the Semantic Analyzer)
      • Executor
    • Metastore in Hive
      • Importance of the Hive Metastore
      • Embedded Metastore vs External Metastore
      • Embedded metastore configuration
      • External metastore configuration
      • Communication mechanism with the Metastore and configuration details
      • Drawbacks of the internal/embedded metastore compared with an external metastore
    • Hive Integration with Hadoop
    • Hive Query Language (HiveQL)
    • Configuring Hive with a MySQL Metastore
    • SQL vs HiveQL
    • Data Slicing Mechanisms
      • Partitions in Hive
      • Static Partitioning in Hive and its performance trade-offs
      • Dynamic Partitioning in Hive and its performance trade-offs
      • Buckets in Hive
      • Using Partitioning with Bucketing in real-time project use cases
      • Partitioning Vs Bucketing
      • Real Time Use Cases
    • User-Defined Functions (UDFs) in Hive
      • Need for UDFs in Hive
    • Hive Serializer/Deserializer – SerDe
  • SQOOP
    • Introduction to Sqoop
    • MySQL client and Server Installation
    • How to connect to a relational database using Sqoop
    • Performance implications of Sqoop Import and how to improve performance
    • Performance implications of Sqoop Export and how to improve performance
    • Different Sqoop Commands
      • Different flavors of Imports
      • Export
      • Hive-Imports
    • Sqoop Incremental Load vs History Load & limitations of Incremental Load
  • Flume
    • Flume Introduction
    • Flume Architecture
    • Flume Master, Flume Collector and Flume Agent
    • Flume Configurations
    • Real Time Use Case using Apache Flume
  • YARN (Yet Another Resource Negotiator) – Next-Gen MapReduce
    • What is YARN?
    • Differences between MapReduce & YARN
    • YARN Architecture
      • Resource Manager
      • Application Master
      • Node Manager
    • When to use YARN
    • YARN Process flow
  • Projects:
    • Practical knowledge of Retail Data Analysis