HADOOP

prestige
Programming
Free
  • 10 week duration

HADOOP COURSE CONTENT – (HADOOP-1.X, 2.X & 3.X)

(Development, Administration & Real-Time Project Implementation)

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Learn Hadoop at Prestige Point, the best institute for Hadoop training in Indore.
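To give a feel for the "simple programming models" mentioned above, here is a minimal sketch of the MapReduce word count idea in plain Python. It runs in a single process and does not use Hadoop itself; the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

On a real cluster, the map and reduce steps would run in parallel on many machines and the framework would handle the shuffle; the sketch only shows the shape of the computation.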

 

  • Introduction to Big Data and Hadoop
    • What is Big Data?
    • What is Hadoop?
    • Relation between Big Data and Hadoop.
    • Why is Hadoop needed?
    • Scenarios for Adopting Hadoop in Real-Time Projects
    • Challenges with Big Data
      • Storage
      • Processing
    • How Hadoop Addresses Big Data Challenges

  • Comparison with Other Technologies
    • RDBMS
  • Different Components of the Hadoop Ecosystem
    • Storage Components
    • Processing Components
  • Types of Big Data Projects
    • On-Premises Projects
    • Cloud-Integrated Projects
    • Differences between On-Premises & Cloud-Integrated Projects
  • Hadoop Installation (Two methods)
  • HDFS (Hadoop Distributed File System)
    • Significance of HDFS in Hadoop
    • Features of HDFS
    • Storage aspects of HDFS
      • Block – the basic storage unit in Hadoop
      • How to configure the block size
      • Default vs configurable block size
      • Why is the HDFS block size so large?
  • HDFS Architecture
    • NameNode and its functionality
    • DataNode and its functionality
  • Replication in Hadoop – Failover Mechanism
  • Accessing HDFS
    • CLI (Command Line Interface) and HDFS Commands
  • File read operation
  • File write operation
  • Rack Awareness
  • Hadoop Archives
    • Configuration files in a Hadoop installation and their purpose
    • Differences between Hadoop 1.X.X, 2.X.X & 3.X.X versions
  • MapReduce
    • Why is MapReduce essential in Hadoop?
    • Processing Daemons of Hadoop
    • Node Manager
    • Resource Manager
    • Job
    • Task
    • Keys and Values
    • MapReduce Flow
    • Wordcount example
    • Map abstraction
    • Mapper
    • Reduce abstraction
    • Reducer
    • Map only job
    • Combiner
    • Data locality
    • Anatomy of MapReduce
    • Hadoop Data types
    • Input files
    • HDFS Blocks
    • Input format
    • Input split
    • InputSplits and Records
    • InputSplits and Blocks
    • Record reader
    • Partitioner
    • Shuffling
    • Sorting
    • OutputFormat
  • Apache Pig
    • Introduction to Apache Pig
    • MapReduce vs Apache Pig
    • SQL vs Apache Pig
    • Different data types in Pig
    • Where to Use MapReduce and Pig in Real-Time Hadoop Projects
    • Modes of Execution in Pig
      • Local Mode
      • MapReduce (Distributed) Mode
    • Execution Mechanism
      • Grunt Shell
      • Script
      • Embedded
    • How to write a simple pig script
    • Bags, Tuples and Fields in Pig
    • UDFs in Pig
  • Hive
    • Hive Introduction
    • Need for Apache Hive in Hadoop
    • When to Choose MapReduce, Pig or Hive in a Real-Time Project
    • Hive Architecture
      • Driver
      • Compiler (including the Semantic Analyzer)
      • Executor
    • Meta Store in Hive
      • Importance Of Hive Meta Store
      • Embedded Metastore vs External Metastore
      • Embedded metastore configuration
      • External metastore configuration
      • Communication mechanism with Metastore and configuration details
      • Drawbacks of the internal/embedded metastore compared with an external metastore
    • Hive Integration with Hadoop
    • Hive Query Language (HiveQL)
    • Configuring Hive with MySQL MetaStore
    • SQL vs HiveQL
    • Data Slicing Mechanisms
      • Partitions in Hive
      • Static Partitioning in Hive and its performance trade-offs
      • Dynamic Partitioning in Hive and its performance trade-offs
      • Buckets in Hive
      • Using Partitioning with Bucketing in Real-Time Project Use Cases
      • Partitioning vs Bucketing
      • Real-Time Use Cases
    • User Defined Functions (UDFs) in Hive
      • Need for UDFs in Hive
    • Hive Serializer/Deserializer – SerDe
  • Sqoop
    • Introduction to Sqoop
    • MySQL Client and Server Installation
    • How to Connect to a Relational Database Using Sqoop
    • Performance Implications of Sqoop Import and How to Improve Performance
    • Performance Implications of Sqoop Export and How to Improve Performance
    • Different Sqoop Commands
      • Different flavors of Imports
      • Export
      • Hive-Imports
    • Sqoop Incremental Load vs History Load & Limitations of Incremental Load
  • Flume
    • Flume Introduction
    • Flume Architecture
    • Flume Master, Flume Collector and Flume Agent
    • Flume Configurations
    • Real-Time Use Case Using Apache Flume
  • YARN (Yet Another Resource Negotiator) – Next-Generation MapReduce
    • What is YARN?
    • Differences between MapReduce & YARN
    • YARN Architecture
      • Resource Manager
      • Application Master
      • Node Manager
    • When to Use YARN
    • YARN Process flow
  • Projects:
    • Hands-On Retail Data Analysis

Explore more courses on Big Data at Prestige Point, the best institute for Hadoop training in Indore.
