Moving Data into Hadoop
At a Glance
This course describes techniques for moving data into Hadoop. There are a variety of ways to get data into Hadoop, ranging from simple Hadoop shell commands to more sophisticated processes. Several techniques are presented, but three of them, Sqoop, Flume, and Data Click, are covered in greater detail.
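As a quick illustration of the simplest approach mentioned above, a file on a cluster node's local disk can be copied into HDFS with the Hadoop shell. The paths below are hypothetical examples.

```
# Create a target directory in HDFS and copy a local file into it
# (both paths are hypothetical examples)
hdfs dfs -mkdir -p /user/hadoop/sales
hdfs dfs -put /tmp/sales.csv /user/hadoop/sales/

# Confirm the file arrived
hdfs dfs -ls /user/hadoop/sales
```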
About This Course
This course describes techniques for getting your data into Hadoop. It begins by looking at the different load scenarios: data at rest, data in motion, and data coming from a variety of sources. It then covers Sqoop, which imports data from a relational database into HDFS and exports it back out, and Flume, which moves streaming data such as log files into Hadoop.
The last lesson of this course covers Data Click for BigInsights, a tool designed to simplify offloading data from relational sources into Hadoop.
Learning Objectives
After completing this course, you should be able to:
- List different scenarios for loading data into Hadoop
- Import data from a relational database into HDFS using Sqoop, and export it back out
- Describe Flume and its uses
- Start and configure a Flume agent
- Describe Data Click for BigInsights and list its major components
What will I get after passing this course?
- You will receive a completion certificate.
- You will receive the IBM Explorer - Big Data Administration badge.
Course Syllabus
- Lesson 1: Load Scenarios
- List different scenarios to load data into Hadoop
- Understand how to load data at rest, data in motion, and data from a variety of sources
- Lesson 2: Using Sqoop
- Import data from a relational database into HDFS
- Use the Sqoop import and export commands (see the sketch after this syllabus list)
- Lesson 3: Flume Overview
- Describe Flume and its uses
- Lesson 4: Using Flume
- List the Flume configuration components
- Describe how to start and configure a Flume agent (a configuration sketch also follows this list)
- Lesson 5: Using Data Click v10 for BigInsights to Offload Data to Hadoop
- Describe Data Click for BigInsights
- List the major components of Data Click
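As a rough sketch of the Sqoop commands covered in Lesson 2, the example below imports a table from a relational database into HDFS and exports a result table back out. The JDBC connection string, username, table names, and directories are all hypothetical.

```
# Import a relational table into HDFS (connection details are hypothetical)
sqoop import \
  --connect jdbc:mysql://dbserver/sales \
  --username dbuser -P \
  --table customers \
  --target-dir /user/hadoop/customers \
  -m 1

# Export processed results from HDFS back into a relational table
sqoop export \
  --connect jdbc:mysql://dbserver/sales \
  --username dbuser -P \
  --table customer_summary \
  --export-dir /user/hadoop/customer_summary
```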
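For Lesson 4, a Flume agent is defined in a properties file that wires together its configuration components: sources, channels, and sinks. The sketch below is a hypothetical single-node agent named a1 that reads text lines from a netcat source and buffers them in memory before writing to HDFS; the agent name, port, and path are assumptions for illustration.

```
# example.conf - a hypothetical Flume agent named a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for text lines on a local TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: write events into HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.channel = c1
```

The agent is then started with the flume-ng launcher, pointing it at the configuration file and the agent name:

```
flume-ng agent --conf conf --conf-file example.conf --name a1 \
  -Dflume.root.logger=INFO,console
```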
General Information
- This course is free.
- It is self-paced.
- It can be taken at any time.
- It can be taken as many times as you wish.
Recommended skills prior to taking this course
- Basic understanding of Apache Hadoop and Big Data.
- Basic Linux Operating System knowledge.
- Basic understanding of the Scala, Python, or Java programming languages.
Grading scheme
- The minimum passing mark for the course is 60%. The review questions are worth 40% of the course mark and the final exam is worth 60%.
- You have one attempt at the final exam, with multiple attempts allowed per question.