Basics of Hadoop

544 Learners already enrolled
This Free Online Course Includes:
  • 1.5-3 Hours of Learning
  • CPD Accreditation
  • Final Assessment

Understand the basics of the Apache Hadoop ecosystem with hands-on exercises in this free analytics training course.
In this short course, you will be introduced to the components and tools of Apache Hadoop and learn how to store and process big data sets ranging in size from gigabytes to petabytes. The HDFS (Hadoop Distributed File System) architecture, data processing with MapReduce, and importing and exporting data with Sqoop are all covered, and dedicated practice sections give you hands-on experience with each tool.

Course Description

Apache Hadoop is an open-source software framework that uses a network of computers to store and process large data sets with simple programming models. It is designed for problems that involve analyzing data ranging from gigabytes to petabytes (one petabyte is one million gigabytes). The framework is written in Java and is based on Google's MapReduce programming model.

This course begins with an introduction to Hadoop and big data. It teaches you the features, types, and sources of big data, the various ways of analyzing big data, and the benefits of doing so, along with an overview of Apache Hadoop, its framework, its history, and the wider Hadoop ecosystem. In the first practice section, you will learn how to download, start, and connect to the Cloudera virtual machine using the Docker platform. You will then study the architecture of the Hadoop Distributed File System (HDFS): the building blocks of Hadoop, its components, and its workflow. Some useful HDFS shell commands for managing files on an HDFS cluster, including creating directories and moving, deleting, and reading files, are also highlighted.
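To make the HDFS material concrete, here is a minimal sketch of the kind of file-management commands the course describes. The directory and file names (/user/cloudera/demo, sales.csv) are illustrative assumptions, not values taken from the course:

    # list the contents of an HDFS directory
    hadoop fs -ls /user/cloudera
    # create a directory (-p also creates missing parent directories)
    hadoop fs -mkdir -p /user/cloudera/demo
    # copy a local file into HDFS
    hadoop fs -put sales.csv /user/cloudera/demo/
    # read a file stored on the cluster
    hadoop fs -cat /user/cloudera/demo/sales.csv
    # move (rename) a file within HDFS
    hadoop fs -mv /user/cloudera/demo/sales.csv /user/cloudera/demo/sales_archive.csv
    # delete a directory and its contents
    hadoop fs -rm -r /user/cloudera/demo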

Next, you will be introduced to MapReduce, studying its architecture and seeing how it works. You will also learn about the data flow of MapReduce, the YARN (Yet Another Resource Negotiator) architecture, and the differences between traditional relational database management systems (RDBMS) and MapReduce. Thereafter, you will be taught the architecture of Sqoop and how to import and export data using the Sqoop command-line interface. Two practice sections explain the syntax for importing data from an RDBMS to HDFS and to Hive with Sqoop import, and for exporting data from HDFS and from Hive back to an RDBMS with Sqoop export. You will then study Hive: its architecture, components, and data types, the types of tables in Hive, the Hive schema, and how data is stored. Furthermore, the Impala MPP (massively parallel processing) SQL query engine, its features, and the differences between Impala, Hive, and a traditional RDBMS are considered. The practice section covers creating external and managed Hive tables and running HQL and Impala queries to analyze the data.
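As a taste of the Sqoop and Hive command-line work described above, here is a minimal sketch. The JDBC connection string, database, table names, and column schema are assumptions made for illustration, not the course's own examples:

    # import a table from an RDBMS into HDFS (-m 1 uses a single mapper; -P prompts for the password)
    sqoop import \
      --connect jdbc:mysql://dbhost/retail_db \
      --username retail_user -P \
      --table customers \
      --target-dir /user/cloudera/customers \
      -m 1

    # the same import, but landing directly in a Hive table
    sqoop import \
      --connect jdbc:mysql://dbhost/retail_db \
      --username retail_user -P \
      --table customers \
      --hive-import --hive-table customers \
      -m 1

    # export results from HDFS back to an RDBMS table (the table must already exist)
    sqoop export \
      --connect jdbc:mysql://dbhost/retail_db \
      --username retail_user -P \
      --table customer_summary \
      --export-dir /user/cloudera/customer_summary

    # create an external Hive table over the imported files and run an HQL query
    hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS customers_ext (
      id INT, name STRING, city STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/cloudera/customers';
    SELECT city, COUNT(*) FROM customers_ext GROUP BY city;"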

Next, you will study Pig scripting in Hadoop. You will learn the Pig data types, their uses, and how Pig scripts are executed by the engine. How to load data into Pig and how to filter it will also be explained, and the practice section outlines creating different Pig Latin scripts and executing different functions to perform ETL (extract, transform, and load) with Pig. Then, you will be introduced to the Oozie workflow scheduling system for managing Hadoop jobs. The types of jobs in Oozie, its architecture, features, and actions are reviewed, along with Oozie parameterization and how flow control operates in an Oozie workflow. In the practice section, you will learn how to create Sqoop, Hive, and Pig actions (a command-line sketch of Pig and Oozie follows below). This course is for database and data warehouse developers, big data developers, data analysts, and any technical personnel interested in learning and exploring the various features of Hadoop and its tools. What are you waiting for? Enroll now and start learning today!
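As promised above, here is a minimal sketch of running a Pig Latin ETL script and submitting an Oozie job. The script contents, the HDFS paths, and the existence of a prepared job.properties and workflow.xml are all assumptions for illustration:

    # write a small Pig Latin ETL script to a local file
    cat > city_counts.pig <<'EOF'
    -- load the imported CSV data (path and schema are assumed)
    customers = LOAD '/user/cloudera/customers' USING PigStorage(',')
                AS (id:int, name:chararray, city:chararray);
    -- transform: drop rows with no city, then count customers per city
    clean   = FILTER customers BY city IS NOT NULL;
    by_city = GROUP clean BY city;
    counts  = FOREACH by_city GENERATE group AS city, COUNT(clean) AS n;
    -- store the result back into HDFS
    STORE counts INTO '/user/cloudera/city_counts' USING PigStorage(',');
    EOF

    # run the script on the cluster (MapReduce mode is Pig's default)
    pig city_counts.pig

    # submit an Oozie workflow that chains Sqoop, Hive and Pig actions;
    # job.properties points at a workflow.xml already uploaded to HDFS
    oozie job -oozie http://localhost:11000/oozie -config job.properties -run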

Certificates

All Alison courses are free to enrol, study and complete. To successfully complete this Certificate course and become an Alison Graduate, you need to achieve 80% or higher in each course assessment. Once you have completed this Certificate course, you have the option to acquire an official Diploma, which is a great way to share your achievement with the world.

Your Alison Certificate is:
  • Ideal for sharing with potential employers: include it in your CV, professional social media profiles and job applications
  • An indication of your commitment to continuously learn, upskill and achieve high results
  • An incentive for you to continue empowering yourself through lifelong learning


