Data & Analytics

5th Oct 2018

Beginner's guide to Cloudera and how to configure it on AWS EC2!


What is the Cloudera Distribution of Hadoop, aka CDH?

Cloudera releases and sells products that include the official Apache Hadoop release, along with its own and other useful tools.

Other companies or organizations release products that include artifact builds from modified or extended versions of the Apache Hadoop source tree.

Such derivative works are not supported by the Apache Team: all support issues must be directed to the suppliers themselves.

Cloudera's distribution comes in two editions:

  1. Cloudera Express – the free edition
  2. Cloudera Enterprise – the paid edition

The following steps walk through configuring a CDH cluster on EC2 machines.

Prerequisites:

  • Go to https://www.cloudera.com/downloads.html and download Cloudera Manager by providing your sign-up credentials.
  • Log in to all the EC2 machines that will form the cluster.
  • Edit the /etc/hosts file on each machine and add the hostnames and private IP addresses of all the machines (see the example after this list).
  • Log in to the master machine and follow the steps below.
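
A minimal /etc/hosts sketch for a three-node cluster; the private IPs and hostnames below are placeholders, so substitute the values from your own EC2 console:

  10.0.0.11   master.cluster.internal    master
  10.0.0.12   worker1.cluster.internal   worker1
  10.0.0.13   worker2.cluster.internal   worker2

The same entries should appear on every machine so that each host can resolve all the others by name.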

Installation:

Step-1:

  • wget https://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin – download the Cloudera Manager installer binary
  • chmod u+x cloudera-manager-installer.bin – make the installer executable for the current user
  • sudo ./cloudera-manager-installer.bin – run the installer with root privileges
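
Once the installer finishes, it is worth confirming that the Cloudera Manager server came up before moving on; this check assumes the standard cloudera-scm-server service name used by CM 5:

  • sudo service cloudera-scm-server status

The server typically takes a minute or two to start listening on port 7180.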

Step-2:

Go to http://publicIP_of_EC2_machine:7180 in the browser and log in with the default credentials:

Username: admin

Password: admin
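
Note that port 7180 must be reachable from your browser, which on EC2 means the instance's security group needs an inbound rule for it. A minimal sketch using the AWS CLI, where the security group ID and CIDR are placeholders (restrict the CIDR to your own IP range rather than opening it to the world):

  • aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 7180 --cidr 203.0.113.0/24

You can achieve the same thing from the EC2 console under Security Groups.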

Step-3:

Click the Add Cluster button and then click Continue.

Step-4:

Select the CDH distribution version and the package type that you want to install on the cluster.

Step-5:

Copy the hostnames from the /etc/hosts file, ideally for all the machines that should form the cluster. Enter the hostnames/IPs of the machines in the textbox and continue (a pattern shortcut is sketched below).
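
As a convenience, Cloudera Manager's host field accepts range patterns, so instead of listing every host you can often write a single expression; the hostnames below are the same placeholders used earlier:

  • worker[1-2].cluster.internal

If the search does not find your hosts, fall back to pasting the full hostname or private IP of each machine.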

Step-6:

  • Provide the common username and credentials that exist on all the machines.
  • Typically on Linux machines the username will be either "ec2-user" or "ubuntu".
  • Locate the private key (ppk) file by clicking the Browse button and continue with the installation (see the connectivity check after this list).
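
Before letting Cloudera Manager connect, it can save time to confirm that key-based SSH works from the master to each host; a minimal sketch, assuming the placeholder key file name and the hostnames from the earlier /etc/hosts example:

  • ssh -i your-key.pem ec2-user@worker1.cluster.internal 'hostname'

If this prints the remote hostname without prompting for a password, Cloudera Manager should be able to reach the host with the same credentials.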

Step-7:

Once the host setup is finished, click Continue to go to the services installation section.


Step-8:

On this screen, select the services you want in the cluster (e.g., Hadoop, Hive, Sqoop, Oozie, or all of them) and click Next to continue.

Step-9:

In this section you choose the machines where the master services (NameNode, ResourceManager, HMaster, etc.) should run and where the slave services (DataNode, NodeManager, HRegionServer, etc.) should run. Assign the hosts for the corresponding services and click Next (a typical layout is sketched below).
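
A common layout for a small three-node cluster, using the placeholder hostnames from the /etc/hosts example; this is just one reasonable arrangement, not the only valid one:

  master.cluster.internal    NameNode, ResourceManager, HMaster, Cloudera Management Service
  worker1.cluster.internal   DataNode, NodeManager, HRegionServer
  worker2.cluster.internal   DataNode, NodeManager, HRegionServer

On larger clusters the master services are usually spread across dedicated nodes rather than stacked on one machine.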

Step-10:

Upon successful installation, Cloudera Manager starts the cluster. You can then monitor and manage it via Cloudera Manager, and use Hue for small dev tasks (a quick smoke test is sketched below).
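
As a quick sanity check that HDFS is healthy, you can run the standard Hadoop CLI commands from any cluster node; running them as the hdfs superuser, as shown here, is one common convention:

  • sudo -u hdfs hdfs dfsadmin -report – summarizes live DataNodes and remaining capacity
  • sudo -u hdfs hdfs dfs -ls / – lists the HDFS root directory

If both commands succeed, the core of the cluster is up and you can move on to the other services.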

Author

ALEX MAILAJALAM

Alex is a Big Data evangelist and a certified Big Data engineer with many years of experience. He has helped clients optimize custom Big Data implementations, migrate legacy systems to the Big Data ecosystem, and build integrated Big Data and analytics solutions that help business leaders generate custom analytics without needing IT.
