In this tutorial I will show you how to install your own small single-node Hadoop cluster with the Hortonworks Data Platform inside VirtualBox. After the easy setup you can play around with the cluster and gain some experience with it without having to set up a new machine. It can also serve as a local development environment where you can debug your Map/Reduce jobs. The Hortonworks Data Platform is a 100% open source Apache Hadoop distribution and comes with the following components:
- Hadoop Distributed File System (HDFS)
- Apache Pig
- Apache Hive
- Apache HCatalog
- Apache HBase
- Apache ZooKeeper
- Apache Oozie
- Apache Sqoop
This tutorial is based on this quick start guide. A fast internet connection is recommended during the HMC setup; otherwise you may run into problems with Puppet timeouts. In that case you can try to pre-install some of the RPMs. Have a look at this thread in the Hortonworks forum.
- The first step is to install VirtualBox, which can be downloaded here. Please choose the installation binaries for your operating system.
- Install VirtualBox with the default options.
- Download the ISO for CentOS 6.3 from your favourite mirror. (You can also use this one directly.)
- Install CentOS from the ISO file in VirtualBox. You will find detailed setup instructions here.
- Before you start the virtual machine make sure that you configure the following settings:
- Main memory: 4096 MB
- Disk space: 16 GB
- Enable the bridged network adapter
- Enable IOAPIC
- Start the virtual machine.
See also the screenshots below:
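If you prefer the command line to the settings dialog, the memory, network and IOAPIC settings listed above could also be applied with VirtualBox's `VBoxManage modifyvm` command. This is only a minimal sketch: the VM name and the bridge interface are placeholders you have to replace, the VM must already exist and be powered off, and the 16 GB disk is not covered because it is chosen when the VM is created.

```shell
#!/bin/sh
# configure_vm VM_NAME BRIDGE_IF
# Applies the settings from the list above to an existing, powered-off VM.
# VM_NAME and BRIDGE_IF are placeholders (assumptions), not values from this tutorial.
configure_vm() {
    vm_name="$1"      # name of the VM as shown in the VirtualBox manager
    bridge_if="$2"    # host network interface to bridge to, e.g. eth0

    VBoxManage modifyvm "$vm_name" \
        --memory 4096 \
        --ioapic on \
        --nic1 bridged \
        --bridgeadapter1 "$bridge_if"
}

# Example call (uncomment and adjust to your machine):
# configure_vm "hdp-single-node" "eth0"
```

Keeping the call inside a function makes it easy to reuse the same settings when you rebuild the VM later.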
- When everything is working correctly, CentOS will start the installation process.
- Please choose “Install or upgrade an existing system” from the list.
- For the hostname leave the default “localhost.localdomain”.
- Skip the media test.
- Choose the installation type “Minimal Desktop”.
- Create a user for the cluster (e.g. hadoop).
- After the successful setup, reboot your virtual system and log in as root.
Prepare the HMC Single Node Cluster Setup
- Change the keyboard layout to the correct language through “System->Administration->Keyboard”.
- Disable the firewall.
chkconfig iptables off
chkconfig ip6tables off
- Disable SELinux.
- Change SELINUX=enforcing to SELINUX=disabled.
- Configure ntpd to start at bootup.
chkconfig ntpd on
- Edit the file “/etc/hosts” so that it looks like in the following screenshot. It is important that the first entry is “localhost.localdomain”; otherwise the HMC setup will not work, because hostname resolution will fail.
- Type “hostname -f” in the terminal. It should return “localhost.localdomain”.
- Type “hostname -s” in the terminal. It should return “localhost”.
- Start the SSH service with
/sbin/service sshd start
- Make sure that sshd is started automatically on startup.
chkconfig sshd on
- Prepare password-less SSH login for the root user to localhost.
chmod 700 .ssh
chmod 640 authorized_keys
- Check that password-less login works.
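The password-less SSH preparation can be sketched as a small helper. This is an illustration under assumptions, not the exact commands of the original setup: the function name `authorize_key` is made up for this example, and the key generation and login test are shown only as comments because they depend on your system. The permissions are the ones used in this tutorial.

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE SSH_DIR
# Appends a public key to SSH_DIR/authorized_keys (only once) and sets the
# permissions used in this tutorial. The function name is hypothetical.
authorize_key() {
    pubkey="$1"
    ssh_dir="$2"

    mkdir -p "$ssh_dir"
    touch "$ssh_dir/authorized_keys"

    # Append the key only if this exact line is not already present.
    if ! grep -qxF -f "$pubkey" "$ssh_dir/authorized_keys"; then
        cat "$pubkey" >> "$ssh_dir/authorized_keys"
    fi

    chmod 700 "$ssh_dir"
    chmod 640 "$ssh_dir/authorized_keys"
}

# Typical use as root (assumed commands, adjust as needed):
#   ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
#   authorize_key /root/.ssh/id_rsa.pub /root/.ssh
#   ssh localhost    # should log in without asking for a password
```

Because the helper checks for an existing entry first, running it twice does not duplicate the key.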
- Create a text file “hostdetail.txt” containing the host names that will be part of your cluster. In our single-node example it should contain only this entry:
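A minimal sketch for creating the file from the shell instead of an editor. It assumes that the single entry is the fully qualified host name reported by “hostname -f”, which is “localhost.localdomain” in this setup; the function name `write_hostdetail` is made up for this example.

```shell
#!/bin/sh
# write_hostdetail FILE HOST...
# Writes one host name per line to FILE, replacing any previous content.
# The function name is hypothetical.
write_hostdetail() {
    file="$1"
    shift
    printf '%s\n' "$@" > "$file"
}

# Single-node example (assumes the host name from the steps above):
# write_hostdetail hostdetail.txt localhost.localdomain
```

For a larger cluster you would simply pass more host names to the same call.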
- If you try to edit the file with a GUI editor, you will get this error. Just install your favourite editor, e.g. gedit, and follow the instructions.
After this preparation it’s recommended to take a snapshot of your current system so that you can return to this point if something goes wrong with the installation.
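Before taking the snapshot, it may be worth checking the two preparation steps that are easiest to get wrong: the first entry in “/etc/hosts” and the SELinux setting. The following is a hedged sketch, not part of the official setup. The function names are invented for this example, and the default path “/etc/selinux/config” is the usual location on CentOS 6 but is an assumption here.

```shell
#!/bin/sh
# first_loopback_name HOSTS_FILE
# Prints the first host name on the 127.0.0.1 line of HOSTS_FILE.
first_loopback_name() {
    awk '$1 == "127.0.0.1" { print $2; exit }' "$1"
}

# selinux_setting CONFIG_FILE
# Prints the value of the SELINUX= line (SELINUXTYPE= is not matched).
selinux_setting() {
    sed -n 's/^SELINUX=//p' "$1"
}

# preflight [HOSTS_FILE] [SELINUX_CONFIG]
# Returns 0 if both checks pass, 1 otherwise. Default paths are assumptions.
preflight() {
    hosts_file="${1:-/etc/hosts}"
    selinux_config="${2:-/etc/selinux/config}"
    status=0

    if [ "$(first_loopback_name "$hosts_file")" != "localhost.localdomain" ]; then
        echo "hosts: first entry for 127.0.0.1 is not localhost.localdomain" >&2
        status=1
    fi

    if [ "$(selinux_setting "$selinux_config")" != "disabled" ]; then
        echo "selinux: SELINUX is not set to disabled" >&2
        status=1
    fi

    return "$status"
}

# Example: preflight && echo "ready for the snapshot"
```

Passing the file paths as arguments lets you try the checks on copies of the files first.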
Install Hortonworks Data Platform with HMC
- Download the RPM. (Please check whether there is a newer version on this page.)
rpm -Uvh http://public-repo-1.hortonworks.com/HDP-184.108.40.206/repos/centos6/hdp-release-220.127.116.11-1.el6.noarch.rpm
- Install “Extra Packages for Enterprise Linux (EPEL)”.
yum install epel-release
- Install HMC.
yum install hmc
- Check the installation status with
rpm -qa | grep hmc
- Start the HMC service. You will be prompted to agree to the Oracle Java License and download the binaries.
service hmc start
- Stop the firewall
- Proceed to the final installation step.
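For the “Stop the firewall” step in the list above: the chkconfig commands from the preparation only change what happens at boot, so the firewall may still be running if you have not rebooted since. A minimal sketch, assuming the standard CentOS 6 service names iptables and ip6tables (the function name `stop_firewall` is made up for this example):

```shell
#!/bin/sh
# stop_firewall
# Stops the IPv4 and IPv6 firewall services for the current session.
# Assumes the CentOS 6 service names; the function name is hypothetical.
stop_firewall() {
    service iptables stop
    service ip6tables stop
}

# Run as root:
# stop_firewall
```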
Provisioning Your Cluster
- Go to the main page of the Hortonworks Management Center (HMC). If you access it from outside the virtual machine, replace “localhost” with the IP address of your virtual machine.
- Follow the wizard instructions
- When you are prompted to specify the disk mount point, choose a different one from the one proposed by the wizard, for example “/data”.
- If the installation was successful, you should see this screen 🙂
- If there is an error, the following log files may be helpful for troubleshooting:
- You can now go to the dashboard and check the status of your cluster:
- To safely shut down your cluster, first stop all services in the HMC and then stop your virtual machine.
- After restarting your system you can start HMC again with the following commands:
service hmc start
service hmc-agent start
- To run the HMC service on startup, follow the steps described here (optional).
You can now start playing around with your own Hadoop cluster. If you have problems with the setup, you can refer to the documentation or just leave a comment here. Merry X-Mas 🙂