In this tutorial I will show you how to install your own small single-node Hadoop cluster with the Hortonworks Data Platform inside VirtualBox. After the easy setup you can play around with the cluster and gain some experience with it without having to set up a new machine. It can also serve as a local development environment where you can debug your Map/Reduce jobs. The Hortonworks Data Platform is a 100% open source Apache Hadoop distribution and comes with the following components:
- Hadoop Distributed File System (HDFS)
- Apache Pig
- Apache Hive
- Apache HCatalog
- Apache HBase
- Apache ZooKeeper
- Apache Oozie
- Apache Sqoop
This tutorial is based on this quick start guide. A fast internet connection is recommended during the HMC setup; otherwise you may run into problems with Puppet timeouts. In that case you can try to pre-install some of the RPMs. Have a look at this thread in the Hortonworks forum.
- The first step is to install the VirtualBox software, which can be downloaded here. Please choose the installation binaries for your operating system.
- Install VirtualBox with the default options.
- Download the ISO for CentOS 6.3 from your favourite mirror. (You can use this one directly.)
- Set up a new virtual machine in VirtualBox from the ISO file. You will find detailed setup instructions here.
- Before you start the virtual machine, make sure that you configure the following settings:
  - Main memory: 4096 MB
  - Disk space: 16 GB
  - Enable the bridged network adapter
  - Enable IOAPIC
- Start the virtual machine.
See also the screenshots below:
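If you prefer the command line over the GUI, the memory, IOAPIC and network settings listed above can probably be applied with VBoxManage as well. The following is only a minimal sketch: the VM name `hdp-single-node` and the host interface `eth0` are placeholders I made up, the virtual machine must already exist and be powered off, and the 16 GB disk is assumed to have been set when the VM was created. By default the script only prints the command; set `DRY_RUN=0` to really run it.

```shell
#!/bin/sh
# Sketch: apply the VM settings from the list above with VBoxManage.
# VM_NAME and HOST_NIC are placeholders (assumptions); adjust them to your setup.
VM_NAME="${VM_NAME:-hdp-single-node}"
HOST_NIC="${HOST_NIC:-eth0}"

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = VM name, $2 = host network interface used for bridging
configure_vm() {
    # 4096 MB main memory, IOAPIC enabled, first adapter bridged to the host NIC
    run VBoxManage modifyvm "$1" --memory 4096 --ioapic on \
        --nic1 bridged --bridgeadapter1 "$2"
}

configure_vm "$VM_NAME" "$HOST_NIC"
```

Run it once in dry-run mode first and check that the printed command matches your VM name and network interface.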
- When everything is working correctly, CentOS will start the installation process.
- Please choose “Install or upgrade an existing system” from the list.
- For the hostname leave the default “localhost.localdomain”.
- Skip the media test.
- Choose the installation type “Minimal Desktop”.
- Create a user for the cluster (e.g. hadoop).
- After the successful setup, reboot your virtual system and log in as root.
Prepare the HMC Single Node Cluster Setup
- Change the keyboard layout to the correct language via “System->Administration->Keyboard”.
- Disable the firewall.
chkconfig iptables off
chkconfig ip6tables off
- Disable SELinux: in /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled.
chkconfig ntpd on
/sbin/service sshd start
chkconfig sshd on
chmod 700 .ssh
chmod 640 authorized_keys
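The SELinux change can also be scripted instead of editing the file by hand. This is a minimal sketch and not part of the original guide: the function name `disable_selinux` is my own, and `/etc/selinux/config` is assumed as the default path, which is the usual location on CentOS 6.

```shell
#!/bin/sh
# Sketch: switch SELINUX=enforcing to SELINUX=disabled in a config file.
# $1 = path to the config file (defaults to /etc/selinux/config, an assumption)
disable_selinux() {
    config_file="${1:-/etc/selinux/config}"
    tmp_file="${config_file}.tmp"
    # Rewrite only the SELINUX= line; all other lines are copied unchanged.
    # Writing back with cat keeps the original file and its permissions in place.
    sed 's/^SELINUX=enforcing/SELINUX=disabled/' "$config_file" > "$tmp_file" &&
        cat "$tmp_file" > "$config_file" &&
        rm -f "$tmp_file"
}

# Usage (as root): disable_selinux
```

The new setting is read at boot time, so it only takes effect after the next reboot.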
Install Hortonworks Data Platform with HMC
- Download the RPM. (Please check this page for a newer version.)
rpm -Uvh http://public-repo-1.hortonworks.com/HDP-220.127.116.11/repos/centos6/hdp-release-18.104.22.168-1.el6.noarch.rpm
- Install “Extra Packages for Enterprise Linux (EPEL)”.
yum install epel-release
- Install HMC.
yum install hmc
- Check the installation status with
rpm -qa | grep hmc
- Start the HMC service. You will be prompted to agree to the Oracle Java license and to download the binaries.
service hmc start
- Stop the firewall.
- Proceed to the final installation step.
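If you want to script the installation check from the steps above, it can be wrapped in a small helper so that the HMC service is only started when the package is really there. This is just a sketch: the function name `has_hmc_package` is made up for illustration, and `service iptables stop` is my assumption for the "Stop the firewall" step, which lists no command of its own.

```shell
#!/bin/sh
# Sketch: succeed only if a package list read from stdin (one package per
# line, as printed by `rpm -qa`) contains an entry with "hmc" in its name.
has_hmc_package() {
    grep -q 'hmc'
}

# Usage on the virtual machine (as root):
#   if rpm -qa | has_hmc_package; then
#       service hmc start
#       service iptables stop   # assumed command for the firewall step
#   else
#       echo "HMC is not installed, run 'yum install hmc' first" >&2
#   fi
```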
Provisioning Your Cluster
- Go to the main page of the Hortonworks Management Center (HMC). If you access it from outside the virtual machine, replace “localhost” with the IP address of your virtual machine.
- Follow the wizard instructions.
- When you are prompted to specify the disk mount point, choose a different one than the wizard proposes, for example “/data”.
- When the installation was successful, you should see this screen 🙂
- If an error occurs, the following log files may be helpful for troubleshooting:
- You can now go to the dashboard and check the status of your cluster:
- To shut down your cluster safely, stop all services in the HMC first and then stop your virtual machine.
- When you restart your system, you can start HMC again by issuing the following commands:
service hmc start
service hmc-agent start
- To run the HMC service on startup, follow the steps described here (optional).
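If you open the HMC from outside the virtual machine, you need the IP address of the virtual machine in place of “localhost”. One way to look it up is sketched below. The helper name `first_ipv4` is made up, and it assumes the one-line output format of `ip -4 -o addr show`, where the second field is the interface name and the fourth is the address with its prefix length.

```shell
#!/bin/sh
# Sketch: print the first non-loopback IPv4 address found in the output of
# `ip -4 -o addr show` (read from stdin), without the /prefix suffix.
first_ipv4() {
    awk '$2 != "lo" { split($4, parts, "/"); print parts[1]; exit }'
}

# Usage on the virtual machine:
#   ip -4 -o addr show | first_ipv4
```

With the bridged network adapter, the printed address should be reachable from the other machines in your network.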
You can now start playing around with your own Hadoop cluster. If you have problems with the setup, you can refer to the documentation or just leave a comment here. Merry X-Mas 🙂