Tutorial: Installing an Apache Hadoop Single Node Cluster with Hortonworks Data Platform

In this tutorial I will show you how to install your own small Hadoop single node cluster with the Hortonworks Data Platform inside Virtualbox. After the easy setup you can play around with the cluster and gain some experience with it without having to set up a new machine. It can also serve as a local development environment where you can debug your Map/Reduce jobs. The Hortonworks Data Platform is a 100% open source Apache Hadoop distribution and comes with the following components:

  • Hadoop Distributed File System (HDFS)
  • MapReduce
  • Apache Pig
  • Apache Hive
  • Apache HCatalog
  • Templeton
  • Apache HBase
  • Apache ZooKeeper
  • Apache Oozie
  • Apache Sqoop
  • Ganglia
  • Nagios

This tutorial is based on this quick start guide. A fast internet connection is recommended during the HMC setup; otherwise you may run into problems with Puppet timeouts. In that case you can try to pre-install some of the RPMs. Have a look at this thread in the Hortonworks forum.

Install Virtualbox

  1. The first step is to install the Virtualbox software, which can be downloaded here. Please choose the installation binaries for your operating system.
  2. Install Virtualbox with the default options.
  3. Download the ISO for CentOS 6.3 from your favourite mirror (for example, this one).
  4. Create a new virtual machine in Virtualbox and install CentOS from the ISO file. You will find detailed setup instructions here.
  5. Before you start the virtual machine make sure that you configure the following settings:
    • Main memory: 4096 MB
    • Disk space: 16 GB
    • Enable the bridged network adapter
    • Enable IOAPIC
  6. Start the virtual machine.
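
The memory, network and IOAPIC settings from step 5 can also be applied from the host's command line with VBoxManage. The following is only a sketch: it assumes the virtual machine already exists, and both the VM name "hdp-single-node" and the host interface "eth0" are placeholders that are not part of the original setup. The disk size is chosen when the virtual disk is created and is therefore not covered here. By default the commands are only printed, not executed.

```shell
#!/bin/sh
# Dry run by default: the VBoxManage commands are printed, not executed.
# Set VBOX=VBoxManage in the environment to actually apply them on the host.
VBOX="${VBOX:-echo VBoxManage}"

configure_vm() {
    vm_name="$1"   # hypothetical VM name, adjust to your own
    host_if="$2"   # host network interface to bridge to (assumption)

    # Main memory: 4096 MB, IOAPIC enabled
    $VBOX modifyvm "$vm_name" --memory 4096 --ioapic on
    # First network adapter in bridged mode
    $VBOX modifyvm "$vm_name" --nic1 bridged --bridgeadapter1 "$host_if"
}

configure_vm "hdp-single-node" "eth0"
```

The virtual machine has to be powered off while these settings are changed.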

See also the screenshots below:

[Screenshots: Virtualbox Settings]

Install CentOS

  1. If everything is working correctly, CentOS will start the installation process.
  2. Please choose “Install or upgrade an existing system” from the list.
  3. For the hostname, leave the default “localhost.localdomain”.
  4. Skip the media test.
  5. Choose the installation type “Minimal Desktop”.
  6. Create a user for the cluster (e.g. hadoop).
  7. After the successful setup, reboot your virtual system and log in as root.

Prepare the HMC Single Node Cluster Setup

  1. Change the keyboard layout to the correct language through “System->Administration->Keyboard”.
  2. Disable the firewall at boot time.
      chkconfig iptables off
      chkconfig ip6tables off
  3. Disable SELinux (the change takes effect after the next reboot).
      vi /etc/selinux/config
    • Change SELINUX=enforcing to SELINUX=disabled.
  4. Configure ntpd to start at bootup.
      chkconfig ntpd on
  5. Edit the file “/etc/hosts” so that it looks like the following screenshot. It is important that the first entry is “localhost.localdomain”; otherwise the HMC setup will not work because the hostname cannot be resolved correctly.
      [Screenshot: /etc/hosts]
  6. Type “hostname -f” in the terminal. It should print “localhost.localdomain”.
  7. Type “hostname -s” in the terminal. It should print “localhost”.
  8. Start the SSH service with
      /sbin/service sshd start
  9. Make sure that sshd is started automatically on startup.
      chkconfig sshd on
  10. Prepare password-less SSH login for the root user to localhost.
      ssh-keygen
      ssh-copy-id localhost
      chmod 700 ~/.ssh
      chmod 640 ~/.ssh/authorized_keys
  11. Check that password-less login works with
      ssh localhost
  12. Create a text file “hostdetail.txt” with the host names that will be part of your cluster. In our example with only one node it should contain only this entry:
      localhost.localdomain
  13. If you try to use a GUI editor to edit the file, you will get the error shown in the following screenshot. Just install your favourite editor, e.g. gedit, and follow the instructions.
      [Screenshot]
  14. After this preparation it’s recommended to take a snapshot of your current system so that you can return to this point if something goes wrong with the installation.
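
Before taking the snapshot it can help to verify the preparation steps above with a few small checks. The following sketch is not part of the original guide. The function names are made up for illustration, and the hosts check assumes that "first entry" means the first host name on the first non-comment line of the hosts file. The file paths are passed as arguments so that the checks can be tried on copies first.

```shell
#!/bin/sh
# Sketch of preflight checks for the preparation steps (hypothetical helpers).

# Succeeds if the given SELinux config file sets SELINUX=disabled.
selinux_disabled() {
    grep -q '^SELINUX=disabled' "$1"
}

# Succeeds if the first non-comment line of the given hosts file lists
# localhost.localdomain as its first host name (second field).
# Assumption: this is what "first entry" refers to.
hosts_first_entry_ok() {
    first_name=$(awk '$1 !~ /^#/ && NF >= 2 { print $2; exit }' "$1")
    [ "$first_name" = "localhost.localdomain" ]
}

# Succeeds if the host list file contains only the single node.
hostdetail_ok() {
    [ "$(cat "$1")" = "localhost.localdomain" ]
}

# Example calls on the real files (run as root on the virtual machine):
#   selinux_disabled /etc/selinux/config  && echo "SELinux: ok"
#   hosts_first_entry_ok /etc/hosts       && echo "hosts: ok"
#   hostdetail_ok hostdetail.txt          && echo "hostdetail: ok"
```

Each function returns a non-zero exit status when its check fails, so the calls can be chained with && or used in an if statement.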

Install Hortonworks Data Platform with HMC

  1. Download and install the repository RPM (please check on this page whether there is a newer version).
      rpm -Uvh http://public-repo-1.hortonworks.com/HDP-1.1.1.16/repos/centos6/hdp-release-1.1.1.16-1.el6.noarch.rpm
  2. Install “Extra Packages for Enterprise Linux (EPEL)”.
      yum install epel-release
  3. Install HMC.
      yum install hmc
  4. Check the installation status with
      rpm -qa | grep hmc
  5. Start the HMC service. You will be prompted to agree to the Oracle Java license and to download the binaries.
      service hmc start
  6. Stop the firewall.
      /etc/init.d/iptables stop
  7. Proceed to the final installation step.
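
The package installation steps above can be collected into one small script that stops at the first failing command. This is only a sketch: the RUN variable and the install_hmc function are made up for illustration, and the -y flag (answer yes to yum's prompts) is an addition that is not in the original steps. Starting the HMC service is left out because it asks interactively for the license agreement. By default the commands are only printed.

```shell
#!/bin/sh
# Dry run by default: each command is printed instead of executed.
# Run with RUN= (empty) as root on the virtual machine to execute them.
RUN="${RUN-echo}"

# Repository RPM from the steps above; check for a newer version first.
HDP_RPM="http://public-repo-1.hortonworks.com/HDP-1.1.1.16/repos/centos6/hdp-release-1.1.1.16-1.el6.noarch.rpm"

install_hmc() {
    # The && chain stops at the first command that fails.
    $RUN rpm -Uvh "$HDP_RPM" &&
    $RUN yum -y install epel-release &&
    $RUN yum -y install hmc &&
    $RUN rpm -q hmc          # verify that the hmc package is installed
}

install_hmc
```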

Provisioning Your Cluster

  1. Go to the main page of the Hortonworks Management Center (HMC). If you access it from outside the virtual machine, replace “localhost” with the IP address of your virtual machine.
      http://localhost/hmc/html
  2. Follow the wizard instructions.
  3. When you are prompted to specify the disk mount point, choose a different one than the one proposed by the wizard, for example “/data”.
  4. When the installation was successful you should see this screen :-)
      [Screenshot]
  5. If there is an error, the following log files may be helpful for troubleshooting:
      /var/log/hmc/hmc.log
      /var/log/puppet_apply.log
  6. You can now go to the dashboard and check the status of your cluster:
      [Screenshot]
  7. To safely shut down your cluster, stop all services in the HMC first; then you can stop your virtual machine.
  8. When you restart your system you can start HMC again by issuing the following commands:
      service hmc start
      service hmc-agent start
  9. To run the HMC service on startup, follow the steps described here (optional).
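
When the provisioning fails, a quick way to look into the two log files mentioned above is to print only their most recent error lines. The following sketch assumes that failures show up as lines containing the word "error" (in any case); the actual log format may differ. The function name show_recent_errors is made up for illustration.

```shell
#!/bin/sh
# Print the most recent lines mentioning "error" from each readable log file.
# Assumption: failures appear as lines containing the word "error".
show_recent_errors() {
    for log in "$@"; do
        [ -r "$log" ] || continue      # skip log files that do not exist yet
        echo "== $log"
        grep -i 'error' "$log" | tail -n 20
    done
    return 0
}

show_recent_errors /var/log/hmc/hmc.log /var/log/puppet_apply.log
```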

You can now start playing around with your own Hadoop cluster. If you have problems with the setup you can refer to the documentation or just leave a comment here. Merry X-Mas :-)

Author

Dennis Schulte

Dennis Schulte has been working as a Senior IT Consultant at codecentric AG since 2009. He supports his customers particularly in the areas of enterprise architectures and the optimization of IT processes. Thanks to his many years of experience as an architect and developer, he has extensive knowledge of Java EE and open source technologies, especially Spring Batch and mass data processing. His project work focuses on architecture consulting and on delivering projects in the insurance sector.
