Set-up of a small Riak cluster with VirtualBox, part II

This is the second part of a small tutorial to set up a small Riak cluster using VirtualBox. In the first part, we explained how to install and set up the first node.

Adding Riak nodes

Let us now add more nodes to set up a real cluster, even if a small one. The obvious strategy is to clone the virtual machine we installed in the first part and then alter the clone. The easiest way to clone a virtual machine is the VirtualBox Manager GUI. Simply copying the virtual machine files is not advisable: every virtual machine has its own UUID, and you cannot run two virtual machines with identical UUIDs at the same time on the same host.

To clone a virtual machine, first make sure it is not running. Choose “Clone” either from the Machine menu or from the context menu in the VirtualBox Manager GUI. Select the option to generate new MAC addresses for all network adapters. Choose a full clone rather than a linked clone, because each node needs its own independent virtual disk to store the database. Cloning will take several minutes. Before starting the cloned virtual machine, make sure its network adapter is set to “Host-only Adapter” and uses the same host-only adapter as the first virtual machine.
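
If you prefer the command line, roughly the same clone can be produced with VBoxManage. The following is only a sketch: the VM names riak1 and riak2 and the adapter name vboxnet0 are assumptions for illustration, and the exact options may differ between VirtualBox versions.

```shell
# Sketch: clone a powered-off VM from the command line instead of the GUI.
# VM names and the host-only adapter name are assumptions, not taken from the tutorial.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

clone_riak_vm() {
    src="$1"                   # existing VM, e.g. "riak1" (hypothetical name)
    dst="$2"                   # name for the clone, e.g. "riak2" (hypothetical name)
    adapter="${3:-vboxnet0}"   # host-only adapter of the first VM (assumed name)

    # Full clone of the current machine state. clonevm generates new MAC
    # addresses unless "--options keepallmacs" is given.
    "$VBOXMANAGE" clonevm "$src" --name "$dst" --register --mode machine || return 1

    # Attach the clone's first network adapter to the host-only network.
    "$VBOXMANAGE" modifyvm "$dst" --nic1 hostonly --hostonlyadapter1 "$adapter"
}
```

With both machines shut down, a call such as clone_riak_vm riak1 riak2 would then create and register the clone.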

The configuration steps that follow are quite similar to the ones for the first virtual machine. Start the second virtual machine, leaving the first one shut down, because the clone still carries the first machine's settings.

We assume the second virtual machine is to use the IPv4 address 192.168.56.3, and we configure the network setup manually to use this address. Open the system administration tool and go to Network → Wired → Configure → IPv4 Settings tab. Choose the method “Manual”. Add a new address, setting the netmask to 255.255.255.0 and the gateway to the IP address of the host-only adapter, here 192.168.56.1. Leave DNS and search domain empty. Save your changes.

Riak node configuration

The Riak node on the clone is still configured identically to the one on the first virtual machine. We therefore have to reconfigure it in much the same way as we did for the first node.

Stop the running Riak server.
> /etc/init.d/riak stop
Remove the existing Riak ring and the Bitcask data copied from the first node.
> rm -rf /var/lib/riak/ring/*
> rm -rf /var/lib/riak/bitcask/*

Change the name of the Riak node in /etc/riak/vm.args:
-name riak@192.168.56.3
(or whichever IP address you picked for the virtual machine). Then edit the HTTP and protocol buffers addresses (two separate lines) in /etc/riak/app.config:
{http, [ {"192.168.56.3", 8098 } ]},
{pb_ip, "192.168.56.3" }
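
Instead of editing both files by hand, the address can be swapped with sed. This is a minimal sketch which assumes that the cloned files still contain the first node's address 192.168.56.2; the function name replace_node_ip is made up for this example.

```shell
# Sketch: replace one IP address with another throughout a config file.
# Note: the old address is matched as a plain substring, so 192.168.56.2
# would also match the start of 192.168.56.20.
replace_node_ip() {
    file="$1"; old_ip="$2"; new_ip="$3"
    # Escape the dots so they match literally instead of "any character".
    old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
    # Write to a temporary file, then copy back so owner and mode are kept.
    sed "s/$old_re/$new_ip/g" "$file" > "$file.tmp" &&
        cat "$file.tmp" > "$file" &&
        rm -f "$file.tmp"
}
```

Run as root, replace_node_ip /etc/riak/vm.args 192.168.56.2 192.168.56.3 followed by the same call for /etc/riak/app.config should cover the three lines shown above. It is worth checking the result with grep before restarting.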

Again, restart the whole virtual machine and check whether the Riak server is running properly. When the machine is up again, run
> riak-admin ring_status
You should see something like this:
Attempting to restart script through sudo -u riak
================================== Claimant ===================================
Claimant: 'riak@192.168.56.3'
Status: up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable

Repeat these cloning and configuration steps for at least one additional node, giving each clone its own IP address.

Starting the cluster

We assume that three virtual machines with the IPv4 addresses 192.168.56.2, 192.168.56.3, and 192.168.56.4 are running. Since we never removed the Riak start scripts, the three Riak servers are running as well, but they are not yet connected to form a cluster. Connecting them takes two more commands.

Log in as root on node 192.168.56.3 and run
> riak-admin join riak@192.168.56.2

If you check
> riak-admin ring_status
you get something like this:

Attempting to restart script through sudo -u riak
================================== Claimant ===================================
Claimant: 'riak@192.168.56.2'
Status: up
Ring Ready: false

============================== Ownership Handoff ==============================
Owner: riak@192.168.56.2
Next Owner: riak@192.168.56.3

Index: 844930634081928249586505293914101120738586001408
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]

Index: 867766597165223607683437869425293042920709947392
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]

Index: 936274486415109681974235595958868809467081785344
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]

Index: 1050454301831586472458898473514828420377701515264
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]

Index: 1118962191081472546749696200048404186924073353216
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]

Index: 1141798154164767904846628775559596109106197299200
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]

Index: 1210306043414653979137426502093171875652569137152
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]

Index: 1233142006497949337234359077604363797834693083136
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]

Index: 1301649895747835411525156804137939564381064921088
Waiting on: [riak_kv_vnode,riak_pipe_vnode]

Index: 1324485858831130769622089379649131486563188867072
Waiting on: [riak_kv_vnode]
Complete: [riak_pipe_vnode]

Index: 1392993748081016843912887106182707253109560705024
Waiting on: [riak_kv_vnode,riak_pipe_vnode]

Index: 1415829711164312202009819681693899175291684651008
Waiting on: [riak_kv_vnode,riak_pipe_vnode]

-------------------------------------------------------------------------------

============================== Unreachable Nodes ==============================
All nodes are up and reachable

This indicates that data is being transferred from the first node to the new node. After a few minutes, the process is complete.

> riak-admin ring_status
produces
Attempting to restart script through sudo -u riak
================================== Claimant ===================================
Claimant: 'riak@192.168.56.2'
Status: up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable

The important bit of information is the claimant. It is now the first node, not the local node any more.
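
Rather than re-running ring_status by hand until the handoff has finished, the wait can be scripted. The helpers below are a sketch that relies only on the "Ring Ready" and "Claimant" lines shown in the output above; the function names, the RIAK_ADMIN override, and the 10-second default interval are assumptions.

```shell
# Succeeds if the ring_status output on stdin reports "Ring Ready: true".
ring_is_ready() {
    grep -q '^Ring Ready:[[:space:]]*true'
}

# Prints the claimant node name found in ring_status output on stdin.
ring_claimant() {
    sed -n "s/^Claimant:[[:space:]]*'\(.*\)'.*/\1/p"
}

# Polls ring_status until the ring is ready.
# $1 is the number of seconds to sleep between polls (assumed default: 10).
wait_for_ring() {
    interval="${1:-10}"
    until "${RIAK_ADMIN:-riak-admin}" ring_status | ring_is_ready; do
        sleep "$interval"
    done
}
```

On a joined node, wait_for_ring followed by riak-admin ring_status | ring_claimant should then print the name of the first node.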

We repeat connecting a node to the cluster with the third node. Log in as root on node 192.168.56.4 and run
> riak-admin join riak@192.168.56.2
Again, after waiting a few moments,
> riak-admin ring_status
produces the same output as with the previous node.

If you now try
> riak-admin member_status
it should produce output like this:
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status Ring Pending Node
-------------------------------------------------------------------------------
valid 34.4% -- 'riak@192.168.56.2'
valid 32.8% -- 'riak@192.168.56.3'
valid 32.8% -- 'riak@192.168.56.4'
-------------------------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
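
If you want to check the membership from a script, the table can be reduced to a single number. The one-liner below is a sketch that assumes the output layout shown above, where each member row starts with its status; the function name is made up.

```shell
# Counts the rows whose first column is "valid" in member_status output on stdin.
count_valid_members() {
    awk '$1 == "valid" { n++ } END { print n + 0 }'
}
```

For the cluster built here, riak-admin member_status | count_valid_members should print 3.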

We are done. The three-node cluster is running.

To remove a node from a Riak cluster, log on to the node as root and run
> riak-admin leave
This command causes the node to transfer all of its data to the other nodes in the cluster. Depending on the amount of data stored on the node, this step may take some time. As before, you can check the progress with riak-admin ring_status. Once the data transfer is complete, the node shuts down.

If you want to try this out, please do so later, because now we want to test our cluster a little bit.

Some tests

The tests we use stem from the Riak tutorial. All tests are done from the host, which is used as a client machine.

Store a simple datum:
> curl -v -X PUT -d '{"bar":"baz"}' -H "Content-Type: application/json" http://192.168.56.2:8098/riak/test/doc?returnbody=true

Retrieve that datum:
> curl -v http://192.168.56.2:8098/riak/test/doc
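
Since all three nodes belong to one cluster, the same document should be readable through any of them. A small sketch to check this from the host follows; the function name and the CURL override are assumptions, while the URL is the one used above.

```shell
# Fetches the test document from each given node and prints "<ip> <http status>".
check_doc_on_nodes() {
    for ip in "$@"; do
        # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
        code=$("${CURL:-curl}" -s -o /dev/null -w '%{http_code}' \
            "http://$ip:8098/riak/test/doc")
        echo "$ip $code"
    done
}
```

Calling check_doc_on_nodes 192.168.56.2 192.168.56.3 192.168.56.4 should report status 200 for every node once the document has been stored.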

It is much more interesting to run the map-reduce examples from the Riak tutorial on our little cluster. We used the Google stock data provided by Basho, but artificially tripled it in size by copying the years 2004-2010 to 2014-2020 and again to 2024-2030. The new, bigger CSV file can be found here: google-stock-data. You may use this Erlang script to upload the data into the Riak cluster. As before in this tutorial, it presumes that one cluster node can be found at IP address 192.168.56.2. With Erlang installed (it need not be the latest version), just run
> ./load-data google-stock-data.csv
to upload the data set.
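
If you would rather build the enlarged file yourself, the year shift can be approximated with awk. This sketch assumes that every data row starts with a date in the form YYYY-MM-DD and that the original file is called goog.csv and begins with a header line; neither detail is confirmed by this tutorial.

```shell
# Shifts the year in the first CSV column by the offset given as $1.
# Lines that do not start with a four-digit year (e.g. a header) pass through unchanged.
shift_years() {
    awk -F, -v OFS=, -v offset="$1" '
        $1 ~ /^[0-9][0-9][0-9][0-9]-/ {
            $1 = (substr($1, 1, 4) + offset) substr($1, 5)
        }
        { print }
    '
}
```

A tripled file could then be assembled with { cat goog.csv; tail -n +2 goog.csv | shift_years 10; tail -n +2 goog.csv | shift_years 20; } > google-stock-data.csv. A shifted leap day such as 2018-02-29 is not a real calendar date, which should not matter as long as the dates only serve as keys.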

Let’s try some of the map-reduce examples from the Riak tutorial. The simplest way is to paste them into a shell. So type
> curl -X POST http://192.168.56.3:8098/mapred -H "Content-Type: application/json" -d @-
(If you chose to set up your cluster with different IP addresses, you have to edit this command, of course.) After you press Enter, curl reads from standard input until you press Control-D.

The first example just lists all dates where the high is higher than $600. Try
{"inputs":"goog",
"query":[{"map":{"language":"javascript",
"source":"function(value, keyData, arg) { var data = Riak.mapValuesJson(value)[0]; if(data.High && parseFloat(data.High) > 600.00) return [value.key]; else return [];}",
"keep":true}}]
}

The second example lists all dates where the closing price is lower than the opening price.
{"inputs":"goog",
"query":[{"map":{"language":"javascript",
"source":"function(value, keyData, arg) { var data = Riak.mapValuesJson(value)[0]; if(data.Close < data.Open) return [value.key]; else return [];}",
"keep":true}}]
}
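
To re-run a job without pasting it each time, you can save the JSON in a file and let curl read it from there. The helper below is a sketch: the function name, the CURL override, and the default node address are assumptions, and any file name will do.

```shell
# Posts a map-reduce job stored in a file to a cluster node.
# $1: file containing the JSON job, $2: node IP (assumed default: 192.168.56.3)
run_mapred() {
    job_file="$1"
    node="${2:-192.168.56.3}"
    "${CURL:-curl}" -s -X POST "http://$node:8098/mapred" \
        -H "Content-Type: application/json" -d @"$job_file"
}
```

With one of the examples above saved as, say, high.json, calling run_mapred high.json sends it to the default node, and run_mapred high.json 192.168.56.4 sends it to another one.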

These examples should show you that your little cluster is running fine.
If you've followed the tutorial to this point, you have learned how to set up a small Riak cluster using VirtualBox. Have fun playing with it.

Stephan Kepser
