
Getting started with Titan using Cassandra and Solr


Titan offers several options for the storage backend (BerkeleyDB, Cassandra, HBase) and the underlying search engine (Lucene, Solr, Elasticsearch). Since DataStax acquired Aurelius and DataStax Enterprise Search uses Solr, I wanted to set up an environment that I can easily modify later to use DSE instead of the Apache Cassandra version.

Prerequisites

My Environment

I am running this setup on Ubuntu 14.04 in a virtual machine, using Java version “1.8.0_73”, the latest at the time of writing.

Please note: This article only covers basic information on how to set up Cassandra and Solr. For more details, I recommend reading the Apache Cassandra Getting Started guide and the Solr Quickstart.

Cassandra

For this simple setup I will only use a single-node cluster, so I leave the settings in cassandra.yaml at their defaults.

To start Cassandra, unpack the downloaded Cassandra package and run the cassandra script in its bin directory:

tar xvfz apache-cassandra-2.1.12-bin.tar.gz
cd apache-cassandra-2.1.12
bin/cassandra

Solr

Preparation

To start Solr, first unpack the downloaded Solr package:

tar xvfz solr-5.3.1.tgz

To be able to use geospatial search, we need to copy the file jts-1.13.jar, which ships with Titan, into the Solr lib folder:

cp titan-1.0.0-hadoop1/lib/jts-1.13.jar solr-5.3.1/server/lib

This step is necessary because the schema.xml provided by Titan uses geo definitions to support spatial queries. If we don’t copy this jar onto Solr’s classpath, creating the Solr core fails with an error.

The second way to get rid of this error is to delete the lines in schema.xml where a “geo” JTS property is used. Of course, that way we cannot use geospatial search as shown in the official examples.

Now we can start Solr:

./solr-5.3.1/bin/solr start

To validate that Solr is running, point your browser to http://localhost:8983/solr/#/
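The same check can be done from the command line, which is handy on a headless VM. This is a minimal sketch, assuming curl is installed and Solr listens on the default URL shown above; the function names are made up for this example.

```shell
#!/bin/sh
# Succeeds for HTTP status codes in the 2xx/3xx range.
is_success_code() {
  case "$1" in
    2??|3??) return 0 ;;
    *)       return 1 ;;
  esac
}

# Succeeds if the Solr admin URL answers with a success or redirect code.
# curl reports "000" when it cannot connect at all.
solr_is_up() {
  code="$(curl -s -o /dev/null -w '%{http_code}' "${1:-http://localhost:8983/solr/}")"
  is_success_code "$code"
}

# Usage:
#   solr_is_up && echo "Solr is running" || echo "Solr is not reachable"
```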

Create Core

In general, we need to create a Solr core for each index we create in Titan. The GraphOfTheGods example, which we want to run once this setup is done, creates two indexes: “vertices” and “edges”. The “vertices” index is used for range searches on the “age” property of our vertices. The “edges” index is used to search for a property named “reason” on some of the edges, as well as for geo search.

Before we can create these Solr cores, we need to copy the predefined Solr configuration files into Solr’s configsets folder. These configuration files are included in our Titan package.

Now we can create our cores, one for the “vertices” index and one for the “edges” index.
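The two steps described above (copying the configuration, then creating the cores) could look roughly like the following sketch. It rests on several assumptions: that Titan's Solr configuration lives in conf/solr inside the Titan package, that both archives were unpacked into the current directory, and that the configset is called “titan” (a name chosen for this example). The create_core command with -c and -d is part of the bin/solr script in Solr 5; check the exact paths against your own installation.

```shell
#!/bin/sh
TITAN_HOME="${TITAN_HOME:-titan-1.0.0-hadoop1}"
SOLR_HOME="${SOLR_HOME:-solr-5.3.1}"
CONFIGSET="${CONFIGSET:-titan}"   # hypothetical configset name

# Print each command; execute it unless DRY_RUN is set.
run() {
  echo "+ $*"
  [ -n "$DRY_RUN" ] || "$@"
}

setup_titan_cores() {
  dest="$SOLR_HOME/server/solr/configsets/$CONFIGSET"
  # Assumption: Titan's schema.xml and solrconfig.xml are in conf/solr.
  run mkdir -p "$dest" &&
  run cp -r "$TITAN_HOME/conf/solr" "$dest/conf" &&
  # One core per Titan index used by the GraphOfTheGods example.
  for core in vertices edges; do
    run "$SOLR_HOME/bin/solr" create_core -c "$core" -d "$CONFIGSET" || return 1
  done
}

# Usage:
#   DRY_RUN=1 setup_titan_cores   # only print the commands
#   setup_titan_cores             # actually run them
```

Running it once with DRY_RUN=1 shows the commands without touching the Solr installation, which makes it easy to compare the paths with your own layout first.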

To verify that the cores were created successfully, open the Solr panel in your browser and check that both cores are present in the drop-down list.

Starting Gremlin Shell and creating Titan sample data
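As an alternative to the browser, the cores can be checked through Solr's CoreAdmin STATUS request. The sketch below assumes curl is available and that the JSON status of a loaded core contains an “instanceDir” entry, while an unknown core yields an empty status block; the function names are made up for this example.

```shell
#!/bin/sh
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Reads a CoreAdmin STATUS response from stdin and succeeds if it
# describes a loaded core (assumption: such a block has "instanceDir").
status_has_core() {
  grep -q '"instanceDir"'
}

# Asks Solr for the status of the core named in $1.
core_exists() {
  curl -s "$SOLR_URL/admin/cores?action=STATUS&core=$1&wt=json" | status_has_core
}

# Usage:
#   for c in vertices edges; do
#     core_exists "$c" && echo "$c: ok" || echo "$c: missing"
#   done
```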

There are several ways to use Titan. For the purpose of this tutorial I run Groovy commands inside of the Gremlin shell, which is provided within the Titan package. The Gremlin shell comes with the necessary plugins to run all example commands.

In this example I run everything on a single machine. If you want to install Cassandra and Solr on separate machines, you need to make sure your servers are accessible from the outside. You’ll also need to edit the titan-cassandra-solr.properties file to point to the correct IP addresses for both Cassandra and Solr.

vi titan-1.0.0-hadoop1/conf/titan-cassandra-solr.properties

Also make sure that the other listed properties are set accordingly. You could also use SolrCloud, but that setup is quite different and I will not cover it in this post.

Now that we have finished setting up each of our components, it’s time to start the Gremlin console:
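If you prefer to script this change instead of editing the file by hand, a small helper like the one below can set the host entries. It is only a sketch: set_prop is a made-up function, the IP addresses are placeholders, and the key index.search.solr.http-urls assumes the index backend is named “search” and that Solr runs in HTTP (non-cloud) mode. Compare the keys with the ones actually present in your properties file.

```shell
#!/bin/sh
# set_prop FILE KEY VALUE: replace an existing "KEY=..." line or append one.
# Note: VALUE must not contain "|" or "&", which are special to sed here.
set_prop() {
  file=$1 key=$2 value=$3
  if grep -q "^$key=" "$file"; then
    # "|" as the sed delimiter so URLs containing "/" need no escaping
    sed "s|^$key=.*|$key=$value|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    echo "$key=$value" >> "$file"
  fi
}

# Usage (placeholder addresses):
#   PROPS=titan-1.0.0-hadoop1/conf/titan-cassandra-solr.properties
#   set_prop "$PROPS" storage.hostname 10.0.0.11
#   set_prop "$PROPS" index.search.solr.http-urls http://10.0.0.12:8983/solr
```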

cd titan-1.0.0-hadoop1
bin/gremlin.sh

To test if our setup is correct we now load the Titan default graph named “GraphOfTheGods”.
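A session along these lines could look like the sketch below. The Gremlin statements follow the Graph of the Gods walkthrough in the Titan 1.0 documentation and are not necessarily the exact commands used for this post; the geo coordinates and the 50 km radius are taken from that walkthrough. The shell function (a made-up helper) simply writes the statements to a file so they can be reviewed and pasted into the Gremlin console line by line.

```shell
#!/bin/sh
# Writes the Gremlin (Groovy) statements for the test session to file $1.
write_gods_session() {
  cat > "$1" <<'EOF'
// open the graph with the Cassandra + Solr configuration and load the sample data
graph = TitanFactory.open('conf/titan-cassandra-solr.properties')
GraphOfTheGodsFactory.load(graph)
g = graph.traversal()
// look up the vertex named hercules
hercules = g.V().has('name', 'hercules').next()
// follow the outgoing father/mother edges and print the parents' names
g.V(hercules).out('father', 'mother').values('name')
// geo search: edges whose 'place' lies within 50 km of the given point
g.E().has('place', geoWithin(Geoshape.circle(37.97, 23.72, 50)))
EOF
}

# Usage:
#   write_gods_session gods.groovy
#   cat gods.groovy        # then paste the lines at the gremlin> prompt
```

The relative path to the properties file assumes the console was started from inside the Titan directory, as in the commands above.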

To test if our setup is working, I first search for the vertex with the property “name = hercules”. Then I follow the outgoing edges to find the names of Hercules’ parents. Finally, a geospatial search finds places within a given radius.

For a complete example of traversing this example graph, see the official Titan documentation.

Conclusion

Setting up Titan as a highly scalable graph database using Cassandra as storage and Solr as search engine can be a bit tricky. The quick start examples provided by Aurelius, especially those for using Cassandra with Solr, did not work for me out of the box. I hope this post helped you set up a first graph environment.

Post by Markus Höfer
