Tuesday, May 10, 2011

More Cassandra

A lot has been written about the Cassandra data model; here's a good introduction: http://javamaster.wordpress.com/2010/03/22/apache-cassandra-quick-tour/

Let's set up a small system to work together with a little web app I wrote some years ago (at that time with MySQL). The goal is to prepare a glossary for the Distributed Software Architecture course in the fall. The data model should support two operations:

* List all terms/explanations in alphabetical order (ordered by term, of course)
* List all terms/explanations in alphabetical order within a given interval of terms
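
As a minimal sketch of what these two operations amount to, independent of any storage engine, the glossary can be modelled as a term-to-explanation mapping that is read in sorted order. The function names `list_all` and `list_interval`, the placeholder entries, and the choice of an interval that is inclusive at both ends are all assumptions made for illustration; this is plain Python, not Cassandra API.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample data: term -> explanation (placeholders except the first).
glossary = {
    "Cassandra": "http://en.wikipedia.org/wiki/Apache_Cassandra",
    "ACID": "placeholder explanation for ACID",
    "Keyspace": "placeholder explanation for Keyspace",
}


def list_all(glossary):
    """Return (term, explanation) pairs sorted by term.

    Sorting is by code point, so it is case-sensitive.
    """
    return sorted(glossary.items())


def list_interval(glossary, first, last):
    """Return the sorted pairs whose term lies in [first, last], both inclusive."""
    terms = sorted(glossary)
    lo = bisect_left(terms, first)    # index of the first term >= first
    hi = bisect_right(terms, last)    # index just past the last term <= last
    return [(term, glossary[term]) for term in terms[lo:hi]]


for term, explanation in list_interval(glossary, "B", "D"):
    print(term, "->", explanation)
```

The loop at the end prints only the entries whose term falls between "B" and "D"; whatever storage is used has to offer the same two access paths.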

Shouldn't be too difficult, once you forget about ACID. Actually, in this case, with so few use cases, it seems I can just copy the basic SQL idea, equating a table with a column family. What did the SQL look like?

CREATE DATABASE IF NOT EXISTS DSAGlossary;
USE DSAGlossary;
DROP TABLE IF EXISTS Reference;

CREATE TABLE DSAGlossary.Reference (
refID INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
term VARCHAR(50) NOT NULL,
description VARCHAR(250) NOT NULL,
PRIMARY KEY (refID)
);

That doesn't look too difficult; problems may arise with more use cases. The Cassandra version has more or less the same structure, apart from the missing artificial primary key:

create keyspace DSAGlossary;
use DSAGlossary;

create column family Reference with
comparator = UTF8Type and
column_metadata =
[
{column_name: term, validation_class: UTF8Type,
index_type: KEYS},
{column_name: expl, validation_class: UTF8Type}
];

We need the KEYS index type to be able to look up rows by the value of the term column. Feed this script into Cassandra, from the Cassandra home directory:

1. Start it with bin/cassandra start&
2. Use the CLI to feed the script in:
bin/cassandra-cli -host localhost -port 9160 -f glossary.txt
Connected to: "Test Cluster" on localhost/9160
1361c18a-7b18-11e0-b561-e700f669bcfc
Waiting for schema agreement...
... schemas agree across the cluster
Authenticated to keyspace: DSAGlossary
13eb63eb-7b18-11e0-b561-e700f669bcfc
Waiting for schema agreement...
... schemas agree across the cluster
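
To picture what this schema sets up, here is a rough in-memory model in Python: a column family as a map from row key to columns, plus a map from `term` values to row keys standing in for the KEYS index. The class `ColumnFamilyModel` and its methods are invented for this illustration; they say nothing about how Cassandra implements its index internally.

```python
from collections import defaultdict


class ColumnFamilyModel:
    """Toy model: row key -> {column name -> value}, with one indexed column."""

    def __init__(self, indexed_column):
        self.rows = defaultdict(dict)
        self.indexed_column = indexed_column
        self.index = defaultdict(set)  # indexed value -> set of row keys

    def set(self, row_key, column, value):
        """Store a column value and keep the index in step with it."""
        if column == self.indexed_column:
            old = self.rows[row_key].get(column)
            if old is not None:
                self.index[old].discard(row_key)
            self.index[value].add(row_key)
        self.rows[row_key][column] = value

    def get_where(self, value):
        """Return {row key: columns} for rows whose indexed column equals value."""
        keys = sorted(self.index.get(value, ()))
        return {key: dict(self.rows[key]) for key in keys}


reference = ColumnFamilyModel(indexed_column="term")
reference.set("Cassandra", "term", "Cassandra")
reference.set("Cassandra", "expl", "http://en.wikipedia.org/wiki/Apache_Cassandra")
print(reference.get_where("Cassandra"))
```

Without the extra map, finding a row by the value of one of its columns would mean scanning every row, which is the reason for declaring the index on `term`.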

Looks like a success (you're never completely sure), so add one entry:
bin/cassandra-cli -host localhost -port 9160
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.

Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown] use DSAGlossary;
Authenticated to keyspace: DSAGlossary
[default@DSAGlossary] set Reference['Cassandra']['term'] = 'Cassandra';
Value inserted.
[default@DSAGlossary] set Reference['Cassandra']['expl'] = 'http://en.wikipedia.org/wiki/Apache_Cassandra';
Value inserted.

Still fine! Get it back, maybe?

[default@DSAGlossary] get Reference where term = 'Cassandra';
-------------------
RowKey: Cassandra
=> (column=expl, value=http://en.wikipedia.org/wiki/Apache_Cassandra, timestamp=1305041351336000)
=> (column=term, value=Cassandra, timestamp=1305041338862000)

1 Row Returned.
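
The timestamp values in this output look like microseconds since the Unix epoch, which as far as I know is what the CLI attaches to each column it writes. Assuming that, a short Python sketch turns them into readable UTC times; the helper name `column_time` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def column_time(micros):
    """Convert microseconds since the Unix epoch to a UTC datetime.

    Assumes the column timestamp is in microseconds; integer arithmetic
    via timedelta avoids floating-point rounding.
    """
    return EPOCH + timedelta(microseconds=micros)


# The two timestamps copied from the CLI output.
term_written = column_time(1305041338862000)
expl_written = column_time(1305041351336000)

print("term written:", term_written.isoformat())
print("expl written:", expl_written.isoformat())
print("seconds between writes:", (expl_written - term_written).total_seconds())
```

Under that assumption the two writes land on 10 May 2011, about 12.5 seconds apart, which fits typing the two set commands by hand.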

OK! Next time: access the data from Java (or any other language).
