Introduction
I’ve worked with the Apache Jackrabbit implementation of the Java Content Repository (also known as JCR or JSR-170) for some time now, and found it a bit confusing to set up a load-balanced deployment. There are plenty of guides and documentation on the Jackrabbit wiki, but piecing them together in a way that makes sense took significant effort. The purpose of this blog post is to describe my approach in the hope that it makes things easier for others with the same goal in mind. I have created a GitHub project with some of the key configuration files included; please feel free to check it out and refer to it as I go along.
Technology stack
I will be describing how to use Jackrabbit 2.2.x with Apache Tomcat 6.0. I’ll also be using an NFS shared filesystem and a PostgreSQL 8.4 database. Depending on what you are using, there may be differences, so my steps may not apply directly. If you find something that gives you problems, please let me know and I’ll see if I can help. I don’t think it matters much, but this will be performed on VMs running CentOS 5.x 64-bit. It should also work on any other system that Java SE runs on, but you may have to alter configuration details such as file paths and mount points.
Setting up the host systems
Assuming we have a vanilla GNU/Linux system, the first priority is installing Tomcat. You may install it using apt-get or some other package-based installation, but I tend to use the binary distribution, especially for test purposes. It essentially rules out mistakes I might make when trying to conform to the directory layout of a particular distribution. I initially create two directories, one for the Tomcat server and another for the repository. If you want to follow along, these are the paths I use:
/srv/tomcat # Location for Tomcat 6.x installation
/srv/repository/datastore # Location for NFS mount point
I assign ownership to a user I’ve created named “tomcat”, mainly to avoid running the Tomcat servers as root. Generally, this is just a user who does not have sudo privileges but who can SSH into the system. If your setup is similar, your /srv directory will probably look something like this:
drwxr-xr-x 5 tomcat users 4096 Nov 21 02:25 repository
drwxr-xr-x 9 tomcat users 4096 Nov 21 01:51 tomcat
I’ll also create a subdirectory “datastore” underneath the repository directory for the shared datastore mount point. Here is the relevant entry in my /etc/fstab file for each system (a CIFS share in this case):
//10.20.1.3/Public/datastore /srv/repository/datastore cifs password="",uid=tomcat,gid=users 0 0
Any remote mount point will do, as long as it is accessible by all nodes in the Jackrabbit cluster. Make sure that the mount is active and that the tomcat user has write privileges to it before proceeding.
Tomcat configuration
I’ll assume you’ve unzipped the Apache Tomcat binary and have the default layout. A directory listing should be similar to this one:
drwxr-xr-x 2 tomcat tomcat 4096 Nov 20 02:29 bin
drwxr-xr-x 3 tomcat tomcat 4096 Nov 21 01:09 conf
drwxr-xr-x 2 tomcat tomcat 4096 Nov 20 23:24 lib
-rw-r--r-- 1 tomcat tomcat 38657 Jan 10 2011 LICENSE
drwxr-xr-x 2 tomcat tomcat 4096 Nov 21 01:42 logs
-rw-r--r-- 1 tomcat tomcat 574 Jan 10 2011 NOTICE
-rw-r--r-- 1 tomcat tomcat 8672 Jan 10 2011 RELEASE-NOTES
-rw-r--r-- 1 tomcat tomcat 6836 Jan 10 2011 RUNNING.txt
drwxr-xr-x 8 tomcat tomcat 4096 Nov 28 06:55 temp
drwxr-xr-x 6 tomcat tomcat 4096 Nov 21 01:52 webapps
drwxr-xr-x 3 tomcat tomcat 4096 Nov 20 02:29 work
From now on, I’ll refer to this as CATALINA_HOME. Mine will be located in /srv/tomcat but yours can be anywhere else. I will not be referencing a CATALINA_BASE because I am not using a split Tomcat deployment.
The Tomcat configuration consists of exposing both a PostgreSQL datasource and the Jackrabbit repository using JNDI. Inside of CATALINA_HOME/conf there are two XML files to edit. The first is server.xml. Edit the section “GlobalNamingResources” to contain a reference to your JDBC connection that the Jackrabbit repository will use.
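A minimal sketch of what such a GlobalNamingResources section might look like follows. It is an illustration under assumptions, not the exact file: the database host, database name, user name, password, and pool sizes are placeholders, and the UserDatabase entry is simply Tomcat's stock default.

```xml
<GlobalNamingResources>
  <!-- Tomcat's default user database; not needed by Jackrabbit -->
  <Resource name="UserDatabase" auth="Container"
            type="org.apache.catalina.UserDatabase"
            description="User database that can be updated and saved"
            factory="org.apache.catalina.users.MemoryUserDatabaseFactory"
            pathname="conf/tomcat-users.xml" />

  <!-- JDBC connection pool that the Jackrabbit repository will use.
       Host, database, user, password and pool sizes are placeholders. -->
  <Resource name="jdbc/repository" auth="Container"
            type="javax.sql.DataSource"
            driverClassName="org.postgresql.Driver"
            url="jdbc:postgresql://dbhost:5432/jackrabbit"
            username="jackrabbit" password="changeme"
            maxActive="20" maxIdle="10" maxWait="-1" />
</GlobalNamingResources>
```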
You’ll need to alter the configuration to suit your needs. If you are using PostgreSQL like I am, all you need to do is create a database and a user for the repository cluster; these must be the same for each cluster node. The UserDatabase section is not required; I left it in as a reference to the location in the server.xml file.
The next file to edit is context.xml. You’ll need to add another JNDI resource, this time for the repository.
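As a rough sketch, a context.xml along these lines would expose the repository. The factory class is Jackrabbit's BindableRepositoryFactory; the paths assume the /srv/repository layout used in this post, and the WatchedResource line is just Tomcat's default entry.

```xml
<Context>
  <!-- Tomcat default, kept only for reference -->
  <WatchedResource>WEB-INF/web.xml</WatchedResource>

  <!-- Make the global JDBC datasource from server.xml visible to web applications -->
  <ResourceLink name="jdbc/repository"
                global="jdbc/repository"
                type="javax.sql.DataSource" />

  <!-- The Jackrabbit repository itself; both paths are assumptions
       based on the /srv/repository location and must exist on each node -->
  <Resource name="jcr/repository" auth="Container"
            type="javax.jcr.Repository"
            factory="org.apache.jackrabbit.core.jndi.BindableRepositoryFactory"
            configFilePath="/srv/repository/repository.xml"
            repHomeDir="/srv/repository" />
</Context>
```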
You’ll notice that I left some of the default configuration in there just as a reference. The only tags relevant to the Jackrabbit configuration are the ResourceLink to the jdbc/repository resource and the jcr/repository Resource definition. The paths declared in the Resource tag must point to where you plan to keep the repository on the node. I am still using my /srv/repository location.
The last step is to make sure that the proper libraries are available for Tomcat to start the shared resources. I had some problems getting exactly the right jar files into my CATALINA_HOME/lib directory, so I’m just going to show a directory listing. Note that several of these jars are already present in the default Tomcat installation.
-rw-r--r-- 1 tomcat tomcat 481535 Nov 20 23:24 log4j-1.2.16.jar
-rw-r--r-- 1 tomcat tomcat 9753 Nov 20 23:24 slf4j-log4j12-1.6.1.jar
-rw-r--r-- 1 tomcat tomcat 62086 Nov 20 23:23 commons-pool-1.3.jar
-rw-r--r-- 1 tomcat tomcat 2512189 Nov 20 23:22 derby-10.5.3.0_1.jar
-rw-r--r-- 1 tomcat tomcat 740930 Nov 20 23:21 jackrabbit-spi-commons-2.2.0.jar
-rw-r--r-- 1 tomcat tomcat 26822 Nov 20 23:20 jackrabbit-spi-2.2.0.jar
-rw-r--r-- 1 tomcat tomcat 286499 Nov 20 23:20 jackrabbit-jcr-commons-2.2.0.jar
-rw-r--r-- 1 tomcat tomcat 25496 Nov 20 23:20 slf4j-api-1.6.1.jar
-rw-r--r-- 1 tomcat tomcat 575389 Nov 20 23:19 commons-collections-3.2.1.jar
-rw-r--r-- 1 tomcat tomcat 121757 Nov 20 23:19 commons-dbcp-1.2.2.jar
-rw-r--r-- 1 tomcat tomcat 109043 Nov 20 23:19 commons-io-1.4.jar
-rw-r--r-- 1 tomcat tomcat 4326608 Nov 20 23:19 netcdf-4.2-min.jar
-rw-r--r-- 1 tomcat tomcat 189284 Nov 20 23:19 concurrent-1.3.4.jar
-rw-r--r-- 1 tomcat tomcat 23861 Nov 20 23:19 jackrabbit-api-2.2.0.jar
-rw-r--r-- 1 tomcat tomcat 2117338 Nov 20 23:18 jackrabbit-core-2.2.0.jar
-rw-rw-r-- 1 tomcat tomcat 539510 Nov 20 22:51 postgresql-8.4-702.jdbc4.jar
-rw-r--r-- 1 tomcat tomcat 69246 Nov 20 22:46 jcr-2.0.jar
-rw-r--r-- 1 tomcat tomcat 15239 Jan 10 2011 annotations-api.jar
-rw-r--r-- 1 tomcat tomcat 53756 Jan 10 2011 catalina-ant.jar
-rw-r--r-- 1 tomcat tomcat 129739 Jan 10 2011 catalina-ha.jar
-rw-r--r-- 1 tomcat tomcat 1208895 Jan 10 2011 catalina.jar
-rw-r--r-- 1 tomcat tomcat 237317 Jan 10 2011 catalina-tribes.jar
-rw-r--r-- 1 tomcat tomcat 1563059 Jan 10 2011 ecj-3.3.1.jar
-rw-r--r-- 1 tomcat tomcat 33410 Jan 10 2011 el-api.jar
-rw-r--r-- 1 tomcat tomcat 112550 Jan 10 2011 jasper-el.jar
-rw-r--r-- 1 tomcat tomcat 526946 Jan 10 2011 jasper.jar
-rw-r--r-- 1 tomcat tomcat 76692 Jan 10 2011 jsp-api.jar
-rw-r--r-- 1 tomcat tomcat 88210 Jan 10 2011 servlet-api.jar
-rw-r--r-- 1 tomcat tomcat 762878 Jan 10 2011 tomcat-coyote.jar
-rw-r--r-- 1 tomcat tomcat 253526 Jan 10 2011 tomcat-dbcp.jar
-rw-r--r-- 1 tomcat tomcat 70034 Jan 10 2011 tomcat-i18n-es.jar
-rw-r--r-- 1 tomcat tomcat 51965 Jan 10 2011 tomcat-i18n-fr.jar
-rw-r--r-- 1 tomcat tomcat 55036 Jan 10 2011 tomcat-i18n-ja.jar
Repository configuration
The repository configuration resides mostly in one file, repository.xml, which must be at the root of each node’s repository location. The complete repository.xml file I’m using is in the linked GitHub project, so check that out for the full copy. Here I will describe each section where I think it is relevant.
According to the Jackrabbit Wiki clustering article, the initial (repository-level) FileSystem definition must be shared between the nodes. I am accessing it via the JNDI datasource set up in the previous section.
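A minimal sketch of a database-backed FileSystem element that goes through JNDI might look like this. The “fs_” table prefix is an assumption chosen for illustration; the JNDI name matches the jdbc/repository resource described earlier.

```xml
<!-- Repository-level file system stored in the shared database -->
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
  <!-- Using InitialContext as the "driver" tells Jackrabbit to look up
       a DataSource in JNDI instead of opening its own JDBC connection -->
  <param name="driver" value="javax.naming.InitialContext"/>
  <param name="url" value="java:comp/env/jdbc/repository"/>
  <param name="schema" value="postgresql"/>
  <!-- Assumed table prefix; any prefix works if all nodes share it -->
  <param name="schemaObjectPrefix" value="fs_"/>
</FileSystem>
```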
The Datastore implementation I’m using requires the file system to be shared amongst all the nodes. Here, I am pointing it at the mount point I created earlier on the file server, in a subdirectory of the repository home named “datastore”.
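For illustration, a file-based DataStore pointed at that directory could be declared roughly as follows. This sketch assumes Jackrabbit's FileDataStore and an arbitrary minRecordLength; `${rep.home}` is the variable Jackrabbit substitutes with the repository home.

```xml
<!-- Binary storage on the shared mount; every node must see the same files -->
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
  <!-- Resolves to /srv/repository/datastore with the layout used in this post -->
  <param name="path" value="${rep.home}/datastore"/>
  <!-- Assumed threshold in bytes; smaller binaries are stored inline -->
  <param name="minRecordLength" value="100"/>
</DataStore>
```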
The workspace configuration I’m using is pretty bare-bones, but it uses a PersistenceManager chosen according to the recommendations of the Jackrabbit wiki. Again, it is pointed at the PostgreSQL JNDI datasource we set up in Tomcat.
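A bare-bones workspace template in this spirit might look like the sketch below. The choice of the pooled PostgreSQLPersistenceManager, the local file system and search index, and the “ws_” table prefix are assumptions for illustration rather than the exact settings of the original file.

```xml
<Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>

<Workspace name="${wsp.name}">
  <!-- Workspace-level file system stays private to each node -->
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${wsp.home}"/>
  </FileSystem>
  <!-- Content is persisted in the shared PostgreSQL database via JNDI -->
  <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.PostgreSQLPersistenceManager">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/repository"/>
    <!-- Assumed prefix; it keeps each workspace's tables apart -->
    <param name="schemaObjectPrefix" value="ws_${wsp.name}_"/>
  </PersistenceManager>
  <!-- Each node keeps its own private search index -->
  <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
  </SearchIndex>
</Workspace>
```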
The versioning configuration follows the same pattern. Again, make sure it is pointed at the PostgreSQL datasource using JNDI.
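As a sketch, a Versioning element that stores version history in the shared database could look like this. The persistence manager class and the “version_” prefix are illustrative assumptions.

```xml
<Versioning rootPath="${rep.home}/version">
  <!-- Version-level file system stays private to each node -->
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${rep.home}/version"/>
  </FileSystem>
  <!-- Version history is persisted in the shared database via JNDI -->
  <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.PostgreSQLPersistenceManager">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/repository"/>
    <!-- Assumed table prefix -->
    <param name="schemaObjectPrefix" value="version_"/>
  </PersistenceManager>
</Versioning>
```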
The last, and one of the most important, pieces of the repository.xml file is the Cluster configuration. Again we point to the PostgreSQL datasource using JNDI, this time to store the journal. The journal gives all nodes a consistent view by recording the changes made by each individual node so that the others can replay them. The one piece of information here that changes from node to node is the id attribute of the Cluster tag, which must be unique for every node.
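A minimal sketch of such a Cluster element, assuming a database journal reached through JNDI, is shown below. The id “node1”, the syncDelay value, and the “journal_” prefix are assumptions; the prefix is chosen so that the resulting table names match the JOURNAL_LOCAL_REVISIONS table referred to later in this post.

```xml
<!-- id must be different on every node; syncDelay is in milliseconds -->
<Cluster id="node1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/repository"/>
    <!-- Must be given explicitly, since it cannot be derived from a JNDI name -->
    <param name="databaseType" value="postgresql"/>
    <!-- Assumed prefix; yields tables such as journal_local_revisions -->
    <param name="schemaObjectPrefix" value="journal_"/>
  </Journal>
</Cluster>
```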
Testing
Start up the Tomcat server on one of the nodes. You should see notifications that Jackrabbit has been started, and some directories should be created in your repository home directory for workspaces and versioning. I would recommend deploying something like JCR-Explorer and connecting to your JCR using JNDI. You should be able to browse the repository and add files to it.
Note: Be sure to use the JNDI name that we created, which within Tomcat will be java:comp/env/jcr/repository
Adding additional nodes
At this point, all we’ve really created is a single Jackrabbit server running on Tomcat. However, the next step allows a load balanced configuration. The Jackrabbit Wiki notes that there are some limitations, but based on this default configuration, it is very easy to add additional nodes. If you have been using a VM like me, all you need to do is:
- Shut down Tomcat
- Create a clone of the GNU/Linux machine with the configuration on it
- Update networking etc. on the clone so that it does not conflict with the first node
- Get the current revision number from your first node with an SQL query against the PostgreSQL database
- Start up both GNU/Linux machines, and start Tomcat on the first node
- Update the repository.xml file on the second node and give it a new Cluster ID, for example “node2”
- Insert the new node’s local revision into the JOURNAL_LOCAL_REVISIONS table (that is the table name with my configuration). Note: set the revision_id value in the insert statement to the number you got from the select statement.
- Start Tomcat on the second node.
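The two SQL statements in the steps above might look roughly like this. It is a sketch under assumptions, not necessarily the exact queries used here: it assumes the first node’s Cluster id is “node1”, that the table has JOURNAL_ID and REVISION_ID columns as in Jackrabbit’s database journal schema, and it uses 42 purely as a placeholder revision.

```sql
-- Step 1: read the revision that the first node has reached.
-- 'node1' is an assumed Cluster id; use the id from the first node's repository.xml.
SELECT revision_id
  FROM journal_local_revisions
 WHERE journal_id = 'node1';

-- Step 2: register the new node at that same revision.
-- Replace 42 (a placeholder) with the number returned by the SELECT above,
-- and 'node2' with the Cluster id given to the new node.
INSERT INTO journal_local_revisions (journal_id, revision_id)
VALUES ('node2', 42);
```

Starting the clone at the first node’s revision means it does not need to replay journal entries that are already reflected in the copied repository.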
Now, whenever you modify either repository directly (by using JCR-Explorer, etc.) you will see that the nodes synchronize using the journal. You can now put any load-balancing technology you wish in front of any number of Jackrabbit nodes and have a fault-tolerant repository.