Greymeister.net

Jackrabbit 2.4 Content Indexing

I have been working with Jackrabbit for about a year now, and recently wanted to upgrade to 2.4, the latest stable release. We had been on 2.2 because it was the latest stable version when we started the project. Several changes since then forced a move, including an unexpected upgrade of our PostgreSQL database servers to version 9.1, which was not compatible with 2.2.0. The upgrade to 2.4 went smoothly except for one application that relies on content search. Content search worked fine in 2.2.x but not at all in 2.4.

I filed a JIRA ticket hoping to get some resolution. It turns out the problem goes back as far as 2.3 and comes from a change in how Jackrabbit handles content scraping (extracting text from stored content for indexing). I found that adding an indexingConfiguration and a tikaConfigPath parameter to the SearchIndex element of repository.xml seemed to do the trick. I've updated the GitHub project linked in my previous post on Jackrabbit. It's basically just a ton of XML to add to your project, but it seems to trick Jackrabbit into actually indexing the content nodes in your repository.
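As a rough sketch, a workspace-level SearchIndex element with the two extra parameters might look like the following. The file names indexing_configuration.xml and tika-config.xml and their location under ${wsp.home} are assumptions for illustration; point the values at wherever your project actually keeps those files. The path parameter is the usual per-workspace index location.

```xml
<!-- Workspace-level SearchIndex (inside the Workspace section of repository.xml, or a workspace.xml) -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <!-- Location of this workspace's Lucene index -->
    <param name="path" value="${wsp.home}/index"/>
    <!-- Hypothetical file name: the indexing configuration that controls what gets indexed -->
    <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
    <!-- Hypothetical file name: the Tika configuration listing the parsers used for text extraction -->
    <param name="tikaConfigPath" value="${wsp.home}/tika-config.xml"/>
</SearchIndex>
```

Existing workspaces may need their index rebuilt before the new settings affect content that was already stored.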

As a side note, you may also have to do what Razvan Potter explained on the JIRA ticket: override your dependency resolution tool so that it uses Tika 1.1 instead of the version Jackrabbit pulls in as a dependency.
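If you build with Maven, one way to force that version is a dependencyManagement section in your POM, which pins the version of an artifact even when it arrives transitively (for example through jackrabbit-core). This sketch assumes Maven and that your project needs both tika-core and tika-parsers; Ivy and Gradle have their own override mechanisms.

```xml
<!-- In your project's pom.xml: pin Tika to 1.1 regardless of what Jackrabbit declares -->
<dependencyManagement>
    <dependencies>
        <!-- Core Tika API, normally pulled in transitively by jackrabbit-core -->
        <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.1</version>
        </dependency>
        <!-- The actual format parsers (PDF, Office, etc.) used for text extraction -->
        <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-parsers</artifactId>
            <version>1.1</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```

Running mvn dependency:tree afterwards is a quick way to confirm which Tika version actually ends up on the classpath.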