Vufind 0.8

Posted March 19, 2008 by Wayne Graham
Categories: vufind

Update: The instructions on installing Vufind have been moved to the Vufind Wiki. Please check there for the most up-to-date instructions.

It’s official…Andrew tagged Vufind 0.8 yesterday! There have been some very significant changes to how the index functions. First, the Java indexing code has been included in the trunk, so indexing time is greatly reduced. Second, the full marc record is no longer stored on the file system; it’s in the index. Because of this change, you have to re-index the entire catalog to use 0.8.

So, now that 0.8 is tagged, I plan on breaking the Java importer to start refactoring with some of the changes that Robert Haschart from UVa has made to make the indexing engine a bit more flexible. If you’re using the SVN version of the code, I will be breaking it 😉

In other news, we’re very close to implementing a nightly build of the code. This will give folks who don’t want to build their own software every time Andrew or I break the code a place to get functional code without being Subversion gurus or chasing down Ant dependencies.

So what’s on the horizon (at least for the marc indexing mechanism)? I’m going to be refactoring most of the code into POJOs instead of the spaghetti code it is now (mostly from Robert’s rewrites). The big enhancements are going to be logging (via log4j), the ability to point the indexer to a directory of marc files (instead of one big file), and more “human” error handling than just the stack trace.

Vufind 0.6 on Ubuntu 7.10

Posted October 30, 2007 by Wayne Graham
Categories: Library 2.0, linux, solr, technology, Ubuntu, vufind

Update: The instructions on installing Vufind have been moved to the Vufind Wiki. Please check there for the most up-to-date instructions.

This is an update to my previous post on configuring Ubuntu to run Vufind…

First, upgrade your server distribution to the latest-and-greatest

sudo apt-get dist-upgrade

If you’re on Feisty (7.04), this may take a while. Next install the Java 6 JDK and build-essential (for building Yaz).

sudo apt-get -y install sun-java6-jdk build-essential

When you’re prompted, answer the questions and let Ubuntu finish setting up Java. As a side note, the reason you want the JDK and not the JRE is that we want to run the Solr instance with the server switch to improve performance. To do this, you need the JDK.

Next, we install Apache2 and configure the mod_rewrite extension (and reload Apache2):

sudo apt-get -y install apache2
sudo a2enmod rewrite
sudo /etc/init.d/apache2 force-reload

Now, to download Vufind:

wget -O VuFind-0.6.1.tar.gz "http://downloads.sourceforge.net/vufind/VuFind-0.6.1.tar.gz?use_mirror=superb-east"
tar zxvf VuFind-0.6.1.tar.gz

Now, we need to move the Vufind files to the proper location. By default this should be /usr/local/vufind. If you choose a different location, you’ll need to set an environmental variable for VUFIND_HOME that points to your installation location, but I’ll get more into that a bit later. You also need to change the permissions on the compile and cache folders in the web/interface folder.

sudo mv vufind-0.6.1 /usr/local/vufind
sudo chown www-data:www-data /usr/local/vufind/web/interface/compile
sudo chown www-data:www-data /usr/local/vufind/web/interface/cache
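
If you did install somewhere other than /usr/local/vufind, here is a rough sketch of the same permissions step that follows VUFIND_HOME. The helper name fix_vufind_perms is made up for illustration, and it assumes VUFIND_HOME (when set) points at the installation root. It only prints the commands so you can look them over before running them.

```shell
# Print (not run) the chown commands for the compile and cache folders.
# Falls back to /usr/local/vufind when VUFIND_HOME is unset or empty.
# fix_vufind_perms is a hypothetical helper name, not part of Vufind.
fix_vufind_perms() {
    home="${VUFIND_HOME:-/usr/local/vufind}"
    for dir in compile cache; do
        echo "sudo chown www-data:www-data $home/web/interface/$dir"
    done
}

fix_vufind_perms
```

Pipe the output to sh once you are happy with what it prints.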

Now to install MySQL:

sudo apt-get -y install mysql-server

PHP5 is required for Vufind with several dependencies.

sudo apt-get -y install php5 php5-dev php-pear php5-ldap php5-mysql php5-xsl php5-pspell aspell aspell-en

I don’t have an Oracle backend, so I haven’t tested the installation of the pdo-oci driver listed in the “official” documentation, but this page will hopefully walk you through installing the driver.

Lastly, we need the Yaz library.

cd /tmp
wget http://ftp.indexdata.dk/pub/yaz/yaz-3.0.14.tar.gz
tar -zxvf yaz-3.0.14.tar.gz
cd yaz-3.0.14
./configure
make
sudo make install

Ok, we’re now finished with adding the packages to get Vufind running. It’s time to run the installation script.

sudo /usr/local/vufind/install

You’ll be walked through the configuration of your Vufind instance. There’s a slight issue in the database setup script as it assumes you haven’t set a root password (you actually set a password when you set up MySQL in Gutsy now). No biggy, just let the script run through the installation of the PEAR libraries and we’ll fix it with the following:

mysql -u root -p
GRANT ALL ON vufind.* TO vufind@localhost IDENTIFIED BY 'secretPassword';
quit

Now we need to edit a few files. First, we’ll edit /usr/local/vufind/web/conf/config.ini. The big sections that need editing are Site, Amazon, and Catalog (though you probably want to take a look at LDAP too). The Amazon id is your web services access id (not your affiliate ID), and you must change the driver to the appropriate one for the system you’re using (e.g. Voyager, SirsiDynix, Koha, Evergreen, Aleph).

Next, the /usr/local/vufind/web/.htaccess file. You’ll need to change the rewrite base, and you’ll most likely need to tweak the RewriteRule lines for your specific institution. The default assumes numeric call numbers, but if you’re like us, you have OCLC numbers and many others. In case you’re not a RegEx expert, these are the settings I use:

RewriteRule ^([^/]+)/([a-zA-Z]*[0-9\s]+)/(.+)$
RewriteRule ^([^/]+)/([a-zA-Z]+[0-9\s]+)$
RewriteRule ^([^/]+)/([^0-9/]+)$
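
If you want to see which of these patterns a given URL path would hit before touching Apache, something like the following sketch works from the command line. It is only an approximation: \s is rewritten as [[:space:]] so the patterns run under grep -E, and classify_path and the sample paths (including the ocm-style ID) are made up for illustration.

```shell
# Report which of the three patterns above matches a request path.
# \s is written as [[:space:]] so the patterns work with POSIX grep -E.
# classify_path and the sample paths are illustrative only.
classify_path() {
    if printf '%s\n' "$1" | grep -Eq '^([^/]+)/([a-zA-Z]*[0-9[:space:]]+)/(.+)$'; then
        echo "rule 1"
    elif printf '%s\n' "$1" | grep -Eq '^([^/]+)/([a-zA-Z]+[0-9[:space:]]+)$'; then
        echo "rule 2"
    elif printf '%s\n' "$1" | grep -Eq '^([^/]+)/([^0-9/]+)$'; then
        echo "rule 3"
    else
        echo "no match"
    fi
}

classify_path "Record/ocm12345/Holdings"   # hits the first pattern
classify_path "Record/ocm12345"            # hits the second pattern
classify_path "Search/Home"                # hits the third pattern
```

Run your own record IDs through it; anything that comes back as "no match" needs a tweak to the patterns.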

We’re almost there!

By default, the Ubuntu Apache2 distribution ignores .htaccess files, so we need to configure Apache to actually use the file. Edit the /etc/apache2/apache2.conf file with the following:

Alias /vufind /usr/local/vufind/web

<Directory /usr/local/vufind/web/>
AllowOverride ALL
Order allow,deny
allow from all
</Directory>
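
A quick way to double-check the edit is to look for the override inside the Vufind Directory block. This is a deliberately simple sketch: has_vufind_override is a hypothetical helper, and it assumes the block is written the way it is above.

```shell
# Succeeds if the given Apache config has an "AllowOverride All" line inside
# the <Directory /usr/local/vufind/web/> block. has_vufind_override is a
# hypothetical helper; the matching is deliberately simple.
has_vufind_override() {
    awk '
        /<Directory[ \t]+\/usr\/local\/vufind\/web\/?>/  { inside = 1 }
        /<\/Directory>/                                   { inside = 0 }
        inside && tolower($0) ~ /allowoverride[ \t]+all/  { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage:
#   has_vufind_override /etc/apache2/apache2.conf && echo "override is set"
```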

And reload Apache

sudo /etc/init.d/apache2 reload

Ok, let’s check to make sure that the interface is working before we do the final installation of the Solr backend. If you point your browser to http://<your_server>/vufind, you should see the default template, with a message on the page stating “Hey! You should customize this space.” If you see an error message instead, you’ll need to do a little debugging (just read the message).

Ok, now for Solr. Vufind is packaged with Solr and Jetty. And, before we get going, we need to set an environmental variable JAVA_HOME. The way I do it is by adding the following line to /etc/profile

JAVA_HOME="/usr/lib/jvm/java-6-sun"
export JAVA_HOME
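
Since the JRE alone won’t do, it can be worth confirming that JAVA_HOME really points at a JDK. A minimal sketch, assuming the presence of bin/javac is a good enough test (the JRE doesn’t ship it); is_jdk is a made-up helper name.

```shell
# Succeeds if the directory looks like a JDK (it ships javac), fails if it
# is empty, missing, or only a JRE. is_jdk is a hypothetical helper name.
is_jdk() {
    [ -n "$1" ] && [ -x "$1/bin/javac" ]
}

# Usage: check whatever JAVA_HOME is currently set to.
if is_jdk "${JAVA_HOME:-}"; then
    echo "JAVA_HOME looks like a JDK: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or does not contain bin/javac"
fi
```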

I always reboot, just to make sure that this really takes.

I forgot to change the permissions on the startup script when I sent it to Andrew, so you need to make it executable:

sudo chmod +x /usr/local/vufind/vufind.sh

And now to fire everything up

sudo /usr/local/vufind/vufind.sh start

Now, we want to make sure that Jetty and Solr start up all the time, so we create a symbolic link into /etc/init.d to the /usr/local/vufind/vufind.sh script and then run the update-rc.d script:

sudo ln -s /usr/local/vufind/vufind.sh /etc/init.d/vufind
sudo update-rc.d vufind defaults

Now, if everything went well, you should be able to check out the Solr interface at http://<your_server>:8080/solr/admin.

With everything running, it’s time to create the index of marc records.

First, export your catalog holdings in marc format and put them in your /usr/local/vufind/import folder. The way I do this is I get the exported files and use scp to copy them to the user account and then sudo mv them to the location:

[On the ILS server]

tar czvf catalog.tar.gz catalog.mrc
scp catalog.tar.gz user@your.vufind.server:~

[On your Ubuntu server]

sudo mv ~/catalog.tar.gz /usr/local/vufind/import
cd /usr/local/vufind/import
sudo tar zxvf catalog.tar.gz

Now, we need to create the MarcXML file:

sudo sh -c 'yaz-marcdump -f MARC-8 -t UTF-8 -o marcxml catalog.mrc > catalog.xml'
sudo php import-solr.php
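
If your export comes as several marc files rather than one big one, the conversion step can be looped. This is only a sketch of the conversion: marcxml_commands is a hypothetical helper, it reuses the yaz-marcdump flags from above, and it prints the commands instead of running them. Whether import-solr.php can then be pointed at each resulting file is not something I cover here.

```shell
# Print one yaz-marcdump command per .mrc file in a directory, using the
# same flags as above. Nothing is executed; pipe the output to sh to run it.
# marcxml_commands is a hypothetical helper, not part of Vufind or Yaz.
marcxml_commands() {
    for mrc in "$1"/*.mrc; do
        [ -e "$mrc" ] || continue          # directory had no .mrc files
        xml="${mrc%.mrc}.xml"
        echo "yaz-marcdump -f MARC-8 -t UTF-8 -o marcxml '$mrc' > '$xml'"
    done
}

# Usage:
#   marcxml_commands /usr/local/vufind/import
```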

This is a good time to take a coffee break…or a lunch break…or come back tomorrow 😉 Seriously, the import takes a while. There are some big (ok, they’re HUGE) improvements in the speed at which the files are indexed in the Subversion branch, but those haven’t been officially tagged yet, so just be aware that while this is slow, it’s been significantly improved for future releases.

The only thing left to do is to tune the JVM.

As always, if you have questions, leave a comment, or join the Vufind lists.

Vufind

Posted September 27, 2007 by Wayne Graham
Categories: java, solr, vufind


After a bit more testing by some folks on the vufind-tech list, I think the consensus is that we’re going to work with Solr’s DirectUpdateHandler with a DocumentBuilder to construct entries for the index in memory. Once I got some of the more annoying bugs out of the way, folks were quite pleased with the speed at which they were able to create the index. Now, on to the business of writing JUnit tests, field customization, and some refactoring.

Vufind Importer

Posted September 21, 2007 by Wayne Graham
Categories: java, solr, vufind


This is the first of a couple of posts I’ve been meaning to write. There has been a flurry of posts on the Vufind lists about errors when creating the Solr index and about its speed. I did about 1.8 million records in 10 hours using the PHP script; considering these are getting sent across an HTTP connection, I thought this was pretty good.

Anyway, I’d had the thought that using the EmbeddedSolr class to write directly to the Solr index would be faster, but before the thread developed, I hadn’t put much into it. This week I got motivated and started working on the implementation.

Essentially this program uses marc4j to skip the conversion from a marc record to marcxml using yaz-marcdump while making the creation of the index faster. The essential flow is to first read in a marc file, open a direct connection to the Solr instance, write a marc xml record to disk, then write the same record to the index. I first did this with the EmbeddedSolr and essentially mapped each field in the marcxml file to its corresponding index field for the Solr index. While not 100% finished, I was really pleased with the speed results. I was able to index 10,100 (I wanted at least one autocommit from Solr in there) in less than 2:00 (I averaged about 1:45).

However, there are some differences in how marc has been implemented, as noted by the folks on the list. I thought that the easiest way to deal with this would be to just use the XSLT stylesheet as the “rules” for transforming the marcxml. This way, if you needed to change the unique id for your resources, you just go to the XSLT and change which field is getting called out. I figured this would be a bit slower, but I wasn’t prepared for how MUCH slower it was!

First, a note about how I did this…

I used the DirectSolrConnection (at the bottom of the EmbeddedSolr page) and a RequestHandler to the solr.XmlUpdateRequestHandler in the solrconfig.xml file

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />

Unfortunately, marc4j’s conversion process requires an OutputStream to write to, so I created a ByteArrayOutputStream to hold the generated XML and used its toString() method to create a new request to solr to add the record to the index.

For the same 10,100 records, using this second method, the time hovered around 22:00 to index! I was a little shocked that it was this different. Because of this difference, I’m thinking I need to come up with a better method to allow folks to customize which fields in their marc records map to the different fields in the index.
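
To put the three timings on the same footing, here is the arithmetic as records per second. The rate helper is just something made up for this post, and the times are the rounded ones quoted above (1:45 taken as 105 seconds, 22:00 as 1,320 seconds, 10 hours as 36,000 seconds). It works out to roughly 96 records per second for the direct field mapping, a little under 8 for the XSLT route, and 50 for the PHP script, so the XSLT route is somewhere around 12 to 13 times slower than the direct mapping.

```shell
# Records per second for a run, to one decimal place.
# rate is a hypothetical helper name.
rate() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}

rate 10100 105       # direct field mapping, about 1:45
rate 10100 1320      # XSLT route, about 22:00
rate 1800000 36000   # PHP script over HTTP, 1.8 million in 10 hours
```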

LibraryFind

Posted September 10, 2007 by Wayne Graham
Categories: libraryfind, linux

Last week I got pretty frustrated with LibraryFind. My test environment is a virtualized Ubuntu server (7.04) and I was running into all kinds of errors with Yaz. At first, I tried to install the most current version from the source with the “--enable-shared” switch. The source (finally) compiled after I did some dependency resolutions. However, no matter how many different ways I compiled the software, I kept getting errors with the ruby-zoom (now just zoom) package not being able to locate yaz.so.3.

After spending more time than I really should have tracking this down, it turns out that there are some changes to the names of the Debian packages. What I ended up doing was reverting to my base snapshot (one of the handiest things in virtual test environments) and installing the yaz packages with

sudo apt-get install yaz libyaz2 libyaz2-dev

Then, when I ran the libraryfind software, it stopped producing an error. I’m not totally out of the woods yet as I can’t get any of the sources (z39.50 or OAI-PMH) to actually do anything. As soon as I get that part working, I’ll post a setup guide for Ubuntu.

JVM Options

Posted August 31, 2007 by Wayne Graham
Categories: java

Stumbled across a great resource for the different options for JVM tuning the other day…A Collection of JVM Options. Definitely worth bookmarking if you ever need to do some Java tuning!

Library Find

Posted August 29, 2007 by Wayne Graham
Categories: federated searching, Library 2.0, libraryfind, OPAC, ruby

Stumbled across LibraryFind the other day and have been playing around trying to get it installed. I’ve not had many good experiences with Ruby-based apps, but this looked really promising so I took the plunge. Unfortunately the searching doesn’t work and just states that there was an error. Looking in the log files, it states that it’s “missing default helper dispatch_helper” and the record_set_helper. I also ran into a problem in the admin module when I attempted to add a target…just got a recordschema error. I ended up just writing a script to install a couple of EBSCO targets we had, but hopefully once I figure out what’s going on with the helpers, that problem will be resolved too.

Embedding Google Maps

Posted August 23, 2007 by Mack Lundy
Categories: Uncategorized

Google has made it very easy to embed a functional Google map into any web page or blog. “Users can drag and click or zoom in on a location, and view it in map, satellite, and hybrid modes.”

Read the press release here: Google announces a simple new way to embed Google Maps.

Benchmarking Solr

Posted August 16, 2007 by Wayne Graham
Categories: java, system_administration, technical, technology, Ubuntu, vufind

There was some discussion on the Vufind list about moving from Tomcat to Jetty. I first wanted to see if it was possible to run this, so I got the latest nightly build from Solr to see which packages were needed to run the server. I then grabbed the latest Jetty (6.1.5) since the version in Solr’s build was 6.1.3. I packaged the same files that were in Solr’s distribution and dropped Vufind’s schema and config file into Jetty and fired it up. Voila…it worked like a champ.

The thing I really wanted to know is if this Jetty version would perform in a similar fashion to Tomcat. What I did to test was set up two virtualized servers on the same box. Each was set up with the exact same hardware (2GB RAM, 1 processor, bridged 1GB network, running Ubuntu 7.04 server). I also used the same Java tuning on both machines (“-server -Xmx1024m -Xms1024m -XX:+UseParallelGC -XX:+AggressiveOpts”). The only difference between the two was that one ran Tomcat and the other Jetty.
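
For anyone wanting to reuse that tuning, here is a small sketch that builds the flag string with matching initial and maximum heap sizes (I’m assuming both were meant to be 1024 MB). The jvm_opts helper is made up, and JAVA_OPTS as the variable your startup script reads is an assumption too; Jetty’s and Tomcat’s scripts differ on this, so check your own.

```shell
# Build the JVM flag string with matching initial and maximum heap sizes.
# jvm_opts is a hypothetical helper; JAVA_OPTS being the variable the
# startup script reads is also an assumption, so check your own script.
jvm_opts() {
    heap_mb="$1"
    echo "-server -Xmx${heap_mb}m -Xms${heap_mb}m -XX:+UseParallelGC -XX:+AggressiveOpts"
}

JAVA_OPTS="$(jvm_opts 1024)"
export JAVA_OPTS
```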

For the test, I indexed our library’s 1.8+ million catalog records on both machines which both chewed through the records in about 9 hours. To do the actual testing, I used JMeter to query both systems at the same time using a few scenarios that I thought might possibly be “real.”

In the first test, I sent 10 users with 100 queries for the book title “Flashman” to see what happened. I was pretty impressed with the results:

Server  Samples  Average  Median  Min  Max
Jetty   1000     4ms      4ms     1ms  17ms
Tomcat  1000     3ms      4ms     0ms  28ms

You know, we might get a few more users than just 10 at a time, so I ramped it up to 100 users doing 10 queries. Again, there wasn’t much of a difference.

Server  Samples  Average  Median  Min  Max
Jetty   1000     12ms     8ms     1ms  565ms
Tomcat  1000     9ms      7ms     1ms  530ms

Now to really ramp things up with 100 users doing 100 queries

Server  Samples  Average  Median  Min  Max
Jetty   100000   9ms      6ms     1ms  2349ms
Tomcat  100000   9ms      6ms     1ms  1844ms

And, just for kicks, 1000 users with 10 queries

Server  Samples  Average  Median  Min  Max
Jetty   10000    32ms     6ms     0ms  5643ms
Tomcat  10000    26ms     5ms     0ms  4295ms

With median results within a millisecond of each other, Andrew went ahead and swapped out Tomcat in favor of Jetty for its smaller footprint. I have to say that any time I’ve needed to do anything with JSP, I’ve opted to go with Tomcat, more because I know the name than anything else, but I think I’m going to keep Jetty on my list from now on! I want to take a closer look at their Ant and Eclipse plugins!

Breaking Windows

Posted August 12, 2007 by Mack Lundy
Categories: linux, Ubuntu

Wayne and Phil finally convinced me to switch operating systems on my office laptop. Wednesday they gathered around me as I took a deep breath and pressed enter to launch the installation of the Ubuntu version of Linux. Afterwards they led me, rapidly, through the installation of various applications and repositories using the command line interface and using the Synaptic Package Manager. I installed VMserver to run the few Windows applications I need, like the Unicorn client, Workflows. Next I need a book to achieve some independence. Right now it is still “Hey Wayne, how do I …” and “Wayne, why is it doing …”

I copied all my documents to another server prior to installing Ubuntu. From there I burned everything to CD. I find it liberating to be free of all the junk I had installed on my laptop and all the documents I had accumulated but not looked at for years. I still have the documents if I need them but I am greeted by a very clean desktop.

I think I am going to start using Google documents more. I like the ability to quickly share. Ubuntu automatically installs OpenOffice so I can easily create local documents.

Day-by-day use of Ubuntu is nothing I have to think about. I only have seven application icons on my toolbar: Synaptic Package Manager, Firefox, Help, VMserver console, Pidgin, Thunderbird, and a terminal window.

My desktop only has folders with the Unicorn API documentation, and two links to our shared library network drive. One is an ssh link and the other is a Windows share. I only mounted the Windows share because there were some permissions issues when I was transferring files around. These links give me very quick access to my folders and documents stored on the library’s shared network drive.

Before Ubuntu, I was not able to get a connection to my home wireless network. No matter what steps I took, which wizards I used, I couldn’t get a connection. When I brought home my laptop with Ubuntu I was connected in about a minute because that is how long it took to key in my pass code. Amazing.
We will see what unfolds over the next weeks but I’d say that I’m a Linux convert.