Archive for the ‘open-source’ category

Fedora Conference — Day Two Second Session

June 22, 2006

I stayed in Track B this morning because I wanted to catch Eric Jansson's talk. We've been transcribing a lot of work recently using TEI, and one of the projects he managed at NITLE is Manu. I've never been able to get Fedora to work long enough to test Manu out, but I think it has a lot of potential for displaying original manuscripts in TEI with large images (you can zoom in on particular parts of the image by clicking and dragging a box over the area you want to highlight).

The first talk was from Ron Jantz on digital preservation. I have to say this kind of made my head swim (either that or the cold I was developing was really starting to kick in). Ron heads the Digital Preservation Services Working Group for the Fedora project, and this was an overview of what they are exploring. I don't envy this group's charge as they are attempting to look into the future and figure out a way to trace "digital originals" of pretty much everything. At a very basic level this seems very reasonable, but when you remember how quickly computer technology changes, this becomes an almost overwhelming task to undertake. What the working group is proposing as their candidate capabilities are:

  • Audit trails and datastream versioning for objects
  • Persistent identifiers for objects
  • Checksum creation and validation for objects
  • Object format validation
  • Content model validation
  • Whole-object versioning
  • Event management and event versioning
  • Repository redundancy/mirroring service
  • Format migration engine

It'll be interesting to see how, and if, these capabilities are integrated into Fedora or developed in parallel with specific projects.
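Of that list, checksum creation and validation is the easiest to picture concretely. A minimal sketch in Python of how a repository might fingerprint a datastream at ingest and re-verify it during a later audit (the function names here are hypothetical illustrations, not Fedora's actual API):

```python
import hashlib

def create_checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Compute a checksum for a datastream at ingest time."""
    digest = hashlib.new(algorithm)
    digest.update(data)
    return digest.hexdigest()

def validate_checksum(data: bytes, recorded: str, algorithm: str = "sha256") -> bool:
    """Recompute the checksum and compare it to the one stored with the object."""
    return create_checksum(data, algorithm) == recorded

# A periodic audit would walk the repository and re-validate each datastream.
stored = create_checksum(b"<TEI>...</TEI>")
assert validate_checksum(b"<TEI>...</TEI>", stored)    # content intact
assert not validate_checksum(b"<TEI>!</TEI>", stored)  # corruption detected
```

The hard part the working group faces isn't the hashing itself, of course; it's doing this at scale across formats that will keep changing for decades.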

Now for Eric's talk…

Like I mentioned above, I've had some interaction with Eric in the past and was excited to hear him talk. He teamed up with Stacy Pennington to talk a little about how they've used Fedora at a small liberal-arts college (Rhodes College in Memphis, TN) to document the Civil Rights movement in the city. Eric started the talk with a scenario that plays out in a lot of university libraries…a librarian (or someone in the library) hears about these digital libraries and institutional repositories and goes to talk with someone on their IT team, who thinks it's a great idea. They then turn it over to a programmer, who starts doing some research on the Internet, comes across Fedora, and notices it's a very robust architecture for what they want to do. However, once the programmer starts looking at the documentation, questions arise about how to implement the project. Since there's a significant learning curve, this takes quite a while. In the meantime, the librarian becomes unhappy with the lack of progress and talks to the IT folks, who also become unhappy with the lack of progress. In the end, an out-of-the-box solution like CONTENTdm is used instead, since you can have a collection on the web quickly without much of any programming.

Without getting into the details of why CONTENTdm isn't a long-term solution, Eric turned the talk over to Stacy from Rhodes College. Stacy's first comments were on the size of the IT staff for the entire college: 12 (exceptionally small, considering some institutions have more than 40 programmers on the repository project alone). He emphasized that because of their size, they are integrators rather than developers. So why would such a small staff choose a solution like Fedora? Because it provides an infrastructure to build on that doesn't shoehorn you into a single solution (or multiple solutions, depending on the scope of the project).

Stacy also talked about the frustrations they had with implementing Fedora and thanked Thorny for his help in getting them up and running. Some of these frustrations were: the lack of a clear way to utilize Fedora as an end-to-end solution; technical assumptions for primary integration and support that are too high for most IT staff; a lack of community-shared content models; and end-user expectations for the term "digital archive." I have to admit that in my own investigations I have run up against these same frustrations (along with the fact that I keep getting errors whenever I attempt to install Fedora or one of the side projects for viewing Fedora content).

One of the themes that resounded at this conference is that Fedora is not "a thing in a box" that one can just install and start running. It is an architecture that allows developers (and I emphasize developers on purpose) "a way through open standards to associate meaning through digital objects". At the end of the talk, Thorny admitted to the group that these issues aren't just being seen at small colleges. He remarked that the only people at UVa who could implement the software were the people developing Fedora.

To this end, NITLE is developing spiffy, a PHP framework for working with Fedora. While Eric and I both share a general dislike of PHP, it makes sense to use it as a language to get more people using the repository. PHP has a great user community that he hopes will "create an ecosystem of innovation (building bazaars, not cathedrals)," empower technical staff and developers, and build user experiences on top of Fedora.

Eric also argued (and I have to admit it was one of the better ideas I heard at the conference) that half-day training sessions to get people familiar with Fedora would be invaluable to growing the community. The Fedora project has been going on for almost six years, and the vast majority of projects out there are vaporware. The way to actually get to the next level is to grow the community through training, interaction, and simplification.

Fedora Conference — Day Two First Session

June 22, 2006

I went to Track B this morning to hear Chris Awre (University of Hull) talk about their research into determining their users' needs for an institutional repository, and a presentation on Australia's ARROW (Australian Research Repositories Online to the World) project.

Chris explained how they approached their requirements for an institutional repository in conjunction with their work on the RepoMMan project. While their survey addressed UK researchers at the University of Hull, I believe their findings generalize to most institutions. For example, researchers save multiple copies of their work all over the place: their work computers, network shares, home computers, floppies, USB drives, etc. What they found was that a requirement of their repository would be convincing scholars that saving to a repository instance that took care of versioning would be at least as easy as what they are currently doing when writing their scholarly papers. I think this point really drives at what makes or breaks a successful repository: if you can make the repository part of the publication process in such a way that it makes scholars' research and writing easier/more accessible/fill in your own adjective, the faculty will buy into the project, making the overall success of the repository possible.

The second session, on ARROW, was also quite interesting. Their approach to a national research repository is quite novel. Instead of starting the project by hiring their own programmers, they outsourced the development of the software to VTLS and required them to open-source certain portions of the software back to the Fedora community. From this collaboration VTLS has released the SRU/SRW Interface, a Metadata Extraction Service via JHOVE, Handle System integration, a Content Model Configuration Service, and a Web Crawler Exposure Service (e.g. for Google).

It's worth mentioning that VTLS is the only company right now with an out-of-the-box software package (VITAL) for Fedora. The rest of the implemented projects out there have been developed on top of Fedora, including several open-source efforts like Elated, Fez, and the NSDL content management system.

The ARROW project also has a number of project documents available online for review. The slides from the ARROW Roadshow 2006 are very nice in presenting the partnership between ARROW and VTLS and the overall vision of the repository. 

Fedora Conference — Day Two Plenary Session

June 22, 2006

This session was absolutely fascinating. Sandy Payette from Cornell University talked about the next generation of researchers that will be hitting universities in the next 15 to 20 years and the expectations that these researchers will have. Sandy observed that 10-year-olds are not solitary individuals when they go online, but part of communities. Sites like Yahoo! Music, Google, and Neopets are all shaping their expectations of how they will perform research. More immediately, 20-somethings are blogging, instant messaging, finding almost everything with BitTorrent, and finding apartments and jobs with sites like craigslist. Sandy made the argument that choosing technology that is poised to handle the next generation of researchers' expectations and demands is something that universities need to be undertaking now.

There were also some hard questions asked…like, do scholars really need institutional repositories? There is a growing number of research sites that show these types of information resources are important. Sandy mentioned three in particular: The Rossetti Archive, The Valley of the Shadow, and Perseus. Not only are these three archives exemplary of modern information repositories, but sites like the National Virtual Observatory, the Encyclopedia of Chicago, ARROW, and NSDL are all contributing to the advancement of scholarly communication and knowledge.

Sandy next discussed the future of the project in several areas:

  • Formalization of content model
  • Content Model dissemination architecture
  • Refactoring (fancy word for reworking code)
    • Deploy Fedora as a web application
    • Configuration and setup for web applications
    • Logging and unit testing
  • Message brokering service
    • Services defined in the architecture can subscribe to other services
    • Services can publish their own events
  • Web client development

Funding for the Fedora project from the Mellon Foundation is winding down and the planners are working on how to continue the project. It will certainly be interesting to see what comes of these discussions…

Fedora Conference — Second Session

June 19, 2006

The second session of the conference showcased how Monash University is using Fedora to do collaborative research with the DART project. Their example involved annotating DNA strands in crystals: researchers can create visualizations of the strands and then, using Fedora, go in and start annotating the compounds in the structure maps.

DART approached their projects by using the Pathways Model, which rethinks scholarly collaboration. Essentially, this model of collaboration endeavors to create "a loosely-coupled, highly distributed, interoperable scholarly communication system." They pointed out an important article entitled Rethinking Scholarly Communication: Building the System Scholars Deserve. This is definitely going onto my "to-read" list.

The closing talk was on the German eSciDoc project. They covered how they approached their project for the Max-Planck Society. This is a huge research conglomerate with a budget in the billions. The eSciDoc project is their attempt to create a repository for that research.

It's very interesting to see how these different projects have used the same back-end to approach their projects in very different ways. The nice thing is that these large projects are giving back to the community and releasing their code to other projects. It's nice to see scholarly projects that are tackling the same problems sharing with one another! 

Fedora Conference — First Session

June 19, 2006

I attended the A track, where the first session covered using the Fedora architecture to administer diverse collections (from the folks at Indiana University) and modeling rich disseminators (from the folks at Tufts University).

As we begin to evaluate how to implement our own repository at W&M, I think it's important to look at the successes (and failures) at other institutions. Indiana University's implementation is a close collaboration between Information Technology and the library system. I really like this model, and I think it makes sense for institutions across the country for a number of reasons.

First, IT is very good at managing things like security, backup, patch management, and other day-to-day computer needs. IT also has the personnel to do the programming work necessary to implement robust repository solutions. Second, libraries are very good at managing knowledge and implementing standardized methodologies to organize and disseminate knowledge. While merging the overall library and IT groups has proven unsuccessful in many cases where the traditional library (e.g. books) is involved, special projects like the creation of institutional repositories make a lot of sense.
The IU guys talked about some of the issues they've come up against. IU has a centralized storage capacity of 1.6 petabytes (that's roughly 1,600,000 gigabytes) with hierarchical backup (geek translation: data is backed up to different devices depending on how often it changes and how often it's requested).
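That tiering idea can be sketched as a simple policy function (a hypothetical illustration of the concept, not IU's actual system, with made-up thresholds):

```python
def backup_tier(days_since_modified: int, requests_per_month: int) -> str:
    """Pick a storage tier: hot data stays on fast disk, cold data migrates to tape."""
    if days_since_modified < 30 or requests_per_month > 100:
        return "disk"      # frequently changed or requested: fast, expensive
    if days_since_modified < 365:
        return "nearline"  # occasionally touched: slower, cheaper
    return "tape"          # archival: slowest, cheapest

assert backup_tier(5, 0) == "disk"        # recently modified
assert backup_tier(100, 2) == "nearline"  # lukewarm
assert backup_tier(400, 0) == "tape"      # cold archival data
```

The real systems automate this migration transparently, which is exactly what makes petabyte-scale storage manageable for a repository.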

Some of the other highlights of the discussion included their plans to release a tool to help automate the creation of metadata as content is placed (ingested) into the repository. They also talked a bit about their METS Navigator, which I plan on checking out when I get back.
The folks from Tufts have a pretty neat little tool to help professors use the repository. The Visual Understanding Environment allows users to search the repository and drag content they want into a graph of the data. Very cool way of visualizing very complex data!

Fedora User’s Conference 2006

June 19, 2006

I hate it when this happens…I just wrote a fairly lengthy post on this morning's introductory session for the Fedora User's Conference at UVa and when I hit save, the text went away. I'm going to try to recreate my thoughts again and keep saving. So, if something looks incomplete, it probably is 😉

This morning's keynote speaker was Thorny Staples who, almost in passing, hit on something I thought was quite exciting. They are in the test phase of using Fedora to maintain their repository of student newspapers. Knowing some of the complexities of serials cataloging (or at least hearing Mack and Julie complain about it), I thought their approach made more sense than anything else I've heard.

Essentially, student papers are two things: text and images. What they're doing is taking images of the newspapers and creating Fedora objects, along with TEI markup of the newspaper (as separate, yet related, objects). These objects are part of an aggregated object (a paper), which is part of another aggregated object (a volume), which is part of another aggregated object (a serial). Because objects can be aware of other objects, the metadata in the TEI header associates images and text with serials (and everything else that's related).
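The aggregation hierarchy above can be sketched with a toy data structure (hypothetical class and identifier names; Fedora's real object model expresses these relationships through metadata, not Python objects):

```python
class DigitalObject:
    """Toy stand-in for a repository object: datastreams plus aggregated members."""
    def __init__(self, pid: str, datastreams=None):
        self.pid = pid
        self.datastreams = datastreams or {}  # e.g. {"IMAGE": ..., "TEI": ...}
        self.members = []                     # objects this one aggregates

    def add_member(self, obj: "DigitalObject") -> "DigitalObject":
        self.members.append(obj)
        return obj

# serial -> volume -> paper, where each paper holds image + TEI datastreams
serial = DigitalObject("demo:serial")
volume = serial.add_member(DigitalObject("demo:vol1"))
paper = volume.add_member(DigitalObject(
    "demo:vol1-no1",
    datastreams={"IMAGE": "page1.tiff", "TEI": "<teiHeader>...</teiHeader>"},
))
```

The point of the design is that the image and the TEI transcription stay separate objects, yet anything holding the serial can walk down to both.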

There was also some discussion on the boundaries of what data Fedora is intended to deal with, and where other technologies need to pick up. Thorny used the example of a satellite scanning the earth and, as it's doing so, dumping this information into a Fedora repository. Disseminators would then allow users to view this data as it's being processed. After some additional back and forth, the conclusion was made that actual transactional data is not really the domain of Fedora, but the resulting information from that transactional data is. This is a subtle distinction, but an important one to make.

The Catalog Under Scrutiny – Part 2, Open Source and the ILS

June 12, 2006

Before we look at the potential for an open-source ILS, let's take a quick look at open source in general. Wikipedia's article on open source provides a good entry point and a definition. If you follow the links you'll get a good picture of the open-source movement and some of the key players. It wouldn't be too much of a stretch to say that the philosophy behind open source is to free users from the tyranny of proprietary software. Microsoft, perhaps? With open-source software, users can make improvements, add functionality, fix bugs, etc. I use an open-source product at home: OpenOffice. OpenOffice is an effective alternative to Microsoft Office that runs on a variety of operating systems: Windows, Linux, and Apple OS X. It even comes with a nifty portable version that will run off a thumb drive. Check it out.

Another open-source product is the operating system Linux. We haven't reached a tipping point where there are enough applications to make Linux practical in a distributed environment such as ours, but the potential is there.

There has been some movement among governments to force the issue. Brazil, for example, would like to transform itself "… into a tech-savvy nation where everyone from schoolchildren to government bureaucrats uses open-source software instead of costly Windows products." The state of Maine also made Microsoft nervous by expressing an interest in moving to open-source solutions. Think about what would happen if one of the big dogs, like China, decided to go open-source. And don't forget the European Union, which isn't happy with Microsoft's lack of openness.

So how does this affect libraries? No one likes everything about their ILS. This vendor has a really good serials module, that one excels in acquisitions, another has a really great cataloging module, and nobody is really happy with any OPAC (see part one of this series). We are at the mercy of the vendors' development cycles for changes.

What if there were an open-source ILS whose modules could be written/modified/fixed by users? There is some movement in that direction.

The University of Rochester has a Mellon Grant to study "how best to develop an open-source online system that can unify access to traditional and digital library resources." They have the eXtensible Catalog (XC) blog if you want to follow their activities.

On the nuts-and-bolts level, one organization has been working on just such a product. The Georgia Library PINES Program is a consortium that didn't like any available ILS and embarked on a project named Evergreen to develop an open-source ILS. They actually have a product to demonstrate, and you can try the current stable version of the Evergreen OPAC. The latest and greatest (but not necessarily stable) version is here.

Open-source doesn't mean that there would no longer be vendors. A vendor can take an open-source application and brand it, as Red Hat did with Linux. Red Hat's big revenue generator is support.

I'm going to follow the University of Rochester's study and the Georgia PINES project pretty carefully, looking for signs of the technological cavalry coming to save us.