One of the Many Heads of Hydra

at the Rock and Roll Hall of Fame

Adam Wead

Systems and Digital Collections Librarian

Sponsored by OhioNET

August 9, 2013

Introduction

Hydra webinar in three parts

Background: Hydra as a digital repository
Technical Matters: An overview of Hydra's technical components and what they're all about
Building a Hydra Head: Starting down the path to building your own head

There will be a question and answer period after each part

Hydra at the Rockhall

Hydra satisfies all our needs in ways that other repository solutions could not

2010: Began R+D for a "D.A.M."
2011: went live with Hydra
Only video at the moment:
1650+ videos of institutional content, incl. induction ceremonies and performance series
175 TB of data: 95% uncompressed video files stored on LTO tape, and not hard disk
PBCore metadata schema
compressed H264 files for streaming
records get exported to a discovery interface

Today's Takeaways

What digital repository solution is best for me?

current trends in repository applications
the Hydra Philosophy
elements of the Hydra stack
what I will need to develop and use Hydra
how to get started with Hydra

Who am I talking to?

librarians, archivists, information professionals
you may not have technical skills
you know someone who does
you are a manager or supervisor
you have some technical skills and are looking at where to start

Part 1: Conceptual Background

The what's and who's of Hydra and repository applications in general

What is Hydra?

Conceptual level

Community: software developers, end users, adopters and institutions
Collaboration: shared solutions, or "heads," supported by a common core

Philosophy

One body, many heads

If you want to go fast, go alone; if you want to go far, go together

Hydra Fundamental Assumption

No single institution can resource the development of a full range of solutions on its own

Who is Hydra?

started in 2008 with Stanford, UVA, and U. of Hull
now includes over 19 partner institutions
many additional adopters
governed by a steering group, with partners meeting quarterly
membership is free

Hydra Adopters

Spoken Word Services (Glasgow Caledonian University)
University College Dublin
University of Illinois at Urbana-Champaign
The Digital Repository of Ireland
Museum of the Performing Arts (MAE) of the Theatre Institute of Barcelona
Johns Hopkins University
Tufts University

Scope of Current Solutions

image collections
media content
archival collection presentation
institutional repositories
electronic theses and dissertations
visit http://projecthydra.org/
- includes links to their Hydra-based websites
- screencasts

Digital Challenges

identifying content types: images, audio, text, pdf, video, etc.
description, i.e. metadata
storage and preservation
lifecycle: identify things that need to be deleted, kept, or migrated to newer formats
format conversions, such as creating derivative files
workflows for ingesting new content correctly, i.e. required information, supported formats
searching, updating, viewing your content
controlling access

Archiving vs. Managing

Libraries approach digital materials differently than other organizations
Digital archiving implies another set of features in addition to basic digital asset management features

Archival Needs

presentation of multiple items as a coherent unit, i.e. collections
hierarchical organization with varied levels of description
accessioning of content
top-down, collection-level driven

Digital Repository

generally more item-level driven
bottom-up
collections or groups, but not in the archival sense

Survey of Solutions

What's out there now to use as my digital repository or asset manager?

Proprietary solutions

built around an existing library product:
- ContentDM (OCLC)
- ContentPro (Innovative)
- Rosetta (ExLibris)
focus on specific type of content or business sector
- Canto (favors text documents for businesses)
- Piction (favors images for business, gov. and museums)
and many, many more

Open source solutions

Built on a specific technological platform
often using Fedora
DSpace
Islandora
RODA
Omeka
many others ...

Digital Pitfalls

Looking for a repository solution? Watch out for ...

Assumptions

"turnkey" solutions aim for the common denominator
pre-fab modeling: organization of content, collections, rights management
media types: AV formats, file formats, data
metadata: Dublin Core, EAD, or MARC

Customizations

your local implementation is limited to the constraints imposed by these assumptions
you will need to customize to overcome any of these constraints
you may need to customize just to get it to run "out of the box"
you'll need to customize even if it does run "out of the box" and you accept its constraints
did I mention you'll probably need to customize?

Costs

fiscal: $$$
technological: servers, storage, equipment
sociological: software developers, library technologists, users

Bundles

"stack" solutions
combination of tools and procedures grouped into a collective product
bundled components often don't work well together
often aren't targeted towards libraries or archives

Hydra Fundamental Assumption

No single system can provide the full range of repository-based solutions for a given institution's needs

What's good about Hydra

makes no assumptions* about your data
can model anything
using any metadata standard
using any content
stored anywhere
accessed by anyone or no one
presented as anything with HTML & JavaScript
abstracts underlying technologies -- Fedora & Solr
free and open-source

What's not-so-good about Hydra

technologically daunting
deep "stack" of technologies
tied to the Ruby-on-Rails framework
favors a Unix environment
requires in-house expertise/ability/willingness
not a turnkey solution (yet...)
no hosted solutions (yet...)

Why should I use it?

decide for yourself
I'm not here to sell it to you
everything has costs

Any solution will require technical expertise and customization

getting the system running
learning
integrating with existing systems

Why not?

no "magic bullet" proprietary solution
vendor may limit options
no "magic bullet" open source solution
avoiding reinventing the wheel
get started quickly with a rich set of features
draw on a shared community and their technological resources

end of Part 1

Questions?

Part 2: Technical Matters

An overview of Hydra's technical components and what they're all about

What is Hydra, technically?

it's a web application
specifically, it's a Ruby on Rails web application
uses the Blacklight and Hydra gems
Fedora repository for storing and describing content
Solr for search and discovery
various other "Rails-isms" for additional features
- user accounts
- authentication and authorization
- MySQL database or other RDBMS
- jQuery JavaScript library for building an interface

What is Ruby on Rails?

computer language (Ruby)
framework for web applications (Rails)
geared towards rapid development
modularized features with gems

Without Rails...

With Rails...

How to Use Rails

I really don't care about all the details

convention over configuration
auto-generate as much as possible:
- database table names
- field names
- relationships between tables
- most of the application code itself
rely on gems as much as possible
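
To make "convention over configuration" concrete, here is a minimal sketch in plain Ruby of how a table name and a key field can be derived from nothing but a model's name. This is not Rails code: Rails has its own inflection rules, and the `naive_pluralize`, `underscore`, `table_name_for`, and `foreign_key_for` helpers below are simplified stand-ins written for this illustration.

```ruby
# Toy illustration of naming conventions; NOT how Rails implements them.
# The pluralizer handles only a few regular English endings.
def naive_pluralize(word)
  if word.end_with?("y") && !word.match?(/[aeiou]y\z/)
    word[0..-2] + "ies"
  elsif word.match?(/(s|x|ch|sh)\z/)
    word + "es"
  else
    word + "s"
  end
end

# Turn a CamelCase class name into snake_case.
def underscore(class_name)
  class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# By convention, the table is the plural, snake_case form of the model name.
def table_name_for(class_name)
  naive_pluralize(underscore(class_name))
end

# By convention, a field pointing at another model is named "<model>_id".
def foreign_key_for(class_name)
  "#{underscore(class_name)}_id"
end

%w[Video Category ArchivalCollection].each do |model|
  puts "#{model}: table=#{table_name_for(model)}, key=#{foreign_key_for(model)}"
end
```

The point is that the developer names the model once and the rest follows from the convention, with nothing to configure by hand.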

Using a Database

data is stored in tables
some data is easier to model than others
modeling library data is hard

Source: DICOM Clinical Data Manager system

Where it goes wrong

database tables get unwieldy when dealing with amorphous content
extending/changing/rearranging takes a lot of work
storing digital content in tables is problematic
data and metadata get separated

We need ...

something flexible and extensible
for digital objects
as a repository architecture

F E D O R A

flexible
extensible
digital
object
repository
architecture

Fedora Features

models the content, not the data
multiple means of description and arrangement
stores metadata and content data together
supports a wide variety of storage options
uses RDF for relationships, XML for metadata
fundamental repository functions are built-in
- versioning
- fixity, i.e. checksums
- unique identifiers
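
As a rough idea of what a fixity check involves, the sketch below records a checksum for a file and later compares it against a freshly computed one. It uses only Ruby's standard library and does not call Fedora at all. The choice of SHA-256 and the `checksum` and `fixity_ok?` helper names are assumptions made for this example.

```ruby
require "digest"
require "tempfile"

# Compute a checksum for the file at the given path.
# SHA-256 is an assumption for this sketch; other algorithms work the same way.
def checksum(path)
  Digest::SHA256.file(path).hexdigest
end

# A file passes the fixity check if its current checksum matches the recorded one.
def fixity_ok?(path, recorded_checksum)
  checksum(path) == recorded_checksum
end

Tempfile.create("fixity-demo") do |file|
  file.write("original content")
  file.flush
  recorded = checksum(file.path)
  puts "unchanged file passes: #{fixity_ok?(file.path, recorded)}"

  # Simulate corruption or an unexpected edit.
  file.write(" plus an unexpected change")
  file.flush
  puts "changed file passes: #{fixity_ok?(file.path, recorded)}"
end
```

A repository with fixity built in does this bookkeeping for you, which is the appeal of having it as a core function rather than a local script.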

Example Digital Object

Object Relationships
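
One way to picture objects and their relationships is as a set of subject-predicate-object statements, in the spirit of RDF. The sketch below is plain Ruby, not Fedora's API. The `DigitalObject` and `Triple` structures, the `demo:` identifiers, and the `members_of` helper are all hypothetical names chosen for illustration.

```ruby
# An object bundles an identifier with named streams of metadata and content.
DigitalObject = Struct.new(:pid, :datastreams)

# A relationship is a statement: subject, predicate, object.
Triple = Struct.new(:subject, :predicate, :object)

collection = DigitalObject.new(
  "demo:collection1",
  { "descMetadata" => "collection-level description" }
)
video = DigitalObject.new(
  "demo:video1",
  { "descMetadata" => "item-level description", "content" => "the video file" }
)

# Illustrative predicate names; a real repository defines its own vocabulary.
relationships = [
  Triple.new(video.pid, "isMemberOfCollection", collection.pid),
  Triple.new(video.pid, "hasModel", "demo:VideoModel")
]

# Find every object that states it belongs to the given collection.
def members_of(triples, collection_pid)
  triples
    .select { |t| t.predicate == "isMemberOfCollection" && t.object == collection_pid }
    .map(&:subject)
end

puts "datastreams on #{video.pid}: #{video.datastreams.keys.join(', ')}"
puts "members of #{collection.pid}: #{members_of(relationships, collection.pid).join(', ')}"
```

Because relationships are statements rather than table columns, adding a new kind of relationship means adding a statement, not redesigning a schema.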

Fedora Hangups

slow
it's just a backend
can't search like a traditional database
requires an RDBMS for searching RDF relationships
if only we could search Fedora like an SQL database...

Solr

a search engine, all wrapped up and ready to go

About Solr

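built on Lucene, an open-source search library started by Doug Cutting, formerly of Excite
Solr itself began at CNET Networks and was donated to the Apache Software Foundation in 2006
Lucene: the indexing and search library
Solr: the search server built on top of Lucene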

What does Solr do, exactly?

indexes sets of text documents
provides many of the core features of a modern-day information retrieval system:
- boolean matching
- vector space model matching
- tunable relevance ranking
- stop word removal
- stemming
- support for multiple languages
- facet queries
very fast, easy to run
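
For a feel of what a few of these features mean, here is a toy inverted index in plain Ruby with stop word removal, a very crude form of stemming, boolean AND matching, and facet counts. It is only a sketch of the ideas and says nothing about how Solr implements them. The sample documents, the stop word list, and all helper names are made up for this example.

```ruby
require "set"

STOP_WORDS = Set.new(%w[a an and of on the at in])

# Lowercase, split into words, drop stop words, and strip a trailing "s"
# from longer words as a (very) crude stand-in for stemming.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
      .reject { |word| STOP_WORDS.include?(word) }
      .map { |word| word.length > 3 ? word.sub(/s\z/, "") : word }
end

# Made-up sample documents.
DOCS = {
  1 => { title: "Induction ceremony videos", format: "video" },
  2 => { title: "Performance series videos", format: "video" },
  3 => { title: "Notes on the induction ceremony", format: "text" }
}.freeze

# Map each term to the set of document ids that contain it.
def build_index(docs)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  docs.each do |id, doc|
    tokenize(doc[:title]).each { |term| index[term] << id }
  end
  index
end

# Boolean AND: a document must contain every query term.
def search(index, query)
  terms = tokenize(query)
  return [] if terms.empty?
  terms.map { |term| index.fetch(term, Set.new) }.reduce(:&).to_a.sort
end

# Count how many matching documents fall under each value of a field.
def facet_counts(docs, ids, field)
  ids.each_with_object(Hash.new(0)) { |id, counts| counts[docs[id][field]] += 1 }
end

index = build_index(DOCS)
hits = search(index, "induction ceremony")
p hits
p facet_counts(DOCS, hits, :format)
```

Relevance ranking, language support, and speed at scale are the hard parts, and those are exactly what Solr provides out of the box.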

What's not to like?

Blacklight

Rails gem for faceted search and discovery
designed for library data
provides a working interface to Solr
includes a basic web interface for:
- searching with queries and facets
- displaying lists of search results
- displaying individual item records
additional functions:
- user accounts (using a Rails gem called Devise)
- bookmarking
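
Behind a faceted search interface is an ordinary HTTP request to Solr. The sketch below builds such a request URL using standard Solr query parameters (`q`, `fq`, `rows`, `facet`, `facet.field`, `wt`). It is not Blacklight code. The Solr address, the field names, and the `solr_search_url` helper are assumptions for illustration.

```ruby
require "uri"

# Hypothetical Solr location; adjust to wherever your Solr actually runs.
SOLR_SELECT = "http://localhost:8983/solr/select"

# Build a search URL with optional facet fields and filter queries.
def solr_search_url(query, facet_fields: [], filters: {}, rows: 10)
  params = [["q", query], ["rows", rows], ["wt", "json"]]
  unless facet_fields.empty?
    params << ["facet", "true"]
    facet_fields.each { |field| params << ["facet.field", field] }
  end
  # Each selected facet value becomes a filter query that narrows the results.
  filters.each { |field, value| params << ["fq", %(#{field}:"#{value}")] }
  "#{SOLR_SELECT}?#{URI.encode_www_form(params)}"
end

# Field names "format" and "year" are made up for this example.
puts solr_search_url(
  "induction ceremony",
  facet_fields: %w[format year],
  filters: { "format" => "video" }
)
```

Blacklight's value is that it assembles requests like this, parses the response, and renders the results and facet links so you do not write that plumbing yourself.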

The Hydra Stack

putting it all together

stores both content and metadata in Fedora
manages the relationships between your objects
indexes metadata into Solr for searching (so you don't have to mess with configuring Solr)
uses the Blacklight gem to provide the search and retrieval interface

What's left...

you develop the interface to add/edit/delete content and link objects to one another
build additional features and the user interface design
accomplished mostly with gems

A hydra-head in action

end of Part 2

Questions?

Part 3: Building a Hydra Head

Getting started with your own hydra-head

Requirements

at least 1 developer/sys. admin/techie
ideally 2 people: one for development, one for administration
a server with enough storage and backup
hosting options
- Amazon AWS
- cloud storage with Fedora
- hosted Rails applications

Learning

online resources
Code School
Rails for Zombies
Hydra Tutorial
HydraCamp

Starting from nothing?

don't have a tech person?
never wrote any computer code?
give yourself a year
experiment
get some training
talk to people

Upcoming Developments

gems, gems and more gems
Sufia
an institutional repository gem
includes user interface, uploading, derivative creation
still a bit green, but improving daily
others: hydra-collections, hydra-derivatives
available at https://github.com/projecthydra

Get Involved

IRC chat rooms
- #projecthydra
- #blacklight
- #code4lib
- #libtechwomen
hydra-tech email list
committers' calls

Parting Thoughts

we're all in this together
Hydra is not the end-all be-all
keep in touch!

Thanks!

Special thanks to OhioNET

References

Part 1

Tom Cramer, "Introduction to Hydra" Sept. 25, 2012, DuraSpace Hot Topics Webinar Series
Project Hydra website
ContentDM: http://www.contentdm.org/
ContentPro
Rosetta
Canto: http://www.canto.com/
Piction: http://www.piction.com/
DSpace: http://www.dspace.org/
Islandora: http://islandora.ca/
Omeka: http://omeka.org/
RODA: http://roda-community.org/

Part 2

Ruby on Rails: http://rubyonrails.org/
DICOM Clinical Data Manager system
Fedora: http://www.fedora-commons.org/
Solr: http://lucene.apache.org/solr/
Blacklight: http://projectblacklight.org/
Naomi Dushay, "The Hydra Framework as a Series of Diagrams," Stanford University Libraries, April 2012

Part 3

Amazon AWS: http://aws.amazon.com/
IRC: http://en.wikipedia.org/wiki/Internet_Relay_Chat

Resources

Rails Bridge
Code School
Rails for Zombies
Hydra source code: https://github.com/projecthydra
Hydra Wiki
This presentation

Contact Me!

email: awead {at} rockhall dot org
twitter: @amsterdamos
irc: awead
github: https://github.com/awead
