What I Know So Far About

Upgrading from Fedora 3 to 4

Adam Wead

Penn State University

awead@psu.edu / @amsterdamos

May the Fedora 4 be with you...

  • Currently in beta
  • Final release expected by year-end
  • Upgrade tools coming out with 4.1 release

What, we can't update until 4.1?!?

Who's Doing It Now?

  • Fedora4 beta pilots
    • Art Institute of Chicago
    • Penn State University
    • UC - San Diego
  • At present, only Penn State has existing Fedora3 repository

Fedora at Penn State

  • Repository storage for ScholarSphere and ArchiveSphere
  • Both are Sufia-based
  • 4775 objects / 37GB data in ScholarSphere
  • 72262 objects / 186GB data in ArchiveSphere
  • Looking to start migrating ScholarSphere at year-end
  • ArchiveSphere to follow after that

When I say Migrate...

  • Update Fedora to version 4, move data from a Sufia+Fedora3 repository to a Sufia+Fedora4 repository
  • Some details are due to Sufia's design
  • Most details will apply to any Fedora-based application

Sufia

ActiveFedora7 => ActiveFedora8
Fedora3 => Fedora4
Solr 4.x
Rails 4.x

Upgrade Overview

  1. Upgrade to latest hydra-head 7 with latest Fedora 3.x
  2. Make some decisions
    • any RDF translations?
    • version strategy?
    • object relationships?
    • intermediate nodes?
  3. Iterate over your Fedora3 repository, ingesting objects and versions into Fedora4
  4. Verify
  5. @noreply says you need to bang your head against it
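Steps 3 and 4 can be sketched in plain Ruby. This is only an illustration of the iterate, ingest, and verify loop: MemoryRepository and its ids/fetch/store methods are hypothetical stand-ins for whatever you use to read Fedora3 and write Fedora4, and the SHA-256 comparison is an assumed verification strategy, not something Fedora or Hydra prescribes.

```ruby
require "digest/sha2"

# In-memory stand-in for either repository; hypothetical, for illustration only.
class MemoryRepository
  def initialize(objects = {})
    @objects = objects
  end

  def ids
    @objects.keys
  end

  def fetch(id)
    @objects.fetch(id)
  end

  def store(id, content)
    @objects[id] = content
  end
end

# Copy every object from source to target, then verify each copy by
# comparing checksums. Returns the ids that failed or did not match.
def migrate_and_verify(source, target)
  source.ids.each_with_object([]) do |id, failures|
    begin
      content = source.fetch(id)
      target.store(id, content)
      copied = target.fetch(id)
      matches = Digest::SHA256.hexdigest(copied) == Digest::SHA256.hexdigest(content)
      failures << id unless matches
    rescue StandardError
      # Record the failure and keep going so one bad object does not stop the run.
      failures << id
    end
  end
end

source = MemoryRepository.new({ "12ab34cd" => "hello", "56ef78gh" => "world" })
target = MemoryRepository.new
failures = migrate_and_verify(source, target)
puts "migrated #{target.ids.size} objects, #{failures.size} failures"
```

Collecting failures instead of raising lets a long migration finish and leaves you with a list of ids to retry.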

Upgrading First

  • Upgrading to latest Fedora (3.8?) helpful but not required
  • Hydra updates are optional, but recommended
  • If migrating from an older Hydra, be aware of changes to models and datastreams
  • Example: ActiveFedora 7 now uses ActiveTriples for RDF datastreams

Decisions, Decisions, Decisions

Decisions: Descriptive Metadata Changes

Take advantage of native RDF in Fedora4


# ActiveFedora 7 / Fedora3: the title lives in a named datastream
class FedoraThreeModel < ActiveFedora::Base
  has_metadata "descMetadata", type: MyDatastream
  has_attributes :title, datastream: :descMetadata
end

becomes


class FedoraFourModel < ActiveFedora::Base
  property :title, predicate: RDF::DC.title do |index|
    index.as :stored_searchable, :facetable
  end
end

If MyDatastream is an ActiveFedora::NtriplesRDFDatastream under AF7, the conversion is easy. If not, you'll need to do some RDF translation.
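A minimal sketch of what "RDF translation" could look like for metadata that is not already RDF. It is plain Ruby, not ActiveFedora code: the PREDICATES mapping, the field names, and the example.org subject URI are all assumptions made for illustration. It maps legacy fields to Dublin Core Terms predicates, emits N-Triples, and reports any field it has no mapping for.

```ruby
# Hypothetical mapping from legacy field names to Dublin Core Terms predicates.
PREDICATES = {
  title:   "http://purl.org/dc/terms/title",
  creator: "http://purl.org/dc/terms/creator"
}.freeze

# Escape a value for use as an N-Triples string literal.
def ntriples_literal(value)
  escaped = value.to_s
                 .gsub("\\") { "\\\\" }
                 .gsub('"') { '\\"' }
                 .gsub("\n") { "\\n" }
  %("#{escaped}")
end

# Turn a hash of legacy values into N-Triples statements about subject_uri.
# Fields without a mapping are returned separately so they can be reviewed.
def translate_to_ntriples(subject_uri, fields)
  triples = []
  unmapped = []
  fields.each do |name, values|
    predicate = PREDICATES[name.to_sym]
    if predicate.nil?
      unmapped << name
      next
    end
    Array(values).each do |value|
      triples << "<#{subject_uri}> <#{predicate}> #{ntriples_literal(value)} ."
    end
  end
  [triples, unmapped]
end

triples, unmapped = translate_to_ntriples(
  "http://example.org/objects/12ab34cd",
  { title: ["Upgrading from Fedora 3 to 4"], creator: "Wead, Adam", legacy_note: "n/a" }
)
puts triples
puts "unmapped: #{unmapped.inspect}"
```

Returning the unmapped fields makes the "decisions" explicit: each one needs either a predicate or a conscious choice to drop it.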

Hey! What about terms with single values?

Decisions: Non-RDF Datastreams

  • XML datastreams are still supported via OM
  • No special treatment during migration
  • Best for storing generated XML
  • Don't transfer versions unless absolutely necessary

Decisions: Versions

  • Versions have to be explicitly created during migration
  • Original Fedora3-assigned dates have to be preserved elsewhere
  • ActiveFedora tracks object and datastream versions separately
  • No object-level versions in ActiveFedora at this time
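One way to think about the first two points, sketched in plain Ruby. The SourceVersion struct, the "versionN" label scheme, and the idea of storing the original date as a separate value are assumptions for illustration, not ActiveFedora or Fedora4 API. The sketch orders the Fedora3 versions oldest-first and pairs each with the label to create and the original date to preserve.

```ruby
require "time"

# A Fedora3 datastream version as read from the source repository.
# The struct and its field names are hypothetical stand-ins.
SourceVersion = Struct.new(:version_id, :created_at, :content)

# Order versions oldest-first and pair each with the label to create in the
# target and the original Fedora3 date to store separately.
def version_plan(versions)
  versions.sort_by(&:created_at).each_with_index.map do |version, index|
    {
      label: "version#{index + 1}",
      original_date: version.created_at.utc.iso8601,
      content: version.content
    }
  end
end

versions = [
  SourceVersion.new("content.1", Time.utc(2013, 5, 2), "second draft"),
  SourceVersion.new("content.0", Time.utc(2012, 11, 15), "first draft")
]
version_plan(versions).each do |step|
  puts "#{step[:label]} <- #{step[:original_date]}"
end
```

Replaying versions in date order matters because the target assigns its own timestamps as each version is created.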

Decisions: Relationships

  1. Parent/child hierarchies of nodes
    • direct mapping of RELS-EXT in Fedora3
    • fixed points in the hierarchy with attached properties
    • currently supported in ActiveFedora 8
  2. Modeshape's weak references
    • allows you to make associations to nodes that don't exist
    • solves a particular use case in Sufia (batch edits)
    • currently not supported in ActiveFedora 8

Decisions: Intermediate Nodes

In Fedora4, every object must have a parent, even if you have a million objects and one parent

  • this causes performance issues due to Modeshape's behavior
  • need to resize the hierarchy with additional levels of nodes
  • we don't care about these nodes
  • Sufia's solution: create intermediate nodes based on the noid
    • an object with an id of 12ab34cd
    • gets a hierarchy of nodes /12/ab/34/cd/12ab34cd
    • depends on how your objects are identified
    • specific to Sufia and not in ActiveFedora
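The noid-to-path idea is easy to sketch in plain Ruby. This is not Sufia's code: the method name and the segment_width and levels parameters are assumptions chosen to reproduce the 12ab34cd example above.

```ruby
# Build the node path for an object id by splitting the id into fixed-width
# segments, e.g. "12ab34cd" -> "/12/ab/34/cd/12ab34cd".
# segment_width and levels are illustrative defaults, not Sufia settings.
def intermediate_node_path(id, segment_width: 2, levels: 4)
  raise ArgumentError, "id must not be empty" if id.to_s.empty?

  # Take at most `levels` leading segments; a short id simply yields fewer.
  segments = id.scan(/.{1,#{segment_width}}/).first(levels)
  "/" + (segments + [id]).join("/")
end

puts intermediate_node_path("12ab34cd") # prints /12/ab/34/cd/12ab34cd
```

Because the path is derived from the id alone, any object can be located without a lookup, but the scheme only spreads objects evenly if the ids themselves are well distributed.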

The Upgrade Process

I thought he'd never get to this...

Getting from 3 to 4: Using Hydra

  • no single version of ActiveFedora supports both Fedora3 and Fedora4
  • access Fedora3 using the rubydora gem
  • use Hydra, with ActiveFedora 8, to convert and ingest data into Fedora4

Umm... have you actually tried this yet?

Getting from 3 to 4: Without Hydra

  1. Using a Fedora3 connector
    • Fedora4 accesses the Fedora3 API (readonly)
    • this might be the option provided in 4.1
  2. Don't use Hydra? Use it anyway!
    • you can still use rubydora
    • express your models and datastreams using ActiveFedora
    • use Hydra to migrate the data to Fedora4
    • you don't have to keep using Hydra

You have no idea what you're talking about, do you?

Plan of Attack

  • Work with DCE to complete AF8/Fedora4/Sufia integration
    • rightsMetadata RDF conversion
    • model BatchEdit with weak references or (strong) references?
    • reconfigure the audit process
  • Work with Duraspace on migration strategies and modeling decisions
  • Communicate progress to, and elicit feedback from, the Hydra community at large
  • Circulate documentation and the tools that result from our work
  • Be the first guinea pig to help pave the way for future migrations

Tastes like bacon, right?

Thank You

Adam Wead

Penn State University

awead@psu.edu / @amsterdamos