Metadata – what a wild ride!

I remember being pretty terrified of this course. Some aspects of it, like all the XML back end programming bits, are definitely still out of my comfort zone, but I’ve come to have a better understanding of metadata systems as a whole than I thought I ever would. And I am extremely grateful.

I am grateful to Dr. MacCall, for always being patient when trying to describe something that was swimming over my head and for having a contagious level of enthusiasm towards the subject.

I am grateful to my fellow classmates. Sometimes it felt like being tossed out in the ocean, but there was always someone there to throw you a life preserver!

This has been a great experience. But my next class might suffer, because now I know of a million digital libraries and repositories to distract me!

MetaArchive – Digital Preservation

http://www.metaarchive.org/

“The MetaArchive Cooperative was founded to encourage archives, libraries, museums and other such organizations to build their own preservation infrastructures and expertise rather than outsourcing this core service to external vendors.”

MetaArchive is run by a board of members drawn from institutions at the highest membership level.

MetaArchive uses LOCKSS (Lots of Copies Keep Stuff Safe – best acronym ever) software.

A member institution gets its content ready, and the content source is then visited by SEVEN servers that each replicate and preserve a copy. These servers are in different geographic locations, as an extra security measure.

The servers continue to check back with the record to see if the member institution has altered it in any way. If a change has occurred, then both the changed version and the original copy are stored, so that previous records can still be recovered.

The seven servers do their own internal policing, checking for mismatch between files and repairing broken ones. The repair and the “bad” file are both stored so that the file can be restored if an error was made.
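To make the idea of "internal policing" concrete, here is a minimal sketch of how seven copies could be compared by checksum, with the majority deciding which copies look damaged. This is only an illustration of the general idea, not the actual LOCKSS polling protocol; the function name and the sample content are made up for the example.

```python
import hashlib
from collections import Counter

def find_damaged_copies(copies):
    """Return (agreed_digest, indexes of copies that disagree with the majority)."""
    # Hash every copy so that they can be compared cheaply.
    digests = [hashlib.sha256(data).hexdigest() for data in copies]
    # Treat the most common digest as the "good" version (simple majority vote).
    agreed_digest, _ = Counter(digests).most_common(1)[0]
    damaged = [i for i, digest in enumerate(digests) if digest != agreed_digest]
    return agreed_digest, damaged

# Hypothetical content held on seven servers, with one copy damaged.
original = b"digitized campus newspaper, 1923"
copies = [original] * 7
copies[4] = b"digitized campus newspapor, 1923"  # simulated bit loss

agreed, damaged = find_damaged_copies(copies)
print("Copies needing repair:", damaged)
```

In a real network the damaged copy would be repaired from one of the agreeing servers, and, as described above, the "bad" file would be kept alongside the repair.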

Of course, this system is not without cost. In the first year of membership, your institution must purchase a $4,600 server. You also pay $1.00 a year for every GB of content you store on the network.

On top of these fees there are dues that depend on the member level and institution you are. For most medium sized libraries, the fee would be about $6,533 a year.
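As a rough worked example of those numbers, the sketch below adds up the fees quoted above for a medium sized library. The 500 GB of content is purely an assumed figure for illustration, and the function names are my own.

```python
# Figures quoted above; the amount of stored content is an assumption.
SERVER_COST = 4600.00        # one-time server purchase in the first year
STORAGE_COST_PER_GB = 1.00   # per GB, per year
ANNUAL_DUES = 6533.00        # approximate dues for a medium sized library

def first_year_cost(stored_gb):
    """Server purchase plus storage fees plus dues."""
    return SERVER_COST + STORAGE_COST_PER_GB * stored_gb + ANNUAL_DUES

def later_year_cost(stored_gb):
    """Storage fees plus dues, once the server has been purchased."""
    return STORAGE_COST_PER_GB * stored_gb + ANNUAL_DUES

assumed_gb = 500  # hypothetical collection size
print(f"First year: ${first_year_cost(assumed_gb):,.2f}")
print(f"Later years: ${later_year_cost(assumed_gb):,.2f}")
```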

It honestly seems like a pretty good way to preserve digital content, but it doesn’t really address the issues of changing and obsolete formats. It does, however, deal with bit loss issues by synchronizing and checking for differences between the seven copies.

JPARC

http://www.glopad.org/jparc/ Japanese Performing Arts Resource Center

I mentioned JPARC briefly when discussing GloPAD, but I’d like to highlight the differences in its interface. All of the records in JPARC are GloPAD records, but the way they are presented does make it feel like a different repository.

JPARC has its own glossary of definitions related to Japanese theatre and performance art. This glossary is useful both to the casual searcher and to those who want to conduct true research on Japanese performing arts. JPARC also has browse options for pieces, performers, and theatres like those in GloPAD, but all the records are related to Japanese art works. And JPARC certainly has one distinct advantage over GloPAD at the moment…its Japanese text search function works!! This does not mean that JPARC is fully functional in the way its designers wish, as some records have not been fully connected and the “site help” page does not actually offer any help at the moment.

Under “Using JPARC”, http://www.glopad.org/jparc/?q=node/21, there are several short screencasts specifically geared towards use of JPARC, including how to browse lists, what JPARC is about, and navigating text links.

Though some of the site’s functions could run a bit more smoothly, the material is really phenomenal. An interactive photo in the section titled “Noh Stage performers” allows you to click each actor in the photo to learn their role in the performance. I highly suggest browsing through the site.

A main drawback of JPARC is the lack of advanced search. Some of the most captivating items in GloPAD/JPARC are the performance videos, but sometimes browsing through can make it difficult to find videos specifically. An advanced search function could be used to search specifically for videos.

http://www.glopad.org/jparc/?q=en/noh_performance/yam_dancetext

This link takes you directly to a set of Noh performance clips divided into each scene with descriptions of the action and meaning of the movements in each clip section.

 

Schema Set – VRA Core

Leah Allison presented on the VRA Core 4.0 metadata schema. It was really interesting to me to see her presentation, because my repository (GloPAD) uses some aspects of VRA Core 3.0 in its metadata standards. In the VRA standard, I recognized the work, image, and collection designations that were implemented in GloPAD. Work would be much like a particular performance of a piece, and image might be a representation of that production.

In GloPAD’s application profile, many elements were taken at least in part from VRA Core elements, including coverage.spatial (mapped to VRA location), coverage.temporal (mapped to VRA date), creator (mapped to VRA creator), creator.role (for multiple creator roles), culture (mapped to VRA cultural context), description (mapped to VRA description), format.medium (mapped to VRA material), and relation (mapped to VRA relation). In fact, the majority of GloPAD’s elements had some connection to VRA elements.
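A crosswalk like this can be written down as a simple lookup table. The sketch below only encodes the element pairs listed above; creator.role is left out because no VRA target is named for it here, and the sample record and function name are made up for illustration.

```python
# GloPAD element -> VRA Core element, as listed in the paragraph above.
GLOPAD_TO_VRA = {
    "coverage.spatial": "location",
    "coverage.temporal": "date",
    "creator": "creator",
    "culture": "cultural context",
    "description": "description",
    "format.medium": "material",
    "relation": "relation",
}

def to_vra(record):
    """Rename mapped fields; keep unmapped fields under their GloPAD names."""
    return {GLOPAD_TO_VRA.get(field, field): value for field, value in record.items()}

# Hypothetical record, not taken from GloPAD itself.
sample = {
    "creator": "Example Troupe",
    "coverage.spatial": "Kyoto",
    "format.medium": "photograph",
    "title": "Example piece",
}
print(to_vra(sample))
```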

I also really enjoyed being introduced to Harvard’s VIA collection and seeing how it differed in its more complete use of VRA Core. Those were some beautiful wingalings!

Fun with Schemas!

Above is my handout for Lamont’s ID3 presentation. I had a lot of fun filling it out and getting a reminder about Lamont’s presentation, and I got to do a little end of semester stress release. Thanks Lamont for an engaging handout and a great presentation!

Much of the metadata work we examined in this class would not be done by someone with very little concept of metadata. The same is probably still true of ID3 for the “casual user,” but ID3 is much more approachable than some other standards.

It struck me in the presentation how much metadata is missing from official records through Amazon and Apple.

But as Lamont demonstrates in his screencast, it is not so hard to use the tag and rename capabilities.

http://screencast-o-matic.com/watch/clhebQ5j5

 

Thanks for a great presentation on Tuesday, and a great overview of how to use the tag and rename functionalities of ID3.

 

 

LONG term memory technology

Really long-term memory. (2009, August). Scientific American, 301(2), 14.

Now THIS is what I’m talking about! Making technology work for us, instead of working against the pull of insufficient technology:

“Today’s memory cards, holding 10 to 100 gigabits per square inch, last only 10 to 30 years. A solution could lie with an experimental memory device based on an iron nanoparticle that travels inside a carbon nanotube between two electrical contacts; an applied voltage shuttles the nanoparticle between the contacts. The device, described in the May 13 Nano Letters, can hold one trillion bits per square inch, and theoretical calculations suggest that the system could remain thermodynamically stable for one billion years.”

This is all theoretical, and the article is from 2009, but if the kinks are worked out of this technology it could be very promising for digital preservation purposes!

And for more of the actual science:

J. B. Cui, R. Sordan, M. Burghard, and K. Kern
Max-Planck-Institut fuer Festkoerperforschung, Heisenbergstr. 1, D-70569 Stuttgart, Germany

http://mpi-stuttgart.mpg.de/kern/publication/pdf/kk297.pdf

“Recent progress in the assembly of a small number of molecules between two electrodes in sandwich configuration has resulted in electrical devices revealing switching behavior or negative differential resistance. These achievements strongly encourage the development of molecular electronic devices with the potential to overcome the limitations of silicon-based microelectronics.”

For more information about how carbon nanotube memory works:

RESEARCHERS CREATE SPEEDY, RUGGED CARBON-NANOTUBE MEMORY. (2009). EDN, 54(5), 16.

Digital Dark Age

http://www.sciencedaily.com/releases/2008/10/081027174646.htm

University of Illinois at Urbana-Champaign (2008, October 27). ‘Digital Dark Age’ May Doom Some Data. ScienceDaily.

Another article addresses the issues with not being able to maintain digital information in a time of rapidly changing formats. Jerome P. McDonough, assistant professor in the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, states that much of the data we produce today could eventually fall into a black hole of inaccessibility.

“If we can’t keep today’s information alive for future generations,” McDonough said, “we will lose a lot of our culture.”

Much has been lost already, including data from NASA’s 1976 Viking landing on Mars and census data from the 1960 Census. Many cultural items are locked behind proprietary software that will become obsolete. And, archives, governments, and libraries have been taking expensive steps to digitally preserve collections. If these collections are not properly preserved, the economic as well as cultural loss will be great.

McDonough suggests a multi-prong approach to avoiding the digital dark age:

“migrating data to new formats, devising methods of getting old software to work on existing platforms, using open-source file formats and software, and creating data that’s ‘media-independent.’”

What I find interesting about these discussions of the digital dark age is that there is little emphasis on attempting to develop more permanent methods of preservation. Of course, we do not want to put all our faith in an untrusted new technology that “says” it will last 100 years but has only existed for two. However, it seems to me as if our technology is moving backwards in some ways. It is amazing to me that we have 2,000 year old papyrus, but we are in danger of losing all of our digital record formats every 10 years! Obviously it isn’t as simple as “let’s go back to paper,” but instead of focusing mainly on transferring all our records whenever new technology comes along, the focus should be on improving the adaptability and permanent characteristics of current data.

 

What do data and a decomposing log have in common?

Answer – They both rot!

Data rot refers to the problems associated with the media information is stored on. Over time, temperature, humidity, exposure to light and other environmental factors cause data rot.

http://www.cbsnews.com/video/watch/?id=4836762n%3fsource=search_video

An edited transcript of the interview with Dag Spicer can be found here:

http://www.nytimes.com/2009/03/26/technology/personaltech/26pogue-email.html?_r=3

“Should you worry about data rot?” by David Pogue

 

An interview with Dag Spicer is a main portion of the CBS video. The entire video is reminiscent of something shown in a 6th grade science class in the early ’90s, despite being from 2009, but the style and simple explanations make the material approachable to anyone.

Simplistic presentation aside, the last line is basically the key issue when it comes to digital preservation:

“There never has been, and there never will be, a recording format that lasts forever.”

In the video you see an almost perfectly preserved campaign poster for Abraham Lincoln, and this is juxtaposed with statistics that a hard drive doesn’t last for much more than 5 years, and CDs last between 5 and 100 years.

The video and article are geared toward the general public. I consider that a demonstration of how large the problem of data rot is. I can picture entire generations losing wedding photos, baby photos, home videos, etc., because they didn’t learn about data rot or obsolete technology fast enough.

Though the “born digital” movement has allowed us to create more documents, images, videos, and audio than ever before, it seems likely that there will be less of this digital history preserved than ever.

This is a national issue, an international issue, not just an issue to be handled by archivists and librarians on minuscule budgets. Preservation of history and preservation of information are important for the functioning of society, and the word does need to be spread to the general public to warn them about data rot. Consumer and constituent choices will be critical to attaining funding to support research into longer lasting digital preservation technology.

 

Identifier Series: What do other universities do?

I attempted to do my best for our identifier element, and part of how I came up with possible options for our project was examining what other universities used for identifier in their digital repositories.

First up, the University of Southern California Digital Library:

http://digitallibrary.usc.edu/search/controller/index.htm

Some of the images in the digital library are from collaborating institutions, but most items are held by USC.

Their identifier label is “Record ID,” and the Record ID starts with a term denoting the collection it is a part of, such as acsc for Automobile Club of Southern California or kda for Korean Digital Archives. This collection designation is followed by a dash, then the letter m followed by a number. I was unable to find the standard used to create this number, but it is likely either an accession number or an assigned number unique to each record in that particular collection.

Examples: jarda-m378

gg-m171
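Based only on the examples above, the pattern looks like a lowercase collection code, a dash, the letter m, and a number. Here is a small sketch of how such an ID could be split apart. The pattern is my own guess from these examples, not a published USC rule, and the function name is made up.

```python
import re

# Assumed pattern: lowercase collection code, dash, the letter m, then digits.
RECORD_ID = re.compile(r"^(?P<collection>[a-z]+)-m(?P<number>\d+)$")

def parse_record_id(record_id):
    """Return (collection, number), or None if the ID does not fit the pattern."""
    match = RECORD_ID.match(record_id)
    if match is None:
        return None
    return match.group("collection"), int(match.group("number"))

for example in ["jarda-m378", "gg-m171", "picture1.jpg"]:
    print(example, "->", parse_record_id(example))
```

Splitting the ID this way shows why the collection prefix is handy: records can be grouped by collection without looking at any other field.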

Next up: Indiana University Digital Library Program

http://www.dlib.indiana.edu/

In their “Archives Photo Collection,” the images have THREE identifiers. The first, “image number,” appears to be an assigned unique number. The second, donor image number, is the location of the photo on a CD. The third, accession number, is the year acquired and a sequential number separated by a period. The sequence comes from the order in which items are entered into the record. So, 95/014 was acquired in ’95 and was the 14th item entered.

And finally: The University of Missouri Digital Library, which has specific image collections, one of which is a collection of University of Missouri Sports Posters.

This collection’s label for identifier is “Identifier Image,” and the identifier is a very simple file name: picture1.jpg, picture2.jpg, and so on.

For photographs held by other organizations, such as those held by the Boone Historical Society, the collections are designated similarly to USC’s, with a collection abbreviation (bchs) followed by an assigned ordered number (bchs-0001, bchs-0002, etc.).

The university also adds an “other identifier” label that is an ordered number.

Examining these universities is what led me to create the identifier element indexing guidelines many moons ago. A lot of times the research, learning, and bookmarking part happens, but something gets short-circuited between those steps and the blogging step!

 

 

Personal Digital Preservation Posts – compiled

Below are my personal digital preservation posts:

From scroll to screen – Are e-books a path to prolonged preservation?

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/04/30/from-scroll-to-screen-are-e-books-a-path-to-prolonged-preservation/

Siobhan Davies Replay dance repository and Digital Preservation

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/05/03/siobhan-davies-replay-dance-repository-and-digital-preservation/

The Library of Utopia – Digital Preservation and Public Access

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/05/02/the-library-of-utopia-digital-preservation-and-public-access

Preservation of University’s Digital Heritage Part 1

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/04/05/preservation-of-universitiys-digital-heritage-part-1/

Preservation of University’s Digital Heritage Part 2

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/05/04/preservation-of-universitys-digital-heritage-part-2/

Preserving Digital Libraries – Syracuse University Wiki

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/05/04/preserving-digital-libraries-syracuse-university-wiki/

Digital Dark Age

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/05/04/digital-dark-age/

What do data and a decomposing log have in common?

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/05/04/what-do-data-and-a-decomposing-log-have-in-common/

LONG term memory

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/05/04/long-term-memory-technology/

 

Somewhat but not directly related links: Overview of the Library of Congress Flickr Project – Though Flickr might not be around forever, the act of placing records in a second location is part of preservation. Also, the public awareness created by publicizing a collection through Flickr can increase the public’s desire for digital archival efforts. This could lead to increased funding and research towards more permanent digital preservation.

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/05/01/library-of-congress-crowd-sourced-flickr-metadata-project/

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/05/01/library-of-congress-flickr-project-pilot-results/

http://wp.slis.ua.edu/maccall-spring2012-ls566-14/2012/05/01/library-of-congress-flickr-project-now-and-in-to-the-future/