Metadata_Blog_Post_2012-19-2

David Leone, “File Naming: tedious but important”

http://wp.slis.ua.edu/maccall-spring2012-ls566-09/2012/02/14/file-naming-tedious-but-important/

 

In all the readings we have for this class, it’s difficult to get a handle on every one of them. I had overlooked a video tutorial about changing and creating file names that David’s blog brought to my attention. So, thank you David!

 

The four part video tutorial set is found here:

http://digitalpreservation.ncdcr.gov/tutorials.html

 

The third video, “What not to do when naming files,” pointed out something that I have done consistently, which is leave spaces in my file names. Apparently, spaces in file names can cause problems for some software that reads those names. The tutorial suggests using underscores _ or dashes – in place of spaces in file names. Another helpful hint is to always use the same date format when naming files.

The key what NOT to do’s:

Never use special characters (except the previously mentioned dashes and underscores)

Never use spaces

 

Never change the file extension (it doesn’t actually convert the file and might make it unreadable.)

 

Never count on capitalization:

I am guilty of this. I’ve thought, oh, I’ll capitalize this for the final draft, then I’ve forgotten which was final, capitalized or uncapitalized! The specific reason stated in the video is that many programs treat capital and lowercase letters as the same when they read and search file names.

The key what TO do’s

Make your file names unique

(don’t leave your image files pulled off your digital camera as random numbers, who knows when you might be moving or sharing your files in places where DSC 002 has been used many times before)

Give the file a brief, meaningful name

Make your date format consistent throughout time and include date information

Consider using consistent file naming guidelines at your job, if such guidelines aren’t already in place.
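To try the rules above on my own files, here is a minimal Python sketch. It is not from the tutorial; the function name `make_file_name` and the choice of a YYYY-MM-DD date are my own assumptions about one reasonable way to follow the advice (no spaces, no special characters, no reliance on capitals, one date format).

```python
import re
from datetime import date


def make_file_name(description: str, when: date, extension: str) -> str:
    """Build a file name with no spaces or special characters and a fixed date format."""
    # Lowercase everything so the name never depends on capitalization.
    cleaned = description.strip().lower()
    # Replace each run of spaces (or other whitespace) with one underscore.
    cleaned = re.sub(r"\s+", "_", cleaned)
    # Drop anything that is not a letter, digit, underscore, or dash.
    cleaned = re.sub(r"[^a-z0-9_-]", "", cleaned)
    # Always write the date the same way: YYYY-MM-DD.
    return f"{cleaned}_{when.isoformat()}.{extension}"


# Example: a camera photo gets a brief, meaningful, dated name.
print(make_file_name("Beach Trip: Day 1!", date(2012, 2, 19), "jpg"))
# prints beach_trip_day_1_2012-02-19.jpg
```

The extension is passed in unchanged, since the tutorial warns against altering it.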

The title of this blog post is my attempt at practicing some of these naming rules, and good luck to everyone in their file naming endeavors!

Rights Metadata and views from a SLIS cataloger

Metadata has rights?? Who knew? By Kristie Thomas

http://wp.slis.ua.edu/maccall-spring2012-ls566-16/2012/02/13/metadata-has-rights-who-knew/

 

I read Kristie Thomas’s blog about copyright and metadata and really appreciated that it came from her perspective as someone getting started in the cataloging field. Reading through the article she references, Rights Metadata Made Simple by Maureen Whalen, found at http://www.getty.edu/research/publications/electronic_publications/intrometadata/rights.pdf, I found myself thinking that it would be pretty time consuming to figure out and include all of the data points necessary to list copyright status. But Mrs. Thomas makes this point: “as a cataloger, you’re already filling in a myriad of MARC fields, what is a couple more?” With my knowledge of cataloging, I can get behind this. Just trying to figure out what number to give something is a headache for me, but I’m sure that practiced catalogers can move through these fields much more quickly. Getting into the swing of figuring out copyright data would probably take some initial effort, but then the process would become more streamlined.

 

The article includes a handy table on p. 7 with broken down examples of how to handle rights data on Public Domain and non-Public Domain works. Looking at the information provided earlier in the article in table form made the idea that this wouldn’t be too arduous a task to add on to a cataloger’s plate seem more realistic. And as Mrs. Thomas points out, it helps fulfill a Cutter objective of making the item available to users. When it comes to digital materials, copyright data and whether or not something is in the public domain could be even more important than with physical items, because users seem to see items found online as “fair game” when they are very likely copyrighted!

OAIster Harvesting

OAIster reaches 10 million records

http://oalibrarian.blogspot.com/2007/01/oaister-reaches-10-million-records.html

 

In the blog, the author Heather Morrison quotes Roy Tennant, the User Services Architect at the California Digital Library, relating his experience comparing the results from searching OAIster to searching Google using the same terms. OAIster has several advantages over Google-type search engines. Google and other search engines crawl web pages and index all the words contained on a page, whereas OAIster searches metadata elements in records, such as author, title, subject, etc. OAIster works by tapping into the collections of a variety of institutions using the Open Archives Initiative (OAI) Protocol for Metadata Harvesting.
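To make the harvesting idea a little more concrete, here is a rough Python sketch of what a harvester does. The `ListRecords` verb and the `oai_dc` metadata prefix are part of the OAI protocol, but the repository address `https://example.org/oai`, the function names, and the sample record are all made up for illustration; I have no idea what OAIster's own code looks like.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository address, used only for illustration.
BASE_URL = "https://example.org/oai"

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the request a harvester sends to ask a repository for its records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def harvest_fields(xml_text: str) -> list:
    """Pull the title, creator, and subject elements out of each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{OAI}record"):
        fields = {}
        for name in ("title", "creator", "subject"):
            element = record.find(f".//{DC}{name}")
            if element is not None:
                fields[name] = element.text
        records.append(fields)
    return records


# An invented, trimmed-down response with a single Dublin Core record.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:creator>Example, Author</dc:creator>
          <dc:subject>cats</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url(BASE_URL))
print(harvest_fields(SAMPLE_RESPONSE))
```

The point of the sketch is the contrast with a web crawler: the harvester never looks at the words on a page, only at labeled fields such as title, creator, and subject.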

 

With OAIster having reached 10 million records in 2007 when this blog was posted, I imagine that the results would be even more stark now after an additional 5 years of record harvesting. I decided to conduct my own OAIster harvest and compare the results to a search through the Google bay.

 

I accessed OAIster at http://oaister.worldcat.org/ and decided to look for something a bit silly first. I have bobtail cats and one of them is cross-eyed, and the fancy medical term for “cross-eyed” is medial strabismus. I wanted to find out whether or not being cross-eyed and bobtailed was a common thing for cats. This first search was a bust for comparisons, because neither tool had useful results for “bobtail cat” and “medial strabismus.” Google, though, gave me a ton of results that were barely related to my search, including lots of records that contained the word media, whereas OAIster soundly admitted defeat and didn’t give me any worthless records. I simplified my search to “medial strabismus” and cat, and the very first result from Google was titled “Strabismus and Astigmatism in cats” whereas OAIster had no records again. The sixth result on the Google search was a scholarly resource, “Neurology for the Small Animal Practitioner,” a book authored by Cheryl L. Chrisman and available in its entirety through Google Books. Google was the clear winner when it comes to learning about cross eyes in cats, but one extraordinarily niche topic isn’t enough to count Google as the overall winner. I decided to give the title to whichever record wrangler gave me the best 2 out of 3 results.

 

I decided to stick with cats, because with Dr. MacCall being such a dachshund fan, I felt cats needed some metadata love. As anyone who has ever been around a cat knows, it often seems as if they are on drugs, so I wanted to know what actually happens to cats when they are on drugs. I searched “cocaine cat” on OAIster and on Google (not Google Scholar). The first three results from OAIster were “Effects of cocaine on the rate of contraction to noradrenaline in the cat spleen strip: mode of action of cocaine.”, “Effects of cocaine and antidepressant drugs on the nictitating membrane of the cat,” and “Effects of cocaine or denervation on responses of isolated strips of cat spleen to noradrenaline and isoprenaline.” That is clearly some scholarly stuff, and I could learn a lot about how cocaine acts in different cat body systems by actually reading the articles provided to me in full text through OAIster. This is the first result on Google:

The second result is more serious and titled “Illicit Drug Exposure in Cats,” but it is on a site called “Pet Place” and not a trusted peer-reviewed journal. The third result is someone asking “What happens if you give a cat cocaine” on Yahoo Answers. In the fourth result, some truly evil people have videoed what their cat did after they gave it cocaine. So, not only are these results not particularly helpful, but they can raise your blood pressure and get you angry, when all you wanted was some scholarly research! It’s a bit unfair to compare straight Google with OAIster when it comes to scholarly resources, so I ran the search through Google Scholar also. The first three results were all from scholarly journals, but the second result was not actually about the effects of cocaine on cats; Google had picked up the word “cat” in the phrase “time cat.” Also, because OAIster only searches open archives, all the results you get will lead you to a record that you can read in full text, whereas only one of the top three results in Google Scholar led to a full text PDF. When it comes to scholarly articles about the effects of cocaine on cats, OAIster wins this research round.

 

Now for the tie breaker. We’ve searched for cross-eyed cats, and coked up cats, how about some crazy cats? I’ve known a few people who have had to give their cats antidepressants, like kitty prozac, because of cat behavioral issues. I decided to search for “antidepressants and cat” in OAIster and Google. The first result in OAIster was related to what I wanted, but the next two results were thrown off because the word “cat” is also located in “CAT scan.” I decided to take advantage of what OAIster has to offer and try an advanced search. I guessed that any articles I wanted would likely have cat in the title, and so put antidepressant and cat through a title search. I was left with a single result, “Effects of cocaine and antidepressant drugs on the nictitating membrane of the cat.” which had been the first result before the advanced search. The first three results on Google were about cats and antidepressants, but one was an article on treatment for urination issues on “Pet Place” and the other two were links to pet forums. Though I had a page full of results related to antidepressants and cats, not a single one was a scholarly source. A search through Google Scholar, however, brought me the exact same article as the number one result I got from OAIster, and half the results on the page were also related, though Google had also included “CAT” scans in their results. The first article was the only one available in a full text PDF, so in terms of actually accessing the scholarly material, this search put OAIster and Google at a tie on this question.

 

A tie breaker that ends in a tie? I don’t know how many more alliterative cat searches I can come up with! How about “cool cats?” We all know cats are the bee’s knees, but what happens when they get too cold? I searched “hypothermia and cat” in OAIster and Google. Once again, the first three results in OAIster were scholarly and on point: “The effect of aminosteroid, ORG 6001, on hypothermia induced ventricular fibrillation in the cat.”, “Effects of hypothermia and anoxia on retention of noradrenaline by the cat perfused heart.” and “Electron Microscopy of Cat spinal cord subject to circulatory arrest and deep local hypothermia.” Through a main Google search, the results were related to hypothermia and cats, but were not from scholarly sources. In Google Scholar, the results on the first page were mostly about cats and hypothermia, but only two of the articles were full text, and one of those was not about cats but had picked up another use of “CAT.” I think this makes OAIster the winner in this round, but it’s close, as the next few pages in Google Scholar do lead to several more full text resources about the effects of hypothermia in cats.

 

The results of the cross-eyed cat, coked up cat, crazy cat, and cool cat search test lead me to a few conclusions. The first conclusion is that Google (not Google Scholar) is basically useless when it comes to finding a scholarly source on the first page, and probably on the second and third. If you are looking for scholarly resources, going to Google alone is going to lead to a lot of headaches. My second conclusion is that though OAIster was the winner of this search round, Google Scholar is not bad as a close second. Google Scholar also offers an advanced search that you can use to return articles from particular journals or by particular authors, and you can search for words in the title only. There is not an option to limit the results only to what would be freely available (a free full text search), but it looks to me as if Google is pulling from some of the same records that are available to OAIster.

 

MODS vs METS: XML Schema Battle Royale

So, the title is a little misleading, or maybe a lot misleading. I’m trying to make schemas exciting here folks! There isn’t really a “battle” between the two schemas, but they can both be used to fill different niches in metadata services. Acronym decoding time:

 

MODS – Metadata Object Description Schema – a MARC-compatible XML schema used for encoding descriptive data.

 

METS – Metadata Encoding and Transmission Standard – a schema that is used for packaging descriptive, administrative, and structural metadata to assure the use and preservation of digital resources.

 

 

MODS is a useful encoding schema because it works well with MARC 21 and though it is simpler, it has ways of preserving items such as linked entry fields. MODS can also be used not just for displaying already existing material, but also for original resource description, because it is compatible with many library descriptors and is expressed in XML. Another positive to MODS is that unlike MARC records the tags are language based, rather than a number code.
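To show what “language based” tags look like, here is a small Python sketch that builds a tiny MODS record. The element names (`titleInfo`, `title`, `name`, `namePart`) and the namespace come from the MODS schema; the helper function `mods_record` is my own invention, and the record is far from complete. The comments note the MARC fields that carry the same information as number codes.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def tag(name: str) -> str:
    """Return a tag name qualified with the MODS namespace."""
    return f"{{{MODS_NS}}}{name}"


def mods_record(title: str, author: str) -> ET.Element:
    """Build a very small MODS record with a title and one personal name."""
    mods = ET.Element(tag("mods"))
    # In MARC the title lives in field 245; in MODS the tag simply says "title".
    title_info = ET.SubElement(mods, tag("titleInfo"))
    ET.SubElement(title_info, tag("title")).text = title
    # In MARC a personal name main entry is field 100; MODS calls it "name".
    name = ET.SubElement(mods, tag("name"), {"type": "personal"})
    ET.SubElement(name, tag("namePart")).text = author
    return mods


record = mods_record("Neurology for the Small Animal Practitioner", "Chrisman, Cheryl L.")
print(ET.tostring(record, encoding="unicode"))
```

Even without knowing any MARC, a person reading the printed XML can guess what each element holds, which is the advantage described above.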

 

METS is open source, which means that it is available freely and can be changed easily by developers. METS is composed of seven parts: header, descriptive metadata, administrative metadata, file section, structural map, structural links, and behavior section. METS works with several schemas as descriptive metadata, including MODS, Dublin Core, and MARCXML. The administrative metadata section is particularly useful because it can contain information about past changes and origination of the data, which are good for preservation. METS has been useful in the effort to digitize physical materials, but the extreme flexibility of the system could lead to interoperability issues.
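Here is a bare-bones Python sketch of how METS acts as a wrapper around a descriptive schema like MODS. The section element names (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`) and the `mdWrap`/`xmlData` wrapper are METS elements, but the function `mets_package` and the ID value are made up, only some sections are shown, and the result is a skeleton that would not pass schema validation as it stands.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def mets_tag(name: str) -> str:
    """Return a tag name qualified with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def mets_package(title: str) -> ET.Element:
    """Build a skeleton METS document that wraps a one-line MODS description."""
    mets = ET.Element(mets_tag("mets"))
    ET.SubElement(mets, mets_tag("metsHdr"))  # header
    # Descriptive metadata section: this is where the MODS record is packaged.
    dmd = ET.SubElement(mets, mets_tag("dmdSec"), {"ID": "dmd1"})
    wrap = ET.SubElement(dmd, mets_tag("mdWrap"), {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, mets_tag("xmlData"))
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    # Empty placeholders for some of the other sections.
    ET.SubElement(mets, mets_tag("amdSec"))     # administrative metadata
    ET.SubElement(mets, mets_tag("fileSec"))    # file section
    ET.SubElement(mets, mets_tag("structMap"))  # structural map
    return mets


package = mets_package("An example digitized item")
sections = [child.tag.split("}")[1] for child in package]
print(sections)
# prints ['metsHdr', 'dmdSec', 'amdSec', 'fileSec', 'structMap']
```

Swapping the `MDTYPE` value and the wrapped record is all it would take to package Dublin Core instead, which hints at both the flexibility and the interoperability worry.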

 

Great Googley Searching

http://acrlog.org/2012/01/27/convenience-and-its-discontents-teaching-web-scale-discovery-in-the-context-of-google/

Convenience and its Discontents: Teaching Web-Scale Discovery in the Context of Google

by Pete Coco

 

The post highlights the balance that library professionals must strike when introducing students to new “Google like” search tools, such as Summon. Though these tools are more convenient for users who are acclimated to the Google style of search, they don’t produce the highest quality of results. Student users tend to follow the Principle of Least Effort: they’ve found something using the scholarly web-scale tool, and even though it’s not quite what they were looking for, it will do. However, if they receive better instruction prior to their search, then students can use these web-scale discovery tools in a more refined manner and get better results.

 

An interesting point is made that students might think that if this library search tool is like Google, why do we need the tool anyway? The author says that he can illustrate the difference between the two services with a simple example of running “Batman” through both and seeing the different results. I think that it is clear that students who are introduced to databases/other library web resources are going to need some instruction, even if the search tool looks like a Google search bar. They are going to need to be told where Discovery Service or Summon are found to start! I don’t see there being any issue with providing a tutorial of these search tools. If students know how to use them properly, they will see the results in their searches and the ease with which they can find and use research. Being taught Boolean terms and ways of refining a search is helpful even when using a search engine like Google, so librarian educators should not shy away from maintaining the need for user instruction of online library search tools.

 

 

Television Metadata, Do Viewers Want It?

Television Metadata: How Low Can We Go? Caitlyn Rush

http://wp.slis.ua.edu/maccall-spring2012-ls566-02/2012/01/30/television-metadata-how-low-can-we-go/

Caitlyn Rush’s blog about this blog post (http://www.appmarket.tv/opinion/1347-scene-level-television-metadata-tagging-tv-is-the-new-oil-in-the-industry.html) by Richard Kastelein opened up some interesting lines of thought for me. The example of a real-time pop-up on your iPhone of the store website of the designer of a red carpet dress reminded me of some user generated metadata I found recently. This website, http://curvio.com/, was linked to on the social news aggregator reddit.com. Curvio is run by only a few people, so it has a limited number of data points and shows, but it features styles worn by various television stars on an episode and scene basis. The styles are then linked to websites that offer them and potential prices. So, as a viewer you start with the data from the show, which only consists of an image of the actor in some clothing, and then you have the data of price, brand, and purchasing location of the garment. This leads me to believe that even if “second screen” pop-ups are still in production, there is soon going to be greater integration between commercial product marketing, television content, and online content. What I find most interesting about this particular website is that it is run by fans of the fashion on the show, and not marketers of the show or the clothing featured.

I know that I am definitely guilty of dividing my attention while watching a program. I also use an adblocking program on my browser. This makes me wonder how receptive people will be to marketing intrusions on their “second screen” when they watch TV. I remember the advent of TiVo being seen as the “death” of commercials and how TiVo and other DVR devices did lead to many people skipping commercials. Will people opt in to this second screen content, or will it be forced, and how will people react to the intrusion on their second screen? As many issues as there are with the actual implementation of this scene by scene metadata marketing, I can’t imagine that the technology to block these marketing efforts will be far behind.

Born Digital and Films in the Digital Age

Defining “Born Digital” by Ricky Erway

http://www.oclc.org/research/activities/hiddencollections/borndigital.pdf

Let’s start out with the definition provided in the essay:

“Born-digital resources are items created and managed in digital form.”

Unlike film photos or archival materials that can be scanned and turned into digital objects, born digital resources have lived their entire life in a digital format. Some examples of born digital materials are digital photographs, digital documents (such as those created in a PDF or word processor), archived web content, digital art, digital manuscripts, and digital media publications.

As we see much of the world’s media and information content moving from tangible objects to digital objects, there are some concerns to deal with. These include bit rot, when files deteriorate over time and are no longer readable by software, and the possibility that the media will become obsolete, as floppy disks and VHS tapes have. Along with the media becoming obsolete, there is the danger that the hardware and software which process the media will become obsolete.
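The essay doesn’t go into how you would actually notice bit rot, but one common approach is to record a checksum for a file and compare it again later. Here is a minimal Python sketch of that idea; the function name `sha256_of`, the file name, and the simulated “damage” are all assumptions of mine, for illustration only.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "photo_2012-02-19.jpg")
    with open(path, "wb") as handle:
        handle.write(b"original bytes")
    # Record the checksum when the file is first stored.
    recorded = sha256_of(path)
    # Later check: the file still matches its recorded checksum.
    unchanged = sha256_of(path) == recorded
    # Simulate damage by changing a single byte, then check again.
    with open(path, "wb") as handle:
        handle.write(b"original bytez")
    damaged = sha256_of(path) != recorded

print(unchanged, damaged)
# prints True True
```

A check like this only tells you that a file has changed, not how to repair it, so it would need to be paired with backup copies.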

Considering this issue of born digital media makes me think about what this process means for different sectors of the economy and for preservation and organization of digital output. One area of media that is making the major transition to digital is the film industry. Many prints of old films are becoming permanently lost because film canisters take up a large amount of space and are very delicate when it comes to temperature and humidity changes. From an archival standpoint, the cost of maintaining these collections is prohibitive, and preservation of these films is not considered enough of a priority to ensure adequate funding.

Could the digitization of film prevent a situation like this in the future, or are we setting up the same problem in a different format? If the technology for digitizing and playing digitized films changes drastically in the next 50 years, these born digital films will be even less accessible than 35mm films from the 1930s are now.

This article discusses some of the setbacks to moving forward with born digital films:

http://www.marketwatch.com/story/hollywoods-move-to-digital-will-end-an-era-2012-01-26?pagenumber=1

“The science and technology council of the Academy of Motion Picture Arts and Sciences recently came out with a 136-page report called “The Digital Dilemma 2.” It concluded that digital technologies “do not guarantee long-term access to digital data; compared to traditional filmmaking using motion picture film stock, digital technologies make it easier to create motion pictures, but the resulting digital data is much harder to preserve.”

It is cheaper and faster for film producers to create films using digital technology. Instead of going through film dailies, and having extraordinarily time consuming editing processes, everything can be done quickly on computers. This is greatly opening the film community to independent productions, since the expense of filming on and editing on film is not an obstacle. However, the cost for movie theatres to transfer to digital is prohibitive for many small movie houses, and as more theatres switch to digital, there will be less access by the public to old films because there will be fewer theatres with the technology to support them.

Persistent Identifiers and Not So Persistent Technology

The blog post “More famous than Simon Cowell” was a humorous look at the idea of persistent identifiers when it comes to internet material. The post, found and identified by this http URI: http://www.blipfoto.com/entry/465380 discusses the tenuous nature of persistent identifiers in electronically programmed content, including Andy Powell’s own post, and now this post. Despite the fact that http URIs might in the far future be completely useless as persistent identifiers, they are still useful in that they are so widely understood as a place marker for information. The title of this particular blog post refers to that fact. An http URI is more recognizable to a majority of the world population than any celebrity, such as Simon Cowell.

 

As useful as http URIs are now for locating and holding content, what does this mean for our future? The internet has seen the creation and dissemination of incomprehensible amounts of data, but what if the layout of this system is drastically changed in the next fifty or one hundred years? Do we, as the author suggests, not care if we lose all the blipfotos and blog posts that have been created? It seems as if the answer is yes: there has always been a point where some materials and resources are retained, and the vast majority are lost. How many private letters have people sent and destroyed over the course of written history? How many books have been written and then completely lost to mankind, by the destruction of the library at Alexandria, or by not being considered that great in the first place? Even though it is true that every LiveJournal or blog post or Flickr account isn’t a work of staggering genius, it is distressing to think what important works might be out there in the sea of information, and that might be lost forever if a programming language or a domain name changes.

XC, DC, RDA, WT?

Getting into the swing of reading technical material about metadata can be difficult. While trying to work my way through “Supporting the eXtensible Catalog through Metadata Design and Services,” by Jennifer Bowen, I found myself having to stop and re-read almost every paragraph, because it seemed as if every sixth word was an acronym or unfamiliar term. I would find myself casually reading through the acronyms without my brain fully processing what they stood for in the context of the sentence. Part of this is my current unfamiliarity with these terms, but I think another issue is that these papers are technical and dense. Even if you are familiar with what the terms stand for, as I was by the end of the 22nd page, your brain can gloss over them and not fully absorb the material. I think that examining this realization has made me appreciate the importance of effective blogging and review of these topics.

 

After reading the paper, reading the blog post “More on XC from David Lindahl” at http://acrlog.org/2006/05/17/more-on-xc-from-david-lindahl/ actually gave me understanding, instead of being lost in a sea of information. I’m sure as the semester progresses, I will become more comfortable with the many acronyms and terms involved in metadata, but I know that I will also still find it useful to read condensed, thoughtful summaries of material, like those presented by my classmates in their blogs and by library professionals.

 

On the actual topic of XC (an acronym for eXtensible Catalog), the technology sounds promising and useful. Being open source makes it accessible to all the programming-savvy library professionals and gives it a high degree of flexibility to adapt to the ever changing online landscape. The vast amount of metadata that libraries collect has a way of being stuck in libraries and is often organized into interfaces that users have difficulty working within. If XC can be approachable to users in the same way that using a search engine is, while yielding higher quality results, it could open up the availability of library held metadata to the public and cement the need and usefulness of library resources in the minds of internet generations.