Friday, December 07, 2007

Interpretations of FRBR Classes

Because it makes use of an entity-relationship model, FRBR consists of two primary concepts: things and relationships. (I often think of them as nouns and verbs.) In the "things" category, FRBR defines 10, which it calls entities. They are: Work, Expression, Manifestation, Item, Person, Corporate Body, Concept, Object, Event, Place.

This is an admirably short list of basic building blocks for bibliographic data. The question is: is it enough? Can we really express our bibliographic data with just these basic concepts? The answer is: probably not. Still, we should take a lesson from FRBR and try to keep our set of basic entities small, while allowing them to be extended to express more complex concepts.

As an exercise, I took two well-known attempts to model FRBR using formal definitions: FRBR in RDF and FRBRoo. I also added to the comparison the RDF entries that Martha Yee created for her cataloging rules. It is important to note, though, that Yee's set of RDF statements is intended to go beyond FRBR, since it expresses cataloging rules and not just the FRBR model.

In each of these three efforts, the FRBR entities are recorded as classes, and the FRBR relationships are recorded as properties. This is in keeping with the definitions in RDF Schema. What is interesting is the number of classes that are defined:

  • FRBR in RDF: 13 classes
  • FRBRoo: 23 classes, 18 sub-classes, 41 total
  • Yee's schema: 23 classes
These are compared to the 10 classes (entities) defined in FRBR. Since no one defined fewer classes, we need to look at what additional classes were defined. But first, there are a few cases where FRBR classes were not included, usually because they were substituted with a set of more detailed classes.

  • FRBRoo does not include Manifestation, but instead has Manifestation product type and Manifestation singleton
  • Yee substitutes Event as subject for the FRBR class Event, and substitutes Place as geographic area and Place as jurisdictional corporate body for the FRBR class Place
FRBR in RDF

FRBR in RDF adds only three classes. Two of these (Endeavor and ResponsibleEntity) are supersets of FRBR classes. Endeavor is a generalization that can be related to a work, expression, or manifestation. Similarly, ResponsibleEntity is a more general term that can relate to either a corporate body or a person. Both of these seem fairly sensible, because they let you refer to the intellectual content or to some actor without having to specify more information. It's like being able to say "it" without having to say exactly what you are referring to.
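To make the superclass idea concrete, here is a minimal sketch of RDFS-style subclass inference in plain Python, with triples represented as tuples. The class labels follow the names used in this post rather than the exact URIs of the FRBR in RDF vocabulary. The `ex:` instances and the `createdBy` property are hypothetical, and the sketch assumes each class has at most one parent. It shows how typing something as a Work is enough for a query about Endeavors to find it.

```python
# Illustrative class hierarchy using the labels from this post
# (not the vocabulary's actual URIs); one parent per class is assumed.
SUBCLASS_OF = {
    "Work": "Endeavor",
    "Expression": "Endeavor",
    "Manifestation": "Endeavor",
    "Person": "ResponsibleEntity",
    "CorporateBody": "ResponsibleEntity",
}


def superclasses(cls):
    """Return cls followed by every class it is (transitively) a subclass of."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def inferred_types(triples):
    """Apply the RDFS subclass rule: if x is a C and C subClassOf D, then x is a D."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).update(superclasses(o))
    return types


# Hypothetical instance data; "createdBy" is an illustrative property name.
data = [
    ("ex:hamlet", "rdf:type", "Work"),
    ("ex:shakespeare", "rdf:type", "Person"),
    ("ex:hamlet", "createdBy", "ex:shakespeare"),
]
types = inferred_types(data)
print(sorted(types["ex:hamlet"]))       # ['Endeavor', 'Work']
print(sorted(types["ex:shakespeare"]))  # ['Person', 'ResponsibleEntity']
```

A real RDF store would also allow multiple superclasses per class. The single-parent dictionary is a simplification for the sketch.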

The third class that is added is Subject. In fact, all three of these efforts include some form of subject as a class in their schemas, even though FRBR clearly treats subject as a relationship. (I would like to understand why all three interpreted subject as a class, so please post if you have ideas or knowledge on that.)


FRBRoo

FRBRoo is a very interesting interpretation of FRBR. As they state in the document, attempting to re-define FRBR using object-oriented rules rather than entity-relationship rules is a way to test the underlying concepts in FRBR. They also tackle the elements that FRBR calls "attributes." (Aside: The FRBR attributes are a bit odd, IMO. They seem to be all over the place, and there is no explanation of how they were determined or any way to give them some organization. I don't think they actually fit the definition of attributes in E-R, which seem instead to be on the order of identifiers.) The folks working on FRBRoo decided to treat the attributes as properties, that is, as relationships between the classes.

FRBRoo defines 23 primary classes with 18 subclasses. They address the issue of complex items, such as articles within serials or collections of essays, by creating classes for aggregate and serial works. Some of the classes seem to be what I would normally understand as genres. As an example, there is a class Performance Plan that is described as:
This class comprises sets of directions to which individual performances of theatrical, choreographic, or musical works and their combinations should conform.
Another example of a new class is Publication Event. This is an action that is part of the workflow of publication, such as:

Establishing in 1972 the layout, features, and prototype for the publication of “The complete poems of Stephen Crane, edited with an introduction by Joseph Katz” (ISBN “0-8014-9130-4”), which served for a second print run in 1978.
Being an action, I would tend to express this as a property (a verb). The layout, features, etc. could be subclasses of a manifestation; there would be an actor (a noun, or a class: probably the publishing house or, more specifically, a book designer); and there would be a time. The verb (or property) could be "designed," "typeset," "printed," etc. This makes me wonder about the FRBR class Event as a noun, but I think I could buy into a concept of named events ("WWII," "Election day 2008," "the Beatles' first appearance on Ed Sullivan"). Interestingly, all of these are events as subjects, which is how Event is defined in FRBR. The FRBRoo event does not appear to have this noun-ish characteristic.
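The two options can be compared in a minimal sketch, again with RDF triples as plain Python tuples. All property names (`designedBy`, `produced`, `carriedOutBy`, `date`) and the `ex:` identifiers are hypothetical and are not taken from FRBRoo. The comparison shows one trade-off. In plain RDF a single triple has no slot for the time, so recording 1972 alongside the "designed" property would require reification or an intermediate node. That intermediate node is essentially what an event class provides.

```python
Triple = tuple[str, str, str]


def as_property(manifestation: str, actor: str) -> list[Triple]:
    """The action as a verb: a single triple links the manifestation to the actor."""
    return [(manifestation, "designedBy", actor)]


def as_event_class(event: str, manifestation: str, actor: str, year: str) -> list[Triple]:
    """The action as a noun: an event node to which product, actor, and date attach."""
    return [
        (event, "rdf:type", "PublicationEvent"),
        (event, "produced", manifestation),
        (event, "carriedOutBy", actor),
        (event, "date", year),
    ]


def dates(triples: list[Triple]) -> list[str]:
    """Collect any date values recorded in a set of triples."""
    return [o for _, p, o in triples if p == "date"]


flat = as_property("ex:crane-poems-1972", "ex:book-designer")
event = as_event_class("ex:pub-event-1", "ex:crane-poems-1972", "ex:book-designer", "1972")
print(dates(flat))   # []
print(dates(event))  # ['1972']
```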

Yee Schema

Martha Yee's set of classes (23 of them, but not the same 23 as FRBRoo) includes Genre/Form as a class. Genre/form seems to be more of an attribute of a work than something that has "thingness" in itself. It's hard to imagine how you can have genre/form without it relating to a work. (By contrast, a person or a corporate body is a thing in and of itself, with a specific, unique identity.)

Yee's schema also has some classes that might be considered sub-classes. For example, Place as geographic area and Place as jurisdictional corporate body would seem to be sub-classes of Place, although Yee does not include Place itself in her schema. I'm less clear about classes such as Corporate Subdivision, which has a part/whole relationship with Corporate Body, not a sub-class relationship. (Sub-class is an "is a type of" relationship. A corporate subdivision is not a type of corporate body; it is part of a corporate body.) The same goes for the subject-related terms: Subject, Subject subdivision, Subject chronological subdivision, Subject form subdivision, Subject geographical subdivision, and Subject topical subdivision. In FRBR, subject is a relationship with the work. These look to me like relationships with the subject heading, although there is no class for subject headings. (That may be what is meant by the class Subject. I don't think it would be a good idea to equate subject with subject heading, though, because that would make it impossible to include classifications or keywords as subjects.)
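The difference between "is a type of" and "is part of" can be shown with a small sketch in plain Python. It assumes a Place superclass that Yee's schema does not actually define, and all class and instance names are hypothetical. Subclass links between classes let type membership propagate upward. A part/whole link between two instances, on the other hand, says nothing about what type the part is.

```python
# Class-level "is a type of" links (hypothetical labels; Yee defines no Place class).
SUBCLASS_OF = {
    "PlaceAsGeographicArea": "Place",
    "PlaceAsJurisdictionalCorporateBody": "Place",
}

# Instance-level typing and part/whole links (hypothetical instances).
TYPE = {
    "ex:france": "PlaceAsGeographicArea",
    "ex:university": "CorporateBody",
    "ex:cataloging-dept": "CorporateSubdivision",
}
PART_OF = {"ex:cataloging-dept": "ex:university"}


def is_a(cls, target):
    """True if cls equals target or reaches it by following subclass links."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


print(is_a(TYPE["ex:france"], "Place"))                   # True
print(is_a(TYPE["ex:cataloging-dept"], "CorporateBody"))  # False
print(PART_OF["ex:cataloging-dept"])                      # ex:university
```

If Corporate Subdivision were declared a sub-class of Corporate Body, the second check would return True. That is exactly the "is a type of" claim that does not hold for a part of an organization.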

What's the upshot? Well, it would take a good sit-down with everyone involved to hash out the differences, to understand what each group or person was thinking, and to see whether we can formulate a theory of how to extend FRBR to meet one's needs. If a number of people turn out to have the same needs, then the FRBR model itself may need to take in those ideas. The only way to work this out is to keep modeling and sharing. So I thank the people behind the three efforts featured here for the extensive work they have done in this area.

5 comments:

Bruce said...

The table is useful. FYI, there is also an extended FRBR ontology.

For background: Rich had started with a rather baroque integrated ontology. When Ian started working with him, he suggested splitting it into core and extended.

Oh, and if you're curious, the music ontology is an interesting creative interpretation of FRBR.

Owen said...

re: use of 'Subject' as class in FRBR in RDF.

I don't have any special insight, but it looks to me that the FRBR in RDF has included 'Subject' as a 'superset' in the same way as Endeavor and ResponsibleEntity - so it isn't meant to replace the idea of showing 'aboutness' by use of relationships. In this way it seems relatively consistent.

Andris said...

Besides a "good sit-down," it would be useful to list hard-to-describe cases as exercises (like "an article in a newspaper," "a translation of a book based on a movie," etc.) that each of us would complete with her or his own vision, or version, of FRBR. RDF/XML (the hierarchical), the Object-lover, and my favourite, the Relational way, are different schools for similar problems. Religions for "believers." It's better to see each "at work," not by itself.

My colleague in Hungary and I are developing a union/social catalog, and we'll probably use the following relational schema to achieve FRBR-like functionality in it, covering the "Work, Expression, Manifestation, Item, Person, Corporate Body" part.

http://www.gliffy.com/publish/1334968/

Works and Expressions are in one table (this gives us flexibility; we don't have to argue about which one a certain thing belongs in). A Work is a Work in the FRBR sense, but series and collections of novels or poems are also Works (several fields will help to describe the kind of Work: novel, short story, article, etc.). Instead of separate tables, flexible connections can be formed between them ("translation of," "based on," "version of," "part of" for series, etc.). Persons, Corporate bodies and Publishers are also in one table (Entity), again with connections between themselves (with possibilities for "part of," "successor of," etc.). Entities contribute to Works through Roles (publisher of, writer, illustrator, etc.). Works are connected to Publications (Books, Serials, web portals) through Content, in an M:N relation. Each edition of a Book is a separate Publication, and so is each individual newspaper issue. Copies are equal to Items.

Maybe my English (and so my choice of words) is not perfect, but I hope it's still understandable, and that you can see the difference: the flexibility of the version I propose.

I'd be happy to hear about hard to describe cases...
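Andris's schema is published only as a diagram at the Gliffy link above. The following is therefore a minimal reconstruction from his prose description alone, assuming SQLite via Python's built-in `sqlite3` module. All table and column names, and the sample rows, are hypothetical. The query at the end uses a self-relation on the work table to answer a question from his list of exercises: which works are translations of a given work, and who translated them.

```python
import sqlite3

SCHEMA = """
CREATE TABLE work (            -- Works and Expressions share one table
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    kind  TEXT                 -- e.g. novel, short story, article, series
);
CREATE TABLE work_link (       -- typed connections between works
    from_work INTEGER NOT NULL REFERENCES work(id),
    to_work   INTEGER NOT NULL REFERENCES work(id),
    link_type TEXT NOT NULL    -- e.g. translation of, based on, part of
);
CREATE TABLE entity (          -- persons, corporate bodies, and publishers
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE entity_link (     -- e.g. part of, successor of
    from_entity INTEGER NOT NULL REFERENCES entity(id),
    to_entity   INTEGER NOT NULL REFERENCES entity(id),
    link_type   TEXT NOT NULL
);
CREATE TABLE role (            -- how an entity contributes to a work
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    work_id   INTEGER NOT NULL REFERENCES work(id),
    role      TEXT NOT NULL    -- e.g. writer, illustrator, publisher of
);
CREATE TABLE publication (     -- one row per edition or per newspaper issue
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE content (         -- M:N link between works and publications
    work_id        INTEGER NOT NULL REFERENCES work(id),
    publication_id INTEGER NOT NULL REFERENCES publication(id)
);
CREATE TABLE copy (            -- copies correspond to FRBR Items
    id             INTEGER PRIMARY KEY,
    publication_id INTEGER NOT NULL REFERENCES publication(id)
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# Hypothetical sample data: a novel, its translation, and one printed copy.
db.executemany("INSERT INTO work (id, title, kind) VALUES (?, ?, ?)", [
    (1, "Original novel", "novel"),
    (2, "Hungarian translation", "novel"),
])
db.execute("INSERT INTO work_link VALUES (2, 1, 'translation of')")
db.execute("INSERT INTO entity (id, name) VALUES (1, 'A. Translator')")
db.execute("INSERT INTO role VALUES (1, 2, 'translator')")
db.execute("INSERT INTO publication (id, title) VALUES (1, 'Hungarian edition')")
db.execute("INSERT INTO content VALUES (2, 1)")
db.execute("INSERT INTO copy (id, publication_id) VALUES (1, 1)")

# Which works are translations of work 1, and who translated them?
rows = db.execute("""
    SELECT t.title, e.name
    FROM work_link AS l
    JOIN work AS t   ON t.id = l.from_work
    JOIN role AS r   ON r.work_id = t.id AND r.role = 'translator'
    JOIN entity AS e ON e.id = r.entity_id
    WHERE l.to_work = 1 AND l.link_type = 'translation of'
""").fetchall()
print(rows)  # [('Hungarian translation', 'A. Translator')]
```

Keeping the link type as data rather than as separate tables is what gives the flexibility Andris describes. New kinds of relationships need no schema change. The cost is that the database itself does not check which link types are meaningful.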

Anonymous said...

Andris:

Thanks for sharing your schema.

Have you worked out a way to add places and geographical data?

If you want exercises, you could try describing a published correspondence (letters), down to the individual autographs.

Years ago, I helped a group of academic editors index Victor Hugo's letters and, given dBase capacities at the time, made choices very similar to yours: persons and corporate bodies ("personnes morales" in French legal terms) were in the same table.
Relations between individuals and bodies were managed through self-joins.
A tricky one: a letter addressed to a family... Where do you stop in the genealogical relations?

The same went for places.

Another tricky thing is describing the dates and durations of relations, together with events.

Andris said...

Alain, I'll think about your examples... And come back later.