The Early Novels Database Project: My first post of the summer!

Welcome to the Early Novels Database blog, summer 2010! This is the second summer Richard and I, both undergraduate students at Swarthmore College, have worked under the direction of Swarthmore Professor Rachel Buurma and our digital/technological specialist, Jon Shaw, on the web-based, open-access database that is "the END," and we're happy to be back on the sixth floor, examining [hilariously titled and] interesting rare books. For more information on the project, please feel free to look through our posts from last summer; for the current beta version of the END, go to http://syslsl01.library.upenn.edu/dla/earlynovels/search.html?q.

This post will be dedicated to what I think is important to get done in the month or so (!!!) of uninterrupted time I have left to work on the project this summer, and also what I will attempt to get done in the fall.

But first, here is what I accomplished this summer while Richard was off getting heatstroke in Tennessee and Jon and Rachel gave birth to/cared for their really cute children: I created 989 fields for 147 records, tagged each noun, adjective, person- and place-name as such, and deleted the words in between that were not any of these things. This was more slow-going than I expected, mostly because I was also looking through the records to see if I could correct any obvious mistakes (and it was hard to do too much of this at one time). I read some articles on 18th-century indexing practices and the history of the book, namely "Fast Bind, Fast Find: The History of the Book and the Modern Collection" by Jeffrey Todd Knight, "Style, Inc. Reflections on Seven Thousand Titles" by Franco Moretti, "Richardson's Economies of Scale" by Leah Price, and "How Literature Becomes Knowledge: A Case Study" by Robin Valenza. I also wrote and submitted a proposal to present our project at an upcoming undergraduate digital humanities conference at Bryn Mawr.

Now that Richard is back in the world of air conditioning and Rachel and Jon are around again, it will be much easier to make some more visible progress on the database. For example, we can add new books to the database, which we began to do last week after we spent a day writing up call slips for 160 books (will it be possible to get all of these added to the database in the month of July? I think so!). Yesterday was spent adding new books to our database and attempting to get XP onto Richard's computer. On Thursday, Jon, Rachel, Richard and I are going to go to Bryn Mawr to look at some of their rare books in the hopes that we will be able to incorporate their collection into our database. So progress is truly being made.

In my opinion, however, there are things that must happen before we can confidently begin adding books to the database; I will now attempt to articulate what these things are so that Richard and I can perhaps accomplish them. Most of them have to do with standardization and record cleanup, things that were impossible last year but are crucial this year.

Firstly, Richard and I need to look through the records that we updated last year and compile lists of terms that we used to describe different aspects of the book. Last year, we had no idea what to expect out of the books we were looking at, since we had never physically encountered 16th- to 19th-century novels before. It surprised us, for example, that there was so much paratextual material in each book--this one has a preface! and a letter to the reader! and a dedication! and an index! and a half title! and so forth. Because of this, last summer's challenge was familiarizing ourselves with 18th-century novels (we primarily worked with novels from 1740-1749) and then figuring out how to identify and then label these unfamiliar and varied parts of the book. Because we didn't know what to expect from the books at the beginning, we ended up tagging traits of the novels in many different ways and with many different phrasings. As the summer went on, Richard and I managed to create our own language of standard definitions, most of which we shared with Rachel and Jon and recorded in a guide online, but some of which we did not, since either they seemed obvious at the time or we simply didn't have a standard set of terms yet. Below is a relatively tame example of what I'm talking about:

You can see here that I have used the "Author gender claim" facet; essentially, I have chosen to sort books by what claims the book makes about the author's gender. Here, there are seven categories that we can choose from: books that claim to have been written by a "Female," a "female," a "Male," a "male," or "Indeterminate," "indeterminate," or "Unknown." Clearly, however, there are really only three options to choose from in this facet: one that means "female," one that means "male," and one that means the book doesn't make any gender claims about the author. From looking at our wiki and what I remember from last summer, by the middle of last summer we decided to use the phrase "Indeterminate" to describe books that don't make any gender claims about the author; we also decided that every one-word phrase that is used to describe something in our database would be capitalized. Therefore, the correct three categories here should be "Male," "Female," and "Indeterminate." Since we decided on this rule of standardization in the middle of the summer, it makes sense that we have a variety of terms here. We created these standards as we went along, since we decided at the beginning that we wanted the database to be defined by the collection, and not the other way around.
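To make the cleanup concrete, here is a minimal sketch of how the seven values in this facet could be collapsed into the three standardized ones. The function and dictionary names are hypothetical, and folding "Unknown" into "Indeterminate" is my assumption rather than a rule we have written down anywhere; we would need to confirm it before running anything like this on real records.

```python
# Canonical values follow the capitalization rule described above.
# ASSUMPTION: "Unknown" is folded into "Indeterminate"; this has not been
# confirmed as a project decision.
CANONICAL_GENDER_CLAIMS = {
    "female": "Female",
    "male": "Male",
    "indeterminate": "Indeterminate",
    "unknown": "Indeterminate",
}


def normalize_gender_claim(value):
    """Return the standardized term for a raw "Author gender claim" value."""
    key = value.strip().lower()
    if key not in CANONICAL_GENDER_CLAIMS:
        # Surface anything unexpected instead of guessing at its meaning.
        raise ValueError(f"Unrecognized author gender claim: {value!r}")
    return CANONICAL_GENDER_CLAIMS[key]


raw_values = ["Female", "female", "Male", "male",
              "Indeterminate", "indeterminate", "Unknown"]
standardized = sorted({normalize_gender_claim(v) for v in raw_values})
print(standardized)  # ['Female', 'Indeterminate', 'Male']
```

Raising an error on unrecognized values, rather than passing them through, means a new variant gets noticed and discussed instead of quietly becoming an eighth category.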

This type of error can be easily cleaned up, since all we have to do is pull up the improperly tagged books and change our phrasing. The more complicated issues lie in categories where we never standardized our phrasing and don't have their own facets yet, such as in the 261 field (printer's stamp). Let's say that we want to be able to search by the location of the printer's stamp in the book. Currently, we have phrased this every which way: "Printer stamp located on the verso of the first title page and on the verso of the last page," "Verso of last page of text," "Located on the verso of the half title and the last page of every volume," "Printer's stamp located at foot of last page," etc. Since we don't have a facet for the location of the printer's stamp, it will be a little more difficult to sort through all of the different ways we have described the printer's stamp location, but it will be possible (and necessary, if we ever want to create a working facet for the printer's stamp location).
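As a rough illustration of how we might start sorting through these phrasings, the sketch below looks for a few known page elements in each free-text note and groups the notes by the elements they mention. The pattern list and function name are hypothetical, and the approach is deliberately coarse: it ignores distinctions such as "verso" versus "foot," which a real facet would probably need to keep.

```python
import re
from collections import defaultdict

# Hypothetical vocabulary of page elements, drawn only from the example
# phrasings quoted above. A real list would come from reviewing the records.
LOCATION_PATTERNS = [
    ("first title page", re.compile(r"first title page")),
    ("half title", re.compile(r"half[- ]title")),
    ("last page", re.compile(r"last page")),
]


def stamp_locations(note):
    """Return the page elements mentioned in a free-text printer's stamp note."""
    text = note.lower()
    return [label for label, pattern in LOCATION_PATTERNS if pattern.search(text)]


notes = [
    "Printer stamp located on the verso of the first title page and on the verso of the last page",
    "Verso of last page of text",
    "Located on the verso of the half title and the last page of every volume",
    "Printer's stamp located at foot of last page",
]

# Group the original notes under each element they mention, for manual review.
by_location = defaultdict(list)
for note in notes:
    for label in stamp_locations(note):
        by_location[label].append(note)

for label, matching in by_location.items():
    print(label, len(matching))
```

Grouping the original notes, rather than rewriting them automatically, keeps a person in the loop to decide on the final standardized wording for each group.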

So, the beginning of our potential to-do list (in order of perceived necessary occurrence):

1. We first need to look through each field of each record and compile lists of all the terms we've ever used to describe the books. Most of this will be easy, since we have the actual END website and our facets to refer to, but it will take time.

2. After this is done, we must make sure that we have a set of standardized terms for each field. Much of this will be review, since we already went over a lot of the terms last year. This will require a meeting with both Rachel and Jon.

3. Then, we must edit the .bib records that are improperly tagged. (And perhaps this might also be a good time to do some more general cleanup, especially since whole fields are most certainly missing from some .bib records).

4. Then, we need to compose a list of all of our standardized terms, and also our formatting rules. Nothing should be left off; this should be The Guide to creating a .bib record. We already began this last year with our "Guide to Transatlantic.bib" wiki page, but it needs to be more current and detailed.

5. This list will lead to the creation of a glossary for the general public. It will describe the reasoning and meaning behind every facet/phrase we use. I definitely think that we should aim to get to this step by the end of this summer.
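Step 1 in particular could be partly automated. The sketch below assumes each record has already been read into a dictionary mapping a field name to a list of values; that representation, the field name, and the function names are all hypothetical, since this post doesn't describe how the .bib records are actually stored. It counts every term used in every field and flags terms that differ only in capitalization.

```python
from collections import Counter, defaultdict


def collect_terms(records):
    """Count how often each term appears in each field across all records."""
    terms = defaultdict(Counter)
    for record in records:
        for field, values in record.items():
            for value in values:
                terms[field][value] += 1
    return terms


def case_variants(counter):
    """Return groups of terms that are identical apart from capitalization."""
    groups = defaultdict(set)
    for term in counter:
        groups[term.lower()].add(term)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


# Hypothetical sample records; real ones would be parsed from the .bib files.
records = [
    {"author_gender_claim": ["Female"]},
    {"author_gender_claim": ["female"]},
    {"author_gender_claim": ["Male"]},
]

terms = collect_terms(records)
print(dict(terms["author_gender_claim"]))
print(case_variants(terms["author_gender_claim"]))
```

The per-field counts would give us the raw lists to bring to the meeting in step 2, and the capitalization check would catch the easiest inconsistencies before we get there.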

Of course, it will be possible to add books to the database while we are completing these steps; however, it might make more sense to clean up the records first, so we aren't adding mistakes upon mistakes. (At the same time, I think it is important to get more books into the database as soon as we can, so I'm not sure what I think about this last part.) Since there are always characteristics of novels that fail to be defined by preexisting terms, we will certainly need to add more terms to the database as we go along; we can always add terms to our glossary with little trouble.

These are all thoughts in progress, so please comment/criticize. I'll update again if I have more thoughts on what we should be doing now besides adding books to the database (and, you know, starting that article that's due at the end of the summer...should we talk about that soon?). Also, I didn't write about re-wording facets; that's my next in-progress blog post, which will focus on the problematic (?) 592 field. More soon!

-Anna

Tuesday, June 29, 2010
