An alternative archive of 1760s novels, in a series of Vines

This project aims to create an alternative archive of Vine videos on the subject of 1760s self-labeled novels (books that include the term “novel” in their titles). It was conceived as a response to a data problem in more traditional archives relating to these books, one that I believe reflects a deeper problem in those archives and, by extension, in the research that takes place in them: data and metadata tend to be available for works that we have historically valued, so research centers on those works, and canonical, traditional views of literature and its history are cyclically perpetuated (more on that here). I contributed to this problematic cycle when I eliminated the genre category of “novel,” and with it a large group of canonically “unimportant” books, from a project I worked on this past summer; I was deterred by the historically determined data problems I encountered (specifically, the lack of OCR-digitized copies of these works). With these videos, I hope in some sense to work against this cycle, first by creating an alternative archive to record and preserve 1760s novels, and second by involving as many people as possible in a project and a conversation around these works. The project is inspired by and connected to END’s metadata project; hopefully, this archive and the linked END archive will provide alternative sources of information on often-ignored works.

The Vine archive is available here!

Adventure, History, Letters, and Memoir: Mapping title to text in the 18th century novel

Eighteenth-century fictions often announce their genre in their titles: adventures, memoirs, etc. But what, if anything, do these “genre keywords” in titles actually indicate about the texts? Because researchers in the digital humanities frequently use metadata, like these titles, as a representation of a full work, it is important to investigate the connection between these two elements (here, title and text). To begin to analyze connections between title metadata and full-text data, I focused on the 1760s, using a small dataset of texts specifically from 1760-1770. These texts represented the entirety of the “genred” texts I found by searching clean full-text databases for the titles on a full list of 1760s fiction in English, compiled by gathering titles from ESTC and Raven’s bibliography of English fiction 1750-1770. I encountered significant data problems here; see this blog post about those. My analysis of the four genres I ultimately worked with (adventure, history, letters, and memoir) suggests some very preliminary conclusions about the connection between a text’s content and the genre indicated in its title in the 1760s.



I collected as many titles as I could find that were published from 1760-1770 and included one of four “genre keywords,” counting both titles first published in that decade and reprints. I chose these four keywords after making a wordcloud of the full list of 1760s fiction titles compiled by END;[1] the wordcloud helped me identify some of the major repeated keywords that might plausibly indicate some kind of category for the books they labeled. “Novel” was originally a fifth category that I planned to use as one of my genres, but (see separate END post on this) I had to eliminate it due to a lack of available clean full texts of “novels” from this period. Some of the works I found fell into multiple genre categories; I assigned each of these to a single category, with an eye to keeping the categories as even in size as possible.[2]
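The keyword search described above can be approximated in a few lines. The sketch below is a minimal illustration, not the procedure used for this project: the regular expressions (which also match plurals such as “memoirs” and “histories”) are assumptions, and `tag_title` and `tag_titles` are hypothetical names. Titles matching more than one keyword are flagged rather than assigned automatically, leaving that choice to the researcher.

```python
import re

# Hypothetical patterns for the four genre keywords; matching plural forms
# ("adventures", "histories", "letters", "memoirs") is an assumption.
GENRE_PATTERNS = {
    "adventure": re.compile(r"\badventures?\b", re.IGNORECASE),
    "history": re.compile(r"\bhistor(?:y|ies)\b", re.IGNORECASE),
    "letters": re.compile(r"\bletters?\b", re.IGNORECASE),
    "memoir": re.compile(r"\bmemoirs?\b", re.IGNORECASE),
}


def tag_title(title):
    """Return the sorted list of genre keywords found in a title."""
    return sorted(g for g, pat in GENRE_PATTERNS.items() if pat.search(title))


def tag_titles(titles):
    """Split titles into genred titles and those matching several genres.

    Titles with no genre keyword are dropped; multi-genre titles are
    returned separately so they can be assigned to one category by hand.
    """
    genred = {}
    for title in titles:
        genres = tag_title(title)
        if genres:
            genred[title] = genres
    multi = [t for t, gs in genred.items() if len(gs) > 1]
    return genred, multi


if __name__ == "__main__":
    titles = [
        "The History of Mr. Cleveland",
        "The adventures of Peregrine Pickle. In which are included, "
        "Memoirs of a lady of quality",
        "Dialogues of the dead",
    ]
    genred, multi = tag_titles(titles)
    for title, genres in genred.items():
        print(genres, title)
    print("multi-genre:", multi)
```

On these three titles, Peregrine Pickle is flagged as both adventure and memoir, the kind of multi-genre title footnote 2 describes, while Dialogues of the dead carries no genre keyword and is dropped.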

I ran a topic model across all my files,[3] which produced two pieces of output that I used in this analysis: a “composition” file, which shows what percentage of each individual piece of text in the corpus falls into each “topic”; and a “topic key,” which lists the topics (sets of words that probabilistically appear together throughout the corpus) and the relative “weight,” or prominence, of each. Working from this composition file, I took the individual texts from each genre and recalculated the percentage with which each genre appears in each topic. These are the percentages that appear throughout the analysis. Each topic is actually very large, since all the words (minus stop words) in the entire corpus are divided amongst the topics, but the topic key shows only the twenty most significant words of each topic; my analysis treats these words as indicative of the full topic. The topics in the key each have a numerical label, but because the computer generates them probabilistically, without any understanding of their “meaning,” it is up to the researcher to determine what each topic actually means.
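To make the recalculation step concrete, here is a minimal sketch of one plausible way to average MALLET’s composition output by genre. It assumes the tab-separated doc-topics layout written by recent MALLET releases (document index, document name, then one proportion per topic in topic order); older releases write sorted topic/proportion pairs instead, which this sketch does not handle. The directory-per-genre naming in `genre_of` is a hypothetical convention, and averaging chunk proportions (rather than, say, weighting by chunk length) is an assumption about how the genre percentages were computed.

```python
from collections import defaultdict


def read_composition(lines):
    """Parse doc-topics lines into (doc_name, [proportion per topic]) pairs.

    Assumes tab-separated rows: doc index, doc name, proportion for topic 0,
    proportion for topic 1, ... Header lines starting with '#' are skipped.
    """
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        rows.append((fields[1], [float(x) for x in fields[2:]]))
    return rows


def genre_topic_percentages(rows, genre_of):
    """Average each topic's proportion over a genre's documents, as percentages."""
    sums = {}
    counts = defaultdict(int)
    for name, props in rows:
        genre = genre_of(name)
        if genre not in sums:
            sums[genre] = [0.0] * len(props)
        for k, p in enumerate(props):
            sums[genre][k] += p
        counts[genre] += 1
    return {g: [100 * s / counts[g] for s in vec] for g, vec in sums.items()}


def genre_of(doc_name):
    """Hypothetical convention: chunks live in one directory per genre,
    e.g. file:/corpus/history/some_book_000.txt -> "history"."""
    return doc_name.split("/")[-2]


if __name__ == "__main__":
    # Toy rows with invented proportions, for illustration only.
    sample = [
        "0\tfile:/corpus/history/a_000.txt\t0.5\t0.5",
        "1\tfile:/corpus/history/a_001.txt\t0.1\t0.9",
        "2\tfile:/corpus/letters/b_000.txt\t1.0\t0.0",
    ]
    print(genre_topic_percentages(read_composition(sample), genre_of))
```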

This meaning-assignment stage of analysis is one where the researcher’s subjective interpretations can get confused with the computer’s “objective” output. Throughout the analysis, I try to refer to the contents of each topic and explain the logic behind the “working titles” I used to compare the different topics and by extension the genres. But these working titles inevitably limited (even as they enabled) my analysis. For example, I called topic 1 “general positive human world,” certainly an interpretive leap from the computer-generated cluster of words in my topic key. For this and many of the other topics, I could have interpreted the key differently, or even given the topic a slightly different “name” and thus worked with the corpus at large differently.

The initial comparison of these title-defined genres revealed some skewing of the data in both the memoir and letter categories. 29% of memoir appeared to be in a single topic in which no other genre appeared, and 35% of letters likewise appeared in a single topic in which each of the other genres showed only 1%. Each of these topics seemed to correlate with a single work (in letters, Pamela, and in memoirs, The adventures of Peregrine Pickle. In which are included, Memoirs of a lady of quality), both of which were originally published before the 1760s, both of which were very long compared to the other works in their genre categories, and one of which included two genre keywords in its title. All of these factors may have contributed to their skewing of the data; to deal with the problem, I recalculated the percentages in each topic for the two genres without these particular works. All further analysis uses these altered percentages. The drawback of this solution is that it makes my corpus of texts in both categories smaller, and so more subject to the particularities of other individual works, something I tried to bear in mind throughout the analysis. Another note on the particularities of this data as a product of topic modeling: some of the topics might more accurately be split into two topics that happen to appear together frequently, and some pairs of topics are practically identical and could easily be combined into one; this, too, I tried to account for in the analysis.
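The check for a single dominant work, and the recalculation without it, can be sketched as follows. This is a hypothetical illustration: the `(work, genre, proportions)` record layout, the function names, and the toy numbers are all invented for the example, and averaging over chunks is again an assumption.

```python
from collections import defaultdict


def work_shares(docs, genre, topic):
    """Share of a genre's total weight in one topic contributed by each work."""
    totals = defaultdict(float)
    for work, g, props in docs:
        if g == genre:
            totals[work] += props[topic]
    grand = sum(totals.values())
    return {w: v / grand for w, v in totals.items()} if grand else {}


def genre_percentages(docs, genre, exclude=()):
    """Mean topic proportions (as percentages) for a genre, skipping excluded works."""
    kept = [props for work, g, props in docs if g == genre and work not in exclude]
    if not kept:
        return []
    return [100 * sum(p[k] for p in kept) / len(kept) for k in range(len(kept[0]))]


if __name__ == "__main__":
    # Toy chunks: two from a long "work_A", one from "work_B" (invented values).
    docs = [
        ("work_A", "letters", [0.2, 0.8]),
        ("work_A", "letters", [0.1, 0.9]),
        ("work_B", "letters", [0.9, 0.1]),
    ]
    print(work_shares(docs, "letters", 1))  # work_A supplies most of topic 1
    print(genre_percentages(docs, "letters"))
    print(genre_percentages(docs, "letters", exclude={"work_A"}))
```

Because a long work is split into many chunks, it can dominate a topic simply by contributing more documents; `work_shares` makes that visible before deciding to exclude it.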




Although the majority of my analysis is genre-based, a few interesting topics showed up at similar rates in all of the genres, some of them in very large quantities. The first, and largest, of these is topic 1, which I gave the working title “positive present human world.”

This is what that looks like in MALLET’s topic key:

1     1.23702    make give time good great life world present part mind thought person reason find pleasure manner love till kind heart

Broken down by genre, every category of texts falls into this topic at between 19% and 22%. This seems to suggest that all the texts approach present human life as something ultimately positive: pleasurable, kind, reasonable. The verbs here are make, find, and give, which suggest that action in this positive world is generative and creative, with a purpose that propels it into the future while remaining positive in the moment. More on this specifically in the analysis of the histories.

Another topic, much weaker (9% in letters and 4% in all the other genres) but still notably evenly distributed, was topic 11, which I dubbed “English people and especially men as intelligent and powerful.”

11    0.31762    man country men people letter great nature learning genius english proper beauty public human found author wisdom history taste china

This topic seems to associate wisdom, genius, and perhaps a history of those things with nature and with English people, particularly men. It presents a nationalistic, patriarchal, and (again) positive understanding of the world, provided the world is premised on those nationalistic, patriarchal terms. This is a weak topic, but its relatively even distribution suggests that the “weakness” of the topic reflects its lack of centrality in any particular text: it isn’t the “point” of any of the narratives, but rather a simply accepted fact in all of them, always present but never prominent, just as the kind of implicit assumptions the topic suggests tend to be.


The adventure texts, along with the history texts, grouped very clearly in a certain set of topics (perhaps because they are the categories I have the best data on, perhaps for other reasons – more on that when I get to the more ambiguous “letter” and “memoir” categories).

By far, adventure had its highest percentage in topic 19, at 25%, and (leaving out topic 1) topic 16, at 14%. Topic 19, which I named “public masculine economic activity,” looks like this:

19    0.10232    master guinea adventures sir made directly proper general chapter moment success power service business gave raised nature make human money

It suggests that “adventures” show up around business and money, and that they yield success, with power mixed in somewhere, perhaps in the conditions or the yield of adventures. All of this is (or at least shows up around things that are) natural, human, and proper. The topic is “masculine” in that the few person-indicators here are masculine, but probably first and foremost because the public economic world in which “adventures” seem to take place is predominantly masculine in this period.

Topic 16 is structured similarly but set in a different location: if 19 is the public economic world that adventures inhabit, 16 is the social world they inhabit:

16    0.65228    made man time gentleman place money company day young put house immediately gave master set honour friend people good gentlemen

As with topic 19, 16 is masculine-specific, but it focuses on the home; where the economic movements of the adventure meet the social world, they result in this “masculine domestic” topic, which is interesting in the context of debates that often focus on hard-defined edges between male-female and public-private binaries. There aren’t many verbs visible in the topic, which implies a contrast to the economic activity that dominated topic 19. But the non-verb words that do appear in the topic suggest movement and action (e.g. day and immediately), and the kind of action they suggest shows how the masculine domestic plays into the masculine public of topic 19. Immediately and day imply an outward focus, the possibility of motion in the future, pointed towards windows signaling morning; even the word “company” suggests a porous boundary between the home and the outside world. These promises of motion and references to the outside, without the overt action of verbs, suggest that the masculine domestic gives adventures a connection to, and perhaps a purpose within, the social schema they largely eschew, while still decentering those things in their narratives (this may function similarly to the socially defined positive qualities like “proper” that appear with adventures in topic 19). Topic 2, which references family positions and roles of both genders, is adventure’s third-largest topic at 11%, yet this is among the lowest genre showings in that topic. And adventure is the only genre with 0% in all the other topics that suggest social/familial roles and relationships. Adventure is focused on a world that lies outside the social world of family and women in general, and it is perhaps for that reason that its characters seem to embody the norms of the social world: they are “gentleman” (although this could refer to their social position rather than their social behavior); they display “honour”; and they are significantly attached to their homes and households.

It is surprising that these proper, active men seem to be openly pursuing economic activity; markedly absent are words that signify glory. Although adventure shows up at 6% in a maritime-focused topic, #7, the words there seem to signify means towards an end rather than the “ends” we associate with military narratives and often with adventure: commodore, hatchway, and consequence rather than glory, justice, freedom, etc. Topic 19 is also one of two topics (the other is concentrated in history) that include both the present and past tense of the verb “make.” It is worth noting that “make,” counting both its present and past (“made”) forms, is the most frequent verb to appear in the top 20 words across all topics (what we see in the “topic key”), with 7 instances. But this particular combination of make and made in one topic suggests something generative in whatever else is happening in the topic, something that was creative in the past tense and moves forward into the present. It isn’t completely clear here what is being “generated” or made in this topic; perhaps it is money, perhaps it is the kind of proper rather than dangerous masculinity that adventures seem to rely on. Or perhaps it is the social order that adventure stakes a strong, if inattentive, claim to.
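Checks like the make/made comparison above are easy to automate against the topic key. The sketch below parses key lines in the layout quoted in this post (topic number, weight, top words; MALLET documents the second column as each topic’s Dirichlet parameter) and finds topics whose top words contain every form of a verb. The function names are hypothetical, and only the six topic-key lines quoted in this post are used as data, so its counts cover those six topics, not the full 20-topic key.

```python
# Topic-key lines quoted in this post (topics 1, 11, 19, 16, 9, and 15).
QUOTED_KEYS = """\
1     1.23702    make give time good great life world present part mind thought person reason find pleasure manner love till kind heart
11    0.31762    man country men people letter great nature learning genius english proper beauty public human found author wisdom history taste china
19    0.10232    master guinea adventures sir made directly proper general chapter moment success power service business gave raised nature make human money
16    0.65228    made man time gentleman place money company day young put house immediately gave master set honour friend people good gentlemen
9     0.07818    sir man dear miss lady madam lord love charles good heart harriet lucy woman brother made hand hope make tho
15    0.69551    hand eyes night head face replied hands door found began room time heard soul bed lay left stood fell cried
"""


def parse_topic_keys(text):
    """Parse key lines into {topic number: (weight, [top words])}."""
    topics = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        topic_id, weight, words = line.split(None, 2)
        topics[int(topic_id)] = (float(weight), words.split())
    return topics


def topics_with_all(topics, forms):
    """Topic numbers whose top words include every form in `forms`."""
    return sorted(t for t, (_, words) in topics.items() if set(forms) <= set(words))


def count_forms(topics, forms):
    """Total occurrences of any of `forms` across all topics' top words."""
    return sum(words.count(f) for _, words in topics.values() for f in forms)


if __name__ == "__main__":
    topics = parse_topic_keys(QUOTED_KEYS)
    print(topics_with_all(topics, {"make", "made"}))  # [9, 19]
    print(count_forms(topics, {"make", "made"}))      # 6 across these six topics
```

On the quoted lines, the check returns topics 9 and 19, the same pair discussed in this analysis.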


When I broke down the topic model I ran across all the texts by genre, the genre group with the widest distribution across the topics was history (history texts appear in 16 of the 20 topics, followed by memoir at 15 and letters and adventure at 13). This means that “high” percentages in history are comparatively lower than in other genres. The topics in which histories cluster most significantly, however, could all be grouped together as focused on social categories. Aside from topic 1 (21%), the top topics for history are 2, 4, 9, 12, and 15. Topic 2, the family-role topic that was comparatively weak in adventures, is at 15% for histories. Topic 12 is a similar family-specific topic, but with a focus on a male-led family household (father, master, etc., without female equivalents like mother or madam). Topic 4 (8%) seems to delineate social roles (sir, gentleman, madam) in combination with speech and social qualifiers like age (“young”) and “manner,” but eschews family-specific social roles (mother, father); it seems to present the public rather than the private face of social interaction. Topic 8, at 4%, is very similar, also noting social (but not family) roles, conversation, and youth. It is interesting that the specific age mentioned in these two topics is youth; perhaps this is because that is the age most worth noting in a character, or perhaps because the focus of histories is on youth (within the context of their social world and families).
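Counting how many topics a genre “appears in,” and ranking its strongest topics, might look like the sketch below. Treating a topic as present when the genre’s percentage rounds to at least 1% is an assumption, as are the function name and the toy numbers; list positions stand for topic numbers.

```python
def genre_profile(percentages, skip=(1,), min_pct=1, top_n=5):
    """Summarize one genre's topic percentages (list index = topic number).

    Returns (number of topics the genre appears in, strongest topics),
    where a topic counts as present if its percentage rounds to at least
    `min_pct`; topics in `skip` (e.g. the general topic 1) still count as
    present but are left out of the ranking.
    """
    present = [k for k, p in enumerate(percentages) if round(p) >= min_pct]
    ranked = sorted(
        (k for k in present if k not in skip),
        key=lambda k: percentages[k],
        reverse=True,
    )
    return len(present), ranked[:top_n]


if __name__ == "__main__":
    # Toy percentages for topics 0-6 (invented for illustration).
    toy = [0.0, 21.0, 15.0, 0.4, 8.0, 10.0, 2.0]
    print(genre_profile(toy, top_n=3))  # (5, [2, 5, 4])
```

The threshold matters: a genre’s topic “count” can shift depending on whether a fraction of a percent is treated as an appearance.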

This connects to several interesting qualities of topic 9, which is almost exclusive to histories, at 8% (1% for both letters and memoir, 0% for adventure). It is the only topic that seems to focus on romantic relationships: not, notably, on the emotions of romance, but on its formal social elements, which appear in words like “love,” “hand,” “dear,” and “hope.” But the topic is also, along with the adventure-heavy topic 19, one of the two topics to include both the past and present forms of the verb to make. What is being “generated” in this topic is more suggestive than in topic 19:

9    0.07818    sir man dear miss lady madam lord love charles good heart harriet lucy woman brother made hand hope make tho

The combination of a socially sanctioned and protected romance with a generative quality, in the wider context of the social-role-focused and perhaps youth-focused histories, suggests that what is generated in these histories (the aim of the socialization of youths through their families) is the regeneration of the social structure from the past into the present.

Topic 15, also notably strong in history (10%), is the only other topic that seems to approach relationships, here physical, bedroom-located ones; it probably denotes either romantic/sexual relationships or emotional lying-on-the-bed-crying-or-praying scenes.

15    0.69551    hand eyes night head face replied hands door found began room time heard soul bed lay left stood fell cried

If it denotes the former, this topic, unlike topic 9, is neither generative nor positive. “Good” and “hope” can accompany “hand” in topic 9, and even “love” is present, but in topic 15, where the body is expanded through four body signifiers and the verb “lay,” urgency and negative emotion replace positivity: “cried” is combined with “left” and “fell.” The high level of motion here isn’t directed the way the simple “make” is in topic 9; “began” is combined with leaving and falling and finding and crying. Topic 15 is slightly stronger than topic 9 amongst the histories, but it is also evident in all of the other genres, while topic 9 is almost exclusive to histories. Topic 15, then, might represent implicit negative associations with undirected, uncontrolled sexuality or emotions, while topic 9, and the social history, represent a means of controlling them. If the topic instead represents religious emotion, the chaotic motion and physicality of the topic are actually socially harnessed and controlled. In that case, the topic might rather indicate the otherwise negative and dangerous passions and directionless energies that religion contains.


The topics in which letters cluster, excluding the general topic 1, are 5 (politics and war from the social position of the aristocracy), 11 (men and intelligence), and 14 (a topic focused on “you” in different variations, for example “thee” and “thou,” and on proper names). Letters also have a notably low presence, 7%, in topic 2 (family roles).

Because the corpus of letters, with one text removed, was very small, I am wary of assuming these results reflect qualities of letters in general rather than of specific texts. Topics 5 and 11 particularly seem like they may have been skewed towards their high percentages (23% and 9%, respectively) by particular texts: one of the “letters” is a history of England told through letters between a nobleman and his son, hence 5, and two are letters between two men, whose mutual compliments may account for the showing in 11.

The skewing here, first the extreme skewing from one text and then the possibility that the remaining texts are still skewing the results, may be a product of the fact that a letter is a form as well as a “genre.” As a form, “letters” can be filled with different kinds of content, across the 1760s and certainly over time.

If letters are taken as a form that underlies what I have decided to call “genre,” their focus on “you” and on proper names (topic 14), combined with their diminished focus on the family and generally low showings in all the social role/relationship topics, may suggest something about a particular perspective inherent in the (mostly second-person) letter.

Letters approach topics through the filtering lens of a particular personal relationship, addressing “thou” and using many personal names. If the names stand in for individuals other than “thou,” then perhaps the personal relationship on which a letter is based leads to a more personal focus on other individuals immediately present in that filtering relationship. This personal focus doesn’t mean that letters are “emotional” or intimate in the way we might imagine a personal relationship to be, because the text itself is not necessarily about the personal relationship (it can be about anything). It simply means that the approach to the topic runs through a particular relationship and its particularities rather than through a larger social schema.


The memoirs, like the histories, were fairly evenly distributed, with low percentages across the topics. But they were distinctively strong in two topics: 2 (28%), the “family positions and roles” topic, and 14 (10%), the “you/proper names” topic. This is interesting because the only other genre with a significant showing in 14 is letters, and in the case of letters, a high percentage in 14 is paired with a distinctly low showing in topic 2 and in general across all the social category topics, both family-centric and more general or public. Memoir, by comparison, has a fairly even and high distribution across the social category topics, second to history and ahead of the low letter showing and the almost-absent adventure showing. This is an interesting pairing, then: unlike letters, which perhaps privilege the one-to-one personal relationship over familial and social relationships, memoir seems to privilege one-to-one relationships in addition to social and, especially, familial relationships (hence its high presence in topic 2). If a memoir is generally expected to focus on the life of an individual, whatever that individual’s life might contain, this is perhaps surprising. But it makes sense that immediate relationships, to “you”s and to the family, take extra precedence, followed by broader social relationships. This might place memoir somewhere between the “form” that unites letters and the more clearly “genred” histories and adventures. Any life can be recorded in a memoir, but that life, at least in 1760s memoirs, seems to start with one-to-one relationships, expand to familial relationships, and generalize into social relationships.

[1] I also made wordclouds of the 1760s titles in the END database and of all the titles END has catalogued thus far, which span the 18th century. The results [available in this public file] for END’s 1760s titles and for all 1760s titles are approximately the same, which suggests that the END database is a fairly representative sample of the decade’s titles! The 18th-century results are unsurprisingly different, with, among other things, notably higher rates of romances and tales.

[2] I chose not to double count these texts in an effort to keep my categories as even as possible with the texts available, so as not to overweight certain genres in the topic modeling output, but with a better dataset I would have preferred to double count these multi-genre texts.

[3] The program I used for this topic modeling was MALLET, an open-source program created at UMass Amherst, run with its default settings (stop word list, 1000 iterations, etc.). It runs best with a large number of shorter pieces of text, so I split all the books I was working with into 500-line documents before feeding them into the program. The data that I got from MALLET and used in this analysis is available [here.]
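The 500-line chunking step described in footnote 3 can be sketched in Python. This is a minimal sketch under stated assumptions: the function names, the UTF-8 encoding, and the output file naming are hypothetical, and the MALLET commands in the closing comment show one way to import the chunks and train a model with 20 topics (the number this analysis works with), not the exact invocation used here.

```python
from pathlib import Path


def split_into_chunks(text, lines_per_chunk=500):
    """Split a text into consecutive pieces of at most `lines_per_chunk` lines."""
    lines = text.splitlines(keepends=True)
    return [
        "".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def write_chunks(src, out_dir, lines_per_chunk=500):
    """Write each chunk of `src` to out_dir as <stem>_000.txt, <stem>_001.txt, ..."""
    src, out = Path(src), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks = split_into_chunks(src.read_text(encoding="utf-8"), lines_per_chunk)
    for n, chunk in enumerate(chunks):
        (out / f"{src.stem}_{n:03d}.txt").write_text(chunk, encoding="utf-8")
    return len(chunks)


# One possible way to hand the chunks to MALLET afterwards (assumed paths):
#   bin/mallet import-dir --input chunks --output corpus.mallet \
#       --keep-sequence --remove-stopwords
#   bin/mallet train-topics --input corpus.mallet --num-topics 20 \
#       --output-topic-keys keys.txt --output-doc-topics composition.txt
```

The last chunk of each book is usually shorter than 500 lines; whether to merge a very short tail into the previous chunk is a design choice the sketch leaves open.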

The Preface Project

[Abstract] The Preface Project is a multi-modal digital archive exploring the relationships among truth claims, direct address of the reader, and authorial voice in the prefaces of 1760s novels. The archive conceives of prefaces as products of and catalysts for relationality, while weaving together new layers of enmeshment through curated cataloging and audio, visual, and textual digital reproductions. Through the proliferation and linking of metadata, the multimedia presentation of the prefaces, and the open-access publication of the archived materials, the Preface Project generates new networks, entangling them with the conversations and relations of 1760s prefaces. The materials are currently housed in a publicly accessible Google Drive folder; by the spring of 2016, they will be transformed into an Omeka exhibition.

Page viii-ix, preface of Mr. Cleveland.

There he first shewed me his father’s papers, which gave me so much pleasure and satisfaction, that I was very urgent with him to have them printed, persuaded that they would be, a very acceptable present to the public. The only objection he made to my proposal, was, the confused method in which they were writ, and the difficult task it would be to digest them…[Mr. Cleveland]

This quote is an excerpt from the preface that launched a thousand XML record searches. It comes from one of the first books I cataloged, The History of Mr. Cleveland. I expected a preface about the “natural born son of Oliver Cromwell” to be salacious and scandalmongering. Instead, it provides a staid and detailed backstory of the book: how the memoirs of Mr. Cleveland came to be edited, printed, and bound as a book for public sale. It elaborates on the staunchly virtuous character of Mr. Cleveland, who, according to this preface, is nothing like the Merry King Charles II type of rake that I imagined. One should not, however, according to the preface, rely on imagined fancies. The preface claims authorship and historical existence for Mr. Cleveland, and it makes these truth claims through direct address of the reader: “…The histories of… private persons…serve as an excellent lesson to all who are desirous of avoiding those rocks on which others have split, and of meriting the highest character to which human nature can attain, that of wise men. That the following piece may justly be ranked among the latter, will, I believe, be readily granted by all judicious readers.” The reader! Seeing those words was like being called out of hiding; I was repositioned from a pair of occulted eyes spying on an unaware text to a participant in a conversation across space and time. The preface acknowledges the reader’s role in the book and in the transmission of its content; this acknowledgement signifies a mutual constitution of textual and extratextual worlds. This experience inspired a desire to learn more about how 18th-century novels conceived of the reader. And so was planted the seed for my project, an archive probing the triangulation and construction of readership, truth claims, and authorial voice in the prefaces of 1760s novels.

First page, preface of Dialogues of the dead.

Lucian among the ancients, and among the moderns Fenelon, Arch-bishop of Cambray, and Monsieur Fontenelle, have written Dialogues of the Dead with applause. But in our language nothing of that kind has been published worthy of notice: for the very ingenious and learned dialogues written by Mr. Hurde are all supposed to have past between living persons. The plan I have followed takes in a much greater compass…[Dialogues of the dead]

The centerpiece of the project was the creation of digital copies of prefaces from 1760s novels, to address the fact that digital repositories of early novels, like Google Books, HathiTrust, and ECCO, do not consistently include paratext in their digitization and OCR processes. I obtained my sample by searching the 520 and 500 fields (the MARC summary and general note fields) of END’s catalog records, metadata without which I would not have been able to execute this project. The sample is based on the 1760s novels held in the British and American Fiction Collection of the University of Pennsylvania’s Rare Book and Manuscript Library; on novels drawn from that collection and cataloged by END; and on novels whose prefatory invocations of the reader were caught by catalogers.
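A search of the 500 and 520 fields like the one described above could be scripted against MARCXML exports of catalog records. The sketch assumes records in the standard MARC 21 slim namespace and matches a plain, case-insensitive substring in subfield a; the function names, the default search term, and the sample record are hypothetical, and END’s own records or tools may be organized differently.

```python
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"


def notes_mentioning(record, term, tags=("500", "520")):
    """Return (tag, text) for each subfield $a in the given note fields containing `term`."""
    hits = []
    for field in record.findall(f"{MARC}datafield"):
        if field.get("tag") not in tags:
            continue
        for sub in field.findall(f"{MARC}subfield"):
            text = sub.text or ""
            if sub.get("code") == "a" and term.lower() in text.lower():
                hits.append((field.get("tag"), text))
    return hits


def find_records(xml_text, term="reader"):
    """Return (001 control number, hits) for each record whose notes mention `term`."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iter(f"{MARC}record"):
        hits = notes_mentioning(record, term)
        if hits:
            control = next(
                (c.text for c in record.findall(f"{MARC}controlfield")
                 if c.get("tag") == "001"),
                None,
            )
            results.append((control, hits))
    return results


if __name__ == "__main__":
    # A made-up one-record collection, for illustration only.
    sample = (
        '<collection xmlns="http://www.loc.gov/MARC21/slim"><record>'
        '<controlfield tag="001">rec1</controlfield>'
        '<datafield tag="520" ind1=" " ind2=" ">'
        '<subfield code="a">Preface addresses the reader directly.</subfield>'
        '</datafield></record></collection>'
    )
    print(find_records(sample))
```

A substring match on “reader” also catches “readers,” which suits a broad first pass; the hits still need to be read by hand, since a cataloger’s note can mention readers without the preface invoking them.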

To make the prefaces as accessible as possible, I photographed and transcribed them. This means that questions of emphasis (are certain words printed with capital letters or set off from the rest of the text, for instance) can be resolved by looking at the photographs, while issues of searching (for example, how a researcher is supposed to know which prefaces may be relevant to their line of study) can be resolved by running a Command-F search on the transcriptions. To the greatest extent possible, a balance has been struck between preservation of the preface-as-book object and the operationalization of the preface for research.

With the assistance of other END team members, I also created audio recordings of the prefatory texts. The audio recordings demand time and deep listening: a close ‘reading’ of the prefaces, an attention to the prefaces as works of literature and not mere addenda. Hearing the texts read aloud highlights their invocatory nature: the strength of a narrative, authorial voice in the preface, and the importance of that voice’s interaction with the reader.

First page, preface of the Faithful Fugitives.

As curiosity is natural to the mind of man, and as every thing which tends to excite, without satisfying it, must prove, in some degree irksome, I have thought proper to give the reader some account how these memoirs fell into my hands. [Faithful fugitives]

Because utility for other scholars is one of my primary goals for the archive, I have kept the new digitized representations of the prefaces I have created (images, audio files, and transcriptions) connected with the catalog metadata for the prefaces’ books. This ensures that future scholars can place the prefaces in the contexts of their respective books. The archive elevates the prefaces without divorcing them from the novels in which they were published in the eighteenth century. Researchers can easily view information about the other paratext in the book (footnotes, table of contents), the narrative form of the main text, the people associated with the writing and publication of the book, and so on. Transparent metadata also allows a scholarly conversation to take place around my archive and the future exhibition. Without context for the paratext and clear paths back to my sources, it would be easy for me to make wild claims about prefaces, and no one could engage with or critique my work unless they sorted through the entire collection of British and American fiction at Penn by hand.

First page, the preface of the Hermit.

The preface. Truth and fiction have, of late, been so promiscuously blended together, in performances of this nature; that, in the present case, it seems absolutely necessary to distinguish the one from the other. If Robinson Crusoe, Moll Flanders, and Colonel Jack, have had their admirers among the lower rank of readers; it is certain, that the morality in masquerade, which may be discovr’d in the travels of Lemuel Gulliver, has been an equal entertainment to the superior class of mankind. Now it may, without the least arrogance, be affirmed, that tho’ this surprising narrative be not so replete with vulgar stories as the former, or so interspersed with a satirical vein, as the last of the above-mentioned treatises; yet it is certainly of more use to the public, than either of them, because every incident, herein related, is real matter of fact. [Hermit]

“Promiscuously blended together” is an apt description for the relationship among fact, fiction, reader identity, authorial voice, and textual authority in the 1760s prefaces I have encountered. Prefaces are places for the provision of the context of a story (how it came to be written, why it is published with a frontispiece) and for the establishment of legitimacy (how a story will improve the reader’s morals, why moral improvement is utterly irrelevant in an entertaining tale, why its use of ‘pagan’ allegories is not heretical). This is somewhat obvious.

More illuminating is a focus on the fundamentally conversational nature of prefaces. They are not the print version of a single voice piping context and the air of legitimacy into the willing ear of a generic reader. Rather, the prefaces are a melee of voices in negotiation. “Promiscuously blended together” aptly describes the relationships under contestation in the prefaces, relationships among fiction, fact, authorial authority, and the identity of the reader. A preface may envision and address multiple types of readership, may respond to the idea of the preface as an obligatory writing convention, may compare the novel of which it is a part to popular published works, may include quotes from long-dead Greek philosophers, and may claim to be historical fact while also claiming that a factual text could be one with fictional events, as long as those events could conceivably have taken place (even if they hadn’t in the particular manner and circumstances the novel imagined).

There is a palpable transactionality to the prefaces, one that reminds the contemporary cataloger that these novels were not always stored in air-controlled, dark rooms, but had lives embedded in the economic and cultural webs of the 18th century. The novels and prefaces, although set in print, were not static or insulated objects, but vehicles for and responses to human interaction.

Last page, preface of the Hermit.


Whate’er we do, or wheresoe’er we’re driv’n–Still, we must own, such is the will of heav’n. [Hermit]

The Preface Project is still a work in progress. Eventually, the photographs, transcriptions, and audio recordings will be available, along with their corresponding novels’ END catalog records, as part of an Omeka exhibition. I am only now developing a full plan for the organization of the exhibition, the secondary sources on which it will draw, the data visualizations and analyses it will contain, and the more granular themes it will address. For now, a folder containing preface transcriptions, audio recordings, catalog records, and images is publicly available through END’s website. The folder also includes documents further detailing my sampling and archiving methods.

Works Referenced

Genette, Gérard. Paratexts: Thresholds of Interpretation. Translated by Jane E. Lewin. New York: Cambridge University Press, 1997.

Langford, Larry L. “Retelling Moll’s Story: the Editor’s Preface to ‘Moll Flanders.’” The Journal of Narrative Technique 22, no. 3 (1992): 164-179.

Ratner, Joshua Kopperman. “Introduction” to “American Paratexts: Experimentation and Anxiety in the Early United States.” University of Pennsylvania ScholarlyCommons: Publicly Accessible Penn Dissertations. 2011.

Barchas, Janine. “Expanding the Literary Text: A Textual Studies Approach.” In Graphic Design, Print Culture, and the Eighteenth-Century Novel, 1-18. Cambridge: Cambridge University Press, 2003.

Barthes, Roland. “The Death of the Author.” In Image-Music-Text, translated by Stephen Heath, 142-148. New York: Hill and Wang, 1977.

Geographical Locations in Footnotes

Early in the cataloguing process I catalogued “Vaughan’s Voyages,” and the abundance of geographical locations in its footnotes inspired me to work with footnotes for my END project. I started thinking about why the author had chosen to include footnotes, whether the sheer frequency of geographical locations could tell me anything about the role of footnotes in assigning genre, and what the relationship was between the reader and the explanation of referential and fictional locations. Because these locations appear in footnotes, I wondered whether certain places required more explanation, or whether the presence of a location in a footnote meant that the reader was unlikely to be familiar with it. I also wanted to compare the presence of geographical locations in footnotes with their presence in the titles of novels END has worked with, to see whether there was any correlation between the two. If a title has a geographical location, is it more likely to also have footnotes with geographical locations?

I will now outline my methods for collecting the data for this experimental project. My main data source was the END Flickr account, specifically the photos in its Footnotes album, which END cataloguers add to as they find and photograph footnotes. I began with “Giphantia,” the first novel in the album I looked at, and worked my way back through the preceding novels towards the beginning of the album. I had no idea how many footnotes a given title would have or whether any of them would contain geographical locations, but I knew that if “Vaughan’s Voyages” did, there must be other novels where referential and imaginary places required footnote explanations. I created a spreadsheet with the columns Title, Place, Flickr link, Tag, Notes, Year, and Franklin link (a link to the catalog record for the title). I collected data from 23 novels, an arbitrary stopping point; ideally I would have looked through every photo in our Footnotes album for all the novels END has catalogued, and I hope to expand the project eventually. Of these 23 novels, 22 are from Penn and one is from Swarthmore. In the time allotted I looked through 657 photos of footnotes; 206 of them had at least one geographical location in a footnote, and in those 206 photos I found 510 separate instances of geographical locations. I sorted and organized the collected data in Excel spreadsheets for analysis and visualization. To visualize my data I used RAW to show all the locations and their frequency values, I used to collect the data needed to make maps on CartoDB, which I used to map the locations I tagged as city and country. All the spreadsheets I used for data analysis and organization can be found in the public Google Drive folder Footnotes.

During the project, challenges arose from the amount and content of the data I collected. I struggled to decide how to sort and organize the data, because there were so many unique locations and I needed to make the data easier to analyze. I also had to decide what to do with geographical locations that are ancient, imaginary, or fictional, and whether, and how, to map them on a modern representation of the world. I was able to map the ancient locations as their modern counterparts, but I could not map most of the fictional and imaginary places or those I tagged as “unknown.” Some examples of this alternative mapping: I added the frequency value of “Cordova” to that of “Cordoba,” because it is an older spelling of the Spanish city’s name; I added the values for “The City of Jupiter” and “Diospolis” to the value for “Cairo,” because these are ancient names for parts of modern Cairo; and I left out fictional locations such as “Forrest’s Coffee-house” and “Wenlo.” Cleaning my data in this way allowed me to map the cities and countries represented in footnotes more accurately and to use the maps for analysis. Although I had a large amount of data, working with an automated mapping program meant that only a portion of it was easy to work with. The unmappable locations can be found in the spreadsheet titled “fictional and imaginary locations” in the public Google Drive folder Footnotes.
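The cleaning steps described here, folding variant or ancient names into one modern place and setting fictional places aside before counting frequencies, can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet workflow actually used; the alias table, the unmappable set, and the sample list are hypothetical, built only from the examples named in this post.

```python
from collections import Counter

# HYPOTHETICAL alias table: historical or ancient names mapped to the modern
# place used for mapping (examples taken from the cleaning described in the post).
ALIASES = {
    "Cordova": "Cordoba",
    "The City of Jupiter": "Cairo",
    "Diospolis": "Cairo",
    "Leghorn": "Livorno",
}

# HYPOTHETICAL set of fictional places that cannot be placed on a modern map.
UNMAPPABLE = {"Forrest's Coffee-house", "Wenlo"}


def clean_locations(raw_places):
    """Return (mappable frequency counts, unmappable frequency counts)."""
    mappable = Counter()
    unmappable = Counter()
    for place in raw_places:
        name = place.strip()
        if name in UNMAPPABLE:
            unmappable[name] += 1
        else:
            # Fold variant names into their modern form; keep others unchanged.
            mappable[ALIASES.get(name, name)] += 1
    return mappable, unmappable


# Hypothetical sample of place names as transcribed from footnotes.
sample = ["Cordova", "Cordoba", "Diospolis", "The City of Jupiter",
          "Cairo", "Wenlo", "Leghorn"]
mappable, unmappable = clean_locations(sample)
print(mappable.most_common())  # [('Cairo', 3), ('Cordoba', 2), ('Livorno', 1)]
```

Keeping the unmappable places in their own counter mirrors the decision to store them in a separate spreadsheet rather than discard them.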

From the RAW visualization of the frequency values for all the locations, I could clearly see which locations occurred most often in my dataset: Egypt, China, Africa, Spain, Paris, Europe, France, the Mediterranean, and Constantinople clearly stand out on the bubble visualization. The visualization showed me which locations are mentioned most and are therefore most relevant to understanding the corpus of my dataset.

footnotes all values

While this is a broad and mostly clear way of looking at the bulk of my data, the questions the footnotes raise call for a closer look and a different organization of the data. For each novel I looked at, I made a chart comparing the number of its Flickr photos with footnotes against the number of those photos with geographical locations in the footnotes. The spreadsheet with this data can be found in the public Google Drive folder Footnotes and is labeled “# of photos vs # with location in footnote”. The image of the bar chart is labeled “photos with footnotes comparison.png” in the folder, and can be seen below:
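The comparison in this chart comes down to a per-novel ratio: photos whose footnotes name a place, divided by all photos with footnotes. A minimal sketch in Python, using only the three counts reported in this post for “Vaughan’s Voyages,” “Adventures of Captain Greenland,” and “The history of Sir Charles Grandison”; the function name is illustrative, not part of any END tool.

```python
# Counts reported in this post: (photos with footnotes,
# photos whose footnotes name at least one geographical location).
counts = {
    "Vaughan's Voyages": (24, 23),
    "Adventures of Captain Greenland": (20, 2),
    "The history of Sir Charles Grandison": (20, 5),
}


def location_share(with_footnotes, with_location):
    """Fraction of a novel's footnote photos that mention a place."""
    if with_footnotes == 0:
        return 0.0  # avoid dividing by zero for novels without footnotes
    return with_location / with_footnotes


# Print the novels from highest to lowest share.
for title, (total, located) in sorted(
        counts.items(), key=lambda item: location_share(*item[1]), reverse=True):
    print(f"{title}: {located}/{total} = {location_share(total, located):.0%}")
```

Expressed as shares (roughly 96%, 25%, and 10%), the contrast between the travel novel and the other two is easier to see than in raw bar heights when novels have different numbers of photographed footnotes.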

photos with footnotes comparison

Looking at individual footnotes in a few of these novels led me to ask whether the number of footnotes with geographical locations indicates the type of narrative. To answer this question, and to think more about the purpose of the footnotes, I took a closer look at “Vaughan’s Voyages,” which has 24 photos with footnotes, 23 of which have locations in the footnotes, and compared it with the footnotes in “Adventures of Captain Greenland” and “The history of Sir Charles Grandison,” each of which has 20 photos with footnotes, a number similar to that of “Vaughan’s Voyages.” The footnotes in “Vaughan’s Voyages” explain customs and provide referential information that enhances and illuminates the text. For example, the passage “Tho’ perhaps I have not, in my past days, had any great regard for religion, and might leave it to be decided by chance, as the king of Macafar did*:” (141) is accompanied by the footnote “*Macafar is a large kingdom on the south part of the Celebes, an island in the Indian Sea. Near three centuries ago, they worshipp’d the sun and moon, as the most worthy objects of their adoration…”. This footnote gives more information about the place the author is writing about. Since “Vaughan’s Voyages” is a travel novel, there may be a connection between the travel narrative genre and the presence of explanatory footnotes with geographical locations. Following this conjecture, neither “Adventures of Captain Greenland” nor “The history of Sir Charles Grandison” fits the travel-narrative pattern: of their 20 photos with footnotes, only 2 and 5, respectively, have geographical locations in the footnotes. The footnotes in “Adventures of Captain Greenland” seem more comical, less a serious addition to the information in the main text.
One example is the passage “…seasonable as well as unwholesome, for people, who have good healthful appetites to hold a fast at such a * biting time of the year.” (110), with the footnote “*Being about the middle of winter”. The footnote adds no interesting or explanatory information; it is entertaining but unnecessary. Similarly, nothing in the title of “The history of Sir Charles Grandison” indicates a travel narrative, and its footnotes give information about the particulars of the characters and the content. For example, the heading for Letter VII on page 39 reads “Miss Byron. In continuation. [On Sir Charles’s first letter from Bologna, Vol. IV. Letter XL. p. 277.]” and has the footnote “*Several letters of Miss Byron, Lady G. Lady L. and Miss Jervois, which were written between the date of the preceding letter and the present, are omitted”. What is interesting here is that the heading itself contains a geographical location, but one that does not require an explanatory footnote. Judging from the footnote, what matters is what is happening with the letters and the people involved, not the place from which the letter was written.

The last way I visualized and interpreted my data was through maps. Using the “city” and “country” tags, two of the several tags I assigned to my raw data, I mapped the frequencies of these two types of locations on a modern world map using CartoDB. The spreadsheet with all the instances of geographical locations and the tags I assigned to them (some, but not all, have explanatory notes in the spreadsheet) can be found in the Footnotes folder under the title “Geographical footnotes raw data”.
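Preparing a map layer from tagged raw data amounts to filtering rows by tag and aggregating frequencies. The sketch below shows that step in Python and renders the result as CSV text that a web mapping tool could import. It is an illustration under assumptions: the rows are hypothetical, the column names are our own choice rather than any CartoDB requirement, and the latitude and longitude columns a real upload would need are left out.

```python
import csv
import io
from collections import Counter

# HYPOTHETICAL raw rows: (place as recorded, tag assigned by the cataloguer).
raw_rows = [
    ("Spain", "country"),
    ("Paris", "city"),
    ("Spain", "country"),
    ("Constantinople", "city"),
    ("Wenlo", "unknown"),
]


def counts_for_tag(rows, tag):
    """Count how often each place carrying the given tag appears."""
    return Counter(place for place, place_tag in rows if place_tag == tag)


def to_csv(counter):
    """Render place/frequency pairs as CSV text, most frequent first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["place", "frequency"])  # assumed column names
    for place, freq in counter.most_common():
        writer.writerow([place, freq])
    return buffer.getvalue()


print(to_csv(counts_for_tag(raw_rows, "country")))
```

Filtering by tag before counting keeps the “unknown” and fictional rows out of the map layers while leaving them in the raw data.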

For the purpose of comparison, it should be noted that a former END researcher, Emma Madarasz, did a similar project mapping the geographical locations in the titles of 18th-century novels (link to her project on our END With Known site). Her data show that a significant number of places in the United Kingdom are mentioned in titles, which contrasts with the locations in the footnotes of the novels I chose to work with. The titles she looked at also include countries such as Italy, France, and India, though fewer unique countries appear in titles than in footnotes, as shown by my map visualization below. However, a significant number of titles include places in America or are connected with America, a place that is represented only three times in my raw data. Another difference between my project and Emma’s that I would consider in future work is that she included adjectives in her mapping of geographical locations. For “America,” for example, she included the title “Amelia; or, the faithless Briton. An original American novel, founded upon recent facts.”, whereas I recorded only place names and would have left such a title out of my data. In the future I would consider collecting adjectives related to geographical locations, and it would be interesting to see how that decision affects the frequency values of places in my dataset.

countries in footnotes

The map visualization for “countries” I created is shown above. Using the locations tagged as “countries” allowed me to narrow the scope of my data, and it was easy to map because almost all the countries in the footnotes I looked at were referential. The spreadsheet with the mapping information for the map above can be found in the Footnotes folder and is titled “countries mapping data”. The only location I had to alter was Turkey, which was spelled “Turky” in a footnote for “Adventures of Sig. Gaudentio.” The map shows a significant number of footnotes mentioning countries in Europe, especially Spain, France, and England, but also a good number clustered around the Middle East and a few in North America, Asia, and Africa. Places in Europe would have been more accessible to people in England, and I would argue that these referential countries helped orient the reader to foreign as well as familiar places. The absence of places in South America and Australia suggests that these places were either not well known to the authors or readers, or not relevant to the narratives of the novels I looked at for this project.

cities in footnotes

The map above shows the location and frequency of the places I tagged as “city” in my raw data. Mapping these was more of a challenge, because many were ancient cities or cities no longer known by the names used in the 18th century. One interesting example is Leghorn, mentioned in a footnote for “Vaughan’s Voyages”: I learned that Leghorn is the traditional English name for the Italian port city of Livorno, so I mapped Leghorn using the coordinates for Livorno, Italy. This visualization required much more data cleaning, and all my notes on it can be found in the Footnotes folder in the spreadsheet labeled “city mapping data”. Like the “country” visualization, this map shows nodes clustered mostly in Europe and around the Middle East. This suggests that it was important for the authors to include clarifying notes about the cities in these regions, and that although many of these places could have been known or accessible to readers, they still warranted explanation or embellishment in a footnote. Perhaps these notes gave readers additional information about places already within their grasp, or simply widened the scope of their geographical understanding. It is also interesting that only one city mentioned in the footnotes is in the Americas: Santo Domingo.

What stands out in these maps is how the mappable data centers on Europe. I had assumed that the locations needing explanation would be more distant and exotic ones. Such locations do exist in my raw data, but they are hard to map and do not fall into the more modern and restrictive categories of “city” and “country.” These two mostly Eurocentric maps put a spotlight on Europe and the nearby places that would have been more accessible through travel and, moreover, are more conveniently referential.

This experimental project on geographical footnotes has been beneficial to me not only in the collection, cleaning, and piecing together of the data, but also in teaching me the value of looking back on my methods and recognizing what can be done to make the project better and more comprehensive in the future. This ranges from the simple task of recording the number of footnotes in each novel and noting the novels I look at that have no footnotes, to the larger conceptual choices of which novels to look at, made on the basis of the project’s topic and the available information about the novels. The future of this project would no doubt include even more meticulous data collection and note-taking. For example, whenever an instance of a geographical location in a footnote is recorded, all the information about that place, including its tag, explanatory notes for the tag (both from the footnote itself and from the recorder), latitude, and longitude, will be entered into the raw data spreadsheet. I went back and added much of this information after all the instances had been recorded, and it would be far more useful to have these details for each instance already in place at the end of data collection. I would also recommend that this project delve deeper into the content of the footnotes and of the novels themselves; I am sure this would help answer many of the research questions that this project has posed and that I have grappled with in trying to determine the purpose of footnotes in these 18th-century novels. Ideally, the future project will include every instance of a geographical location in every footnote of every novel END catalogues, and the possibilities for interpreting and organizing this information are endless.
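One way to make sure every instance carries its tag, notes, and coordinates at the moment it is recorded is to define a fixed record shape. The Python sketch below is a hypothetical schema, not END’s spreadsheet format: the field names are our own, and the coordinates for Livorno are approximate and included only as an example.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class LocationInstance:
    """One geographical location found in one footnote (HYPOTHETICAL schema)."""
    title: str
    place: str
    tag: str                      # e.g. "city", "country", "unknown"
    flickr_link: str
    footnote_note: str = ""       # what the footnote itself says about the place
    recorder_note: str = ""       # the cataloguer's own explanation of the tag
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def write_instances(instances):
    """Serialize instances to CSV text with one column per field."""
    buffer = io.StringIO()
    names = [f.name for f in fields(LocationInstance)]
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for item in instances:
        writer.writerow(asdict(item))
    return buffer.getvalue()


example = LocationInstance(
    title="Vaughan's Voyages",
    place="Leghorn",
    tag="city",
    flickr_link="(link to photo)",  # placeholder, not a real URL
    recorder_note="English name for Livorno, Italy",
    latitude=43.55,   # approximate coordinates for Livorno
    longitude=10.31,
)
print(write_instances([example]))
```

Because every row has the same columns from the start, nothing has to be backfilled after data collection ends; empty optional fields simply stay blank in the CSV.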

Publishers’ Network

I. Beginnings

For my project, I chose to look at the connections between publishers in the eighteenth century. The reason was simple: I was caught early on by the fact that so many of the same names showed up in the publication info fields of each book. That is, you might see “James Dodsley” across as many as twenty different works, and he usually worked with the same cadre of individuals. This led me to suspect that these publishers – and indeed it should be no shock – were friends, comrades, spent time together, etc. Or perhaps they were not social with each other in that way, but they each were part of an elaborate web of working relationships. I wanted to see who worked with whom, and how large these webs were: how many people they each encompassed. As it turns out, when you include all the data we’ve got, some of these webs of connected publishers are very, very, very large. So large as to be almost impossible to analyze. But I will try.
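The “webs” described here correspond to what network analysis calls connected components: link two publishers whenever they share an imprint, then collect everyone reachable from each publisher. The Python sketch below shows that idea with the standard library only. It is not the code written for this project; apart from James Dodsley, the publisher names and all of the imprints are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations

# HYPOTHETICAL imprints: each entry lists the publishers named on one book.
imprints = [
    ["James Dodsley", "Publisher B", "Publisher C"],
    ["James Dodsley", "Publisher B"],
    ["Publisher C", "Publisher D"],
    ["Publisher E", "Publisher F"],
]


def build_network(books):
    """Link every pair of publishers who appear together on an imprint."""
    graph = defaultdict(set)
    for names in books:
        for name in names:
            graph.setdefault(name, set())  # solo publishers still become nodes
        for a, b in combinations(set(names), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_webs(graph):
    """Group publishers into webs (connected components) by depth-first search."""
    seen, webs = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, web = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            web.add(node)
            stack.extend(graph[node] - seen)
        webs.append(web)
    return sorted(webs, key=len, reverse=True)  # largest web first


for web in connected_webs(build_network(imprints)):
    print(len(web), sorted(web))
```

Publisher D never shares an imprint with Dodsley, yet lands in his web through Publisher C; that transitive reach is one reason real webs grow so large. Groups with no shared imprints at all end up in separate webs.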

I also chose this project because I wanted an occasion to return to a skill I had already developed but hadn’t used for some time: coding, specifically in Python and C++. I had taken C++ a year earlier at Swarthmore, and I began with the semi-ludicrous idea that I could code all of this in C++ and put it online in C++ (I didn’t even know that you need to use JavaScript to put things online)…

The Dublin Print Project

Note: This is a summary of the trajectory of my summer project. To jump straight to my results, please visit

It is not uncommon to find a title published by both London and Dublin printers in the eighteenth century. Copyright laws in the 1700s did not prevent books first printed in London from being republished in Ireland, and the more popular London titles were often reprinted and sold there. The eighteenth-century “Golden Age” of printing in Ireland had a particularly large impact on the print industry in Dublin. Books were often printed faster and more cheaply in Ireland, with a much larger turnover than in the London print industry. Some historians believe that there were pirated versions of London books in Dublin, but because of the complexity of eighteenth-century copyright law, it is very difficult to say which copies of a book were pirated and which were not.

Copyright law was eventually revised towards the end of the eighteenth century, so the boom in the Dublin publishing trade is specific to books published within the 1700s. This means that at least part of the data collected in the Early Novels Database is relevant to research on the Dublin print trade.

This is where my research project started. When I started brainstorming project ideas, we had already started cataloging books printed in the 1760s. Knowing that the data cataloged in the Summer 2015 session was going to be especially useful, I started thinking about what I could do with the data gathered from Dublin reprints during cataloguing.

There were several roads I could go down. The first, and the one I jumped at initially, was gathering similarities between the Dublin editions of London books. Given that only books with commercial and critical acclaim were picked up to be printed in Ireland, I was hoping to use the data I found to analyse what qualified as a popular book in the 18th century. However, I was not sure what sort of data I wanted to start with, or how I would go about gathering it. I had a vague idea that going through the MARC XML catalog records would be the best way of gathering information, but going through each record manually seemed like a daunting task. At this point, Ian Hoffman, my summer coworker and one of my soon-to-be lunch buddies, had not yet perfected the Python code he anticipated using to go through the records automatically and pick out specific data fields.
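Picking out specific fields from MARC XML records automatically can be done with Python’s standard XML parser. The sketch below is not Ian’s code; it assumes the MARCXML namespace and that the place of publication sits in field 260, subfield a (newer records may use field 264 instead), and it lists the titles (245, subfield a) of records printed in Dublin. The two records are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A tiny HYPOTHETICAL MARCXML collection with two records.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Hypothetical novel one</subfield>
    </datafield>
    <datafield tag="260" ind1=" " ind2=" ">
      <subfield code="a">Dublin :</subfield>
      <subfield code="b">printed for a hypothetical bookseller,</subfield>
      <subfield code="c">1761.</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Hypothetical novel two</subfield>
    </datafield>
    <datafield tag="260" ind1=" " ind2=" ">
      <subfield code="a">London :</subfield>
      <subfield code="c">1761.</subfield>
    </datafield>
  </record>
</collection>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [(sf.text or "").strip() for sf in record.findall(path, MARC_NS)]


def dublin_titles(xml_text):
    """List titles (245 $a) whose imprint place (260 $a) mentions Dublin."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.findall("marc:record", MARC_NS):
        places = subfields(record, "260", "a")
        if any("Dublin" in place for place in places):
            titles.extend(subfields(record, "245", "a"))
    return titles


print(dublin_titles(SAMPLE))  # ['Hypothetical novel one']
```

The same `subfields` helper could pull publisher names (260, subfield b) for a network of printers, which is the kind of field extraction the manual approach made so daunting.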

This was the point where I decided to refine the project, at the suggestion of Alice McGrath, one of the project managers. I attempted to narrow down my data set even further, looking only at Dublin print books with advertisements. From this I was hoping to be able to draw a network of books, linked solely through their advertisements.

The problem I quickly hit was that only five Dublin-published books featured advertisements, and it was nearly impossible to find PDF scans of the books mentioned in those advertisements, or at least of their Dublin editions. London PDFs were almost always readily available on HathiTrust and Internet Archive, and this seemed to be the crux of the problem. My project, which was centred on the Dublin print industry, was incredibly difficult to research, because archival data on eighteenth-century novels was largely based on the London print editions, not the Dublin reprints.

This was something I saw as a problem in and of itself. By and large, the text of the London “originals” was preserved, and I did not find the major editorial changes between the London and Dublin versions that I had hoped to find. The Dublin versions were intended to be replicas of the London versions, not revised versions of already successful books. However, cataloging books for END had already shown me that data beyond the text of a book was valuable information worth recording. Even if the body text was untouched, the Dublin print scene was too different from the one in London for there not to be small differences between the books published in each country. There was metadata to be found in Dublin reprints, and I was going to visualise it as best I could.

The Dublin Print Project is a blog I created towards the end of my research period, where I combine my love for photography with my research on eighteenth-century novels published both in London and in Dublin. Although the blog has turned out to be rather small, it is where I have carefully documented differences and similarities between the London and Dublin novels we have at the University of Pennsylvania, where all my research was based. So far I have catalogued only three titles. If I had been able to start this part of the project sooner, I would probably have catalogued more, comparing the Dublin novels we have here at Penn to PDF versions of their London editions if necessary. But even these three titles have confirmed what I suspected of Dublin reprints: Dublin editions of books are NOT the same as their London counterparts, and, because of this, they carry invaluable metadata in and of themselves that sheds light on a different sector of eighteenth-century print culture.

My research project ties in with what Ian was researching about publishers in the eighteenth century. Where Ian was trying to visualise the connections between the printers in our END data, I was visualising the books that they produced. Ian discovered that London printers had no ties whatsoever with Dublin printers, whereas I discovered that Dublin publications attempted to replicate their London counterparts as closely as possible. Even without any interactions between the printers, Dublin books still managed to replicate their London originals in the eighteenth century.

Going forward, I hope to expand the project to cover more books, and hopefully I’ll be able to work more with END in the future. The goal would be to have a complete comparison between all the Dublin and London pairings at Penn, and devise a way to make the information searchable and categorised. Wouldn’t it be neat if you could pull up all the photos taken of advertisements in Dublin reprinted books? I think so.

Text Analysis with R for Beginners – Summer 2015 Open Lab

Friday afternoons, 2-4:45, beginning June 5. Van Pelt Library, University of Pennsylvania, Vitale II (room 623), sixth floor.

Weekly summer open lab focusing on computational text analysis using the free and accessible RStudio to reveal patterns in texts.

Join us in person Friday afternoons on the sixth floor of Penn’s Van Pelt Library as we work through Matt Jockers’s accessible textbook *Text Analysis with R for Students of Literature*, or follow along at home and post questions and comments on our workshop wiki space.

No previous experience necessary, no homework, and no need to come every week. To express interest email A collaboration between the Penn Libraries W0rdLab, the Swarthmore Libraries, and the Early Novels Database.

Anticipated meetings: June 5, 12, 19, 26, July 10, 17, 31, August 7

Exhibit A: The Novel Index


indexes, in fiction, see
History of Peggy Black and Wilmot Bond, the (1784)
History of Sir Charles Grandison, the (1754)
Letters Writ by a Turkish Spy (1730, 1734)
Memoirs of a Certain Island (1726)
Salmagundi (1814)

Search the pages of the above novels and you will find that they contain an object not traditionally associated with fiction: an index. Though the presence of an index now signals a metafictional self-consciousness traditionally associated with postmodern novels, a la Pynchon and David Foster Wallace, the presence of an index.

History of Peggy Black and Wilmot Bond includes an index in the middle of volume two with an alphabetical listing of various philosophers on a variety of topics, from architecture to pleasure to witchcraft (pp. 220-226, v. 2), as well as a full-page list of foods a shepherd might eat. Eliza Haywood’s Memoirs of a Certain Island offers a pointed list of characters in the novel, the pages on which they are introduced, and their “real” or historical counterparts.

Though frequently occurring in the backmatter of texts, these indexes were no afterthought but were part of the way authors branded their forays into this genre called the “novel”. Salmagundi advertises itself as “A new and improved edition, with tables of contents and a copious index,” and the title page of Richardson’s Sir Charles Grandison includes the note: “To which is added, an historical and characteristical index.” Clearly the index is a selling point.

Why might this be? Indexes linked these works of fiction with other genres of scientific and taxonomic work. A glance at the publication history of reference texts –

Ephraim Chambers’s Cyclopaedia (1728)
Diderot’s Encyclopédie (1751-65)
Samuel Johnson’s Dictionary (1755)
Encyclopaedia Britannica (1768-1771)
Carolus Linnaeus’s Systema Naturae (1758)
Charles Messier’s Catalogue (1774)

illustrates that the idea of taxonomy and reference texts was circulating in the mid-18th century (Barchas). Janine Barchas notes that this list-making mentality agrees with some of the tenets of the rise of the “realist” novel as sketched by paradigmatic novel theorists like Ian Watt. By imitating modes of historical and scientific reference, the presence of a copious index, as in the case of Sir Charles Grandison, “signals the literary gravitas” (Graphic Design, 174) of the text. Where Barchas leaves off is perhaps the most interesting aspect of the index: the difference between an index and an ordinary list in a text.

That difference lies in the fact that a reader, having come to an index, is instructed to read the text in an entirely different way, one that presupposes discrete categories and sections for its content. While the index might have signaled prestige, it also imagines a kind of discontinuous reading. This kind of index creates knowledge that is, in Robin Valenza’s words, more like “scientific knowledge.” A more apt term might be “portable knowledge.” Indexes are designed not to be read, but rather to be referenced.

The index possesses a strange economy: in an almost Derridean fashion, it paradoxically aids in the navigation of the excess material of a multi-volume novel by adding supplementary material to the text in order to abridge it and, in so doing, justifies the size of the original text that necessitates the abridgment. By virtue of its subject headings, the index provides a sort of oblique reading instructions and a handy compendium of the kind of “useful” reading that contemporary critics like Samuel Johnson praised.

Looking closer at this compendium-like paratext reveals a fraught nature of compilation colored by ideological and political context. Letters Writ by a Turkish Spy contains two forms of index: a detailed and descriptive table of contents and “An index, interpreting some Turkish and Arabick words, which may seem obscure and unintelligible, either in these letters, or in their titles.” The header says it all. This short index is placed in the front matter of the first volume. The typographic difference between the stylized, neo-Old English font for the Arabic words and the modern typeface for their definitions gives a clear example of how the index itself works to obscure the foreign language and make it unintelligible.

What is indexed, in addition to how it is indexed, is also key. Haywood limits the index of the Female Spectator to character names, while Richardson, though writing a “historical and characteristical index,” extends his to situations and types, and includes moral sentiments and general themes, such as “Beauty” or “Artful Men,” in a separately published Collection of moral sentiments drawn from his three novels: Pamela, Clarissa, and The History of Sir Charles Grandison.

These are merely a selection of the 28 novels catalogued thus far that include indexes. For the 21st-century novel scholar, these pieces of paratext offer another way of understanding how 18th-century texts documented themselves and the world of the “real,” both describing it and referencing it, and how they might have been consumed.

On Marginalia

Image 1: Monimia


END currently deals with marginalia by placing it in its own 595 field.  The field contains subfields for medium (ink, pencil, etc.), content, and extra notes.  All in all, the field is extremely open, a necessity considering the range of information it is meant to contain.  Marginalia by its very nature varies widely and pushes against any sort of uniformity.  Leaving the field open appears to be an attempt to place all marginalia on an even playing field, ranking no type as inherently better.  Everything from a detailed drawing [image 1] to a stray line or an “x” gets catalogued in the same way, with only descriptive notes to truly distinguish them, making every marking seem to weigh about the same.  The END field was designed to behave thus.
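Because the 595 field is open, every marking, whether a drawing or a stray line, is stored in the same shape: a medium, some content, and notes. The Python sketch below reads such fields from a MARCXML record into uniform dictionaries. The subfield codes are an assumption, since this post does not give them, and the sample record is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# HYPOTHETICAL subfield codes for END's local 595 field; the actual codes are
# not given in the post, so adjust these to match real records.
SUBFIELD_NAMES = {"a": "medium", "b": "content", "c": "notes"}

# A HYPOTHETICAL record with two marginalia entries.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="595" ind1=" " ind2=" ">
    <subfield code="a">ink</subfield>
    <subfield code="b">drawing</subfield>
  </datafield>
  <datafield tag="595" ind1=" " ind2=" ">
    <subfield code="a">pencil</subfield>
    <subfield code="c">stray line in margin</subfield>
  </datafield>
</record>"""


def marginalia_entries(record):
    """Turn each 595 datafield into a dict keyed by readable subfield names."""
    entries = []
    for field in record.findall("marc:datafield[@tag='595']", NS):
        entry = {}
        for sf in field.findall("marc:subfield", NS):
            key = SUBFIELD_NAMES.get(sf.get("code"), sf.get("code"))
            entry[key] = (sf.text or "").strip()
        entries.append(entry)
    return entries


entries = marginalia_entries(ET.fromstring(SAMPLE))
print(Counter(e.get("medium", "unrecorded") for e in entries))
```

Nothing in this shape ranks a drawing above a stray line; any distinction between them has to come from the free-text content and notes.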

In practice, however, marginalia cataloguing can be sporadic.  As cataloguers, we tend to spend more time laboring over handwritten text (legible or otherwise), trying to transcribe it accurately, than looking at which sections have been underlined or otherwise marked and trying to decipher a meaning from those marks.  Depending on the amount of detail in them, doodles can get more than a cursory description, but written words are still regarded as the most sought-after marginalia.[1]  There is an understood, though debated, hierarchy of these accompaniments to the book that centers on our accepted and internalized definition of what marginalia actually is.  This definition is text-focused, not book-focused.  According to the OED, marginalia are “[notes], commentary, and similar material written or printed in the margin of a book or manuscript.  Also (in extended use): notes, comments, etc., which are incidental or additional to the main topic.”[2]  There is an underlying implication that marginalia should relate to the text or content of the book rather than to the book as an object: although they need not apply directly, the notes and commentary should be relevant in some way.  Not every coherent marking in a book fits this definition, and consequently library marginalia gets left behind.

Image 2

Image 3

Library marginalia generally consists of sequences of numbers and letters written in pencil in the corners and along the edges of title pages and endpapers [image 2].  These sequences record how the book as an object fits into the grand organizational structure of the library, placing it within the institution’s classification system (which, in the case of Penn’s Rare Book and Manuscript Library, can be Library of Congress, Dewey Decimal, or an institutionally specific system called Culture Classification).  Occasionally these markings include stamps that press words (such as the name of a library) into the pages or that punch holes in the paper to the same effect [image 3].  By and large, END cataloguers choose not to include library markings in their 595 fields.  If library marginalia is included in a record at all, it is usually captured as a 500 note, a catch-all field for information with no clearly defined home, subject to the whims and judgment of individual cataloguers.  Although not really marginalia by any definition, University of Pennsylvania bookplates [image 4] also tend to be omitted from records, an extension of the logic that finds the cataloguing of library marginalia unnecessary and redundant.  The library markings and bookplates that do tend to be captured are those belonging to institutions other than the University of Pennsylvania, that is, markings that prove the book had a life and a place in a collection before finding its way to Penn.  Marginalia that appears more recent and consistent with Penn’s system of classifying and marking up books is usually glossed over or ignored, the logic being that this information is captured elsewhere in the record.

Image 4

I would argue that although this information exists elsewhere in the record and functions differently from what we typically classify as marginalia, the fact that it was written in the book is still important.  It does not do to be only partly concerned with the object-hood of books.  On further reflection, I also think that this institutional marginalia does not behave so differently from personal marginalia.  Marginalia creates evidence of a book’s past life and of its interactions with people.  It serves both to make the book a unique object, one that exists separately from the text printed on its pages and remains unique despite being a mass-produced commodity, and to connect it to the world at large through references to or depictions of that world.  Library marginalia performs this same inward/outward dance.  It gives the book a singular identity through letters and numbers, differentiating it from all other copies that may be floating around in the world, yet it very clearly situates the book in a larger system.  The library system may not compare with the entirety of the world, but it is still significantly larger than the book itself, and it is tethered to the book through writing.  Library markings prove that a book has a specifically defined place of its own, a niche carved out on a shelf awaiting its return, and that it fits into a larger body of knowledge.

Basically, all I am saying is that library marginalia is part of the book as an object.  It is part of the book’s physical being and its history.  By choosing to exclude it, we choose to ignore arguably one of the most important factors in the book’s continued existence and the means by which we are allowed to access these materials.  It may not be useful or time-effective to capture it in the long run, but we should at the very least reflect on why we leave it out and what that means, instead of dismissing it out of hand.

[1] Considering that the majority of the cataloguers are English literature majors, this should probably not come as a surprise.

[2] The oldest example of the word “marginalia” included in the OED is from 1819, placing the word’s entry into (relatively) common usage, along with the accompanying concept, within our time range (1660-1830).  I am not saying that the concept of marginalia is unique to novels or necessarily related to their rise, only that both seem to gain traction and legitimacy around the same time.

Indexing the Index: Part I

In that Empire, the Art of Cartography attained such Perfection that […] the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. (Jorge Luis Borges, “On Exactitude in Science”)

Borges’ allegory of the perfect-yet-useless map frames the theoretical issues at stake in the Early Novels Database cataloging schema. Our extensive bibliographic records serve as maps to the physical, full-text novel. Mapping paratext requires the cataloger to decide what is too much or too little information to capture. With each new piece of paratext this process repeats, each level of increasing detail and specificity leading to ever more complex notes fields and, eventually, to a question Borges would have approved of: What happens when the index needs an index? For me, this dilemma surfaced while cataloguing footnotes and indices in two late 18th-century novels and, in so doing, grappling with the genre of the index.

The novel Memoirs of the Year 2500 presents a classic perfect-map dilemma. The 1795 science fiction novel relies heavily on footnotes as it imagines the state of politics and society in France in the year 2500. Following protocol, I catalogued every single footnote as I read, which meant recording the page number and photographing the footnote(s) for all but 50 or so pages of the 358-page novel. Why? The footnotes themselves were intriguing. Unlike notes in an epistolary novel, these paratextual additions behaved more like extensions of the text itself, often running on for several paragraphs and bleeding onto the next page (or pages). Certainly this data has its uses: we can now easily compare and classify the different types of footnotes in the novel, as Alli has done. But once you take into account the five hours of intensive documentation it required, such a record seems far less useful. Like the chapter headings for each of Memoirs’ 44 chapters, this information might be better left out of the record.

Like Memoirs, Samuel Richardson’s The History of Sir Charles Grandison includes paratextual objects whose contents themselves index the novel. The novel’s 47 footnotes and two indices refer the reader to other sections of the text, dividing the novel into thematic sections much like the encyclopedic table of contents of Memoirs. As with Memoirs, the sheer quantity of paratext is daunting. Richardson’s footnotes, intended to help the reader navigate the text quickly, now place a real burden on catalogers’ time and energy. Viewed side by side, the records of Grandison and Memoirs are most striking for their abundance of paratextual indices.

Excerpt from the record of “Memoirs of the Year 2500”

Excerpt from the record of “The History of Sir Charles Grandison”

Despite their similarities, the two books have fundamentally different formats (one a short, two-volume novel, the other a massive, seven-volume one) and use footnotes in vastly different ways, a fact that becomes hidden beneath the mounting annotations in our records. The lengthier novel, as one might imagine, uses the footnote’s reading directions to make its massive amount of text manageable. Richardson’s footnotes are pithy, no more than a sentence, and tend to systematically cross-reference letters in other volumes, keeping the reader oriented amid hundreds of pages of text. The footnotes of Memoirs, by contrast, elaborate on a smaller main text, creating the appearance of a more complex narrative. A single footnote in Memoirs might continue over two or three pages. As a device, the footnote indexes each novel in a distinct way: in Memoirs it builds the text, while in Grandison it organizes it. Yet in order to differentiate between the two types of footnotes in the record, the cataloguer must transcribe or label these footnotes in a notes field. This problem becomes more acute with indices and tables of contents, whose contents are themselves lists. More detailed clarification in cases such as these leads to ever more extensive, and thus more difficult to navigate, records.

Beyond the practical problem of cluttered, unattractive records lies a theoretical one. Concern about information management in large-scale databases has conventionally focused on the level of the database as a whole. The process of cataloging “cataloging objects,” such as an index or a series of footnotes, reframes this problem as a question of form directed at the level of individual record entries. If END’s data do indeed need further stratification, this in turn raises more questions: Once we see an object as requiring an index, what status and autonomy are we bestowing on it? What level of reification does the index impose on the object being indexed?

These problems at the level of the database record (END’s “index” of a novel) resemble the genre problems of 18th-century novels themselves: issues of readability, completeness, referentiality versus representation, and audience. While still unresolved, they force us to reconsider the genre of the index in both its 18th-century and 21st-century incarnations. The problem of bibliographic indexing will not be solved overnight. To begin, we might first rethink END as aspiring to be less like a verisimilitudinous Map and more like Borges’ other labyrinthine construction: The Library of Babel’s mythical index of indices.

[Continued in Part II]