
About this project

This is a digitization of Henry Sweet's A Student's Dictionary of Anglo-Saxon, published in 1896. [1] More specifically, it's a rescanning and conversion of the PDF version into HTML. In addition to simply converting the text to HTML, I've implemented a couple of features to help with the goals of the project, as explained below.

The Preface, Arrangement and Contractions, and Variations of Spelling pages are all from the original dictionary. I did not include the Inflections section, which is a brief summary of early West-Saxon grammar. I reckon that that information is easily found elsewhere, such as in Sweet's First Steps in Anglo-Saxon.

Tips and tricks

The dictionary entries are in the Entries tab. You can find entries in the dictionary by using the browser's Find box (Ctrl+F or Cmd+F). I added a few features to make terms more searchable.

Goals

In some ways, the Sweet dictionary is better than the classic Bosworth-Toller (BT). (Sweet has some thoughts about the BT dictionary in his preface. These are mostly of historical interest, though the lexicographical infighting has some amusement value.) And at the moment, the Wiktionary entries for Old English, which are great, are incomplete, though of course that will change in the fullness of time. In any event, as a PDF, the Sweet dictionary has some limitations. My goals in this conversion are:

Disclaimers

Bookmarks

We can link directly to individual entries in BT and Wiktionary, which is very useful. I've emulated this by adding a linked symbol to the end of each entry. When you click that link, it copies the URL of that entry to the clipboard, and you can then paste it, bookmark it, send it to someone, whatever.

To see this in action, go to an entry and click the symbol. If it's working, you'll see a message that the link has been copied, and you'll be able to paste the link somewhere.

This approach relies on what's supposed to be cross-browser JavaScript, but we'll see.
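The page's actual script isn't reproduced here. As a hypothetical sketch, assuming each entry's bookmark ID is used as the URL fragment (for example entries.html#āfandian), the link could be built with the standard URL API and copied with the asynchronous Clipboard API (navigator.clipboard.writeText). The function names entryUrl and copyEntryLink are made up for illustration.

```javascript
// Hypothetical sketch: build and copy the bookmark link for one entry.
// Assumes the entry's ID doubles as the URL fragment (page.html#ID).

// Returns pageUrl with its fragment replaced by entryId. The URL API
// percent-encodes non-ASCII characters, so "ā" becomes "%C4%81".
function entryUrl(pageUrl, entryId) {
  const url = new URL(pageUrl);
  url.hash = entryId;
  return url.href;
}

// Browser-only: copies the link for entryId to the clipboard.
// navigator.clipboard.writeText returns a Promise, generally needs a
// secure context (https or localhost), and may require a user click.
async function copyEntryLink(entryId) {
  const link = entryUrl(window.location.href, entryId);
  await navigator.clipboard.writeText(link);
  return link;
}
```

A click handler on the link symbol could call copyEntryLink with the entry's ID and show the "copied" message once the Promise resolves. The pasted link may show the ID percent-encoded (#%C4%81fandian), but browsers should decode the fragment when looking up the matching ID.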

Note: When you use one of the bookmark URLs (e.g. you paste it into the browser's URL box), you might need to scroll up a few lines to see the target entry. I'm low-key working on this issue.

Deriving unique IDs for each entry

In order to implement this bookmarking system, I had to derive full headwords for each entry to act as bookmark IDs. In particular, for the compounds, I extracted the root from the main entry and then concatenated it to the second part of the compound. For example, Sweet has the entry ā·fand|ian, which in this conversion appears as the following:

āfandian (ā·fand|ian)

Following this he has the compounds ~igendlic and ~ung.

To create the bookmark IDs, I took the root (āfand) and added the compounds, so that the IDs for these three entries are:

āfandian
āfandigendlic
āfandung
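The derivation can be sketched roughly as follows. This is a hypothetical illustration, not the script actually used for the conversion. It assumes that · only marks a prefix boundary and can be dropped, that | separates the root from the rest of the main headword, and that ~ in a subentry stands for the root. The function names are made up.

```javascript
// Hypothetical sketch of deriving full headwords (bookmark IDs).
// Assumptions: "·" marks a prefix boundary and is dropped; "|" ends the
// root of the main headword; "~" in a subentry stands for that root.

// "ā·fand|ian" -> { root: "āfand", id: "āfandian" }
function parseMainEntry(headword) {
  const plain = headword.replace(/·/g, "");
  const bar = plain.indexOf("|");
  // With no "|", treat the whole headword as the root.
  const root = bar === -1 ? plain : plain.slice(0, bar);
  return { root, id: plain.replace("|", "") };
}

// "~ung" with root "āfand" -> "āfandung". The "~" can also sit after a
// ge prefix: "ge~fullian" with root "inc" -> "geincfullian".
function compoundId(root, subentry) {
  return subentry.replace("~", () => root);
}

// Returns the IDs for a main entry followed by its compound subentries.
function entryIds(mainHeadword, subentries) {
  const { root, id } = parseMainEntry(mainHeadword);
  return [id, ...subentries.map((s) => compoundId(root, s))];
}
```

For example, entryIds("ā·fand|ian", ["~igendlic", "~ung"]) returns ["āfandian", "āfandigendlic", "āfandung"]. Anything that doesn't fit these three assumptions is exactly the kind of edge case that needs manual cleanup.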

After I created these IDs, I checked for duplicates and de-duped any by adding integers (1, 2, etc.) to the ends of the IDs. Note that the IDs contain OE characters (I did not strip or convert diacritics).
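A de-duplication pass along those lines might look like this hypothetical sketch. The numbering scheme is an assumption (the first occurrence keeps the plain ID, later ones get 1, 2, and so on), and the name dedupeIds is made up. It keeps trying higher numbers if a generated ID is already taken, so every returned ID is unique.

```javascript
// Hypothetical sketch: make IDs unique by appending integers.
// Assumption: the first occurrence keeps its plain ID; later
// occurrences of the same ID get 1, 2, ... appended.
function dedupeIds(ids) {
  const taken = new Set();     // every ID returned so far
  const counters = new Map();  // base ID -> last suffix used

  return ids.map((id) => {
    if (!taken.has(id)) {
      taken.add(id);
      return id;
    }
    let n = counters.get(id) ?? 0;
    let candidate;
    do {
      n += 1;
      candidate = id + n;
    } while (taken.has(candidate)); // skip suffixes that collide
    counters.set(id, n);
    taken.add(candidate);
    return candidate;
  });
}
```

For example, dedupeIds(["ār", "ār", "āre", "ār"]) returns ["ār", "ār1", "āre", "ār2"].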

This seems to work for most cases, but there are almost certainly some edge cases where this doesn't work, and there are probably a few duplicated IDs. These are scenarios that will have to be cleaned up later.

Searching for terms

The premise of this conversion effort is that we can load up the entries as a web page and then use the browser's Find function (Ctrl+F or Cmd+F) to search for specific terms. This is a simple search mechanism and does not support nice-to-have features like near searches (basically, "did you mean ...?"), etc. — the sorts of things that could be built into a database-based search.

Browser tests

I've done some experimentation in different browsers using Find to search for entries. All of my testing has been done on a Windows 11 computer.

The testing suggests that although some browsers simplify some searches, users might need to enter the literal OE characters that are in the word that they want to find (though probably without diacritics). Given the audience for this work, maybe it's not a big burden for people to enter OE characters for their search, in the same way that they already enter those characters in other contexts? I'm interested in feedback on this question.
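To make the question concrete, here's a hypothetical approximation of diacritic-insensitive matching. It isn't how any particular browser works; browsers differ, which is why the testing matters. It assumes that combining diacritics such as macrons are ignored and that æ can be typed as ae, while þ and ð stay literal and have to be entered as-is.

```javascript
// Hypothetical approximation of diacritic-insensitive matching.
// Assumptions: combining marks (macrons, dots, hooks) are ignored and
// æ may be typed as "ae"; þ and ð are left alone, so they still have
// to be typed literally. Real browsers differ from each other.
function foldForSearch(text) {
  return text
    .normalize("NFD")                 // ǣ -> æ + combining macron
    .replace(/[\u0300-\u036f]/g, "")  // drop combining diacritics
    .toLowerCase()
    .replace(/æ/g, "ae");
}

// Simulates a Find-style substring match on folded text.
function foldedIncludes(haystack, query) {
  return foldForSearch(haystack).includes(foldForSearch(query));
}
```

Under these assumptions, foldForSearch("ēarclǣnsend") gives "earclaensend", but foldForSearch("dægþerlic") gives "daegþerlic", so a plain-keyboard query would still need the þ.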

"Hidden" search terms

A major goal for this conversion effort is to make it easy to find words in either OE or Modern English. As noted earlier, Sweet's space-saving conventions work against this. For example, suppose you want to look up the term ēarclǣnsend. Sweet has this in the dictionary as ēar|e followed by the entry ~clǣnsend. Because these are two separate entries, the full word ēarclǣnsend doesn't appear anywhere in the text, so it's not directly searchable.

I address this by adding an "invisible" version of the headword at the end of each entry. (The word is "invisible" because it's the same color as the background.) The invisible text is still searchable, though, and in my experimentation, the search highlighting makes the term appear.

For compounds, the invisible search term is the full term/headword — the root plus the compound. (I'm reusing the unique IDs I created for the bookmarks described earlier.)

You can see the effect by loading the page and searching for earclaensend, which finds the entry ~clǣnsend under ēare. A kind of bonus is that as you enter each letter, it shows all matches, so you can review the matching entries letter by letter.
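In markup terms, this could be as simple as appending a span to each entry. The following is a hypothetical sketch: the class name hidden-term is made up, and it assumes a CSS rule that sets that class's text color to the page's background color. Each term is wrapped in the ^ and $ delimiters described under "Starts with" and "ends with" searches.

```javascript
// Hypothetical sketch: append "invisible" search terms to an entry.
// Assumes a CSS rule gives .hidden-term the background color, so the
// text is still rendered (and therefore findable) but not visible.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function withHiddenTerms(entryHtml, terms) {
  const spans = terms
    .map((t) => `<span class="hidden-term">^${escapeHtml(t)}$</span>`)
    .join(" ");
  return `${entryHtml} ${spans}`;
}
```

Using color rather than display: none matters here, because browsers' Find generally skips text that isn't rendered at all.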

"Starts with" and "ends with" searches

I enclosed the invisible search terms in ^ and $ delimiters. This makes it possible to do a simple "starts with" and "ends with" search by entering something like ^eoh to find headwords (only) that start with eoh or ^eoh$ to find only the eoh entries.

To be clear, this is not extending the browser's search facility in some way. And it's not a general mechanism; it only helps find headwords (including compounds), because it's based on the "invisible" words I added to the end of entries.

I chose the characters ^ and $ for two reasons. The first reason is that these two characters are not otherwise used in the dictionary. The second reason is that those are the characters that regular expressions use for "beginning of line" and "end of line", respectively. However, to reiterate, the mechanism that I devised here for "starts with" and "ends with" is not true regular expression search; it's a wee hack, not a real regex search.
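Why the hack works can be modeled in a few lines: Find does a plain substring match, and each hidden term is stored as ^headword$, so a query that includes ^ can only match at the start of a headword, and one that includes $ only at the end. This hypothetical sketch ignores diacritic folding and everything on the page except the hidden terms; the sample headwords are just illustrative strings.

```javascript
// Hypothetical model of "starts with" / "ends with" via delimiters.
// Only the hidden terms are modeled; Find itself just matches substrings.
function hiddenTerm(headword) {
  return "^" + headword + "$";
}

// Returns the headwords whose hidden term contains the query.
function findHeadwords(headwords, query) {
  return headwords.filter((h) => hiddenTerm(h).includes(query));
}

const sample = ["eoh", "eohscealc", "feoh"];
// findHeadwords(sample, "^eoh")  -> ["eoh", "eohscealc"]  (starts with)
// findHeadwords(sample, "eoh$")  -> ["eoh", "feoh"]       (ends with)
// findHeadwords(sample, "^eoh$") -> ["eoh"]               (exact)
```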

ge prefixes in search terms

As Sweet explains in the "Brevity" section of the Preface, he ignores the ge prefix for purposes of alphabetical order. In the original dictionary, such ge prefixes are set in a lighter font (not bold). Some examples:

gehabban
gedafen
ge~fullian (geincfullian)

I've retained this convention.

When I created the searchable headwords, if Sweet had a lighter-font ge prefix, then the prefix is part of the searchable headword. For example, Sweet has an entry geærnan. If you are looking for this term, you could enter any of the following searches to find it:

The important point is that the "starts with" search will be looking for the ge prefix. So when you're looking for a term, try adding a ge prefix.
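That advice amounts to trying two "starts with" queries in turn. Here's a hypothetical sketch of the idea; searchWithOptionalGe is a made-up name, and the hidden terms are modeled as ^headword$ strings.

```javascript
// Hypothetical sketch: try a "starts with" query as typed, then with a
// ge prefix, because the hidden headwords keep Sweet's ge.
function searchWithOptionalGe(headwords, word) {
  const hidden = headwords.map((h) => "^" + h + "$");
  for (const query of ["^" + word, "^ge" + word]) {
    const hits = headwords.filter((_, i) => hidden[i].includes(query));
    if (hits.length > 0) return { query, hits };
  }
  return { query: null, hits: [] };
}
```

For example, searchWithOptionalGe(["ærn", "geærnan"], "ærnan") finds nothing for ^ærnan and then returns { query: "^geærnan", hits: ["geærnan"] }.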

Alternate search terms for compounds

In some cases, the headword for a compound has alternate forms. For example, the headword īsen has the variants īsern and īren. In cases like these, by default, the subsequent compounds are searchable under the first headword that's listed. In this case, that would be something like ^isenheard$ for the entry ~heard under īsen.

As an experiment for searchability, for a few compounds, I've tried adding variant search terms for each headword. For example, you can search for ^isenheard$, ^isernheard$, or ^irenheard$ to find the term īsen-heard.

For now I've only done this for a few terms, but if it seems useful and doesn't otherwise interfere with search, I'll eventually implement this for all compounds that have alternate headwords. (FWIW, I worry that this could suggest that these compounds are actually attested, which isn't clear from the way Sweet has listed alternate forms and compounds. Perhaps that isn't a showstopper, though.)
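Generating the variant terms is mechanical once the alternate headwords are known. In this hypothetical sketch (the function name is made up), the OE characters are kept, as they are in the bookmark IDs, on the assumption that the browser's diacritic folding lets a typed query like ^isenheard$ match.

```javascript
// Hypothetical sketch: hidden search terms for one compound under every
// alternate form of its headword, e.g. īsen, īsern, īren + "~heard".
function variantSearchTerms(headwordForms, compoundSubentry) {
  const tail = compoundSubentry.replace(/^~/, ""); // drop Sweet's tilde
  const terms = headwordForms.map((form) => `^${form}${tail}$`);
  return [...new Set(terms)]; // skip duplicates if two forms coincide
}
```

For example, variantSearchTerms(["īsen", "īsern", "īren"], "~heard") returns ["^īsenheard$", "^īsernheard$", "^īrenheard$"].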

Notes on the entries

A full-on conversion could do a lot with the information that Sweet has. The information could be broken down into relevant lexical elements, stored in a database, and fronted with an app that could search this data and then pull up all related entries. (I think this is what BT does?)

That would be great, but I'm not the person to do that. Instead, the more modest goal here is to convert the PDF into HTML. I have taken a couple of small steps to help achieve the goals. Here are some notes about the current state of the conversion:

I realize that Sweet's space-saving conventions aren't necessary in an online version. But it would add quite a bit of time to reverse these, so to speak. Plus it introduces other possible issues, as noted in the To-do list. Hopefully, the light semantic marking of the components of the entries (using CSS styles) will help toward that goal in the future.

To-do list

Expand contractions/abbreviations

Either expand Sweet's contractions (e.g. lLt = "late Latin") into full terms or at least link them to the abbreviations page.
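One low-effort version is to wrap each contraction in an HTML abbr element so that its expansion appears as a tooltip. This hypothetical sketch includes only lLt, the one expansion given here; a real table would come from the Arrangement and Contractions page. Linking instead would follow the same pattern with an a element.

```javascript
// Hypothetical sketch: wrap known contractions in <abbr title="...">.
// Only lLt = "late Latin" comes from this page; the rest of the table
// would be filled in from the Arrangement and Contractions page.
const CONTRACTIONS = new Map([["lLt", "late Latin"]]);

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function markContractions(text, table) {
  if (table.size === 0) return text;
  // Longest first, so a longer contraction wins over a shorter one.
  const keys = [...table.keys()].sort((a, b) => b.length - a.length);
  // Match whole tokens only: no letter immediately before or after.
  const pattern = new RegExp(
    `(?<!\\p{L})(?:${keys.map(escapeRegExp).join("|")})(?!\\p{L})`,
    "gu"
  );
  return text.replace(pattern, (m) => `<abbr title="${table.get(m)}">${m}</abbr>`);
}
```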

Substitute headwords (roots) for the ~ character

Sweet uses the ~ character in examples and in subentries to indicate "put headword here". It would be useful to just go ahead and substitute the headword, since we're not concerned about space.

However, replacing ~ with the headword/root has to be done carefully; it's not entirely a mechanical substitution, especially when there's a ge prefix involved.

There's also an issue in that substituting headwords for ~ in subentries can put things out of alphabetical order (if that matters). Here's an example, using one of Sweet's entries (simplified):

dæg|rīm number of days.

~rima . dawn.

~sang daily service.

[…]

~wōma dawn.

dǣge bread-maker [dāg].

dægþerlic daily: on þisum ~an dæge on this very day [dæg].

If we blindly substitute dæg for ~ in ~wōma, we end up with a headword dægwōma, which is correct (there is such a word), but it's out of alphabetical order with respect to dǣge and dægþerlic, the terms that follow.
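A mechanical pass can at least flag where this happens. The hypothetical sketch below substitutes the root for ~ and reports neighbors that end up out of order. Its sort key only strips diacritics and compares code points, which is an assumption and not a model of Sweet's ordering (it doesn't know where Sweet files þ, for example), and it does nothing special about ge prefixes.

```javascript
// Hypothetical sketch: substitute the root for "~" and flag headwords
// that land out of alphabetical order relative to the next headword.
function substituteTilde(root, subentry) {
  return subentry.replace(/~/g, () => root);
}

// Simplified sort key: strip combining diacritics, lowercase, and
// compare by code point. Not Sweet's actual collation.
function sortKey(word) {
  return word.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Returns the indices i where headwords[i] sorts after headwords[i + 1].
function outOfOrder(headwords) {
  const bad = [];
  for (let i = 0; i + 1 < headwords.length; i++) {
    if (sortKey(headwords[i]) > sortKey(headwords[i + 1])) bad.push(i);
  }
  return bad;
}
```

With the simplified dæg entries above, the headword list dægrīm, dægrima, dægsang, dægwōma, dǣge, dægþerlic yields [3]: the only flagged pair is dægwōma followed by dǣge.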

Expand other shortenings

Sweet often uses hyphens to indicate alternative forms, like the -yn addition here:

cynren, -yn n. kind, species.

I'm pretty sure that we can expand these truncated alternatives, which would make the alternative forms more searchable. For example, in this case, the entry might look like this, with the stem repeated:

cynren, cynryn kind, species.
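A first, deliberately naive rule would be that the hyphenated piece replaces the same number of letters at the end of the headword. That rule and the function name are assumptions for illustration. The rule gives the expected cynryn here, but it will miss cases where Sweet's hyphenated piece lines up with the headword differently.

```javascript
// Hypothetical, naive rule: the hyphenated alternative replaces the
// same number of letters at the end of the headword.
function expandHyphenAlternative(headword, alternative) {
  // Work in code points (NFC) so letters like ȳ count as one letter.
  const stem = Array.from(headword.normalize("NFC"));
  const tail = Array.from(alternative.replace(/^-/, "").normalize("NFC"));
  const keep = Math.max(0, stem.length - tail.length);
  return stem.slice(0, keep).join("") + tail.join("");
}
```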

Another shortening that Sweet uses is to simply list alternative stem vowels (more rarely, consonants) after the main entry, like this:

cięrm, ea, eo shout, clamour, cry.

It would be useful also, I think, to expand these into full words that include the alternative letters.
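Here's a similar hypothetical sketch for the vowel alternatives. It assumes that the listed letters replace the first vowel cluster of the headword, and that diacritics on that cluster (like the hook in ę) can be dropped in the expanded forms. Consonant alternatives would need a different rule.

```javascript
// Hypothetical sketch for entries like "cięrm, ea, eo": swap the first
// vowel cluster (including any combining marks) for each alternative.
const VOWEL_CLUSTER = /[aeiouyæ][aeiouyæ\u0300-\u036f]*/;

function expandVowelAlternatives(headword, alternatives) {
  // NFD splits ę into e + combining hook so the cluster regex sees it.
  const decomposed = headword.normalize("NFD");
  return alternatives.map((alt) =>
    decomposed.replace(VOWEL_CLUSTER, () => alt).normalize("NFC")
  );
}
```

For example, expandVowelAlternatives("cięrm", ["ea", "eo"]) returns ["cearm", "ceorm"].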

There are many hyphen-based abbreviations and other shortenings like this, given Sweet's interest in saving space, so this would not be a mechanical substitution.

Better layout

The entries page is long — about 20,000 separate entries. I believe also that the markup + code that I'm using to implement the search features slows the browser down when the page is loading. (I've found it hard to use the page on my phone, for example.) I need to do something about this, but I also don't necessarily want to split the entries into separate pages. Suggestions welcome. :)

The current layout uses jQuery to dynamically load the header information (at the top of the page). This works only when the HTML is served by a web server; for security reasons, browsers generally won't let a page that's opened from the user's local disk load another local file this way.

I mention this because it would be neat to be able to just send someone a zipped folder of HTML files so that they could have a local copy of all this information. If people want to be able to do this, we can cobble together versions of the files that e.g. repeat the header information in each page.

Correct some of Sweet's typo-type issues?

I'm reasonably sure that there are tiny mistakes in the text: terms that should be marked as metadata (italicized) but are not, choices that Sweet made about when to create a separate headword vs. an inline alternative, etc. In all cases, if we can be certain that this is the issue, and if fixing it would make the conversion more useful (better search, easier comprehension), we should make appropriate changes.

PS I haven't found any outright typos yet, at least, not in the modern English definitions. :)

Other improvements to be named later

:)

Who did this

This work was started by me, Mike Pope, and it currently lives on my personal website. At the time I started this project (early 2024) I was a second-year student of Old English. I learned about Sweet's dictionary after I'd already been using Bosworth-Toller and of course Wiktionary. I found Sweet's entries very useful, but found the PDF limited for searching for OE words. So I embarked on this project. Whether the end result will actually be useful to anyone is entirely unknown, but in the meantime I will have had an interesting (though at times tedious) project to work on.

You can contact me with questions, comments, and corrections via email at mike (dot) pope at the Gmail place. If you'd like to help, I'd also be happy to share the work.

License

This HTML conversion is licensed under an "Attribution-NonCommercial-ShareAlike 4.0 International" license. This means:

Be nice :)

Here's the formal statement and link to the license details:

HTML conversion of Sweet's "A Student's Dictionary of Anglo-Saxon" by Mike Pope is licensed under Attribution-NonCommercial-ShareAlike 4.0 International

Other credits

Friend Scott Butler wielded Adobe Acrobat Pro to produce a Microsoft Word version (.docx file) of the original Sweet PDF file. I'm grateful for this help, which got me started on the lengthy conversion process.

And Friend Michael Broschat then provided tremendous help by getting his own copy of Sweet's dictionary and rescanning it, page by page, which provided an even better base to work from than the PDF-to-Word conversion. Michael spent a bunch of time experimenting with ways to capture the various unusual characters in the text, like æ̂, ė, and ȳ. ("I have never seen such a typographically difficult text," he observed.)

Scott's effort and especially Michael's in-depth effort speeded the conversion process considerably, by at least 50%. (I was originally scraping each PDF page, pasting it into Word, and then doing a ton of manual fixup, which took me well over an hour per page. But once I started to work with Scott and Michael's Microsoft Word files, the process went lots faster.)

That said, there are errors in this text — places where my converted text doesn't match Sweet's original — and those errors are entirely on me. (If you notice any, please do contact me; I'm happy to make fixes.)

[1] As Dr. Hana Videen explains in a blog post, for a variety of reasons the scholarly community has moved away from the term Anglo-Saxon for the language and now favors Old English.