About this project
This is a digitization of Henry Sweet's A Student's Dictionary of Anglo-Saxon, published in 1896. [1] More specifically, it's a rescanning and conversion of the PDF version into HTML. In addition to simply converting the text to HTML, I've implemented a couple of features to help with the goals of the project, as explained below.
The Preface, Arrangement and Contractions, Variations of Spelling, and Inflections pages reflect content from the original dictionary.
- Tips and tricks
- Goals
- Bookmarks
- Searching for terms
- Browser tests
- "Hidden" search terms
- "Starts with" and "ends with" searches
- ge prefixes in search terms
- Alternate search terms for compounds and derived terms
- Alternate search terms that Sweet doesn't include
- Notes on the entries
- To-do list
- Who did this
- License
- Other credits
Tips and tricks
The dictionary entries are in the Entries tab. You can find entries in the dictionary by using the browser's Find box (Ctrl+F or Cmd+F). I added a few features to make terms more searchable.
- OE characters. Most browsers let you enter ae to find æ. To find æg, you can enter aeg. However, in my testing, to search for a thorn (þ), you must enter þ in the Find box.
- Diacritics. In my experimentation, the Find function in most browsers ignores diacritics. For example, if you search for ablican, it finds āblīcan. Similarly, a search for aeg finds ǣg and æ̂g.
- Closed-up headwords. To help make entries searchable, there are closed-up versions of headwords. For example, ā·etan also appears as āetan, and dēor-wierþ|e also appears as dēorwierþe.
- Searchable compounds and derived terms. The page includes complete, closed-up versions of any compounds/derived terms that Sweet has as standalone entries. For example, you can search for abbodisse even though Sweet has this as a separate headword ~isse under abbod. Note that this does not (for now) work for compounds/derived terms that are defined within an entry.
- "Starts with" and "ends with" search. When you're searching for headwords, you can add ^ in the browser's Find box to mean "at start of word" and $ to mean "at end of word". For example, if you search for eoh, you get hundreds of hits. But you can search for ^eoh to find entries that start with eoh, and you can search for ^eoh$ to find just the words eoh and ēoh.
  This is not a general search mechanism; it doesn't work for arbitrary words anywhere in the text. It only works for headwords (including compounds/derived terms), and even then it's somewhat limited. For details, see "Hidden" search terms later.
- Can't find the word? Things to try. Old English spelling and word forms varied a lot by dialect and over time. Some things to try when searching:
  - Try with and without a ge prefix.
  - If you're looking for a word that contains ð, search using þ or d. (Sweet standardized on þ; there's no ð in the dictionary.)
  - Try using d in place of any ð or þ in the word.
  - Try leaving off any initial h.
  - See Sweet's long list of letter substitutions. Vowels are particularly tricky because of the many dialectal and scribal variations. I'll eventually update the entries to make it easier to search for variations.
- Looking for modern English words. Try putting a space in front of the word you're looking for. For example, if you want to find OE words that Sweet lists for mud, search for " mud" (note the leading space).
- Getting a link/bookmark for a term. Click the ∞ symbol at the end of an entry. This copies a link to the clipboard for that entry.
Goals
As a PDF, the Sweet dictionary has some limitations. My goals in this conversion are:
Searchability. Sweet's entries make it hard to search for an OE word. (The PDF mostly works ok for searching individual modern English words.) For example, Sweet has an entry ā·bīd|an; the dot marks stress (it precedes the stressed syllable), and the | character separates the stem/root from the infinitive marker. That's useful information, but no student will know to search for this. Instead, we want to be able to search for abidan.
Find OE equivalents for modern English words. This digitization lets you search for a modern English word and find its OE equivalent. For example, do you need an OE word for a spear? Search for spear and see what OE words Sweet has listed with "spear" in the definition. In fact, you can use this digitized version of the dictionary as a kind of simple (if imperfect) thesaurus.
Clarity. Some of the entries in the scanned PDF are not so clear: information is faint, stained, and/or has other readability issues.
Fun (?) It's a learning experience for me :)
Disclaimers
The digitization is not intended to be a 100% faithful reproduction of Sweet's original. As noted elsewhere, I've prioritized creating listings that make it easier to search for OE terms and for modern English ones. I've added the closed-up headwords; sometimes I split entries into two that Sweet had listed as one (to save space, I think). Obviously, I didn't retain his three-column layout. However, I've tried to preserve his markup/OE characters, his use of "thin letters" (e.g. for ge prefixes and "doubtful endings"), his meta text, and other details. Also his British spellings. :)
There are lots of little errors because this was a pretty manual process. Please let me know if you find any! Or just generally if something looks weird — contact me and we can sort it out together. (Contact info is under Who did this later in this page.)
Bookmarks
We can link directly to individual entries in Bosworth-Toller (BT) and Wiktionary, which is very useful. I've emulated this by adding a linked symbol (∞) to the end of each entry. When you click that link, it copies the URL of that entry to the clipboard, and you can then paste it, bookmark it, send it to someone, whatever.
To see this in action, go to an entry and click the ∞ symbol. If it's working, you'll see a message that the link has been copied, and you'll be able to paste the link somewhere.
This approach relies on what's supposed to be cross-browser JavaScript, but we'll see.
Note: When you actually use one of the bookmark URLs (e.g. you paste it into the browser's URL box), you might need to scroll up a few lines to see the target entry. I'm low-key working on this issue.
Deriving unique IDs for each entry
In order to implement this bookmarking system, I had to derive full headwords for each entry to act as bookmark IDs. In particular, for the derived terms and compounds, I extracted the root from the main entry and then concatenated it to the second part of the derived term/compound. For example, Sweet has the entry ā·fand|ian, which in this conversion appears as the following:
āfandian (ā·fand|ian)
Following this he has the derived terms ~igendlic and ~ung.
To create the bookmark IDs, I took the root (āfand) and added the endings for the derived terms, so that the IDs for these three entries are:
āfandian
āfandigendlic
āfandung
After I created these IDs, I checked for duplicates and de-duped any by adding integers (1, 2, etc.) to the ends of the IDs. Note that the IDs contain OE characters (I did not strip or convert diacritics).
This seems to work for most cases, but there are almost certainly some edge cases where this doesn't work. These are scenarios that will have to be cleaned up later.
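The ID derivation described above can be sketched roughly as follows. This is a minimal illustration, not the code actually used for the conversion: the function names are hypothetical, and the de-duping rule (leave the first occurrence alone, number the repeats) is an assumption about a detail the description leaves open.

```python
import re

def closed_up(headword):
    """Drop Sweet's stress dot (·), stem bar (|), and compound hyphen."""
    return re.sub(r"[·|-]", "", headword)

def derive_ids(main_headword, sub_entries):
    """Build one ID for the main entry and one per ~ sub-entry."""
    # The root is everything before the | stem marker (or the whole
    # headword if there is no marker), with the other markup removed.
    root = closed_up(main_headword.split("|", 1)[0])
    ids = [closed_up(main_headword)]
    for sub in sub_entries:
        ids.append(root + closed_up(sub.lstrip("~")))
    return ids

def dedupe(ids):
    """Append 1, 2, ... to repeated IDs (assumed rule) so each is unique."""
    seen = {}
    unique = []
    for entry_id in ids:
        count = seen.get(entry_id, 0)
        seen[entry_id] = count + 1
        unique.append(entry_id if count == 0 else f"{entry_id}{count}")
    return unique

print(derive_ids("ā·fand|ian", ["~igendlic", "~ung"]))
# ['āfandian', 'āfandigendlic', 'āfandung']
print(dedupe(["eoh", "eoh"]))
# ['eoh', 'eoh1']
```

A sketch like this also hints at where the edge cases come from: any headword whose root is not simply "everything before the | character" would need special handling.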
Searching for terms
The premise of this conversion effort is that we can load up the entries as a web page and then use the browser's Find function (Ctrl+F or Cmd+F) to search for specific terms. This is a simple search mechanism and does not support nice-to-have features like near searches (basically, "did you mean ...?"), etc., the sorts of things that could be built into a database-based search.
Browser tests
I've done some experimentation in different browsers using Find to search for entries. All of my testing has been done on a Windows 11 computer.
- Find ignores diacritics. If you want to find ābīdan, you can search for abidan. Firefox lets you specify an option to include diacritics with the search.
- In Chrome and Edge (but not in Firefox), you can search for æ by using ae. For example, to find æpel, you can search for aepel.
- I haven't found a shortcut to search for thorn (þ). For example, Find doesn't recognize ætheling (or aetheling) to search for æþeling.
The testing suggests that although some browsers simplify some searches, you might need to enter the literal OE characters that are in the word that you want to find (though probably without diacritics). Given the audience for this work, maybe it's not a big burden for people to enter OE characters for their search, in the same way that they already enter those characters in other contexts? I'm interested in feedback on this question.
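One way to picture the behavior observed in these tests is as a "folding" step applied to both the search text and the page text before comparing them. The sketch below is only a model of the observed results, not a description of how any browser implements Find. The function names are hypothetical, and the æ-to-ae mapping mirrors what the tests showed for Chrome and Edge.

```python
import unicodedata

def fold(text):
    """Drop combining marks, lowercase, and map æ to ae."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().replace("æ", "ae")

def find_matches(query, entries):
    """Return the entries that contain the query after folding both sides."""
    needle = fold(query)
    return [entry for entry in entries if needle in fold(entry)]

entries = ["ābīdan", "ǣg", "æþeling"]
print(find_matches("abidan", entries))     # ['ābīdan']
print(find_matches("aeg", entries))        # ['ǣg']
print(find_matches("aetheling", entries))  # []
print(find_matches("aeþeling", entries))   # ['æþeling']
```

In this model, thorn (þ) has no decomposition and no substitute spelling, so it survives folding unchanged, which is consistent with having to type the literal þ.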
"Hidden" search terms
A major goal for this conversion effort is to make it easy to find words in either OE or Modern English. As noted earlier, Sweet's space-saving conventions work against this. For example, suppose you want to look up the term ēarclǣnsend. Sweet has this in the dictionary as ēar|e followed by the entry ~clǣnsend. These are two separate entries, so the full term is not directly searchable.
My hack-y fix for this is to add an "invisible" version of the headword at the end of each entry. (The word is "invisible" because it's the same color as the background.) However, the invisible text is searchable, and in my experimentation, the search highlighting makes the term appear.
For compounds and derived terms (entries that begin with ~), the invisible search term is the full term/headword: the root plus the derived term/compound. (I'm reusing the unique IDs I created for the bookmarks described earlier.)
You can see the effect by loading the page and searching for earclaensend, which finds the entry ~clǣnsend under ēare. A kind of bonus is that as you enter each letter, it shows all matches, so you can review the matching entries letter by letter.
"Starts with" and "ends with" searches
I enclosed the invisible search terms in ^ and $ delimiters. This makes it possible to do a simple "starts with" and "ends with" search by entering something like ^eoh to find headwords (only) that start with eoh or ^eoh$ to find only the eoh entries.
To be clear, this is not extending the browser's search facility in some way. And it's not a general mechanism; it only helps find headwords (including compounds and derived terms), because it's based on the "invisible" words I added to the end of entries.
I chose the characters ^ and $ for two reasons. The first reason is that these two characters are not otherwise used in the dictionary. The second reason is that those are the characters in regular expressions for the semantics of "beginning of line" and "end of line", respectively. However, to reiterate, the mechanism that I devised here for "starts with" and "ends with" is not true regular expression search; it's a wee hack, not a real regex search.
ge prefixes in search terms
As Sweet explains in the "Brevity" section of the Preface, he ignores the ge prefix for purposes of alphabetical order. In the original dictionary, such ge prefixes are set in a lighter font (not bold). Some examples:
gehabban
gedafen
ge~fullian (geincfullian)
I've retained this convention.
When I created the searchable headwords, if Sweet had a lighter-font ge prefix, then the prefix is part of the searchable headword. For example, Sweet has an entry geærnan. If you are looking for this term, you could enter any of the following searches to find it:
- ^geærnan or ^geaernan (uses ^ to search for a headword that starts with ge)
- ærnan (adding ^ to the beginning won't work because the headword starts with ge)
- aernan (using ae for æ)
The important point is that the "starts with" search will be looking for the ge prefix. So when you're looking for a term, try adding a ge prefix.
Alternate search terms for compounds and derived terms
In some cases, the headword for a compound has alternate forms. For example, the headword īsen has the variants īsern and īren. In cases like these, by default, the subsequent compounds and derived terms are searchable under the first headword that's listed. In this case, that would be something like ^isenheard$ for the entry ~heard under īsen.
As an experiment for searchability, for a few compounds, I've tried adding variant search terms for each headword. For example, you can search for ^isenheard$, ^isernheard$, or ^irenheard$ to find the term īsen-heard.
For now I've only done this for a few terms, but if it seems useful and doesn't otherwise interfere with search, I'll eventually implement this for all compounds/derived terms that have alternate headwords. (FWIW, I worry that this could suggest that these compounds and derived terms are actually attested, which isn't clear from the way Sweet has listed alternate forms. Perhaps that isn't a showstopper, though.)
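If this were extended to all compounds and derived terms, generating the variant search terms could look something like the sketch below. The function name and the comma-separated input format are assumptions for illustration. The output says nothing about whether each generated form is actually attested, which is exactly the worry noted above.

```python
def variant_tokens(headword_variants, sub_entry):
    """Build one ^...$ search term per headword variant for a ~ sub-entry."""
    suffix = sub_entry.lstrip("~")
    variants = [v.strip() for v in headword_variants.split(",")]
    return ["^" + v + suffix + "$" for v in variants if v]

print(variant_tokens("īsen, īsern, īren", "~heard"))
# ['^īsenheard$', '^īsernheard$', '^īrenheard$']
```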
Alternate search terms that Sweet doesn't include
In a few cases, I've added alternate search terms that are not explicitly listed in an entry as alternative spellings, but that are attested. Some examples:
Sweet: fāg
Search terms: fāg, fāh (Beowulf line 305)
Sweet: giėdd
Search terms: giėdd, gidd, gydd (B 151)
Sweet: līeg
Search terms: līeg, līg (B 83)
Sweet: rand
Search terms: rand, rond (B 326)
For the most part, the alternates are implicitly suggested by Sweet's variations of spellings chart. In a few cases, it's possible that by including these alternate search terms, I might be going beyond the charter of simply converting Sweet's work to HTML and adding search optimizations. However, I did this because it seemed useful to provide searchability for spellings found in the Klaeber edition of Beowulf, a work that many students use.
I didn't do this for every alternate spelling that's in Klaeber or in other works, though that would be useful — I've been doing it as I personally encountered spellings that weren't listed. I'm happy to add alternate search terms as anyone needs them.
Notes on the entries
A full-on conversion could do a lot with the information that Sweet has. The information could be broken down into relevant lexical elements, stored in a database, and fronted with an app that could search this data and then pull up all related entries. (I think this is what BT does?)
That would be great, but I'm not the person to do that. Instead, the more modest goal here is to convert the PDF into HTML. I have taken a couple of small steps to help achieve the goals. Here are some notes about the current state of the conversion:
-
As noted earlier, to help with searchability, I created closed-up versions of the headwords as needed. For example, Sweet has ā·bīd|an. In this conversion, the entry is ābīdan (ā·bīd|an); in other words, there's a "normal" headword followed by Sweet's original markings in parentheses. I wanted to retain the information that he condenses into his headwords, hence the parenthetical term, but needed a cleaner version of the words for search, hence the closed-up headword.
-
I retained most of the space-saving conventions that Sweet uses. For example, as noted earlier, I've retained his use of ~ in entries to mean "headword". He explains the conventions in the Preface.
I've made some modifications to the text to make these compound entries searchable, as explained elsewhere.
However, I sometimes (not always) created two entries out of one when it seemed to me that Sweet had combined entries only to save space. For example, Sweet has this entry under mann:
~slaga, ~slięht homicide.
I separated these into two entries:
~slaga homicide.
~slięht homicide.
Again, this was all about searchability and making both compounds separately findable and linkable.
-
Along those lines, I sometimes (not always) made two headwords out of a single one within the same entry. For example, Sweet marks what I believe are variant spellings using parentheses, like this (this example is under ā·węnd|an):
~e(n)dlic changeable.
I recast these terms with parenthetical letters as two headwords within the same entry, like this:
~edlic, ~endlic changeable.
I worry that I might have gotten some of these wrong and that for this first version of the conversion, I should be more literal. Comments and corrections welcome.
Moreover, per the preceding item, these would ideally be two separate entries for maximum searchability. However, my belief is that for now, entries that differ only in e.g. doubled letters should (?) be reasonably findable as is.
-
Sweet uses a lot of abbreviations (what he calls "contractions"), such as those for "transitive" and for "late". I retained these for now. He explains these in the section called Arrangement and Contractions in the preface.
-
I retained Sweet's use of circumflex æ (æ̂), tailed e (ę), and dot e (ė). (I believe that instances of what look like æ with grave accent (æ̀) in the PDF are just very faint instances of æ̂.) Sweet uses these orthographical conventions to render information about dialectal variations, as he explains in the "Spelling" section of the Preface.
To render æ̂, I had to use Unicode combining characters. These combining characters are not supported in all font families (typefaces), but they are in Times New Roman, which is why I used that venerable typeface.
I've tested this conversion on Windows (Chrome, Edge, Firefox) and on my phone (Android), and the weird characters all seem to be working. I've done no testing on iOS, macOS, or Linux.
As an aside, the scanning from Sweet's original text to HTML had a lot of trouble with diacritics (æ, īe, others), so this is a place where I probably made errors.
When Sweet breaks a set of compounds/derived terms across a column, he repeats the headword at the top of the next column (as with ǣ|bræ̂ce on page 3 of the PDF and æfter|sōna on page 4). He does this when the list of compounds/derived terms for a headword is quite long, which is for readability within the book layout. My thinking is that this should not be reflected in the HTML version and doesn't need to be treated as a new headword. When I encountered this, I simply continued the list of compounds/derived terms without repeating the headword.
-
Under the covers, I used styles (CSS classes) to mark a few distinct parts of the entries:
- entry (headword and strings closely related to the headword, like variant spellings and inflections).
- entryPrefix (ge prefixes that Sweet collates with other entries).
- expression ("constructions", as Sweet calls them — usage examples and collocations).
- metadata (parts of speech, other authorial commentary).
- unsure (for declensional endings that Sweet marked with "thin letters").
- unknown (for things I didn't understand).
So there is a small measure of semantic tagging. Some of this tagging required judgment calls, and I might have called it wrong sometimes.
There are lots of mistakes; I had to do a lot of manual work to clean things up and mark diacritics. Plus I'm still learning.
I realize that Sweet's space-saving conventions aren't necessary in an online version. But it would add quite a bit of time to reverse these, so to speak. Plus it introduces other possible issues, as noted in the To-do list. Hopefully, the light semantic marking of the components of the entries (using CSS styles) will help toward that goal in the future.
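As one concrete example of reversing a space-saving convention, the recasting of parenthetical letters mentioned in the notes above (~e(n)dlic becoming ~edlic and ~endlic) can be sketched mechanically. This is a hypothetical helper, not the process actually used, and it assumes that parentheses inside a headword always mean "optional letters", which might not hold for every entry.

```python
import re

def expand_optional_letters(term):
    """Expand each (x) in a term into the forms without and with x."""
    match = re.search(r"\(([^()]+)\)", term)
    if match is None:
        return [term]
    before, after = term[:match.start()], term[match.end():]
    results = []
    # Try the form without the letters first, then the form with them,
    # and recurse in case the term has more than one parenthetical group.
    for candidate in (before + after, before + match.group(1) + after):
        for expanded in expand_optional_letters(candidate):
            if expanded not in results:
                results.append(expanded)
    return results

print(expand_optional_letters("~e(n)dlic"))
# ['~edlic', '~endlic']
```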
To-do list
Expand contractions/abbreviations
Either expand Sweet's contractions (e.g. lLt = "late Latin") into full terms or at least link them to the abbreviations page.
Substitute headwords (roots) for the ~ character
Sweet uses the ~ character in examples and in subentries to indicate "put headword here". It would be useful to just go ahead and substitute the headword, since we're not concerned about space.
However, replacing ~ with the headword/root has to be done carefully; it's not entirely a mechanical substitution, especially when there's a ge prefix involved.
There's also an issue in that substituting headwords for ~ in subentries can put things out of alphabetical order (if that matters). Here's an example, using one of Sweet's entries (simplified):
dæg|rīm number of days.
~rima . dawn.
~sang daily service.
[…]
~wōma dawn.
dǣge bread-maker [dāg].
dægþerlic daily: on þisum ~an daege on this very day [dæg].
If we blindly substitute dæg for ~ in ~wōma, we end up with a headword dægwōma, which is correct (there is such a word), but it's out of alphabetical order with respect to dǣge and dægþerlic, the terms that follow.
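The ordering problem can be checked with a small sketch like the one below. It substitutes the root for ~ in the simplified example and then flags any term that sorts after its successor. The function names are hypothetical, and the sort key (strip diacritics, then compare by code point) is a deliberately naive assumption rather than Sweet's actual alphabetical order.

```python
import unicodedata

def sort_key(word):
    """Naive collation key: the word with combining marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def substitute(root, entries):
    """Replace a leading ~ with the root; leave other entries untouched."""
    return [root + e[1:] if e.startswith("~") else e for e in entries]

def out_of_order(words):
    """Return each word whose key is greater than the next word's key."""
    keys = [sort_key(w) for w in words]
    return [words[i] for i in range(len(words) - 1) if keys[i] > keys[i + 1]]

entries = ["dægrīm", "~rima", "~sang", "~wōma", "dǣge", "dægþerlic"]
expanded = substitute("dæg", entries)
print(out_of_order(expanded))
# ['dægwōma']
```

A check along these lines could at least list the places where substitution disturbs the order, leaving the decision about whether that matters to a human.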
Link "see" and other "related to" terms
Sweet occasionally adds See or Cp. [term] notes or other references to other entries. Since this is an online rendering, it would be useful for those references to be linked. As examples of where this could be useful, see the following entries:
- āhte, which includes "prt. of āh".
- astorfen, which includes "see āsteorfan".
- tēonian, which includes "Cp. tīenan".
Some entries also include a word at the end that indicates a related term. These likewise could be linked. Examples:
Expand other shortenings
Sweet often uses hyphens to indicate alternative forms, like the -yn addition here:
cynren, -yn kind, species.
I'm pretty sure that we can expand these truncated alternatives, which would make the alternative forms more searchable. For example, in this case, the entry might look like this, with the stem repeated:
cynren, cynryn kind, species.
Another shortening that Sweet uses is to simply list alternative stem vowels (more rarely, consonants) after the main entry, like this:
cięrm, ea, eo shout, clamour, cry.
It would be useful also to expand these into full words that include the alternative letters.
There are many hyphen-based abbreviations and other shortenings like this, given Sweet's interest in saving space, so this would not be a mechanical substitution.
Better layout
The entries page is long — about 20,000 separate entries. I believe also that the markup + code that I'm using to implement the search features slows the browser down when the page is loading. (I've found it hard to use the page on my phone, for example.) I need to do something about this, but I also don't necessarily want to split the entries into separate pages. Suggestions welcome. :)
The current layout uses jQuery to dynamically load the header information (at the top of the page). This is only possible for HTML that's served by a server; for security reasons, browsers block this kind of dynamic loading when a page is opened directly from a user's local disk.
I mention this because it would be neat to be able to just send someone a zipped folder of HTML files so that they could have a local copy of all this information. If people want to be able to do this, we can cobble together versions of the files that e.g. repeat the header information in each page.
Correct some of Sweet's typo-type issues?
I'm reasonably sure that there are tiny mistakes in the text: terms that should be marked as metadata (italicized) that are not; choices that Sweet made about when to create a separate headword vs. an inline alternative; etc. In all cases, if we can be certain that this is the issue, and if fixing it would make the conversion more useful (better search, easier comprehension), we should make appropriate changes.
PS I haven't found any outright typos yet, at least, not in the modern English definitions. :)
Other improvements to be named later
:)
Who did this
This work was started by me, Mike Pope, and it currently lives on my personal website. At the time I started this project (early 2024) I was a second-year student of Old English. I learned about Sweet's dictionary after I'd already been using Bosworth-Toller and of course Wiktionary. I found Sweet's entries very useful, but found the PDF limited for searching for OE words. So I embarked on this project. Whether the end result will actually be useful to anyone is entirely unknown, but in the meantime I will have had an interesting (though at times tedious) project to work on.
Contact me
You can contact me with questions, comments, and corrections via email at mike(dot)pope at the Gmail place (where (dot) represents an actual dot).
License
This HTML conversion is licensed under a Creative Commons "Attribution-NonCommercial-ShareAlike 4.0 International" license. This means:
- The author (me) must be credited. (Attribution)
- No commercial use. (NonCommercial)
- Derivative works are ok. (Share, Adapt)
- If you (re)distribute, it must be under these same license terms. (ShareAlike)
Be nice :)
Here's the formal statement and link to the license details:
HTML conversion of Sweet's "A Student's Dictionary of Anglo-Saxon" by Mike Pope is licensed under Attribution-NonCommercial-ShareAlike 4.0 International
Other credits
Friend Scott Butler wielded Adobe Acrobat Pro to produce a Microsoft Word version (.docx file) of the original Sweet PDF file. I'm grateful for this help, which helped get me started on the lengthy conversion process.
And Friend Michael Broschat then provided tremendous help by getting his own copy of Sweet's dictionary and rescanning it, page by page, which provided an even better base to work from than the PDF-to-Word conversion. Michael spent a bunch of time experimenting with ways to try to capture the various unusual characters in the text, like æ̂, ė, and ȳ. ("I have never seen such a typographically difficult text," he observed.)
Scott's effort and especially Michael's in-depth effort speeded the conversion process considerably, by at least 50%. (I was originally scraping each PDF page, pasting it into Word, and then doing a ton of manual fixup, which took me well over an hour per page. But once I started to work with Scott and Michael's Microsoft Word files, the process went lots faster.)
That said, there are errors in this text (places where my converted text doesn't match Sweet's original), and those errors are entirely on me. (If you notice any, please do contact me; I'm happy to make fixes.)
[1] As Dr. Hana Videen explains in a blog post, for a variety of reasons the scholarly community has moved away from the term Anglo-Saxon for the language and now favors Old English.