Four-letter words in unexpected places

About

I'm Mike Pope. I live in the Seattle area. I've been a technical writer and editor for over 35 years. I'm interested in software, language, music, movies, books, motorcycles, travel, and ... well, lots of stuff.

Wednesday, 10 August 2011 10:26 PM

I've noted before that we have tools to help us find geopolitical issues in our docs. This includes profanity. Not that this comes up a lot, of course. But the point is both to guard against unintentional uses, like maybe a writer who's in a hurry and copies something out of an email thread that includes some dubious language, and to guard against profanity in places where people (editors, say) might not think to look.

And indeed, I personally ran across a couple of such instances in the last few weeks, much to my amazement. (Actually, amusement, and you'll see why.)

The story involves GUIDs, or "globally unique identifiers." A GUID is a 16-byte (128-bit) number, which means there are 2¹²⁸, or about 3.4 × 10³⁸, possible values. You use GUIDs to identify things because, given this huge pool of numbers, and the fact that GUIDs are generated randomly (for practical purposes), the chance of two GUIDs ever colliding is infinitesimal.
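For the curious, here is a minimal Python sketch of the size of that pool. It is only an illustration: the names `GUID_SPACE` and `new_guid` are made up for this example, and it uses Python's standard `uuid` module rather than whatever generator our internal tools rely on.

```python
import uuid

GUID_BITS = 128
GUID_SPACE = 2 ** GUID_BITS  # how many distinct 128-bit values exist


def new_guid() -> str:
    """Return a random GUID as upper-case hex groups (hypothetical helper)."""
    # uuid4() reserves 6 of the 128 bits for version and variant markers,
    # so 122 bits are actually random. Still a very large pool.
    return str(uuid.uuid4()).upper()


if __name__ == "__main__":
    print(f"{GUID_SPACE:.1e}")  # the pool size in scientific notation
    print(new_guid())           # a fresh GUID, different on every run
```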

So? Well, for humans, GUIDs are normally represented like this:

936DA01F-9ABD-4d9d-80C7-02AF85C822A8

That's a 128-bit number in hexadecimal (base 16, useful for computers), which uses the digits 0–9 and the letters A–F to represent numbers. You can see from the example that a GUID will often have consecutive letters.[1] And any time you can put letters together, you can, advertently or otherwise, form words. And when you can form words, ...
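To see the consecutive letters for yourself, something like the following sketch would pull them out. The function name `letter_runs` is hypothetical, and treating "a run" as two or more letters in a row within one hyphen-delimited group is my assumption for the example.

```python
import re


def letter_runs(guid: str, min_len: int = 2) -> list[str]:
    """Return each run of at least min_len consecutive letters A-F in a GUID."""
    # Hyphens are not in the character class, so a run never spans two groups.
    pattern = re.compile(r"[A-F]{%d,}" % min_len)
    return pattern.findall(guid.upper())


if __name__ == "__main__":
    print(letter_runs("936DA01F-9ABD-4d9d-80C7-02AF85C822A8"))
```

For the example GUID above, the runs are DA, ABD, and AF.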

The massive internal database we use to store our work uses GUIDs as IDs for everything — documents, pieces of art, code snippets, everything. So for example, while we're authoring, if we want to create a link from one topic to another topic, we specify the GUID of the target topic. (Of course, the tools do all this for us; we don't have to actually know GUIDs or anything.)

As I say, twice in the last week or so I've run our geopolitical check tool and it objected, vehemently, to what it fingered as profanity in the docs. This was the first one:

CCEE44F2-86E3-CACA-9E53-8CA5F25F8F62

Namely, a GUID (probably in a link) that included the string "CACA". This amused me, because apparently you can take the boy out of the 2nd grade, but you can't take the 2nd grade out of the boy.

The second hit was a little more mysterious. Again the tool had tripped over a GUID; this time it was this:

4D07A497-37F7-4454-BBCD-732EA5CDD059

The sequence of letters that had raised the alarm was "BBCD". Uh ... what? But the tool said it was profanity, so I went looking. And sure enough: BBCD is a derogatory term in Britain (I guess). Wow, interesting catch.
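I have no idea how our check tool is implemented, but a naive version is easy to imagine. The sketch below is purely hypothetical: `BLOCKLIST`, `GUID_RE`, and `find_hits` are names invented for this example, and the blocklist holds only the two strings from this story. It does a plain substring search, which is exactly the kind of check that would trip over a GUID, and it also reports whether each hit sits inside something GUID-shaped.

```python
import re

# Assumed GUID shape: 8-4-4-4-12 hex digits separated by hyphens.
GUID_RE = re.compile(r"\b[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\b")

# Hypothetical blocklist, limited to the two strings mentioned in this post.
BLOCKLIST = ("CACA", "BBCD")


def find_hits(text: str) -> list[tuple[str, int, bool]]:
    """Return (term, offset, inside_guid) for every blocklisted substring."""
    guid_spans = [m.span() for m in GUID_RE.finditer(text)]
    hits = []
    for term in BLOCKLIST:
        for m in re.finditer(re.escape(term), text, re.IGNORECASE):
            inside = any(s <= m.start() and m.end() <= e for s, e in guid_spans)
            hits.append((term, m.start(), inside))
    return hits


if __name__ == "__main__":
    sample = "Link target: 4D07A497-37F7-4454-BBCD-732EA5CDD059"
    for term, offset, inside in find_hits(sample):
        where = "inside a GUID" if inside else "in ordinary text"
        print(f"{term} at offset {offset}, {where}")
```

A tool that tracked the `inside_guid` flag could choose to suppress or downgrade hits like the two I ran into.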

Of course, there's nothing for us to do here. First, the character strings are in GUIDs, which we can't change. And anyway, customers never see these; they're used internally for identifying "assets," as they're called, but are converted to other types of identifiers (comparatively tame stuff like http://msdn.microsoft.com/en-us/library/fddycb06.aspx, for example) during the publishing process.

This whole business got me to thinking about how many offensive terms we could construct out of the combination of letters A–F (repeats allowed) with 8 + 4 + 4 + 4 + 12 places to play with. Not so many, probably. (Lots of words, but not so many offensive words.) Which was why I found it all the more interesting to have come across two examples in just a few days.
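If you wanted to actually enumerate the candidates, one approach would be to filter a word list down to words spelled only with A–F. This is a rough sketch under some assumptions: `hex_words` is a made-up name, the sample list is just a few words I typed in, and the 12-letter cap reflects the longest group in a GUID (a word can't straddle a hyphen).

```python
def hex_words(words, max_len=12):
    """Keep the words that use only the letters a-f and fit in one GUID group."""
    allowed = set("abcdef")
    return [
        w for w in words
        if 0 < len(w) <= max_len and set(w.lower()) <= allowed
    ]


if __name__ == "__main__":
    # A tiny, made-up sample; a real check would load a full dictionary file.
    sample = ["cafe", "face", "dead", "beef", "guid", "decade", "zebra"]
    print(hex_words(sample))
```

You would then compare the survivors against whatever list of offensive terms your tool uses.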

More reading
Online GUID generator

[1] There are 8 + 4 + 4 + 4 + 12 = 32 places, all of which could have letters. That's a lot of combinations of letters. Of course, odds are against this. Colleague Ron does the math: "Since the range of digits in a GUID is 0-F, the probability of consecutive digits all being in the range A-F decreases sharply as the total number of digits increases: about 14% for 2 digits, about 5% for 3 digits, less than 2% for 4, about .75% for 5, about .27% (or roughly one time in four hundred) for 6, and .1% (one time in a thousand) for 7. The number of possible combinations is exponential: 36 for 2 digits, 216 for 3, 1,296 for 4, 7,776 for 5."
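The arithmetic is easy to reproduce. Here is a small sketch, assuming each hex digit is equally likely and independent of its neighbors (the function names are mine, not Ron's): 6 of the 16 hex digits are letters, so a run of n digits is all letters with probability (6/16)ⁿ, and there are 6ⁿ distinct all-letter strings of length n.

```python
LETTERS = 6      # A through F
HEX_DIGITS = 16  # 0 through 9 plus A through F


def all_letters_probability(n: int) -> float:
    """Chance that n given consecutive hex digits are all letters."""
    return (LETTERS / HEX_DIGITS) ** n


def letter_strings(n: int) -> int:
    """Number of distinct strings of length n over the letters A-F."""
    return LETTERS ** n


if __name__ == "__main__":
    for n in range(2, 8):
        pct = all_letters_probability(n) * 100
        print(f"{n} digits: {pct:.2f}% chance, {letter_strings(n):,} strings")
```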

[categories] editing, writing

mike's web log