About

I'm Mike Pope. I live in the Seattle area. I've been a technical writer and editor for over 30 years. I'm interested in software, language, music, movies, books, motorcycles, travel, and ... well, lots of stuff.






  04:09 AM

We English speakers can occasionally have some hiccups sorting out the singular or plural nature of nouns, especially when the noun represents a collection of individuals.

Basically speaking, in American English, a collective noun tends to be treated as a singular:

Apple has announced a new version of the iPhone.
Microsoft releases a new update every week.

In British English, these tend to be treated as plural:

Apple have announced a new version of the iPhone.
Microsoft release a new update every week.

Not long ago, an FB friend of mine was posting about an upcoming tour by the rock band The Who. He wrote:

The Who is (are?) coming.

Following the general rule, this is The Who is coming in American English, and The Who are coming in British English.

But consider collective nouns of this type when the noun itself is marked for plural:

The Rolling Stones are going on tour.[1]

Not even Americans will treat this as singular.

I ran across another angle on this issue today when I saw a headline about Marshawn Lynch, who plays football for the Seattle Seahawks. Behold:

As with sports teams generally, the name is plural. And as with the Rolling Stones, even in American English, we'll treat this name—which is trademarked—as plural, since it's marked that way: The Seahawks have won the game.

But the writer here got in a bind: if the Seahawks™ are a team, and even if we think of them as a (plural) collection of individuals, how do you refer to any one member?

I suspect that in informal settings, people will mostly use the singular: Marshawn Lynch will be a Seahawk. Perhaps an overly attentive editor got concerned about using a trademarked name incorrectly. But the result in this case comes out sounding very odd.


[1] Not that I know of.

[categories]  

[1] |


  11:50 PM

I’ve had two occasions recently of seeing myself represented in, like, actual books. This is a little startling, in a pleasing kind of way.

The first reference is explicit. In his book Engineering Security (or at least in the April 2013 draft of it—download it here), Peter Gutmann is discussing the problem of putting security decisions in front of users. Here’s a paragraph out of that chapter:
The abstract problem that the no-useless-buttons policy addresses has been termed “feature-centric development”. This overloads the user with decisions to a point where they adopt the defensive posture of forgoing making them. As Microsoft technical editor Mike Pope points out, “security questions cannot be asked on a ‘retail’ basis. The way users make security decisions is to set their policies appropriately and then let the security system enforce their wishes ‘wholesale’”
Boy, was I tickled when I ran across that. But I didn’t remember being that smart, so I went to the blog to figure out where I had said such an interesting thing. Alas, although it is true that this information appears on my blog, it’s actually a citation from the eminently quotable Eric Lippert, who knows a great deal more about security than I ever will.

And then today I was reading Steven Pinker’s new book The Sense of Style. This is Pinker’s shot at a guide to writing (i.e., a usage guide), with the twist that Pinker is a cognitive psychologist, so he proposes guidance for clarity and comprehensibility in terms of how the brain processes the written word. (It’s more interesting than I’ve just made it sound.)

At one point, Pinker is talking about how the “geometry” of sentences determines how well readers can comprehend them. For example, it can be problematic for readers to parse long “left-branching” constructions, where qualifiers come at the beginning of the sentence: “if [the modifier] starts to get longer it can force the reader to entertain a complicated qualification before she has any idea what it is qualifying.”

He has a number of examples, including the following:
  • The US Department of the Treasury Office of Foreign Assets Control

  • T-fal Ultimate Hard Anodized Nonstick Expert Interior Thermo-Spot Heat Indicator Anti-Warp Base Dishwasher Safe 12-Piece Cookware Set.
And here’s another of his examples:
  • Failed password security question answer attempts limit
Ha! I thought. I know exactly where he got that last example: from me. Well, sort of. Once upon a time I wrote a blog entry about “noun stacks”—big ol’ piles of words like these examples. In the entry I included a number of examples that I had run across at Microsoft. The blog entry was picked up by the Language Log, which is undoubtedly where Pinker actually found the example. But I know where that example really came from.

Naturally, many people find themselves cited constantly, both formally (like, academics) and in popular writing. I suppose a person can get used to reading along and seeing something they’ve written cited in an article or book or whatever. For me, though, even just these tenuous associations with real books are quite exciting. :-)

[categories]   ,

[1] |


  10:26 PM

Sarah and I have been engaged in a gradual process of downsizing, and one of the ways we’ve been doing that is by shrinking our extensive collection of books. Not long ago we did another round of culling and pulled five boxes of books off the shelves. Then, in keeping with what we’ve done many times before, we lugged our boxes around to bookstores in order to sell them.

Prior experience suggested that we’d have the best luck with specific bookstores. Several times I’ve sold books to Henderson’s and Michael’s in Bellingham; the former in particular has always paid top dollar for books, which is reflected in their excellent on-shelf inventory. We have reason anyway to occasionally visit Bellingham, so not long ago we hauled our boxes northward.

But it proved disappointing. We used their handcart to wheel our five boxes in; the stony-faced buyer picked out about 25 books, and we wheeled five boxes back to the car. Michael’s, which is across the street from Henderson’s, was not buying at all, only offering store credit.

With diminished enthusiasm, we headed back south. Our next stop was Third Place Books in Lake Forest Park. Like Henderson’s, they carefully picked out a small stack of books and gave us back the rest. Although I was tempted to visit Magus Books in the U District—in my experience, they’ve always been interested in more academically oriented books—the day had already gotten long for little gain. Therefore, our last stop was Weasel Half-Price Books, which gave us a handful of change for the remaining four boxes. Presumably we could have demanded back the books they were not interested in, but by then we'd lost pretty much all of our energy for dealing with the boxes, even to donate them to the library.


All in all it was a heartbreaking experience. The web has been a good tool for those who like books. Sites like Abebooks have created a global market for used books, so that a place like Henderson’s can offer its inventory not just to those in the environs of Bellingham, WA, but to anyone with an internet connection. But the internet has also brought a lot more precision to this market; a bookseller has a much better idea today of what a book is worth—or not worth—on the open market. One effect certainly has been that the buyers at all these bookstores are much choosier than they might have been 15 years ago, when (I suspect) buying decisions were still reliant on a dash of instinct.

More than that, and a fact that’s hard for me to accept, is that used books are a commodity of diminishing value. We collected those books over decades, and each acquisition had personal meaning to us. I could easily have spent an hour pulling books out of the boxes and explaining to the buyers at Henderson’s or Third Place or Half-Price why I bought the book, and when, and why I’d kept it all these years, and why it was a book sure to appeal to some other reader. But they don’t care about your stories, a fact that’s all too obvious when you’re standing at their counter, meekly awaiting a payment that represents a tiny fraction of your investment—financial and otherwise—in the books you’ve handed over.

No one really wants my old VCR tapes or CDs or even DVDs much anymore, either, although I don’t have as much emotional investment in those as I do in books. And I can’t really fault booksellers for their choosiness, since their continued success is dependent on hard-headed decisions about their inventory.

We still have five bookshelves filled with books at home, and we'll continue to downsize. I think I might be done with trying to sell the books, though. I'm not sure I want to experience the sadness of seeing how little all these lovely books are worth to anyone else but us.

[categories]   ,

|


  09:02 PM

I was reading some employee policy documents recently when I ran across this:
It is still preferred that complaints are handled internally.
There are some interesting things here to contemplate. Let's start with It is [still] preferred that.... A more active way to phrase this is We still prefer that .... The construct that starts with It is is not technically passive—there's no subject-object inversion (as in "The man was bitten by the dog.") But the it is used in an impersonal way here, which has a passive feel, and it seems clear (<-- haha) that whoever wrote this was intent on not stating who was doing the preferring.

Then there's ... complaints are handled internally. This actually looks like a real passive ([someone] handles complaints becomes complaints are handled.) Again it seems that there's an intent to avoid stating a subject for handle.

But an odder thing is that complaints are is an example that might be cited when people talk about how the subjunctive is disappearing in English. Many people would rewrite the sentence as ... preferred that the complaints be handled internally, which is a fine use of subjunctive ("be") to indicate a statement that represents "opinion, belief, purpose, intention, or desire." Consider:

They insist that he is there.
They insist that he be there.

The sentences mean different things, and the latter uses be to mark a subjunctive that indicates the aforementioned intention, desire, etc.

Gabe Doyle has a writeup on what's invariably referred to as the "death" of the subjunctive, and one of his examples (3a and 3b) shows the same conflation between subjunctive be and indicative is.

Anyway, it's a lot of grammatical food for thought in one sentence, don't you think?

[categories]  

|


  09:52 AM

Carrying on with adventures using the Tumblr API. (Part 1, Part 2)

As noted, I decided that I wanted to create a local HTML file out of my downloaded/exported Tumblr posts. In my initial cut, I iterated over the list of TumblrPost instances that I'd assembled from the downloaded posts, and I then wrote out a bunch of hard-coded HTML. This worked, but was inflexible, to say the least—what if I wanted to reorder items or something?

So I fell back on yet another old habit. I created a "template" of the HTML block that I wanted, using known strings in the template that I could swap out for content. Here's the HTML template layout, where strings like %%%posttitle%%% and %%%posturl%%% are placeholders for where I want the HTML to go:
<!-- tumblr_block_template.html -->
<div class="post">
    <div class="posttitle">%%%posttitle%%%</div>
    <div class="postdate">%%%postdate%%%</div>
    <div class="posttext">%%%posttext%%%</div>
    <div class="postsource">%%%postsource%%%</div>
    <div class="posturl"><a href="%%%posturl%%%"
        target="_blank">%%%posturl%%%</a></div>
    <div class="postctr">[%%%postcounter%%%]&nbsp;
        <span class="posttype">%%%posttype%%%</span>
    </div>
</div>
The idea is to read the template, read each TumblrPost item, swap out the appropriate member for the placeholder, and build up a series of these blocks. Here's the code to read the template and build the blocks of content:
html_output = ''

html_file = open('c:\\Tumblr\\tumblr_block_template.html', 'r')
html_block_template = html_file.read()
html_file.close()

ctr = 0
for p in sorted_posts:
    new_html_block = html_block_template
    ctr += 1
    new_html_block = new_html_block.replace('%%%posttitle%%%', p.post_title)
    new_html_block = new_html_block.replace('%%%postdate%%%', p.post_date)
    new_html_block = new_html_block.replace('%%%posttext%%%', p.post_text)
    new_html_block = new_html_block.replace('%%%postsource%%%', p.post_source)
    new_html_block = new_html_block.replace('%%%posturl%%%', p.post_url)
    new_html_block = new_html_block.replace('%%%postcounter%%%', str(ctr))
    new_html_block = new_html_block.replace('%%%posttype%%%', p.post_type)
    html_output += new_html_block
To embed these <div> blocks into an HTML file, I did the same thing again—I created a template .html file that looks like this:
<!-- tumblr_template.html -->
<html>
<head>
  <link rel="stylesheet" href="tumbl_posts.css" type="text/css">
  <meta http-equiv="content-type" content="text/html;charset=utf-8">
</head>
<body>
<h1>Tumblr Posts</h1>
%%%posts%%%
</body>
</html>
With this in hand, I can read the template .html file and do the swap thing again, and then write out a new file. To actually write the file, I generated a timestamp to use as part of the file name: 'tumbl_bu-' plus %Y-%m-%d-%H-%M-%S plus '.html'.

There was one complication. I got some errors while writing the file out, which turned out to be an issue with Unicode encoding—apparently certain cites that I pasted into Tumblr contain characters that can’t be converted to ASCII, which is the default encoding for writing out a file. The solution there is to use the codecs module to convert. (It’s possible that this is a problem only in Python 2.x.)
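Condensed, then, the write step boils down to something like this (the same folder and file-naming scheme as in the full listing below):
import datetime
import codecs  # codecs.open lets us write Unicode out as UTF-8

file_timestamp = datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
file_name = 'c:\\Tumblr\\tumbl_bu-' + file_timestamp + '.html'
with codecs.open(file_name, 'w', 'utf-8-sig') as new_html_file:
    new_html_file.write(html_file_contents)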

Here’s the complete listing for the Python script. (I wrapped some of the lines in a Python-legal way to squeeze them for the blog.)
import datetime, json, requests
import codecs # For converting Unicode in source

class TumblrPost:
    def __init__(self,
                 post_url,
                 post_date,
                 post_text,
                 post_source,
                 post_title,
                 post_type):
        self.post_url = post_url
        self.post_date = post_date
        self.post_text = post_text
        self.post_source = post_source
        self.post_type = post_type
        if post_title is None or post_title == '':
            self.post_title = ''
        else:
            self.post_title = post_title

all_posts = []    # List to hold instances of the TumblrPost class
html_output = ''  # String to hold the formatted HTML for all the posts
folder_name = 'C:\\Tumblr\\'

# Get the text posts and add them as TumblrPost objects to the all_posts list
print "Fetching text entries ..."
request_url = 'http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/text?api_key=[MY_KEY]'
offset = 0
posts_still_left = True
while posts_still_left:
    # Build the paged URL fresh each pass so the offsets don't accumulate
    page_url = request_url + "&offset=" + str(offset)
    print "\tFetching text entries (%i) ..." % offset
    tumblr_response = requests.get(page_url).json()
    total_posts = tumblr_response['response']['total_posts']
    for post in tumblr_response['response']['posts']:
        # See https://www.tumblr.com/docs/en/api/v2#text-posts
        p = TumblrPost(post['post_url'],
                       post['date'],
                       post['body'],
                       '',              # No source for text posts
                       post['title'],
                       'text')
        all_posts.append(p)
    offset += 20
    if offset > total_posts:
        posts_still_left = False

# Get the quote posts and add them as TumblrPost objects to the all_posts list
print "Fetching quote entries ..."
request_url = 'http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/quote?api_key=[MY_KEY]'
offset = 0
posts_still_left = True
while posts_still_left:
    page_url = request_url + "&offset=" + str(offset)
    print "\tFetching quote entries (%i) ..." % offset
    tumblr_response = requests.get(page_url).json()
    total_posts = tumblr_response['response']['total_posts']
    for post in tumblr_response['response']['posts']:
        # See https://www.tumblr.com/docs/en/api/v2#quote-posts
        p = TumblrPost(post['post_url'],
                       post['date'],
                       post['text'],
                       post['source'],
                       '',              # No title for quote posts
                       'quote')
        all_posts.append(p)
    offset += 20
    if offset > total_posts:
        posts_still_left = False

# Sort all posts, newest first
sorted_posts = sorted(all_posts,
                      key=lambda tpost: tpost.post_date,
                      reverse=True)

print "Creating HTML file ..."

# Read a file that contains the HTML layout of the posts,
# with placeholders for individual bits of data
html_file = open(folder_name + 'tumblr_block_template.html', 'r')
html_block_template = html_file.read()
html_file.close()

ctr = 0
for p in sorted_posts:
    new_html_block = html_block_template
    ctr += 1
    new_html_block = new_html_block.replace('%%%posttitle%%%', p.post_title)
    new_html_block = new_html_block.replace('%%%postdate%%%', p.post_date)
    new_html_block = new_html_block.replace('%%%posttext%%%', p.post_text)
    new_html_block = new_html_block.replace('%%%postsource%%%', p.post_source)
    new_html_block = new_html_block.replace('%%%posturl%%%', p.post_url)
    new_html_block = new_html_block.replace('%%%postcounter%%%', str(ctr))
    new_html_block = new_html_block.replace('%%%posttype%%%', p.post_type)
    html_output += new_html_block

# The template has a placeholder for the content that's generated dynamically
html_file = open(folder_name + 'tumblr_template.html', 'r')
html_file_contents = html_file.read()
html_file.close()
html_file_contents = html_file_contents.replace('%%%posts%%%', html_output)

# Open (i.e., create) a new file with the ability to write Unicode.
# See http://stackoverflow.com/questions/934160/write-to-utf-8-file-in-python
file_timestamp = datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
with codecs.open(folder_name +
                 'tumbl_bu-' +
                 file_timestamp +
                 '.html', 'w', "utf-8-sig") \
        as new_html_file:
    new_html_file.write(html_file_contents)

print 'Done!'

[categories]  

|


  09:17 PM

I wonder how many people do this. Let’s say I’m reading something on Wikipedia, and a paragraph includes a link that’s seductively drawing my attention away from the current article. In a show of resistance to ADHD, I won’t just click that link—instead, I’ll Ctrl+click it, thus opening the linked page in another tab “for later.”

After some amount of reading, I’ll have, oh, a dozen tabs open in the browser:


Or 20. Or 30. In another exhibit of discipline, I will occasionally drag all of these open tabs from the many and various browser windows I have open into a single browser window. Now, that’s organized.

Perhaps it’s the “for later” part that I’m wondering about. I just checked some of the pages in those tabs in the screenshot. As near as I can tell, the oldest one goes back about three months. Here’s a sampling of the pages I currently have open:
  • The Tumblr API reference
  • Three (!) articles on time perspective.
  • An article on how to use Twitter for business.
  • The article “Complements and Dummies” by John Lawler, a linguist.
  • An article on high-impact training in 4 minutes.
  • An article on how to create effective to-do lists.
  • An article on how to adjust the aim of the headlight on a motorcycle.
  • The syllabus, wiki, and video page for a Coursera course I’m taking.
  • A Wikipedia article about the 1952 steel strike (related to the previous).
You can see that these are all pages that I want to keep handy, ready to read when I get a few spare minutes.

My officemate and I were talking about this today, and it turns out he does something similar. My collection of open tabs has survived several computer reboots (thanks, Chrome!), and my officemate confirms that his collection has persisted through a number of upgrades to Firefox.

It seems like a logical approach would be to bookmark these pages, either in the browser, or using some sort of bookmarking site like Pinterest or (ha) Delicious. Or heck, OneNote or EverNote.

But in my case, tossing a link into any of these is almost the equivalent of throwing it into a black hole. Yes, I have the link, but I don’t make a habit of going back to my saved links and looking for things that had struck my fancy days or weeks or months ago.

No, the habit of keeping these pages open seems to act as a kind of short-term bookmarking. Now and then I might actually click on one of the tabs just to remind myself of why I have all these pages open. For the most part, any given page still looks interesting, so I don’t want to close it. After all, I still intend to read that page Real Soon Now.

[categories]  

[1] |


  10:51 AM

I’m not sure whether this is an eggcorn or just a homonym mistake whose tense logic amused me. I was reading an article and ran across the following (picture here in case they edit the text later):

(The text of interest says “the diatribe was entirely representative of the reality, which is bared out not only by the aforementioned Pew poll, but another Pew poll”)

The author intended to bear out, meaning to “substantiate, confirm” (see definition 30). One reason to suspect that this is an eggcorn is that, as with eggcorns generally, the word substitution sort of makes sense: to bare out could mean, perhaps with a little squinting, something along the lines of “to make bare,” hence perhaps to make obvious.

And as I say, I liked the logic of the past tense. The past of bear out is born out or borne out. Thus this sentence was intended to read “… which is born(e) out not only by …”. But if you substitute bare, you’ve got a regular verb in terms of past tense, so it is inevitably bared out.

Eggcorns are interesting because they offer a tiny peek into how speakers parse and interpret things they hear. (And they are primarily based in sound, not reading.) Chris Waigl maintains a great database of eggcorns that’s fascinating to browse through for just this reason.

You don’t find eggcorns—or whatever this mistake is—in formal articles nearly as often as you do in blog posts or other unedited material. So this is, I think, a real find. :-)

[categories]  

|


  12:14 PM

This is part 2. (See part 1.)

Previously on Playing with the Tumblr API:

“I have a Tumblr blog …”
“Tumblr’s search feature is virtually useless …”
“However, Tumblr does support a nice RESTful API …”
“I wanted to build an HTML page out of my posts …”


As I described last time, once you’ve registered an app and gotten a client ID and secret key, it’s very easy to call the Tumblr API to read posts:
http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/text?api_key=secret-key
This API returns up to 20 posts at a time. Each response includes information about how many total posts match your request criteria. So to get all the posts, you make this request in a loop, adding an offset value that you increment each time, and stopping when you’ve hit the total number of posts. Here’s one way to do that in Python:
import json
import requests  # Does the actual HTTP work

request_url = 'http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/text?api_key=key'
offset = 0
posts_still_left = True
while posts_still_left:
    # Build the paged URL fresh each pass so the offsets don't accumulate
    page_url = request_url + "&offset=" + str(offset)
    tumblr_response = requests.get(page_url).json()
    total_posts = tumblr_response['response']['total_posts']
    for post in tumblr_response['response']['posts']:
        pass  # Do something with the JSON info here
    offset += 20
    if offset > total_posts:
        posts_still_left = False
I’m using the awesome requests library (motto: “HTTP for Humans”) to make the API requests. The response is in JSON. In raw Python, the return value is typed as requests.models.Response, but the json() method makes it easy to convert that to a dict. You can then easily pluck out the values you want. Here, for example, I’m extracting the value of the total_posts field. Inside the response element there’s a posts array that contains the guts of each of the 20 posts that the response returns.
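For example, once the response has been converted to a dict, plucking out fields is just a matter of indexing into it (a toy example, using fields from the documented response):
tumblr_response = requests.get(request_url).json()
print tumblr_response['response']['total_posts']

# Each element of the posts array is itself a dict of fields
first_post = tumblr_response['response']['posts'][0]
print first_post['date'], first_post['post_url']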

Normalizing Post Info

I noted before that I was interested in 2 (text, quote) of the 8 types of posts that Tumblr supports, and that different post types return somewhat different info. The JSON for Tumblr posts contains a lot of information—much of it metadata like state (published, queued), note_count, tags, and other stuff that, while essential to Tumblr’s purposes, didn’t interest me personally. I’m interested in just these things: post_url, date, title, and body (text posts) or source (quote posts).

To normalize this information, I fell back on old habits: I created a TumblrPost class in Python and defined members that accommodated all of the JSON values I was interested in across both post types:
class TumblrPost:
    def __init__(self, post_url, post_date, post_text, post_source, post_title):
        self.post_url = post_url
        self.post_date = post_date
        self.post_text = post_text
        self.post_source = post_source
        if post_title is None or post_title == '':
            self.post_title = ''
        else:
            self.post_title = post_title
Should I want at some point to accommodate additional types of posts, I can add members to this class. I guess.

Having this class lets me read the raw JSON in a loop and create an instance of the class for each Tumblr post I read. I can then just add the new instance to a Python list. My code to read text posts looks like the following:
all_posts = []      # List to hold instances of the TumblrPost class

request_url = 'http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/text?api_key=key'
offset = 0
posts_still_left = True
while posts_still_left:
    page_url = request_url + "&offset=" + str(offset)
    print "\tFetching text entries (%i) ..." % offset
    tumblr_response = requests.get(page_url).json()
    total_posts = tumblr_response['response']['total_posts']
    for post in tumblr_response['response']['posts']:
        p = TumblrPost(post['post_url'], post['date'], post['body'], '', post['title'])
        all_posts.append(p)
    offset += 20
    if offset > total_posts:
        posts_still_left = False

Reading Both Text and Quote Posts

So that took care of reading text posts. As I say, quote posts have a slightly different JSON layout, such that reading the JSON and instantiating a TumblrPost instance looks like this (no body, but a source):
p = TumblrPost(post['post_url'], post['date'], post['text'], post['source'], '')
I debated whether to tweak my loop logic to try to accommodate both text-type and quote-type requests in the same loop. In that case, the loop has to a) issue a slightly different request (with /quote? instead of /text?) and then b) extract slightly different JSON when creating the TumblrPost instances. This would require a variable to track which type of post I was reading and then some if logic to branch to the appropriate request and appropriate instantiation logic. Bah. In the end, I just copied this loop (*gasp!*) and changed the couple of affected lines. (For the record, the road not taken is sketched below.)
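Here’s roughly what that alternative would have looked like—a single loop parameterized by post type. This is just a sketch (the fetch_posts helper is hypothetical, not code I actually ran), reusing the same TumblrPost class as above:
# A sketch of the alternative: one loop, parameterized by post type
def fetch_posts(post_type):
    posts = []
    base_url = ('http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/' +
                post_type + '?api_key=key')
    offset = 0
    while True:
        response = requests.get(base_url + '&offset=' + str(offset)).json()
        for post in response['response']['posts']:
            if post_type == 'text':
                p = TumblrPost(post['post_url'], post['date'],
                               post['body'], '', post['title'])
            else:  # 'quote'
                p = TumblrPost(post['post_url'], post['date'],
                               post['text'], post['source'], '')
            posts.append(p)
        offset += 20
        if offset > response['response']['total_posts']:
            return posts

all_posts = fetch_posts('text') + fetch_posts('quote')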

Next up: Creating the actual HTML, and then done, whew.

[categories]  

|


  03:05 PM

One of the delights of my job has always been the chance to work with people from all over, and I mean, like, from all over the globe. A nice side effect is that people bring their unique brands of English with them, affording endless opportunities to listen to, read, and think about the vast dialectal variations in our language.

One of our developers has the task of sending out a biweekly email with tips and tricks about using our tools. He happens to be from Sri Lanka, so his English is primarily informed by British usage, and the subject line of his email read “Tip of the Fortnight.” Apparently having second thoughts after the email went out, he popped into my office and asked “Will people understand the term fortnight?”

I think it’s safe to say that literate Americans understand fortnight just fine. But it’s not a term that many Americans produce, I think. I lived in England for a couple of years, and I got very used to expressions like a fortnight’s holiday, but even with this exposure, the term never entered my active vocabulary.

His question, tho, sent me on a bit of a quest to try to determine what the, you know, isogloss is for fortnight. Right across the hall from me is a Canadian, so I asked him. Nope, he said, they don’t use it. My wife has cousins in Australia, so I sent a query off to one of them. Oh, yes, they use it all the time, she said. In fact, she asked, what do you say in the States when you're referring to something on a two-weekly basis? Good question, which underscored why fortnight is such a handy word. I mean, really: how do you phrase "Tip of the Fortnight" in American English?

The word has a long history—according to the OED, it goes back to Old English (first cite 1000), and if I read their note right, Tacitus referred to a Germanic way of reckoning time by nights. (Interestingly, the most recent cite in the OED is for 1879, not that they really needed a cite more recent than that for a term that is in everyday use in Britain.)

I looked in a couple of dictionaries, but neither of them indicated anything along the lines of “chiefly Br.”, as they occasionally will with a regional term. The two usage guides I have handy, Garner and the MWDEU, are both silent on the term. (I slightly expected Garner to comment on the term’s use in, say, legal writing, but nope.)

But I’ll stick to my now-anecdotally based theory that fortnight is just not used much in North American English. Still, I don’t think my colleague had much to worry about regarding the subject line of his email. As I say, I’m pretty sure that my American and Canadian colleagues recognize the term. And of course, many others come from places where it’s a perfectly normal word, and like the cousin, they might wonder why we don't adopt such an obviously useful term.

[categories]  

[4] |


  10:43 PM

I have a Tumblr blog where I stash interesting (to me) quotes and citations that I've run across in my readings. Tumblr has some nice features, including a queue that lets you schedule a posting for "next Tuesday" or a date and time that you pick.


Tumblr Woes

However, Tumblr’s search feature is virtually useless, which I sorely miss when I want to find something I posted in the distant past. As near as I can tell, their search looks only for tags, and even then (again, AFAICT) it doesn't scope the search to just one (my) blog.

In theory, I can display the blog and use the browser's Ctrl+F search to look for words. Tumblr supports infinite scroll, but, arg, in such a way that Ctrl+F searches cannot find posts that I can plainly see right in the browser.

When search proved useless, I thought I might be able to download all my posts and then search them locally. However, and again AFAICT, Tumblr has no native support for exporting your posts. There once was a utility/website that someone had built that allowed you to export your blog, but it's disappeared.[1]

APIs to the Rescue

However, Tumblr does support a nice RESTful API. Since I've been poking around a bit with Python, it seemed like an interesting project to write a Python script to make up for these Tumblr deficiencies. I initially thought I'd write a search script, but I ended up writing a script to export my particular blog to an HTML file, which actually solves both of my frustrations—search and export/backup.

Like other companies, Tumblr requires you to register your application (e.g. "mike's Tumblr search") and in exchange they give you an OAuth key that consists of a "consumer key" and a "secret key." You use these keys (most of the time) to establish your bona fides to Tumblr when you make requests using the API.

(Side note: They basically have three levels of auth. Some APIs require no key; some require just the secret key; and some require that you use the Tumblr keys in conjunction with OAuth to get a temporary access key. This initially puzzled me, but it soon became clear that their authentication/authorization levels correspond with how public the information is that you want to work with. To get completely public info, like the blog avatar, requires no auth. In contrast, using the API to post to the blog or edit a post requires full-on OAuth.)
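To make that concrete, here’s roughly what the two simpler levels look like—going from my reading of their docs, so treat the details as approximate:
import requests

# Level 1: completely public info (the avatar); no key at all
avatar = requests.get('http://api.tumblr.com/v2/blog/mikepope.tumblr.com/avatar')

# Level 2: blog contents; the registered key goes on the query string
posts = requests.get('http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts'
                     '?api_key=secret-key').json()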

Tasks I Needed to Perform

The mechanics of what I wanted to do—namely, get all the posts—are trivially easy. For example, to get a list of posts of type "text" (more on that in a moment), you do this:

http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/text?api_key=secret-key

This returns 20 posts' worth of information in a JSON block that's well documented, and which includes paging markers so that you can get the next 20 posts, etc. In a narrow sense, all I needed to do was to issue a request in a loop to get the blog posts page by page, concatenate everything together, and write it out. I’d then have a “backup”—or at least a copy, even if it was a big ol’ JSON mess—of my entries, and these would be somewhat searchable.

As it happens, you use different queries to get the different types of posts. Tumblr supports eight types of posts—text, quote, link, answer, video, audio, photo, chat. Each type requires a separate query[2], and each returns a slightly different block of JSON. For just basic read-and-dump, it’s a matter of looping through again, but this time with a slightly different query, as in the sketch below.
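If a raw dump is all you’re after, the whole thing boils down to something like this sketch (the output file name is hypothetical; the per-type paging loop is the same one described above):
import json, requests

post_types = ['text', 'quote', 'link', 'answer', 'video',
              'audio', 'photo', 'chat']
all_raw_posts = []
for post_type in post_types:
    offset = 0
    while True:
        url = ('http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/' +
               post_type + '?api_key=secret-key&offset=' + str(offset))
        response = requests.get(url).json()['response']
        all_raw_posts.extend(response['posts'])
        offset += 20
        if offset > response['total_posts']:
            break

# One big (searchable, if messy) JSON file of everything
with open('c:\\Tumblr\\all_posts.json', 'w') as f:
    json.dump(all_raw_posts, f, indent=2)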

So that’s the basics. As noted, I got this idea that I wanted to build an HTML page out of my posts, and that complicated things. But not too terribly much. (I’m using Python, after all, haha). More on that soon.

Update: Part 2 is now up.

[1] One of the original reasons I got interested in writing this blog, in fact, was that LiveJournal did not support any form of search way back in 2001.

[2] Their docs suggest that if type is left blank, they'll return everything, but that was not my experience.

[categories]  

|