# 2008 in numbers

A few numbers to round off 2008.

The year has seen 327 posts (0.89 per day), the second biggest calendar year to date after the 413 posts of 2006. (The heaviest 12 months ran from March 2006–February 2007, with 428 posts. The lightest 12-month period was the 2005 calendar year, with a mere 224 posts.) 2008’s posts elicited 347 comments, or 1.06 comments per post.

2008’s peak came in September, with 37 posts, and after a rather lacklustre start to the year, April onwards saw an average of over 30 posts per month.

There are now an estimated 242,472 words of tat, printable in Times New Roman 12 point on an estimated 612 pages of A4. If you choose to do this, please go double-sided.

Despite the relative regularity with which my tangential thoughts, nay ramblings, have brought themselves to the fore as posts herein, I’ve grown a little tired of the blog over the last two or three months. It needs some fresh impetus which I’ve not had the inclination to give it of late. Hopefully early 2009 will give me some fresh inspiration and some new food for erudite thought. Alas, maybe erudition is a little too much to expect. Why start now?

Happy new year, one and all.

Simon introduced me to the St. Petersburg paradox the other day. Here’s how it goes. You flip a coin over and over again until you get tails. You win 2^n dollars, where n is the total number of coin tosses, including the final tails toss. So a tails straight away wins you \$2; HT wins you \$4; HHT \$8; HHHT \$16 and so on.

The question is: how much should you be willing to stake to play the game?

Now the answer is different depending on whether you play repeatedly or you only play a single game. If your pockets are infinitely deep and time is not an issue, then any amount of money, no matter how large, is worth the investment in a game. Here’s why.

The expected earnings from a game can be calculated by multiplying the earnings by the probability of those earnings for each eventuality, and summing these. The possible outcomes are shown below, with the columns showing the coin pattern (C), the associated winnings (W), the probability of that event happening (P) and the expected value of that line (E) respectively.

| C | W | P | E |
|---|---|---|---|
| T | 2 | 0.5 | 1 |
| HT | 4 | 0.25 | 1 |
| HHT | 8 | 0.125 | 1 |
| HHHT | 16 | 0.0625 | 1 |
| HHHHT | 32 | 0.03125 | 1 |

As you’ll notice, the expected value of each line is \$1. So summing these for the infinite series gives you infinite expected winnings. Each remote possibility of a long string of heads brings with it a winnings pot commensurate with the remoteness of it happening.
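The table can be reproduced with a few lines of Python. This is only a sketch: the function names `outcome_rows` and `truncated_expected_value` are my own, and the game is cut off after a fixed number of tosses purely so that the sum is finite. Exact fractions are used so that each line comes out at precisely 1.

```python
from fractions import Fraction


def outcome_rows(max_tosses):
    """Yield (pattern, winnings, probability, expected value) for games
    that end on toss 1, 2, ..., max_tosses."""
    for n in range(1, max_tosses + 1):
        pattern = "H" * (n - 1) + "T"      # n - 1 heads, then the tails that ends the game
        winnings = 2 ** n                  # payout of 2^n dollars
        probability = Fraction(1, 2 ** n)  # chance of exactly this pattern
        yield pattern, winnings, probability, winnings * probability


def truncated_expected_value(max_tosses):
    """Sum the expected-value column for the first max_tosses lines."""
    return sum(row[3] for row in outcome_rows(max_tosses))


for pattern, winnings, probability, expected in outcome_rows(5):
    print(pattern, winnings, float(probability), expected)

# Every extra line adds exactly 1, so the total grows without limit.
for lines in (5, 10, 100):
    print(lines, "lines:", truncated_expected_value(lines))
```

Whatever cut-off you choose, the truncated sum simply equals the number of lines, which is the whole point: there is no ceiling.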

So it’s worth a billion dollars per game. You’d need 29 heads in a row to win back that amount or more, but over time, the odds are such that you’d do it. And if you got 35 heads in a row, you’d win \$68 billion. And you could pick any stake greater than \$1 billion, and the numbers would always show it’s worth it.

But if you only had enough money for a single game, what would you stake? The game is certainly worth \$2, as that’s the minimum you could win. And it’s arguably worth \$3, as that would give you a 50% chance of losing a dollar, and a 50% chance of winning a dollar or more. At \$4, you have a 50% chance of losing \$2, a 25% chance of breaking even, and a 25% chance of winning \$4 or more. The decision as to how high a stake is worthwhile is subjective based on the value of money to that person. Or more importantly, the detrimental effect that losing a proportion of the stake would have on the player. If you have \$1m in the bank, then you might risk \$10 for a game. If you’re down to your last \$10, you’re unlikely to do the same.
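The single-game odds for any stake can be tallied the same way. Again this is just a sketch, with `single_game_odds` a name of my own invention; it returns the chances of losing money, breaking even and coming out ahead.

```python
from fractions import Fraction


def single_game_odds(stake):
    """Return (P(lose money), P(break even), P(profit)) for one game at this stake."""
    lose = Fraction(0)
    break_even = Fraction(0)
    n = 1
    # Only payouts no bigger than the stake can lose or break even.
    while 2 ** n <= stake:
        probability = Fraction(1, 2 ** n)
        if 2 ** n < stake:
            lose += probability
        else:
            break_even += probability
        n += 1
    # Everything else (all the longer runs of heads) is profit.
    profit = 1 - lose - break_even
    return lose, break_even, profit


for stake in (2, 3, 4, 10):
    lose, even, profit = single_game_odds(stake)
    print(f"stake {stake}: lose {lose}, break even {even}, profit {profit}")
```

It says nothing about how much a loss would hurt, of course, which is the subjective part.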

Very interesting conundrum.

# Facebook can teach the direct marketeer a thing or two

Facebook has done something that Direct Marketing has largely failed to do for 15 years at least: personalise the message.

Although Facebook occasionally struggles with genders, not all applications having access to sufficient personal details to choose an appropriate pronoun, it certainly has a go. Instead of listing as separate updates people whose profile pictures have recently been changed, it comma-separates those people in a single update element.

The direct marketeer often struggles between the richness of data available and the possibility that for some people, many elements of this data are likely to be missing. The data on offer through a 100+ question lifestyle questionnaire may look like a marketeer’s dream, but the sheer number of variables makes the automated personalisation of a message very difficult. And does missing data mean a lack of interest in something or a lack of interest in the very act of answering that question?

In reality, it shouldn’t really be that difficult. Analysis should be able to identify those variables that the marketeers should be interested in, either because of the targeting of the message or the correlation of such a variable with uplifted response rates. And once the variables of interest have been chosen and prioritised, relatively targeted messaging can be tailored around the values of these. An interest in golf may be the trumping factor, after which a salary in excess of a certain figure, followed by being male, being single then being over a certain age. The messaging behind each of these segments can be tailored appropriately to make the communication suitably targeted, with other variables like gender allowing for more localised tweaking of the English.

More transactional data sources (e.g. Ocado’s buyer history) can allow for much more extensive customer sets, the personalisation behind each one likely being more straightforward than the lifestyle data mentioned earlier. (“As someone who’s bought Macleans toothpaste in the past, you may be interested in their new mouthwash.”)

In either example above, each cell (in the marketing sense of the word) needs to be represented by a line in a spreadsheet in order of priority, with the columns representing the variables whose values will vary because of the targeting: pronouns, pieces of prose, URLs, link texts etc.
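To make the spreadsheet idea concrete, here is a rough sketch in Python. Everything in it is hypothetical: the field names (`golf`, `salary`, `gender`), the salary threshold, the copy and the URLs are placeholders of mine, not anything from a real campaign. Each cell is a row, the rows sit in priority order, and the first row a customer matches supplies the variables for their message. Missing data simply fails to match, so the customer falls through to the next cell.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Customer = Dict[str, object]


@dataclass
class Cell:
    name: str
    matches: Callable[[Customer], bool]  # does this customer belong in the cell?
    variables: Dict[str, str]            # the "columns": copy, URLs, link texts etc.


def earns_over(customer: Customer, threshold: float) -> bool:
    salary = customer.get("salary")
    return isinstance(salary, (int, float)) and salary > threshold


# Rows in priority order; all values are illustrative placeholders.
CELLS: List[Cell] = [
    Cell("golfer", lambda c: c.get("golf") is True,
         {"headline": "Improve your handicap", "url": "https://example.com/golf"}),
    Cell("high_earner", lambda c: earns_over(c, 60000),
         {"headline": "Make your money work harder", "url": "https://example.com/premium"}),
    Cell("male", lambda c: c.get("gender") == "M",
         {"headline": "An offer for him", "url": "https://example.com/him"}),
]

DEFAULT_CELL = Cell("default", lambda c: True,
                    {"headline": "Our latest offers", "url": "https://example.com/"})


def pick_cell(customer: Customer) -> Cell:
    """Return the highest-priority cell whose rule the customer satisfies."""
    for cell in CELLS:
        if cell.matches(customer):
            return cell
    return DEFAULT_CELL


print(pick_cell({"golf": True, "salary": 80000}).name)
print(pick_cell({"salary": 80000}).name)
print(pick_cell({}).name)
```

The ordering of the list is the prioritisation, so promoting or demoting a cell is a matter of moving one row.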

Often, marketeers see the swathe of data available to them and regard the problem as insurmountable. It’s not. It just takes some careful planning, and demands a focus on a manageable number of variables and cells rather than trying to truly personalise the message.

Which is it to be?

# 33<36

I’m not very well at the moment. I’ve been struck down for the second time in as many months by something akin to ‘flu’, and am now dosed up on antibiotics and Nurofen, looking forward to a hacking, alcohol-free Christmas.

Among other things, I’ve been suffering mild hallucinations in the night, with particularly vivid dreams and semi-conscious illogical thought processes. But I think the symptoms started on the way to an afternoon party on Sunday. I’d bought a present for the hosts’ son, who will be three in March. The present had a "36 months +" warning attached. I struggled in my head all the way down the hill from Sainsbury’s to the Smoke Rooms to figure out whether the boy was less than or greater than 36 months of age, despite knowing he was two years and nine months. Eventually I convinced myself that 36 months was actually one and a half years, and was content in the suitability of the toy.

Very bizarre.

(I’ve since warned the parents, btw.)

# No spam day/week

I have received 4,087 spam emails in the last 30 days. And because I’ve had to turn off the CAPTCHA for comments on this blog, owing to its incompatibility with the new version of PHP, I’m getting swathes of spam comments here too. (They’re all pre-moderated, so none of them make it to the site, thankfully. Which begs the question: why do they bother hitting me?)

Could we have a spam-free day? Or better still, a spam-free week? I know these people have very few morals, if any, but it’s worth asking them, isn’t it? Can I propose the first Wednesday in February, which this time will take place on 4 February 2009? Or else the first week of February (2–8 February).

It might be nice not to get any, for a little while at least.

While on the subject, I’m intrigued as to why spam messages are so dreadfully bad. They’re full of typos and illiterate English; surely this in itself makes them more easily detectable. Or maybe the poor, modern-day English standards make them blend into the rest. Who knows?

# Obama, McCain, Kerry, Bush

Inspired by some comments Alan made today detailing Lycos’ top 50 searches of 2008, I decided to check Google Trends for the relative popularity of Barack Obama and John McCain searches on Google.

Here’s a link to the 2008 timeline, below is the chart.

McCain is red, as per his party’s colours; Obama is blue.

Only at one point in the year did John McCain searches exceed those of his rival, around August/September.

Here are the comparable trends for John Kerry and George Bush in 2004 and below is the chart.

That time, the losing candidate, John Kerry, is in blue. A much more mixed picture, with Kerry enjoying several peaks above those of his rival. Interestingly, the news searches always favoured Kerry, likely because of the Democratic bent of the early adopters, given that Google News was then in its infancy.

# Complimentary spam

Below is the opening line from each of the last twelve comments I’ve received:

• Hehe! Good work!
• Very good site! Thanks! 🙂
• Very pretty design! Keep working. Go on!
• Great work!
• Thank you!
• Very nice site, i love it!
• Very useful information was found here, thank you for your work.
• You have an outstanding good and well structured site. I enjoyed browsing through it.
• COMPRAR VIAGRA COMPRAR
• Not much on my mind right now, but it is not important. I have just been letting everything happen without me. I just do not have anything to say right now.
• Well done!
• Very superior site. Good job. thnx.

Unfortunately, they were all spam comments, each succeeded by a bunch of links to sites no doubt promoting their wares. But that’s not the point: why can’t the genuine commenters be as complimentary?

# Cell mutation

I’m always confused by the Mute/Unmute toggle button on my mobile phone, which I often use while on conference calls during which there is background noise at my end.

When I’m unmuted, it says “Mute”, and while muted, it says “Unmute”. The confusion arises for two related reasons:

• The button doesn’t look like a button. It’s an unstyled rectangle alongside Speaker On, Hold, Note, Contacts and End.
• There is a subtle difference between offering me a status update and offering me an action.

If the button looked more like a button, perhaps I’d immediately understand its label to represent an action, and know exactly whether or not the phone was muted.

But when I look at the phone and it says “Mute”, I have to think whether this means it’s currently muted (always my immediate reaction), or whether pressing the “button” will mute it. I think a status symbol (not in that sense) would be useful to immediately tell me whether or not people at the other end of the phone can hear me.

I’m just saying…

# Server-side fonts

If I wanted to present my website in a non-standard font, is there any way of doing this without generating images of the content? Can I put the fonts on the server and get the client machine to access these files to correctly render the font? And if so, and I’ve bought the font, is it OK to put it on a readily-accessible server for all and sundry to download?

Any help greatly appreciated.

