## Archive for the 'Programming' Category

### Movember: Day 12

Yes, there’s a good reason that I’m wearing a high-contrast stripy… is it “stripy” or “stripey”? “stripy” looks like a case-insensitive string processing library for Python or something.

…shirt.

### Shell sort

I was watching a COMP1927 lecture online (yes, this is what I do for fun), and remembered something that I was going to post but hadn’t gotten around to yet.

I’ve been fiddling with Shell sort gap sequences a bit. The idea I was working on was that a good gap sequence would consist mostly of numbers that are mutually prime; it seems like numbers with common factors would lead to sub-sorts that cover a lot of the same ground, for the same sort of reason that hash functions don’t work well if the hash value is modulo a number that’s some kind of factor in the data. (I don’t have any particularly good theory to back this up; it’s just intuition.)

Apparently (or “wikily”, a word we invented recently to mean “according to Wikipedia”) a geometric sequence that increases by a factor of 2.2 makes a good Shell sort sequence. So I tried out a sequence that:

• consists of products of the first n primes (if you write out the prime factors of each number in the sequence, each of the first n primes occurs exactly once); and
• increases by approximately a factor of 2.2.

Finding the best such sequence (the one with a ratio between pairs of elements that’s close to 2.2, by one of a handful of definitions of “close”) is a task in itself. Computing the sequence for n=16 took nearly a week, and the result is:

1, 2, 5, 11, 23, 53, 119, 377, 817, 1739, 3813

or, as products of primes:

1, 2, 5, 11, 23, 53, 7×17, 13×29, 19×43, 37×47, 3×31×41

And, empirically, it does pretty well. Not as well as Ciura’s sequence (1, 4, 10, 23, 57, 132, 301, 701, 1750), but pretty close. I don’t have the test-run data on hand, but I might put it in a follow-up post later.

### Mercurial

Sooooo. I’ve been a Bazaar fanboy for a few years now. At the time I had a pretty exhaustive look at most of the open source options, so I feel like it was an informed decision. Actually I used Darcs for a while, but eventually came to the conclusion that it was more academic than practical.

This was within maybe a year of the BitKeeper/Linux/Git thing, which I think of as the point in time when the open source community decided to get their shit together with distributed VC; so most of the projects were at about the same level of maturity, i.e. not very, but they were suddenly getting a lot of dev attention. (GNU arch had been around for a while, but everyone hated it, and more specifically everyone hated Tom Lord.) At the time, Bazaar-NG (as it was then) was a bit of a dark horse but it had the backing of Canonical (this was when Ubuntu was starting to look like king of the hill as well) and it seemed to have a better focus on usability than some of the others.

In the time since then a lot has happened, and for some reason that I momentarily forget, I decided that today was a good time to take another look at the state of the art. And I must say that Mercurial looks like it might be the new favourite. For goodness’ sake, OpenOffice just announced that they’re switching to it. (Yes, MySQL uses Bazaar. Yes yes, half the world uses Git. This is getting us nowhere.)

So I’m going to make a tentative switch to Mercurial for… you know… stuff I might be working on. Very special projects and… things. Whatever.

Stay tuned.

### Silly trick

I’m running a script to do something a friend of mine asked for. (If you care about the details, go here.) We’re going out to a birthday thing with Tina’s family in about half an hour, and there’s a good chance it’ll run longer than that. (I actually have a pretty good idea of how long it’ll run, but let’s ignore that for now.)

So I was going to set up a loop to upload the results to somewhere every 15 minutes or so. But I’ll be out until later today, and it’ll be uploading the same thing every 15 minutes until I get home and stop it. (The bandwidth involved isn’t very big, but let’s pretend for a moment that this is a consideration.) But I want him to be able to check the progress, so making the interval much longer is potentially annoying. I could work out when the process has finished and stop uploading, but let’s assume that I don’t know how to do that off the top of my head (because I don’t) and that I don’t have time to work it out (which I probably do, but I’m blogging about it instead).

Then an idea popped into my head out of nowhere: Double the interval every time.

This way, it’ll be updating regularly at the start, so he (and I) can check the progress almost in realtime. Then the rate will slow down exponentially so that it’s not uploading all the time unnecessarily. And the delay between when the process finishes and when it does the final upload will naturally be in the order of the duration of the process itself (and in fact will be strictly less than it). And there will be very few extraneous uploads after it’s finished until I get home and kill it.

For this particular problem it was probably overkill, but if you change the scale of the problem and remove an easy way to find out when the process has ended or how long it will take in advance, it might actually be useful. It even seems to approximate people’s habits (or at least, my habits) when checking the progress of a long-running process – I tend to watch it closely for a minute or so, then check it occasionally at increasing intervals until it’s done.

Plus it was dead simple to do:

n=1; while true; do {upload stuff}; sleep \${n}s; n=\$((\$n*2)); done

Is this a known trick? Or does it even make sense?

Update: Looks like it’s going to finish just in the nick of time. Still… cool trick, right? Right?

What I learnt from logging CAPTCHA attempts was, essentially, nothing. Every post either entered nothing at all or the correct word, and all of the correct ones were obviously from the same source (not the same IP address, but structured the same way and advertising similar things). That is, it’s not being broken by a huge dictionary attack or anything.

So I think the remaining alternatives are that someone has actually broken my CAPTCHA with 100% accuracy, or this is getting back to a human at some point. Possibly I’m caught in one of those man-in-the-middle attacks where my CAPTCHA is relayed to some porn site and decoded by a horny teenager. Alternatively, people are getting paid to leave spam. I’ve heard of this sort of thing happening – Mr Shellshear caught some of it a while ago – but if that were the case I’d expect the comments to bear traces of humanity as well, and as it stands they’re fairly obviously auto-generated. I think. Or written by particularly obtuse and formulaic humans.

Now that I read over that last paragraph again, none of those options stands out as being clearly more likely than the others.

So in a further attempt to narrow down the options, I’ve changed my CAPTCHA again. This has two advantages. One, it’s different enough to the old one that I’m fairly confident that it won’t be immediately legible to any bot that could read it before. And two, the new one looks cool. Also illegible, so I probably won’t leave it this way forever, but come on, it’s sleek.

### Sorting cards

For a while now I’ve had my own manual sorting algorithm that I use for, for example, sorting cards.

Actually the thing I most often end up sorting is card sleeves. Our Magic drafting group, being a highly distilled collection of nerds, draft into sleeves that are marked with the pack (A-C) and pick order (1-15), so that we can reconstruct and analyse the draft afterwards. So at some point before every draft I need to un-sleeve the previous draft, then sort the sleeves from A1 to C15.

Now… there’s a whole branch of computer science devoted to sorting algorithms (and by “branch” I mean “volume of the Knuth books”), but I’ve never come across any kind of systematic advice for sorting manually, i.e. without the convenience of having a computer instead of a brain and a bunch of digital artifacts instead of some physical pieces of plastic or cardboard. Most people just tend to sort in an ad-hoc way that vaguely resembles an insertion or selection sort. Occasionally I’ve seen someone attempt a radix sort with playing cards (into numbers, then into suits). If you watch this lecture you’ll see someone try to do a bubble sort with a bag over his head.

My manual sorting algorithm isn’t, as far as I know, a well-known programmatic sorting algorithm. You probably wouldn’t want it to be. It involves a variable number of variable-length lists, and doesn’t perform obviously better than a whole class of O(n2) algorithms that are in-place and much easier to implement. But it has the advantage that, for input that’s approximately O(a deck of cards), its space complexity is roughly O(a small area on my desk).

Here’s the algorithm. It’s in two passes.

1. Take a card from the deck. Look for a pile on the table (initially there are none) whose top card is lower than the card you’re holding. Put the card on face-up top of that pile. If there’s no such pile, put the card in a new pile. Repeat until the deck is empty.
2. Find the highest card among the top cards of the piles on the table. Move that card to the deck. Repeat until there are no piles left.

That’s a pretty sparse explanation. I might take a shot at making a video with actual cards – not that I expect it to be very useful, but it would also serve the purpose of playing with the 5D Mk II’s video mode.

One interesting thing is that the algorithm sort of takes advantage of the fact that a human can pick one from a handful of cards or piles at a glance in roughly constant time – or, at least, it doesn’t quite feel like a linear search because you quickly pick up ad-hoc shortcuts to find the right pile. (Also, the overhead of physically moving the cards around, which you can’t get away from using any technique, outweights the time it takes you to find the right pile.)

Even if you do end up doing a linear search for the right pile, the time complexity isn’t too bad. Worst case is quadratic, where the cards start in reverse order and each card ends up in its own pile. Best case is linear, where the cards start in order and all the cards go in one pile. The expected complexity for shuffled cards is… I dunno, I haven’t worked it out. The important factor is the average number of piles you end up with, which is complicated. I’d guess it works out to something like √n, which would make the whole thing O(n3/2), maybe. You could look at it as a selection sort with a probabilistic speedup for the “selection” bit.

Anyway, that’s how I’ve been sorting card sleeves for a while. I hope someone found that interesting. And if this is acutally a well-known algorithm, let me know what it’s called. (Haven’t really looked for it myself yet. My Knuth books are at work at the moment, so I might check there on Monday.)

### Auto-drafter, cont’d

Mr Shellshear pointed out to me last night that the record of our M10 draft had the seating arrangement in the wrong order. So the auto-drafted results in my last post were still interesting, but didn’t necessarily boil down to actual packs with the right rarity distribution and whatnot.

The results with the correct seating order are below the fold. I also turned up the bias towards already-drafted cards. I still wouldn’t call the behaviour sensible, but I wouldn’t be totally upset with the pool it drafted. (In particular, I made good use of Whispersilk Cloak on the night, especially alongside an Enormous Baloth. And I seem to have picked up a Protean Hydra somehow.)

### Magic auto-drafter

This is kind of cool.

I’ve had an idea for a while to make an automatic draft picker for Magic: the Gathering. The idea was to use a sort of naive Bayes approach, using the records of previous drafts to estimate:

• the probability that a player, given a pack with cards A and B, will pick A over B;
• the probability that a player, given a pack with card A and having already drafted card B, will pick A over the other cards in the pack.

The idea is that the former gives an estimate of card quality, and the latter estimates the influence of earlier picks on each card. To use this in a draft, you (that is to say, the program that’s doing it for you) just multiply all the probabilities together for each card, and pick the one with the highest score.

So I finished coding it up this morning. Then I trained it using four Magic 2010 drafts from the MtG website (e.g. this one), then re-ran our draft from a couple of weeks ago, to see what would have happened if we were all replaced by auto-drafters.

The results are below the fold. And it’s not completely ridiculous. It’s probably less focused in colours than it could be; maybe there’s some tweaking to be done by weighting the influence of previous picks. But at a glance, I can’t see anyone’s card pool that couldn’t be made into, say, a two-colour deck with a third splash. (Edit: Now that I think about it, this is probably true of any 45 random cards. Hmm. Well, most of the pools have a strong leaning towards two or at most three colours, and Andrew C could almost get away with mono-green.)

Bizarrely, it’s drafted a red-blue deck for me instead of the white-green deck I actually played (although I did feel bad about not using that first-pick Lightning Bolt in the second pack).

There are a few hiccups because the training set of four drafts wasn’t big enough – there’s no way Ajani should have gone fifth pick, for example. I’m optimistic that throwing more data at it would make it better. In an ideal world, I’d somehow get hold of the results of every Magic Online draft to pump through it, and it would represent the collective wisdom of every online player everywhere.

In fact, if anyone from Wizards would like to do this, let me know.

Actually, that’s not as silly as it sounds. (Well, letting me do it probably is.) There’s already an AI of some kind controlling the online draft simulator. That could be something similar to my system, or it could just be a hand-crafted pick order along with some kind of futzing to get it to pick roughly consistent colours. It’d be interesting to find out how it works. (But they’ll never tell us.)

Interestingly, some kind of auto-picker could be useful for Magic Online. If you don’t pick a card within the time limit in an online draft, it randomly picks one for you. (At least, it was random last time I checked. I haven’t drafted online for a while.) MO has recurring problems with lag and dropped connections, so it would make a lot of people very happy to have an auto-drafter that makes vaguely intelligent choices when they can’t do it themselves.

Now, it can’t be too good, or you could start relying on it to do the draft for you. This could be a showstopper, since the outcome of a draft is supposed to depend on the players’ skill. (In fact, there’s a good chance that someone in Wizards has already had this conversation.) But maybe an auto-drafter that’s not a genius, but knows not to take the basic land and will generally pick playable cards in your colours, would be just enough of a safety net that people wouldn’t be driven to mass killings when their connection drops. Maybe.

I might put the code up later (after I do a cleanup and/or rewrite – it’s pretty ugly right now). Meanwhile, keep reading for the results of the re-draft.

Edit 2: I tried doubling, then quadrupling the effect of the already-drafted cards on the next pick. This puts nearly everyone solidly into two colours. (Unfortunately it leaves Mr Shellshear with an impressive blue-white deck, and anyone who remembers our first Lorwyn draft will realise what a bad idea that is. Apparently some people do just open all the good cards.) I’d post the new results, but I’m wary of picking at this too much, and would rather just leave the original results as they are until I get around to cleaning up the code.

### Can’t sleep

Was trying to get to sleep, got mentally worked up about something, and now I can’t. Sleep, that is. So here are some random thoughts about nothing.

The phrase “I don’t know” can be used in two completely different ways that are almost diametrically opposed in terms of what the speaker is implying. Observe:

1. “I don’t know why the sky is blue.”
2. “I don’t know why people like Britney Spears.”

There should be different words for these.

I’ve been messing around with a kd-tree to find nearest neighbours in a vaguely secret thing I’ve been working on (not very secret, or even interesting enough to justify calling it secret, which I’ve now done three times, so it’ll be really anti-climactic when I link to it, which I will as soon as it’s public). Wikipedia says this about the performance improvement for nearest-neighbour search:

These asymptotic behaviors only apply when N is much greater than the number of dimensions. In very high dimensional spaces, the curse of dimensionality causes the algorithm to need to visit many more branches than in lower dimensional spaces. In particular, when the number of points is only slightly higher than the number of dimensions, the algorithm is only slightly better than a linear search of all of the points.

I’m starting to suspect that this statement is way too weak. My rough back-of-the-mental-envelope conclusion is that the performance boost only happens when the number of points is much higher than two to the power of the number of dimensions. I’m not confident enough in this statement to edit the Wikipedia article, but I’d be interested if anyone can explain why I’m wrong.

Mentioned Richard Buckland, lecturer extraordinaire, in an earlier post. Last week I took a few hours off work one afternoon to wander into one of his first-year uni lectures. It turned out to be one of the few lectures from this year that’s made it onto YouTube. (They’re all supposed to go up at some point, but anyone who’s ever dealt with video files knows that they have a kind of inertia or built-in procrastination field or something.)

After the lecture (actually, after the extension lecture that comes after this), I chatted to him for a while about some stuff, which has been mulling over in my head since then, and which I’m not going to say much about right now (what the hell kind of an introverted blog post is this)… um… yeah. More on this soon, maybe.

Finally started watching Firefly earlier tonight. I feel like I’m getting my geek credentials back in order.

Picked up a cheap-ish tablet laptop in an online auction a few weeks ago. It’s ex-lease or something. Not the ThinkPad that I really wanted, but a good starting point considering that I’m not entirely sure why I wanted it at all (at least, now that my career as a webcomic artist has been definitively shut away in the Closet of Things that are Unlikely to Happen). Anyway, it is really, really cool.

I installed Xubuntu on it. Xubuntu has come a long way. I’m not entirely converted away from mainline Ubuntu yet, but for a lower-performance machine – and in a scenario where screen real estate is particularly valuable and can be conserved with a panel that behaves well on the side of the screen, which is an act Gnome hasn’t quite gotten together yet – it behaves very nicely indeed.

We did a Magic 2010 draft at Mr Shellshear’s place last Friday. For me, it really captured the feel of old-school Magic. I’ve enjoyed Alara block, but one thing that bugged me about it is that mono- or even dual-colour decks were a non-option. I’ve always thought that one of the big appeals of the Magic mana system is the ability to choose between the breadth of a multi-colour deck and the smooth focus of a mono deck. In fact, I think one of the crucial moments in many a beginning casual player’s Magic education is when he (or she… but let’s face it, it’s probably he – apologies to Michelle and Kat) manages to corral enough cards of the same colour to make up a deck.

For me, many years ago, it was a green deck. Its theme was the somewhat high-concept “every green card and forest I own” and not much beyond that, but the payoff of freedom from mana problems is an experience I’ve never forgotten.

Alara block’s three-colour theme was an interesting design area, and would seem to have been a popular one, but it turned mana fixing from optional to mandatory. It changed the game from one where multi-lands and whatnot were one of the ways of getting your spells out reliably – the other being to focus on one colour, which is an interesting, flavourful, and personality-reflecting decision – to one where they were the only way. That took something away from the game in my mind, and I’m happy to be back on solid monocolour ground again.

Okay, I think I’m out of random thoughts, and am starting to get vaguely sleepy. G’night all.

### Fractal landscape

Time for an algorithmic interlude.

I’ve been playing with some code to generate a fractal landscape, and it had a flaw in it that was one of the more insidiously subtle bugs I’ve come across.

The classic way to do a fractal landscape is the diamond-square algorithm, but (for reasons I’m not revealing for now… ooooh, drama) I decided to do something different. The high-level algorithm is this:

• Start with a small grid (say, 10×10) of uniformly random values. (The value will be the height of the landscape at that point.)
• Repeat until big enough:
• Expand the grid horizontally by one square as follows:
• For each row, insert a new, empty square at a random location. Shift the squares after it to the right.
• Set each new square’s value to the average of the four neighbouring squares, plus or minus a small random jitter. The landscape should be a torus (i.e. the right side wraps around to the left, and the top to the bottom, so that it can be tiled), so include squares on the opposite edge in the average if necessary. Disregard any neighbours that were newly created in the previous step.
• Do the same thing vertically.

There’s some extra tweaking and smoothing and whatnot, but that’s the basic idea.

Now… what’s the flaw? There’s a prize* for the first person to find it.

* Prize not guaranteed to have material value or physical existence. Prizes will only be awarded for the particular flaw I have in mind, not any other flaw, whether I’d noticed it beforehand or not.