Cross-posted to the Nudge blog
There are really three basic genetic programming algorithms I like to use out of the box. One’s stupid, one’s (almost) standard, and one’s pretty fancy and effective.
Start with one nugget of assumption: Every search algorithm should be multi-objective. Period. No exceptions. If you do a single-objective search on any real-world problem, you’re either lazy or a liar. I don’t care if you’re an Operations Research guru or a lone stock trader looking to make a buck: no problem is single-objective. The end.
So when I say “better” in any context, I am talking about Pareto domination, or if you’re an economist kinda girl, “relative Pareto efficiency”. Say you have two alternatives, A and B, and you’re deciding between them on the basis of two minimization objectives, w1 and w2. A is dominated by B if w1(B) < w1(A) and w2(B) < w2(A). B is dominated by A if w1(A) < w1(B) and w2(A) < w2(B). One dominates the other only if it is strictly better on every objective. Otherwise, A and B are nondominated by one another; they can’t be differentiated in a meaningful way without saying which objective is “more important”.
You want to survive a car crash, or do you want to drive really really fast? Which is better? So, get that into your head. There is never one objective. Never.
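That comparison fits in a few lines of Python. The scores below are made-up numbers; strict dominance on every objective is the rule just described:

```python
def dominates(a, b):
    """True if score vector `a` strictly Pareto-dominates `b`
    (minimization: strictly smaller on every objective)."""
    return all(x < y for x, y in zip(a, b))

# Hypothetical (crash_risk, lap_time) scores, both minimized:
A = (2.0, 9.0)   # survives the crash, drives slow
B = (7.0, 4.0)   # drives fast, crumples like foil
print(dominates(A, B), dominates(B, A))  # → False False: a nondominated pair
```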
Let’s make sure we understand why there are always multiple objectives in GP runs. You want accuracy, right? But you also want parsimony. You’re evolving models, solutions to problems, functions, structures. Things can get hairy very quickly in a runaway evolutionary world: programs can get superstitious, they can inherit weird cultish quirks, like learning to use “Cosine(0)” or “Sqrt(Sqrt(Sqrt(Sqrt(Sqrt(8.9912)))))” instead of “1”, if some forgotten ancestor happened to find that first and passed it on as Big Math Juju. So as God of Your Little Engineering Problem, trust me: you honestly want to reach down out of the clouds and say, “Hey, guys, chill. Remember, we use our human-readable voice when we model, OK? Two commandments here: accurate, and succinct.”
So there’s an intrinsic reason right there why you want to minimize model error and model complexity. At least two objectives. Always.
Because there’s always a less efficient way to solve any problem without sacrificing accuracy.
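One cheap way to put a number on “succinct” is to count nodes in the program tree. A sketch, assuming programs are nested tuples of the form `(operator, arg, ...)`:

```python
def size(expr):
    """Node count of a nested-tuple program like ('sqrt', ('sqrt', 8.9912))."""
    if isinstance(expr, tuple):
        return 1 + sum(size(arg) for arg in expr[1:])
    return 1

sqrt_tower = ('sqrt', ('sqrt', ('sqrt', ('sqrt', ('sqrt', 8.9912)))))
print(size(sqrt_tower), size(1))  # → 6 1: the plain constant wins on parsimony
```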
Anyway, those search algorithms.
“Stupid” is random search:
1. Create a random program, and evaluate it
2. Create a new random program, and evaluate that
3. Throw away the one that’s dominated, or the newer one if neither is dominated
4. Go to 2
Seems too ridiculous to be worth considering? Not really. It’s essentially the worst you could possibly do knowing nothing about your search problem. So if nothing else, it provides a useful baseline, as well as a nice test of your randomization methods, evaluator timing, and other infrastructure that almost any search algorithm will share with this dumb-as-a-sack-of-hammers approach.
Plus: Easy to explain? Damn straight. Always run random search on your problem, with your representation, full-fledged evaluation, and data collection system in place.
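The loop above, as a runnable sketch. The representation (a coefficient vector) and the two-objective evaluator here are placeholders I made up; swap in your own:

```python
import random

def dominates(a, b):
    """Strict Pareto dominance for minimization objectives."""
    return all(x < y for x, y in zip(a, b))

def random_program():
    # Stand-in "random program": three random coefficients.
    return [random.uniform(-10, 10) for _ in range(3)]

def evaluate(prog):
    # Stand-in objectives: (error, complexity), both minimized.
    error = sum((c - 1.0) ** 2 for c in prog)
    complexity = sum(abs(c) for c in prog)
    return (error, complexity)

def random_search(steps=1000):
    champ = random_program()
    champ_score = evaluate(champ)
    for _ in range(steps):
        challenger = random_program()
        score = evaluate(challenger)
        # Throw away the dominated one; ties go to the incumbent.
        if dominates(score, champ_score):
            champ, champ_score = challenger, score
    return champ, champ_score

best, best_score = random_search()
```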
“(Almost) standard”? Steady-state Pareto tournament selection works for me for a number of reasons, at least as a first approach:
1. Create a population of, oh I don’t know, 500 random programs
2. Select a tournament of maybe, ummm, 10? yeah, 10 programs from the population, uniformly at random, and evaluate them
3. Identify the nondominated “best” programs among them and the dominated not-so-best ones, and split them into two groups
4. If there are no dominated ones, pick one of the “best” pool at random and demote it out of spite. If there’s only one “best” program, it’s lonely, and inbreeding is bad, so promote one of the lesser ones out of pity.
5. Replace all the programs in the not-so-best pool with offspring of the nondominated “best” pool:
   - Pick two parents, with replacement, from the nondominated set
   - Use one-point crossover to make offspring
   - Mutate the babies slightly, here and there, to taste
6. Go to 2
OK, I confess. This is standard in almost no way whatsoever. It’s a “standard” toolkit for me, sure, but it’s a pretty far cry from Koza’s classic lockstep population-based methods, the kind everybody and their brother has copied wholesale. I like it anyway: it gives rapid feedback on progress, it’s kind of elitist (except for that “spite” step), it saves a lot of search time by limiting expensive multiobjective comparisons to relatively small pools, and it works with fitness caching. It’s not perfect, but I already linked to the NFL Wikipedia page, so what are you, so dumb you can’t read? Nothing’s perfect.
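One pass through that loop might look like the sketch below. The `evaluate`, `crossover`, and `mutate` helpers are toy stand-ins for whatever your representation demands:

```python
import random

def dominates(a, b):
    return all(x < y for x, y in zip(a, b))

def steady_state_step(population, evaluate, crossover, mutate, k=10):
    """One steady-state Pareto tournament step; modifies `population` in place."""
    idx = random.sample(range(len(population)), k)
    scores = {i: evaluate(population[i]) for i in idx}
    best = [i for i in idx
            if not any(dominates(scores[j], scores[i]) for j in idx)]
    rest = [i for i in idx if i not in best]
    if not rest:                  # everybody nondominated: demote one out of spite
        rest.append(best.pop(random.randrange(len(best))))
    if len(best) == 1:            # lonely winner: promote one out of pity
        best.append(rest.pop(random.randrange(len(rest))))
    for i in rest:                # losers replaced by winners' offspring
        p1, p2 = random.choice(best), random.choice(best)
        population[i] = mutate(crossover(population[p1], population[p2]))

# Toy stand-ins: programs are 3-element coefficient vectors.
def evaluate(p):
    return (sum((c - 1.0) ** 2 for c in p), sum(abs(c) for c in p))

def crossover(a, b):
    cut = random.randrange(1, len(a))          # one-point crossover
    return a[:cut] + b[cut:]

def mutate(p):
    i = random.randrange(len(p))               # nudge one coefficient
    return p[:i] + [p[i] + random.gauss(0, 0.1)] + p[i + 1:]

population = [[random.uniform(-10, 10) for _ in range(3)] for _ in range(500)]
steady_state_step(population, evaluate, crossover, mutate)
```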
Fancy and effective? I really like the ParetoGP method from Mark Kotanchek and Guido Smits (and their colleagues):
1. Create an Archive of 500 random programs
2. Create a Population of 500 random programs
3. Create a container for the Next Generation (an empty list; hold onto that)
4. Select a tournament of 20 Archive programs with uniform probability, evaluate them, and forget about the dominated ones. Just make a list of the nondominated ones from the Archive tournament.
5. Select a tournament of 20 Population programs with uniform probability, evaluate them, and forget about the dominated ones. Just keep the subset of nondominated ones from the Population tournament.
6. For each Population tournament winner you got from step 5, pick an Archive tournament winner you kept from step 4, at random. Cross them over to create an offspring, maybe with some mutation if you must.
7. Stick the baby into the Next Generation. If you have 500 programs in the Next Generation, replace the Population with it and go to 3. If you’ve run out of tournament winners before filling the Next Generation, just go to 4. If you choose to attack the Orc, turn to page 88.
8. You’re going to be wondering when you stop cycling the Next Generations back into the Population, aren’t you? Oh, I dunno. Pick a number. Say “10”. Go on, say it. There: do it ten times. Or more. Or less. Whatever; it’ll work.
9. Aha! But then: once you’ve done your ten renewals of the Population (what Kotanchek and company call a “cascade”), you’re not done. Oh, no, boyo. You get your butt up there and you start the whole thing, all over again at step 2. That’s right. Another cascade, from another new, random Population, and another ten generations or so. But, but, before you do, take the final Population, the best best best of that cascade you just finished, and stick it into the Archive. Add it to the programs that were already there, and take the time to trim the Archive back down to 500 programs. You can do a full-fledged 1000-wise Pareto tournament if you have the time, or some other cunning selection method if you want. Often I do something quick and dirty, like picking tournaments of 20 and culling all the dominated ones until the Archive size drops back down to 500. So, anyway: back to step 2 to run another cascade.
10. Do that until you’re bored.
Now this is a close approximation of the ParetoGP algorithm described in a number of papers by Mark Kotanchek, Guido Smits and Katya Vladislavleva. It’s a broadly customizable framework, and if it seems like a lot of extra work compared with the “simpler” GP algorithms described above… well, that may be true. I like it very much because it exposes exploitation (via the Archive) and exploration (via the Population) as almost separable processes. You feel like things are getting stuck, let the Population grow a bit and do more random restarting and exploratory search for weird alternatives; you feel like things are getting too wild, you do fewer, shorter cascades and keep a bigger Archive, really nail down the variations on the already-discovered themes you’ve collected there. Play around, fiddle the knobs, see what you get.
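Squinting past all the knobs, the cascade structure might be sketched like so. The population size, tournament size, and generation and cascade counts are the made-up defaults from above, and `random_program`, `evaluate`, and `crossover` are whatever your problem supplies:

```python
import random

def dominates(a, b):
    return all(x < y for x, y in zip(a, b))

def nondominated(scored):
    """Keep (program, score) pairs not strictly dominated within the pool."""
    return [(p, s) for p, s in scored
            if not any(dominates(t, s) for _, t in scored)]

def pareto_gp(random_program, evaluate, crossover,
              n=500, k=20, generations=10, cascades=10):
    archive = [random_program() for _ in range(n)]
    for _ in range(cascades):
        population = [random_program() for _ in range(n)]
        for _ in range(generations):
            next_gen = []
            while len(next_gen) < n:
                a_pool = [(p, evaluate(p)) for p in random.sample(archive, k)]
                p_pool = [(p, evaluate(p)) for p in random.sample(population, k)]
                a_winners = nondominated(a_pool)
                for p, _ in nondominated(p_pool):
                    mate, _ = random.choice(a_winners)
                    next_gen.append(crossover(p, mate))
                    if len(next_gen) == n:
                        break
            population = next_gen
        # End of cascade: fold the final Population into the Archive, then
        # cull dominated programs in small tournaments until it fits again.
        archive += population
        while len(archive) > n:
            pool = random.sample(archive, k)
            scored = [(p, evaluate(p)) for p in pool]
            keep = {id(p) for p, _ in nondominated(scored)}
            losers = [p for p in pool if id(p) not in keep]
            if losers:
                cut = {id(p) for p in losers[:len(archive) - n]}
            else:
                cut = {id(pool[0])}   # everyone nondominated: cull one anyway
            archive = [p for p in archive if id(p) not in cut]
    return archive
```

Mutation is left out for brevity; fold it into `crossover` if you must.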
And that’s the point, of course. There is no “best” search algorithm. Don’t make me link to that paper again.
You have to pay attention. You have to bring to bear a vast combinatorial armamentarium. You have to keep a seven-drawer toolbox in the virtual garage, with a hundred different wrenches in it, and then look at your problem and listen to your algorithm purring along, and reach into a drawer without looking and find just the right tool, and give that algorithm a little tightening here, a little shimmy there, grind a bit off the edge, take it apart and put it together again slightly better. With grease.
See, it’s not about solving your problem as fast as possible. It’s about watching it improve, and having the sense to see it improving. About having the tools on hand and the experience under your hat so you can intervene in a reasonable and effective way.
So, yes: there are other ways. None are best.