Search algorithms

Cross-​​posted to the Nudge blog

There are really three basic genetic pro­gram­ming algo­rithms I like to use out of the box. One’s stu­pid, one’s (almost) stan­dard, and one’s pretty fancy and effective.

Start with one nugget of assump­tion: Every search algo­rithm should be multi-​​objective. Period. No excep­tions. You do a single-​​objective search on any real-​​world prob­lem, you’re either lazy or a liar. The end, no excep­tions. I don’t care if you’re an Oper­a­tions Research guru or a lone stock trader look­ing to make a buck, no prob­lem is single-​​objective.

So when I say “bet­ter” in any con­text, I am talk­ing about Pareto dom­i­na­tion, or if you’re an econ­o­mist kinda girl, “rel­a­tive Pareto effi­ciency”. Say you have two alter­na­tives, A and B, and you’re decid­ing between them on the basis of two min­i­miza­tion objec­tives, w1 and w2. A is dom­i­nated by B if w1(B)<w1(A), and w2(B)<w2(A). B is dom­i­nated by A if w1(A)<w1(B), and w2(A)<w2(B). Nei­ther one dom­i­nates the other unless it is strictly bet­ter on all com­par­isons. Oth­er­wise, A and B are non­dom­i­nated by one another; they can’t be dif­fer­en­ti­ated in a mean­ing­ful way with­out say­ing which objec­tive is “more important”.

You want to sur­vive a car crash, or do you want to drive really really fast? Which is bet­ter? So, get that into your head. There is never one objec­tive. Never.

Let’s make sure we under­stand why there are always mul­ti­ple objec­tives in GP runs. You want accu­racy, right? But you also want par­si­mony. You’re evolv­ing mod­els, solu­tions to prob­lems, func­tions, struc­tures. Things can get hairy very quickly in a run­away evo­lu­tion­ary world: pro­grams can get super­sti­tious, they can inherit weird cultish quirks, like learn­ing to use “Cosine(0)” or Sqrt(Sqrt(Sqrt(Sqrt(Sqrt(8.9912)))) instead of “1”, if some for­got­ten ances­tor hap­pened to find that first and passed it on as Big Math Juju. So as God of Your Lit­tle Engi­neer­ing Prob­lem trust me: you hon­estly want to reach down out of the clouds and say, “Hey, guys, chill. Remem­ber, we use our human-​​readable voice when we model, OK? Two com­mand­ments here: accu­rate, and succinct.”

So there’s an intrin­sic rea­son right there why you want to min­i­mize model error and model com­plex­ity. At least two objec­tives. Always.

Because there’s always a less effi­cient way to solve any prob­lem with­out sac­ri­fic­ing accuracy.

Any­way, those search algorithms.

Stu­pid” is ran­dom search:

  1. Cre­ate a ran­dom pro­gram, and eval­u­ate it
  2. Cre­ate a new ran­dom pro­gram, and eval­u­ate that
  3. Throw away the one that’s dom­i­nated, or the newer one if nei­ther is dominated
  4. Go to 2

Seem so ridicu­lous it’s not worth con­sid­er­ing? Not really. It’s essen­tially the worst you could pos­si­bly do, know­ing noth­ing about your search prob­lem. So if noth­ing else, it pro­vides a use­ful base­line, as well as a nice test of your ran­dom­iza­tion meth­ods, eval­u­a­tor tim­ing, and other infra­struc­ture that almost any search algo­rithm will share with this dumb-​​as-​​a-​​sack-​​of-​​hammers approach.

Plus: Easy to explain? Damn straight. Always run ran­dom search on your prob­lem, with your rep­re­sen­ta­tion, full-​​fledged eval­u­a­tion, and data col­lec­tion sys­tem in place.

(Almost) stan­dard”? Steady-​​state Pareto tour­na­ment selec­tion works for me for a num­ber of rea­sons, at least as a first approach:

  1. Cre­ate a pop­u­la­tion of, oh I don’t know, 500 ran­dom programs
  2. Select a tour­na­ment of maybe, ummm, 10? yeah 10 pro­grams from the pop­u­la­tion, uni­formly at ran­dom, and eval­u­ate them
  3. Iden­tify the non­dom­i­nated “best” pro­grams among them, and the dom­i­nated not-​​so-​​best ones, and split them into two groups
  4. If there are no dom­i­nated ones, pick one of the “best” pool at ran­dom and demote it out of spite. If there’s only one “best” pro­gram, it’s lonely, and inbreed­ing is bad, so pro­mote one of the lesser ones out of pity.
  5. Replace all the pro­grams in the not-​​so-​​best pool with off­spring of the non­dom­i­nated “best” pool
    • Pick two par­ents, with replace­ment, from the non­dom­i­nated set
    • Use one-​​point crossover to make offspring
    • Mutate the babies slightly, here and there, to taste
  6. go to 2

OK, I con­fess. This is stan­dard in almost no ways what­so­ever. It’s a “stan­dard” toolkit for me, sure, but it’s a pretty far cry from Koza’s clas­sic lock­step population-​​based meth­ods, the kind every­body and their brother has copied whole­sale. I like it any­way: it gives rapid feed­back on progress, it’s kindof elit­ist (except for that “spite” step), it saves a lot of search time by lim­it­ing expen­sive mul­ti­ob­jec­tive com­par­isons to rel­a­tively small pools, it works with fit­ness caching. It’s not per­fect, but I already linked to the NFL Wikipedia page, so what are you, so dumb you can’t read? Nothing’s perfect.

Fancy and effec­tive? I really like the Pare­toGP method from Mark Kotanchek and Guido Smits (and their colleagues):

  1. Cre­ate an Archive of 500 ran­dom programs
  2. Cre­ate a Pop­u­la­tion of 500 ran­dom programs
  3. Cre­ate a con­tainer for the Next Gen­er­a­tion (an empty list; hold onto that)
  4. Select a tour­na­ment of 20 Archive pro­grams with uni­form prob­a­bil­ity, and eval­u­ate them, and for­get about the dom­i­nated ones. Just make a list of the non­dom­i­nated ones from the Archive tournament.
  5. Select a tour­na­ment of 20 Pop­u­la­tion pro­grams with uni­form prob­a­bil­ity, and eval­u­ate them, and for­get about the dom­i­nated ones. Just keep the sub­set of non­dom­i­nated ones from the Pop­u­la­tion tournament.
  6. For each Pop­u­la­tion tour­na­ment win­ner you got from step 5, pick an Archive tour­na­ment win­ner you kept from step 4, at ran­dom. Cross them over to cre­ate an off­spring, maybe with some muta­tion if you must.
  7. Stick the baby into the Next Gen­er­a­tion. If you have 500 pro­grams in the Next Gen­er­a­tion, replace the Pop­u­la­tion with it and go to 3. If you’ve run out of tour­na­ment win­ners before fill­ing the Next Gen­er­a­tion, just go to 4. If you choose to attack the Orc, turn to page 88.
  8. You’re going to be won­der­ing when you stop cycling the Next Gen­er­a­tions back into the Pop­u­la­tion, aren’t you? Oh, I dunno. Pick a num­ber. Say “10”. Go on, say it. There: do it ten times. Or more. Or less. What­ever; it’ll work.
  9. Aha! But then: once you’ve done your ten renewals of the Pop­u­la­tion (what Kotanchek and such call a “cas­cade”), you’re not done. Oh, no, boyo. You get your butt up there and you start the whole thing, all over again at step 2. That’s right. Another cas­cade, from another new, ran­dom Pop­u­la­tion, and another ten gen­er­a­tions or so. But—but—before you do, take the final Pop­u­la­tion, the best best best of that cas­cade you just fin­ished, and you stick it into the Archive. Add it to the pro­grams that were already there, and take the time to trim the Archive back down to 500 pro­grams. You can do a full-​​fledged 1000-​​wise Pareto tour­na­ment if you have the time, or you could do some other cun­ning selec­tion method if you want. Often I tend to do some­thing quick and dirty, like just pick­ing tour­na­ments of 20 and culling all the dom­i­nated ones until the Archive size drops back down to 500. So, any­way, you were going back to Step 2 to run another cascade.
  10. Do that until you’re bored.

Now this is a close approx­i­ma­tion of the Pare­toGP algo­rithm described in a num­ber of papers by Mark Kotanchek, Guido Smits and Katya Vladislavl­eva. It’s a broadly cus­tomiz­able frame­work, and if it seems like a lot of extra work com­pared with the “sim­pler” GP algo­rithms described above… well, that may be true. I like it very much because it exposes exploita­tion (via the Archive) and explo­ration (via the Pop­u­la­tion) as almost sep­a­ra­ble processes. You feel like things are get­ting stuck, let the Pop­u­la­tion grow a bit and do more ran­dom restart­ing and exploratory search for weird alter­na­tives; you feel like things are get­ting too wild, you do fewer, shorter cas­cades and keep a big­ger Archive, really nail down the vari­a­tions on the already-​​discovered themes you’ve col­lected there. Play around, fid­dle the knobs, see what you get.

And that’s the point, of course. There is no “best” search algo­rithm. Don’t make me link to that paper again.

You have to pay atten­tion. You have to bring to bear a vast com­bi­na­to­r­ial arma­men­tar­ium. You have to keep a seven-​​drawer tool­box in the vir­tual garage, with a hun­dred dif­fer­ent wrenches in it, and then look at your prob­lem and lis­ten to your algo­rithm purring along, and reach into a drawer with­out look­ing and find just the right tool, and give that algo­rithm a lit­tle tight­en­ing here, a lit­tle shimmy there, grind a bit off the edge, take it apart and put it together again slightly bet­ter. With grease.

See, it’s not about solv­ing your prob­lem as fast as pos­si­ble. It’s about watch­ing it improve, and hav­ing the sense to see it improv­ing. About hav­ing the tools on hand and the expe­ri­ence under your hat so you can inter­vene in a rea­son­able and effec­tive way.

So, yes: there are other ways. None are best.