What you miss when you plot “best so far”

I spend a lot of time work­ing in a field other peo­ple call “Genetic Pro­gram­ming”, though I pre­fer to call what I do “not rely­ing quite so much on received wis­dom and dog­matic habits regard­ing data and mod­el­ing.” “Genetic Pro­gram­ming” is admit­tedly shorter; then again, every real project has mul­ti­ple objectives.

And that’s a nerdy in-​​joke right there. It’s about multiple-​​objective (or multi-​​criterion) search. I’ll tell you about it another day. Every time I make a nerdy joke, or you hear a nerdy joke, that you don’t under­stand, you write it down and I’ll tell you what it means. Then you mem­o­rize it and soon you’ll have a decent arma­men­tar­ium of nerdy jokes. Oh, the adven­ture of mold­ing lit­tle new nerds!

So any­way, the dis­tinc­tion between the names we use is not spe­cious. Very Smart Peo­ple do Genetic Pro­gram­ming. I am stu­pid, and also a fool, and for­get­ful, and have some kind of amne­siac fault in my head that makes it so when­ever I do an exper­i­ment I some­how for­get all the stuff that nor­mal peo­ple are able to assume is true about the sys­tem and their tools and their habits and their data.

This is a seri­ous hand­i­cap. It means I am forced to con­stantly peek at things every­body else is able to take for granted. I plot charts of mea­sures that nobody else both­ers to plot; I write tests for code that every­body knows will just work; I refuse to trust other people’s libraries if they don’t have tests because me, I am so dumb; I emit a series of plain­tive “Why?” noises when­ever I read other people’s preprints or pub­lished papers. And lord help the speaker at any con­fer­ence I attend, me with my “two ques­tions” I always ask after every god­damned one of them when all they really need is a drink and a chair.

Poor Bill. You’ll have to for­give him—he has a con­di­tion.

To the point: In the field of Genetic Algo­rithms, and by (spe­cious) descent in Genetic Pro­gram­ming and more gen­er­ally in Machine Learn­ing, the Very Smart Peo­ple have some habits I’m unable to save in my faulty long-​​term habit cache.

One of the most com­mon is The Plot of the Best So Far.

In Machine Learn­ing projects (includ­ing Genetic Algo­rithms and Genetic Pro­gram­ming ones), what you often have when you start is just a bunch of exam­ples of some­thing. Then you run a pro­gram that takes those exam­ples as inputs and chugs away and finally poops out a model of the structure:function rela­tions to be found within those exam­ples. The algo­rithm does magic think­ing stuff, and it knows what you want, and it looks at the data for you. For cul­tural rea­sons Machine Learn­ing (and GA/​GP) folks talk about this process as “search” rather than “guess­ing a lot of times”, but it is nonethe­less a fact that most algo­rithms really are just gen­er­at­ing a series of increas­ingly biased guesses.

With few excep­tions, Machine Learn­ing (and GA/​GP) algo­rithms start with a cou­ple of dumb guesses, then “look” at those and try to “learn” how to make more cun­ning guesses over time. They guess and guess and guess, until even­tu­ally they either start being bor­ing, or they “prove” in some very com­pli­cated but author­i­ta­tive way that the final guess is the absolute best that could ever be found.

As a side effect of this iter­a­tive process, they pro­duce not only a series of guesses, but of neces­sity those guesses are asso­ci­ated with a series of scores. No mat­ter what they’re up to or when they quit, Machine Learn­ing algo­rithms are pretty much all about mov­ing away from the bad-​​scoring ini­tial dumb guesses, and towards the better-​​scoring sort of well-​​considered and ele­gant and refined models.

Bear with me. Because of my per­sis­tent trou­bles, I have to remind myself of this.

So any­way, this is how Machine Learn­ing works: Very Smart Peo­ple who don’t have my prob­lem are handed a big pile of exam­ples col­lected by a “domain expert”, and they know already what it is they want to do for mod­els. (I feel like a blind man explain­ing color when I say they “know already”.) And then they just down­load and run and report the ele­gant and refined mod­els they get at the end of run­ning a “search”. I mean they start a com­puter run­ning, and move the lap­top off their legs so their thighs don’t burn while it’s guess­ing, and when the guess­ing is done they will have a very nice model of the exam­ples, that has a good score. This very nice model is handed to the domain expert whose exam­ples were dumped into the hop­per. In my expe­ri­ence, the domain expert is typ­i­cally very happy to have some­thing more explana­tory than the huge pile of vir­tual post-​​it notes and dumb guesses they started with.

Very Smart Peo­ple who write Machine Learn­ing algo­rithms go through a bit more effort. For them, the moti­va­tion for writ­ing new Machine Learn­ing algo­rithms is to show that the series of guesses pro­duced by an old, obso­lete algo­rithm any fool can down­load as an R library won’t get to the ele­gant and refined scores as fast as the new one they thought up. So Very Smart Peo­ple who write Machine Learn­ing algo­rithms like to run a lot of horse races.

What’s a horse race? Well, first you need a pile of exam­ples from a domain expert. Then you take Bad Old Machine Learn­ing Algo­rithm Num­ber One, and you start it run­ning, and instead of walk­ing away and wait­ing for it to be done, you write down the list of scores for the guesses it pro­duces along the way. And then you do the same for Excel­lent New Machine Learn­ing Algo­rithm Which You Wrote. And you show that ENMLAWYW is con­sis­tently bet­ter at reach­ing the best-​​sounding scores, or gets bet­ter faster, or more often gets bet­ter, or maybe has more some­thing else than BOMLANO. And then you pub­lish a paper!

Sucky old BOMLANO. Nobody uses it any more.

Now it’s fine to make the claim; that will almost always get you where you want to be. But in papers, it’s impor­tant to show and tell. So you need to make a draw­ing of some sort, the kind that fits into a two-​​column lay­out in a four-​​page paper with­out get­ting too squished. Alas, many Machine Learn­ing algo­rithms (even that crap BOMLANO) pro­duce a huge num­ber of scores as they guess towards suc­cess, and plot­ting those is messy and confusing.

So in your time-series plots of the horse race, you show the Best So Far. For several reasons: Plotting 50000 points or so is messy and confusing, especially in a 5-cm square. To be honest most people don’t really care about your thing or what you claim to do; they just want to know your algorithm isn’t worse than BOMLANO (which they learned in graduate school so it has a special place in their habit caches), and so for them best at the end is important; hell, if they’re domain experts, they only want one model after all. But best at the end is a tricky thing, and really you want to show one of those other things I mentioned: ENMLAWYW gets better faster than BOMLANO, or ENMLAWYW gets higher scores than BOMLANO, or ENMLAWYW gets better more often than BOMLANO.

Notice that nobody really cares, beyond a few cranks like me, about anything but the One Final Answer a Machine Learning algorithm provides. Some rare advanced engineering professors might sometimes teach a seminar in multi-objective search, which (I said I’d explain) usually returns a whole bunch of high-scoring things that are different from one another but “tied” in the sense the domain expert cares about. But even that is the final answer: it’s just a lot of them.
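For the curious, here is a minimal sketch of what “tied” usually means in that setting, assuming every objective is to be maximized. The function names and the example pairs (a score, and a negated model size) are made up for illustration; nothing here comes from a particular library.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and
    strictly better on at least one (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def nondominated(points):
    """Keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (score, -size) pairs: higher score is better, smaller model is better.
candidates = [(0.8, -30), (0.7, -10), (0.6, -5), (0.6, -12), (0.75, -30)]
front = nondominated(candidates)
print(front)
```

The survivors trade one objective against the other, so none of them is simply “the best”; that set is the whole bunch of answers a multi-objective search hands back.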

It’s stu­pid to care about what hap­pens before an algo­rithm is done. Every­body knows that stuff.

Woe for me. I’m a sick and con­fused man because of my habit prob­lem. It makes me look. Not only when I write a new algo­rithm, but when­ever I run one. Not only at the best scores so far, but at every score of every guess. Because I just fail every damned time at assum­ing I know what’s hap­pen­ing in there.

And every time I look, I am con­fused by what I see. All kinds of stuff, usually.

Here’s an exam­ple from yesterday.

I’ve been look­ing into some bioin­for­mat­ics data from Jason Moore’s lab. The pile of exam­ples here are SNP geno­type data col­lected from sick and healthy peo­ple, and the mod­els one wants are sup­posed to “explain” the rela­tion between those folks’ genomes and dis­ease state. I’m ini­tially rep­re­sent­ing the mod­els using a lit­tle domain-​​specific lan­guage I copied from some soft­ware they use, though I called the mod­els “Snip scripts” for rea­sons that aren’t impor­tant except to explain the plot title.

Now a fully able Genetic Pro­gram­ming prac­ti­tioner (and surely most Machine Learn­ing ones too) would be able to look at that prob­lem, and say “Aha!” or what­ever it is they say, and boot up R and make it go and not burn their knees, and they’d print the best model out on some kind of dot matrix printer (I sup­pose for dra­matic effect) and tear it off and hand it to Jason. And that would do.

But here’s what I do. I worry about how much bet­ter one kind of guess­ing (call it “dumb”) is, com­pared to another kind of guess­ing (call it “machine learn­ing”). So I have to check and see.

Here’s what you get, score-​​wise, when you make 100000 guesses. That is, in this dopey hand­i­capped algo­rithm, I picked 100000 ran­dom Snip scripts, scored them accord­ing to Jason’s instruc­tions, and plot­ted them as a time-​​series of eeny weeny lit­tle Xs. The scores range (in the­ory) between 0.5 and 1.0. (Don’t ask about the extra space on the y axis, OK? It’s for some­thing else.)
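A minimal sketch of that dumb-guessing baseline, under loud assumptions: the four-token alphabet, the script length and `toy_score` are all invented stand-ins, since the real Snip language and Jason’s scoring rule aren’t given here. The only thing borrowed from the text is that scores land between 0.5 and 1.0 and that every guess is independent of the last.

```python
import random

TOKENS = ["A", "B", "C", "D"]   # hypothetical token alphabet, not the Snip language
SCRIPT_LENGTH = 20              # hypothetical script length


def random_script(rng, length=SCRIPT_LENGTH):
    """Pick every token independently and uniformly at random."""
    return [rng.choice(TOKENS) for _ in range(length)]


def toy_score(script):
    """Stand-in scorer (an assumption): the fraction of positions that match
    an arbitrary hidden pattern, rescaled into the range 0.5 to 1.0."""
    matches = sum(1 for i, tok in enumerate(script) if tok == TOKENS[i % len(TOKENS)])
    return 0.5 + 0.5 * matches / len(script)


def random_guessing(n_guesses, seed=0):
    """Score n_guesses independent random scripts and keep every score."""
    rng = random.Random(seed)
    return [toy_score(random_script(rng)) for _ in range(n_guesses)]


scores = random_guessing(10_000)
print(min(scores), sum(scores) / len(scores), max(scores))
```

The list `scores` is the thing to plot as little Xs: one mark per guess, in the order the guesses were made.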

This is just the sort of thing I like to see. I’m so skit­tish, I need the reas­sur­ance that dumb guess­ing won’t work on a com­pli­cated prob­lem like this. And I also like to have some kind of “com­pared to what?” dis­tri­b­u­tion to com­pare against, when I do fancy machine learn­ing things, and there you got one. The dis­tri­b­u­tion is: pretty much some­where between 0.5 and 0.6 is all you’re gonna get. There’s also a lit­tle bit of struc­ture in there, like maybe some kind of hor­i­zon­tal strip­ing, but that might just be round­ing. Noth­ing fret­ful at all.

But I’m still a skit­tish fel­low with a habit prob­lem, and like I said I don’t really “do” Genetic Pro­gram­ming so much as poke around in the world of data and mod­els. Genetic Pro­gram­ming as such is actu­ally kind of com­pli­cated and Microsoft Wordish, to be hon­est: over the years Very Smart Peo­ple who write new algo­rithms have thrown a lot of junk in there, in the spirit of “bio­log­i­cal inspi­ra­tion” or some­thing like that. They wrote a lot of papers, so they’re pretty good, whereas I’ve writ­ten damn all for papers, so clearly I need to take it slow.

Now Genetic Pro­gram­ming is as a King among Meta­heuris­tics, and Dumb Guess­ing is a lowly ant in com­par­i­son. Being skit­tish, I decide yes­ter­day to make a lit­tle step and see what hap­pens when we approach the noble Genetic Pro­gram­ming. So I try hill-​​climbing.

Now hill-​​climbing is a time-​​honored straw man of a Machine Learn­ing algo­rithm. If you can’t fig­ure out the algo­rithm from the name, let me tell it to you right now: make a ran­dom guess, then make another ran­dom guess and keep the new one if it’s at least as good as the first one. If you want to see dumb hill-​​climbing in action, look at the plot I’ve already posted, and imag­ine that instead of throw­ing every guess away when I make a new one, I kept the best so far.

Him again.

If you peer at the guess­ing plot, you can see I could maybe have acci­den­tally found a best-​​so-​​far score of about 0.6 after 20000 guesses or so, and pretty much not improved that. Dumb hill-​​climbing is pretty dumb.
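Put another way, the best-so-far trace of dumb hill-climbing is just the running maximum of the dumb-guessing scores. A small sketch, using a handful of made-up scores rather than real data:

```python
from itertools import accumulate


def best_so_far(scores):
    """Running maximum: the score dumb hill-climbing would be holding
    after each guess."""
    return list(accumulate(scores, max))


# Hypothetical scores from seven random guesses.
example = [0.52, 0.55, 0.51, 0.58, 0.53, 0.60, 0.54]
trace = best_so_far(example)
print(trace)  # [0.52, 0.55, 0.55, 0.58, 0.58, 0.6, 0.6]
```

Seven guesses go in and four distinct values come out; everything else about the guesses is gone from the trace.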

So yes­ter­day I think I can check off “dumb hill-​​climbing” as com­pletely entailed by my “dumb guess­ing” plot, and I have to ask: What’s my next step towards King Genetic Programming?

That would be “not-​​so-​​dumb hill-​​climbing”. Basi­cally the only dif­fer­ence between “dumb” and “not-​​so-​​dumb” is the way I gen­er­ate new guesses (and that phrase is in itself a deep les­son in Machine Learn­ing you should really take to heart): instead of replac­ing the entire guess, I’ll use what I already “know” to make an “informed” guess.

In this case, what I did was take these Snip scripts, which are strings of tokens, and replace some of the tokens with new randomly-​​selected ones. For rea­sons I don’t need to explain here, any string of Snip lan­guage tokens is a valid model, whether or not it’s a good or bad scor­ing one.

So I’ve got my next ten­ta­tive step in meta­heuris­tic space towards Genetic Pro­gram­ming (all kneel), which is this: I make a dumb guess, and I score it, and then I make a sec­ond one where I change some pro­por­tion p of the old script’s tokens to new ones, and I score that, and I keep the new one if its score is no worse than the old one’s.
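Here is one way that step might be sketched, again with an invented token alphabet and an invented `toy_score` standing in for the real Snip scripts and scoring. It also assumes p is applied per token (each token has chance p of being replaced), and that a replacement may happen to pick the same token again.

```python
import random

TOKENS = ["A", "B", "C", "D"]   # hypothetical token alphabet


def toy_score(script):
    """Stand-in scorer (an assumption), mapping any script into 0.5 to 1.0."""
    matches = sum(1 for i, tok in enumerate(script) if tok == TOKENS[i % len(TOKENS)])
    return 0.5 + 0.5 * matches / len(script)


def mutate(script, p, rng):
    """Replace each token, independently with probability p, by a random token."""
    return [rng.choice(TOKENS) if rng.random() < p else tok for tok in script]


def hill_climb(n_guesses, p, seed=0, length=20):
    """Keep a mutant whenever its score is no worse than the current script's.
    Returns the final script, its score, and the score of every guess made."""
    rng = random.Random(seed)
    current = [rng.choice(TOKENS) for _ in range(length)]
    current_score = toy_score(current)
    all_scores = [current_score]
    for _ in range(n_guesses - 1):
        candidate = mutate(current, p, rng)
        candidate_score = toy_score(candidate)
        all_scores.append(candidate_score)      # record it whether or not it is kept
        if candidate_score >= current_score:
            current, current_score = candidate, candidate_score
    return current, current_score, all_scores


best, best_score, all_scores = hill_climb(5_000, p=0.1)
print(best_score, len(all_scores))
```

Note that `all_scores` holds the rejected mutants too; the best-so-far trace can always be rebuilt from it, but not the other way around.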

Ah, but, but… what pro­por­tion p should I use? Crap, I have no damned idea—damn this bro­ken brain! If the num­ber is too high, I’m chang­ing every token so it’s back to dumb guess­ing; if the num­ber is too low, I’m not chang­ing hardly any tokens, and it’s like even dumber guess­ing where I keep look­ing at the same guess over and over. Ummm, some­where in between?

Now there are Very Smart Peo­ple who have explored this at length, but like I said I find I can’t read their papers with­out emit­ting a series of “Why?!” noises, so to avoid wak­ing up the dog from his nap I just tried a bunch. Like, all of them. I just started with a high num­ber (0.5), and cut it in half every once in a while until it got down to pretty much noth­ing, and then popped it back up to 0.5 again.
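A sketch of that kind of schedule follows. The starting value of 0.5 is from the description above; how often to halve (`halve_every`) and what counts as “pretty much nothing” (`floor`) are not stated, so both are made-up parameters here.

```python
def oscillating_rates(n_steps, start=0.5, halve_every=500, floor=0.001):
    """Yield one mutation rate per guess: hold the rate for halve_every
    guesses, halve it, and reset to start once it falls below floor.
    halve_every and floor are illustrative assumptions."""
    p = start
    for step in range(n_steps):
        yield p
        if (step + 1) % halve_every == 0:
            p /= 2
            if p < floor:
                p = start


rates = list(oscillating_rates(6_000))
print(rates[0], rates[500], rates[4499], rates[4500])
```

With these particular numbers the rate steps down through nine levels and then pops back to 0.5, so one full cycle takes 4500 guesses.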

Do I know what’s hap­pen­ing? I do not.

So then I look at some runs. Here’s one. Ignore the little tracy lines on the bottom half, which I won’t bore you by explaining.

The eeny weeny “X” marks are once again the scores of every new guess I make. From left to right I’m making a guess, and then making a new guess based on that, and keeping the new guess if it’s at least as high-scoring as the old one, or the old one otherwise. And because I don’t know what mutation rate to use (the rate technically controls “uniform mutation”, meaning p is the chance that I replace any given token in the old script with a new random token), I start it ridiculously high, gradually drop it to 0.0, and then pop it back up again to a high number. That’s the “oscillation” you can see over time, even though I don’t trace the mutation rate itself on this plot.

You can see when the muta­tion rate is high, because the dis­tri­b­u­tion of “new mutant guesses” looks an awful lot like the dis­tri­b­u­tion we see in the “dumb guesses” plot we already talked about. And you can see when it’s small, because the dis­tri­b­u­tion of new guesses is pretty much “even dumber” by pick­ing the same damned script over and over—the best I’ve seen so far—so the lit­tle Xs clus­ter up near the plateaus.
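If you’d rather check that impression with numbers than by squinting, one rough option is to summarize the scores in consecutive windows of guesses. This is only a sketch; the window size and the example scores are invented, and a window summary is a cruder instrument than the full scatter of Xs.

```python
from statistics import median


def window_summaries(scores, window):
    """For each consecutive block of `window` guesses, report
    (start index, lowest score, median score, highest score)."""
    summaries = []
    for start in range(0, len(scores), window):
        chunk = scores[start:start + window]
        summaries.append((start, min(chunk), median(chunk), max(chunk)))
    return summaries


# Hypothetical scores from nine consecutive guesses.
example = [0.52, 0.55, 0.51, 0.58, 0.53, 0.60, 0.80, 0.61, 0.79]
summaries = window_summaries(example, window=3)
for row in summaries:
    print(row)
```

A window whose median sits down near the dumb-guessing range suggests a high mutation rate; a window whose median hugs its maximum suggests the mutants are nearly copies of the current best.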

But in between, stuff happens.

Now if I weren’t a hand­i­capped fel­low, I would have been con­tent to plot the best scores so far. I’d watch the trace pop up to 0.6 or so around 5000 guesses, and I’d be like “hey!”. I’d grin and watch the best score so far pop up again to 0.7! around 10000 guesses, and I’d be like “whoa!”. And I’d see it hop all the way up to around 0.8 and I’d be like “omg!” and I’d print it out on the Oki­data and tear it off with a flour­ish and hand it to the domain expert with a lit­tle dust­ing off of my hands to indi­cate how well I know it’s a job well done.

But I acci­den­tally plot­ted every score, not just the best ones. Well, no more acci­den­tally than I acci­den­tally wrote this long-​​winded expo­si­tion: it’s what I am forced to do by my faults. It’s not a choice: I was born this way.

As a result, I am given some puz­zles. Not just puz­zles, because there are the things I’ve already pointed out to you, like: I can infer some­thing from this plot of every­thing that you would never see if you just looked at the max­i­mum score over time. I can explain some­thing about the effect of muta­tion rate on the abil­ity of this guess­ing process to find improvements.

But there are puz­zles, too.

Do you notice, like I do, that there are an awful lot of dif­fer­ent high-​​scoring X marks, ones that are well out­side the expected dis­tri­b­u­tion of scores reached by “dumb guess­ing”? And do you also see that they only appear after a new best score has appeared?

Do you see the hor­i­zon­tal band­ing, over time, down in the mid­dle of all those scores? Remem­ber that those are still ran­dom vari­a­tions of the “cur­rent best”. Some­how ran­dom vari­a­tions tend to be extremely biased, score-​​wise, and pro­duce quite sim­i­lar func­tions even though their struc­tures must be sub­stan­tially different.

Do you see some­thing qual­i­ta­tively change, around 25000 guesses? There’s some­thing dif­fer­ent about the way the scores are clumped up around the best so far, and in a few other clumps some­what below that plateau. There’s also some­thing sub­tle about the way the plateau itself is com­posed, as though there were a lot of lit­tle incre­men­tal improve­ments over the last 20000 guesses.

Nei­ther of us would have seen those details, if I’d plot­ted “best score so far”. If I were a real, non-​​handicapped Machine Learn­ing prac­ti­tioner (note I don’t say Very Smart Per­son, because that’d be pre­ten­tious), that would be fine: my domain expert cus­tomers would be inter­ested in the one best-​​scoring answer at the end, not that wob­bly stuff in the mid­dle; my paper-​​reading cus­tomers would be inter­ested in the time-​​course of best-​​scoring answers com­pared to the well-​​known BOMLANO bench­mark, not all that not-​​quite-​​so-​​good junk that mud­dies the issue.

Hell, I only showed you one of five runs I did. The oth­ers look dif­fer­ent. Did I do those other runs Because Sto­chas­tic­ity, and I want to be extra-​​fancy and report a vari­ance? To make the case for my pro­nounce­ment of the Best Model Evar?

No chance; that’s the Very Smart way to work. I did five because I don’t even trust that one example is enough to show me everything I might stumble over. I’m a skittish fumble-thumbs when it comes to this stuff, and I want to know what actually is going to happen before I stride on into using Canned Algorithms (aka, “where angels fear to tread”).

And I burned my knees, too, by the way. Seri­ously. This is a “note­book com­puter”, not a “lap­top”, and tech­ni­cally its CPU is not sup­posed to reach its 180°C sit­ting on your thighs.

So here’s the inter­est­ing thing, to me: What’s hid­den away in all that not-​​quite-​​so-​​good stuff? That is, what are the almost-​​as-​​good mod­els, and what can I learn from them about the sys­tem this data represents?

Very Smart Peo­ple have writ­ten, author­i­ta­tively, that the strength of mod­ern Machine Learn­ing algo­rithms is the degree to which they tac­itly embody that knowl­edge as they effi­ciently cap­ture infor­ma­tion in the exam­ples. Genetic Pro­gram­ming uses the Power of Bio­log­i­cal Inspi­ra­tion; other Machine Learn­ing algo­rithms use other pow­ers, like the Pow­ers of Def­i­nite Con­ver­gence and Ergodic Cov­er­age and Gra­di­ent Descent to cap­ture the same stuff. They’ve shown this through math­e­mat­ics (based on only a very few assump­tions), and through exper­i­men­ta­tion (based on a huge num­ber of bench­marks). They trust one another. This stuff is built right into R libraries and Math­e­mat­ica; it’s no longer sub­ject to ques­tion, honestly.

I’m not that smart. So I have to look. And as a result I get distracted: Isn’t it interesting that there is so much hidden in that unexplored detail? Who but the broken shall attend to the otherwise unremarked?

This is the sort of thing you’ll get told does not sub­stan­tively advance the field. You can’t write a paper about it, or even give a talk at a work­shop. The field has moved on, way far past this sort of pok­ing, on to other much Smarter things.

Which is one rea­son why I inces­santly claim to have no field. Me? I just answer ques­tions for peo­ple, and we explore together. I don’t have the right to tell them stuff.

Also, I have mis­placed the dra­matic dot matrix printer.

Today’s Academic Counterfactual Cultural Exploration (ACCE™)

I had the pleasure (and honor) of visiting Jason Moore’s lab at Dartmouth earlier this week, and giving a little seminar version of something big I’ve been working on for the last few months. More about that project in a few days; the visit helped clarify a number of open questions and focus attention where it was needed.

This was my first “real” visit to an academic environment in a few years, the sort where I’m not just lurking in the background and hanging out with my tenure-track friends. Indeed, the last time I did something like this I think it was my 2008 visit to Nic McPhee at the University of Minnesota at Morris. Like Jason, Nic was also nice and helpful, but UM Morris has a qualitatively different academic culture from that of the medical school at Dartmouth. Both times I visited mainly to observe the local work cultures, especially looking at the collaborative network that connects students, faculty and staff, within and between their respective labs, departments, disciplines and institutions.

I’ve been build­ing a cat­a­log of cul­tural and insti­tu­tional rou­tines and obsta­cles that side-track—and (often per­ma­nently) delay—potentially valu­able projects that could oth­er­wise be explored quickly. The same old ques­tion I always ask, more or less: What do you wish you had more resources to pursue?

Recently I’ve found a use­ful way to explore these rou­tines and obsta­cles is to dis­cuss lit­tle coun­ter­fac­tual sce­nar­ios and see what bub­bles to the sur­face. It can be an inter­est­ing way to sur­face trans­gres­sive behav­ior with­out actu­ally, you know, try­ing it out in real life.

Here’s a vari­ant that came to me as I stared out an air­plane win­dow recently:

Sup­pose a highly-​​respected but soon-​​to-​​retire researcher in Com­pu­ta­tional Phys­i­ol­ogy vis­its the salient depart­ment at Large Ivy Uni­ver­sity to give a sem­i­nar. As one comes to expect from a late-​​career lumi­nary, her talk tends a bit towards the philo­soph­i­cal, but it brings up a num­ber of inter­dis­ci­pli­nary ques­tions and uncon­ven­tional approaches to the con­struc­tion, use and study of Com­pu­ta­tional Phys­i­o­log­i­cal sys­tems. There’s a lot to think about, and a lot of mate­r­ial that most main­stream col­leagues just don’t run into very often.

After her sem­i­nar, she spends a day or two vis­it­ing her Host’s lab and a few of his col­le­gial LIU labs, chat­ting with staff, stu­dents, junior fac­ulty, and their var­i­ous Prin­ci­pal Inves­ti­ga­tors about their ongo­ing research and tech­nol­ogy, and com­par­ing notes on the inter­est­ing things that folks in other insti­tu­tions and dis­ci­plines have been doing.

As it devel­ops, she takes an inter­est in one of the ideas a grad­u­ate stu­dent brings up in pass­ing. The idea isn’t a part of the student’s the­sis research, nor is it even salient to the funded projects in any of the LIU Comp Phys labs. But it’s a good idea, and she decides it would be fas­ci­nat­ing to see how it would play out, and (even bet­ter) it’s a purely com­pu­ta­tional project that the vis­it­ing scholar real­izes could be done in a few weeks… by an agile team of soft­ware devel­op­ers. It wouldn’t need a grant or even a long plan­ning or pro­posal process to see what happens.

Nei­ther LIU nor the visitor’s home insti­tu­tion has any­thing like an “agile team of soft­ware devel­op­ers” as a component—hah! Not even a lit­tle bit. But in her increas­ing time spent “out in the world”, the vis­i­tor has actu­ally run into folks who have worked in those envi­ron­ments, and started to see the point of the var­i­ous “agile val­ues and practices”—at least as a kind of Utopian ideal.

Mind you, this idea isn’t any­thing com­mer­cial. But it’s a damned inter­est­ing project, and to be frank it would be a pity to see it delayed until the stu­dent grad­u­ates, and fin­ishes her post-doc(s), and gets done with tenure track, and so on and on.…

So the vis­i­tor chats online with a few peo­ple she knows, and they agree the project as sketched is a fea­si­ble way to spend about a month of work. Obvi­ously the stu­dent should have the lion’s share of aca­d­e­mic (and other!) credit if it goes for­ward. But the agile folks she chats with remind her that the point of the “one team” prac­tice is that the stu­dent prob­a­bly needs to be co-​​located with the team doing the work with her.

Alas, the stu­dent has a the­sis com­mit­tee meet­ing com­ing up shortly. She’s been asked by her com­mit­tee to work over the draft bib­li­og­ra­phy and bring it more in line with the stan­dards expected in the high-​​impact jour­nals in the field: get rid of those weird ref­er­ences from graph the­ory and ecol­ogy papers and add more from the mod­ern Comp Phys lit­er­a­ture, for example.

Noth­ing like this project has ever been in any of the Comp Phys jour­nals. It may not even catch on in the com­mu­nity, com­pared with the more obvi­ously recep­tive audi­ence over in Arti­fi­cial Men­ta­tion. But the AM folks have never even con­sid­ered Comp Phys as a domain where their stuff might be use­ful. It’s a blue-​​sky project, in that sense.

What has to hap­pen to get this work done? Does the stu­dent leave for a month? Does every­body wait until “it’s safe”? Does the student’s advi­sor col­lab­o­rate with the vis­i­tor on a grant, and use the funds to (even­tu­ally) fund an in-​​house (and almost cer­tainly inag­ile) devel­op­ment project that will take sev­eral years to do what might hap­pen in a month under other circumstances?

Who gets credit? The vis­i­tor wants the stu­dent to get essen­tially all of it. Does the student’s advi­sor get some? Under what circumstances?

Who gives per­mis­sion? Who needs to give per­mis­sion? The stu­dent should be work­ing on her the­sis. The advi­sor should be see­ing to his student’s pro­fes­sional track. And so on.

Who is at risk? What sort of risk?

An extract from The Last Lost World

As I’ve men­tioned, I’m read­ing and enjoy­ing Pyne & Pyne’s The Last Lost World, inso­far as it isn’t a “pop­u­lar­iza­tion” of Pleis­tocene pale­on­tol­ogy so much as it is a use­ful and well-​​built con­struc­tion com­bin­ing aspects of lit­er­ary crit­i­cism and sci­ence report­ing to that field. That is, in this book we’re actu­ally talk­ing about nar­ra­tives, and sur­fac­ing the ten­sion in sci­en­tific dis­course between the cre­ation of gen­eral robust facts and obser­va­tions as opposed to the con­tin­u­ously multi-​​scaled dynam­ics of the actual world: the ways in which a “species” becomes “real” for example.

Mid-​​book, I find the fol­low­ing lovely lit­tle pas­sage. In a sense it says: per­haps finally we can pro­ceed mind­fully. Maybe that’s what I’m ask­ing for when I harp so much and often about the lack of sci­ence (and please, some­day, engi­neer­ing) books like this one: that it is time now to be mind­ful of our roles in the world we cre­ate or discover.

It was how that trans­fig­u­ra­tion had hap­pened [from Dar­win to Neo­dar­win­ism] that per­haps holds the most inter­est. In con­clud­ing the Ori­gin of Species Dar­win imag­ined “a tan­gled bank” over­flow­ing with liv­ing forms yet orga­nized by dis­cernible laws, and while full of “grandeur,” a scene that did not result from a pre­formed pat­tern. Yet as Ernst Cas­sirer has argued, “Man can­not escape from his own achieve­ment.” Darwin’s tan­gled bank has been replaced by a “tan­gled web of human expe­ri­ence” that weaves together lan­guage, myth, art, reli­gion, and all the other strands of humanity’s “sym­bolic net.” That pecu­liar capac­ity of human thought remade Darwin’s tan­gled bank into a shelf of braided nar­ra­tives in which the entwin­ing of genomic and geo­graphic data had to play out over a cul­tural land­scape: that was where, to con­tinue the anal­ogy, the selec­tion would take place. The revival of neo-​​Darwinian con­cepts, how­ever, too often brought with it a neo-​​Darwinian sci­en­tism that failed to apply to its own inform­ing con­ceits the per­spec­tive it demanded of oth­ers. In par­tic­u­lar, it made Dar­win­ian evo­lu­tion an act of spe­cial creation.

It was a simplistic narrative that assumed that ideas could be discovered out of data the way bones could be found in sandstone or tuff, and it viewed the progress of biological science (and archaeology) in a way partisans scorned when others applied it to their own fields. They did not appreciate the extent to which their explanatory ideas, even the theory of organic evolution, had a long history, and that, like Equus caballus within the equids or Homo sapiens among the hominins, the idea was not the intended end product towards which all research had trended but the selected survivor of ancient stock, a product of happenstance, historical contingency, and usefulness. Disciplinary histories tended to be teleological, as narrative must be; the history of the idea of evolution was thus orthogenic in ways the theory’s advocates denounced when applied to nature.

Dar­win­ian evo­lu­tion was less a spe­cial cre­ation, the spark of a divine insight, than it was the rough, imper­fect, best adapted, use­ful, and can­tan­ker­ous out­come of a tedious and often errant chron­i­cle of obser­va­tions and imag­in­ings. It was a pow­er­ful idea, and once dis­cov­ered, des­tined (so it seemed to many) to ram­ify across whole con­ti­nents of learn­ing. It offered a promised con­silience, which could seem the apex to which all prior study had tended. But such appar­ent inevitabil­ity was an inher­ent con­struct of nar­ra­tive, and just as an organism’s traits are not intrin­si­cally bet­ter or worse but bet­ter or more poorly adapted to its set­ting, so it is with ideas. The evo­lu­tion­ary par­a­digm achieved much of its power and reach because it tapped into very old tra­di­tions of thought. Far from being a rad­i­cal inno­va­tion with­out prece­dent, Dar­win­ian evo­lu­tion had itself evolved by fits and starts out of one of the hoari­est con­cepts in West­ern civ­i­liza­tion, the Great Chain of Being.