The Mirror Dojo: Genetic Programming for Agile Teams

This is the current iteration of the Genetic Programming workshop née Agile Team Dojo I’ve been working on over the last few years.

I’m looking at the Michigan/Ohio/Indiana region for an interesting place to run it. If you’re interested in scheduling me for a two- or three-day workshop, feel free to contact me online.

You know how to do that.

Genetic Programming has been actively researched and promoted for more than a quarter century. It’s a broad collection of design practices and modeling techniques for the “automatic” discovery of abstract patterns and structures.

And that means full-fledged patterns and structures: algorithms, predictive models, complete mechanical and optical and electronic designs, and even blue-sky artificial intelligence systems.

Some of the field’s big hits include:

Sexy stuff! Nerds like us love it.

Better yet: I can describe all the basic design principles of Genetic Programming in four sentences. It’s so simple to describe that I’m completely confident that I can help you, a competent software developer working on an agile team, write a working GP system in an hour or so!

And that’s just what we’ll do in this two-day workshop.

But there’s one more thing.

You’ve probably noticed that I always put extra-scary scare-quotes around “automatic” whenever it comes up.

During this dojo we’ll be approaching this material as an agile team. We’ll build at least two full-featured genetic programming systems, and we’ll bump very quickly into those scare-quotes.

And that’s what this workshop is about.

See, I’ve been working in this field for most of 20 years. It turns out that even after all that time, there’s a large and troubling gap between the tutorials and demos of genetic programming, and successful problem-solving with GP. You can measure that gap in terms of time, or computational resources, or expected quality of results.

Sound familiar?

Much of the advanced academic research being done today in Genetic Programming focuses on ways to increase the computational power, to bring more processors and faster code to bear so that “automatic” problem-solving has a better “chance of success” on a complex problem.

Ah, look; more scare quotes.

See, in this workshop we’re not advanced academic researchers in Genetic Programming. We’re much better prepared than they are: we’re an agile team.

In the workshop we’re going to be exploring how to tell our little artificial “team” of “automatic developers” what it is we want, and how they should go about making it for us, and (because GP just works) they’ll be “releasing software” for us. What we’ll be doing is designing the rules by which they solve our problem: especially the ones that spell out how we want them to interact with one another.

Which should explain the name of this dojo. And maybe even why it will take a little bit longer and a bit more effort than most others you’ll run into.

Scope: This is a two- or three-day workshop, for three to eight software developers, engineers, coaches, designers, scientists, and other nerds.

The majority of participants should be familiar with common platform-agnostic programming languages (Ruby, Python, Java, Smalltalk). They should be comfortable working in an agile team: we’ll collectively work in one shared programming language, and rely on automated unit and acceptance tests, rapid release schedules, agile planning, and pair programming. There should be enough laptops for every pair to code, and enough network connectivity to use GitHub for version control and coordination.

On the first day of the workshop (6 hours plus lunch) we’ll establish the social infrastructure, and implement a simple but full-scale genetic programming system for symbolic regression. At the end of the day, we’ll choose an advanced project for the next day.

Because we’re all nerds, you and I both know you can’t “stop working” after just five or six hours—and that’s fine. But the work day for the project is six hours plus lunch. So no commits overnight!

On the second day (6 hours plus lunch), we’ll use genetic programming to address a technical problem where results are obviously practical, and probably publishable.

In three-day workshops, the final day can be used (at the team’s discretion) for either refinement and public release of tangible product from the prior days, or for a third project using different GP design patterns.

Why?: The dojo is just what it says: an exposure for agile software developers to a sexy but poorly-understood technical practice with great economic potential in the coming years. While they’re learning about the tech, they’ll also be surfacing aspects of their own work, and the way agile practices mold project management in the “real world”: requirements, goal-setting, information-sharing, metrics, collaboration patterns, infrastructure, delivery schedules, and even the jurisdiction of management vs. developers.

Cost: The most important cost for this exercise is the participants’ interest and attention. If those have been made available, the only financial costs are for the venue, travel, and food and lodging (where needed) for the participants.

Notes for a CodingDojo, 3x power

I’m thinking this will want to be two or three consecutive weeks of our Ann Arbor-based Tuesday evening “Craftsman Guild” meeting, but I could be convinced to run it on a couple of consecutive weekends, as well. The gaps in between sessions are actually useful, since I think they give folks a chance to think about stuff that should be bothering them as we proceed.

Developing an awesome Genetic Programming system… from scratch

The Point:

We often run these agile coding exercises as if user stories and acceptance tests drop from the sky. In real projects, they’re typically the biggest source of confusion and pain—even in projects we’re working on by ourselves. The subject matter we’ll explore here, Genetic Programming, is hugely sexy, technically simple, and offers only trivial coding challenges.

You might wonder why so few people use it, then, after 20 years. Why it hasn’t changed the world and made artificial intelligence part of our everyday lives.

The answers to those questions have nothing to do with the computer.

The Structure:

Two or three sessions, each about 2 hours.

We’ll run the sessions in a CodingDojo format, much like the “coding randori” we’ve seen in earlier CraftsmanGuild meetings, where there’s one “driver” and one “navigator” pairing on a laptop connected to a projector, with the entire “audience” helping them along the way as they write code (and do chores).

If it seems practical in a later session, we may split into two teams (still with one customer and one project).

I’ll role-play “the customer representative” for a customer who’s off-site, with the rest of the group acting as “the dev team”.

In addition to the coding computer, we’ll set up a second projector showing a live PivotalTracker instance where we can collect, sort and make progress on stories as an integral part of the development process.

During the first iteration we’ll decide on language and infrastructure, based on who’s there and what they want and know.

As code is written it’ll be committed to the GitHub project (so the audience can fork it and work along), and we’ll have formal review sessions after every iteration, with “the customer” accepting or declining particular solutions, looking at the stories we worked on, and gathering new ones as they crop up.

Participants:

…should have used some modern testing framework, but they don’t need to be experts (or evangelists) at either TDD or BDD. They should be comfortable, but don’t need to be fluent, in at least one modern programming language like Java, Ruby, Python, &c. They should at least have looked at pivotaltracker.com to familiarize themselves with the feature set and story-sorting idiom.

The language we pick should be the one which most participants are most comfortable using when they do real work. Whatever language and infrastructure we decide on, it shouldn’t be an obstacle to take a simple user story like “Adding two numbers together should return their sum” and actually write the acceptance test, and then run it.
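To make that bar concrete, here is a minimal sketch of what the acceptance test for that story might look like, assuming we landed on Python and its standard unittest module. The add function is a hypothetical stand-in for whatever the team actually writes; nothing here is prescribed by the workshop.

```python
import unittest


def add(a, b):
    """Hypothetical function under test: return the sum of two numbers."""
    return a + b


class AddingTwoNumbers(unittest.TestCase):
    """Story: 'Adding two numbers together should return their sum'."""

    def test_sum_of_two_integers(self):
        self.assertEqual(add(2, 3), 5)

    def test_sum_of_two_floats(self):
        # Floats rarely compare exactly, so allow a tiny tolerance.
        self.assertAlmostEqual(add(0.1, 0.2), 0.3)
```

Saved in a file such as test_add.py (the name is an assumption), this would run with python -m unittest test_add. If getting that far takes more than a few minutes, the language or the infrastructure is the wrong pick.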

The Project:

The Customer’s overall goal is to build a Genetic Programming system that accepts a set of (x, y) data and a set of mathematical primitives, and evolves mathematical equations of the form y=f(x) that fit the data. Here’s an (antique!) Java applet that does something along those lines already.
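As a rough illustration of what “equations that fit the data” could mean in code, here is a small Python sketch. Everything in it is an assumption made for the example: the nested-tuple representation of an equation, the three primitives, the names evaluate and total_error, and the made-up target data. The team in the room may well choose something quite different.

```python
import operator

# Hypothetical primitive set: name -> two-argument function.
PRIMITIVES = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


def evaluate(expr, x):
    """Interpret a nested-tuple expression for a given value of x."""
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return expr
    name, left, right = expr
    return PRIMITIVES[name](evaluate(left, x), evaluate(right, x))


def total_error(expr, data):
    """Sum of absolute differences between f(x) and y (lower is better)."""
    return sum(abs(evaluate(expr, x) - y) for x, y in data)


# Target data sampled from y = x*x + 1 (made up for illustration).
data = [(x, x * x + 1) for x in range(-3, 4)]

perfect = ("add", ("mul", "x", "x"), 1)  # f(x) = x*x + 1
poor = ("add", "x", 1)                   # f(x) = x + 1

print(total_error(perfect, data))  # 0
print(total_error(poor, data))     # 28
```

Summed absolute error is only one of many ways to score a fit, and choosing the score is exactly the sort of decision “the customer” and “the dev team” will need to talk through.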

This sort of GP project typically breaks down into five chunks:

  • build a simple but full-featured interpreter for a domain-specific language (DSL) intended for mathematical modeling
  • build an evaluator that determines how well an arbitrary DSL script matches target data
  • write methods to create random programs, and also mutate and cross over DSL scripts
  • build a simple symbolic regression system that fits numerical data with arbitrary mathematical models
  • adapt to some minor problems that may arise along the way

These may sound like big, ambitious steps, but in fact they’re all technically simple.
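To give a sense of scale, the chunks listed above can be sketched in well under a hundred lines of Python. This is one possible shape, not the design we’ll end up with: the tuple representation, the operator set, the keep-the-best-half selection scheme, the 20% mutation rate, and the depth cap are all assumptions picked to keep the example short.

```python
import operator
import random

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}  # assumed primitives
MAX_DEPTH = 8  # assumed cap so crossover cannot grow trees without bound


def random_program(rng, depth=3):
    """Grow a random expression tree; a leaf is 'x' or a small integer."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else rng.randint(-2, 2)
    return (rng.choice(sorted(OPS)),
            random_program(rng, depth - 1),
            random_program(rng, depth - 1))


def run(expr, x):
    """Interpreter: evaluate a tree for one value of x."""
    if expr == "x":
        return x
    if not isinstance(expr, tuple):
        return expr
    name, left, right = expr
    return OPS[name](run(left, x), run(right, x))


def error(expr, data):
    """Evaluator: summed absolute error over the (x, y) pairs."""
    return sum(abs(run(expr, x) - y) for x, y in data)


def depth(expr):
    return 1 + max(depth(expr[1]), depth(expr[2])) if isinstance(expr, tuple) else 0


def subtrees(expr, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, expr
    if isinstance(expr, tuple):
        for i in (1, 2):
            yield from subtrees(expr[i], path + (i,))


def replace(expr, path, new):
    """Return a copy of expr with the node at path swapped for new."""
    if not path:
        return new
    parts = list(expr)
    parts[path[0]] = replace(expr[path[0]], path[1:], new)
    return tuple(parts)


def mutate(rng, expr):
    """Swap a randomly chosen node for a fresh random subtree."""
    path, _ = rng.choice(list(subtrees(expr)))
    return replace(expr, path, random_program(rng, depth=2))


def crossover(rng, mom, dad):
    """Copy mom, with one random node replaced by a random subtree of dad."""
    path, _ = rng.choice(list(subtrees(mom)))
    _, donor = rng.choice(list(subtrees(dad)))
    return replace(mom, path, donor)


def evolve(data, rng, population_size=50, generations=20):
    """Symbolic regression loop: keep the better half, breed the rest."""
    population = [random_program(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda p: error(p, data))
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            child = crossover(rng, rng.choice(parents), rng.choice(parents))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if depth(child) > MAX_DEPTH:
                child = rng.choice(parents)  # discard oversized offspring
            children.append(child)
        population = parents + children
    return min(population, key=lambda p: error(p, data))


data = [(x, x * x + 1) for x in range(-3, 4)]  # made-up target: y = x*x + 1
best = evolve(data, random.Random(0))
print(best, error(best, data))
```

Notice how many arbitrary numbers and rules crept into even this toy version. None of them came from the customer.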

The goal of the dojo is not to learn to type something quickly or get as much done as possible.

It’s designed so that we adapt both the emerging codebase and our collective understanding of the problem the customer is asking us to solve, in a context where there are no “tricks” (I won’t be lying to you, except maybe by omission), but where there are plenty of traps.