Cross-posted to the Nudge blog
There are really three basic genetic programming algorithms I like to use out of the box. One’s stupid, one’s (almost) standard, and one’s pretty fancy and effective.
Start with one nugget of assumption: Every search algorithm should be multi-objective. Period. No exceptions. You do a single-objective search on any real-world problem, you’re either lazy or a liar. The end, no exceptions. I don’t care if you’re an Operations Research guru or a lone stock trader looking to make a buck, no problem is single-objective.
So when I say “better” in any context, I am talking about Pareto domination, or if you’re an economist kinda girl, “relative Pareto efficiency”. Say you have two alternatives, A and B, and you’re deciding between them on the basis of two minimization objectives, w1 and w2. A is dominated by B if w1(B)<w1(A), and w2(B)<w2(A). B is dominated by A if w1(A)<w1(B), and w2(A)<w2(B). Neither one dominates the other unless it is strictly better on all comparisons. Otherwise, A and B are nondominated by one another; they can’t be differentiated in a meaningful way without saying which objective is “more important”.
You want to survive a car crash, or do you want to drive really really fast? Which is better? So, get that into your head. There is never one objective. Never.
Let’s make sure we understand why there are always multiple objectives in GP runs. You want accuracy, right? But you also want parsimony. You’re evolving models, solutions to problems, functions, structures. Things can get hairy very quickly in a runaway evolutionary world: programs can get superstitious, they can inherit weird cultish quirks, like learning to use “Cosine(0)” or Sqrt(Sqrt(Sqrt(Sqrt(Sqrt(8.9912)))) instead of “1”, if some forgotten ancestor happened to find that first and passed it on as Big Math Juju. So as God of Your Little Engineering Problem trust me: you honestly want to reach down out of the clouds and say, “Hey, guys, chill. Remember, we use our human-readable voice when we model, OK? Two commandments here: accurate, and succinct.”
So there’s an intrinsic reason right there why you want to minimize model error and model complexity. At least two objectives. Always.
Because there’s always a less efficient way to solve any problem without sacrificing accuracy.
Anyway, those search algorithms.
“Stupid” is random search:
- Create a random program, and evaluate it
- Create a new random program, and evaluate that
- Throw away the one that’s dominated, or the newer one if neither is dominated
- Go to 2
Seem so ridiculous it’s not worth considering? Not really. It’s essentially the worst you could possibly do, knowing nothing about your search problem. So if nothing else, it provides a useful baseline, as well as a nice test of your randomization methods, evaluator timing, and other infrastructure that almost any search algorithm will share with this dumb-as-a-sack-of-hammers approach.
Plus: Easy to explain? Damn straight. Always run random search on your problem, with your representation, full-fledged evaluation, and data collection system in place.
“(Almost) standard”? Steady-state Pareto tournament selection works for me for a number of reasons, at least as a first approach:
- Create a population of, oh I don’t know, 500 random programs
- Select a tournament of maybe, ummm, 10? yeah 10 programs from the population, uniformly at random, and evaluate them
- Identify the nondominated “best” programs among them, and the dominated not-so-best ones, and split them into two groups
- If there are no dominated ones, pick one of the “best” pool at random and demote it out of spite. If there’s only one “best” program, it’s lonely, and inbreeding is bad, so promote one of the lesser ones out of pity.
- Replace all the programs in the not-so-best pool with offspring of the nondominated “best” pool
- Pick two parents, with replacement, from the nondominated set
- Use one-point crossover to make offspring
- Mutate the babies slightly, here and there, to taste
- go to 2
OK, I confess. This is standard in almost no ways whatsoever. It’s a “standard” toolkit for me, sure, but it’s a pretty far cry from Koza’s classic lockstep population-based methods, the kind everybody and their brother has copied wholesale. I like it anyway: it gives rapid feedback on progress, it’s kindof elitist (except for that “spite” step), it saves a lot of search time by limiting expensive multiobjective comparisons to relatively small pools, it works with fitness caching. It’s not perfect, but I already linked to the NFL Wikipedia page, so what are you, so dumb you can’t read? Nothing’s perfect.
Fancy and effective? I really like the ParetoGP method from Mark Kotanchek and Guido Smits (and their colleagues):
- Create an Archive of 500 random programs
- Create a Population of 500 random programs
- Create a container for the Next Generation (an empty list; hold onto that)
- Select a tournament of 20 Archive programs with uniform probability, and evaluate them, and forget about the dominated ones. Just make a list of the nondominated ones from the Archive tournament.
- Select a tournament of 20 Population programs with uniform probability, and evaluate them, and forget about the dominated ones. Just keep the subset of nondominated ones from the Population tournament.
- For each Population tournament winner you got from step 5, pick an Archive tournament winner you kept from step 4, at random. Cross them over to create an offspring, maybe with some mutation if you must.
- Stick the baby into the Next Generation. If you have 500 programs in the Next Generation, replace the Population with it and go to 3. If you’ve run out of tournament winners before filling the Next Generation, just go to 4. If you choose to attack the Orc, turn to page 88.
- You’re going to be wondering when you stop cycling the Next Generations back into the Population, aren’t you? Oh, I dunno. Pick a number. Say “10”. Go on, say it. There: do it ten times. Or more. Or less. Whatever; it’ll work.
- Aha! But then: once you’ve done your ten renewals of the Population (what Kotanchek and such call a “cascade”), you’re not done. Oh, no, boyo. You get your butt up there and you start the whole thing, all over again at step 2. That’s right. Another cascade, from another new, random Population, and another ten generations or so. But—but—before you do, take the final Population, the best best best of that cascade you just finished, and you stick it into the Archive. Add it to the programs that were already there, and take the time to trim the Archive back down to 500 programs. You can do a full-fledged 1000-wise Pareto tournament if you have the time, or you could do some other cunning selection method if you want. Often I tend to do something quick and dirty, like just picking tournaments of 20 and culling all the dominated ones until the Archive size drops back down to 500. So, anyway, you were going back to Step 2 to run another cascade.
- Do that until you’re bored.
Now this is a close approximation of the ParetoGP algorithm described in a number of papers by Mark Kotanchek, Guido Smits and Katya Vladislavleva. It’s a broadly customizable framework, and if it seems like a lot of extra work compared with the “simpler” GP algorithms described above… well, that may be true. I like it very much because it exposes exploitation (via the Archive) and exploration (via the Population) as almost separable processes. You feel like things are getting stuck, let the Population grow a bit and do more random restarting and exploratory search for weird alternatives; you feel like things are getting too wild, you do fewer, shorter cascades and keep a bigger Archive, really nail down the variations on the already-discovered themes you’ve collected there. Play around, fiddle the knobs, see what you get.
And that’s the point, of course. There is no “best” search algorithm. Don’t make me link to that paper again.
You have to pay attention. You have to bring to bear a vast combinatorial armamentarium. You have to keep a seven-drawer toolbox in the virtual garage, with a hundred different wrenches in it, and then look at your problem and listen to your algorithm purring along, and reach into a drawer without looking and find just the right tool, and give that algorithm a little tightening here, a little shimmy there, grind a bit off the edge, take it apart and put it together again slightly better. With grease.
See, it’s not about solving your problem as fast as possible. It’s about watching it improve, and having the sense to see it improving. About having the tools on hand and the experience under your hat so you can intervene in a reasonable and effective way.
So, yes: there are other ways. None are best.
A really cool summary — one of the best blog posts I’ve ever seen about GP, to be honest.
Some recent work by Stephen Dignum and Riccardo Poli shows that things like size limits actively speed up bloat — the average program sizes grow faster with limits, they just stop growing when they hit the limit. Do you know if anyone’s looked at the impact of the kind of pareto approaches you’re suggesting? My guess is that they wouldn’t have that affect (which would be cool), but no promises.
Bah. Don’t flatter me. And let’s take the discussion to the Nudge blog. Post future discussions there, anyway; the guys would all love to chat.
Seems to me you should talk with Guido and Katya about that question. I have a vague memory of her possibly doing some real experiments with ParetoGP size dynamics for her thesis. And I have vivid memories of Guido dismissing bloat out of hand, since it “never happens if you just maintain model complexity as a distinct objective.” I know they focus 99% on symbolic regression, so they’re firmly in your theorists’ space of Koza-style trees.
But you know all that, of course, having spent time sitting and talking with Kotanchek, Smits and all. So I’m not sure I follow your question. You asking whether it works?
I’d imagine theory is almost trivial: The expected offspring size change after crossover must surely decrease if there’s been selection pressure for smaller parents. Doesn’t that make sense?
Further, as the population matures there are small, high-error parents and large, low-error parents in the population (look at any of Kotanchek’s & Smits’s graphs, say in GPTP IV). Under those circumstances, I can’t see how crossover can ever exceed a certain steady-state limit. It’s just a matter of the innate tendency of unequal crossover to grow bigger code trees, offset by the innate selection for smaller ones.
Or are you asking something else I’m missing?
In any case, it’s worth a look. You wanna do it? We’ll help.
Excellent and very interesting read. I’ll be sure to try that out when I get the opportunity.
A few months ago I tried solving some small problem with what I remembered about GA’s and had no luck. After reading your article, I’ll be smarter the next time around. It really has been quite some time since I read about the subject.
Also, good reading links. You’ve been added to my rss reader
I’ve never been a fan of multi-objective functionals. In fact, I would say they are somewhat meaningless. How does one determine the weighting between any particular objective (usually completely subjective)? This is similar to adding apples and oranges. For a cost functional to make sense it must be composed of common units.
Adan,
Multiobjective optimization is, indeed, subjective. That’s the point. Any idiot who linearizes two objectives, sets up an affine weighting scheme that says “0.8 safety + 0.2 speed”, they’re… well, they’re an idiot. They’re stepping on the domain of the decision-maker: (a) they’re presupposing that the objectives are reconcilable (meaning, the solution set is convex, not concave) and (b) they’re presuming that externalities not captured by those apples and oranges won’t skew things in another direction. Many’s the time when a customer/decision-maker has said , “Give me all the optimal solutions, and I’ll pick the best one.” That’s not dumb on their part, it’s dumb on your part if you prematurely limit their choices.
Best always to keep all objectives orthogonal, and let the decision-maker take her pick.
So, in sum: You don’t determine weighting. Weighting is not your job, it’s a domain-specific expert’s job. Odds are any affine weighting or even nonlinear combination functional is in practice a lie or a misstep every time it’s assigned before searching. Weight after you search; let the decision-maker consider all the Pareto-optimal solutions, and take their pick.
Try it my way first, and see what clients—actual people making actual decisions—think. And maybe some of the things you’re assuming about multiobjective optimization are misapprehensions of the terminology, the methodology, and the intent. If a customer wants to optimize for both apples and oranges, then why insist they do the “simpler” thing and pick one, or make up some stupid fake thing that supposedly combines them? Why not just do the relatively easy math, use the correct algorithm for the problem at hand, and optimize for both objectives simultaneously?
I have a strong feeling your answer will involve fictions like “tractability” and “standard algorithms”. Which are, no matter how important one’s department is in a field, lazy BS.
That’s my opinion, of course. You’re welcome to carry on however you feel is appropriate for your customers’ needs.
lorg,
I wouldn’t call it an “article”. It’s a post in a rambling story that’s really happening over here. We’re building a[nother] open-source genetic programming system, and hopefully one that’s not as stupid as all the rest in the world, and maybe as useful eventually as the good ones like ECJ.
Point of order, though: Genetic Programming is not just a genetic algorithm. Genetic Programming is search through a set of algorithms, complex grammatical structures, or functions, not search through a parameter space. The “genetic” part of the title is essentially a red herring, too: what I described above is random search, and a couple of population-based metaheuristics that are somewhat kindof like GAs, but not really.
Have a look at Ricardo, Bill and Nic’s book [a free download], for more information on GP vs. GAs.
My position remains that advocating one way to perform all multiple objective optimisation problems is a gross simplification. It is better to understand what the decision maker wants and allow that to guide our problem solving approach.
From http://www.jstor.org/pss/2581556:
A Priori Preference Articulation: Combines all objectives into a single objective through the use of an aggregation function, turning the multi-objective problem into a single-objective problem to search.
Progressive Preference Articulation: May have partial preference information, and this preference information is adjusted as the search continues by interpreting the results of the search.
A Posteriori Preference Articulation: No preference information, instead the decision maker is presented with a set of candidate solutions (generated by some search process) to choose from.
If we know that the decision maker wants a specific trade-off it would be useless to search the extent of the Pareto front (wasted computation). Similarly if we don’t know what the decision maker wants it would be silly to suggest one specific trade-off as being ‘optimal’.
Nobody ever said anything about enumerating the entirety of the Pareto front. And did I say the decision-maker has no right to minimize computation effort (or clock time spent) as an objective?
“My position remains” is what’s throwing me off; did I come across as trying to convince you of something? Or are you chatting with Adan?
In any case, you’re saying things I agree with entirely, in a tone I’m not sure I get. Yes, you’re right. On all counts.
Daniel,
Please post only publicly readable links to scholarly works. JSTOR links are not readable outside the academy.
Pingback: things to look at (april 1st - april 8th) | stimulant - changing things around. . .
Pingback: Short Notes: Quantitative Hedge Funds, Google App Engine, DTH, itimes « Blue Screen Of Duds
Hi Bill,
I think your comment about convexity/concavity of the solution set is a bit of the mark. Using MDL type of arguments, you can actually show that the front of non-dominated solutions for ‘log-likelihood’ type of error functions is necessarily convex. I.e., you will try to minimize
error + constant * complexity
or alternatively (as a likelihood function):
p(D | M) * P(M)
(where ‘constant’ is hidden as an exponent in the size prior in P(M)).
Of course, fixing the constant before the run is no solution as you state, you need a multi-objective search to find the trade-off. However, strict Pareto dominance is a bit of overkill in this situation, as you can use convexity to cull the solution space.
For the sake of argument, assume that we use ‘speed’ instead of ‘size’ as a measure of model complexity. Apart from this being a more natural measure, it also allows one to highlight the convexity.
Suppose one has two solution, each running at a different speed. They also have a different error (MSE). The slower solution has a smaller error than the faster solution (they are non-dominated).
Now create a ‘combined model’, which selects a model with a probability for the fitness cases. Flip a biased coin using this probability and select the first or second model based on this outcome. This means that each solution will be used randomly, biased by the coin.
Then, the error/speed trade-off of this model will necessarily lie on the line between the two original models: it’s error is a weighted average of the errors of the two original models, while it’s speed is the weighted average of the speeds of the two original models (plus a small constant for tossing the coin. This constant we ignore.).
Hence, any two non-dominated models dominate the entire area that lies above the line spanned by the combined model, and the solution set is convex. The pareto set that is usually used is larger than the true solution set.
(Of course, this whole argument depends on the fact that we assume that the ‘size’ of a model is commensurable with the ‘complexity’ of a model. In general, Kolmogorov disagrees. The speed argument however holds.)
Good points, Maarten. True enough when we’re talking about offline search and optimization, and when the constraint set is relatively tractable.
In working on this project, and writing these posts, I’ve come to realize that my real complaint is that most of offline optimization is short-sighted. Rude to a real decision-maker. Inagile, and frequently maladaptive.
A lot of my rants through the years, where I ask people why they did something dumb with multiobjective search, are not really about the convexity structure of solution spaces, or even about premature convergence to misleading subsets of pseudo-optima.
They’re actually questions about evidence. Practices. Whether their decision was informed by the actual structure of the actual solution space being explored, or if they just did it because it’s the way they learned in school, or because nobody complained before, or because it never broke in earlier problems. Or because it’s easier that way.
And then I ask: Who told them “easier” was an objective of their project?
And as it happens, I’ve rarely met people who have explored the solution set and mapping to objective space of real problems. And fewer people who have solved problems or run optimization “online” — paying attention to what’s happening, and changing their goals or definitions.
But your point is absolutely right, and helps clarify my thinking: Of course there are convex projections of high-dimensional multiobjective search spaces onto lower dimensions. One can use an arithmetic combination model in many common cases, whether it’s an affine combination of objectives or even a nonlinear mapping. There are plenty of cunning statistical methods one could use to simplify all kinds of fun stuff in hard problems; principle components analysis leaps to mind.
So yes, you’re right. I’m using hyperbole. What I’m always yelling about is sometimes unimportant.
When a technician, or even the decision-maker herself, decides to collapse objectives without first modeling the objective space by sampling or first-principles modeling, that’s a mistake. They may be correct in doing it, in their problem, but the way they did it was a mistake.
Not always a mistake in terms of results. In the case of many simple search space structures, like the one you point out, it’s AOK.
If simplification tended to give a wrong answer, then the mythology of single-objective search wouldn’t have so much traction. People would know better if they encountered problems sooner.
My point is that for many practitioners, especially “in theory” and in the Academy, life has given them an easy ride. They’re making assumptions based on a string of early successes. People raised in a culture of models assume they can get away with this crap in real life, and I admit that bugs me. Look at Economics for analogies, if you want. Premature simplification without rigorous reality check is a mistake in terms of project risk.
Let’s change gears briefly. Look at software development.
Suppose I’m a project manager. Some cowboy solo coder tells me not to worry about the code he wrote by himself, in the middle of the night, without testing, without having the customer in the room at the time. Without any unit tests. Without a pair programmer. Sounds like a student programmer I used to
knowbe.That’s project risk: The customer is at risk because the coder might have failed to do what they thought they were doing (“made mistakes”). And also because the customer might have seen the intermediate results and changed her mind about something she specified earlier in the project. Or might have discovered new business cases that weren’t specified as goals.
And the project is at risk in future because other programmers cannot trust code like this black box, not without going to the trouble of writing their own suite of automated unit tests. Worst: the rest of the world wasn’t involved in the conversation at all, didn’t share the knowledge that this code represents; I’ll bet even the cowboy and the client will have to start over if they try to do similar work again.
That’s a bunch of risk. More risk if it’s an interesting project. All risks that the project will fail—where success is defined as delivering what the client needs.
And yes, I know the cowboy coder I described is “doing it the normal way.” That’s often the wrong way.
Is what the client specifies always what she needs? Suppose she came in and handed a three-ring binder of specifications to a smart programming team. Wrote down what she wanted, all the way to the function call structure, walked away, and came back to get the results when the team had written all the software specified in the stupid binder. No questions about language, about typos in the binder, about better ways to write the code that a team might know, no insight gained by actually making the project work. Just up-front specification of the exact solution she wants.
I’ll argue that project is also at risk. For many of the same reasons as the first one. It’s inagile, and any of us who have worked on software development projects know there’s always a better way to solve a problem than the first one you consider. You always adapt.
Now. Think about any complex search and optimization project. Stock trading. Pharmaceutical design. Supply chain management. Transportation scheduling. Anything more interesting than sticking integer-sized boxes in a backpack so a salesman can deliver them to the Towers of Hanoi.
Think about what you just said about modeling. About your assumption that we know ahead of time the relationship between variables. Think about what that implies about solutions’ interaction with complex nonlinear constraints.
Think about when we make contingent decisions to combine objectives or not, depending on the way they interact with one another in a given problem.
So for sake of the other readers, consider the design pattern you just spelled out: Take two separate nondominated solutions (both are elements of the solution set mapped to points in the objective space) and create a new model that’s a simple stochastic mixture. You’ve defined a new point in the solution set—assuming I allowed you to include stochastic combination in the solution set’s definition—and this maps to a new point in objective space.
Your point is that for certain aggregate error measures, this new point will lie between the first two in the objective space.
If we (whoever “we” is) are certain we’ll never, ever care about anything but the average error, never care about that little constant you mention, or the description length of your second-order solution, but only the number of states it passes through in executing its algorithm… sure. You’re right.
How’s that work when we ever consider the variance of the errors, though? What if the nondominated “simple error-prone” model is a coin-flip itself? What if I didn’t let you have stochastic elements when I specified the problem, if all models were deterministic? What if the data we’re testing your models on is itself full of structure, of the type which biases the log-likelihood errors?
What if, what if?
That’s really the basis of all these years of complaint about multiobjective search. There are always contingencies. Always reasons to change, to improve, to adapt any model, any algorithm, any approach.
Linear programming is a stupid way to solve problems more often than it’s the right way… but it’s right sometimes. Neural nets are stupid more often than they’re useful, but sometimes they’re the right way. Single-objective search is stupid more often than not. Nonadaptive search, offline optimization: stupid. Blind application of any methodology, or making assumptions about objectives or constraints or parameters or relations or constants: stupid.
We all know this.
So let’s go back to that software analogy I was using, just for one minute. If I were running a programming team, they would be Agile. They’d be using XP or Scrum or something to address that risk I mentioned. There would be a customer on the team. There would be tests. There would be objective, communicative tests of all design decisions.
So: Write me a test that will detect the conditions under which a mapping from solution set to objective space can be assumed, with high probability, to be convex. Alternately, and more usefully, write me a test that will detect a failure in that assumption. Something I can attach to any search method I choose, as a monitor or a decorator, and see when things are getting out of hand. A diagnostic. You can do it statistically, or analytically. I don’t care.
When a decision-maker can ask me to reduce the number of objectives for her problem, and we can run some tests and she can decide with confidence whether it’s safe or not, then I’ll be happy. When I am in the middle of a project and we’ve decided to collapse some objectives and a little alarm goes off saying “FAIL: convexity assumption violated”, then I’ll be happy.
If I have to remember to check all the time… well, that’s stupid. Just as stupid as if I had started using Linear Programming at the beginning without checking to see whether the problem was well suited to it, and spent hours running LP algorithms with no way of knowing if the solutions were converging.
See: you and I, and Nic and Riccardo and Koza and my friends on the Nudge project and all the rest, we’re in worse shape than almost every other search geek. Think about it. Go back to that software project analogy. Not only are we trying to model complicated problems at the same time we’re searching through them—in the analogy, specify and refine business deliverables at the same time they’re being developed. We’re using genetic fucking programming.
We get solutions, but we have to figure out how the solutions work. Think about it. A million cowboy programmers writing a million black boxes, all trying to write programs that we specify up front?
Or are we better off if we watch what search is making them discover, and reach in and interact, and drive solutions toward what we mutually discover are not only better solutions to what we specified initially, but also better-specified?
How many times have we seen a GP search spit out an answer that optimizes everything we asked for… but which nonetheless “cheats”? Fails because of something we didn’t spell out.
I prefer dialog. Agility. Constant interactivity. On-line search and optimization, only, all the way down. Because I don’t even trust myself to specify or solve interesting problems. I don’t trust myself to know what I want, or how to describe it, or to know beforehand how my toolkit will interact with it. I don’t even trust myself to stick to the plan. I don’t trust myself programming.
If I don’t even trust myself, even though I so obviously think highly of myself, consider how much I trust programmers writing programs for me, or running an optimization for me, or optimizers solving other complex problems for me.
Especially when the “programmer” or the optimizer isn’t even human.
So that’s why I continue advising people never to simplify prematurely. I think I’d advise them never to do anything algorithmic at all unless they’ve got fine-grained automated tests in place to cover the details, and reliable acceptance tests in place to interrogate prospective “solutions”.
But I don’t know what those tests are, yet. You sketched one, though you didn’t present a reliable automated version yet. Until then, every time we step ahead without those explicit tests in place—so we know when something complicated happens—our project is at greater risk of failure.
And combining objectives is exactly the sort of step I mean.
Wow Bill, a post worthy of a blog entry in its own right. But this thread is getting interesting, so I’m not letting it die yet.
First of all, I want to re-emphasize that I’m not advocating linearly combining objectives in order to reduce the objective space. I am simply saying that usually, one is interested in the convex hull of the Pareto set, not the entire set. This is still a large set, and the set needs to be discovered. This may sound like a dangerous simplification to you, but I think it is warranted, especially in the error / size trade-off I sketched.
Multi-objective problems are multi-dimensional, and prematurely reducing them to a single objective will lead to poor results. I completely agree with you that you want the algorithm to search out the alternatives, and let the decision maker pick the best (combination of) solutions *after* the set has been produced.
If you present it to the ‘customer’ in this way, even with a full Pareto based approach, you will notice that the decision maker will always choose a solution on the convex hull, never one in a concave bend. (Yes, also I can be very confident in my assertions without any evidence to back me up)
Yes, these concave bends are very interesting, almost as interesting as the first front of *dominated* solutions. That is, potentially important, but given that we’ve got a full convex hull to go through for analysis, they can wait.
I think we’re on the same page generally, as also I have stared and stared at what this genetic programming is producing. I violently agree that one needs to examine and re-examine everything that is produced, try out different things. I love to inspect the evolutionary trees that produce a solution (I believe that Nic is one of the few who has that same hobby). I have gotten great result by mixing objectives. I’m just interested to make my life a bit easier, by reducing the solution set to this convex hull. Why?
I’m into this symbolic regression thing. From the way I view it, each and every fitness case has an unknown associated with it, the noise of that point (even when assuming Gaussian noise). Calculating something trivial as the MSE is already cheating, as you put the assumption in that every point has exactly the same noise. You’re reducing the objective space from all points to an unweighted average. We don’t know if this is the case, it’s an assumption.
So now, if I have 100 datapoints, I’m theoretically confronted with at least 101 objectives: one objective for each datapoint, and one (or more) objectives for the succinctness of the solution. For starters, each constant that is a target value is part of the convex hull, and thus part of the Pareto set. Now, if I were to do a full Pareto dominance optimization in this 101 dimensional space, this would become a nightmare. Not only does the size of the solution set becomes intractable (also the convex hull would be huge), the algorithm itself would come to a screeching halt simply because of the sheer complexity of calculating the dominance relation. Convexity can however be used to at cull the space a bit, and can be used for quicker algorithms to determine if the point is on the hull or not. I would also not want to waste my time examining these concave bends when there are more promising points.
No, I’m currently not tackling problems this way (although I would like to), but just making the point that with a growing number of objectives, utilizing convexity is going to be a help, both in getting the algorithms to perform, and in culling the solution set to the most interesting points.
Also given the strong theoretical support for convex hulls as objects of interests next to Pareto sets, I’m amazed for years now that the Multi-Objective community in EC has completely ignored it, and is fully focused on Pareto sets alone (with additional assumptions (eta-dominance) to manage more-than-two objective problems). Pareto sets are more difficult to find and to manage than convex hulls, and the additional benefit of these concave bends that they can induce is at best marginal. You would find them by examining the second convex hull.
As for assumptions, the additional assumption of convexity over Pareto dominance is that you can indeed stochastically mix solutions.
As far as assumption go, this is quite a better assumption than assuming that the noise is distributed equally over all your datapoints. Now that’s a dangerous one! ROC curves are also based on the principle of stochastic mixing. It’s sound, it works in many case, it’s what your decision maker wants.
The convexity assumption usually works (as a first-order approximation) for error measures (actually it only works exactly for absolute error, the line between two points has a little bend based on the covariance of the two solutions for squared error.), and for speed it is quite correct.
For size it would be incorrect, but given that size itself is already a hopeless measure of complexity (the Kolmogorov kind), also Pareto concavity will easily lead one astray.
So, what am I saying here? Possibly: better to add a few more objectives and keep things manageable by mainly considering the convex hull, than using fewer objectives so that you get this concave
stuff as well.
And now I’ve got to quit. Need to get some sleep. Got a scrum first thing tomorrow morning.
Hello,
I was wondering if there is an implementation of ParetoGP ? and any ideas of its performance compared to others such as SPEA2,NGSA-II
thank you!
Well, the vast majority of that went over my head.
I just picked through some genetic algorithms, and I’m noticing some interesting trends.
- They tend to be good at identifying *roughly* where a local minimum is located in a space, but become terribly inefficient at progressing from there.
- Mutation and crossover rates need to be dynamically balanced based on the variance, especially in dynamic sets, to prevent premature homogenization of the population. There shouldn’t be a trigger for hyper-mutation, it should be dynamic.
- Non-mutated candidates are worthless except to add weight to the success of a trait. These candidates tend to be heavier than simply tracking a data point which can be used to determine ‘virility’ of a trait.