Sunday, March 29, 2015

Deobfuscating a theorem of Ewert, Marks, and Dembski

Back in July, I found that I couldn’t make heads or tails of the theorem in a paper by Winston Ewert, Robert J. Marks II, and William A. Dembski, On the Improbability of Algorithmic Specified Complexity. As I explained,

The formalism and the argument are atrocious. I eventually decided that it would be easier to reformulate what I thought the authors were trying to say, and to see if I could generate my own proof, than to penetrate the slop. It took me about 20 minutes, working directly in LaTeX.
I posted a much improved proof, but realized the next day that I’d missed something very simple. With all due haste, here is that something. The theorem is best understood as a corollary of an underwhelming result in probability.

The simple

Suppose that $\mu$ and $\nu$ are probability measures on a countable sample space $\Omega$, and that $c$ is a positive real number. What is the probability that $\nu(x) \geq c \cdot \mu(x)$? That’s a silly question. We have two probabilities of the event \[ E = \{x \in \Omega \mid \nu(x) \geq c \cdot \mu(x) \}. \] Because $\nu(x) \geq c \cdot \mu(x)$ for every $x$ in $E$, by the definition of $E$, summing over $E$ gives $\nu(E) \geq c \cdot \mu(E)$. The corresponding upper bound on $\mu(E)$ can be loosened, i.e., \begin{equation*} \mu(E) \leq \frac{\nu(E)}{c} \leq \frac{1}{c}. \end{equation*} Ewert et al. derive $\mu(E) \leq c^{-1}$ obscurely.
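A toy computation makes the bound concrete. The two measures below are arbitrary choices of mine, purely for illustration:

```python
from fractions import Fraction as Fr

# Two probability measures on a five-point sample space, and a constant c.
mu = [Fr(4, 10), Fr(3, 10), Fr(1, 10), Fr(1, 10), Fr(1, 10)]
nu = [Fr(1, 10), Fr(1, 10), Fr(2, 10), Fr(3, 10), Fr(3, 10)]
c = Fr(2)

# The event E = {x : nu(x) >= c * mu(x)}.
E = [x for x in range(5) if nu[x] >= c * mu[x]]
mu_E = sum(mu[x] for x in E)
nu_E = sum(nu[x] for x in E)

# nu(x) >= c * mu(x) on all of E, so summing over E gives nu(E) >= c * mu(E),
# and hence mu(E) <= nu(E)/c <= 1/c.
assert nu_E >= c * mu_E
assert mu_E <= nu_E / c <= 1 / c
print(E, mu_E, nu_E / c, 1 / c)
```

Here $E = \{2, 3, 4\}$, $\mu(E) = 3/10$, and the loose bound is $1/c = 1/2$.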

The information-ish

To make the definition of $E$ information-ish, assume that $\mu(x) > 0$ for all $x$ in $\Omega$, and rewrite \begin{align} \nu(x) &\geq c \cdot \mu(x) \nonumber \\ \nu(x) / \mu(x) &\geq c \nonumber \\ \log_2 \nu(x) - \log_2 \mu(x) &\geq \alpha, \end{align} where $\alpha = \log_2 c$. This lays the groundwork for über-silliness: The probability of $\alpha$ or more bits of some special kind of information is at most $2^{-\alpha}$. This means only that $\mu(E) \leq c^{-1} = 2^{-\alpha}.$

The ugly

Now suppose that $\Omega$ is the set of binary strings $\{0, 1\}^*$. Let $y$ be in $\Omega$, and define an algorithmic probability measure $\nu(x) = 2^{-K(x|y)}$ for all $x$ in $\Omega$. (I explained conditional Kolmogorov complexity $K(x|y)$ in my previous post.) Rewriting the left-hand side of Equation (1), we obtain \begin{align*} \log_2 2^{-K(x|y)} - \log_2 \mu(x) &= -\!\log_2 \mu(x) - K(x|y) \\ &= ASC(x, \mu, y), \end{align*} the algorithmic specified complexity of $x$. Ewert et al. express an über-silly question, along with an answer, as \[ \Pr[ASC(x, \mu, y) \geq \alpha] \leq 2^{-\alpha}. \] This is ill-defined, because $ASC(x, \mu, y)$ is not a random quantity. But we can see what they should have said. The set of all $x$ such that $ASC(x, \mu, y) \geq \alpha$ is the event $E$, and $2^{-\alpha} = c^{-1}$ is a loose upper bound on $\mu(E)$.
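Since $K(x|y)$ is uncomputable, any numeric illustration has to substitute a proxy. The sketch below is entirely my own toy, with zlib compressed length standing in for conditional Kolmogorov complexity; the names `k_proxy` and `asc` are mine, not from the paper:

```python
import math
import zlib

def k_proxy(x: bytes, y: bytes) -> float:
    # Crude stand-in for K(x|y): the extra bits needed to compress x
    # after the context y has already been compressed.
    return 8 * max(1, len(zlib.compress(y + x)) - len(zlib.compress(y)))

def asc(x: bytes, mu_x: float, y: bytes) -> float:
    # ASC(x, mu, y) = -log2 mu(x) - K(x|y), with the proxy in place of K.
    return -math.log2(mu_x) - k_proxy(x, y)

# Uniform measure on 8-byte strings: mu(x) = 2**-64 for every x.
x = b"ABABABAB"
y = b"ABABABABABABABAB"  # context that resembles x
print(asc(x, 2.0 ** -64, y))
```

A string that is cheap to describe given the context gets a small complexity term, and hence a large share of the $-\log_2 \mu(x)$ "complexity" survives as ASC.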

Tuesday, March 10, 2015

Tolstoy on the studious deceit of children by the church

Nothing captures my experience with the church better than does this passage from Leo Tolstoy’s The Kingdom of God is Within You (1894).

The chief and most pernicious work of the Church is that which is directed to the deception of children — these very children of whom Christ said: Woe to him that offendeth one of these little ones. From the very first awakening of the consciousness of the child they begin to deceive him, to instill into him with the utmost solemnity what they do not themselves believe in, and they continue to instill it into him till the deception has by habit grown into the child's nature. They studiously deceive the child on the most important subject in life, and when the deception has so grown into his life that it would be difficult to uproot it, then they reveal to him the whole world of science and reality, which cannot by any means be reconciled with the beliefs that have been instilled into him, leaving it to him to find his way as best he can out of these contradictions.

If one set oneself the task of trying to confuse a man so that he could not think clearly nor free himself from the perplexity of two opposing theories of life which had been instilled into him from childhood, one could not invent any means more effectual than the treatment of every young man educated in our so-called Christian society.

(I provide context here.) It’s not exactly surprising that people who refer to indoctrination as Christian education should regard education as indoctrination when it happens to conflict with their beliefs.

Sunday, September 28, 2014

Would E.T. notice an icon of ID creationism?

Robert J. Marks II in his article on IDC in the conservative political outlet Human Events:

Yet we all agree that a picture of Mount Rushmore with the busts of four US Presidents contains more information than a picture of Mount Fuji.
As Jeff Shallit indicates, no, we really don’t. He has formal measures of information in mind, as I usually do. But I’ve posted a lot of formal stuff lately, and I’m going to do something more intuitive. [What you see here is an abortive attempt at late-night writing from over a month ago. Now that Jeff has posted a note he sent to Marks, I'm going to let it go as is. The pictures are fun.]

Is there some special kind of information in an image of Mount Rushmore that would grab the attention of an extraterrestrial flying by? A bright patch is certainly noticeable, but I don’t think that qualifies as a special kind of information, or as much information of any kind. And as lichen grows on the sculpture, it darkens. (This video has before-and-after shots at 4:50.) If you want to know what really wows E.T., click on the image below.

Photo by Volkan Yuksel (cropped).

There may well be a “look here, look here” icon long after the faces have crumbled.

Am I playing a dirty trick? No, by showing you the big picture, I’m allowing you to see that the form of the sculpture does not stand out from the rest of the mountain. It could not have been otherwise. A sculptor subtracts from what is already present to arrive at the result. Even when the medium is marble, there are sometimes features that drive the composition (see the quotes of Michelangelo and Henry Moore in a past post). Gutzon Borglum could not simply imagine the form of the monument, and then pick a mountain arbitrarily. He had to study available mountain form-ations, and imagine what he could produce by removing modest amounts of material.

Am I trying to diminish the work of Borglum? Certainly not. For someone to envision a monument in the side of a mountain is amazing. My point is that much of the form-ation of the sculpture was already done. The in-form-ation by the sculptor was relatively fine detail, for the most part, and that is why the gross features do not stand out from the surrounding stone.

Of course, the ID creationists make E.T. get up close and personal. The point has been made a gazillion times that an extraterrestrial may be so unlike a person that faces mean nothing to it. What objectively stands out in a shot that is tighter, but not as tight as the IDCists want it to be, is the relatively flat surface surrounding the heads. The pile of rubble beneath the carving also draws attention to it. How ironic.

The IDCists always frame the shot around whatever they say contains some special kind of information, without accounting for how that framing happens. Put simply, why does E.T. zoom in on a relatively small part of Mount Rushmore, if it doesn’t stand out? To come at this another way, Marks expects us to compare the typical image of Mount Fuji, far in the distance, to the typical image of Mount Rushmore, which shows only the small part containing the sculpture. That is what prompted me to go looking for shots from different perspectives and different distances. [… “If you want any more, you can sing it yourself.”]

"Mountfujijapan" by Swollib

Special Added Bonus Feature: Creationist Persecution Fantasy

Thursday, September 25, 2014

Response to Denyse O’Leary at Salvo

I received a tip that my name had been “taken in vain” by Denyse O’Leary. Unfortunately, the context is one in which I am more doglike than godlike: “The Law of Conservation of Information.”

Dembski did not invent the underlying idea of conservation of information. Biologist Peter Medawar (1980s) and computer scientist Tom English (1996) advanced the view that information is not created from scratch but rather is redistributed from existing sources. Robert Marks II and his students at Baylor University in Texas have developed the idea in terms of search, and their approach has profound consequences for plausible ideas of how evolution occurs, especially when vast claims are made for WEASEL and other evolution computer programs. As we will see in Part II, they are smuggling in information in order to arrive at their target.
I’ve come to understand a fair amount of the psychology of creationists. But I remain mystified by their proclivity to hold forth on anything and everything that comes along. What I’ve learned from my errors is that I’m qualified to speak authoritatively on precious few matters. And even on those, I have to be exceedingly careful. Denyse has had her head handed to her various times at Uncommon Descent, when she’s ventured into the simplest of math. Is she unembarrassed, or undeterred by embarrassment? Similarly, when she apes the rhetoric of the likes of Dembski and Meyer, where does the unconscious lying to herself end, and the conscious lying to her readers begin?

Ms. O’Leary, my 1996 formulation of "search" was needlessly complicated. With simplification, search is clearly a process of sampling a set of alternatives (which Dembski and Marks refer to as the sample space). To my huge embarrassment, conservation of information turns out to be nothing but obfuscation of statistical independence — a concept that undergraduates encounter early in introductory courses on probability and statistics. There can be no conservation of information in random selection of a sample because there is no information whatsoever. It is absurd to speak of conserving what does not exist.

If samplers have no information about the samples they draw, then how do we account for the fact that sampler (search) A is more likely than sampler B to select a sample that includes at least one element of the target (to hit the target)? There is not the least mystery here. Samplers differ in their biases. That is the gist of why I was wrong to indicate in 1996 that information somehow resides in samplers, and why Dembski and Marks are wrong to do so today.
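The point is easy to see in a toy simulation of my own devising (the domain, target, and numbers here are all illustrative). Neither sampler below consults an objective function or knows anything about a target; they differ only in bias over the domain:

```python
import random

random.seed(1)
TARGET = set(range(10))  # the "target" subset of the domain {0, ..., 99}

def sampler_a(n):
    # Biased toward the low end of the domain.
    return random.choices(range(20), k=n)

def sampler_b(n):
    # Uniform over the whole domain.
    return random.choices(range(100), k=n)

def hit_rate(sampler, n=5, trials=20000):
    # Fraction of trials in which the sample includes a target element.
    hits = sum(any(x in TARGET for x in sampler(n)) for _ in range(trials))
    return hits / trials

# Sampler A "hits the target" far more often, with no information about
# anything: the difference is selection bias, nothing more.
print(hit_rate(sampler_a), hit_rate(sampler_b))
```

Sampler A happens to be biased toward the region containing the target, so it hits far more often; had the target sat elsewhere, the same bias would have hurt it.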

The following includes a technical correction of my own errors, but ends with exposition that should make sense to everyone who is able to follow you:

The errors of Dembski and Marks apparently derive from a misunderstanding of the "no free lunch" theorem for search. The following links to an interview in which Marks attempts to explain the theorem in layperson's terms, and provides an accessible discussion of how he goes awry:

P.S.—Note that much of the misunderstanding is attributable to misnaming. I know that Ms. O’Leary appreciates the powerful impact of language upon thought. If you refer to the process of sample selection as search, designate a particular subset of the sample space as the target, and say that the selection process hits the target when the sample includes an element of the subset, then you will have a very hard time thinking straight about sampling.

Wednesday, September 24, 2014

Encourage Winston Ewert to lift the five-year embargo on his dissertation

In the days before electronic dissemination of theses and dissertations, I heard of a trick to see if someone had paid attention to your work: insert a buck between the pages of the library copy, and check on it a year later. Obviously, communication among scholars has changed radically. But what remains the same is the hope that someone will actually delve into the full account of your scholarship — i.e., that the document amounts to more than an exercise that you had to complete in order to move on to other things.

Students rarely withhold their theses and dissertations from public view. It makes sense if you have developed a valuable trade secret, or if you reasonably believe that someone might steal your results. There has been no such sense in the one-year embargoes that students working with Professor Robert J. Marks II at Baylor have placed on their master's theses. But I never groused. And I waited patiently to see Winston Ewert's dissertation, Algorithmic Specified Complexity (August 2013). However, it turns out that he has opted for a five-year embargo.

This is exactly the opposite of what Winston should do. Please contact him to explain that shutting out the light is a bad move for someone who has chosen the path of creationism. He provides an email address at the Evolutionary Informatics Lab website.

Monday, August 25, 2014

The censorial anti-ID activist is… Bob O'Hara?

Those of you acquainted with Bob O’Hara have already busted a gut laughing. There is not, as best I can tell, a censorial bone in his body. He is the mildest of the pack nipping at the heels of ID creationists. As I recall, he even respects their wish not to be called creationists.

A year ago, ID-creationism advocate Casey Luskin alternately detailed and insinuated everything he ever wanted you to believe about Springer’s abandonment of the creationist-edited volume Biological Information: New Perspectives. And what he insinuated about Bob O’Hara was ugly.

[A]pparently [Matzke’s] post generated a lot of complaints to Springer from people who didn’t want the company to publish a book with articles sympathetic to ID. For example, one of Matzke’s Panda’s Thumb followers, statistician Bob O’Hara, reported that “I’ve been in contact with one of the editors at Springer, so they’re now certainly aware of the situation.” Within a day or two, Springer had removed its page for Biological Information: New Perspectives from its website.
Luskin cutely juxtaposes Bob’s comment with the removal of the book announcement, inviting you to read the worst into it. But that rhetorical trick isn’t strong enough for the political magazine Human Events, which brings us “Powerful Conservative Voices.” So the powerful Robert J. Marks II resorted, apparently, to embellishment in his article of last week, Biological Information: New Perspectives from Intelligent Design:
Despite the intelligent design content, the German publishing company Springer invited the organizers to publish papers from the conference. But, even though no one had yet seen the book, publicity at an atheistic leaning neo-Darwinist blog prompted an anti-ID activist to contact Springer upper management and claim Springer’s publishing of the book would ruin Springer’s reputation in science. So Springer reneged on its contract with the Editors at the last minute.
I now have Bob’s permission not only to invoke his name, but also to reveal how he bullied a senior editor at Springer US on February 27, 2012. (He sent me a copy of his note, three days later.) After identifying the book, he wrote:
This has the potential to be a controversial text (as the editors are all active in pushing Intelligent Design), so I'm wondering why it's being published as an engineering text, rather than biology: it would seem to be a better fit there.
Gee. No threat. No doomsaying. No claim to know anything about what he had not yet seen. I see a flat statement of fact, followed by a gentle suggestion that the book was misclassified. Look back at what Marks wrote, and consider the warped mind that would concoct such propaganda.

Did Springer drop the title because of outside pressure? I highly doubt it. As I explained in my last post, the book deal was shady from the get-go. The most that Bob O’Hara did was to shine light on it. Predictably, the Pharisees focus on the alleged breach of contract instead of their own dubious ethics.

[Edit. I just learned that Marks has deleted an erroneous erratum from the online copy of an article, leaving no indication that one of the two highlighted theorems is severely botched. The kicker is that Marks received, long before submitting the paper to the journal that published it, a clear explanation of the error. His correspondent CC'd me! See “The theorem that never was: Diversionary ‘erratum’ from Dembski and Marks.”]

Saturday, August 23, 2014

You’re making things up again, Robert J. Marks II

One of the “20 Most Influential Christian Scholars,” the distinguished professor who approved a master’s thesis that plagiarized his own publications, your favorite whited sepulcher and mine, Robert J. Marks II, is making things up again. This time it’s a tale of censorship of the creationist-edited volume Biological Information: New Perspectives, told through a conservative political outlet, Human Events. Marks embellishes and contradicts what Casey Luskin, an intelligent-design advocate at the Discovery Institute, reported a year ago. He would have you believe:

Despite the intelligent design content, the German publishing company Springer invited the organizers to publish papers from the conference. But, even though no one had yet seen the book, publicity at an atheistic leaning neo-Darwinist blog prompted an anti-ID activist to contact Springer upper management and claim Springer’s publishing of the book would ruin Springer’s reputation in science. So Springer reneged on its contract with the Editors at the last minute.
Luskin flatly contradicts the first statement. And I’ve had, since March 2, 2012, a copy of the email that the “anti-ID activist” sent to Springer. It is utterly devoid of what Marks attributes to it [see for yourself]. Here is what I think is significant: the author sent the note to a Ph.D. scientist-editor at Springer US. But Springer DE was handling the book, according to Luskin. [The editor did not reply.] It may well be that New York pushed the panic button in Heidelberg. In any case, we know for sure:

You’re making things up again, Robert

Marks points out that Springer reneged on the contract, but somehow forgets to mention that he knew from the outset that the deal was shady. According to Luskin, it was Dembski who proposed to an editor of a Springer engineering series on intelligent systems that Biological Information: New Perspectives be included. Dembski would think that he could talk a fast line to justify it. He thinks that about everything he does. But Marks’ field is intelligent systems. And so is mine. He knew just as well as I did that it was wrong to dump the book into that series. A big threat to the publisher’s reputation, I think, was that institutions buying all volumes, expecting them to be about intelligent systems as advertised, would scream loudly when they got creationism instead.

I can’t resist amplifying Marks’ first sentence.

A diverse group of [secretly invited] scientists [many of whom were not scientists] gathered at [but not under the auspices of] Cornell University in 2011 to discuss their research [not peer-reviewed] into the nature and origins of biological information [loosely interpreted].
My initial response to the proceedings of the enclave is here. I wish that I’d given more emphasis to the fact that, despite all of Marks’ bragging about the attendees, the organizers didn’t, well, organize them to review the papers of their peers. The symposium was pretty much a group-hug. There were many presentations by John Sanford, who is busy setting up simulations to show that genomes only deteriorate — and rapidly, at that. He’s confident that our species won’t survive to the end of the century. It’s all science, of course. But he does hope to persuade you that the End Time is at hand.

Marks and Luskin carry on about their contract. But they don’t seem terribly eager to admit that they consort with people who, if their madness did not align with established religions, would be locked up. At present, a guy I know well is heavily preoccupied with off-brand religion, and is in the protective custody of the state. He seems no crazier to me than Sanford. (Watch this if you think I’m exaggerating.) And the injustice of the difference in society’s treatment of him and its treatment of a YEC with an elaborate delusional system is weighing heavily on my mind.

Thursday, August 21, 2014

Calling all statisticians and probabilists

The following is a question I posted at the Mathematics Stack Exchange. Folks who see it here are likely to understand that my "Making Less of No Free Lunch" campaign is intended as much to hose down a creationist cloud of dust as to rain on the NFL parade. Please contribute a bit of your expertise to a worthy cause.

I hope to replace a proof of my own, in a paper explaining that the "no free lunch" theorems for optimization actually address sampling and statistics, with a reference to an existing result on sampling. The random selection process $$X_1, X_2, \dotsc, X_n$$ over the domain of the random objective function $F$ is statistically independent of the selected values, $$F(X_1), F(X_2), \dotsc, F(X_n),$$ despite the functional dependence $$X_i \equiv X(F(X_1), F(X_2), \dotsc, F(X_{i-1})),$$ $1 < i \leq n,$ where $X$ is statistically independent of $F.$ (See Section 0.1 here [or my last post] for more detail.)

To put this in terms of sampling, $F$ associates members of a statistical population with their unknown responses. The sampler $X$ processes data already in the sample to extend the selection. This typically biases the selection. But the selection process is statistically independent of the sequence of responses. That is the justification for referring to the latter as a sample.

This looks like textbook stuff to me: "Data processing biases the selection." But I am awash in irrelevant hits when I Google the terms you see here, and others like them.
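The flavor of the claim — data processing biases the selection, not the sample — can be checked exactly in a tiny instance. The sketch below is my own toy construction, under the added assumption that the objective function $F : \{0,1,2\} \to \{0,1\}$ has i.i.d. uniform values; it enumerates all eight functions:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

domain, codomain, n = (0, 1, 2), (0, 1), 2

def adaptive(observed):
    # Data processing: the second point selected depends on the first value.
    if not observed:
        return 0
    return 1 if observed[0] == 0 else 2

def fixed(observed):
    # Non-adaptive selection of the sequence (0, 1).
    return (0, 1)[len(observed)]

def distributions(sampler):
    # Exact joint accounting over all objective functions, each equally likely.
    sel_dist, samp_dist = Counter(), Counter()
    p = Fraction(1, len(codomain) ** len(domain))
    for f_vals in product(codomain, repeat=len(domain)):
        F = dict(zip(domain, f_vals))
        xs, ys = [], []
        for _ in range(n):
            x = sampler(tuple(ys))
            xs.append(x)
            ys.append(F[x])
        sel_dist[tuple(xs)] += p
        samp_dist[tuple(ys)] += p
    return sel_dist, samp_dist

sel_a, samp_a = distributions(adaptive)
sel_f, samp_f = distributions(fixed)

# Data processing biases the selection ...
assert sel_a != sel_f
# ... but leaves the distribution of the sample unchanged.
assert samp_a == samp_f
print(sel_a, samp_a)
```

The adaptive sampler splits its selections between $(0,1)$ and $(0,2)$, while the fixed sampler always selects $(0,1)$; yet both yield the uniform distribution over value sequences.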

Friday, August 15, 2014

Sampling Bias Is Not Information

[Edits 8/19/2014 and 8/20/2014: Although the proof of the so-called “no free lunch” theorem is logically very simple, it looked like a vomited hairball in the first version of this post. I’ve made it considerably more inviting. Note also that the online version of “Evaluation of Evolutionary and Genetic Optimizers: No Free Lunch” now has a preface.]

This corrects a post that I retracted, and promised to replace, more than a year ago. It took me only a few days to redo the math. But then I executed a fabulously ADDled drift into composition of a preface for my first paper on “no free lunch” in optimization, planning to use it also as a replacement post. Now I must admit that I am ensnarled in words, and that I don’t know when, or even if, I will break free. Consequently, I am dumping here some fragments that should make sense, though they say much less than I would like to get said.

For those of you who are more interested in ID creationism than in NFL, the highlight is that my references to “conservation of information,” elevated to “English’s Principle of Conservation of Information” by Dembski and Marks in “Conservation of Information in Search: Measuring the Cost of Success,” amount to nothing but failure to recognize that I was discussing statistical independence. Much of what I say in the subsection “Understanding the Misunderstanding of Information” applies not only to me, but also to Dembski and Marks. In fact, I recommend reading the relatively accessible “Bob Marks grossly misunderstands ‘no free lunch’” first.


The recent “no free lunch” theorems of Wolpert and Macready indicate the need to reassess empirical methods for evaluation of evolutionary and genetic optimizers. Their main theorem states, loosely, that the average performance of all optimizers is identical if the distribution of functions is average. [An “optimizer” selects a sample of the values of the objective function. Its “performance” is a statistic of the sample.] The present work generalizes the result to an uncountable set of distributions. The focus is upon the conservation of information as an optimizer evaluates points [statistical independence of the selection process and the selected values]. It is shown that the information an optimizer gains about unobserved values is ultimately due to its prior information of value distributions. [The paper mistakes selection bias for prior information of the objective function.] Inasmuch as information about one distribution is misinformation about another, there is no generally superior function optimizer. Empirical studies are best regarded as attempts to infer the prior information optimizers have about distributions [match selection biases to constrained problems] — i.e., to determine which tools are good for which tasks.

0. Sampling Bias Is Not Information

This preface (Sect. 0) addresses expository errors in the original paper (English, 1996), but stands as a self-contained report. Specific corrections and amplifications appear in the remainder of the text (Sects. 1–6).

Figure 1: Objective function $ \newcommand{\disteq}{\stackrel{\scriptscriptstyle{\mathcal{D}}}{=}} \newcommand{\Fseq}[2]{% {F_1^{#1}\hspace{-1pt}({#2)}} } \newcommand{\selection}{{X_1^n}} \newcommand{\sample}{{\Fseq{n}{X}}} \newcommand{\Fn}[1]{{\Fseq{n}{#1}}} \newcommand{\wn}{{w_1^n}} \newcommand{\xn}{{x_1^n}} \newcommand{\yn}{{y_1^n}} \newcommand{\zn}{{z_1^n}} \newcommand{\boldx}{\mathbf{x}} \newcommand{\boldw}{\mathbf{w}} \newcommand{\X}{\mathcal{X}} \newcommand{\Xdiamond}{\mathcal{X}_\diamond} \newcommand{\Y}{\mathcal{Y}} \newcommand{\Ydiamond}{\mathcal{Y}_\diamond} F$ and quality measure $\psi$ comprise the optimization problem. An optimizer is a sampler $X,$ along with a reduction $r$ of the selection $\xn$ and the sample $\yn$ to outputs $\tilde{x}_1^m = x_{j_1}, \dotsc, x_{j_m}$ and $\tilde{y}_1^m = y_{j_1}, \dotsc, y_{j_m},$ with $j_1^m$ strictly increasing. NFL analyses make four assumptions: the selection is non-repeating, the reduction is identical for all optimizers under consideration, the output $\tilde{y}_1^m$ depends only on the sample, and quality depends only on $\tilde{y}_1^m.$ Accordingly, write $\tilde{y}_1^m = r(\yn)$ and $\phi_n = \psi(\tilde{y}_1^m).$ Then the quality $\psi(r(\yn))$ of the optimizer’s output is a statistic $\phi = \psi \circ r$ of the sample it generates.

Black-box optimization may be decomposed into generation of a sample of the values of the objective function, and use of the sample to produce a result. “No free lunch” (NFL) analyses assume explicitly that sampling is without repetition, and define quality of the optimization result to be a statistic of the sample (Wolpert and Macready, 1995, 1997). Fig. 1 unpacks the tacit assumptions, the most important of which is that the optimizers under consideration differ only in selection of samples. Fig. 2 shows how postprocessing of the sample, assumed to be identical for all optimizers, is embedded in the statistic. Although the statistic combines part of the optimizer with part of the optimization problem, the NFL literature refers to it as the performance measure, and to samplers as optimization (or search) algorithms.

Figure 2: Random sampler $X$ is statistically independent of random objective function $F,$ and the selection $X(\lambda) = x_1, \dotsc, X(y_1, \dotsc, y_{n-1}) = x_n$ is statistically independent of the sample $F(x_1) = y_1,$ $\dotsc, F(x_n) = y_n.$ The statistic $\phi = \psi \circ r$ is the composition of the quality measure $\psi$ on optimizer outputs, given in the optimization problem, and the reduction $r$ of samples to outputs, assumed to be identical for all optimizers. The selection and the sample are abbreviated $\selection = \xn$ and $\sample = \yn,$ respectively.
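To make the decomposition concrete, here is a minimal sketch (the particular reduction and quality measure are illustrative choices of mine, not from the paper) in which the reduction keeps the running best values of the sample and the quality measure takes the final output:

```python
def r(sample):
    # Reduction: keep the running maxima of the sample (best-so-far values),
    # a subsequence with strictly increasing indices, as Fig. 1 requires.
    best, out = float("-inf"), []
    for y in sample:
        if y > best:
            best = y
            out.append(y)
    return out

def psi(outputs):
    # Quality measure: the value of the final output.
    return outputs[-1]

def phi(sample):
    # The "performance measure" is the statistic phi = psi o r.
    return psi(r(sample))

print(phi([3, 1, 4, 1, 5, 9, 2]))
```

Note that the reduction consumes only the value sequence $\yn$, matching the assumption that the output $\tilde{y}_1^m$ depends only on the sample.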

The better-known NFL theorems, including that of Sect. 3.2 (which appears also in Streeter, 2003, and Igel and Toussaint, 2005), are reinventions of basic results in probability (Häggström, 2007). They address sampling and statistics. This claim is justified by showing that the selection process \[ X_i \equiv X(F(X_1), \dotsc, F(X_{i-1})), \] $1 \leq i \leq n,$ is statistically independent of the selected values \[ \sample \equiv F(X_1), \dotsc, F(X_n), \] despite its data processing. Sect. 3.1 supplies a proof for the special case of deterministic selection, with the domain and codomain of the random objective function restricted to finite sets. Here the result is generalized to random selection, with repetition allowed, and with the domain and codomain possibly infinite, though countable. Furthermore, the selection process is equipped to terminate. Although this would seem to complicate matters, the proof is greatly simplified.

The selection process cannot be regarded as having or gaining information of unselected values of the objective function when it is statistically independent of the selected values. Data processing is a source of bias in the selection. The paper fails, in exposition, to recognize statistical independence for what it is, and refers instead to “conservation of information.” (The formal results are correct.) It furthermore equates the propensity of an optimizer to perform better for some distributions of the random objective function and worse for others with prior information that somehow inheres in the optimizer. The notion that one entity simply has, rather than acquires, varying degrees of prior information and misinformation of others is incoherent.

The following subsection formalizes the system sketched in Fig. 2, proves that the selection is indeed statistically independent of the selected values, and goes on to demonstrate that a general NFL theorem, so called, follows almost immediately. This exercise leaves very little room for contention that the NFL theorems address something other than sampling. The preface concludes with a subsection that briefly deconstructs the paper’s erroneous claims about information.

0.1 Formal Results on Sampling

Sampling processes do not necessarily terminate, and it is convenient to treat them all as infinite. The distinguished symbol $\diamond$ is interpreted as the sample terminator. Let countable sets $\Xdiamond$ = $\X \cup \{\diamond\}$ and $\Ydiamond$ = $\Y \cup \{\diamond\},$ where sets $\X$ and $\Y$ exclude $\diamond.$ The random objective function $F$ maps $\Xdiamond$ to $\Ydiamond,$ with $F(x) = \diamond$ if and only if $x = \diamond.$

Finite sequences of the forms $\alpha_1, \dotsc,$ $\alpha_n$ and $\gamma(\alpha_1), \dotsc,$ $\gamma(\alpha_n)$ are abbreviated $\alpha_1^n$ and $\gamma_1^n(\alpha),$ respectively. A sampler is a random function, statistically independent of $F,$ from the set of all finite sequences on $\Ydiamond$ to $\Xdiamond.$ A selection is a random vector $\selection$ with \[ X_i \equiv X(F(X_1), \dotsc, F(X_{i-1})) \] $1 \leq i \leq n,$ where $X$ is a sampler. The selection is non-repeating if $\selection \in \pi(\X)$ surely, where $\pi(\X)$ is the set of all non-repeating sequences on $\X.$

A statistic is a function with the set of all finite sequences on $\Ydiamond$ as its domain. Assume, without loss of generality, that the codomain of a statistic is a countable superset of its range, which is countable like the domain.

Probability measure is unproblematic when considering the selection $\selection$ and the corresponding sequence $\sample,$ because both take values in countable sets. The following lemma establishes that the selection is statistically independent of the selected values, i.e., that it is correct to refer to $\sample$ as a sample of the values $\{F(x) \mid x \in \Xdiamond\}$ of the objective function. The data processing in extensions of the selection, highlighted in the proof, is a potential source of selection bias, not information about the objective function.

Lemma. Selection $X_1, \dotsc, X_n$ is statistically independent of $F(X_1), \dotsc, F(X_n).$

Proof. Let $\xn$ and $\yn$ be nonempty sequences on $\Xdiamond$ and $\Ydiamond,$ respectively, such that $P( \Fseq{n}{x} = \yn) > 0.$ Then \begin{align*} P( \selection = \xn, \sample = \yn) &= P(\selection = \xn, \Fseq{n}{x} = \yn) \\ &= P(\selection = \xn \mid \Fseq{n}{x} = \yn) \cdot P(\Fseq{n}{x} = \yn), \end{align*} and \begin{align*} &P(\selection = \xn \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(\Fseq{0}{x}) = x_1, \dotsc, X(\Fseq{n-1}{x}) = x_n \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(y_1^0) = x_1, \dotsc, X(y_1^{n-1}) = x_n \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(y_1^0) = x_1, \dotsc, X(y_1^{n-1}) = x_n) \end{align*} because sampler $X$ is statistically independent of objective function $F.$ 

Corollary 1. Selection $X_1, \dotsc, X_n$ is statistically independent of $\phi(F(X_1), \dotsc, F(X_n))$ for all statistics $\phi.$

The following NFL theorem, so called, uses the symbol $\disteq$ to denote equality in probability distribution. It says, loosely, that the distribution of a statistic is the same for all non-repeating selections if and only if it is the same for all non-repeating sequences of points.

Theorem. Let $x_1, \dotsc, x_n$ be a nonempty, non-repeating sequence on $\X,$ and let $\phi$ be a statistic. Then \begin{equation} \phi(F(X_1), \dotsc, F(X_n)) \disteq \phi(F(x_1), \dotsc, F(x_n)) \label{eq:selections} \end{equation} for all non-repeating selections $X_1, \dotsc, X_n$ if and only if \begin{equation} \phi(F(w_1), \dotsc, F(w_n)) \disteq \phi(F(x_1), \dotsc, F(x_n)) \label{eq:sequences} \end{equation} for all non-repeating sequences $w_1, \dotsc, w_n$ on $\X.$

Proof. ($\Rightarrow$) Suppose that (1) holds for all non-repeating selections $\selection,$ and let $\wn$ be a non-repeating sequence on $\X.$ There exists a sampler $X$ with constant selection $\selection = \wn,$ and thus (2) follows.
($\Leftarrow$) Suppose that (2) holds for all non-repeating sequences $\wn$ on $\X,$ and let $\selection$ be a non-repeating selection. A condition stronger than (1) follows. For each and every realization $\selection = \wn,$ non-repeating on $\X$ by definition, \begin{equation*} \phi(\sample) \disteq \phi(\Fseq{n}{w}) \disteq \phi(\Fseq{n}{x}). \end{equation*} The first equality holds by Corollary 1, and the second by assumption. 

Corollary 2. Let $x_1, \dotsc, x_n$ be a nonempty, non-repeating sequence on $\X.$ Then \begin{equation} F(X_1), \dotsc, F(X_n) \disteq F(x_1), \dotsc, F(x_n) \label{eq:distselections} \end{equation} for all non-repeating selections $X_1, \dotsc, X_n$ if and only if \begin{equation} F(w_1), \dotsc, F(w_n) \disteq F(x_1), \dotsc, F(x_n) \label{eq:distsequences} \end{equation} for all non-repeating sequences $w_1, \dotsc, w_n$ on $\X.$

Proof. Set the statistic $\phi$ in the theorem to the identity function. 

The corollary is essentially the NFL theorem of English (1996, Sect. 3.2). When (4) holds for all $n,$ the random values $\{F(x) \mid x \in \X\}$ of the objective function are said to be exchangeable (Häggström, 2007).
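Exchangeability and the corollary's conclusion can be illustrated in a toy case (hypothetical Python): with i.i.d. objective values, the empirical distribution of the values along any two non-repeating sequences of points agrees.

```python
import random
from collections import Counter

random.seed(1)
domain = [0, 1, 2, 3]
TRIALS = 100_000

def values_along(points):
    # Fresh random objective: i.i.d. values are exchangeable.
    F = {x: random.randint(0, 1) for x in domain}
    return tuple(F[x] for x in points)

# Two different non-repeating sequences of points on the domain.
d1 = Counter(values_along((0, 1, 2)) for _ in range(TRIALS))
d2 = Counter(values_along((3, 2, 0)) for _ in range(TRIALS))
for ys in set(d1) | set(d2):
    # The empirical value distributions agree, as (4) asserts.
    assert abs(d1[ys] - d2[ys]) / TRIALS < 0.01
```

The choice of i.i.d. values is only the simplest route to exchangeability; any exchangeable joint distribution of $\{F(x) \mid x \in \X\}$ would do.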

0.2 Understanding the Misunderstanding of Information

The root error is commitment to the belief that information is the cause of performance in black-box optimization (search). The NFL theorems arrived at a time when researchers commonly claimed that evolutionary optimizers gained information about the fitness landscape, and adapted themselves dynamically to improve performance. Wolpert and Macready (1995) observe that superior performance on a subset of functions is offset precisely by inferior performance on the complementary subset. In online discussion of their paper, Bill Spears referred to this as conservation of performance. My paper suggests that conservation of information accounts for conservation of performance.

The lemma of Sect. 3.1, “Conservation of Information,” expresses the absolute uninformedness of the sample selection process. The performance of a black-box optimizer has nothing whatsoever to do with its information of the objective function. But the paper recognizes only that information gain is impossible, and claims incoherently that prior information resides in the optimizer itself. Conservation of this chimerical information supposedly accounts for conservation of performance in optimization. Here are the salient points of illogic:

  1. Information causes performance.
  2. The optimizer gains no exploitable information by observation, so it must be prior information that causes performance.
  3. There is no input by which the optimizer might gain prior information, so it must be that prior information inheres in the optimizer.
  4. Prior information of one objective function is prior misinformation of another. Conservation of performance is due to conservation of information.

It should have been obvious that prior information is possessed only after it is acquired. The error is due in part to a mangled, literalistic half-reading of (Wolpert and Macready, 1995, p. 8, emphasis added):

The NFL theorem illustrates that even if we know something about [the objective function] … but don’t incorporate that knowledge into [the sampler] then we have no assurances that [the sampler] will be effective; we are simply relying on a fortuitous matching between [the objective function] and [the sampler].
The present work (Sect. 6) concludes that:
The tool literally carries information about the task. Furthermore, optimizers are literally tools — an algorithm implemented by a computing device is a physical entity. In empirical study of optimizers, the objective is to determine the task from the information in the tool.
This reveals confusion of one type of information with another. When a toolmaker imparts form to matter, the resulting tool is in-formed to suit a task. But such form is not prior information. Having been formed to perform is different from having registered a signal relevant to the task. An optimization practitioner may gain information of a problem by observation, and then form a sampler to serve as a proxy in solving it. Although the sampler is informed to act as the practitioner would, it is itself uninformed of the problem to which it is applied, and thus cannot justify its own actions. The inherent form that accounts for its performance is sampling bias.

Unjustified application of a biased sampler to an optimization problem is merely biased sampling by proxy. The NFL theorems do not speak to this fundamental point. They specify conditions in which all of the samplers under consideration are equivalent in overall performance, or are nearly so. Ascertaining that none of these theorems applies to a real-world circumstance does not justify a bias, but instead suggests that justification may be possible. There is never a “free lunch” for the justifier.


T. M. English. Evaluation of evolutionary and genetic optimizers: No free lunch. In L. J. Fogel, P. J. Angeline, and T. Bäck, editors, Evolutionary Programming V: Proceedings of the Fifth Annual Conference on Evolutionary Programming, pages 163–169. MIT Press, Cambridge, MA, 1996.

O. Häggström. Intelligent design and the NFL theorems. Biology & Philosophy, 22(2):217–230, 2007.

C. Igel and M. Toussaint. A no-free-lunch theorem for non-uniform distributions of target functions. Journal of Mathematical Modelling and Algorithms, 3(4):313–322, 2005.

M. J. Streeter. Two broad classes of functions for which a no free lunch result does not hold. In Genetic and Evolutionary Computation – GECCO 2003, pages 1418–1430. Springer, 2003.

D. H. Wolpert and W. G. Macready. No free lunch theorems for search. Technical report, SFI-TR-95-02-010, Santa Fe Institute, 1995.

D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

Tuesday, August 12, 2014


My last post showed that a claim of Winston Ewert, William Dembski, and Robert J. Marks II to groundbreaking research does not withstand a minute of Googling. In the comments section, I made some admissions about my own work on "no free lunch" theorems for optimization. I want to highlight them here.

If I had a face-to-face chat with Winston, I'd tell him that I really, really, really regret having gotten into the "no free lunch" thing. Rather than play to my strengths — I'm a crackerjack computer scientist — I wasted a lot of myself on obfuscated reinvention of a probabilist's wheel. What's dreadfully embarrassing is that it took me many years to realize how bad my work was. I had to acknowledge that Häggström [cited below] was correct in his criticism of NFL. And Winston should acknowledge that he does not have a theoretical bone in his body.
Quoting what I said about myself, Anonymous asks:
Would you be willing to elaborate on this? It might help others avoid the types of mistakes you want them to avoid, if you could explain specifically: 1) in what ways your work was poor, or a "reinvented wheel", 2) why you did not notice your work was bad, and 3) what work/areas should be avoided (for fear of reinvention). Thanks.
The following is my response, lightly edited.

The first two questions are entirely legitimate. The last borders on the fallacy of the complex question. I'm more than willing to elaborate. But I'm going to limit my response, because I'm struggling to get a couple other things written. [Obviously, I've drifted off-task.]

1) I have toiled and toiled over a preface [now available] for my first NFL paper (1996), discovering errors in my explanations of errors, and compounding my embarrassment. Dog willing, and the Creek don't rise, I will eventually post "Sampling Bias Is Not Information" here. You can arrange for Google Scholar to alert you when the title appears, if you don't want to follow my blog. For now, I'll say only that what I call "conservation of information" is nothing but statistical independence of the sampling ("search" or "optimization") process and the sample of random values of the objective function. I describe statistical independence in the introduction to the paper, but fail to identify it. That is indeed bad.

2) For one thing, I had given my confusion a fancy name. But I would rather focus here on my runaway arrogance. I independently proved the main NFL theorem in 1994. Questioning a student during his thesis defense, I got him to state clearly that "future work" was supposed to lead to a generally superior optimizer. After a centisecond of thought, I sketched the simple argument that Häggström gives in "Intelligent Design and the NFL Theorems" (2007). I considered going for a publication, but, as obvious as the result was to me, I had to believe that it was already in the optimization literature. Twenty years ago, a lit review in an unfamiliar field was a big undertaking. And I had more than enough work to do at the time.

When Wolpert and Macready disseminated "No Free Lunch Theorems for Search" through GA Digest (early 1995), stirring up a big controversy, I concluded that I was remarkably clever, rather than that the whole affair was silly. The accepted version of my first NFL paper included my simple proof, but a reviewer's suggestion led me to see exchangeability (Häggström provides the term) as the "NFL condition." Rather than save the result for another paper, I dumped it into Sect. 3.2 about the time that camera-ready copy was due. The upshot is that I obsoleted what is known as the NFL theorem, 16 months before it was published. (I sent a copy of my paper to Bill Macready, who responded, "Nice work, Tom.")

As the reputation of NFL grew, so did my arrogance. I forgot what I had suspected in the beginning, namely that something so simple was probably not a novel discovery. The more work I did with NFL, the more it became "my thing." In 2000 and 2001, I gave tutorials on the topic at conferences. I put NFL in the cover letters of my applications for academic positions. When Dembski came along with his book No Free Lunch (2002), I was able to respond with "authority." And the number of citations of Wolpert and Macready (presently 3823) continued to go up and up and up. "Fifty Million Frenchmen Can't Be Wrong," you know, about the importance of NFL.

3) Do not presume to contribute to a field that you have not bothered to study in depth.