Thursday, August 21, 2014

Calling all statisticians and probabilists

The following is a question I posted at the Mathematics Stack Exchange. Folks who see it here are likely to understand that my "Making Less of No Free Lunch" campaign is intended as much to hose down a creationist cloud of dust as to rain on the NFL parade. Please contribute a bit of your expertise to a worthy cause.


I hope to replace a proof of my own, in a paper explaining that the "no free lunch" theorems for optimization actually address sampling and statistics, with a reference to an existing result on sampling. The random selection process $$X_1, X_2, \dotsc, X_n$$ over the domain of the random objective function $F$ is statistically independent of the selected values, $$F(X_1), F(X_2), \dotsc, F(X_n),$$ despite the functional dependence $$X_i \equiv X(F(X_1), F(X_2), \dotsc, F(X_{i-1})),$$ $1 < i \leq n,$ where $X$ is statistically independent of $F.$ (See Section 0.1 here [or my last post] for more detail.)

To put this in terms of sampling, $F$ associates members of a statistical population with their unknown responses. The sampler $X$ processes data already in the sample to extend the selection. This typically biases the selection. But the selection process is statistically independent of the sequence of responses. That is the justification for referring to the latter as a sample.

This looks like textbook stuff to me: "Data processing biases the selection." But I am awash in irrelevant hits when I Google the terms you see here, and others like them.
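In case a demonstration helps, here is a quick Python sketch of the bias itself. The population, the responses, and the data-processing rule are all toy choices of mine, not part of the question: a sampler that processes responses already in hand ends up selecting members with markedly better responses than uniform sampling does.

    # A sampler that processes the responses already in the sample tends
    # to select members with better (here, lower) responses than uniform
    # sampling does. All concrete choices below are illustrative.
    import random
    random.seed(0)

    POP = list(range(100))
    f = lambda x: abs(x - 73) / 100          # one fixed response function

    def adaptive_selection(n=10):
        """Extend the selection by probing next to the best member
        selected so far -- a simple data-processing rule."""
        xs = [random.choice(POP)]
        while len(xs) < n:
            best = min(xs, key=f)
            x = (best + random.choice((-1, 1))) % len(POP)
            if x in xs:                      # keep the selection non-repeating
                x = random.choice([z for z in POP if z not in xs])
            xs.append(x)
        return xs

    adapt = [x for _ in range(2000) for x in adaptive_selection()]
    unif = [random.choice(POP) for _ in range(len(adapt))]
    print(sum(map(f, adapt)) / len(adapt))   # noticeably smaller than ...
    print(sum(map(f, unif)) / len(unif))     # ... this, about 0.305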

Friday, August 15, 2014

Sampling Bias Is Not Information

[Edits 8/19/2014 and 8/20/2014: Although the proof of the so-called “no free lunch” theorem is logically very simple, it looked like a vomited hairball in the first version of this post. I’ve made it considerably more inviting. Note also that the online version of “Evaluation of Evolutionary and Genetic Optimizers: No Free Lunch” now has a preface.]

This corrects a post that I retracted, and promised to replace, more than a year ago. It took me only a few days to redo the math. But then I executed a fabulously ADDled drift into composition of a preface for my first paper on “no free lunch” in optimization, planning to use it also as a replacement post. Now I must admit that I am ensnarled in words, and that I don’t know when, or even if, I will break free. Consequently, I am dumping here some fragments that should make sense, though they say much less than I would like to get said.

For those of you who are more interested in ID creationism than in NFL, the highlight is that my references to “conservation of information,” elevated to “English’s Principle of Conservation of Information” by Dembski and Marks in “Conservation of Information in Search: Measuring the Cost of Success,” amount to nothing but failure to recognize that I was discussing statistical independence. Much of what I say in the subsection “Understanding the Misunderstanding of Information” applies not only to me, but also to Dembski and Marks. In fact, I recommend reading the relatively accessible “Bob Marks grossly misunderstands ‘no free lunch’” first.



Abstract
The recent “no free lunch” theorems of Wolpert and Macready indicate the need to reassess empirical methods for evaluation of evolutionary and genetic optimizers. Their main theorem states, loosely, that the average performance of all optimizers is identical if the distribution of functions is uniform. [An “optimizer” selects a sample of the values of the objective function. Its “performance” is a statistic of the sample.] The present work generalizes the result to an uncountable set of distributions. The focus is upon the conservation of information as an optimizer evaluates points [statistical independence of the selection process and the selected values]. It is shown that the information an optimizer gains about unobserved values is ultimately due to its prior information of value distributions. [The paper mistakes selection bias for prior information of the objective function.] Inasmuch as information about one distribution is misinformation about another, there is no generally superior function optimizer. Empirical studies are best regarded as attempts to infer the prior information optimizers have about distributions [match selection biases to constrained problems] — i.e., to determine which tools are good for which tasks.

0. Sampling Bias Is Not Information

This preface (Sect. 0) addresses expository errors in the original paper (English, 1996), but stands as a self-contained report. Specific corrections and amplifications appear in the remainder of the text (Sects. 1–6).


Figure 1: Objective function $ \newcommand{\disteq}{\stackrel{\scriptscriptstyle{\mathcal{D}}}{=}} \newcommand{\Fseq}[2]{{F_1^{#1}\hspace{-1pt}(#2)}} \newcommand{\selection}{{X_1^n}} \newcommand{\sample}{{\Fseq{n}{X}}} \newcommand{\Fn}[1]{{\Fseq{n}{#1}}} \newcommand{\wn}{{w_1^n}} \newcommand{\xn}{{x_1^n}} \newcommand{\yn}{{y_1^n}} \newcommand{\zn}{{z_1^n}} \newcommand{\boldx}{\mathbf{x}} \newcommand{\boldw}{\mathbf{w}} \newcommand{\X}{\mathcal{X}} \newcommand{\Xdiamond}{\mathcal{X}_\diamond} \newcommand{\Y}{\mathcal{Y}} \newcommand{\Ydiamond}{\mathcal{Y}_\diamond} F$ and quality measure $\psi$ comprise the optimization problem. An optimizer is a sampler $X,$ along with a reduction $r$ of the selection $\xn$ and the sample $\yn$ to outputs $\tilde{x}_1^m = x_{j_1}, \dotsc, x_{j_m}$ and $\tilde{y}_1^m = y_{j_1}, \dotsc, y_{j_m},$ with $j_1^m$ strictly increasing. NFL analyses make four assumptions: the selection is non-repeating, the reduction is identical for all optimizers under consideration, the output $\tilde{y}_1^m$ depends only on the sample, and quality depends only on $\tilde{y}_1^m.$ Accordingly, write $\tilde{y}_1^m = r(\yn)$ and $\phi_n = \psi(\tilde{y}_1^m).$ Then the quality $\psi(r(\yn))$ of the optimizer’s output is a statistic $\phi = \psi \circ r$ of the sample it generates.


Black-box optimization may be decomposed into generation of a sample of the values of the objective function, and use of the sample to produce a result. “No free lunch” (NFL) analyses assume explicitly that sampling is without repetition, and define quality of the optimization result to be a statistic of the sample (Wolpert and Macready, 1995, 1997). Fig. 1 unpacks the tacit assumptions, the most important of which is that the optimizers under consideration differ only in selection of samples. Fig. 2 shows how postprocessing of the sample, assumed to be identical for all optimizers, is embedded in the statistic. Although the statistic combines part of the optimizer with part of the optimization problem, the NFL literature refers to it as the performance measure, and to samplers as optimization (or search) algorithms.
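The decomposition is compact enough to render in code. In the following Python sketch, every concrete choice (the domain, the greedy sampler, the best-of-sample reduction, the identity quality measure) is mine, for illustration only; the point is the shape of the system: an optimizer is a sampler plus a statistic $\phi = \psi \circ r$ of the sample it generates.

    # The system of Figs. 1 and 2: sampler, reduction, quality measure.
    # All concrete choices here are illustrative.
    import random

    def run(sampler, f, n):
        """Generate the selection x_1..x_n and the sample y_1..y_n. The
        sampler sees only values already in the sample (it also receives
        the selection so far, which a deterministic sampler could in any
        case recompute from those values)."""
        xs, ys = [], []
        for _ in range(n):
            x = sampler(tuple(ys), tuple(xs))   # data processing
            xs.append(x)
            ys.append(f(x))                     # black-box evaluation
        return xs, ys

    def r(ys):
        """Reduction of samples to outputs, assumed identical for all
        optimizers under consideration (here: the best value found)."""
        return max(ys)

    def psi(output):
        """Quality measure on outputs, given in the problem (here: identity)."""
        return output

    phi = lambda ys: psi(r(ys))         # quality is a statistic of the sample

    # One possible sampler: greedy, non-repeating, over the domain 0..9.
    def greedy(ys, xs):
        unseen = [x for x in range(10) if x not in xs]
        if not ys:
            return random.choice(unseen)
        best = xs[max(range(len(ys)), key=ys.__getitem__)]
        near = [x for x in unseen if abs(x - best) == 1]
        return random.choice(near or unseen)

    xs, ys = run(greedy, lambda x: (7 * x) % 10 / 10, n=5)
    print(xs, ys, phi(ys))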


Figure 2: Random sampler $X$ is statistically independent of random objective function $F,$ and the selection $X(\lambda) = x_1, \dotsc, X(y_1, \dotsc, y_{n-1}) = x_n$ is statistically independent of the sample $F(x_1) = y_1, \dotsc, F(x_n) = y_n.$ The statistic $\phi = \psi \circ r$ is the composition of the quality measure $\psi$ on optimizer outputs, given in the optimization problem, and the reduction $r$ of samples to outputs, assumed to be identical for all optimizers. The selection and the sample are abbreviated $\selection = \xn$ and $\sample = \yn,$ respectively.


The better-known NFL theorems, including that of Sect. 3.2 (which appears also in Streeter, 2003, and Igel and Toussaint, 2005), are reinventions of basic results in probability (Häggström, 2007). They address sampling and statistics. This claim is justified by showing that the selection process \[ X_i \equiv X(F(X_1), \dotsc, F(X_{i-1})), \] $1 \leq i \leq n,$ is statistically independent of the selected values \[ \sample \equiv F(X_1), \dotsc, F(X_n), \] despite its data processing. Sect. 3.1 supplies a proof for the special case of deterministic selection, with the domain and codomain of the random objective function restricted to finite sets. Here the result is generalized to random selection, with repetition allowed, and with the domain and codomain possibly infinite, though countable. Furthermore, the selection process is equipped to terminate. Although this would seem to complicate matters, the proof is greatly simplified.

The selection process cannot be regarded as having or gaining information of unselected values of the objective function when it is statistically independent of the selected values. Data processing is a source of bias in the selection. The paper fails, in exposition, to recognize statistical independence for what it is, and refers instead to “conservation of information.” (The formal results are correct.) It furthermore equates the propensity of an optimizer to perform better for some distributions of the random objective function and worse for others with prior information that somehow inheres in the optimizer. The notion that one entity simply has, rather than acquires, varying degrees of prior information and misinformation of others is incoherent.

The following subsection formalizes the system sketched in Fig. 2, proves that the selection is indeed statistically independent of the selected values, and goes on to demonstrate that a general NFL theorem, so called, follows almost immediately. This exercise leaves very little room for contention that the NFL theorems address something other than sampling. The preface concludes with a subsection that briefly deconstructs the paper’s erroneous claims about information.


0.1 Formal Results on Sampling

Sampling processes do not necessarily terminate, and it is convenient to treat them all as infinite. The distinguished symbol $\diamond$ is interpreted as the sample terminator. Let countable sets $\Xdiamond = \X \cup \{\diamond\}$ and $\Ydiamond = \Y \cup \{\diamond\},$ where sets $\X$ and $\Y$ exclude $\diamond.$ The random objective function $F$ maps $\Xdiamond$ to $\Ydiamond,$ with $F(x) = \diamond$ if and only if $x = \diamond.$

Finite sequences of the forms $\alpha_1, \dotsc,$ $\alpha_n$ and $\gamma(\alpha_1), \dotsc,$ $\gamma(\alpha_n)$ are abbreviated $\alpha_1^n$ and $\gamma_1^n(\alpha),$ respectively. A sampler is a random function, statistically independent of $F,$ from the set of all finite sequences on $\Ydiamond$ to $\Xdiamond.$ A selection is a random vector $\selection$ with \[ X_i \equiv X(F(X_1), \dotsc, F(X_{i-1})), \] $1 \leq i \leq n,$ where $X$ is a sampler. The selection is non-repeating if $\selection \in \pi(\X)$ surely, where $\pi(\X)$ is the set of all non-repeating sequences on $\X.$

A statistic is a function with the set of all finite sequences on $\Ydiamond$ as its domain. Assume, without loss of generality, that the codomain of a statistic is a countable superset of its range, which is countable like the domain.

Probability measure is unproblematic when considering the selection $\selection$ and the corresponding sequence $\sample,$ because both take values in countable sets. The following lemma establishes that the selection is statistically independent of the selected values, i.e., that it is correct to refer to $\sample$ as a sample of the values $\{F(x) \mid x \in \Xdiamond\}$ of the objective function. The data processing in extensions of the selection, highlighted in the proof, is a potential source of selection bias, not information about the objective function.

Lemma. Selection $X_1, \dotsc, X_n$ is statistically independent of $F(X_1), \dotsc, F(X_n).$

Proof. Let $\xn$ and $\yn$ be nonempty sequences on $\Xdiamond$ and $\Ydiamond,$ respectively, such that $P( \Fseq{n}{x} = \yn) > 0.$ Then \begin{align*} P( \selection = \xn, \sample = \yn) &= P(\selection = \xn, \Fseq{n}{x} = \yn) \\ &= P(\selection = \xn \mid \Fseq{n}{x} = \yn) \cdot P(\Fseq{n}{x} = \yn), \end{align*} and \begin{align*} &P(\selection = \xn \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(\Fseq{0}{x}) = x_1, \dotsc, X(\Fseq{n-1}{x}) = x_n \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(y_1^0) = x_1, \dotsc, X(y_1^{n-1}) = x_n \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(y_1^0) = x_1, \dotsc, X(y_1^{n-1}) = x_n) \end{align*} because sampler $X$ is statistically independent of objective function $F.$ ☐

Corollary 1. Selection $X_1, \dotsc, X_n$ is statistically independent of $\phi(F(X_1), \dotsc, F(X_n))$ for all statistics $\phi.$

The following NFL theorem, so called, uses the symbol $\disteq$ to denote equality in probability distribution. It says, loosely, that the distribution of a statistic depends on the choice of sampler if and only if it depends on the choice of a fixed, non-repeating sequence of points.

Theorem. Let $x_1, \dotsc, x_n$ be a nonempty, non-repeating sequence on $\X,$ and let $\phi$ be a statistic. Then \begin{equation} \phi(F(X_1), \dotsc, F(X_n)) \disteq \phi(F(x_1), \dotsc, F(x_n)) \label{eq:selections} \end{equation} for all non-repeating selections $X_1, \dotsc, X_n$ if and only if \begin{equation} \phi(F(w_1), \dotsc, F(w_n)) \disteq \phi(F(x_1), \dotsc, F(x_n)) \label{eq:sequences} \end{equation} for all non-repeating sequences $w_1, \dotsc, w_n$ on $\X.$

Proof.
($\Rightarrow$) Suppose that (1) holds for all non-repeating selections $\selection,$ and let $\wn$ be a non-repeating sequence on $\X.$ There exists sampler $X$ with constant selection $\selection = \wn,$ and thus (2) follows.
($\Leftarrow$) Suppose that (2) holds for all non-repeating sequences $\wn$ on $\X,$ and let $\selection$ be a non-repeating selection. A condition stronger than (1) follows. For each and every realization $\selection = \wn,$ non-repeating on $\X$ by definition, \begin{equation*} \phi(\sample) \disteq \phi(\Fseq{n}{w}) \disteq \phi(\Fseq{n}{x}). \end{equation*} The first equality holds by Corollary 1, and the second by assumption. ☐

Corollary 2. Let $x_1, \dotsc, x_n$ be a nonempty, non-repeating sequence on $\X.$ Then \begin{equation} F(X_1), \dotsc, F(X_n) \disteq F(x_1), \dotsc, F(x_n) \label{eq:distselections} \end{equation} for all non-repeating selections $X_1, \dotsc, X_n$ if and only if \begin{equation} F(w_1), \dotsc, F(w_n) \disteq F(x_1), \dotsc, F(x_n) \label{eq:distsequences} \end{equation} for all non-repeating sequences $w_1, \dotsc, w_n$ on $\X.$

Proof. Set the statistic $\phi$ in the theorem to the identity function. ☐

The corollary is essentially the NFL theorem of English (1996, Sect. 3.2). When (4) holds for all $n,$ the random values $\{F(x) \mid x \in \X\}$ of the objective function are said to be exchangeable (Häggström, 2007).
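The “if” direction of Corollary 2 is easy to check by simulation in the simplest exchangeable case, values independent and identically distributed. In this Python sketch (a toy instance of my own), plain enumeration and an adaptive greedy sampler give, up to sampling error, the same distribution for the statistic “best value in the sample”:

    # Monte Carlo check: with i.i.d. (hence exchangeable) objective
    # values, the distribution of a statistic of the sample is the same
    # for a fixed enumerator and an adaptive sampler. Toy instance mine.
    import random
    from collections import Counter
    random.seed(1)

    DOMAIN, N = range(8), 4

    def fresh_f():
        return {x: random.randrange(3) for x in DOMAIN}   # i.i.d. values

    def enumerator(f):
        return [f[x] for x in list(DOMAIN)[:N]]           # fixed selection

    def greedy(f):
        xs = [random.choice(DOMAIN)]
        while len(xs) < N:                # probe next to the best so far
            best = max(xs, key=f.__getitem__)
            cands = [x for x in (best - 1, best + 1)
                     if x in DOMAIN and x not in xs]
            xs.append(random.choice(cands) if cands else
                      random.choice([x for x in DOMAIN if x not in xs]))
        return [f[x] for x in xs]

    phi, trials = max, 20_000             # statistic: best value found
    for sampler in (enumerator, greedy):
        dist = Counter(phi(sampler(fresh_f())) for _ in range(trials))
        print(sampler.__name__,
              [round(dist[v] / trials, 3) for v in (0, 1, 2)])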


0.2 Understanding the Misunderstanding of Information

The root error is commitment to the belief that information is the cause of performance in black-box optimization (search). The NFL theorems arrived at a time when researchers commonly claimed that evolutionary optimizers gained information about the fitness landscape, and adapted themselves dynamically to improve performance. Wolpert and Macready (1995) observe that superior performance on a subset of functions is offset precisely by inferior performance on the complementary subset. In online discussion of their paper, Bill Spears referred to this as conservation of performance. My paper suggests that conservation of information accounts for conservation of performance.

The lemma of Sect. 3.1, “Conservation of Information,” expresses the absolute uninformedness of the sample selection process. The performance of a black-box optimizer has nothing whatsoever to do with its information of the objective function. But the paper recognizes only that information gain is impossible, and claims incoherently that prior information resides in the optimizer itself. Conservation of this chimerical information supposedly accounts for conservation of performance in optimization. Here are the salient points of illogic:

  1. Information causes performance.
  2. The optimizer gains no exploitable information by observation, so it must be prior information that causes performance.
  3. There is no input by which the optimizer might gain prior information, so it must be that prior information inheres in the optimizer.
  4. Prior information of one objective function is prior misinformation of another. Conservation of performance is due to conservation of information.

It should have been obvious that prior information is possessed only after it is acquired. The error is due in part to a mangled, literalistic half-reading of Wolpert and Macready (1995, p. 8, emphasis added):

The NFL theorem illustrates that even if we know something about [the objective function] … but don’t incorporate that knowledge into [the sampler] then we have no assurances that [the sampler] will be effective; we are simply relying on a fortuitous matching between [the objective function] and [the sampler].
The present work (Sect. 6) concludes that:
The tool literally carries information about the task. Furthermore, optimizers are literally tools — an algorithm implemented by a computing device is a physical entity. In empirical study of optimizers, the objective is to determine the task from the information in the tool.
This reveals confusion of one type of information with another. When a toolmaker imparts form to matter, the resulting tool is in-formed to suit a task. But such form is not prior information. Having been formed to perform is different from having registered a signal relevant to the task. An optimization practitioner may gain information of a problem by observation, and then form a sampler to serve as a proxy in solving it. Although the sampler is informed to act as the practitioner would, it is itself uninformed of the problem to which it is applied, and thus cannot justify its own actions. The inherent form that accounts for its performance is sampling bias.

Unjustified application of a biased sampler to an optimization problem is merely biased sampling by proxy. The NFL theorems do not speak to this fundamental point. They specify conditions in which all of the samplers under consideration are equivalent in overall performance, or are nearly so. Ascertaining that none of these theorems applies to a real-world circumstance does not justify a bias, but instead suggests that justification may be possible. There is never a “free lunch” for the justifier.


References

T. M. English. Evaluation of evolutionary and genetic optimizers: No free lunch. In L. J. Fogel, P. J. Angeline, and T. Bäck, editors, Evolutionary Programming V: Proceedings of the Fifth Annual Conference on Evolutionary Programming, pages 163–169. MIT Press, Cambridge, MA, 1996.

O. Häggström. Intelligent design and the NFL theorems. Biology & Philosophy, 22(2):217–230, 2007.

C. Igel and M. Toussaint. A no-free-lunch theorem for non-uniform distributions of target functions. Journal of Mathematical Modelling and Algorithms, 3(4):313–322, 2005.

M. J. Streeter. Two broad classes of functions for which a no free lunch result does not hold. In Genetic and Evolutionary Computation – GECCO 2003, pages 1418–1430. Springer, 2003.

D. H. Wolpert and W. G. Macready. No free lunch theorems for search. Technical Report SFI-TR-95-02-010, Santa Fe Institute, 1995.

D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

Tuesday, August 12, 2014

NFL&I

My last post showed that a claim of Winston Ewert, William Dembski, and Robert J. Marks II to groundbreaking research does not withstand a minute of Googling. In the comments section, I made some admissions about my own work on "no free lunch" theorems for optimization. I want to highlight them here.

If I had a face-to-face chat with Winston, I'd tell him that I really, really, really regret having gotten into the "no free lunch" thing. Rather than play to my strengths — I'm a crackerjack computer scientist — I wasted a lot of myself on obfuscated reinvention of a probabilist's wheel. What's dreadfully embarrassing is that it took me many years to realize how bad my work was. I had to acknowledge that Häggström [cited below] was correct in his criticism of NFL. And Winston should acknowledge that he does not have a theoretical bone in his body.
Quoting what I said about myself, Anonymous asks:
Would you be willing to elaborate on this? It might help others avoid the types of mistakes you want them to avoid, if you could explain specifically: 1) in what ways your work was poor, or a "reinvented wheel", 2) why you did not notice your work was bad, and 3) what work/areas should be avoided (for fear of reinvention). Thanks.
The following is my response, lightly edited.


The first two questions are entirely legitimate. The last borders on the fallacy of the complex question. I'm more than willing to elaborate. But I'm going to limit my response, because I'm struggling to get a couple other things written. [Obviously, I've drifted off-task.]

1) I have toiled and toiled over a preface [now available] for my first NFL paper (1996), discovering errors in my explanations of errors, and compounding my embarrassment. Dog willing, and the Creek don't rise, I will eventually post "Sampling Bias Is Not Information" here. You can arrange for Google Scholar to alert you when the title appears, if you don't want to follow my blog. For now, I'll say only that what I call "conservation of information" is nothing but statistical independence of the sampling ("search" or "optimization") process and the sample of random values of the objective function. I describe statistical independence in the introduction to the paper, but fail to identify it. That is indeed bad.

2) For one thing, I had given my confusion a fancy name. But I would rather focus here on my runaway arrogance. I independently proved the main NFL theorem in 1994. Questioning a student during his thesis defense, I got him to state clearly that "future work" was supposed to lead to a generally superior optimizer. After a centisecond of thought, I sketched the simple argument that Häggström gives in "Intelligent Design and the NFL Theorems" (2007). I considered going for a publication, but, as obvious as the result was to me, I had to believe that it was already in the optimization literature. Twenty years ago, a lit review in an unfamiliar field was a big undertaking. And I had more than enough work to do at the time.

When Wolpert and Macready disseminated "No Free Lunch Theorems for Search" through GA Digest (early 1995), stirring up a big controversy, I concluded that I was remarkably clever, rather than that the whole affair was silly. The accepted version of my first NFL paper included my simple proof, but a reviewer's suggestion led me to see exchangeability (Häggström provides the term) as the "NFL condition." Rather than save the result for another paper, I dumped it into Sect. 3.2 about the time that camera-ready copy was due. The upshot is that I obsoleted what is known as the NFL theorem, 16 months before it was published. (I sent a copy of my paper to Bill Macready, who responded, "Nice work, Tom.")

As the reputation of NFL grew, so did my arrogance. I forgot what I had suspected in the beginning, namely that something so simple was probably not a novel discovery. The more work I did with NFL, the more it became "my thing." In 2000 and 2001, I gave tutorials on the topic at conferences. I put NFL in the cover letters of my applications for academic positions. When Dembski came along with his book No Free Lunch (2002), I was able to respond with "authority." And the number of citations of Wolpert and Macready (presently 3823) continued to go up and up and up. "Fifty Million Frenchmen Can't Be Wrong," you know, about the importance of NFL.

3) Do not presume to contribute to a field that you have not bothered to study in depth.

Tuesday, August 5, 2014

ID creationist “scholarship” sinks to new depths

[Edit: I regret turning nasty here. See the comments section for something rather more kind I would say to Winston Ewert.]

I mentioned a couple posts ago that I was annoyed with the editors and reviewers of an article, apparently to be published in an IEEE journal, by Winston Ewert, William Dembski, and Robert J. Marks II. Although I focus on the authors here, it will be clear enough what the folks responsible for quality control, including those on Ewert’s dissertation committee at Baylor, should have done. The short version is that, Googling obvious keywords, it takes at most a minute to discover that prior work, of which the authors claim to have no knowledge, exists in abundance.

Before continuing, I want to emphasize that my response to ID creationism is anything but knee-jerk. Consider, if nothing else, that the way I dealt with a dubious theorem supplied by Ewert et al. was to attempt to prove it. I not only succeeded in doing so, but supplied the proof here. I have since proved a fundamental theorem that some might take as advancing ID theory, though I see it otherwise.

Back to the article. There is no way around concluding that the veterans, Dembski and Marks, are either incompetent or dishonest. (You can guess which way I lean from a subtitle I’ve considered for this blog: Truth Is the First Casualty of Culture War.) And Youngman Ewert is either lazy, dishonest, or intellectually hobbled. (I see many signs of hero worship and true-believerism.)

You need no clear idea of what information theory is to grasp the significance of Dembski allowing himself to be promoted as the “Isaac Newton of information theory.” Marks edited a volume on engineering applications of information theory. The two have repeatedly cited standard texts that I own but have not studied thoroughly. I am not an information theorist. So how is it that I should be thunderstruck by a flatly incorrect statement in the second sentence of the abstract, and then be floored by an intensification of it in the second paragraph of the text?

You may think that you are unqualified to judge whether Ewert, Dembski, and Marks are right or wrong. But Google Scholar settles the matter unequivocally. That is what makes the falsehood so atrocious. The authors state in the introduction:

Both Shannon [1], [2] and Kolmogorov-Chaitin-Solomonoff (KCS)¹ [2]-[9] measures of information are famous for not being able to measure meaning. [...] We propose an information theoretic method to measure meaning. To our knowledge, there exists no other general analytic model for numerically assigning functional meaning to an object.

¹ Sometimes referred to as only Kolmogorov complexity or Kolmogorov information.
My immediate thoughts, incomprehensible to most of you, but eminently sensible to many who’ve had a course in information theory, were Kolmogorov structure function and Kolmogorov sufficient statistic. I first learned about Kolmogorov complexity from a chapter in reference [2]: Cover and Thomas, Elements of Information Theory. See Section 14.12, “Kolmogorov Sufficient Statistic.” If Dembski and Marks do not understand the relevance, then they are incompetent. If they do, then they are dishonest.

Back to Google. And to Ewert. And to Baylor. The article clearly draws on Ewert’s dissertation, which had better include a chapter with a title like “Literature Review.” Actually, his dissertation proposal should have surveyed the relevant literature. A scholar should comprehend a body of knowledge before trying to extend it — right? Let’s make the relatively charitable assumptions that Ewert, though presuming to make a fundamental contribution to information theory, never took a course in information theory, and failed to grasp some aspects of the introduction to Kolmogorov complexity in Cover and Thomas. He nonetheless had obvious keywords to enter into Google Scholar. There are 41,500 results for the unclever search

meaningful information Kolmogorov.
(The term meaning is too general, and meaningful is the last word of the first sentence in the abstract.) Close to the top are two articles by Paul Vitányi, both entitled “Meaningful Information.” With the search narrowed to
"meaningful information" Kolmogorov,
there are about 1,310 results. This is what I did first. I immediately went to the more recent of Vitányi's articles, because he’s a prominent researcher in Kolmogorov complexity, and also the coauthor of the standard text on the topic (reference [24] in Ewert et al.). I have highlighted key phrases in the abstract for those of you who don’t care to grapple with it.
Abstract—The information in an individual finite object (like a binary string) is commonly measured by its Kolmogorov complexity. One can divide that information into two parts: the information accounting for the useful regularity present in the object and the information accounting for the remaining accidental information. There can be several ways (model classes) in which the regularity is expressed. Kolmogorov has proposed the model class of finite sets, generalized later to computable probability mass functions. The resulting theory, known as Algorithmic Statistics, analyzes the algorithmic [Kolmogorov] sufficient statistic when the statistic is restricted to the given model class. However, the most general way to proceed is perhaps to express the useful information as a total recursive function. The resulting measure has been called the “sophistication” of the object. We develop the theory of recursive functions statistic, the maximum and minimum value, the existence of absolutely nonstochastic objects (that have maximal sophistication—all the information in them is meaningful and there is no residual randomness), determine its relation with the more restricted model classes of finite sets, and computable probability distributions, in particular with respect to the algorithmic (Kolmogorov) minimal sufficient statistic, the relation to the halting problem and further algorithmic properties.

The dissertation was Ewert’s chance to straighten up and fly right. Back when he admitted to plagiarism in his master’s thesis, I wrote:

I want to emphasize again that I am not reveling in the academic misconduct of a young master’s student. The villains are the members of the thesis committee, who have not, to my knowledge, had to acknowledge their misconduct publicly.
So what did you do with your chance, Dr. Ewert? When you feed us the “to our knowledge” guano, knowingly creating the misimpression that you tried to acquire knowledge, you are lying again. You have exaggerated the significance of your own work by failing to report on the prior work of others. I doubt highly that you are capable of understanding their work. Were you so inept as to not Google, or so deceitful as to sweep a big mound of inconvenient truth under the rug? Assuming the former, you need to catch on to the fact that your “maverick genius” coauthors should have known what was out there.

Sunday, August 3, 2014

More to come on algorithmic specified complexity

As I indicated in my last post, “Proving a theorem of Ewert, Marks, and Dembski,” I was killing (sleepless) time dead. After 12 hours of shuteye and a couple cups of coffee, I proved something more fundamental. I want you to know, while I struggle to herd my catlike thoughts into the linguistic corral, not to make much of the theorem.

Thursday, July 31, 2014

Proving a theorem of Ewert, Marks, and Dembski

[Edit: Don't take this theorem too seriously.]

In what appears to be a forthcoming journal article, ID creationists Winston Ewert, William Dembski, and Robert J. Marks II state a formal result, and justify it merely by citing a three-page paper in the proceedings of a symposium. I was already annoyed with the reviewers and the editors for missing a huge defect, and decided to while away the sleepless hours by checking the putative proof in On the Improbability of Algorithmic Specified Complexity.

The formalism and the argument are atrocious. I eventually decided that it would be easier to reformulate what I thought the authors were trying to say, and to see if I could generate my own proof, than to penetrate the slop. It took me about 20 minutes, working directly in LaTeX. Then I decided to provide some explanation that is missing in the paper.

The theorem is correct. As Ewert, Marks, and Dembski put it, "The probability of obtaining an object exhibiting $\alpha$ bits of [algorithmic specified complexity] is less than or equal to $2^{-\alpha}$." It is something that they can establish a property like this. But algorithmic specified complexity is the difference between bits of Shannon self-information and bits of conditional Kolmogorov complexity, which seem like apples and oranges to me.

I should mention that the result probably comes from Ewert's doctoral dissertation, Algorithmic Specified Complexity, still under wraps at Baylor. (I'm guessing that he refrained from plagiarism, this go around.) Evidently Dembski, a mathematician, did not edit the paper.

The following makes more sense if you read Sections I and II of the paper. Those of you with a bit of math under your belts will be amazed by the difference.


$\newcommand{\suppmu}{\mathrm{supp}(\mu)} \newcommand{\B}{\mathcal{B}} \newcommand{\X}{\mathcal{X}} \renewcommand{\S}{\mathcal{S}} \newcommand{\P}{\mathcal{P}} \newcommand{\Bprime}{\mathcal{B}^\prime}$The set of all strings (finite sequences) on $\B = \{0, 1\}$ is $\B^*.$ Assume that the binary encoding $e: \S \rightarrow \B^*$ of the set $\S$ of objects of interest is 1-to-1. This allows use of the set of codewords $e(\S)$ in place of $\S.$

The set of all programs $\P$ for universal computer $U$ is a prefix-free subset of $\B^*.$ That is, no program is a proper prefix of any other. The conditional Kolmogorov complexity $K(x|y)$ is the length $\ell(p)$ of the shortest program $p$ that outputs $x$ on input of $y,$ i.e., \[ K(x|y) = \hspace{-0.3em} \min_{p : U(p, y) = x} \hspace{-0.4em} \ell(p). \] The Kraft inequality \[ \sum_{x \in \X} 2^{-\ell(x)} \leq 1 \] holds for all prefix-free sets $\X \subset \B^*,$ including the prefix-free set of programs $\P.$ It follows that for all $\X \subseteq \B^*,$ \[ \sum_{x \in \X} 2^{-\!K(x|y)} \leq \sum_{p \in \P} 2^{-\ell(p)} \leq 1, \] where $y$ is a string in $\B^*.$ In the first sum, all terms correspond to distinct programs, and each exponent $-\!K(x|y)$ is the negative length of a program that outputs $x$ on input of $y$.
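The Kraft inequality is easy to confirm numerically for any particular prefix-free set. A tiny Python sketch (the code set is my own toy example):

    # Verify prefix-freeness and the Kraft inequality for a toy code set.
    codes = ["0", "10", "110", "1110", "1111"]

    def prefix_free(xs):
        return not any(a != b and b.startswith(a) for a in xs for b in xs)

    assert prefix_free(codes)
    print(sum(2.0 ** -len(x) for x in codes))   # 1.0 here; always <= 1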

Theorem 1. Let $\mu$ be a probability measure on encoded objects $e(\S).$ Also let \[ X = \{x \in \suppmu \mid -\!\log_2 \mu(x) - K(x | y) \geq \alpha\}, \] where $y$ is a string in $\B^*$ and $\alpha \geq 0.$ Then $\mu(X) \leq 2^{-\alpha}.$

Proof. Rewrite the property of string $x$ in $X$ to obtain a bound on $\mu(x):$ \begin{align*} -\!\log_2 \mu(x) - K(x | y) &\geq \alpha \\ \log_2 \mu(x) + K(x | y) &\leq -\alpha \\ \log_2 \mu(x) &\leq -\alpha - K(x | y) \\ \mu(x) &\leq 2^{-\alpha - K(x | y)}. \end{align*} Applying the bound, \begin{align*} \mu(X) &= \sum_{x \in X} \mu(x) \\ &\leq \sum_{x \in X} 2^{-\alpha -K(x | y)} \\ &= \sum_{x \in X} 2^{-\alpha} \cdot 2^{-K(x | y)} \\ &= {2^{-\alpha} \sum_{x \in X} 2^{-K(x | y)}} \\ &\leq 2^{-\alpha}. \end{align*} The last step follows by the Kraft inequality. Q.E.D.
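Note that the proof uses nothing about $K(x|y)$ beyond the Kraft inequality, so the bound survives if the uncomputable $K(x|y)$ is replaced by codeword lengths of any prefix-free code. That permits a direct numerical check; the distribution and code lengths below are toy choices of mine.

    # Check the bound mu(X) <= 2^(-alpha) with lengths of an arbitrary
    # prefix-free code standing in for K(x|y); the proof needs only the
    # Kraft inequality. Toy distribution and code lengths are mine.
    from math import log2

    mu = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # measure on e(S)
    length = {"a": 3, "b": 3, "c": 1, "d": 2}            # prefix-code lengths
    assert sum(2.0 ** -l for l in length.values()) <= 1  # Kraft holds

    for alpha in (0.0, 0.5, 1.0, 2.0, 3.0):
        X = [x for x in mu if -log2(mu[x]) - length[x] >= alpha]
        assert sum(mu[x] for x in X) <= 2.0 ** -alpha
        print(alpha, X, sum(mu[x] for x in X))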

Thursday, February 6, 2014

Bronowski, Michelangelo, Moore, and Einstein

I'm watching Jacob Bronowski's documentary series The Ascent of Man. You'll find in the following transcript a better account of how sculpture takes form than you'll get from any intelligent-design theorist. The notion that there's an independent design that the sculptor forces upon the stone is simply wrong. But to dwell on that would be to miss what Bronowski emphasizes, an interesting analogy of science to sculpture. On reflection, I saw the similarity of his remarks to some by Einstein, which I quote below. Hopefully someone out there will find the connection interesting.


BRONOWSKI: A popular cliché in philosophy says that science is pure analysis or reductionism, like taking the rainbow to pieces, and art is pure synthesis — putting the rainbow together. This is not so. All imagination begins by analysing nature. Michelangelo said that.

When that which is divine in us doth try
    To shape a face, both brain and hand unite
    To give, from a mere model frail and slight,
    Life to the stone by Art's free energy.
BRONOWSKI: The material asserts itself through the hand and thereby prefigures the shape of the work for the brain. The sculptor, as much as the mason, feels for the form within nature.
The best of artists hath no thought to show
    Which the rough stone in its superfluous shell
    Doth not include: to break the marble spell
    Is all the hand that serves the brain can do.
[TOM: See Sonnets XIV and XV here.]

BRONOWSKI: By the time Michelangelo carved the head of Brutus, other men quarried the marble for him. But Michelangelo had begun as a quarryman in Carrara and he still felt that the hammer in their hands, and in his, was groping in the stone for a shape that was already there. The quarrymen work in Carrara now for the modern sculptors who come here — Marino Marini, Lipschitz and Henry Moore. Their descriptions of their work are not as poetic as Michelangelo's, but they carry the same feeling.

HENRY MOORE: To begin with, as a young sculptor, I couldn't afford expensive stone. And I got my stone by going round the stone yards, and finding what they would call a random block. Then I had to think in the same way that Michelangelo might have done, so that one had to wait until an idea came that fitted the shape of the stone. And that was seeing the idea in that block.

BRONOWSKI: Of course, it can't be literally true that what the sculptor imagines and carves out is already there, hidden in the block. And yet the metaphor tells the truth about the relation of discovery that exists between man and nature. In one sense, everything that we discover is already there. A sculptured figure and the law of nature are both concealed in the raw material. And in another sense, what a man discovers is discovered by him. It would not take exactly the same form in the hands of someone else. Neither the sculptured figure nor the law of nature would come out in identical copies when produced by two different minds in two different ages. Discovery is a double relation of analysis and synthesis together. As an analysis it probes for what is there. But then, as a synthesis, it puts the parts together in a form in which the creative mind transcends the bare limits, the bare skeleton that nature provides.

BRONOWSKI: Sculpture is a sensuous art. The Eskimos make small sculptures that are not even meant to be seen, only handled. So it must seem strange that I choose as my model for science sculpture and architecture. And yet it's right. We have to understand that the world can only be grasped by action, not by contemplation. The hand is more important than the eye. We are not one of those contemplative civilisations of the Far East or the Middle Ages that believed that the world has only to be seen and thought about and who practised no science. We are active, and indeed we know in the evolution of man, that it is the hand that drives the subsequent evolution of the brain. We find tools made by man before he became man. Benjamin Franklin called man the "tool-making animal." And that's right. And the most exciting thing about that is that even in prehistory, man already made tools that have an edge finer than they need have.

[TOM: I've cleaned up this transcript.]


EINSTEIN: The reciprocal relationship of epistemology and science is of noteworthy kind. They are dependent upon each other. Epistemology without contact with science becomes an empty scheme. Science without epistemology is — insofar as it is thinkable at all — primitive and muddled. However, no sooner has the epistemologist, who is seeking a clear system, fought his way through to such a system, than he is inclined to interpret the thought-content of science in the sense of his system and to reject whatever does not fit into his system. The scientist, however, cannot afford to carry his striving for epistemological systematic that far. He accepts gratefully the epistemological conceptual analysis; but the external conditions, which are set for him by the facts of experience, do not permit him to let himself be too much restricted in the construction of his conceptual world by the adherence to an epistemological system. He therefore must appear to the systematic epistemologist as a type of unscrupulous opportunist: he appears as realist insofar as he seeks to describe a world independent of the acts of perception; as idealist insofar as he looks upon the concepts and theories as the free inventions of the human spirit (not logically derivable from what is empirically given); as positivist insofar as he considers his concepts and theories justified only to the extent to which they furnish a logical representation of relations among sensory experiences. He may even appear as Platonist or Pythagorean insofar as he considers the viewpoint of logical simplicity as an indispensable and effective tool of his research. [Wikiquote]


BOUNDED SCIENCE: There can be no scientific explanation of science.

Wednesday, June 26, 2013

Black-box “optimization” is merely sampling

[Edit: The replacement is here at last. There have been many visits to this page since I retracted the post. I apologize to those who have been waiting.]

The main points of this post were correct, but the math contained some errors. I am working on a replacement.

Dembski’s perennial misconception of fitness

DiEb has begun a response to the latest morph of the creationist model of “search” (Dembski, Ewert, and Marks, “A General Theory of Information Cost Incurred by Successful Search” [pdf]). Here, slightly modified, is a rather general comment I made.


In the conventional “no free lunch” analytic framework, the objective (cost, fitness) function is a component of the problem. Dembski, Ewert, and Marks turn the objective function into an “oracle” that is part of the problem-solver itself. This model is inappropriate to most, if not all, of the evolutionary computations they purport to have analyzed.

Back in the 1990s, Dembski committed himself to the misconception that Richard Dawkins’ Weasel program uses the fitness function in order to “hit the target.” Various people have tried, with no apparent success, to explain to him that one of the offspring in each generation survives because it is the most fit. The so-called target is nothing but the fittest individual.

To put it simply, the fitness function comes first. The “target” is defined in terms of the fitness function. Dembski gets this backwards. He believes that the target comes first, and that the fitness function is defined in terms of the target.

Dembski and Marks carry this to an extreme in “Life's Conservation Law.” They claim that biological targets exist implicitly in nature, and that if Darwinian evolution “hits” them, then fitness functions necessarily have guided evolution. A remarkable aspect of this claim is that they treat fitness functions, which are abstractions appearing in mathematical models of evolution, as though they really exist.

The “search for a search” is another abstraction that they reify. A probability measure on the sample space is a mathematical abstraction. They merely assert that a search practitioner, in selecting a search, searches the uncountably infinite set of probability measures. To that I say, “Give me a physical description of the process.”

Thursday, June 6, 2013

Open access to Biological Information: New Perspectives

I previously raised an eyebrow at an editor of Springer’s “Intelligent Systems Reference Library,” in which the creationist volume Biological Information: New Perspectives (eds. Robert J. Marks II, Michael J. Behe, William A. Dembski, Bruce L. Gordon, and John C. Sanford) was scheduled to appear. The proceedings of the secret scientific symposium of scientists and “scientists”…

In the spring of 2011 a diverse group of scientists gathered at Cornell University with an eye on the major new principles that might be required to unravel the problem of biological information. These scientists included experts in information theory, computer science, numerical simulation, thermodynamics, evolutionary theory, whole organism biology, developmental biology, molecular biology, genetics, physics, biophysics, mathematics, and linguistics. Original scientific research was presented and discussed at this symposium, which was then written up, and constitute most of the twenty-four peer-edited papers in this volume.
… (did I mention science?) that took place at, but not under the auspices of, Cornell University have migrated to World Scientific. You can read the volume online, free of charge.

The big surprise is that “Section Four: Biological Information and Self-Organizational Complexity Theory” comprises two dissenting papers, one by Stuart Kauffman (whose views on many things are similar to my own), and the other by Bruce H. Weber. Although editor Gordon is none too clear on the matter in his introduction to the section, it appears that Kauffman and Weber actually contributed to a previous secret meeting, the proceedings of which were never published.

Their involvement in this project traces back to a 2007 conference I organized in Boston under the auspices of the Discovery Institute’s Center for Science and Culture. The conference commemorated the famous 1967 Wistar Symposium on “Mathematical Challenges to the Neo-Darwinian Interpretation of Evolution.” [...] The general perception among the participants in the Boston symposium, as with the participants in the Cornell University conference giving rise to this compendium, is that the mathematical and biological challenges posed to the modern evolutionary synthesis (neo-Darwinism) have not been resolved, but actually have grown more acute as our knowledge of molecular biology, cell biology, developmental biology, and genetics has exploded.
Gee, that sounds like “these guys are on our side.” But here’s the second half of Weber’s abstract:
Presently, however, there is ferment in the Darwinian Research Tradition as new knowledge from molecular and developmental biology, together with the deployment of complex systems dynamics, suggests that an expanded and extended evolutionary synthesis is possible, one that could be particularly robust in explaining the emergence of evolutionary novelties and even of life itself. Critics of Darwinism need to address such theoretical advances and not just respond to earlier versions of the research tradition.
So Gordon contradicts Weber while trying to paint him as an ally. He makes a fine point of the inadequacy of the “modern evolutionary synthesis (neo-Darwinism),” which is hardly where Darwinian evolutionary theory stands today. Kauffman highlights in his abstract the essential reason that the information measures of Dembski and Marks go nowhere in biology.
Biological evolution rests on both quantum random and classical non-random natural selection and whole-part interactions that render the sample space of adjacent biological possibilities unknowable.
I’ve heard him put it more simply: We don’t know the phase space. This means that it is impossible to assign probabilities to evolutionary trajectories. And taking logarithms of probabilities is how Dembski and Marks get information.

I wrote “scientists and ‘scientists’” above because only two of the five editors are scientists, and because engineers, computer scientists, and mathematicians have contributed heavily.

Unsurprisingly, about half of the “new perspectives” are variations on old themes of why evolution doesn’t work. John C. Sanford, a young-earth creationist who believes that genomes have been going to hell in a handbasket since the Fall of Man, authored seven of the papers and one of the section introductions. Dembski, Marks, Montañez, and Ewert continue to bash evolutionary computation, including artificial life.

Jonathan Wells shocks us by reporting, “Not Junk After All: Non-Protein-Coding DNA Carries Extensive Biological Information.” Other papers show that the genetic code is fine-tuned, and furthermore that DNA sequences and computer code look much alike, with appropriate visualization. I’m sure there are other sensations to be found on closer inspection of the volume.