by Gene Callahan

In a seminar I’ve been attending at NYU this semester, David Chalmers contended that “resetting priors” is irrational in a Bayesian framework. (If you’re not familiar with Bayesian inference, the Wikipedia article just linked to does a good job of introducing the topic, so I will refer you to that rather than lengthening an already long post with my own introduction.) This seemed wrong to me, and had seemed wrong long before Chalmers raised the issue again. But his remarks renewed my interest in the subject and resulted in this post.

To get myself back in the “Bayesian spirit,” I began by re-reading two of my favorite papers on the subject, Wesley Salmon’s “Rationality and Objectivity in Science” and Clark Glymour’s “Why I am Not a Bayesian” (both available in the excellent survey work by Curd and Cover, *Philosophy of Science: The Central Issues*, 1998, New York: W. W. Norton & Company). I will quote from both of those papers here, but the conclusion about resetting priors seems to be my own—at least a quick Google search turns up only Chalmers’ rejection of my central idea here!

Glymour’s paper begins by stating what he sees as the role of confirmation theory, of which he understands Bayesianism to be a variety: “The aim of confirmation theory is to provide a true account of the principles that guide scientific argument insofar as that argument is not, and does not purport to be, of a deductive kind” (584). He believes Bayesianism falls short of capturing important aspects of that process, and discusses several ways in which he finds that so.

However, he notes at the outset that he does not suggest these shortcomings render Bayesian inference useless—far from it!—a caveat that applies to my own criticism as well: “It is not that I think the Bayesian scheme or related probabilistic accounts capture nothing. On the contrary, they are clearly pertinent where the reasoning involved is explicitly statistical. Further, the accounts developed by Carnap, his predecessors, and his successors are impressive systematizations and generalizations, in a probabilistic framework, of certain principles of ordinary reasoning” (587-588).

Nevertheless, he continues: “What is controversial is that the general principles required for argument can best be understood as conditions restricting prior probabilities in a Bayesian framework. Sometimes they can, perhaps, but I think when arguments turn on relating evidence to theory, it is very difficult to explicate them in a plausible way within the Bayesian framework” (592)—for instance, many scientific discoveries, such as Newton’s dynamics and the General Theory of Relativity, were made much more convincing by how they handled old evidence—but, per Bayesian thinking, old evidence, being already known, should cause no change in the plausibility we assign to present theories at all.

However, it is Salmon, in his advocacy of a Bayesian model of theory acceptance, who points at the problem I see in Chalmers’ contention. He offers the following example:

“Let us look at a simple and non-controversial application of Bayes’s theorem. Consider a factory that produces can openers at the rate of 6,000 per day. This factory has two machines, a new one that produces 5,000 can openers per day, and an old one that produces 1,000 per day. Among the can openers produced by the new machine 1 percent are defective; among those by the old machine 3 percent are defective. We pick one can opener at random from today’s production and find it defective. What is the probability that it was produced by the new machine?” (554)

Salmon shows, by Bayesian inference, that the probability is 5/8. Now, this is all fine as far as it goes. But let’s say that we bring this analysis to the plant foreman, and he says, “But wait a second—the day that piece was produced, the new machine was offline most of the day, and only put out 1000 pieces!”

Now, in this situation, I contend, the right maneuver is not to update based on our initial priors, but to reset the priors and right away get the correct probability of 1/4—but this is the very procedure that, Chalmers contends, is irrational. And, interestingly, this example is very much like the one I recall Bayes himself presenting to illustrate his theory. As I remember (I haven’t been able to Google this example up), Bayes used an example of deciding where the middle of a pool table lay. The procedure went something like (again, this is all from memory at present): Pick a spot that looks like the middle to you. Now, have someone keep breaking and see how many balls wind up on each side of your “prior” middle, then update your prior based on the idea that, over the long run, half of all balls should wind up on each side of the middle.
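The two can-opener calculations can be checked in a few lines of Python (a sketch; the helper function and its name are mine, but the numbers are Salmon’s):

```python
def p_new_given_defective(new_output, old_output, new_rate=0.01, old_rate=0.03):
    """P(new machine | defective opener), by Bayes' theorem.

    Priors come from each machine's share of the day's output;
    likelihoods are the per-machine defect rates from Salmon's example.
    """
    p_new = new_output / (new_output + old_output)
    p_old = 1 - p_new
    return (p_new * new_rate) / (p_new * new_rate + p_old * old_rate)

# Salmon's original setup: 5,000 vs. 1,000 openers per day.
print(p_new_given_defective(5000, 1000))  # 0.625, i.e. 5/8

# After the foreman's correction: both machines produced 1,000.
print(p_new_given_defective(1000, 1000))  # 0.25, i.e. 1/4
```

Note that the second call is not an update of the first: the priors themselves (5/6 and 1/6 versus 1/2 and 1/2) have been replaced.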

Now, I believe that both of the examples show perfectly sound uses of Bayesian inference, and the reason is that the problems (determine where the bad can opener came from and determine the middle of the table) are set within a framework where we believe we understand everything relevant to our problem. And, once again with the latter example, I suggest Bayesian inference breaks down if it turns out we were wrong about the framework of the problem we are trying to answer. If, while calculating where the middle of the pool table is, we suddenly spy a dwarf with a magnet under the table, and then pick up a ball and note that it feels like it might have an iron core, the right move would be to say, “Wait a minute—all bets are off!” We’d then gauge the dwarf’s agility, the size of the balls’ iron core, the strength of the magnet, etc., and then reset our priors to handle the actual situation.

The force of these examples, if I am correct in my analysis of them, is that, I suggest, they capture what happens in a time of Kuhnian paradigm change. (And, to relate this post to economics, perhaps in Kirznerian entrepreneurship as well?) For instance, if we surveyed scientists in the 19th century about whether measuring rods could change their length simply by being moved about, they would have said, not “probably not” but “no way,” because they would have seen this not as an empirical issue, but an analytic one—it was not that steel rods didn’t change length because you put them on a fast train, but that anything that did so could not serve as a measuring rod. (And an analytical impossibility should get a prior probability of zero, so that no amount of new evidence will dislodge it—it might turn out that nothing can serve as an (ideal) measuring rod, but not that the analytical truth was false.) What Einstein achieved, with the Special Theory of Relativity, was to give science an entire, previously unimagined framework that showed that, given another constant (the speed of light) and an equation relating the rods’ changes in length to the percentage of the speed of light at which they were moving, this was not an analytical truth at all. And, I suggest, what scientists really did, confronted with this new theory, was much more akin to resetting their priors than it was to updating based on fixed priors. And that it was eminently rational!

Now, it might be proposed, to counter my argument, that we must envision an ideal Bayesian reasoner, who could already conceive every scientific theory when setting her priors. Such a construct might prove useful in some situations, but it totally disconnects Bayesian inference from any mooring to the actual, historical process by which science really advances. Essentially, we are being asked to envision science as already complete, which, of course, obviates any need to ever correct priors set based on the scientific knowledge and theories actually available at any given time!

What this all means is that Bayesian inference is an incomplete model of scientific reasoning. But after the work of Carroll, Wittgenstein, Gödel, and Polanyi, are we really surprised that formal models can’t capture everything that is reasonable about science?

Gene, your post is very stimulating (and distracting!) to me. I’m not sure how much I’m really getting the issues.

Take your factory example, which captures, I gather, the crux of the matter.

The report about the machine being broken updates a different set of probabilities. You’re talking about two different Bayesian updating problems. The 5/8th value was conditional on the machines producing their benchmark outputs. The report about the new machine producing only 1,000 units induces you to update your probabilities on machine output. The 5/8 posterior is still good, just not relevant to the unconditional question of what might be the probability that the bum opener was produced by the new machine. Thus, I would have thought, it remains true that “resetting your priors is irrational in a Bayesian framework.” You don’t reset your priors at all in the factory example. You just change your opinion of which (conditional) posteriors to apply.

I don’t mean to deny your larger points, however, which I take to be the following. First, Bayesian updating does not tell us anything about abduction and must therefore be an incomplete account of scientific induction as it really happens in practice. Second, real scientists are often “irrational” because their priors put zero probability weight on possibilities that should have non-zero weight. Third, a “Kuhnian paradigm shift” often shows us that a set of possibilities we had not even imagined should get positive probability weight. As far as I can tell, BTW, these points are not distinct.

Aren’t you also raising the unlistability problem? If our factory manager did not realize that the 5/8th value was conditional, then he put a zero probability weight on the contingency the foreman identifies. From within the Bayesian framework, the posterior probability of the new machine producing only 1,000 units a day remains a big fat zero. You can’t reset your priors except by stepping out of the framework. When you get evidence of an impossible event, you have to decide whether to “reset your priors” or discount the evidence to zero. If you reset, I would have thought you are “irrational” from within the Bayesian framework, or stepping out of the framework, or both.

BTW: I like the Kirznerian connection you make in your post. If we say one is stepping out of the Bayesian framework in cases such as the factory example, then one would seem to be engaged in a bit of Kirznerian entrepreneurship, as you seem to suggest. That’s cool, IMHO.

Thanks for the comment, Roger.

“You’re talking about two different Bayesian updating problems.”

Exactly, that’s my point. Sometimes it’s correct to stop merely updating and to realize, “I had the problem all wrong from the start.” Your priors of 5/6 and 1/6 for the probability that an opener came from one machine or the other get reset to 1/2 and 1/2.

‘Thus, I would have thought, it remains true that “resetting your priors is irrational in a Bayesian framework.”’

Maybe I mis-stated my disagreement with Chalmers. What I take him to be saying is that, since Bayesian inference is a model of diachronic reasoning, if you accept it, once you start, you can’t go back, because if Bayesian inference correctly captures diachronic reasoning, it’s irrational to stop using it ever. My difference with him here (if I understand him correctly) is that I believe Bayesian inference is a good model of diachronic reasoning *within an understood framework*, but, when you realize you got the framework wrong, it is rational to step outside it.

“Second, real scientists are often ‘irrational’ because their priors put zero probability weight on possibilities that should have non-zero weight.”

Is that really irrational? Were horse-and-buggy makers irrational not to have foreseen the automobile?

I put “irrational” in scare quotes, Gene. It would be “irrational” in precisely the sense that you’d have to jump out of the Bayesian framework. It is “irrational” in the same way Kirznerian entrepreneurship is irrational. It is “irrational,” in other words, only by a relatively narrow and, I suppose, Cartesian definition of “rational.”

I don’t know what Chalmers said or wrote. I would have thought he just meant to say something like the following. If you observe a coin you had thought to be fair come up with 10^n heads in a row (setting n big enough to make 10^n “large”), you can adjust your posterior, but not your prior. It makes no sense to say, “After seeing that, I will skew my prior toward heads.” No, no. Such skewing works on posterior probabilities, not priors. As a technical note, you can’t adjust your posterior appropriately if you put all your probability weight on 50/50. Then you’d have to say, “Wow, what an improbable run. How about that? But I’m still quite sure the coin is fair.”
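Roger’s technical note can be sketched numerically (a toy two-hypothesis model of my own, with invented numbers): a prior that spreads any weight over a “biased” hypothesis gets swamped by the run of heads, while a prior that puts all its weight on “fair” can never move.

```python
def posterior_fair(prior_fair, n_heads, biased_p=0.9):
    """P(coin is fair | n_heads heads in a row), in a toy model where the
    coin is either fair (p = 0.5) or biased toward heads (p = biased_p)."""
    like_fair = 0.5 ** n_heads
    like_biased = biased_p ** n_heads
    num = prior_fair * like_fair
    return num / (num + (1 - prior_fair) * like_biased)

print(posterior_fair(0.99, 20))  # well under 1%: the run swamps a 99% prior
print(posterior_fair(1.0, 20))   # exactly 1.0: zero weight on bias never recovers
```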

> But let’s say that we bring this analysis to the plant foreman, and he says, “But wait a second—the day that piece was produced, the new machine was offline most of the day, and only put out 1000 pieces!”

> Now, in this situation, I contend, the right maneuver is not to update based on our initial priors, but to reset the priors and right away get the correct probability of 1/4—but this is the very procedure that, Chalmers contends, is irrational.

If you ‘reset’ like that, aren’t you assuming that the foreman is right that the machine did not run at full capacity? In other words, aren’t you assigning the proposition ‘the new machine produced 1000 pieces’ a probability of exactly 1?

Even if we trust the foreman an awful lot, 1 *is* an irrational weight to put on any proposition based just on his say-so. Perhaps he’s covering up something, etc.

“If you ‘reset’ like that, aren’t you assuming that the foreman is right that the machine did not run at full capacity?”

OK, fine gwern, but in the original problem, Salmon was assigning probability 1 to the new machine producing 5/6 of the output and the old 1/6, which probably came from the foreman as well — maybe the foreman was covering up something then, hey? The fact is, you can never get going on a problem at all without assuming you know something as a stable, fixed background of knowledge. Hey, what if water is not really H2O? What if we are really brains in a vat? And, in any case, the issue under discussion doesn’t change if we give 1/2 99/100 probability anyway.

Oh, and Roger, I think we are actually agreeing here, and have just been using different terminology.

I think the disagreement comes from the fact that there are multiple ways to apply Bayesian reasoning.

The practical way, which is the way that you (Callahan) seem to be talking about, is to take some priors that seem right and update on whatever evidence you have. This results in a simple computation, but in some circumstances, evidence comes by that disproves the priors, requiring you to change them.

The theoretical way, which is the way that I’m guessing Chalmers is talking about, is to use priors that include everything that could possibly be true. In this case, “the new machine produced 5,000 units that day” and “the new machine produced 1,000 units that day” are not priors but conclusions, and it’s perfectly rational to change them. As you realize (and as most people seem to ignore), this way is pretty much entirely academic, as applying Bayesian updating to every conceivable statement is impossible to do in this universe.
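Warrigal’s “theoretical way” can be made concrete for the factory example (a sketch; the scenario prior and the report likelihoods are invented numbers of my own): if the production scenarios themselves sit inside the hypothesis space, the foreman’s report is just evidence to condition on, and no reset is ever needed.

```python
scenarios = {              # (new_output, old_output) -> prior weight (assumed)
    (5000, 1000): 0.9,     # an ordinary day
    (1000, 1000): 0.1,     # new machine mostly offline
}

# Assumed likelihood of the foreman *reporting* "offline" under each scenario.
report_lik = {(5000, 1000): 0.01, (1000, 1000): 0.99}

# Ordinary conditioning on the report shifts weight between scenarios.
posterior = {s: w * report_lik[s] for s, w in scenarios.items()}
z = sum(posterior.values())
posterior = {s: w / z for s, w in posterior.items()}

def p_new_given_defective(new_out, old_out, new_rate=0.01, old_rate=0.03):
    p_n = new_out / (new_out + old_out)
    return p_n * new_rate / (p_n * new_rate + (1 - p_n) * old_rate)

# Marginalize the original question over the reweighted scenarios.
answer = sum(w * p_new_given_defective(*s) for s, w in posterior.items())
print(posterior[(1000, 1000)])  # ~0.917: most weight shifts to "offline"
print(answer)                   # 0.28125: pulled most of the way toward 1/4
```

A fuller treatment would also let the defective draw itself reweight the scenarios, but the point stands: once the scenario is in the prior, the foreman’s report is handled by updating, not resetting.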

To a great extent, artificial intelligence is the study of ways to compress the theoretical version of Bayesian reasoning into something that can actually be computed.

Warrigal, the problem with your specific example is that Salmon explicitly gives the 5000 and 1000 as prior likelihoods. But in general, I think you are on target — Chalmers would say (and, in fact, did say, when I brought this up), “Well, the ideal Bayesian reasoner would have had the Special Theory of Relativity in mind already in 1850, and would have already given it a prior” — which, as I said, disconnects Bayesian inference from any relevance to the real history of science.

Gene! You were holding out on us! The story you now relate is important to knowing what he might have meant, isn’t it? >:-(

Doesn’t Chalmers’ answer raise the problem of computability and data compression, as Warigal seems to suggest? (Warigal: Your name links to a blog [?] called “Axiom of Omega.” Is that an allusion to Gregory Chaitin?) Is that sort of super-human updating “rational” if it can’t be done by a universal Turing machine? Citing Diaconis and Freedman, Barkley Rosser and I touch on problems of Bayesian convergence in infinite-dimensional cases.

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=333935

Here’s an ungated draft:

http://alpha.fdu.edu/~koppl/rosser.htm

But Roger, I was hoping you would use Bayesian inference to *figure out* what he meant, thus proving me wrong!

Oh, and one more thing, warrigal:

“This results in a simple computation, but in some circumstances, evidence comes by that disproves the priors, requiring you to change them.”

But notice, the Special Theory of Relativity was not new “evidence” — it was a new framework from within which the whole issue could be re-conceived. That’s precisely when I think “resetting priors” is called for — although I’m willing to agree with Roger that this could equally well be put as “re-stating the whole shebang as a new Bayesian problem.”

“Warrigal, the problem with your specific example is that Salmon explicitly gives the 5000 and 1000 as prior likelihoods.”

The fact that he says they’re prior likelihoods doesn’t mean they actually are. The “ideal Bayesian”, the uncomputable Bayesian, would consider the prior likelihoods to be something else.

I think we can all agree that some general framework of practical reasoning is needed, and that the ideal-Bayesian system isn’t it.

“(Warigal: Your name links to a blog [?] called “Axiom of Omega.” Is that an allusion to Gregory Chaitin?)”

As in Chaitin’s constant, represented by an omega? No; I wanted to make it “axiom of infinity”, but that was taken, so I figured that since the axiom of infinity postulates a set called omega, it would make sense to call it the axiom of omega.

“The fact that he says they’re prior likelihoods doesn’t mean they actually are.”

Yes, I agree, he could have made those priors of his part of the updating process as well. But I was just addressing his example as it stood — surely, for theoretical reasoning, we can stipulate, “Let’s say we know absolutely that the new machine produces 5,000 units per day and the old one 1,000,” and see how we’d reason from there — no?

warrigal, I think this comes down to the Wittgensteinian point that, even to begin disagreeing with someone, we must already acknowledge that they are mostly right about most things.

Of course, if we stipulate that we know absolutely that the new machine produces 5,000 units per day, that means we know absolutely that the foreman is simply lying.

Yes, ok, ok, warrigal, i was just accepting Salmon’s framework as he set it up, rather than trying to challenge how he could “know” these output figures. Now that I’ve acknowledged your point several times, can you let it go?

Sorry; I didn’t feel like I was pressing any point. I guess I felt like I was tying up a minor loose end that you had made; I was expecting a response along the lines of “Mm-hmm.” I probably should have taken your Wittgenstein comment as a cue to leave it alone.

Mm-hmm.

> OK, fine gwern, but in the original problem, Salmon was assigning probability 1 to the new machine producing 5/6 of the output and the old 1/6, which probably came from the foreman as well — maybe the foreman was covering up something then, hey? The fact is, you can never get going on a problem at all without assuming you know something as a stable, fixed background of knowledge. Hey, what if water is not really H2O? What if we are really brains in a vat? And, in any case, the issue under discussion doesn’t change if we give 1/2 99/100 probability anyway.

If we trust the foreman implicitly, and the original data was as trustworthy as the new assertion of low production, then doesn’t the new one override the old data without any need for a dubious reset procedure? It’s just updating our posteriors as usual. Why doesn’t the foreman’s assertion count as fresh grist for our Bayesian mill and let us get the right answer?

If we trust one of the foreman’s claims more than the other (and we are correct in doing so), then wouldn’t blindly resetting our priors lead us into error? If, for example, he claimed low production on a Friday and we know he sometimes gets drunk as a loon on Fridays (say, on 1/4 of his Fridays he is a drunken liar), then the right thing to do would be to think he’s lying a quarter of the time, in which case we will want to split the difference between the low-production and high-production estimates and guess that the odds the item is from the new machine are 2.75/8 or whatever.
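Carrying gwern’s arithmetic through (the simple “split the difference” mixture he describes, with his 1/4 lying rate):

```python
# The two conditional answers already computed in the post:
low, high = 0.25, 0.625   # 1/4 if the report is true, 5/8 if he is lying

p_lying = 0.25            # drunken-liar rate on Fridays, per gwern
mix = (1 - p_lying) * low + p_lying * high
print(mix)  # 0.34375, i.e. 2.75/8
```

(A fully Bayesian treatment would also let the defective draw reweight the lying/truthful hypotheses — a defect is actually likelier under low production, 40/2000 versus 80/6000 — but the mixture above is the simple version gwern describes.)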

“If we trust the foreman implicitly, and the original data was as trustworthy as the new assertion of low production, then doesn’t the new one override the old data without any need for a dubious reset procedure? It’s just updating our posteriors as usual. Why doesn’t the foreman’s assertion count as fresh grist for our Bayesian mill and let us get the right answer?”

No, gwern, the procedure Salmon outlines takes the machine totals as priors AND DOES NOT RE-CALCULATE THEM. They don’t change, and so are only priors, and can’t be new posteriors.

Gene,

Yes, in some sense that was his “procedure,” but upon probing, you have reported above, he said something like “Well, the ideal Bayesian reasoner would have [all possibilities] in mind already . . . and would have already given it a prior.” Let’s not confuse expositional infirmities with substantive errors. If your priors cover all possibilities with positive weight, new information cannot cause you to rationally revise your prior probabilities, only your posterior probabilities, period.

Roger, you’re confusing Salmon with Chalmers.

Oh! Okay, I see. Still, why are you hanging so much weight on an illustrative calculation?

I didn’t bring it up again, gwern did.

Bayesian fools!

It’s categorization (analogical inference) that is the true basis of rationality, not Bayesian induction.

Formalizations of Occam are intractable, so Bayes fails. Computable general-purpose rationality needs categorization to be able to creatively generate new ways of representing knowledge which were not initially conceived of, and to integrate concepts from different knowledge domains by grouping (forming analogies) according to similarity metrics.

You sometimes hear it said by Bayesian ideologues that ‘a plane is not held aloft by analogies’. But nor is it held aloft by probabilities!

Or the Bayesian ideologue might say ‘with infinite computing power all possibilities can be considered a priori’. But with infinite knowledge there would be no need for probabilities at all, since you could simply have a complete physical description of a system!

It’s clear to anyone who is not in the grip of a dogmatic insanity that categorization is the real basis of rationality, and Bayes is just a special case of it.

Just as Bayes is needed to represent our ignorance of the states of the external world, so too is Categorization needed to deal with the limits of our own internal knowledge representation systems (limited time and computing power).

mjgeddes, I think you are saying what I am saying: Bayesian reasoning is fine within a framework in which we have our categorizations correct, but when we don’t, the process of Bayesian updating fails to capture what really needs to be done, which is to re-categorize. Is that what you are saying?

Yes, I think you’ve nailed it exactly! So long as all the concepts we are using are well defined, Bayes will work to correctly predict outcomes. But when new insights/discoveries lead to the need to invent/extend certain types of concepts and relate them to the old ones, Bayes can fail. The process of recategorizing (forming analogies relating different concepts) will always require conscious creativity, and can never be fully reduced to mere probability shuffling.