In a recent issue of The Economist, there is a very nice article (see here) about how everyday reasoning can be compared to Bayesian inference.
This article is based on a recent paper by Griffiths and Tenenbaum (see here). They asked several people questions such as "How long do you think a man who is xx years old will live?", and it turns out that the answers matched very well those that would have been obtained by applying Bayes rule. Moreover, they tried this with several different types of questions, for which the implicit priors are very different (Gaussian, Erlang or power-law distributions), and in all cases the intuitive answers given by people had the right form (in terms of distribution).
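For concreteness, here is the prediction rule at work, as far as I understand it from the paper (a sketch: t is the observed age or duration, t_total the total one, and the likelihood comes from assuming t is sampled uniformly from [0, t_total]):

$$
p(t_{\mathrm{total}} \mid t) \;\propto\; p(t \mid t_{\mathrm{total}})\, p(t_{\mathrm{total}}) \;=\; \frac{p(t_{\mathrm{total}})}{t_{\mathrm{total}}} \qquad \text{for } t_{\mathrm{total}} \ge t,
$$

and the prediction reported is the posterior median, i.e. the value $t^*$ such that $P(t_{\mathrm{total}} > t^* \mid t) = 1/2$.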
What they conclude from this is that the way people intuitively reason about the world is quite similar to applying Bayesian inference.
What is intriguing is that the article in The Economist tries to read into this a proof that the Bayesian point of view dominates the frequentist one. Also, in the paper by Griffiths and Tenenbaum, the term "optimal" is used when talking about Bayes rule. I think this is very misleading and inaccurate.
Indeed, the only conclusion one should draw from this study is that the way people naturally make inferences about events in the world is largely rational. This confirms what has been observed many times before: the intuitive notion of rationality we have matches very well the rules of the calculus of probabilities.
But this is no surprise, because these rules were designed to be intuitively rational (what else?). What is interesting is that rationality necessarily leads to these rules and no others (this is essentially what results such as Cox's theorem establish), but this has been known for years.
I do not see what this study has to do with the debate between Bayesians and frequentists. First of all, there is no real opposition between these points of view: they lead to the same rules for combining probabilities, the only difference being the meaning associated with these probabilities. So this debate is mostly philosophical and should not interfere with cognitive science studies, nor (even less) with machine learning.
Hi Olivier! Frankly I am very confused by the distinction between the Bayesian and frequentist approaches, which is drawn so often. I think Bayesianism is a particular way of incorporating prior information -- by choosing a probability distribution on the hypothesis space. However _any_ inference procedure always uses a prior of some sort.
The results in that paper basically show that human choice can be reasonably modelled by a certain Bayesian procedure under certain circumstances. However it seems to be a far-reaching conclusion indeed (which, I think, is only made in the Economist) that the brain itself uses Bayesian inference.
Btw, I was amused by the claim that the cake baking time distribution is more "complex and irregular" than that for human lifespan.
Posted by: misha b | January 29, 2006 at 05:37 PM
Hi Misha,
Thanks for this comment.
I perfectly agree with you: any inference has to rely on some prior assumptions. It seems that some people like to see conflicts or diverging opinions everywhere (especially journalists in this case).
To me there are several levels at which one can take the Bayesian point of view (and usually these are all mixed up into one single thing):
1) Bayesianism in the interpretation of probability: the subjective interpretation of probability is often opposed to the objective one. Typically, people make the associations Bayesian=subjective and frequentist=objective. Things are a little more subtle than this, but in some sense it is really a philosophical debate (a very interesting one indeed). The question is often summarized as "Do probabilities reflect some intrinsic property of the objects causing events in the world, or do they only measure someone's belief in the possible occurrence of these events?"
Answering this question in one way or the other should not have much to do with how to perform inference.
2) Bayesian inference: applying Bayes rule to update one's probabilities when observing new data is shown to be the most rational thing to do. However, these probabilities usually come from a "prior", which is something one cannot prove or disprove; it is just an assumption. So if you want to be consistent in the way you manipulate your assumptions, it makes sense to use Bayes rule. This is probably the only thing reflected by this paper. However, it does not mean that learning algorithms based on Bayes rule are necessarily "optimal". Indeed, there are many ways to perform inference, and using probabilities to represent the weights you assign to each possible hypothesis is only one of them.
3) Bayesian analysis of inference procedures: when one analyzes a learning algorithm theoretically, one may try to assess how well this algorithm performs when success is measured as an average over many possible situations (weighted by some "prior"). This is the so-called "average-case" analysis. It is perfectly fine, but again, measuring success in this way is just a choice and does not imply that on a particular problem a particular algorithm will do better. It does happen that, in order to optimize this measure of success, one may use Bayes-rule-type algorithms. But this is not a justification, because the prior is used to measure success, so it is clear that the algorithm has to be based on this prior... (there is some circularity)
4) Bayesian algorithms: it is perfectly fine and often efficient to use Bayes rule to build learning algorithms. The nice thing about it is that the probabilistic setting is very convenient for expressing all sorts of prior assumptions one may have, and in a way it tells you how to combine this prior knowledge with the observed data (a toy sketch of this combination is given after this list). Again, this does not give any optimality guarantee, especially because the optimality is with respect to a criterion which does not make sense in many problems. But this approach, like many others, may give good algorithms in practice.
5) Bayesian interpretation of the prior: some people say that hard-core Bayesians do believe in their priors. I guess this does not mean they believe Nature generates problems according to their priors, but rather that they believe they have incorporated into the prior all the knowledge they have about the problem at hand. This is fine, but I doubt anyone can sincerely claim this except in very special cases.
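To make point 4 a bit more concrete, here is a minimal toy sketch (my own example, not something from the paper or the article) of how Bayes rule combines a prior assumption with observed data, using a Beta prior on the bias of a coin:

```python
# Toy illustration of point 4: a Beta(a, b) prior encodes an assumption about a
# coin's bias before seeing data; Bayes rule turns it into the posterior
# Beta(a + heads, b + tails) after observing a sequence of flips.

def beta_posterior(a, b, flips):
    """Return the Beta posterior parameters after observing a list of 0/1 flips."""
    heads = sum(flips)
    tails = len(flips) - heads
    return a + heads, b + tails

a, b = 5.0, 5.0                    # prior assumption: the coin is probably close to fair
flips = [1, 1, 0, 1, 1, 1, 0, 1]   # observed data
a_post, b_post = beta_posterior(a, b, flips)

# The posterior mean sits between the prior mean (0.5) and the empirical
# frequency of heads: this is how the prior knowledge and the data are combined.
print("posterior mean of the bias:", a_post / (a_post + b_post))
```

Of course, as points 2 and 3 say, nothing here is "optimal" in an absolute sense: the answer is only as good as the prior assumption Beta(5, 5).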
So to conclude, I think it is not possible to say that there are two opposed clans. The point is rather that all these ideas should be considered with care, and one should try to see exactly what the statements being made mean, rather than repeating them without justification (e.g. "Bayes rule is optimal").
Posted by: Olivier Bousquet | January 29, 2006 at 11:12 PM
Olivier, nice and crisply formulated points, thanks!
I have re-read the paper more carefully and have become a bit perplexed -- what is the importance of the Gaussian, Erlang, etc. models, when the true distribution is given to us as (presumably) a histogram?
The only (Bayesian) assumption that is made is that p(t_total|t) ~ p(t_total) * 1/t_total.
Once we believe this, we can just integrate numerically to find the median of the conditional distribution p(t_total | t) and test it against the human judgement directly.
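For what it's worth, here is a small numerical sketch of that computation (my own illustration; the histogram below is made up, not taken from the paper):

```python
import numpy as np

# Start from a histogram prior p(t_total), apply the assumption
#   p(t_total | t) ~ p(t_total) / t_total   for t_total >= t,
# normalize, and read off the posterior median.

def posterior_median(t, support, prior):
    """Posterior median of t_total given that the phenomenon has already lasted t."""
    prior = np.asarray(prior, dtype=float)
    post = np.where(support >= t, prior / support, 0.0)  # unnormalized posterior
    post /= post.sum()
    cdf = np.cumsum(post)
    return support[np.searchsorted(cdf, 0.5)]            # smallest value with cdf >= 1/2

# Made-up "lifespan" histogram on ages 1..100, roughly bell-shaped around 75.
support = np.arange(1, 101)
prior = np.exp(-0.5 * ((support - 75) / 12.0) ** 2)

print(posterior_median(30, support, prior))  # predicted total lifespan for a 30-year-old
```

The output of such a computation could then be compared to the human judgements directly.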
To take this a bit further, one might take the true conditional distribution, which is no doubt well known for longevity (not for cakes, perhaps). After that, human performance can be compared to the _true_ distribution without any Bayesian (or non-Bayesian, for that matter) priors.
Even in that case, of course, optimality is still a question. Who is to say that humans prefer medians to means?
Am I missing something?
Posted by: misha b | January 31, 2006 at 06:19 AM
Olivier,
Do you see Bayesian/frequentist disagreements as not bearing on machine learning because of the nature of machine learning, or because the interpretations of probability make no difference anywhere?
Experimental physicists like D'Agostini (http://www.roma1.infn.it/~dagos/) and Dose (http://www.ipp.mpg.de/OP/Datenanalyse/Publications/bib/node1.html) seem to think being a Bayesian makes a big difference to the way you reason from experimental data. But it might be that in a field where there is typically very little background knowledge of the domain or of the instruments measuring the data, and where one is interested more in discrimination, the difference is much less important.
Posted by: David Corfield | March 02, 2006 at 03:35 PM
Hello,
Thank you for those clarifications about the supposed "optimality" of Bayes rule.
But what's not clear to me is this sentence: "So this debate is mostly philosophical and should not interfere with [...] machine learning."
To me, the debate about the meaning you give to the probability "tool" does have consequences.
For instance, if you're a pure frequentist, it's wrong to assign a pdf to a parameter of a model, because the parameter is not a random variable. This has concrete consequences: you can't design a learning algorithm that makes probabilistic inferences about such parameters.
Depending on your philosophical understanding, you will design different algorithms.
(Another example: it makes almost no sense to a subjectivist to compute confidence intervals. The computation isn't false, but the resulting interval isn't what we intuitively desired.)
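To illustrate the kind of difference I mean, here is a toy sketch (my own, purely illustrative) contrasting the two intervals for the bias p of a coin after 7 heads in 10 flips:

```python
from scipy import stats

heads, n = 7, 10
p_hat = heads / n

# 95% Wald confidence interval: a statement about the *procedure* -- over many
# repeated experiments, intervals built this way would cover the true p about
# 95% of the time.
half_width = 1.96 * (p_hat * (1 - p_hat) / n) ** 0.5
conf_int = (p_hat - half_width, p_hat + half_width)

# 95% credible interval from a uniform Beta(1, 1) prior: a statement about the
# posterior *belief* in p given this one data set.
cred_int = (stats.beta.ppf(0.025, 1 + heads, 1 + n - heads),
            stats.beta.ppf(0.975, 1 + heads, 1 + n - heads))

print("confidence interval:", conf_int)
print("credible interval:  ", cred_int)
```

The numbers can be close, but the statements they make are not the same.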
Briefly, I'm not convinced that the debate is "just philosophical". (But I do not pretend to be right ;-)
Posted by: Pierre | March 17, 2006 at 02:40 PM
On Bayesian vs frequentist, I totally side with the Bayesians. The reason is simple: I view all mathematical systems as abstract models, and the fundamental question is whether a particular physical phenomenon matches a model. If it does, you use the model to understand the physical phenomenon. The basic idea of Bayesianism is to extend the mathematical model of probability to a larger class of physical problems than frequentists do. The Bayesians have demonstrated fairly convincingly that probability can be used as a generalized form of logic.
Neapolitan (Learning Bayesian Networks) describes several studies where scientists have tried to find out how Bayesian human reasoning is. He did a study with Morris titled "Examination of a Bayesian Network Model of Human Causal Reasoning". His finding was that humans use Bayesian reasoning for problems with a small number of variables, but for more complex problems the correlation of human reasoning with Bayesian reasoning declined.
Posted by: assman | August 20, 2006 at 06:22 PM
I guess frequentists worry about things like asymptotic consistency in the absence of a priori information. Since this question lies outside the modelling assumptions of Bayesianism, most Bayesians don't worry about it. However, sometimes they do. Persi Diaconis and David Freedman have a few papers on this. They show that inconsistency of Bayes estimates in non-parametric situations can arise quite commonly when working with hierarchical priors (as seen commonly in today's ML literature). In some special cases they show that by choosing your prior carefully, you can avoid these situations.
Posted by: Csaba Szepesvari | April 17, 2007 at 06:03 AM