## Machine Learning (Theory)

Hi Olivier! Frankly, I am very confused by the distinction between the Bayesian and frequentist approaches, which is drawn so often. I think Bayesianism is a particular way of incorporating prior information -- by choosing a probability distribution on the hypothesis space. However, _any_ inference procedure always uses a prior of some sort.

The results in that paper basically show that human choice can be reasonably modelled by a certain Bayesian procedure under certain circumstances. However it seems to be a far-reaching conclusion indeed (which, I think, is only made in the Economist) that the brain itself uses Bayesian inference.

Btw, I was amused by the claim that the distribution of cake baking times is more "complex and irregular" than that of human lifespans.

Hi Misha,
Thanks for this comment.

I perfectly agree with you: any inference has to rely on some prior assumptions. It seems that some people like to see conflicts or diverging opinions everywhere (especially journalists in this case).

To me there are several levels at which one can take the Bayesian point of view (and usually all of these get mixed up into one single thing):

1) Bayesianism in the interpretation of probability: the subjective interpretation of probability is often opposed to the objective one. Typically people make the associations Bayesian=subjective and frequentist=objective. Things are a little more subtle than this, but in some sense it is really a philosophical debate (a very interesting one indeed). The question is often summarized as "Do the probabilities reflect some intrinsic property of the objects causing events in the world, or do they only measure someone's belief in the possible occurrence of these events?"
Answering this question in one way or the other should not have much to do with how to perform inference.

2) Bayesian inference: applying Bayes rule in order to update one's probabilities when observing new data is shown to be the most rational thing to do. However, these probabilities usually come from a "prior", which means something one cannot prove or disprove; it is just an assumption. So if you want to be consistent in the way you manipulate your assumptions, it makes sense to use Bayes rule. This is probably the only thing that is reflected by this paper. However, it does not mean that learning algorithms based on Bayes rule are necessarily "optimal". Indeed, there are many ways to perform inference, and using probabilities to represent the weights you assign to each possible hypothesis is only one way.

3) Bayesian analysis of inference procedures: when one analyzes a learning algorithm theoretically, one may try to assess how well this algorithm can perform when success is measured as an average over many possible situations (weighted by some "prior"). This is the so-called "average-case" analysis. It is perfectly fine, but again, measuring success in this way is just a choice and does not imply that a particular algorithm will do better on a particular problem. However, it happens that in order to optimize this measure of success, one may use Bayes-rule-type algorithms. But this is not a justification, because the prior is used to measure the success, so it is clear that the algorithm has to be based on this prior... (there is some circularity)

4) Bayesian algorithms: it is perfectly fine and often efficient to use Bayes rule to build learning algorithms. The nice thing about it is that the probabilistic setting is very convenient to express all sorts of prior assumptions one may have and in a way, it tells you how to combine this prior knowledge with the observed data. Again, this does not give any optimality guarantee, especially because the optimality is with respect to a criterion which does not make sense in many problems. But this approach, like many others, may give good algorithms in practice.

5) Bayesian interpretation of the prior: some people say that hard-core Bayesians do believe in their priors. I guess this does not mean they believe Nature generates problems according to their priors, but rather that they believe they have incorporated in the prior all the knowledge they have about the problem they consider. This is fine, but I doubt anyone can sincerely claim this except in very special cases.
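
As a minimal sketch of the updating step described in point 2, the snippet below applies Bayes rule to a small discrete set of hypotheses. Everything in it is an illustrative assumption rather than something taken from the paper: the function name `bayes_update`, the three candidate coin biases, the uniform prior and the sequence of flips.

```python
def bayes_update(prior, likelihoods):
    """Return the posterior over hypotheses after one observation.

    prior: dict mapping hypothesis -> prior probability.
    likelihoods: dict mapping hypothesis -> P(observation | hypothesis).
    """
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical example: three candidate coin biases (probability of heads).
biases = [0.2, 0.5, 0.8]
posterior = {b: 1.0 / len(biases) for b in biases}  # uniform prior (an assumption)
for flip in "HHTH":
    # Likelihood of this single flip under each candidate bias.
    likelihoods = {b: (b if flip == "H" else 1.0 - b) for b in biases}
    posterior = bayes_update(posterior, likelihoods)

print(posterior)
```

The update is consistent with whatever prior is supplied, but a different prior gives a different posterior from the same data, so the result is only as good as the assumption it starts from.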

So to conclude, I think it is not possible to say that there are two opposed clans. The only thing is that all these ideas should be considered with care, and one should try to see what exactly the statements being made mean rather than repeating them without reason (e.g. "Bayes rule is optimal").

Olivier, nice and crisply formulated points, thanks!

I have re-read the paper more carefully and have become a bit perplexed -- what is the importance of the Gaussian, Erlang, etc. models, when the true distribution is given to us as (presumably) a histogram? The only (Bayesian) assumption that is made is that
p(t_total | t) ∝ p(t_total) * 1/t_total.
Once we believe this, we can just integrate numerically to find the median of the conditional distribution p(t_total | t) and test it against the human judgement directly.
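
The numerical integration suggested above could look roughly like the following sketch. It assumes the prior p(t_total) is available as a histogram (bin edges plus probability mass per bin), evaluates each bin at its center, and restricts the support to t_total >= t, since the total cannot be shorter than what has already been observed. The function name `posterior_median` and the toy uniform histogram are hypothetical and are not data from the paper.

```python
def posterior_median(bin_edges, prior_mass, t):
    """Median of p(t_total | t) proportional to p(t_total) / t_total, for t_total >= t.

    bin_edges: increasing list of len(prior_mass) + 1 bin boundaries.
    prior_mass: prior probability of each bin (a histogram of t_total).
    """
    centers, weights = [], []
    for lo, hi, mass in zip(bin_edges[:-1], bin_edges[1:], prior_mass):
        center = 0.5 * (lo + hi)
        if center >= t:                    # t_total cannot be below the observed t
            centers.append(center)
            weights.append(mass / center)  # prior mass times the 1/t_total factor
    total = sum(weights)
    if total == 0:
        raise ValueError("no prior mass at or above t")
    cumulative = 0.0
    for center, w in zip(centers, weights):
        cumulative += w / total
        if cumulative >= 0.5:              # first bin where the posterior CDF reaches 1/2
            return center
    return centers[-1]


# Toy prior (made up for illustration): uniform on [0, 100] in bins of width 1.
edges = list(range(101))
mass = [1.0 / 100] * 100
print(posterior_median(edges, mass, t=25))
```

For a uniform prior on [0, T] the posterior density is proportional to 1/t_total on [t, T], whose median is sqrt(t * T), so the toy call should land near 50 up to the bin resolution. Swapping in an empirical histogram only changes the `mass` list.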

To take this a bit further, one might take the true conditional distribution, which is no doubt well-known for longevity (not for cakes, perhaps). After that, human performance can be compared to the _true_ distribution without any Bayesian (or non-Bayesian, for that matter) priors.

Even in that case, of course, optimality is still a question. Who is to say that humans prefer medians to means?

Am I missing something?

Olivier,

Do you see Bayesian/frequentist disagreements as not bearing on machine learning because of the nature of machine learning, or because the interpretations of probability make no difference anywhere?
Experimental physicists like D'Agostini (http://www.roma1.infn.it/~dagos/) and Dose (http://www.ipp.mpg.de/OP/Datenanalyse/Publications/bib/node1.html) seem to think being a Bayesian makes a big difference to the way you reason from experimental data. But it might be that, in a field where there is typically very little background knowledge of the domain or of the instruments measuring the data, and where one is more interested in discrimination, the difference is much less important.

Hello,
Thank you for those clarifications about the supposed "optimality" of Bayes rule.
But what's not clear to me is this sentence: "So this debate is mostly philosophical and should not interfere with[...] machine learning."
For me, this debate on the meaning you give to the probability "tool" has some consequences.
For instance, if you're a pure frequentist, it's wrong to assign a pdf to a parameter of a model, because it's not a random variable. This has consequences, because you can't design a learning algorithm that makes probabilistic inferences about such parameters.
Depending on your philosophical understanding, you will design different algorithms.
(Another example: it makes almost no sense to a subjectivist to compute confidence intervals. The computation isn't wrong, but the resulting interval isn't what we intuitively wanted.)
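
To make the contrast concrete, here is a small sketch that computes both kinds of interval for the same coin-flip data. It rests on assumptions that are not part of the discussion above: a flat prior on the parameter for the Bayesian side, the Wald (normal approximation) interval as one of several possible frequentist constructions, a simple grid approximation instead of exact Beta quantiles, and hypothetical function names and data (7 successes in 20 trials).

```python
import math


def wald_confidence_interval(k, n, z=1.96):
    """Frequentist Wald interval for a Bernoulli parameter (normal approximation)."""
    p_hat = k / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def grid_credible_interval(k, n, mass=0.95, grid_size=10001):
    """Central credible interval under a flat prior, approximated on a grid.

    Here the parameter theta is treated as a random variable with a pdf,
    which is exactly the step a pure frequentist would not take.
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [th ** k * (1.0 - th) ** (n - k) for th in thetas]  # flat prior times likelihood
    total = sum(weights)
    tail = (1.0 - mass) / 2.0
    cumulative = 0.0
    lower = upper = None
    for th, w in zip(thetas, weights):
        cumulative += w / total
        if lower is None and cumulative >= tail:
            lower = th
        if cumulative >= 1.0 - tail:
            upper = th
            break
    return lower, upper


# Made-up data for illustration: 7 successes in 20 trials.
print(wald_confidence_interval(7, 20))
print(grid_credible_interval(7, 20))
```

The two intervals can be numerically close, but they answer different questions: the first describes the long-run behaviour of the procedure over repeated samples, while the second is a statement about the parameter given this data and the assumed prior.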

Briefly, I'm not convinced that the debate is "just philosophical" (but I do not pretend to be right ;-)

On Bayesian vs frequentist, I totally side with the Bayesians. The reason is simple: I view all mathematical systems as abstract models, and the fundamental question is whether a particular physical phenomenon matches a model. If it does, you use the model to understand the physical phenomenon. The basic idea of Bayesianism is to extend the mathematical model of probability to a larger class of physical problems than frequentists do. The Bayesians have demonstrated fairly convincingly that probability can be used as a generalized form of logic.

Neapolitan (Learning Bayesian Networks) describes several studies where scientists have tried to find out how Bayesian human reasoning is. He did a study with Morris titled "Examination of a Bayesian Network Model of Human Causal Reasoning". His finding was that humans use Bayesian reasoning for problems with a small number of variables, but for more complex problems the correlation of human reasoning with Bayesian reasoning declined.

I guess frequentists worry about things like asymptotic consistency in the absence of a priori information. Since this question lies outside the modelling assumptions of Bayesianism, most Bayesians don't worry about it. However, sometimes they do. Persi Diaconis and David Freedman have a few papers on this. They show that inconsistency of Bayes estimates in non-parametric situations can arise quite commonly when working with hierarchical priors (as seen commonly in today's ML literature). In some special cases they show that by choosing your prior carefully, you can avoid these situations.

I agree with you. People have the capacity to think logically.
