The Bayesian modeling approach is very convenient for designing learning algorithms: you encode your prior knowledge and assumptions into a prior distribution and a likelihood function, and then simply turn the crank of Bayes' rule. Of course, this usually raises hard computational problems, but no conceptual ones.
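In symbols, the machinery being unrolled is just Bayes' rule applied to a parameter $\theta$ and data $D$:

```latex
p(\theta \mid D) \;=\; \frac{p(D \mid \theta)\, p(\theta)}{p(D)}
\;\propto\; \underbrace{p(D \mid \theta)}_{\text{likelihood}}\; \underbrace{p(\theta)}_{\text{prior}}
```

The conceptual work is all in choosing the prior and the likelihood; the computational difficulties come from the normalizing constant $p(D)$ and from integrating over the posterior.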
So-called "frequentists" usually try to justify the algorithms they design by a theoretical analysis in the i.i.d. framework. Hence it is common practice (I am not saying that I like this practice) to use a probabilistic error bound as the justification for a new algorithm.
The problem is that it is usually very hard to obtain such a bound and there is no unique way or guiding principle for doing so.
Fortunately, people have recently obtained bounds that involve a prior over a class of functions. They are called "PAC-Bayesian" bounds and are quite easy to apply to several different algorithms.
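A representative example is McAllester's bound (constants vary across the literature). Fix a prior $\pi$ over the function class before seeing the data; then with probability at least $1-\delta$ over an i.i.d. sample of size $n$, simultaneously for all "posterior" distributions $\rho$:

```latex
\mathbb{E}_{f \sim \rho}\big[R(f)\big]
\;\le\;
\mathbb{E}_{f \sim \rho}\big[\hat{r}(f)\big]
\;+\;
\sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

Here $R(f)$ is the true risk, $\hat{r}(f)$ the empirical risk on the sample, and $\mathrm{KL}(\rho\,\|\,\pi)$ the Kullback-Leibler divergence between posterior and prior. The key point is that the bound holds uniformly over $\rho$, so $\rho$ may be chosen after seeing the data.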
In a way this gives rise to a new principled method for producing new algorithms. Here is the recipe:
- You encode your prior assumptions into a prior distribution (just like in the Bayesian case)
- You plug this distribution into a PAC-Bayesian bound
- You use the bound as an objective criterion, so that the algorithm is simply minimizing the bound
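The recipe above can be sketched on a toy problem. The code below is an illustrative example, not anyone's published algorithm: it takes a finite hypothesis class with made-up empirical risks, uses a uniform prior, restricts the posterior to the Gibbs family $\rho_\lambda(f) \propto \pi(f)\,e^{-\lambda n \hat{r}(f)}$ (a standard family in the PAC-Bayes literature), and picks the temperature $\lambda$ that minimizes a McAllester-style bound.

```python
import math

def kl(rho, pi):
    # KL divergence between two categorical distributions over the same class
    return sum(r * math.log(r / p) for r, p in zip(rho, pi) if r > 0)

def mcallester_bound(emp_risks, rho, pi, n, delta):
    # E_rho[empirical risk] + sqrt((KL(rho||pi) + ln(2*sqrt(n)/delta)) / (2n))
    avg_risk = sum(r * e for r, e in zip(rho, emp_risks))
    complexity = (kl(rho, pi) + math.log(2 * math.sqrt(n) / delta)) / (2 * n)
    return avg_risk + math.sqrt(complexity)

def gibbs_posterior(emp_risks, pi, lam, n):
    # rho(f) proportional to pi(f) * exp(-lam * n * empirical_risk(f))
    w = [p * math.exp(-lam * n * e) for p, e in zip(pi, emp_risks)]
    z = sum(w)
    return [x / z for x in w]

# Toy setup: 4 hypotheses with hypothetical empirical risks on n samples
emp_risks = [0.40, 0.25, 0.10, 0.35]
pi = [0.25] * 4          # uniform prior, fixed before seeing the data
n, delta = 1000, 0.05

# "The algorithm is simply minimizing the bound": search over temperatures
best = min(
    (mcallester_bound(emp_risks, gibbs_posterior(emp_risks, pi, lam, n),
                      pi, n, delta), lam)
    for lam in [0.001, 0.01, 0.1, 1.0]
)
print("best bound %.3f at lambda=%s" % best)
```

Low temperatures keep the posterior close to the prior (small KL term, large empirical risk); high temperatures concentrate on the empirically best hypothesis (small risk, KL term near $\ln 4$). The bound arbitrates the trade-off, which is exactly the role the prior-to-posterior divergence plays in these bounds.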
This gives something that is similar in spirit to the MAP (maximum a posteriori) method. In my opinion, this is not any better or worse than MAP and there is no reason to believe that this way of designing algorithms is better than the Bayesian way.
Unfortunately, people tend to think that because the algorithm was derived from a bound, which is some sort of formal mathematical statement, it should be a better algorithm than if this were not the case.