This post is quite technical, but this question has been obsessing me for a while.
The starting point is the theorems about universal consistency of learning algorithms: a learning algorithm is universally (strongly) consistent provided, for any probability measure, as the sample size grows to infinity, the error of the learning algorithm converges (almost surely) to the Bayes error (the best possible error).
There is no issue in countable spaces, because everything in the above statement can be defined in a straightforward way.
However, if you work in an uncountable space (like the set of real numbers), things get more complicated. Of course one may argue that it is unnatural to model data processing phenomena (which necessarily deal with finite precision numbers in practice) with such an abstraction as the set of real numbers, but it is so convenient that nobody really argues...
So, in the real numbers, to define probabilities properly, one has to introduce a sigma-algebra, that is a collection of sets (with certain properties) on which the probabilities are defined. The elements of the sigma-algebra are called "measurable" and they are the only ones for which probabilities make sense. The point is that not every subset of the reals is measurable (at least in the standard Lebesgue construction) and this has to do with the Axiom of Choice. A consequence of this construction is that the "Bayes classifier" (i.e. the best possible prediction model for classification) is a measurable function. And the existence of universally consistent learning algorithms is a consequence of the fact that every measurable function can be approximated by "simple" functions which you can learn from finite samples.
In other words, when your input space is the set of real numbers, universal consistency of a learning algorithm is related to the measurability property.
So the questions that comes to mind are:
- Can one imagine a model of learning on the real numbers where universally consistent algorithm exist but where there is no requirement of measurability for the Bayes classifier?
- Would it help to work in a context where the axiom of choice is replaced by an axiom such as "all subsets of R are measurable"?
- Should one just forget about the real numbers when dealing with learning theoretic questions and rather use another model?