Is learning theory really answering the right questions?
More precisely, does working on (statistical) learning theory bounds really help design better algorithms?
Let me first say that this note is not against anyone in particular.
I know most of the people working on SLT bounds and I have a lot of respect for them; I have also written papers that tried to justify an algorithm through a theoretical analysis. So I am questioning the approach, not judging anyone who follows it!
I have spent a lot of time thinking about what theory has really brought that could not have been obtained without it, because that is really the question. If you consider the algorithms that people use in practice, the two important questions are whether any of those algorithms could not have been inspired by considerations of a non-SLT nature, and whether the SLT analysis brings an understanding of why they work.
Regarding the first one (inspiration), it is risky to say a posteriori what could have inspired an algorithm. But I believe the effort spent on proving bounds and then extracting a criterion to minimize would be better spent trying to find new criteria directly. Indeed, there are surely many ideas that we (theoreticians) refrain from trying, or even expressing, because we do not see how they connect to the standard patterns of SLT analysis!
Regarding the second one (whether bounds justify an algorithm), I would be even less confident: the only thing we can infer from most learning theory bounds is that the algorithm under study is not meaningless, that is, it will eventually make good predictions (given enough samples). But these bounds cannot really help us compare two algorithms, and they rarely (if ever) justify the use of a specific criterion.
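To be concrete about the kind of statement I have in mind, take the textbook uniform convergence bound (stated here only for illustration): for a finite class \mathcal{F} and a loss bounded in [0,1], with probability at least 1-\delta over an i.i.d. sample of size n, simultaneously for every f in \mathcal{F}, and hence for the \hat{f} returned by the algorithm,

\[
R(\hat{f}) \;\le\; \hat{R}(\hat{f}) + \sqrt{\frac{\log|\mathcal{F}| + \log(1/\delta)}{2n}},
\]

where R is the risk and \hat{R} the empirical risk. The right-hand side vanishes as n grows, which is exactly the "not meaningless" statement; but since the same inequality covers every procedure that picks its hypothesis from \mathcal{F}, it offers no way to prefer one such procedure over another.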
So, I think that the theoretical analysis is mainly a way to popularize an algorithm and to raise its visibility. The effect is then that more people try it out, and streamline it. So in the end, the algorithm may be adopted, but a theoretical analysis rarely justifies the algorithm and never provides guarantees. Hence theory is a way to attract attention to an algorithm.
It should also be a way to get new insights for the development of new algorithms, but this happens much less frequently than is claimed!
Hi Olivier,
just wanted to follow up on our very pleasant discussion at ALT06. You
raise a very important question -- unlike pure mathematicians, we
machine-learning theorists are not immune from challenges to justify
our work, and ideally, we aspire for our theory to have maximal impact
in the real world.
If memory serves me right, our discussion went something like this. There are some very simple, intuitive algorithms (SVMs, boosting) that are quite effective in practice, and that have generated hundreds of pages of analysis justifying what we already knew (i.e., that they work well). If that were the whole story, the state of affairs for SLT would be pretty bleak!
Let me try to inject some optimism here. First, an example. I don't
know if non-parametric regression counts as learning proper, but if it
does, then surely the recent result of Lafferty and Wasserman,
http://www.cs.cmu.edu/~lafferty/pub/rodeo.pdf
is a success story for SLT. Here is a clear example where sophisticated statistical analysis led directly to a simple-to-implement but not exactly "intuitive" algorithm, and I'm hard pressed to imagine anyone coming up with something like this on intuition alone.
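(To give a rough flavor of the idea -- and this is only a toy sketch of greedy per-coordinate bandwidth selection, not the actual rodeo procedure from the paper:)

```python
# Toy sketch: greedy per-coordinate bandwidth selection for kernel regression
# at a single query point x0. Start with large bandwidths and keep shrinking
# the bandwidth of a coordinate only while doing so still changes the local
# estimate noticeably, so that irrelevant coordinates keep a wide bandwidth
# and are effectively averaged out.
import numpy as np

def kernel_estimate(x0, X, y, h):
    """Nadaraya-Watson estimate at x0 with per-coordinate bandwidths h."""
    w = np.exp(-0.5 * np.sum(((X - x0) / h) ** 2, axis=1))
    return float(np.sum(w * y) / (np.sum(w) + 1e-12))

def greedy_bandwidths(x0, X, y, h0=2.0, beta=0.7, h_min=0.1, tol=0.05):
    d = X.shape[1]
    h = np.full(d, h0)
    active = np.ones(d, dtype=bool)
    while active.any():
        for j in np.where(active)[0]:
            h_try = h.copy()
            h_try[j] *= beta
            if h_try[j] < h_min:
                active[j] = False
                continue
            # Shrink h_j only if the local estimate is still sensitive to it.
            if abs(kernel_estimate(x0, X, y, h_try) -
                   kernel_estimate(x0, X, y, h)) > tol:
                h = h_try
            else:
                active[j] = False  # coordinate looks locally irrelevant: freeze it
    return h

# Toy data: the response depends on the first coordinate only.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 5))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(500)
x0 = np.array([0.5, 0.0, 0.0, 0.0, 0.0])
# The bandwidth for coordinate 0 should come out smaller than for the
# irrelevant coordinates (exact values depend on the sample).
print("selected bandwidths:", np.round(greedy_bandwidths(x0, X, y), 2))
```

The real procedure replaces this crude "did the estimate change?" test with a statistic and thresholds derived from the analysis, which is precisely where the theory earns its keep.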
More generally:
1. As our mathematical tools advance, we should expect to see more such algorithms derived from "first principles". The analysis of an
algorithm (proof of correctness and efficiency) is almost invariably
much more complicated than the algorithm itself; this is true not just
for learning (isn't Quicksort still being analyzed?). So if it seems
that the theory is running to catch up to what's "known" in practice
-- well, in some sense, that's the natural state of affairs. Which
brings us to
2. What does it mean to say that an algorithm "works"? It's an
ill-posed question, as the notion means different things in different
contexts. For an industry application, a lengthy and opaque proof of
convergence might well be superfluous; much more indicative is the
algorithm's actual performance on diverse datasets. Such an approach
would be called "heuristic" or "ad-hoc" -- neither a compliment coming
from a theorist (snobbery, as always, works in both directions). But how much faith can you place in a tool that seems to work well, when you don't really know why it works, whether it actually does, or what its scope and limitations are? Which brings us to
3. What is the purpose of a proof? Non-mathematicians often mistakenly
assume that the purpose of proving an assertion is to convince others
of its validity. I can think of much more efficient ways to accomplish
this task! Take Pythagoras's theorem as an example. If my goal is
merely to convince you that in a right triangle, a^2 + b^2 = c^2, I'd
make you draw a thousand right triangles and painstakingly measure
their sides. Something tells me that by the thousandth example, you
wouldn't have a shred of doubt regarding the validity of the
claim. What you wouldn't have is any insight into the problem or any
idea *why* this claim holds. If asked to conjecture how this might
extend to general triangles, your best bet would be to go back to the
drawing board.
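(If you prefer, the modern version of that exercise is a few lines of code -- a throwaway sketch, of course:)

```python
# "Proof" by a thousand examples: build each right triangle from a hypotenuse
# length and an acute angle, then check that a^2 + b^2 = c^2 holds up to
# floating-point error.
import math
import random

random.seed(0)
for _ in range(1000):
    c = random.uniform(1.0, 100.0)                    # hypotenuse length
    alpha = random.uniform(0.01, math.pi / 2 - 0.01)  # one acute angle
    a = c * math.sin(alpha)                           # leg opposite alpha
    b = c * math.cos(alpha)                           # leg adjacent to alpha
    assert abs(a ** 2 + b ** 2 - c ** 2) < 1e-9 * c ** 2
print("1000 right triangles measured; not a shred of doubt left -- "
      "and not a shred of insight into *why* it holds, either.")
```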
What if, instead of giving you 1000 examples, I were to give you a
simple proof? Paradoxically, seeing a proof for the first time might
well be less persuasive than those 1000 examples -- proofs, after all,
take time to decipher and internalize. But once you understand the
proof, you have a whole new handle on the problem. Some of the
simplest proofs of Pythagoras's theorem are geometric; upon seeing one, you might have
some immediate ideas regarding generalizations.
All of this carries over to SLT. The oft-quoted aphorism "There is
nothing so practical as a good theory" [usually attributed to Vapnik, but, as I learned from your blog, actually due to Kurt Lewin] --
is not an empty cliche. A theoretician does not toil in sweat and
tears to analyze an algorithm just to give the engineer the green
light to use it (the engineer never needed such a green light!). A
theorist strives to understand under what precise assumptions the
algorithm will work, which examples will break it, and what is its
optimal scope of application. How often does this lead to new and
improved algorithms? It would be fascinating to do a case study, and
I'd be very curious to get input from some of the more knowledgeable
readers (my recent work has been so abstractly mathematical in nature that I'm afraid it doesn't qualify as machine learning theory, so I'm out of the loop).
So I close on an optimistic note. Understanding how and why things
work is always a good thing, and that's precisely what we need theory
for. And if it seems like the theory isn't giving us enough as far as
practical methods -- well, the theory is still young and developing,
so the ultimate solution is to develop more theory!
-Leo
Posted by: Leonid Kontorovich | October 15, 2006 at 07:36 AM
hey Olivier,
Interesting post. I just want to add one comment. One additional benefit of theoretical results that I have found useful is that they build confidence, and sometimes healthy suspicion, about the conclusions drawn from experimental results, particularly when comparing two approaches: they help decide whether the comparison is "fair", identify the data domain(s) in which the comparison's outcome should hold, and raise an eyebrow when the results do not turn out the way our intuition or the theory suggests they should. Although this feedback is not direct and explicit, it does help in the process of building robust algorithms. I guess this is more of a personal experience that is not reported in papers; nevertheless, it is a critical component of the process.
Posted by: Vidit Jain | November 10, 2008 at 05:05 PM