Is learning theory really answering the right questions?

More precisely, does working on (statistical) learning theory bounds really help in designing better algorithms?

Let me first say that this note is not against anyone in particular.

I know most of the people working on SLT bounds and have a lot of respect for them, and I have myself written papers that tried to justify an algorithm through theoretical analysis. So I am questioning the approach, not judging anyone who follows it!

I have spent a lot of time thinking about what theory has really brought that could not have been obtained without it, because that is really the question. If you consider the algorithms that people actually use in practice, there are two important questions: whether any of those algorithms could not have been inspired by considerations of a non-SLT nature, and whether the SLT analysis brings an understanding of why they work.

Regarding the first one (inspiration), it is risky to say a posteriori what could have inspired an algorithm. But I believe that the effort spent on proving bounds and then deriving a criterion to minimize would be better spent trying to find new criteria directly. Indeed, there are surely many ideas that we (theoreticians) refrain from trying, or even expressing, because we do not see how they connect to the standard patterns of SLT analysis!

Regarding the second one (whether bounds justify an algorithm), I would be even less confident: the only thing we can infer from most learning theory bounds is that the algorithm under study is not meaningless, that is, that it will eventually make good predictions given enough samples. But these bounds cannot really help in comparing two algorithms, and they rarely (if ever) justify the use of a specific criterion.
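To make the point concrete, here is the kind of statement most such bounds deliver; this classical VC-style bound is given purely as an illustration of the genre (the post discusses such bounds generically). With probability at least $1 - \delta$ over an i.i.d. sample of size $n$, uniformly over a hypothesis class of VC dimension $d$:

```latex
R(h) \;\le\; \hat{R}_n(h) \;+\; \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}}
```

The excess term vanishes as $n \to \infty$, which is exactly the "not meaningless" guarantee: the algorithm eventually predicts well. But since the bound is typically loose by a wide margin, it says little about whether one algorithm beats another on a given dataset.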

So I think that a theoretical analysis is mainly a way to popularize an algorithm and raise its visibility. The effect is that more people try it out and streamline it, so in the end the algorithm may be adopted; but the theoretical analysis rarely justifies the algorithm and never provides guarantees. Hence theory is a way to attract attention to an algorithm.

It should also be a way to get new insights for the development of new algorithms, but this happens much less frequently than is claimed!

Hi Olivier,

just wanted to follow up on our very pleasant discussion at ALT06. You raise a very important question -- unlike pure mathematicians, we machine-learning theorists are not immune from challenges to justify our work, and ideally, we aspire for our theory to have maximal impact in the real world.

If memory serves me right, our discussion went something like this. There are some very simple, intuitive algorithms (SVMs, boosting) that are quite effective in practice, and they generate hundreds of pages of analysis to justify what we already knew (i.e., that they work well). If that were the whole story, the state of affairs for SLT would be pretty bleak!

Let me try to inject some optimism here. First, an example. I don't know if non-parametric regression counts as learning proper, but if it does, then surely the recent result of Lafferty and Wasserman,

https://www.cs.cmu.edu/~lafferty/pub/rodeo.pdf

is a success story for SLT. Here is a clear example where sophisticated statistical analysis led directly to a simple-to-implement but not exactly "intuitive" algorithm, and I'm hard pressed to imagine anyone coming up with something like this on intuition alone.

More generally:

1. As our mathematical tools advance, we should expect to see more such algorithms derived from "first principles". The analysis of an algorithm (proof of correctness and efficiency) is almost invariably much more complicated than the algorithm itself; this is true not just for learning (isn't Quicksort still being analyzed?). So if it seems that the theory is running to catch up with what's "known" in practice -- well, in some sense, that's the natural state of affairs. Which brings us to

2. What does it mean to say that an algorithm "works"? It's an ill-posed question, as the notion means different things in different contexts. For an industry application, a lengthy and opaque proof of convergence might well be superfluous; much more indicative is the algorithm's actual performance on diverse datasets. Such an approach would be called "heuristic" or "ad hoc" -- neither being a compliment coming from a theorist (snobbery, as always, works in both directions). How much faith can you place in a tool that seems to work well, but you really don't know why, or whether it actually does, and what its scope and limitations are? Which brings us to

3. What is the purpose of a proof? Non-mathematicians often mistakenly assume that the purpose of proving an assertion is to convince others of its validity. I can think of much more efficient ways to accomplish this task! Take Pythagoras's theorem as an example. If my goal were merely to convince you that in a right triangle a^2 + b^2 = c^2, I'd make you draw a thousand right triangles and painstakingly measure their sides. Something tells me that by the thousandth example, you wouldn't have a shred of doubt regarding the validity of the claim. What you wouldn't have is any insight into the problem or any idea *why* the claim holds. If asked to conjecture how it might extend to general triangles, your best bet would be to go back to the drawing board.
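The thousand-triangles experiment is easy to simulate. A minimal sketch (the range of leg lengths is an arbitrary choice for illustration):

```python
import math
import random

# "Draw" 1000 right triangles with random legs and "measure" the sides.
random.seed(0)
max_relative_error = 0.0
for _ in range(1000):
    a = random.uniform(0.1, 10.0)  # one leg
    b = random.uniform(0.1, 10.0)  # other leg
    c = math.hypot(a, b)           # measured hypotenuse, sqrt(a^2 + b^2)
    # Check a^2 + b^2 = c^2, up to floating-point rounding.
    max_relative_error = max(
        max_relative_error,
        abs(a * a + b * b - c * c) / (c * c),
    )

print(max_relative_error)  # vanishingly small: consistent with a^2 + b^2 = c^2
```

After a thousand trials the claim is beyond empirical doubt, yet the loop offers no hint of *why* it holds, which is exactly the point.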

What if, instead of giving you 1000 examples, I were to give you a simple proof? Paradoxically, seeing a proof for the first time might well be less persuasive than those 1000 examples -- proofs, after all, take time to decipher and internalize. But once you understand the proof, you have a whole new handle on the problem. Some of the simplest proofs of Pythagoras's theorem rely on similar triangles; upon seeing one, you might have some immediate ideas regarding generalizations.

All of this carries over to SLT. The oft-quoted aphorism "There is nothing so practical as a good theory" [usually attributed to Vapnik but, as I learned from your blog, actually due to Kurt Lewin] is not an empty cliche. A theoretician does not toil in sweat and tears to analyze an algorithm just to give the engineer the green light to use it (the engineer never needed such a green light!). A theorist strives to understand under what precise assumptions the algorithm will work, which examples will break it, and what its optimal scope of application is. How often does this lead to new and improved algorithms? It would be fascinating to do a case study, and I'd be very curious to get input from some of the more knowledgeable readers (my recent work has been so abstractly mathematical in nature that I'm afraid it doesn't qualify as machine learning theory, so I'm out of the loop).

So I close on an optimistic note. Understanding how and why things work is always a good thing, and that's precisely what we need theory for. And if it seems like the theory isn't giving us enough in the way of practical methods -- well, the theory is still young and developing, so the ultimate solution is to develop more theory!

-Leo

Posted by: Leonid Kontorovich | October 15, 2006 at 07:36 AM

hey Olivier,

Interesting post. I just want to add one comment. One additional benefit of theoretical results, in my experience, is that they build confidence -- and, where appropriate, suspicion -- about the conclusions drawn from empirical results, particularly when comparing two approaches: they help us call a comparison "fair", identify the data domains where its results should hold, and raise eyebrows when the results do not turn out the way our intuition or theory suggests they should. Although this feedback is not direct and explicit, it does help in the process of building robust algorithms. I guess it is more of a personal experience that is not reported in papers; nevertheless, it is a critical component of the process.

Posted by: Vidit Jain | November 10, 2008 at 05:05 PM
