Machine Learning Thoughts

Some thoughts about philosophical, theoretical and practical aspects of Machine Learning.

Learning abilities of computers and humans

While commenting on the comments to my previous note (see here), I thought about the comparison of humans and computers in terms of learning performance. Does this comparison even make sense? If it does, on what basis, or with which criteria, can it be made?

If you adopt a purely theoretical point of view, the no free lunch theorem tells you that you just cannot compare two learning machines in general. So all comparisons should be based on a specific, restricted set of reference problems.
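
To illustrate the flavor of this result, here is a minimal sketch under strong assumptions of my own: a toy domain of 8 inputs, binary labels, a fixed train/test split, and a uniform average over all 256 possible target functions. The two learners and all names are hypothetical. It is only the simplest instance of a no-free-lunch style argument, not a statement of the general theorem.

```python
from itertools import product

N_POINTS = 8            # toy input space: the integers 0..7 (an assumption)
TRAIN = range(0, 5)     # inputs whose labels the learner gets to see
TEST = range(5, 8)      # inputs used to measure off-training-set accuracy


def majority_learner(train_labels):
    """Predict the most frequent training label everywhere (ties go to 0)."""
    guess = 1 if 2 * sum(train_labels) > len(train_labels) else 0
    return lambda x: guess


def constant_zero_learner(train_labels):
    """Ignore the training data and always predict 0."""
    return lambda x: 0


def average_off_training_accuracy(learner):
    """Average test accuracy over every possible binary target function."""
    correct = 0
    total = 0
    for target in product((0, 1), repeat=N_POINTS):
        predict = learner([target[i] for i in TRAIN])
        for x in TEST:
            correct += int(predict(x) == target[x])
            total += 1
    return correct / total


print(average_off_training_accuracy(majority_learner))
print(average_off_training_accuracy(constant_zero_learner))
```

Both learners end up with the same average, because for every set of training labels the unseen labels are equally likely to be 0 or 1. A difference between learners only appears once the set of target functions is restricted or weighted non-uniformly.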

If you adopt a Bayesian point of view, learning is very easy: once you have chosen your prior (and likelihood function), it is just a matter of computation to get the posterior. So, again, you cannot compare two learning machines; you can only compare their priors. Their performance will be more or less directly related to how well the prior matches the problem at hand (i.e. how high the prior probability of the problem to be learned is).
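
As a small illustration, consider a sketch where two learners share the same update rule and the same data and differ only in their priors. Everything here is an assumption chosen for simplicity: a coin-flip problem, conjugate Beta priors, and the specific numbers (true bias, prior parameters, observed counts) are hypothetical.

```python
def posterior_mean(prior_heads, prior_tails, heads, tails):
    """Posterior mean of a coin's bias under a Beta(prior_heads, prior_tails) prior.

    With a Beta prior and Bernoulli observations the posterior is
    Beta(prior_heads + heads, prior_tails + tails), so "learning" reduces
    to adding counts.
    """
    return (prior_heads + heads) / (prior_heads + prior_tails + heads + tails)


TRUE_BIAS = 0.9        # hypothetical problem to be learned
HEADS, TAILS = 9, 1    # hypothetical data: 10 flips, shared by both learners

# Same computation, different priors.
well_matched = posterior_mean(9, 1, HEADS, TAILS)    # prior mean 0.9
poorly_matched = posterior_mean(1, 9, HEADS, TAILS)  # prior mean 0.1

print("error with well-matched prior:  ", abs(well_matched - TRUE_BIAS))
print("error with poorly matched prior:", abs(poorly_matched - TRUE_BIAS))
```

The two learners are identical as algorithms; the gap in their estimates after the same ten observations comes entirely from the prior.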

So we may already have very efficient learning algorithms (probably even better than the brain because they can compute more precisely and much faster -- although one can discuss what computing means in this context), and we still believe computers are not able to match human learning performance because we compare them on tasks for which humans have much better priors.

Of course I am not saying anything new here: Bayesians would tell you that this has always been clear for them, you only have to build a good prior and you are done.

But building a good prior is not an easy task: it requires defining the right features, finding the right notion of smoothness... and there is basically no guidance for this! Moreover, it is completely problem-specific. So, apart from helping to implement Bayes' rule efficiently, general (i.e. application-independent) Machine Learning research cannot help much.
One could then draw the conclusion that the essence of the learning problem is not statistical but computational.

But I still think there are important statistical problems, and I will come back to this issue in a future note...

June 14, 2006 in General | Permalink | Comments (16) | TrackBack (0)

Blink

I have just started reading "Blink: The Power of Thinking Without Thinking" by Malcolm Gladwell. Like Freakonomics, this book has sold very well in the US, so I was curious about it.
Overall, it is fun to read, although a bit unorganized. But what is especially striking for me is the main claim that humans can reason unconsciously.
More precisely, there are many situations (and the book gives a large number of surprising yet convincing examples) where humans are able to perform difficult "classification" tasks unexpectedly fast. For example, some art experts are able to tell apart genuine sculptures from fake ones virtually in a blink. Even more surprising: they are completely unable to explain what makes them think a specific sculpture is fake!

It thus seems (and there are plenty of psychological studies about this) that, with enough training, humans are able to learn very difficult classification tasks (in the classical Machine Learning sense), including tasks that are not natural.

Let me try to explain what is new here.
We know that humans are very good at learning certain classification tasks: young children can classify objects from images very easily and get a much better performance than any computer to date.
Also we know that once this has been learned, the actual classification of any new image can be done in a few milliseconds.
Hence, with enough training, the brain is able to perform this complicated task very easily, without requiring any conscious reasoning to take place.

However, I used to think that the tasks we can learn easily are those for which we have a sufficiently strong prior encoded into our genes. In other words, I thought that the ability to learn visual classification tasks was the result of a long natural evolution (which provides us with the appropriate pre-wiring, or the prior in Bayesian terms) combined with a short period of adaptation (similar to computing the posterior in Bayesian terms again).
What is new to me in this book is the following: we can be trained to perform tasks that have nothing to do with evolutionary constraints, and this training can be performed unconsciously (without any explicit or conscious reasoning). An example of this phenomenon is given in the book: a tennis coach once realized that he could predict whether a player was about to miss his serve just before the player hit the ball. However, he was not able to explain why or how he could do so!

This may show that our brain hosts a powerful learning engine (with a powerful feature extractor to isolate the relevant information) that does not even require our attention to be triggered and that can deal with many different learning tasks.
Of course this raises the question of the prior: we know that there is no universally best learning algorithm, only algorithms better adapted to particular learning problems. In other words, we can only learn the problems that have a large enough weight under the prior, which means it is hard to be good at many different tasks simultaneously.
Why is it that the prior encoded in our brain allows us to learn such useless tasks as telling whether a tennis player will miss his serve? And why is it that this prior is not more "peaked" around the tasks that are really useful for our survival?
I guess this book is related to a lot of interesting cognitive science problems but it also revived my interest in human learning and its relationship to Machine Learning...

May 04, 2006 in General, Philosophy | Permalink | Comments (11) | TrackBack (0)

Can a computer think?

I recently came across the webpage of Jeffrey Shallit, a very impressive computer scientist, and I saw he gave a talk on a topic that can be of interest to people reading this blog: Can a Computer Think?
The slides are well documented and comprehensive, and he also has a reading list associated with this talk on his website.
What I especially like about this talk is that it gives an interesting historical perspective, showing how many people had predicted that computers would achieve some task in the near future, and that none of these predictions were correct.

Also of interest is the quote by Hofstadter who essentially says that "intelligence" is what computers cannot do. Indeed, once computers can do something, we start to think that it does not require intelligence.

But if the definition of "thinking" is very controversial, it might be a better choice to focus on simpler things like "learning" and ask the question "Can a computer learn?".
Of course, ML researchers are exactly after that, and to some extent, it is clear that computers can learn.

However, if we try to define more precisely what learning is, there are several issues. In particular, there are at least three levels at which we can define the learning phenomenon:

  1. Low-level: Ability to adapt to a (changing) environment
  2. Medium-level: Ability to perform a task or to improve at performing a task without being taught explicitly (by practice or imitation)
  3. High-level: Ability to infer general laws from particular instances (induction)

The first level is somewhat "unconscious" and is something that could be said of most animals.
The second level is also something many animals can do.
The third level is more "conceptual" and seems to require some "thinking". But this is not necessarily an exclusively human ability: indeed, when a dog learns that bringing back the stick will earn him a pat, this is also some kind of induction.

I am not sure the above distinction really makes sense and it might be impossible to say which form of learning actually occurs in a specific situation.
However, computers have clearly demonstrated all of them, at least in a very simple way.

November 03, 2005 in General, Philosophy | Permalink | Comments (47) | TrackBack (0)

Why do we do Science?

I have been asked several questions revolving around the usefulness of research: Why do you do research in mathematics when computers can do the calculations? Why do you do research in computer science, is it to build faster computers? What is the use of doing all these complicated calculations, is there any application you can make money from?
At some point I used to answer that there is nothing more useful than something that seems useless like a new mathematical theory. My argument was that things that are immediately useful are only useful immediately, while things for which we do not see any immediate application may very well turn out to lead to entirely new technologies in the long run. As an example, take the complex numbers. When they were invented, they were considered as a nice creation of the mind, as something only some mathematicians understood, but as something that would never have any application in the real world. Centuries later they are at the basis of many fields of technology we could hardly live without such as signal processing or electronics.
Unfortunately, it is not easy to predict which of the many mathematical works that are done today will be most useful in a few centuries. There is thus potentially a lot of wasted effort.
Another thing I used to say is the following: it is more fruitful to build a theory that explains several phenomena than to solve a specific problem (this is the difference between science and engineering).
A quote I like is: "There is nothing so practical as a good theory". It is originally from Kurt Lewin (although some ML people think it is due to Vapnik because he often uses it).

Anyway, instead of trying to justify scientific research, it is probably more interesting to think about how this research should be conducted and in particular what should be the motivations of someone doing so.
I recently found some interesting answers in the following quotes from Albert Einstein (taken from "The Einstein-Besso Manuscript", Scriptura, Aristophil 2005):

  1. "My scientific work is motivated by an irresistible longing to understand the secrets of Nature and by no other feelings. My love for justice and the striving to contribute toward the improvement of human conditions are quite independent from my scientific interests."
  2. "The important thing is not to stop questioning. Curiosity has its own reason for existing. One cannot help but be in awe when he contemplates the mysteries of eternity, of life, of the marvelous structure of reality. It is enough if one tries merely to comprehend a little of this mystery every day. Never lose a holy curiosity."
  3. "To be sure, it is not the fruits of scientific research that elevate a man and enrich his nature, but the urge to understand, the intellectual work, creative or receptive."
  4. "Where the world ceases to be the scene of our personal hopes and wishes, where we face it as free beings admiring, asking and observing, there we enter the realm of Art and Science."
  5. "The most beautiful thing we can experience is the mysterious. It is the source of all true art and science. He to whom this emotion is a stranger, who can no longer pause to wonder and stand rapt in awe, is as good as dead: his eyes are closed."
  6. "After a certain high level of technical skill is achieved, science and art tend to coalesce in aesthetics, plasticity, and form. The greatest scientists are always artists as well."
  7. "It is my inner conviction that the development of science seeks in the main to satisfy the longing for pure knowledge."

So, in conclusion, the main motivation is curiosity, the desire to understand; there should be no other. This is probably a bit idealistic, but what is life without a bit of idealism?

October 30, 2005 in General, Philosophy | Permalink | Comments (42) | TrackBack (0)

French teams getting together

There are many people working on Machine Learning in France, but unfortunately, they have little influence/presence on the international scene. Hence they have decided to get together and to start being "pro-active". The idea is to foster research activities in this area by organizing seminars, applying for funding, running projects...
The homepage for this initiative is http://proml.lri.fr (in French of course).

October 23, 2005 in General | Permalink | Comments (0) | TrackBack (0)

Scientific Names and Their Relationship

This is a follow-up to this post and to the comments made on my previous post.

I have used the idea of Rudi Cilibrasi and Paul Vitanyi (see their preprint here) of extracting a "semantic" distance between terms from Google page counts.
So I ran the following experiment: I did Google searches for the terms Statistics, Statistical, Data Analysis, Data Mining and Machine Learning.
The individual page counts came as follows:

Term               Page count
Statistics         573,000,000
Statistical        158,000,000
Data Analysis       38,600,000
Data Mining         17,500,000
Machine Learning     5,250,000

Of course Statistics is the most common. One reason is that it is not exclusively associated with a scientific discipline.
Then, I used the so-called "Normalized Google Distance" (NGD) to assess the relationship between these terms. Here is the result:

                  Statistics  Statistical  DA    DM    ML
Statistics        0.00        0.53         0.73  0.93  0.82
Statistical       0.53        0.00         0.49  0.72  0.65
Data Analysis     0.73        0.49         0.00  0.54  0.61
Data Mining       0.93        0.72         0.54  0.00  0.39
Machine Learning  0.82        0.65         0.61  0.39  0.00

It is interesting to notice that Machine Learning is more related to Statistics than is Data Mining. Also, Machine Learning and Data Mining are very close to each other, while Data Analysis is closer to Statistics.
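
For readers who want to reproduce this kind of computation, here is a minimal sketch of the NGD formula as defined by Cilibrasi and Vitanyi. The two individual page counts are the ones quoted above; the joint page count and the index size are HYPOTHETICAL placeholders, since the post does not record the values actually used, so the printed number is not meant to match the table.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google Distance between two search terms.

    fx, fy : page counts for each term alone
    fxy    : page count for pages containing both terms
    n      : (estimated) total number of pages indexed

    NGD = (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy))
    """
    if min(fx, fy, fxy) <= 0:
        raise ValueError("page counts must be positive")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Individual page counts quoted in the post.
data_mining = 17_500_000
machine_learning = 5_250_000

# HYPOTHETICAL values, for illustration only (not taken from the post).
both_terms = 2_000_000
index_size = 10_000_000_000

distance = normalized_google_distance(
    data_mining, machine_learning, both_terms, index_size
)
print(f"NGD(Data Mining, Machine Learning) = {distance:.2f}")
```

The distance is 0 when the two terms always occur together and grows as their co-occurrence becomes rare relative to their individual frequencies.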

I also tried to compute the NGD between these terms and the term "Company" in order to see which one had penetrated the corporate world the most. The results were not very enlightening. However, if you look at the page counts for the phrase "Statistics Company", or replace Statistics by the other terms, you get the following:

Phrase                    Page count
Statistics Company        23,100
Data Mining Company       10,500
Data Analysis Company        623
Statistical Company          163
Machine Learning Company     140

Interestingly, Machine Learning is very seldom associated with companies, while Data Mining companies seem to abound.

Another possible measure is the number of ads you get when you do these searches. I noticed that the searches "Data Mining" and "Data Mining Company" return an incredible number of ads for data mining software and companies (many more than statistics or machine learning).

So there is still a lot to be done in order to make Machine Learning better recognized...

September 19, 2005 in Data Mining, General, Machine Learning | Permalink | Comments (0) | TrackBack (0)

Data in the Corporate World

Dealing with data is becoming an important part of the job of most large companies. Within this area, tasks such as storing and managing data are now well-mastered, so that the key capability now becomes the analysis or leveraging of this data.
Hence Data Mining is becoming an increasingly important concern of most hi-tech companies. This trend is witnessed by the recent creation of the CDO (Chief Data Officer) title at Yahoo! (see here), which has been given to a former Data Mining researcher.
Another indication of this trend can be found in the educational domain: most computer science departments in the big US universities now offer courses in Data Mining or Machine Learning.
These terms are also now known to people remote from the scientific world.

So it seems that Machine Learning is no longer an obscure research field, but is increasingly a popular technological domain.

September 14, 2005 in Data Mining, General, Machine Learning, Pertinence | Permalink | Comments (6) | TrackBack (0)

Interesting Links

Here is a short list of surveys or courses that may be of interest for someone who wants to get acquainted with theoretical topics in Machine Learning:

  • Avrim Blum's tutorial at FOCS03
  • Avrim Blum's course on Machine Learning
  • A set of lectures by Andrew W. Moore, containing some slides about theoretical aspects
  • A survey paper I wrote with Stéphane Boucheron and Gabor Lugosi
  • ISCAS2001 course by Bartlett, Cauwenberghs and Smola
  • Vasant Honavar's course
  • Rob Schapire's course notes

July 22, 2005 in General | Permalink | Comments (1) | TrackBack (0)

Why this blog?

Blogs are a fun way to communicate.
But beyond being fun, they seem to be a very promising tool for researchers. Indeed, I realize every day that the core activity of a researcher is not to think but to share ideas with others. I noticed that productivity is an increasing function of the number of conferences you attend, papers you read or time you spend discussing with others.

So the main goal of this blog is to increase the "ideas traffic" over the internet about the topic I am interested in: Machine Learning.

To be more specific, there are three areas in which I would like to actively start or participate in discussions:

  1. Philosophy: as a relatively young scientific field, Machine Learning lacks unity and foundations. This is fortunate because it prevents, to some extent, people from being too dogmatic. The overall atmosphere is very open to various sources of inspiration and to cross-fertilization with remote fields. Ideally, foundations should be built in a way that does not restrict this openness.
  2. Theory: while the first item is about philosophical foundations, the same lack of unity applies to the theoretical foundations. There is already a large literature on theoretical topics in Machine Learning, but there is no such thing as a Machine Learning Theory. Of course, one may argue that there is no need for a single theory (or that it is not possible to unify this diverse field), but this is a matter of discussion, debate and/or research.
  3. Practice: finally, I would like to discuss and exchange ideas about practical aspects. In this topic, I include the design of algorithms, their implementation and the various problems that they may be applied to.

Overall, I hope this blog will not only be a way for me to publish ideas but will be enriched by others' contributions and thus be profitable to others in the ML community.

June 27, 2005 in General | Permalink | Comments (0) | TrackBack (0)