Machine Learning Thoughts

Some thoughts about philosophical, theoretical and practical aspects of Machine Learning.

Building models: what for?

A large part of the effort in Machine Learning research is devoted to building predictive models: inferring, from labeled examples, a model that can later be used to make predictions on new instances. This problem is relatively well understood, although only partially solved today. But there are plenty of other reasons for building models, and these may drive a large part of the future research in this field.

For example, my current work investigates how one can build models for understanding, monitoring and controlling complex systems or processes. Of course, this is not new as such, but I want to emphasize that this is an area people in Machine Learning have seldom studied, although ML techniques could very well be adapted to yield better solutions than existing approaches.

Let us leave understanding aside (as it is a debatable issue that deserves a separate discussion) and focus on monitoring and control. There are plenty of methods for monitoring and controlling systems or processes. Many rely on models built from knowledge; some rely on statistical models built from observations. The problem is that models built from knowledge are usually very sophisticated and specialized, while statistical models are very simple and generic. There thus seems to be a gap between the two.
Machine Learning is very advanced at automatically building sophisticated models from observations while incorporating prior knowledge. The issue of trading off the complexity of a model against its statistical properties (e.g. overfitting) has been thoroughly investigated in this field.
As a result, ML is particularly suited to providing new ways of building sophisticated yet reliable models.

However, ML researchers have focused their effort on prediction error, and when you need to control a process, prediction error is not what matters.
To make the point more precise, let me give an example. Assume you are trying to cook a nice steak. Depending on the thickness of the piece of meat, you have to heat it for a longer or shorter time. Your goal is to obtain a steak that tastes good (tender, not cold, not burnt...), and you cannot modify the thickness of the meat: you can only act on the time you leave it on the grill.
Of course, if you have never done it before, you will likely fail to get a good steak the first few times, but after a while you will have a good model of the right cooking time for a given thickness.

To translate this into ML terms, you have as input variables the steak thickness (S) and the grill time (G), and your output is the taste (T).
If you tried purely to minimize the prediction error, you would look for a function f(S,G) that approximates T, in the sense that |f(S,G) - T| should be small on average over the distribution of (S,G).
Hence you would need to be able to accurately predict the taste for all values (within the observation range) of the pair (S,G).

However, in order to solve your problem, it is sufficient to find, for each value of S, one value of G that guarantees that T is good. This means that you do not need to build a full model (valid for all pairs (S,G)); you only need a model that lets you determine one good grill time for each thickness, and that may be wrong everywhere else.
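
To make the contrast concrete, here is a minimal sketch in Python; the taste function, its parameters and the grids are invented for illustration and are not part of the original example:

    import numpy as np

    rng = np.random.default_rng(0)

    def taste(S, G):
        # Invented toy model: taste peaks when the grill time is about
        # 4x the thickness, with some observation noise.
        return np.exp(-((G - 4.0 * S) ** 2) / 2.0) \
            + 0.05 * rng.standard_normal(np.shape(G))

    S_grid = np.linspace(1.0, 5.0, 9)    # thickness (cm)
    G_grid = np.linspace(1.0, 25.0, 49)  # grill time (min)

    # Prediction objective: approximate T = f(S, G) accurately for ALL
    # pairs (S, G), i.e. make |f(S,G) - T| small on average.
    # Control objective: for EACH thickness S, pick ONE grill time G
    # with good (estimated) taste; the model may be wrong elsewhere.
    policy = {s: G_grid[np.argmax(taste(np.full_like(G_grid, s), G_grid))]
              for s in S_grid}

    for s in list(policy)[:3]:
        print(f"thickness {s:.1f} -> grill for {policy[s]:.1f} min")

The point of the sketch is that the control objective only requires the argmax over G for each S, not a globally accurate model of T.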

This shows that by rephrasing the objective from prediction to control, you get a different problem, so that finding the most predictive model might not be the best thing to do.
In recent years, ML researchers have started looking at loss functions (and settings) that differ from the simple prediction error. My feeling is that this will continue, and possibly drift towards loss functions that correspond to control, since many real-life problems are control problems rather than prediction problems.

October 08, 2005 in Data Mining, Machine Learning, Pertinence | Permalink | Comments (1) | TrackBack (0)

Data in the Corporate World

Dealing with data is becoming an important part of the job of most large companies. Within this area, tasks such as storing and managing data are now well mastered, so the key capability is now the analysis, or leveraging, of this data.
Hence Data Mining is becoming an increasingly important concern for most hi-tech companies. One sign of this trend is the recent creation of the CDO (Chief Data Officer) title at Yahoo! (see here), given to a former Data Mining researcher.
Another indication can be found in education: most computer science departments in the big US universities now offer courses in Data Mining or Machine Learning.
These terms are now familiar even to people far removed from the scientific world.

So it seems that Machine Learning is no longer an obscure research field, but more and more a popular technological domain.

September 14, 2005 in Data Mining, General, Machine Learning, Pertinence | Permalink | Comments (6) | TrackBack (0)

Data Mining, Statistics and Machine Learning

Talking about names (see the previous post), here is an attempt to define and distinguish several names that are sometimes used interchangeably: Statistics, Data Mining, Machine Learning.
If one were to put these under a common name, "Information Sciences" would be a reasonable candidate, but let us treat them separately first:

  • Statistics: formally, Statistics is the inverse of Probability. Probability theory is about computing the probability of events knowing the model; Statistics is about inferring the model from the observation of events. Events are typically described by data, so Statistics is about building models from data. One can also find this more general definition: "Statistics is the part of mathematics that deals with collecting, organizing, and analyzing data". (A toy sketch of the two directions follows this list.)
  • Data Mining: the goal here is to "extract information from (large) databases". This requires defining both what is meant by information and what the extraction process is. Possible answers (see e.g. this paper by Jerome Friedman for others) include: "Data Mining is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (U. Fayyad) and "Data Mining is the process of extracting previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions" (A. Zekulin). Data Mining can be considered a sub-area of Descriptive Statistics (although this is probably restrictive) with emphasis on
    • "understandability" of the produced results
    • algorithmic issues
    • ability to handle "large" databases
    • potential use of the produced results for decision-making
  • Machine Learning: this refers to the study of the learning phenomenon, which can be defined as "the ability of a machine to improve its performance based on previous results". The connection with the above fields is that "previous results" usually means data, hence this other definition: "Subspecialty of artificial intelligence concerned with developing methods for software to learn from experience or extract knowledge from examples in a database". Machine Learning largely overlaps with Statistics in the sense that both deal with the analysis of data, but it considers issues that are largely ignored in Statistics, such as the algorithmic complexity of computational implementations. Machine Learning also includes the study of other forms of learning that cannot be directly cast as building a model from a database, such as on-line learning, active learning and reinforcement learning.
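
As promised above, here is a toy coin-flip sketch in Python of the forward (Probability) and inverse (Statistics) directions; the numbers are invented for illustration:

    from math import comb

    # Probability (forward): given the model p = 0.5, compute the
    # chance of an event -- here, observing 7 heads in 10 flips.
    p, n, k = 0.5, 10, 7
    prob = comb(n, k) * p**k * (1 - p) ** (n - k)

    # Statistics (inverse): given the observed event "7 heads in 10
    # flips", infer the model -- the maximum-likelihood estimate of p.
    p_hat = k / n

    print(f"P(7 heads | p=0.5) = {prob:.3f}; ML estimate of p = {p_hat:.1f}")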

The goal here is to emphasize the difference in spirit between these fields, while showing their connections.

I have tried to define the above domains in terms of their goals and not in terms of the techniques they developed. Indeed, these domains are very often compared in terms of the tools they produced. My opinion is that this is meaningless since:

  1. the same algorithms may be used for different goals
  2. many algorithms were (re)discovered independently in each field

It is thus much more interesting to look at the goals or at the types of problems they aim at solving rather than at the set of tools they encompass.

Also, I do not like to use names in a discriminative way ("What you are doing is not XYZ, it is ABC") because it deepens the gaps between scientific domains, and nothing is more detrimental to science than the lack of communication between domains. However, I like to think about what a name means, because this usually leads to thinking about the goal of your research, and that is always a good thing to do...

July 18, 2005 in Pertinence | Permalink | Comments (1) | TrackBack (0)

Implementing Machine Learning Algorithms

Having recently moved from an academic research lab to a private company, I have realized how different it is to design an algorithm for a research paper and for a product that will be used by non-expert users. In academic research, people tend to have all sorts of tricks for appropriate pre-processing and parameter choices. Indeed, they develop a dual expertise: expertise in their algorithms and expertise in the datasets they use to demonstrate them.
Besides introducing a lot of bias in comparisons (and making results unreproducible), this shows that Machine Learning is often closer to an art than to a science.

It is a really tough challenge to build algorithms that can be used by someone without Machine Learning expertise. You always face the dilemma of exposing tunable parameters (which may improve performance in certain cases but are hard for a non-expert to manipulate) or pre-setting them to fixed values (which may give only average performance in most cases).
Chih-Jen Lin has done interesting work in the direction of implementing SVMs in a usable way. He recently gave a very nice talk about this.
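
One common way to resolve this dilemma is to hide the knobs behind automatic selection by cross-validation. Here is a minimal sketch with scikit-learn; the library, dataset and parameter grid are my own choices for illustration, not something from the post:

    # Hide the SVM hyperparameters from the end user by selecting them
    # automatically with cross-validation over a coarse grid.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Pre-processing is fixed (standardization), and the two SVM knobs
    # (C, gamma) are chosen by 5-fold cross-validation, so the caller
    # never has to set them.
    search = GridSearchCV(
        make_pipeline(StandardScaler(), SVC()),
        param_grid={"svc__C": [0.1, 1, 10, 100],
                    "svc__gamma": ["scale", 0.01, 0.1, 1.0]},
        cv=5,
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)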

Machine Learning has been considerably influenced by theoreticians, which is a good thing because it helped build solid foundations. However, there has been very little influence from the experimental sciences. As a result, two important things are lacking:

  1. proper design of experiments, with a concern for reproducibility of results and a community-wide agreement on an experimental methodology
  2. concern for usability (in other words, reproducibility of results by non-experts)

I hope that, as the research community grows and establishes more contacts with industry (where there is a booming demand), these issues will be addressed more systematically.


July 08, 2005 in Pertinence | Permalink | Comments (0) | TrackBack (0)

Optimization and Prediction

How do these two domains relate?

Actually there are many interconnections between them. Here are some:

  1. Most learning algorithms start by defining a criterion that they optimize. Hence, once the criterion is chosen, learning is just optimizing this criterion.
  2. In optimization problems, one may face situations where part of the objective function is not known exactly, but only with uncertainty (e.g. via a sampling mechanism). This is the context of stochastic optimization: the goal is to optimize a function that is known only partially (for example, through its values at random points).
  3. In the process of optimizing a function, you may gather information about it. For example, if you only know a function implicitly (you can get its value at any point, but you do not know its mathematical representation), you may want to build a model of it (i.e. learn it) in order to speed up the optimization process; see the sketch after this list.
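
Here is a minimal Python sketch of point 3; the black-box function and the polynomial surrogate are invented for illustration:

    import numpy as np

    # Hypothetical expensive black-box function: we can evaluate it at
    # any point, but we have no access to its mathematical form.
    def expensive_f(x):
        return (x - 1.3) ** 2 + 0.5 * np.sin(5 * x)

    # Step 1: gather a few evaluations at sample points.
    xs = np.linspace(-2.0, 4.0, 8)
    ys = expensive_f(xs)

    # Step 2: learn a cheap surrogate model of the function
    # (here, a cubic polynomial fit).
    coeffs = np.polynomial.polynomial.polyfit(xs, ys, deg=3)

    # Step 3: optimize the surrogate on a fine grid instead of calling
    # the expensive function thousands of times.
    grid = np.linspace(-2.0, 4.0, 2001)
    x_star = grid[np.argmin(np.polynomial.polynomial.polyval(grid, coeffs))]
    print(f"surrogate minimizer: {x_star:.3f}, "
          f"true value there: {expensive_f(x_star):.3f}")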

In most real-world applications of Data Mining or Machine Learning, there is an optimization component, and it is often ignored.
People tend to apply the techniques they know best, and since the optimization and learning communities are quite distinct, this leads to sub-optimal solutions.

Most businesses have as a primary goal to optimize return under constraints. They may have collected data on how the market, the customers or their production lines behave, but the goal is not just to model this behavior; it is to use the model to optimize the return.

So it seems that for most practical problems it is important to have in mind both aspects.

June 30, 2005 in Pertinence | Permalink | Comments (0) | TrackBack (0)

Decision-making

Fortune magazine has a special issue on decision-making. There are plenty of useful remarks for decision makers, but there is also some food for thought for Machine Learning scientists!

In a way, making decisions is the ultimate goal of artificial intelligence, and learning is a key element in this process. There is a deep connection between Machine Learning and decision theory.

Vladimir Vapnik, one of the founders of statistical learning theory, likes to use this quote from Einstein:
Subtle is the Lord, but malicious He is not

The way he interprets this in the context of learning from data is:
reality might be hard or even impossible to model (and understand) accurately, but there can still be ways to make good decisions.
One consequence of this idea is that, when one infers a model from data, one should judge the quality of the model not by how well it fits the "true model", but by how good the decisions made from it are. And the goodness of a decision should not be measured by how different it is from a hypothetical "ideal decision", but by how much return it yields compared to the best possible return.

Going back to the Fortune articles, here is an interesting quote that relates directly to the well-known overfitting phenomenon in machine learning (you think you have found a pattern, but you have merely found a coincidence):

People are clinically overoptimistic, for instance, assigning zero probability to events that are unlikely but not impossible (such as a massive iceberg in the path of a really big ship). We see "patterns" in the random movements of stocks, just as people once saw bears and swordsmen in the scatterplot of the nighttime sky. We make choices that justify our past choices and then look for data to support them. We not only make these errors; we make them reliably.

Another thing of interest is the interview with Jim Collins about decision-making. In particular, he says the following:
The world is uncertain: great decisions stem from saying "I don't know"

This emphasizes one critical aspect of decision-making: uncertainty handling. When making a decision, one must not only estimate the odds of the possible outcomes but also assess the uncertainty in this estimate.
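
As a minimal illustration of estimating both the odds and the uncertainty of that estimate, here is a Python sketch of a Beta-Bernoulli model; the counts are invented for illustration:

    import numpy as np

    # Invented data: outcomes of 10 comparable past decisions.
    successes, failures = 7, 3
    # Beta posterior over the success probability (uniform prior).
    alpha, beta = 1 + successes, 1 + failures

    # The odds estimate is the posterior mean...
    estimate = alpha / (alpha + beta)
    # ...and the uncertainty of that estimate is summarized by a 95%
    # credible interval, obtained here by posterior sampling.
    samples = np.random.default_rng(0).beta(alpha, beta, 100_000)
    low, high = np.percentile(samples, [2.5, 97.5])

    print(f"estimated success probability: {estimate:.2f} "
          f"(95% interval [{low:.2f}, {high:.2f}])")
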
He also talks about how important it is to talk to people and to provoke debate:
Decisions are not about strategy and consensus; they are about people and discussions. People, because only they can use their experience to adapt to real-life situations and constant change. Discussions, because great leaders are good at igniting dialogue and debate among teams with varied experiences, using Socratic questions.

Translated into Machine Learning/Process Intelligence terms, this means that you should use users' knowledge as much as possible and integrate opinions and experiences into your decision-making procedure. The goal is to extract information and thus reduce uncertainty. An ideal decision-support machine would ask the right questions of the right persons, relay information between people (so that they can compare it with their own experience and react), and finally integrate all the information into quantitative assessments of the possible consequences of the various decisions that can be made.

There are surely other ideas to be taken from human decision-making in order to build better decision-support systems. The key is not to eliminate humans from the process, but to use them as much as possible for what they are good at.

June 29, 2005 in Pertinence | Permalink | Comments (11) | TrackBack (0)