It seems natural that the goal of any good Machine Learning algorithm should be to extract information from the available data.
However, when you are faced with practical problems, this is not enough: the data by itself does not hold the solution. One also needs "prior knowledge" or "domain knowledge".
So far, nothing new.
But what is important is how to actually get and use this knowledge, and this is very rarely addressed or even mentioned!
My point here is that building efficient algorithms should mean building algorithms that can extract and make maximum use of this knowledge!
To achieve this, here are some possible directions:
- A first step is probably to think about the natural "knowledge bits" one may have about a problem and how to formalize them: for example, knowledge about how the data was collected, what the features mean, or what kinds of errors can occur during data collection,...
- A second step is to provide simple but versatile tools for encoding prior knowledge. This can be done off-line (for example, in a probabilistic framework, by letting the user customize the probability distributions) or on-line (i.e. interactively), through a trial-and-error procedure based on cross-validation or expert validation.
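To make the "off-line" option concrete, here is a minimal sketch (with made-up numbers) of encoding expert knowledge as a customized prior in a probabilistic framework, using the conjugate Beta-Bernoulli model: an expert's belief about a defect rate is written directly into the prior's parameters before any data arrives.

```python
# Sketch: encoding expert knowledge as a prior distribution
# (conjugate Beta-Bernoulli model; all numbers are hypothetical).

def posterior_beta(prior_a, prior_b, data):
    """Update a Beta(prior_a, prior_b) prior with binary observations."""
    successes = sum(data)
    failures = len(data) - successes
    return prior_a + successes, prior_b + failures

def posterior_mean(a, b):
    return a / (a + b)

# An expert believes the defect rate is around 10%: encode this as
# Beta(2, 18), whose mean is 0.10. Without such knowledge, Beta(1, 1).
data = [0, 0, 1, 0, 0]  # five inspections, one defect observed

a_inf, b_inf = posterior_beta(2, 18, data)   # informative (expert) prior
a_flat, b_flat = posterior_beta(1, 1, data)  # uninformative prior

print(round(posterior_mean(a_inf, b_inf), 3))   # pulled toward the expert's 0.10
print(round(posterior_mean(a_flat, b_flat), 3)) # driven mostly by the 5 samples
```

With only five samples, the expert's prior dominates; as data accumulates, the likelihood takes over, which is exactly the kind of graceful blending of knowledge and data one would want from such tools.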
- There is also the possibility of going one level higher: knowledge is often gained by integrating very diverse sources of information, and humans (as learning systems) are never isolated: every problem they can solve has some relationship to their environment. Ideally, then, our systems should be able to integrate several sources and have some sort of meta-learning capability, rather than starting from scratch, focused on one specific dataset, every time a new dataset is to be used.
All of the above explains the title of my post; to be more precise, I even tend to think that research efforts should focus on knowledge extraction from experts rather than from data!
Finally, I would like to give some examples of such extraction (we are not talking about sitting experts in a chair with electrodes connected to their brains, but simply about providing software that can interact a bit with them).
Below is a (non-exhaustive) list of what can be learned from the user by a learning system:
- Implicit knowledge (when the data is collected and put in a database):
  - data representation: the way the data is represented (the features used to describe the objects) already carries a lot of information; often a problem is solved once the appropriate representation has been found.
  - setting of the problem: the way the problem is set up (i.e. the choice of which variables are inputs and which are outputs, the choice of the samples, ...) also brings information.
- Basic information (when the analysis starts):
  - choice of features: selecting the right features, ignoring those that are irrelevant, ...
  - choice of samples: selecting a representative subset, filtering, ...
  - choice of an algorithm
  - choice of parameters for this algorithm
- Structural knowledge (usually incorporated in the algorithm design phase):
  - design of kernels or prior distributions
  - design of the algorithm
  - causal structures
- Interactive knowledge: all of the above can be revisited by iteratively trying various options. Each trial can be validated using data (cross-validation) or expertise (judging the plausibility of the resulting model).
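As an illustration of the "structural knowledge" item, here is a small sketch (my own toy example, not from any particular library) of baking an expert's belief into a kernel: if the expert knows that the ordering of the features carries no information, the kernel can sort the features before comparing them, so that all permutations of the same measurements are treated identically.

```python
def invariant_kernel(x, y):
    """Linear kernel on sorted features: encodes the (assumed) expert
    knowledge that feature order is irrelevant for this problem."""
    xs, ys = sorted(x), sorted(y)
    return sum(a * b for a, b in zip(xs, ys))

# Two inputs that differ only by a permutation get the same similarity
# as two identical inputs.
print(invariant_kernel([3, 1, 2], [1, 2, 3]))  # 14
print(invariant_kernel([1, 2, 3], [1, 2, 3]))  # 14
```

This is the general pattern: an invariance the expert asserts about the problem becomes a preprocessing step inside the similarity measure, rather than something the algorithm must rediscover from data.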
As a final remark, let me just mention that this interactive mode is often used (although not explicitly) by practitioners who try several different algorithms and keep the one that performs best on a validation set. Of course this creates a risk of overfitting, especially because the information brought by the interaction is very limited: it amounts to the validation error, which cannot really be considered knowledge. This kind of interaction simply brings in more data (the validation data) rather than more knowledge.
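The practice described above can be sketched in a few lines (a toy example on synthetic data, using a simple 1-D nearest-neighbour classifier of my own): several settings are tried, and the one with the best validation score is kept. Note how little information flows back from each trial: a single number per candidate.

```python
import random

random.seed(0)

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 >= k)

def accuracy(train, valid, k):
    return sum(knn_predict(train, x, k) == y for x, y in valid) / len(valid)

# Synthetic 1-D data: true class is (x > 0.5), labels flipped 15% of the time.
data = [(x, int((x > 0.5) != (random.random() < 0.15)))
        for x in (random.random() for _ in range(200))]
train, valid = data[:150], data[50:]

# "Interactive" selection: try several values of k, keep the validation winner.
scores = {k: accuracy(train, valid, k) for k in (1, 3, 7, 15)}
best_k = max(scores, key=scores.get)
print("best k:", best_k, "validation accuracy:", scores[best_k])
```

Each candidate is judged by one scalar, so selecting among four options leaks at most a couple of bits from the validation set into the final model, which is why this procedure adds data rather than knowledge, and why repeating it many times invites overfitting to the validation set.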
It would probably be interesting to formalize these notions a bit better...