A large amount of the effort in Machine Learning research is devoted to building predictive models: trying to infer, from labeled examples, a model that can later be used to make predictions on new instances. This problem of building models for prediction is relatively well understood, although only partially solved today. But there are plenty of other reasons for building models, and these may drive a large part of the future research in this field.
For example, my current work investigates how one can build models for understanding, monitoring and controlling complex systems or processes. Of course, this is not new as such, but I want to emphasize here that this is an area that Machine Learning researchers have seldom studied, although ML techniques could very well be adapted to yield better solutions than existing approaches.
Let us leave understanding aside (as this is a very debatable issue and should be discussed separately) and focus on monitoring and control. There are plenty of methods for monitoring and controlling systems or processes. Many rely on models built from knowledge; some rely on statistical models built from observations. The problem is that models built from knowledge are usually very sophisticated and specialized, while statistical models are very simple and generic. There thus seems to be a gap between the two.
Machine Learning is very advanced in terms of automatically building sophisticated models from observations, while incorporating prior knowledge. The issue of trading off the complexity of the models against their statistical properties (e.g. overfitting) has been thoroughly investigated in this field.
As a result, ML is particularly suited for providing new ways of building sophisticated, yet reliable models.
However, ML researchers have focused their efforts on prediction error, whereas when you need to control a process, prediction error is not what matters.
To make my point more precise, let me give an example. Assume you are trying to cook a nice steak. Depending on the thickness of the piece of meat, you have to grill it for a longer or shorter time. Your goal is a steak that tastes good (tender, not cold, not burnt...), and to achieve it you cannot modify the thickness of the meat; you can only act on the time you leave it on the grill.
Of course, if you have never done it before, you will likely fail to get a good steak the first few times, but after a while you will have a good model of the right cooking time for a given thickness.
To translate this into ML terms: your input variables are the steak thickness (S) and the grill time (G), and your output is the taste (T).
If you try to purely optimize the prediction error, you would look for a function f(S,G) that approximates T, in the sense that |f(S,G) - T| should be small on average over the distribution of (S,G).
Hence you need to be able to accurately predict the taste for all values (within the observation range) of the pair (S,G).
However, to solve your problem, it is sufficient to find, for each value of S, a value of G that guarantees that T is good. This means that you do not need to build a full model (valid for all pairs (S,G)); you only need a model that lets you determine one good grill time per thickness, and that model may be wrong everywhere else.
This shows that by rephrasing the objective from prediction to control, you get a different problem, so that finding the most predictive model might not be the best thing to do.
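The contrast can be made concrete with a small sketch. Everything below is hypothetical: the "true" taste function, its peak shape, and the 4-to-1 scaling between thickness and ideal grill time are invented for illustration. The point is that a model with a large average prediction error over all (S,G) pairs can coexist with a perfectly adequate control policy, because the policy only needs one good grill time per thickness.

```python
# Toy sketch of the steak example; the taste function is invented for
# illustration and does not come from any real data.

def taste(S, G):
    """Made-up taste of a steak of thickness S grilled for time G."""
    ideal = 4.0 * S  # pretend the ideal grill time scales with thickness
    return max(0.0, 10.0 - (G - ideal) ** 2)  # taste peaks at the ideal time

# Prediction objective: approximate T = taste(S, G) well over ALL pairs (S, G).
def prediction_loss(model, pairs):
    return sum(abs(model(S, G) - taste(S, G)) for S, G in pairs) / len(pairs)

# Control objective: for each thickness S, pick ONE grill time that works
# (here we query taste directly, standing in for trial-and-error experience).
def best_grill_time(S, candidate_times):
    return max(candidate_times, key=lambda G: taste(S, G))

thicknesses = [1.0, 2.0, 3.0]
candidate_times = [0.5 * k for k in range(30)]  # 0.0, 0.5, ..., 14.5

# A deliberately crude model: it predicts the same taste everywhere.
crude_model = lambda S, G: 5.0

# The crude model has a large prediction loss over the full (S, G) grid...
grid = [(S, G) for S in thicknesses for G in candidate_times]
print("prediction loss of crude model:", prediction_loss(crude_model, grid))

# ...yet the control problem needs no full model at all: one good grill
# time per thickness suffices, and being wrong elsewhere costs nothing.
policy = {S: best_grill_time(S, candidate_times) for S in thicknesses}
print("policy:", policy)
```

In this sketch the policy for each thickness hits the peak of the (hypothetical) taste function even though no function over the whole (S,G) space was ever fit, which is exactly the gap between the prediction and control objectives described above.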
In recent years, ML researchers have started looking at slightly different loss functions (than the simple prediction error) and settings. My feeling is that this will continue and possibly drift towards loss functions that correspond to control (since many real-life problems are control problems rather than prediction ones).