Machine learning, which is based on learning from experience, can be harnessed to solve an array of problems, with applications including object recognition (faces, diagrams, handwriting, and so on); diagnostic support; detection of credit card fraud; stock market analysis; and DNA sequence classification. To produce reliable results, however, the approach requires a vast amount of data, which is lacking in some fields.
"When I went to work on the predictive maintenance of railway switches for SNCF," explains Malo Huard, "I only saw a single failure out of 80,000 items of data I observed." What could Huard do in such a case, when the data was lacking? He decided to find a solution by thinking about the problem the other way around, by trying instead to detect the parameters of normal operations. By modeling all the signals through observation, the researcher was able to identify the precursors of a breakdown. "With this model,” says Huard, “we even know when teams on the ground are lubricating the point mechanics, because of the anomalous signals that we receive during that process.”
In the face of difficult choices: combining the best models
Like many other engineers and researchers attempting to make predictions, Huard uses mathematical models to obtain his results. Most practitioners either pick a single model for its reliability on the problem at hand or combine several models uniformly. Huard, working with his thesis supervisor Gilles Stoltz, opted for a different approach: weighting the models according to the data history.
"Using this technique is almost as good as using the best model, even if you don’t know which one to choose or don’t know in advance," Huard says. This solution avoids randomly selecting a model to instead take advantage of the panoply of different methods. Huard’s approach won him second place — out of 479 participants — in the RTE challenge on forecasting electricity consumption . The aim of the competition was to predict how much energy the national and regional grids would use over 10 days (the duration of the challenge), after one day and per quarter-hour. Huard was the only competitor to aggregate the models sequentially.
Introducing competition between models and developing machine learning
How does aggregation work? “I employ a range of machine learning techniques to build elementary models, including neural networks,” Huard explains. “The next step involves devising a meta-prediction system using sequential aggregation algorithms.” At the start of the process, all the elementary models are aggregated uniformly, with the same weight. The weighting is then updated over time according to the data history. The model is particularly robust since it can work with very limited data: three months of observations are enough to calibrate a model, for example.
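The literature on sequential aggregation offers several algorithms of this kind; the exponentially weighted average forecaster below is a minimal, standard sketch of the mechanism Huard describes, not his exact implementation. Weights start out uniform and are updated after every observation according to each model’s loss.

```python
import numpy as np

def aggregate(expert_preds: np.ndarray, outcomes: np.ndarray,
              eta: float = 1.0) -> np.ndarray:
    """Sequentially aggregate K elementary models over T time steps.

    expert_preds: array of shape (T, K), the models' predictions.
    outcomes:     array of shape (T,), the values actually observed.
    Returns the (T,) meta-predictions.
    """
    T, K = expert_preds.shape
    weights = np.full(K, 1.0 / K)      # start uniform: every model counts equally
    forecasts = np.zeros(T)
    for t in range(T):
        forecasts[t] = weights @ expert_preds[t]       # weighted meta-prediction
        losses = (expert_preds[t] - outcomes[t]) ** 2  # square loss of each model
        weights = weights * np.exp(-eta * losses)      # penalize inaccurate models
        weights /= weights.sum()                       # renormalize to sum to 1
    return forecasts
```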
This robustness is linked to the sequential aggregation, which sets the models in competition and deactivates those that are reliable only over a short timespan and do not generalize. Aggregating models has further advantages: it is computationally inexpensive, because the algorithms operate by incremental updates and sweep the data only once, whereas conventional machine learning programs make hundreds of passes. This also makes them faster. To achieve reliable results, however, the aggregation still needs good elementary models to draw on.
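Continuing the sketch above on synthetic data, the weight update naturally drives toward zero the weight of a model that stops being reliable, and every observation is processed exactly once:

```python
rng = np.random.default_rng(0)
T = 500
truth = np.sin(np.arange(T) / 20.0)
drifting = truth + np.linspace(0.0, 2.0, T)   # accurate early, then degrades
steady = truth + 0.1 * rng.normal(size=T)     # consistently decent
preds = np.stack([drifting, steady], axis=1)

meta = aggregate(preds, truth, eta=2.0)
mse = lambda a: float(np.mean((a - truth) ** 2))
print(mse(meta), mse(drifting), mse(steady))
# The meta-prediction ends up close to the steady model's accuracy,
# without knowing in advance which model to trust, in a single sweep.
```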
From predicting sales to air quality: a whole host of applications
The above method can be used in many different contexts. For example, Huard applies sequential aggregation techniques to forecasting product sales for Cdiscount, a crucial operation for logistics coordinators, who have to anticipate demand and build up stock in advance. Beyond predicting electricity consumption, Huard is also interested in predictive maintenance. He is currently working on a project with EDF on scheduling the activation of dams according to electricity demand.
The goal is to automate tasks currently carried out manually. All temporal processes can benefit from this approach, since few other methods offer such reliable guarantees: forecasting air quality (work carried out by Stoltz), footfall in subways or airports, Vélib usage, stock prices, crude oil production, fluctuations in exchange rates, and so on. Because Huard’s laboratory works closely with business, the researcher is immersed in environments where he can explore and implement applications of sequential aggregation, helping industry get the most from these solutions.