Artificial intelligence powered by meta-learning
Below you can find commented slides from my talk at Human Level Artificial Intelligence in Prague. The aim of my talk was to explain what meta-learning is and what it is not, and to show that it is an important research direction that has already demonstrated its business potential.
Meta-level AI algorithms make AI systems learn better and faster, adapt to changes in the environment, or generalize across more tasks.
In meta-learning, or learning to learn, a top-level AI optimizes a bottom-level AI. This additional AI layer can be useful in many ways. My goal was to explain what can be considered meta-learning, so I made a simple quiz giving examples of important ML approaches and asking the audience the questions below.
Before the quiz, I wanted to mention a few definitions of meta-learning that I consider most accurate.
Jürgen Schmidhuber and his group introduced one of the first meta-learning systems capable of recursive self-improvement and also studied machines capable of building other machines. The other definitions describe meta-learning systems as adaptable by experience and as containing learning subsystems.
Other researchers describe meta-learning as a process that accumulates experience over many tasks, or as systems where learning can be observed at two different time scales. So let's start the quiz with neuro-evolution.
You can find many approaches to neuro-evolution. The picture is from NeuroEvolution of Augmenting Topologies (NEAT), introduced by Ken Stanley, who was also speaking at the conference. Actually, the picture does not show the evolved population of networks; it comes from the explanation of why it does not make much sense to encode neural networks naively into a genetic algorithm. There are several variants of neuro-evolution, so the answer is not simple.
In some cases, when an evolutionary algorithm is applied directly, optimizing both the structure and the weights of the network, it should not be considered meta-learning, because it is simply optimization of a machine learning model. However, when the encoding is indirect and the networks are trained by fast gradient algorithms, possibly on instances from several tasks, then it should be considered meta-learning. In that case, the slower meta-level AI is the evolutionary algorithm and the fast bottom-level algorithm is gradient-based or reinforcement learning of neural networks.
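To see the two time scales in code, here is a minimal, self-contained sketch (my own illustration, not any particular neuro-evolution system): the slow evolutionary loop searches over how the fast learner trains, here just a learning rate and a regularization strength for a linear model, while the fast loop is plain gradient descent. The genome, data and mutation scheme are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression task: the "bottom level" learns w for y ~ X @ w_true.
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def inner_train(genome, steps=50):
    """Fast bottom-level learner: plain gradient descent on a linear model,
    configured by the genome (learning rate, L2 strength)."""
    lr, l2 = genome
    w = np.zeros(5)
    for _ in range(steps):
        grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr) + l2 * w
        w -= lr * grad
    return np.mean((X_va @ w - y_va) ** 2)   # validation loss = fitness (lower is better)

# Slow meta-level learner: a tiny (mu + lambda) evolution over training configurations.
population = [np.array([10 ** rng.uniform(-3, -0.5), 10 ** rng.uniform(-4, -1)])
              for _ in range(10)]
for generation in range(20):
    scored = sorted(population, key=inner_train)
    parents = scored[:5]
    children = [p * np.exp(0.2 * rng.normal(size=2)) for p in parents]  # log-normal mutation
    population = parents + children

best = min(population, key=inner_train)
print("best lr, l2:", best, "val loss:", inner_train(best))
```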
Another important concept in machine learning is hyper-networks, as demonstrated by Ken Stanley in his HyperNEAT architecture. The concept of one network constructing another network is increasingly popular and directly relates to the "machines that build machines" definition.
So the answers are quite obvious.
The smaller network encodes the weights and connections of the bigger neural network and is optimized by an evolutionary algorithm, or is alternatively replaced by genetic programming.
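As a rough illustration of this idea (not HyperNEAT itself), the sketch below uses a tiny random MLP in place of a CPPN to generate the weight matrix of a larger network from neuron coordinates; only the small network's parameters would be evolved. All sizes and names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# The small "hyper" network: a tiny MLP that maps the coordinates of a source
# and a target neuron to the connection weight between them.  Only W1 and W2
# would be evolved by the meta-level algorithm.
W1, W2 = rng.normal(size=(2, 16)), rng.normal(size=(16, 1))

def hyper(src, dst):
    h = np.tanh(np.array([[src, dst]]) @ W1)
    return (h @ W2).item()

# The big network's weight matrix is generated, not stored or evolved directly.
n_in, n_out = 8, 4
in_pos, out_pos = np.linspace(-1, 1, n_in), np.linspace(-1, 1, n_out)
big_W = np.array([[hyper(s, d) for d in out_pos] for s in in_pos])

x = rng.normal(size=(1, n_in))
print(x @ big_W)   # forward pass of the big network with generated weights
```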
Another important, recently introduced approach targets the optimization algorithms themselves.
In this case, one can overcome the limitations of gradient optimization methods by teaching another machine learning model (the optimizer) how to generate gradient updates for the trained model (the optimizee).
Again, the name of the approach already suggests the answers.
The optimizer is an LSTM with parameters shared across all optimized dimensions. It is trained by gradient descent, similarly to the optimizee, which can be almost any differentiable ML model.
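A minimal sketch of the coordinate-wise idea follows. To keep it short, the LSTM is replaced by a two-parameter update rule (step size and momentum) shared across all coordinates, and the meta-level training uses finite differences on the optimizee's final loss rather than backpropagation through the unrolled inner loop as in the paper; the quadratic optimizee is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 10))
b = rng.normal(size=20)

def optimizee_loss(theta):
    return np.mean((A @ theta - b) ** 2)

def inner_loop(meta, steps=30):
    """Fast level: run the learned optimizer on the optimizee.
    meta = (log step size, momentum); the same rule is shared by every coordinate."""
    lr, mom = np.exp(meta[0]), meta[1]
    theta, velocity = np.zeros(10), np.zeros(10)
    for _ in range(steps):
        grad = 2 * A.T @ (A @ theta - b) / len(b)
        velocity = mom * velocity + grad
        theta = theta - lr * velocity
    return optimizee_loss(theta)

# Slow level: improve the optimizer itself, scored by the optimizee's final loss.
# (The paper trains an LSTM by backpropagating through the unrolled inner loop;
# plain finite differences keep this sketch short.)
meta, eps, meta_lr = np.array([-2.0, 0.5]), 1e-3, 0.05
for _ in range(60):
    grad = np.array([(inner_loop(meta + eps * e) - inner_loop(meta - eps * e)) / (2 * eps)
                     for e in np.eye(2)])
    meta -= meta_lr * np.clip(grad, -2, 2)   # clip to keep the meta-steps stable

print("learned step size:", np.exp(meta[0]), "momentum:", meta[1],
      "final optimizee loss:", inner_loop(meta))
```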
The recently introduced differentiable architecture search (DARTS) algorithm is one of the most efficient variants of neural architecture search. It makes the structural search differentiable by a simple trick: the architecture decisions are encoded as a softmax over all possible choices, and the network is then optimized by alternating gradient descent steps in the weight space and the architecture space.
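To make the trick concrete, here is a toy sketch (not the DARTS code): one architecture decision, a choice among three candidate operations, is relaxed into a softmax over logits alpha, and the loop alternates a gradient step on the weight (training data) with a gradient step on alpha (validation data). The task and the candidate operations are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy task: the "right" architecture choice is a ReLU applied to a scaled input.
x_tr, x_va = rng.uniform(-2, 2, 256), rng.uniform(-2, 2, 256)
y_tr, y_va = np.maximum(0, 2 * x_tr), np.maximum(0, 2 * x_va)

ops = [lambda z: z, np.tanh, lambda z: np.maximum(0, z)]            # candidate operations
dops = [lambda z: np.ones_like(z), lambda z: 1 - np.tanh(z) ** 2,
        lambda z: (z > 0).astype(float)]                            # their derivatives

w, alpha = 0.5, np.zeros(3)          # network weight and architecture logits

def forward(x, w, alpha):
    p = np.exp(alpha) / np.exp(alpha).sum()                         # softmax over choices
    outs = np.stack([op(w * x) for op in ops])                      # (3, n)
    return p, outs, (p[:, None] * outs).sum(0)

for step in range(500):
    # 1) gradient step on the weight w, using the training set
    p, outs, pred = forward(x_tr, w, alpha)
    douts = np.stack([dop(w * x_tr) for dop in dops])
    grad_w = np.mean(2 * (pred - y_tr) * (p[:, None] * douts).sum(0) * x_tr)
    w -= 0.05 * grad_w
    # 2) gradient step on the architecture logits alpha, using the validation set
    p, outs, pred = forward(x_va, w, alpha)
    grad_alpha = np.mean(2 * (pred - y_va) * p[:, None] * (outs - pred), axis=1)
    alpha -= 0.05 * grad_alpha

print("architecture weights (identity, tanh, relu):",
      np.round(np.exp(alpha) / np.exp(alpha).sum(), 3))
print("learned scale w:", round(w, 3))
```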
In this case, it is not a meta-learning approach, because there is just one clever optimization process involved. The last important state-of-the-art algorithm mentioned in the quiz is IMPALA.
This algorithm is based on distributed, highly scalable reinforcement learning of actors. Actors are typically deep convolutional networks with recurrent LSTM layers and fully connected layers that generate actions based on a history of visual embeddings and rewards. Learners contain LSTMs or networks with external memory (Neural Turing Machines), and they generate parameters for groups of actors. At the same time, thanks to population-based training, learners share information among actors and the parameter optimization process is more efficient.
This time, it is clearly a meta-learning method, because it accumulates knowledge across many tasks and contains learning subsystems as well.
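The population-based training part can be stripped down to a few lines. The sketch below is a generic illustration (nothing IMPALA-specific, and all names and numbers are made up): each worker learns fast with its own hyperparameters, and every few steps the weaker workers copy the parameters of the stronger ones and perturb their hyperparameters, so learning also happens on a slower, population-level time scale.

```python
import numpy as np

rng = np.random.default_rng(4)

# Population-based training, stripped down: each worker trains the same model
# (here: minimise a quadratic) with its own hyperparameter (learning rate).
target = rng.normal(size=10)

def loss(theta):
    return np.sum((theta - target) ** 2)

workers = [{"theta": np.zeros(10), "lr": 10 ** rng.uniform(-3, -1)} for _ in range(8)]

for step in range(200):
    for w in workers:                                     # fast, bottom-level learning
        w["theta"] -= w["lr"] * 2 * (w["theta"] - target)
    if step % 20 == 19:                                   # slow, meta-level adaptation
        workers.sort(key=lambda w: loss(w["theta"]))
        for bad, good in zip(workers[-2:], workers[:2]):
            bad["theta"] = good["theta"].copy()                      # exploit
            bad["lr"] = good["lr"] * 10 ** rng.uniform(-0.3, 0.3)    # explore

best = min(workers, key=lambda w: loss(w["theta"]))
print("best learning rate:", best["lr"], "loss:", loss(best["theta"]))
```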
Think about other important machine learning algorithms. You will find that many of them use meta-learning and meta-level AI to improve their performance, scalability, or robustness.
In the second part of the presentation, I talk about our meta-learning research. At FIT CTU, we run several ML-oriented laboratories. In Datalab, we focus on AutoML in predictive modeling, clustering, anomaly detection, and conversational AI. When you are in Prague, you can try one of our chatbots, which knows about events that might be of interest to you. Our joint labs with companies give CTU researchers and students access to interesting real-world datasets and live data.
Let's focus on meta-learning in predictive modeling first.
My first article on this topic is 15 years old, and it extends GMDH multilayered networks, one of the first deep learning algorithms.
In subsequent variants, a top-level AI (a genetic algorithm with deterministic crowding) optimized predictors trained by gradient descent.
You can replay how a model is evolved on the Iris data to estimate the probability that a flower belongs to the Virginica class. Later, we also added the possibility to evolve training algorithms.
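For readers unfamiliar with deterministic crowding, here is a generic sketch of the mechanism: each child competes only with its most similar parent, which keeps diverse niches alive in the population. The fitness function below is just a stand-in; in our system it corresponds to training a predictor by gradient descent and measuring its validation performance.

```python
import numpy as np

rng = np.random.default_rng(5)

def fitness(genome):
    # Stand-in for "train a predictor by gradient descent, return validation score".
    return -np.sum((genome - np.array([1.0, -2.0, 0.5])) ** 2)

def crowding_step(population):
    """One generation of deterministic crowding: each child replaces its most
    similar parent only if the child is fitter."""
    rng.shuffle(population)
    next_pop = []
    for p1, p2 in zip(population[0::2], population[1::2]):
        child = np.where(rng.random(3) < 0.5, p1, p2) + 0.1 * rng.normal(size=3)
        parent = p1 if np.linalg.norm(child - p1) < np.linalg.norm(child - p2) else p2
        other = p2 if parent is p1 else p1
        next_pop += [child if fitness(child) > fitness(parent) else parent, other]
    return next_pop

population = [rng.normal(size=3) for _ in range(20)]
for _ in range(100):
    population = crowding_step(population)
print("best genome:", max(population, key=fitness).round(2))
```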
The meta-learning approach to neural network optimization was later generalized to arbitrary hierarchical ensembles of predictive algorithms.
Our approach allows ensembles to be evolved under time constraints, and it is also possible to evolve algorithms (templates) across several datasets (tasks).
For each task, the evolved solutions can match the complexity of the problem: trivial problems are solved by a simple algorithm, while solving complex tasks involves deep hierarchical ensembles.
Simple tasks can be solved by any algorithm; complex ones only by those evolved on similar problems.
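One possible way to picture such a template (a simplified illustration, not our actual representation) is a nested structure whose leaves are base learners and whose inner nodes are stacking ensembles; the meta-level can then mutate this structure independently of any particular dataset. The sketch below materializes such a hypothetical template with scikit-learn estimators.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# A "template" here is just a nested structure: leaves are base learners,
# inner nodes are stacking ensembles.  A meta-level algorithm can mutate it
# (swap leaves, grow or prune nodes) before it ever sees a dataset.
template = ("stack", [("tree", {}),
                      ("stack", [("tree", {}), ("forest", {"n_estimators": 20})])])

LEAVES = {"tree": DecisionTreeClassifier, "forest": RandomForestClassifier}

def build(node):
    kind, payload = node
    if kind == "stack":
        estimators = [(f"m{i}", build(child)) for i, child in enumerate(payload)]
        return StackingClassifier(estimators=estimators,
                                  final_estimator=LogisticRegression())
    return LEAVES[kind](**payload)

model = build(template)   # fit with model.fit(X, y) on any classification dataset
```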
Recently, we reimplemented our AutoML algorithm on top of Apache Spark to enable distributed computing on big-data predictive problems. We will be happy if you join our efforts and help us add new algorithms and ensembling techniques to the open-source AutoML project. I am very grateful to Showmax, which sponsors our AutoML research in predictive modeling.
In data clustering, meta-learning is quite difficult due to the absence of a single objective function that can be optimized.
We build AutoML clustering using a combination of unsupervised criteria in the open-source Clueminer project. Join our efforts and contribute.
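A toy version of the idea: score each candidate clustering by a hand-combined mix of internal criteria and pick the best configuration. The particular indices, weights, and candidate algorithms below are arbitrary choices for illustration, not the ones used in Clueminer.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Candidate algorithm/parameter combinations to search over.
candidates = [(name, algo(n_clusters=k))
              for name, algo in [("kmeans", KMeans), ("agglo", AgglomerativeClustering)]
              for k in range(2, 8)]

def combined_score(labels):
    # There is no single objective, so combine internal criteria: silhouette
    # (higher is better) minus a weighted Davies-Bouldin index (lower is better).
    return silhouette_score(X, labels) - 0.5 * davies_bouldin_score(X, labels)

best = max(candidates, key=lambda c: combined_score(c[1].fit_predict(X)))
print("selected:", best[0], "with", best[1].n_clusters, "clusters")
```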
In recommender systems, algorithms are often combined into ensembles, similarly to best practices in the predictive modeling field. The difference from predictive modeling is that you typically have millions of users and items.
Basic RS algorithms such as matrix factorization or KNN are quite parameter-sensitive and not very robust when it comes to handling various recommendation scenarios.
Even for a single database (e.g. a media house), different content (movies, articles) can be recommended in each recommendation scenario. Meta-learning can help us build and maintain a good recommendation model for every single scenario.
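A minimal sketch of what "a good model for every scenario" can mean in practice (entirely synthetic, not any production pipeline): for each scenario, run a small random search over the sensitive hyperparameters of a plain SGD matrix factorization and keep the configuration with the best held-out error.

```python
import numpy as np

rng = np.random.default_rng(6)

def factorize(R, mask, factors, reg, steps=200, lr=0.01):
    """Plain SGD matrix factorization; quality depends heavily on factors/reg."""
    n_users, n_items = R.shape
    P = 0.1 * rng.normal(size=(n_users, factors))
    Q = 0.1 * rng.normal(size=(n_items, factors))
    users, items = np.nonzero(mask)
    for _ in range(steps):
        for u, i in zip(users, items):
            err = R[u, i] - P[u] @ Q[i]
            p_u = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

def rmse(R, mask, P, Q):
    diff = (R - P @ Q.T)[mask]
    return np.sqrt(np.mean(diff ** 2))

# One toy "scenario": a small synthetic rating matrix with missing entries.
true_P, true_Q = rng.normal(size=(30, 3)), rng.normal(size=(40, 3))
R = true_P @ true_Q.T + 0.1 * rng.normal(size=(30, 40))
observed = rng.random(R.shape) < 0.3
train_mask = observed & (rng.random(R.shape) < 0.8)
valid_mask = observed & ~train_mask

# Per-scenario AutoML, reduced to a random search over sensitive hyperparameters.
best = min(
    ({"factors": rng.integers(2, 8), "reg": 10 ** rng.uniform(-3, -1)} for _ in range(10)),
    key=lambda h: rmse(R, valid_mask, *factorize(R, train_mask, h["factors"], h["reg"])),
)
print("selected hyperparameters:", best)
```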
For hundreds of databases with a couple of scenarios each, AutoML can save a significant amount of data scientists' work. For more details on our research in recommender systems, you can refer to the Recombee MLMU talk.
Many other companies have started to use AutoML in their solutions. You can do it as well, since there are many applications where meta-learning can help and you do not need thousands of GPU days to run it.