Has there been a second artificial intelligence big bang?

in Project HOPE · 2 years ago

The first big bang in 2012


Before 2012, AI achieved some remarkable feats but never made much money. Since then, AI has been helping big tech companies generate huge fortunes, not least from advertising.

A second big bang in 2017?


Has there been another big bang in AI since Transformers came out in 2017? Aleksa is an AI researcher at DeepMind who previously worked on Microsoft's HoloLens team. Remarkably, his AI expertise is self-taught – so there's still hope for us all!

Transformers


Transformers are deep learning models that process inputs expressed in natural language and produce outputs such as translations or summaries of texts. Their arrival was announced in 2017, when Google researchers published a paper titled "Attention Is All You Need." The title refers to the fact that Transformers can "attend" to a whole passage of text simultaneously, whereas their predecessors, Recurrent Neural Networks, could only attend to the symbols on either side of the segment of text being processed.

Transformers work by breaking text into small units, called tokens, and mapping them onto points in high-dimensional vector spaces – often thousands of dimensions. We humans cannot picture that. The space we inhabit is defined by three numbers – or four if you include time – and we simply cannot visualize a space with a thousand dimensions. Researchers suggest we shouldn't even try.
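The token-to-vector mapping is just a big lookup table. Here is a toy sketch (not a real tokenizer – the vocabulary, dimension, and random vectors are all assumptions for illustration):

```python
# Toy illustration: split text into tokens, then look each token up in an
# embedding table that maps it to a high-dimensional vector.
import numpy as np

vocab = {"the": 0, "queen": 1, "wears": 2, "slippers": 3}
dim = 1024                                   # real models use thousands of dims
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(len(vocab), dim))

tokens = "the queen wears slippers".split()  # crude whitespace "tokenizer"
ids = [vocab[t] for t in tokens]
vectors = embeddings[ids]                    # one 1024-d vector per token
print(vectors.shape)  # (4, 1024)
```

In a trained model these vectors are learned, not random, which is what gives the dimensions their meaning.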

Dimensions and vectors


For Transformer models, words and tokens are represented as vectors along many dimensions. The classic example is that the vector for "king", minus the vector for "man", plus the vector for "woman", lands close to the vector for "queen". The model assigns a probability that a certain token is associated with a certain vector. For example, a princess is more likely to be associated with a vector that denotes "wearing slippers" than a vector that denotes "wearing a dog".
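The king/queen arithmetic can be demonstrated with hand-made toy vectors. The two axes below ("royalty" and "gender") are an assumption purely for illustration; real embeddings are learned, and their dimensions rarely have such clean names.

```python
# Word-analogy arithmetic with toy 2-d embeddings:
# axis 0 = "royalty", axis 1 = "gender" (illustrative assumption).
import numpy as np

emb = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

def nearest(v, words):
    """Word whose embedding has the highest cosine similarity to v."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(words, key=lambda w: cos(emb[w], v))

result = emb["king"] - emb["man"] + emb["woman"]   # -> [1, -1]
print(nearest(result, ["queen", "man", "woman"]))  # queen
```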

There are different ways machines can discover these relationships between tokens. In supervised learning, they are provided with enough labeled data to learn all the relevant associations. In self-supervised learning, they are not given labeled data and have to find the relationships themselves.

This means that the relationships they discover are not necessarily discoverable by humans. They are black boxes. Researchers are investigating how machines manage these dimensions, but it is not certain that the most powerful systems will ever be truly transparent.

Parameters and synapses


The size of a transformer model is normally measured by the number of parameters it has. The first Transformer models had roughly a hundred million parameters, and now the largest models have trillions. This is still less than the number of synapses in the human brain, and human neurons are far more complex and powerful creatures than artificial ones.
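A back-of-the-envelope calculation shows how parameter counts reach the billions. The sketch below counts only the weight matrices of a standard Transformer block (ignoring biases, layer norms, and embeddings), and the GPT-3-like dimensions are assumed for illustration:

```python
# Rough parameter count for one standard Transformer block (weights only).
def block_params(d_model, d_ff):
    attn = 4 * d_model * d_model   # Q, K, V, and output projection matrices
    ffn = 2 * d_model * d_ff       # the two feed-forward matrices
    return attn + ffn

# GPT-3-like shape: d_model=12288, d_ff=4*d_model, 96 layers (assumed here).
d, layers = 12288, 96
total = layers * block_params(d, 4 * d)
print(f"{total / 1e9:.0f} billion parameters")  # roughly 174 billion
```

The brain comparison in the text follows the same arithmetic: 80 billion neurons times roughly 10,000 synapses each gives on the order of 10^15 connections, about a thousand times the largest models.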

A surprising discovery a few years after the arrival of Transformers was that they can tokenize not only text but also images. Google released the first Vision Transformer in late 2020, and since then people around the world have marveled at the output of DALL-E, Midjourney, and others.

The first models for image generation were Generative Adversarial Networks, or GANs. These are pairs of models: a generator producing images designed to trick the other model into accepting them as real, and a discriminator rejecting attempts that aren't good enough. GANs have now been largely superseded by diffusion models, which work by progressively peeling noise away from the desired signal. The first diffusion model was actually described back in 2015, but the paper was almost completely ignored; the approach was rediscovered in 2020.
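The generator-versus-discriminator game described above is usually written as a minimax objective, along the lines of Goodfellow's original formulation:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right] +
  \mathbb{E}_{z \sim p_z}\!\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator D maximizes this value by correctly telling real samples x from generated ones G(z), while the generator G minimizes it – that is, tries to fool D.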

Energy eaters


Transformers are guzzlers of computing power and energy, and this has led to concerns that they could represent a dead end for AI research. It is already difficult for academic institutions to fund research on the latest models, and there were fears that even tech giants could soon find them unaffordable. The human brain shows the way forward.

It's not only bigger than the latest Transformer models (with about 80 billion neurons, each with roughly 10,000 synapses, it is around 1,000 times larger). It is also a far more efficient energy consumer, mainly because we only need to activate a small fraction of our synapses to perform a given calculation. Neuromorphic chips, which mimic the brain more closely than conventional chips do, may help.

Unsurprising surprises


Aleksa is often surprised by what the latest models can do, but that in itself is not surprising. "If I wasn't surprised, that would mean I could predict the future, which I can't." He enjoys the fact that the research community is like a hive mind: you never know where the next idea will come from. The next big thing could come from a few students at university, and a researcher named Ian Goodfellow famously created the first GAN by playing around at home after brainstorming over a few beers.


Thank you so much for reading. Share your thoughts in the comment section : )

Warm regards,
@Winy
