One of the most intriguing aspects of scale is emergent abilities: larger models succeed at tasks that smaller ones cannot. This phenomenon has been particularly striking in LLMs, which perform well on an increasingly broad range of tasks and benchmarks as they grow in size.
2. Unsupervised learning is still effective.
In recent years, there has been tremendous progress in this field, particularly in LLMs, which are mostly trained on large sets of raw data gathered from the internet. LLMs continued to improve in 2022, but other unsupervised learning techniques also gained traction.
This year, for example, there were tremendous advances in text-to-image models. Models such as OpenAI's DALL-E 2, Google's Imagen, and Stability AI's Stable Diffusion demonstrate the power of unsupervised learning. Unlike previous text-to-image models, which required well-annotated pairs of images and descriptions, these models make use of large datasets of loosely captioned images already available on the internet. Because of the sheer size of their training datasets (which is only possible because no manual labeling is required) and the variability of their captioning schemes, these models can find all kinds of intricate patterns between textual and visual information. As a result, they are far more flexible at generating images for different purposes.
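One common way such loosely paired data is exploited is a CLIP-style contrastive objective: embed each image and its caption, then push matching pairs together and mismatched pairs apart within a batch. The sketch below is a minimal NumPy illustration of that idea under assumed embedding shapes and a made-up temperature; it is not the training code of any of the models named above.

```python
import numpy as np

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of
    image/caption embedding pairs; row i of each matrix is
    assumed to come from the same web-scraped pair."""
    # L2-normalize so the dot product is cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature  # (batch, batch)

    def cross_entropy(logits):
        # The correct pairing lies on the diagonal: label i for row i.
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = img + 0.01 * rng.normal(size=(4, 8))       # nearly matched captions
loss_matched = clip_style_loss(img, txt)
loss_random = clip_style_loss(img, rng.normal(size=(4, 8)))  # shuffled captions
print(loss_matched < loss_random)                # matched pairs score lower loss
```

Because the captions are scraped rather than curated, the objective never needs per-image labels; the pairing itself is the supervision signal.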
Multimodality is critical to the intelligence found in humans and animals. For example, if you see a tree and hear the wind rustling in its branches, your mind will quickly associate the two. Similarly, when you hear the word "tree," you may immediately recall an image of a tree, the smell of pine after a rain, or other previous experiences.
Clearly, multimodality has played an important role in increasing the flexibility of deep learning systems. DeepMind's Gato, a deep learning model trained on a variety of data types including images, text, and proprioception data, was perhaps the best example of this. Gato performed well in a variety of tasks, including image captioning, interactive dialogues, controlling a robotic arm, and playing games. In contrast, traditional deep learning models are designed to perform a single task.
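One way a single model can consume such different data types is to serialize everything into one flat token sequence for a decoder-only Transformer. The sketch below illustrates that general idea under assumed vocabulary sizes and bin counts; the names and numbers are illustrative, not Gato's actual implementation.

```python
TEXT_VOCAB = 32000   # assumed text vocabulary size
NUM_BINS = 1024      # assumed number of bins for continuous values

def tokenize_continuous(values, low=-1.0, high=1.0):
    """Discretize continuous values (e.g. joint angles) into bins and
    offset the IDs past the text vocabulary so they never collide."""
    tokens = []
    for v in values:
        v = min(max(v, low), high)  # clip to the expected range
        bin_id = int((v - low) / (high - low) * (NUM_BINS - 1))
        tokens.append(TEXT_VOCAB + bin_id)
    return tokens

def build_episode(text_tokens, proprio, action):
    """Interleave modalities into one flat sequence, the way a
    single decoder-only model could consume them."""
    return text_tokens + tokenize_continuous(proprio) + tokenize_continuous(action)

seq = build_episode([17, 942, 5],         # e.g. a tokenized instruction
                    [0.12, -0.40, 0.88],  # e.g. robot arm joint readings
                    [0.05, 0.0])          # e.g. an action command
print(len(seq))  # 8 tokens: 3 text + 3 proprioception + 2 action
```

Once every modality lives in the same token space, one next-token objective covers captioning, dialogue, and control alike, which is what makes this framing attractive.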
Some researchers have gone so far as to suggest that a system like Gato is all that is required to achieve artificial general intelligence (AGI). While many scientists disagree, one thing is certain: multimodality has resulted in significant advances in deep learning.
These are some of the mysteries of intelligence that scientists in various fields are still investigating. Pure scale- and data-based deep learning approaches have aided in making incremental progress on some of these issues, but have yet to provide a definitive solution.
Larger LLMs, for example, can maintain coherence and consistency over longer stretches of text. They still struggle, however, with tasks that require careful step-by-step reasoning and planning.
Similarly, text-to-image generators produce stunning graphics but make basic errors when prompted with complex descriptions or scenes that require compositional reasoning.
These challenges are being discussed and investigated by a variety of scientists, including some of the deep learning pioneers. Among them is Yann LeCun, the Turing Award-winning inventor of convolutional neural networks (CNNs), who recently published a lengthy essay on the limitations of LLMs that learn solely from text. LeCun is working on a deep learning architecture that learns world models and can address some of the problems the field currently faces.
Deep learning has progressed significantly. However, as we make progress, we become more aware of the difficulties in creating truly intelligent systems. Next year will undoubtedly be as exciting as this one.