Published on
Monday, May 1, 2023

An overview of Generative AI in 2023

Viet Anh

Recently, generative AI has become a hot trend in the AI community, with many new research papers and applications appearing each week. Huyen Chip said, "Now is the time to get into AI." In this article, I give a brief overview of Generative AI and how to get started in this field.

I. What are Generative AI Models?

Generative AI models are machine learning models that learn patterns from existing data and use them to generate new data. They are used in many applications, such as image generation, text generation, speech synthesis, and music generation. Recently, we have seen many exciting applications of Generative AI models, such as ChatGPT, Midjourney, and Microsoft's Copilot.

II. The history of the most important Generative AI models from 2014 to 2023

2014-2018: The rise of Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs)

The introduction of Variational Autoencoders (VAEs) in 2013 and Generative Adversarial Networks (GANs) in 2014 led to a new wave of research in generative models.

Variational AutoEncoders (VAEs) - 2013:

A Variational AutoEncoder (VAE) is a type of autoencoder that extends the basic architecture to learn a probabilistic model of the data. This allows it to generate new data that is similar to the original input but not identical. The key innovation in VAEs is a regularization term, the Kullback-Leibler (KL) divergence, which encourages the learned latent distribution to match a prior distribution, typically a standard normal distribution. Because the latent space is regularized this way, points sampled from the prior decode to plausible outputs, which lets VAEs generate more diverse and realistic data than traditional autoencoders.
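To make the KL regularization term concrete, here is a minimal sketch in plain Python. It assumes the usual setup in which the encoder outputs a mean and a log-variance per latent dimension and the prior is a standard normal; the function names and the example numbers are illustrative and not taken from any particular library.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


# Hypothetical encoder output for one input with a 2-dimensional latent space.
mu, log_var = [1.0, 0.0], [0.0, 0.0]
kl = kl_to_standard_normal(mu, log_var)  # only the first dimension deviates from the prior
z = reparameterize(mu, log_var, random.Random(0))  # latent sample that a decoder would consume
print(kl, z)
```

In a full VAE, this KL term is added to a reconstruction loss, and the sampled z is passed to the decoder network.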


Illustration of variational autoencoder model

Generative Adversarial Networks (GANs) - 2014:

Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow and colleagues in 2014. They are a class of generative models that use two neural networks, a generator and a discriminator, to generate new data. The generator network takes a random noise vector as input and produces a synthetic sample that is meant to resemble the real data. The discriminator network receives both real samples from the dataset and generated samples from the generator, and tries to distinguish between them. The generator is trained to fool the discriminator, while the discriminator is trained to tell real samples from fake ones.
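The two competing objectives can be sketched as loss functions over the discriminator's outputs. This is a minimal illustration in plain Python, assuming the discriminator returns the probability that a sample is real; the generator loss shown is the common "non-saturating" variant, and the function names and scores are made up for the example.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(x) toward 1 on real samples and D(G(z)) toward 0 on fakes."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: push D(G(z)) toward 1, i.e. fool the discriminator."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs (probability that a sample is real).
d_real = [0.9, 0.8]  # scores on real samples
d_fake = [0.2, 0.1]  # scores on generated samples
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

During training, the two networks are updated in alternation: the discriminator minimizes its loss while the generator minimizes its own, so each improvement on one side makes the other side's task harder.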

GAN Architecture

Later work on GANs produced improved variants such as DCGAN (2015), Wasserstein GAN (2017), and ProGAN (2017). GAN-based models have also been applied to different problems, such as image-to-image translation (pix2pix - 2016, CycleGAN - 2017), music generation (MuseGAN - 2017), text generation (SeqGAN - 2017), and audio synthesis (WaveGAN - 2018).

2018-2019: The Era of Transformers

The Transformer was introduced in 2017 by Vaswani et al. It is a neural network architecture that uses attention mechanisms to process data sequences. The Transformer has been used in many applications, such as machine translation, text summarization, and image captioning. In 2019, OpenAI released GPT-2, a Transformer-based language model that can generate text with human-like quality. GPT-2 was a "direct scale-up" of the original GPT model from 2018. The Transformer has proven to be a powerful architecture for generative models.
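The attention mechanism at the core of the Transformer can be sketched in a few lines. The following is a simplified, single-head version of scaled dot-product attention in plain Python, without the learned projection matrices, masking, or multi-head structure of a real Transformer; the toy inputs are made-up numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, one output row per query."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy sequence of two tokens with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[5.0, 0.0]]  # aligned with the first key, so the first value dominates
out = attention(queries, keys, values)
print(out)
```

Each output row is a mixture of the value vectors, weighted by how strongly the query matches each key, which is how every position in a sequence can draw information from every other position.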

2020-now: The rise of big and giant models

Large Language Models (LLM):

In 2020, OpenAI released GPT-3, a Transformer-based language model that can generate text with human-like quality. With 175 billion parameters, GPT-3 was the largest dense language model trained up to that point. After GPT-3, other giant models for text generation were released, such as Gopher (2021) and Chinchilla (2022) by DeepMind, LaMDA (2022) and PaLM (2022) by Google, OPT (2022) by Meta AI, and BLOOM (2022) by the BigScience project coordinated by Hugging Face. The wave created by OpenAI's ChatGPT in 2022 made the world realize the power and potential of text generation models. Giant and start-up companies are now racing to develop their own large language models (LLMs) and chatbots, including Bard - Google, LLaMA - Meta AI, Dolly - Databricks, StableVicuna - Stability AI, etc. Besides chatbots, LLMs are also used in many other applications, such as code generation (GitHub Copilot), office work (Microsoft 365 Copilot), and slide generation (SlideGPT).

A demo of ChatGPT

Image Generation Models:

In 2021, OpenAI released DALL-E, a Transformer-based image generation model that can generate images from text descriptions. By 2022, text-to-image models like OpenAI's DALL-E 2, Stability AI's Stable Diffusion, and Google Brain's Imagen had made significant progress, generating images that approach the quality of real photographs and human-drawn art.

Besides text input, some methods, such as ControlNet, Composable-Diffusion, or T2I-Adapter, let users supply additional inputs to control the image generation output.

Example result of ControlNet


Audio Generation Models:

Besides text and image content, new AI models for audio have thrived, such as VALL-E for speech synthesis, MusicLM for music generation, and Bark for general text-to-audio generation.

You can follow the latest models for the audio generation problem here.

Video Generation Models:

There are some models and services for video generation. However, in my opinion, they are not yet stable enough to produce good results or support practical applications. Many of them apply the same techniques used for image generation to video, which results in unstable and unrealistic videos. Maybe the next "Wow" in AI will be a video generation model?

Check the latest research about this field here. Some useful links:

III. How to get started with Generative AI?

There are many ways to get started with Generative AI. We suggest the following learning path for software engineers and data scientists who want to get started quickly.

  • First, you need to understand the basic concepts of machine learning and deep learning. The best way to do so is to take a course. We recommend the Machine Learning Specialization and the Deep Learning Specialization by Andrew Ng. These courses are available on Coursera. To get the most out of them, you need a good understanding of linear algebra, probability, and statistics. If you are unfamiliar with these topics, you can take the Mathematics for Machine Learning and Data Science course.
  • Second, you can learn the principles of Generative AI and practice with code examples from the book Generative Deep Learning. This book covers variational autoencoders (VAEs), generative adversarial networks (GANs), Transformers, normalizing flows, energy-based models, and denoising diffusion models. It starts from the basics of deep learning and progresses to cutting-edge architectures. After reading it, you will understand the principles behind ChatGPT, DALL-E 2, Imagen, and Stable Diffusion.
  • Third, never stop learning. You can follow the latest research papers on arXiv or Papers with Code. Try to practice and build your own projects. You can find many exciting projects on GitHub or Kaggle and start from there.

IV. Interesting resources