What Is a Variational Autoencoder?

Written by Coursera Staff • Updated on

Learn about variational autoencoders (VAEs), their role in machine learning, and their use in real-world applications like data generation and image processing. Become familiar with best practices in the use of VAEs as well as their future potential.

[Featured Image] Three doctors analyzing high-dimensional images of MRI scans.

Key takeaways 

  • Variational autoencoders are one of several types of autoencoders, which assist with various machine learning functions.

  • Other forms of autoencoders include adversarial autoencoders and sparse autoencoders, but each handles latent space differently.

  • VAEs are useful for anomaly detection, image processing, and dimensionality reduction.

  • You can use VAEs to generate data that’s consistent with its training data to enhance model performance. 

Learn what a variational autoencoder is and the role it plays in machine learning. If you’re interested in learning more about advanced machine learning techniques, the Deep Learning Specialization from DeepLearning.AI can help you learn to build and train neural networks, optimize algorithms, and master theoretical concepts.

What is a variational autoencoder (VAE)?

Variational autoencoders (VAEs) are a subset of generative models in machine learning. They combine probabilistic techniques with traditional autoencoding to give you tools for data generation, anomaly detection, and dimensionality reduction. Unlike traditional autoencoders, VAEs focus on learning a probabilistic distribution of data, enabling you to generate new samples consistent with the original data set, or variations of it. From image synthesis to health care applications, VAEs have become one of the driving forces pushing the boundaries of artificial intelligence today. 

Understanding variational autoencoders: How do they work?

VAEs fall under the larger category of autoencoders. They’re one of several types of autoencoders you can use, including adversarial autoencoders (AAEs), sparse autoencoders (SAEs), and denoising autoencoders. Explore the basics of latent space and autoencoders to learn how VAEs differ.

Latent space explained

Latent space represents the set of underlying variables (also known as latent variables) that shape how autoencoders distribute data, even if the variables aren’t clearly observable. Imagine picking up an unfamiliar object with your eyes closed. You would instantly sense its weight, even without knowing what it is. The weight is the observable variable, but the object type is the latent variable.

In autoencoding, the latent space is where the models learn to represent data more efficiently. Autoencoders try to gather the more important underlying patterns that define the data to compress it efficiently.

The basics of autoencoders

Autoencoders are a type of neural network designed for unsupervised learning. They consist of two main components, an encoder and a decoder. The encoder compresses input data into a lower-dimensional latent space representation. The decoder then reconstructs the original data from the latent space.

Autoencoders minimize the difference between the input data and the reconstructed data, capturing critical features in the process. For example, autoencoding a high-resolution image involves compressing it into a smaller representation and then reconstructing it with its essential details intact.
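As a rough illustration, here’s a minimal NumPy sketch of that encode/compress/decode flow. The weights are random and untrained (a real autoencoder learns them via backpropagation), so it only shows the shapes involved and the reconstruction-error objective that training minimizes:

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, latent_dim = 8, 2                       # compress 8 features down to 2
W_enc = rng.standard_normal((input_dim, latent_dim)) * 0.1
W_dec = rng.standard_normal((latent_dim, input_dim)) * 0.1

def encode(x):
    return np.tanh(x @ W_enc)                      # lower-dimensional latent code

def decode(z):
    return z @ W_dec                               # map the code back to input space

x = rng.standard_normal(input_dim)                 # one example input
z = encode(x)
x_hat = decode(z)

# Training would adjust W_enc and W_dec to minimize this reconstruction error.
reconstruction_error = np.mean((x - x_hat) ** 2)
print(z.shape, x_hat.shape)                        # (2,) (8,)
```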

Introducing variational autoencoders

Whereas traditional autoencoders map input data to a single point in latent space, VAEs take a probabilistic approach by mapping data to a distribution across the latent space.
This difference allows VAEs to generate new data samples by sampling from the latent space distribution. VAEs accomplish this through the use of latent variable models. These approximate the data distribution, making them quite useful for generative tasks compared to other autoencoders you can use.

What is the difference between an autoencoder and a variational autoencoder?

The primary difference between an autoencoder and a variational autoencoder lies in how they encode latent variables, which dictate the distribution of data through a model. A standard autoencoder maps each input to a single fixed point in latent space, while a variational autoencoder maps each input to a continuous probability distribution, giving it the ability to construct new data that’s representative of the original training data.

The reparameterization trick

In machine learning, training via backpropagation requires every operation to be differentiable. That’s why sampling from the latent space during training is among the more significant challenges in variational autoencoding: drawing a random sample isn’t a differentiable operation, so gradients can’t flow back through it. 

That’s where the reparameterization trick comes in: It improves how variational autoencoders sample data from the latent space during training. Instead of directly sampling from a probability distribution (which isn’t differentiable), the trick rewrites the sampling process as a combination of a fixed mean, a standard deviation, and random noise. This makes it possible to calculate gradients and backpropagate through the model effectively.
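A minimal NumPy sketch of the trick, assuming a hypothetical encoder has already produced a mean and log-variance for one input (the specific values here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical encoder outputs for a single input: a mean and a log-variance
# describing that input's latent distribution (values are illustrative).
mu = np.array([0.5, -1.2])
log_var = np.array([-0.3, 0.1])

# Drawing z ~ N(mu, sigma^2) directly is a random operation, so gradients
# can't flow through it.  The trick rewrites the sample as a deterministic
# function of mu and sigma plus noise drawn outside the model:
eps = rng.standard_normal(mu.shape)     # random noise, independent of parameters
sigma = np.exp(0.5 * log_var)           # convert log-variance to standard deviation
z = mu + sigma * eps                    # differentiable with respect to mu and log_var
```

Because the randomness now lives entirely in `eps`, gradients can pass through `mu` and `log_var` during backpropagation.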

What are variational autoencoders used for?

VAEs are widely used, helping you generate data, detect anomalies, reduce dimensionality, and process images and videos. Some of the different applications of VAEs include:

Data generation

You can use variational autoencoding to generate realistic data samples. By sampling from the latent space, VAEs can create entirely new data points resembling the original data set. This might come in handy during tasks like data augmentation, for example, which you can use to improve machine learning model performance.
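In a sketch, generation amounts to sampling latent points from the standard-normal prior and running them through the decoder. The decoder weights below are random stand-ins (a trained VAE would have learned them), so this shows only the mechanics:

```python
import numpy as np

rng = np.random.default_rng(7)

latent_dim, output_dim = 2, 8
# Stand-in decoder weights; in a real VAE these come from training.
W_dec = rng.standard_normal((latent_dim, output_dim)) * 0.1

def decode(z):
    return np.tanh(z @ W_dec)

# New data points come from sampling the standard-normal prior over the
# latent space and decoding; a trained decoder would make these realistic.
z_new = rng.standard_normal((5, latent_dim))       # 5 fresh latent samples
generated = decode(z_new)
print(generated.shape)                             # (5, 8)
```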

Anomaly detection

Variational autoencoders can learn the normal distribution of data and identify anomalies or outliers that deviate from that norm. This can be helpful when looking for fraudulent credit card transactions; deviations from typical spending behavior would trigger a red flag.
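A common way to operationalize this is to flag inputs whose reconstruction error exceeds a threshold. The sketch below uses made-up reconstructions and a hypothetical threshold just to show the decision rule:

```python
import numpy as np

rng = np.random.default_rng(1)

def reconstruction_error(x, x_hat):
    return np.mean((x - x_hat) ** 2, axis=-1)

# Pretend these reconstructions came from a trained VAE: normal data
# reconstructs well, while the outlier does not (numbers are illustrative).
x_normal = np.ones((3, 4))
x_normal_hat = x_normal + 0.05 * rng.standard_normal((3, 4))
x_outlier = 5 * np.ones(4)
x_outlier_hat = np.ones(4)                  # poor reconstruction of the outlier

threshold = 1.0                             # in practice, tuned on held-out normal data
normal_errors = reconstruction_error(x_normal, x_normal_hat)
flagged = reconstruction_error(x_outlier, x_outlier_hat) > threshold
print(flagged)                              # True -> flagged as anomalous
```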

Dimensionality reduction

VAEs compress high-dimensional data into more meaningful latent space representations. This simplifies analysis and visualization. In health care settings, for instance, you can use VAEs to create high-quality 3D medical scans or images.

Image and video processing

Variational autoencoders can also enhance and generate visual content. VAEs can remove noise from low-quality images, convert low-quality images into high-resolution versions, and even create video sequences by predicting future frames based on temporal relationships in data.

Challenges and future directions

As with any emerging AI technology, VAEs are set to become more advanced and will likely face new hurdles in the coming years. Consider the following potential challenges related to model complexity, mode collapse, and interpretability.

Model complexity

Balancing complexity and performance remains a critical challenge for VAEs. Overly complex models risk overfitting, while overly simple models might lack expressiveness. Moving forward, VAEs will likely aim to strike a consistent balance between a high-performing model with just the right amount of complexity.

Mode collapse

This common issue in generative AI models occurs when a model generates limited diversity in its samples. To address it, you need advanced regularization techniques and hybrid approaches that combine VAEs with other model architectures (such as convolutional or recurrent neural networks).

Interpretability

Understanding and interpreting the latent space remains a challenge for VAEs. Going forward, expect researchers to focus on developing more interpretable latent representations, allowing better insights into the underlying data structures.

Read more: Exploring Deep Learning Frameworks: Tools for Building Intelligent Systems

Explore our free artificial intelligence resources

Subscribe to our weekly LinkedIn newsletter, Career Chat, where you can stay up to date with industry developments, career tips, and skill-building resources. Then, check out some of our other free resources to learn more about artificial intelligence.

Whether you want to develop a new skill or get comfortable with an in-demand technology, you can keep growing with a Coursera Plus subscription. You’ll get access to over 10,000 flexible courses. 
