
8.5 Graph Neural Networks
Notes and References
This chapter covered a broad range of specialized architectures and paradigms, where each section addresses a major topic that could in fact have filled a whole chapter. Hence, in these notes and references we only summarize key references and developments in each of the sub-fields. A further recent overarching text that we recommend is [73], which devotes a chapter to each of the topics covered here.
The field of generative modelling has multiple origins. Early models include hidden Markov models and Gaussian mixture models, with origins in the 1950s and 1960s; see chapters 11 and 17 of [65] for background. Somewhat more recently, some authors consider the study of Boltzmann machine models introduced in the 1980s in [1], and deep Boltzmann machines in [81], as the first meaningful generative models in the context of deep learning. See also chapter 20 of [25] for an overview. A more recent survey of generative models in machine learning is [34], and a comparison of deep generative modelling approaches appears in [12].
Up to 2014, while generative models were useful for some applications and certainly interesting, in terms of images they lacked the ability to create realistic-looking data. The big advance came with the development of generative adversarial networks (GANs) in Goodfellow et al.'s work [26]. This opened up possibilities for the creation of realistic-looking images (and data) and remains a very active topic. Variational autoencoders, initially introduced in [52], grew in multiple directions, and contemporary diffusion models such as [39], and those surveyed in [101], constitute the state of the art in image generative modelling. At the time of publishing of this book, diffusion models and GANs still compete: diffusion models are generally able to produce more impressive images, while GANs are much faster at generation time since they do not require multiple iterative passes through a neural network.
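As a brief reminder of the adversarial setup of [26], a generator $G$ and a discriminator $D$ play the following minimax game (written here in the standard notation of the literature):

```latex
\min_{G} \max_{D} \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p_{z}}\!\left[\log\big(1 - D(G(z))\big)\right],
```

where $p_{\mathrm{data}}$ is the data distribution and $p_z$ is a fixed noise distribution from which the generator draws its inputs. Once trained, sampling requires only a single forward pass through $G$, which underlies the speed advantage noted above.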
Ideas of variational autoencoders are rooted in modern developments of Bayesian statistics. See [19] for an introductory general text on Bayesian statistics and [91] for an accessible review of the area. Specifically, variational Bayes methods, a well-known optimization-based approach in the field of approximate Bayesian computation, capture the key ideas used in variational autoencoders. See [11] and [103] for reviews of variational Bayes. This approach entails approximating posterior distributions using simpler surrogate distributions; see [85] for a collection of approximate Bayesian computation methods. For more details about variational autoencoders specifically, see [53].
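The core variational Bayes idea just described can be summarized in one inequality. A surrogate distribution $q_\phi(z \mid x)$ is fit to the intractable posterior $p_\theta(z \mid x)$ by maximizing the evidence lower bound (ELBO); in notation standard in the literature,

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\;
D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
```

where the first term rewards reconstruction of the data $x$ from the latent variable $z$, and the Kullback-Leibler term keeps the surrogate close to the prior $p(z)$. Variational autoencoders parameterize both $q_\phi$ and $p_\theta$ with neural networks and optimize this bound.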
Our presentation of variational autoencoders was geared towards hierarchical Markovian variational autoencoders, of which diffusion models are a special case. Nevertheless, variational autoencoders and their variants are interesting and useful in their own right, and have been applied in many fields. In image processing, prediction of the trajectory of pixels of an image is tackled in [95], and natural image modelling in [31]. In the field of speech analysis, voice conversion is handled in [40] and speech synthesis in [3]. In the area of text processing, recurrent neural network based variational autoencoders for generating sentences are put forward in [13], and controlled text generation is handled in [41]. Another field is graph-based data analysis, as in [54], where learning on graph-structured data is handled, and [45], which deals with molecular graph generation. As we presented diffusion models as special cases of hierarchical variational autoencoders, the literature on these models is also relevant. In particular, see [76] for an application in black box variational inference and [87] for a variant called the ladder variational autoencoder.
Diffusion models, initially introduced in [86], gained significant prominence following [39], which showcased exceptional image synthesis results. These models were further improved in [22], where for the first time the prolonged dominance of GANs was broken. For recent surveys of diffusion models, refer to [15], [21], and [101]. In terms of applications in the realm of image processing, diffusion models are utilized for tasks such as colorization, inpainting, uncropping, and restoration, as in [78]. Other image processing applications include super-resolution, as in [80], and image editing, as in [37]. There is an extensive body of work on applications of diffusion models for text-to-image generation, such as [79], which introduced Imagen. This system combines a transformer-based large language model, used for understanding text, with a diffusion model used for image generation. The application of diffusion models extends to video data as well. Notable contributions include [35], where an approach for long-duration video completions is put forward, and [38], which introduced Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models.
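To make the connection to hierarchical variational autoencoders concrete, recall the forward (noising) process of a standard Gaussian diffusion model, in the notation common to the literature (with variance schedule $\beta_1, \ldots, \beta_T$, $\alpha_t = 1 - \beta_t$, and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$):

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\, \sqrt{1-\beta_t}\, x_{t-1},\; \beta_t I\big),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\big(x_t;\, \sqrt{\bar\alpha_t}\, x_0,\; (1-\bar\alpha_t) I\big).
```

The Markov chain $x_0, x_1, \ldots, x_T$ plays the role of the hierarchy of latent variables, and the learned reverse process corresponds to the decoder of the hierarchical variational autoencoder.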