
8.5 Graph Neural Networks
Notes and References
This chapter covered a broad range of specialized architectures and paradigms, each of which is a
major topic that could have filled a whole chapter of its own. Hence, in these notes and references
we only summarize key references and developments in each of the sub-fields. A recent overarching
text that we recommend is [336], which has multiple chapters, one for each of the topics covered here.
The field of generative modelling has multiple origins. Early models include hidden Markov models
and Gaussian mixture models, with origins in the 1950s and 1960s; see chapters 11 and 17 of [298] for
background. Somewhat more recently, some authors consider Boltzmann machines, introduced in the
1980s in [2], and deep Boltzmann machines in [361], to be the first meaningful
generative models in the context of deep learning. See also chapter 20 of [142] for an overview. A
more recent survey of generative models in machine learning is [162], and a comparison of deep
generative modelling approaches is in [46].
Up to 2014, generative models were useful for some applications and certainly interesting, but
for images they lacked the ability to create realistic-looking data. The big advance came
with the development of generative adversarial networks (GANs) in Goodfellow et al.’s work [143].
This opened up possibilities for creating realistic-looking images (and other data) and is still a very
active topic. Variational autoencoders, initially introduced in [234], grew in multiple directions,
and contemporary diffusion models such as [183], and those surveyed in [430], constitute the state
of the art in image generative modelling. As of the time of publishing of this book, diffusion models
and GANs still compete, with diffusion models generally able to produce more impressive images,
while GANs are much faster at generating samples since they need only a single forward pass of a
network rather than many iterative denoising steps.
Ideas of variational autoencoders are rooted in modern developments of Bayesian statistics. See [92]
for an introductory general text on Bayesian statistics and [407] for an accessible review of the area.
Specifically, variational Bayes methods, a well-known optimization-based approach in the field
of approximate Bayesian computation, capture the key ideas used in variational autoencoders. See
[41] and [444] for reviews of variational Bayes. The approach entails approximating posterior
distributions using simpler surrogate distributions. See [381] for a collection of approximate
Bayesian computation methods. For more details specifically about variational autoencoders, see [235].
Our presentation of variational autoencoders was geared towards hierarchical Markovian variational
autoencoders, of which diffusion models are a special case. Nevertheless, variational autoencoders
and their variants are interesting and useful in their own right, and have been applied in many
fields. In image processing, prediction of the trajectories of pixels in an image is tackled in [414] and
natural image modelling is in [152]. In speech analysis, voice conversion is handled in
[191] and speech synthesis in [8]. In text processing, recurrent neural network
based variational autoencoders for generating sentences are put forward in [54], and controlled
text generation is handled in [193]. Another field is graph-based data analysis, as in [236] where
learning on graph-structured data is handled, and [210] which deals with molecular graph generation. As we
presented diffusion models as special cases of hierarchical variational autoencoders, the literature
on these models is also relevant. In particular, see [343] for an application in black box variational
inference and [385] for a variant called the ladder variational autoencoder.
Diffusion models, initially introduced in [384], gained significant prominence following [183], which
showcased exceptional image synthesis results. These models were further improved in [104],
where for the first time the prolonged dominance of GANs was broken. For recent surveys of diffusion
models, refer to [71], [96], and [430]. In terms of applications in image processing,
diffusion models are used for tasks such as colorization, inpainting, uncropping, and restoration,
as in [358]. Other image processing applications include super-resolution as in [360], and image
editing as in [176]. There is extensive work on applying diffusion models to text-to-image
generation, for example [359], which introduced Imagen. This system combines
a transformer-based large language model for understanding text with a
diffusion model for image generation. The application of diffusion models extends to video data
as well. Notable contributions include [163], where an approach for long-duration video completion
is put forward, and [182], which introduced Imagen Video, a text-conditional video generation system
based on a cascade of video diffusion models.