
Preface
notation at the level equivalent to at least 3 or 4 university courses. Hence set notation,
matrices, basic probability, and calculus are used without apology. However, no explicit
knowledge of machine learning, statistics, optimization, or advanced probability is needed or
assumed. Our hope is that we strike the right balance so that a mathematically equipped
non-expert can easily read the book in a self-contained manner.
While the focus of the book is “mathematical engineering”, we fully acknowledge the
importance of applications and the ability to use software and hardware effectively. For
this you may also use the companion website, https://deeplearningmath.org/, where
additional examples, links, and software usage details are provided.
Outline of the Contents
The book has 8 chapters and 2 appendices. Chapters 1 – 4 introduce the field, outline key
concepts from machine learning, present an overview of optimization concepts needed for
deep learning, and focus on fundamental models and concepts. Chapter 5 is the central
chapter introducing fully connected deep neural networks. Chapters 6 and 7 deal with the
core models and architectures of deep learning, including convolutional networks, recurrent
neural networks, and transformers. Chapter 8 covers additional popular domains such as
generative models, reinforcement learning, and graph neural networks. Appendices A and B
provide mathematical support. Here is a detailed outline of the contents.
Chapter 1 – Introduction: In this chapter we present an overview of deep learning,
demonstrate key applications, survey the associated ecosystems of high performance com-
puting, discuss big and high-dimensional data, and set the tone for the rest of the book. The
chapter discusses key terminology, including data science, machine learning, and statistical
learning, and places these terms in the context of the book. Key popular datasets
such as ImageNet and MNIST digits are also presented together with a description of the
deep learning culture that emerged.
Chapter 2 – Principles of Machine Learning: Deep learning can be viewed as a
sub-discipline of machine learning and hence this chapter provides an overview of key
machine learning concepts and paradigms. The reader is introduced to supervised learning,
unsupervised learning, and the general concept of iterative optimization for learning.
The concepts of training sets, test sets, and the like, together with principles of cross
validation and model selection are introduced. A key object explored in the chapter is the
linear model, which can also be trained via iterative optimization. We introduce the
simplest form of gradient descent, later refined in Chapter 4. Gradient descent
is used for training almost any deep learning model. We also explore basic unsupervised
learning algorithms including K-means clustering, principal component analysis (PCA), and
the singular value decomposition (SVD).
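To give a flavor of the iterative optimization discussed above, here is a minimal sketch (our illustration, not code from the book) of gradient descent fitting a one-parameter linear model y = w·x to data by minimizing squared loss:

```python
# A minimal gradient descent sketch: fit y = w * x by minimizing
# the squared loss sum((w * x - y)^2) over the single parameter w.

def gradient_descent(xs, ys, lr=0.01, steps=1000):
    """Return the weight w minimizing sum((w*x - y)^2) via gradient descent."""
    w = 0.0  # initial guess
    for _ in range(steps):
        # Gradient of the loss with respect to w: 2 * sum((w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad  # step in the direction of steepest descent
    return w

# Data generated from y = 3x, so the fitted w approaches 3.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]
w = gradient_descent(xs, ys)
```

The same update rule, w ← w − lr · ∇loss(w), applied in much higher dimensions and with refinements such as mini-batching and adaptive step sizes, is what trains deep neural networks.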
Chapter 3 – Simple Neural Networks: In this chapter we focus on logistic regression
(sigmoid) for binary classification and the related multinomial regression model (softmax)
for multi-class problems. These models are the most popular shallow neural networks. The
chapter sets the tone for more complex models by introducing principles of deep learning
such as the cross-entropy loss and other basic terminology. The chapter also presents a simple
non-linear autoencoder architecture, thereby introducing the general ideas of autoencoders.
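The sigmoid and softmax functions at the heart of these shallow models can be sketched in a few lines (our illustration, not code from the book):

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Map a list of real-valued scores to a probability distribution."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

p = sigmoid(0.0)         # an uninformative score gives probability 0.5
q = softmax([1.0, 1.0])  # equal scores give a uniform distribution
```

Logistic regression applies the sigmoid to a linear combination of the inputs for binary classification, while multinomial regression applies the softmax to one linear score per class.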