In the last few years, deep learning has seen explosive growth and has even been dubbed the “new electricity”. This is due to its remarkable success in transforming and improving a variety of automated applications. At its core, deep learning is a collection of models, algorithms, and techniques that, when assembled together, enable efficient automated machine learning. The result is a method for creating trained models that can detect, classify, translate, and create, and that can take part in systems executing human-like tasks and beyond.

In this course we focus on the mathematical engineering aspects of deep learning. For this we survey and investigate the collection of algorithms, models, and methods that allow the statistician, mathematician, or machine learning professional to use deep learning methods effectively. Many machine learning courses focus either on the practical aspects of programming deep learning, or alternatively on the full development of machine learning theory, only presenting deep learning as a special case. In contrast, in this course, we focus directly on deep learning methods, building an understanding of the engineering mathematics that drives this field.

A student completing this course will possess a solid understanding of the fundamental models, algorithms, and techniques of deep learning. These include feedforward networks, convolutional networks, recurrent neural networks, autoencoders, generative adversarial networks, reinforcement learning, first order methods of learning (optimization), second order methods of learning, regularization techniques, and general benchmarking methods.

The course includes deep learning demonstrations using several alternative software options. However, the focus is primarily on the mathematical formulation of deep learning; software usage (and programming) is only a secondary focus.

The course includes 10 chapters (units) describing the basics of deep learning, focusing on the basic mathematical engineering of this field.

**Unit 1 – Machine Learning Introduction**:
An overview of the basic problems of machine learning. Supervised, unsupervised, reinforcement, image data, tabular data, sequence data, classification/regression. A demonstration of basic classifiers. Performance measures such as accuracy, recall, precision, and \(F_1\) score. Differences between ML approaches and statistics approaches to problem solving. Train, Dev/Validate, Test/Production sets. K-fold cross validation. Hyper-parameters. Bias-variance tradeoff.

**Unit 2 – Logistic Regression Type Neural Networks**:
Building NN from logistic regression (and logistic softmax regression). Loss function, cross-entropy, softmax, forward pass, backpropagation, gradient descent (basic). Mini-batches. Fully worked example.

**Unit 3 – More on Optimization Algorithms**:
All the variants of gradient descent (momentum, Nesterov, RMSprop, Adam). SGD vs. GD and everything in between, some analysis where possible. Second order methods (basic) incl. BFGS.

**Unit 4 – General Fully Connected Neural Networks**:
The full neural network. Forward, backward, chain rule, automatic differentiation. All the activation functions, matrix representation. Dropout. Mini-batch. Batch normalization. Vanishing gradients (and overcoming them).

**Unit 5 – Convolutional Neural Networks**:
The full story of convolutional neural networks. Convolutions and Toeplitz matrices. Engineering of networks. Padding, stride, max pooling, other details. Common networks (Inception, etc.).

**Unit 6 – Tricks of the Trade**:
Everything that connects the previous two units into working models: transfer learning (pre-trained networks), hyper-parameter tuning methods (including perhaps things like Gaussian processes for that), resnets, more on vanishing gradients and parameter tuning. Autoencoders.

**Unit 7 – Generative Adversarial Networks**:
The basic GAN structure and relationship to game theory. Basic implementation. Usage of GANs: deep fakes, data augmentation, other uses.

**Unit 8 – Sequence Models**:
RNN, LSTM, usage in NL. New approaches such as transformers.

**Unit 9 – Deep Reinforcement Learning - 3hrs**:
MDP, Q-learning, then Deep Q-learning.

**Unit 10 – Summary: Past, Present, and Future of Deep Learning**: Summary and perspective.

Source code with examples is available in several locations.

- See the GitHub repository MathematicalEngineeringDeepLearning, also viewable via NBViewer.
- Google Colab.
- An additional R-markdown generated collection of pages.

See also the AMSI Summer School page for practical links, assessment information, schedule, and other course specifics.

The space of deep learning has multiple competing terms, all with intersecting meanings. Key terms include Artificial Intelligence, Machine Learning, Statistics, Data Science, and Deep Learning. Related terms that have somewhat stepped out of the spotlight include statistical learning, data mining, and big data (analytics). In any case, all of these are just terms. When considered in isolation, each has a slightly different meaning. However, when considered together, the fields and the meanings of the terms typically overlap.

In general, deep learning is the suite of techniques used to design, train, and deploy systems based on **artificial neural networks**. The study and application of neural networks has been around since the late 1950s. However, it is only in the last 10-15 years that things really took off, and deep learning and neural networks are now integrated in applications and scientific work on a regular basis.

For an overview of the history of deep learning, see this engaging talk by Chris Bishop. It is already 3 years old, which is a long period in this quickly evolving field. Nevertheless, it presents an appealing overview of how we got to the point where deep learning is so popular.

A more up-to-date talk that also presents a nice overview of the history and present applications of deep learning is this one by Lex Fridman.

See also this sequence of papers which were some of the key papers in the development of deep learning.

As is evident from their name, neural networks, also known as artificial neural networks (ANN), were originally inspired by the neurological structure of the brain. In neuroscience, a neuron is a basic working unit of the brain, and the same term is also used for the basic working unit of an artificial neural network. Some courses on deep learning lean on this analogy very strongly. In this course we do not, because neural networks are at best a very crude approximation of how the brain operates. In fact, the operation of the brain is still a huge mystery for science.

Related to the brain-ANN analogy is the quest to achieve **general artificial intelligence** (GAI). This term refers to the ability of an artificial system to think and operate like a human, or better than a human. GAI has not yet been achieved, and while the performance of systems based on ANN continues to impress, there is probably still a very long way to go until the next big breakthrough towards GAI appears.

In addition to the mathematics that this course covers, there are two technological aspects that are central to the success of deep learning. These are fast computers and an abundance of data. When it comes to computation, fast processors that can execute many instructions in parallel have made a huge difference. These are most notably GPUs (graphics processing units), which were originally developed for video gaming and have more recently been adopted for general computation. See for example this video:

In addition to computation, the abundance of huge datasets has also made a significant difference. In fact, empirical practice shows that for many “small data” tasks, deep learning methods do not perform as well as other machine learning and statistics methods. However, when huge volumes of data are collected, deep neural networks typically outperform other methods. See for example this video:

The mathematical engineering of deep learning does not focus directly on the computer-systems issues associated with fast compute and big data collection. Nevertheless, the mathematical formulation of many algorithms and architectures often needs to take compute and big data into account.

As mentioned above, this is not a general purpose machine learning course but rather a course focusing on deep learning. Still, some general machine learning terminology is needed. Four key activities of machine learning are the following:

- **Supervised learning**: Data is available in the form of \((x_i,y_i)\) and the goal is to learn how to predict \(Y\) based on \(X\). Each \(x_i\) is often very high dimensional. The elements of \(x_i\) are called *features* and the variable \(y_i\) is referred to as the *label*. When the labels are only \(0\) or \(1\), or come from a finite discrete set, the supervised learning problem is called a *classification problem*. In contrast, when the labels are continuous, it is called a *regression problem*.
- **Unsupervised learning**: In this case there are only features \(X\) but no labels \(Y\). That is, there isn’t any label marking \(X\). Think for example of a baby that learns about the world without receiving explicit feedback and direction. In basic machine learning, one important task is creating clusters of points and recognizing the clusters. This falls in the realm of clustering algorithms. Another task is reducing the dimension of the data, i.e. *data reduction*.
- **Reinforcement learning**: In this case an *agent* makes decisions dynamically over time, aiming to maximize some objective or achieve some desired behavior. In certain cases the agent has some knowledge about the way the world responds to the decisions, but this knowledge is often lacking or very partial. The methods of reinforcement learning allow us to control such systems in a near-optimal manner. Notable examples include playing games such as chess or Go (as with AlphaGo). Other examples include playing an unknown video game against a computer and eventually improving. In practice, reinforcement learning also often has applications in robotics.
- **Generative modelling**: This is the task of observing data, similar to the unsupervised case, and creating a model that can then generate additional similar data. One general application of these types of models is *deep fake* technology, where images and movies can be modified to look different yet appear natural. For example, one face can be implanted on another. Such technology is not necessarily positive and has been put to some negative use in recent years. Nevertheless, understanding the basics of how it works is important.
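To make the distinction between classification and regression in supervised learning concrete, here is a minimal sketch in plain Python. It is not taken from the course materials: the toy data, the nearest-neighbour rule, and the helper names `nearest_neighbour_label` and `fit_line` are all assumptions chosen for illustration, and neither method is a deep learning method.

```python
# Toy supervised learning data: each x_i is a feature vector (or number),
# each y_i is its label. All values are made up for illustration.

def nearest_neighbour_label(x, features, labels):
    """Classification: return the label of the closest training point."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    closest = min(range(len(features)),
                  key=lambda i: squared_distance(x, features[i]))
    return labels[closest]

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Classification problem: labels come from the finite set {0, 1}.
class_features = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
class_labels = [0, 0, 1, 1]
predicted_class = nearest_neighbour_label((0.8, 0.9), class_features, class_labels)

# Regression problem: labels are continuous (here generated by y = 2x + 1).
reg_features = [0.0, 1.0, 2.0, 3.0]
reg_labels = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(reg_features, reg_labels)
predicted_value = slope * 4.0 + intercept

print("predicted class:", predicted_class)
print("predicted value at x = 4:", predicted_value)
```

In both cases the input is a set of pairs \((x_i,y_i)\) and the output is a rule for predicting \(Y\) from a new \(X\). Only the type of label differs.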

Each of these activities can be carried out with deep learning or using other methods. We focus on deep learning. Our main focus is supervised learning, but we also deal with generative modelling in unit 7 and reinforcement learning in unit 9. Up until then, we spend units 1 - 6 on understanding the basic mathematical engineering of deep learning. We begin with the next unit, where we lay out general techniques and terminology of supervised learning.

**Disclaimer:** *These notes are still evolving and may thus contain multiple typos and inconsistencies. At the moment they also use a significant number of figures and illustrations created by other authors. We have attempted to properly attribute all such usage of figures and illustrations. In cases where omissions are present, please accept our apology beforehand and we will rectify ASAP.*

Page built: 2021-03-04 using R version 4.0.3 (2020-10-10)