This article summarizes Yann LeCun's conference talk, Deep Learning and the Future of AI.

Yann LeCun is best known for his work on Convolutional Neural Networks (CNNs). As of 2016, he is director of Facebook AI Research (FAIR) and remains a part-time professor at NYU.

*You may use this notebook post to quickly extract value from the following video without watching it in full. It contains many keywords that may help anyone searching for AI concepts to learn.*

# Watch the video

# Read the notebook

**The first learning machine: The Perceptron**

- The ML algorithms we use today are descendants of the Perceptron.
- Weighted sum, error correction
- If the output is too low, increase all the weights whose input is positive and decrease all the weights whose input is negative.
- If the output is too high, do the opposite.
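
The error-correction rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names, the -1/+1 label encoding, the learning rate, and the toy AND dataset are all assumptions made for the example.

```python
def predict(weights, bias, x):
    """Return +1 if the weighted sum of the inputs is positive, else -1."""
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s > 0 else -1


def train_perceptron(samples, n_features, lr=1.0, epochs=20):
    """Error-correction training: weights change only on misclassified examples."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            if predict(weights, bias, x) != target:
                # target = +1 means the output was too low: weights with a
                # positive input go up, weights with a negative input go down.
                # target = -1 means the output was too high: the opposite.
                for i, xi in enumerate(x):
                    weights[i] += lr * target * xi
                bias += lr * target
                mistakes += 1
        if mistakes == 0:  # every example is classified correctly
            break
    return weights, bias


# Hypothetical toy data: logical AND with labels in {-1, +1}.
AND_SAMPLES = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES, n_features=2)
predictions = [predict(weights, bias, x) for x, _ in AND_SAMPLES]
```

The toy data is linearly separable, which is the case where this rule is guaranteed to stop making mistakes; on data that is not separable the loop simply runs for all `epochs`.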

- Supervised learning
- A training set.
- A number of examples; if the machine gets one wrong (or right), you adapt the weights.
- Standard model of pattern recognition since the 50s; until recently it was “the only model in town”.
- Supervised Machine Learning = Function Optimization.
- Stochastic Gradient Descent means: show one example, give the machine the desired answer, and tune the parameters (the weights) so that the error decreases.
- How to build complex machines? There is a lot of variability among examples (e.g. the many shapes of chairs), so how do we generalize, e.g. for image recognition?
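
To make "Supervised Machine Learning = Function Optimization" concrete, here is a small sketch of Stochastic Gradient Descent. It assumes the simplest possible machine, a linear model `w * x + b` trained with a squared error; the model, the learning rate, and the toy data generated from `y = 2x + 1` are illustrative assumptions, not taken from the video.

```python
import random


def sgd_linear(samples, lr=0.05, epochs=500, seed=0):
    """Fit y = w * x + b by showing one example at a time."""
    rng = random.Random(seed)
    data = list(samples)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a random order
        for x, y in data:
            error = (w * x + b) - y  # machine's answer minus desired answer
            # Gradient of 0.5 * error**2 with respect to w and b,
            # followed by a small step in the direction that lowers the error.
            w -= lr * error * x
            b -= lr * error
    return w, b


# Hypothetical training set: desired answers follow y = 2x + 1 exactly.
SAMPLES = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
w, b = sgd_linear(SAMPLES)
```

Note that only the weights `w` and `b` are tuned by the algorithm. The learning rate `lr` and the number of `epochs` are hyper-parameters chosen by hand.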

- Deep learning systems
- Have hundreds of millions of “knobs” (i.e. adjustable parameters, the weights).
- Each recognition takes billions of operations (input to output), which is why we use GPUs rather than CPUs.
- Build DL systems not as a single block, but as a cascade of modules.
- Feature Extractor.
- Mid-Level Features.
- High-Level Features.
- Trainable Classifier.

- All layers (i.e. modules) are trainable.
- Deep (in DL) because there are many layers.
- Images are made of motifs / objects, themselves made of parts, and then of pixels.
- Low-level features detect simple patterns of pixels, such as edges.
- Mid-level features detect parts.
- High-level features detect motifs / objects.

- This is true not only for images but also for text, speech, etc. This hierarchical structure is what makes the world understandable.
- The structure of the neocortex is very much hierarchical too
- Ventral / recognition pathway has many layers too.
- Very fast (recognizing an object takes less than 100 ms).
- Very little effect of feedback and reasoning when interpreting everyday objects.

**Multi-Layer Neural Networks**

- Back-Propagation Algorithm.
- Each layer takes a vector and multiplies it by a matrix (the weights).
- Thresholding operations (non-linearity), e.g.: ReLU(x) = max(x, 0).
- Each unit computes a weighted sum of its inputs.
- Weighted sum is passed through a non-linear function.
- The learning algorithm changes the weights.
- How to train it?
- Use the Back-Propagation algorithm.
- Frameworks: Torch7, TensorFlow, Theano, etc.
- Jacobian matrix (contains all the partial derivatives of a module's outputs with respect to its inputs).
- Multiply the gradient by the Jacobian to propagate it back through each module.
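
The notes above (weighted sums, ReLU, gradient multiplied by the Jacobian) can be illustrated with a tiny two-layer network in plain Python. This is a sketch under assumptions: the squared-error loss, the absence of bias terms, the function names, and the example weights are all choices made for this illustration. Frameworks such as Torch7, TensorFlow, or Theano automate these steps.

```python
def relu(v):
    """ReLU(x) = max(x, 0), applied to every element."""
    return [max(x, 0.0) for x in v]


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def forward(W1, W2, x):
    z = matvec(W1, x)  # weighted sums of the hidden units
    h = relu(z)        # non-linearity
    y = matvec(W2, h)  # output layer
    return z, h, y


def loss(W1, W2, x, target):
    _, _, y = forward(W1, W2, x)
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def backward(W1, W2, x, target):
    """Back-propagation: push the gradient of the loss back through each module."""
    z, h, y = forward(W1, W2, x)
    grad_y = [yi - ti for yi, ti in zip(y, target)]     # dL/dy
    grad_W2 = [[gy * hj for hj in h] for gy in grad_y]  # dL/dW2
    # The Jacobian of y with respect to h is W2, so dL/dh = W2^T * grad_y.
    grad_h = [sum(W2[i][j] * grad_y[i] for i in range(len(W2)))
              for j in range(len(h))]
    # The Jacobian of ReLU is diagonal: 1 where z > 0, otherwise 0.
    grad_z = [gh if zj > 0 else 0.0 for gh, zj in zip(grad_h, z)]
    grad_W1 = [[gz * xj for xj in x] for gz in grad_z]  # dL/dW1
    return grad_W1, grad_W2


def numeric_grad(W1, W2, x, target, layer, i, j, eps=1e-6):
    """Central finite-difference estimate of dL/dW[i][j], used as a sanity check."""
    W = W1 if layer == 1 else W2
    old = W[i][j]
    W[i][j] = old + eps
    plus = loss(W1, W2, x, target)
    W[i][j] = old - eps
    minus = loss(W1, W2, x, target)
    W[i][j] = old
    return (plus - minus) / (2 * eps)


# Hypothetical weights and data: 2 inputs, 3 hidden units, 1 output.
W1 = [[0.5, -0.2], [-0.3, -0.8], [0.1, 0.4]]
W2 = [[0.7, -0.5, 0.2]]
x, target = [1.0, 2.0], [1.0]
grad_W1, grad_W2 = backward(W1, W2, x, target)
```

Comparing `grad_W1[i][j]` with `numeric_grad(W1, W2, x, target, 1, i, j)` is a common way to check a hand-written backward pass.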

- ReLU: Rectified Linear Unit

**Convolutional Neural Networks**

- Abbreviations: ConvNets, or CNNs.
- 2 kinds of layers in the network
- Convolutional.
- Pooling.
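
As a rough illustration of the two kinds of layers, the sketch below implements a single-channel convolution (no padding, stride 1) and a max pooling step in plain Python. The function names, the 5x5 toy image, and the hand-picked edge-detecting kernel are assumptions for the example; in a real ConvNet the kernel values are learned and there are many kernels per layer.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and take a weighted sum at each position.

    As in most deep learning libraries, the kernel is not flipped
    (strictly speaking this is a cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool2d(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [[max(fmap[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


# Hypothetical 5x5 image: dark on the left, bright on the right.
IMAGE = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
KERNEL = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(IMAGE, KERNEL)  # 4x4, every row is [0, 2, 0, 0]
pooled = max_pool2d(feature_map)     # [[2, 0], [2, 0]]
```

The convolution reports where the edge is, and pooling shrinks the feature map while keeping the strongest response in each region, which makes the result less sensitive to the exact position of the edge.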

- Improvements for working on AI
- ImageNet dataset, with 1.2 million labeled training samples and 1000 categories.
- NVIDIA GPUs programmed with CUDA, capable of about 1 trillion operations per second.
- Convolutional Nets then gained credibility, and people switched to them.

- Very Deep ConvNet Architectures
- VGG.
- GoogLeNet.
- ResNet.

- Uses of Very Deep ConvNets
- Image captioning (e.g. generating sentences that describe an image; Facebook has a new system that describes images aloud for blind people).
- DeepFace (Facebook face recognition).
- Classification + Localization (multi-scale sliding window), e.g. used for identifying the pose of a human body in a picture.
- “Big Sur”: Facebook's GPU-based Deep Learning server, which bundles several GPUs; used for systems capable of recognizing objects in a picture (e.g. person, frisbee, broccoli, carrot, etc.).
- Progress has accelerated recently because many more people are working on Deep Learning.

**Differentiable Memory**

**Generative Adversarial Networks**