LeCun, Bengio and Hinton’s “Deep Learning” presents deep learning as a transformative form of representation learning, in which computational systems discover hierarchical features directly from raw data rather than relying on hand-engineered descriptors. Their central argument is that deep neural networks achieve their power by composing multiple layers of non-linear transformations, each layer converting its input into an increasingly abstract representation: pixels become edges, edges become motifs, motifs become object parts, and parts become recognisable objects. This architecture enables systems to solve the long-standing selectivity–invariance problem, remaining sensitive to meaningful differences while ignoring irrelevant variation such as lighting, position, accent, or background.

The article’s technical core is backpropagation, the procedure that uses gradients of a training objective to adjust millions of internal weights so that error decreases across training examples. Its case studies show the method’s breadth: convolutional neural networks revolutionised image recognition after the 2012 ImageNet breakthrough, recurrent neural networks advanced speech and language processing, and distributed word representations allowed machines to map semantic similarity into vector space. The visual examples are especially revealing: the convolutional network applied to a Samoyed image illustrates the layered extraction of visual structure, while the image-captioning system shows how a deep vision model and a recurrent language model can be joined to translate visual scenes into sentences.

In conclusion, the article frames deep learning not as a narrow algorithmic technique, but as a general computational paradigm whose success derives from learning complex representations at scale, thereby reshaping artificial intelligence across vision, speech, language, science, and industry.
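To make the first step of that hierarchy concrete, the sketch below applies a hand-set edge filter to a toy image, showing how a convolutional layer turns raw pixels into edge responses. This is purely illustrative and not the article’s code: the kernel, the image, and all sizes are assumptions, and a trained CNN learns such filters rather than being given them (though the article notes that the first layers of trained networks do come to resemble edge detectors).

```python
import numpy as np

# Hand-set vertical-edge filter (Sobel-like); in a real CNN these weights
# are learned from data, not specified by hand.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

# Toy 6x6 "image": dark left half, bright right half (values assumed).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

def conv2d_valid(img, k):
    # Slide the kernel over the image with no padding ("valid" mode).
    # Strictly this is a cross-correlation, which is what CNN libraries
    # actually compute under the name "convolution".
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

print(conv2d_valid(image, kernel))  # strong responses along the edge column
```

The strong responses in the middle columns mark the boundary between the dark and bright halves; stacking further learned layers on top of such feature maps is what yields the motifs and object parts the summary describes.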
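The gradient mechanics behind “adjusting millions of internal weights” can likewise be sketched at toy scale. The two-layer network below, trained on XOR, is a minimal sketch and not the paper’s method: the architecture, activation functions, learning rate, iteration count, and data are all assumptions chosen so the example runs in milliseconds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, a classic task no single linear layer can solve
# (assumption: the paper itself works with large-scale benchmarks).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Tiny two-layer network: 2 inputs -> 8 hidden units -> 1 output (sizes assumed).
W1 = rng.normal(0, 1.0, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1.0, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5  # learning rate (assumed)
for step in range(5000):
    # Forward pass: each layer is a linear map followed by a non-linearity,
    # mirroring the layered transformations the article describes.
    h = np.tanh(X @ W1 + b1)      # hidden representation
    p = sigmoid(h @ W2 + b2)      # predicted probability
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Backward pass: the chain rule propagates the error gradient
    # layer by layer, from the output back towards the input.
    dlogits = (p - y) / len(X)    # gradient of loss w.r.t. output logits
    dW2 = h.T @ dlogits
    db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T
    dz1 = dh * (1 - h ** 2)       # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # Gradient descent: nudge every weight against its error gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p.ravel(), 3))  # should approach [0, 1, 1, 0]
```

The two update lines at the bottom of the loop are the whole of gradient descent; backpropagation is simply the chain-rule bookkeeping that produces dW1, db1, dW2, and db2.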
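Finally, the claim that distributed representations map semantic similarity into vector space reduces, operationally, to measuring angles between vectors. The embedding values below are invented for illustration; real word vectors are learned from large corpora and typically have hundreds of dimensions.

```python
import numpy as np

# Toy, hand-made 3-d "embeddings" (assumption: values are invented so that
# related words point in similar directions).
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(u, v):
    # Cosine similarity: 1.0 for parallel vectors, near 0 for unrelated ones.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(emb["king"], emb["queen"]))  # high: semantically close
print(cosine(emb["king"], emb["apple"]))  # low: semantically distant
```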