Convergence of Backpropagation with Momentum for Network Architectures with Skip Connections

Authors

  • Chirag Agarwal Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL 60607, USA
  • Joe Klobusicky Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
  • Dan Schonfeld Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL 60607, USA

DOI:

https://doi.org/10.4208/jcm.1912-m2018-0279

Keywords:

Backpropagation with momentum, Autoencoders, Directed acyclic graphs.

Abstract

We study a class of deep neural networks whose architectures form a directed acyclic graph (DAG). For backpropagation defined by gradient descent with adaptive momentum, we show that the weights converge for a large class of nonlinear activation functions. The proof generalizes the results of Wu et al. (2008), who showed convergence for a feed-forward network with one hidden layer. To illustrate the effectiveness of DAG architectures, we describe an example of compression through an autoencoder and compare it against sequential feed-forward networks under several metrics.
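For orientation, gradient descent with momentum (the heavy-ball form) updates the weights as sketched below; the symbols $E$, $\eta$, and $\mu_t$ are illustrative notation, not taken from the paper, and the paper's adaptive scheme selects the momentum coefficient per step rather than fixing it:

\[
  w_{t+1} \;=\; w_t \;-\; \eta\,\nabla E(w_t) \;+\; \mu_t\,\bigl(w_t - w_{t-1}\bigr),
\]

where $w_t$ denotes the weight vector at step $t$, $E$ the error function, $\eta > 0$ the learning rate, and $\mu_t \in [0,1)$ the (possibly adaptive) momentum coefficient applied to the previous weight increment.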

Published

2021-06-10

Section

Articles