
Networks with Trainable Width of Activation Functions

Trentin, Edmondo
1997-01-01

Abstract

Network training algorithms for feedforward and recurrent models have heavily concentrated on the learning of connection weights. Little effort has been made so far to learn the 'width' of the activation functions, that is, a measure of the (possibly open) interval of the real line which defines the range of values that the function can take. A variety of situations where adaptive widths are sought or even necessary is discussed. Previous work concentrated on specific analytical forms for the activations, mainly concerning the shape (slope) of sigmoidal units, particularly under certain applicative constraints. This paper introduces novel algorithms to learn the widths of non-linear activation functions in layered networks. No assumption is made on the analytical form of the activations, which may or may not depend upon the trainable width and can possibly vary from unit to unit. The proposed algorithms rely on a stochastic gradient descent technique. Three instances of the algorithms are developed: (i) a unique width is shared among all the nonlinear units of the network; (ii) each layer of the net has its own width; (iii) different widths are allowed on a neuron-by-neuron basis. Experimental results on real-world speech processing tasks validate the approach to a large extent: an 87.5% relative error rate reduction over the network with fixed width was obtained in a 10-class speaker identification problem (that is, a classification problem), and a 17.4% relative word error rate reduction was obtained averaging over a series of speaker normalization experiments (that is, a multiple regression problem) with a continuous speech recognizer (10,000-word vocabulary). As a side effect, the proposed algorithms also induce a self-tuning mechanism for the topology of the network, allowing for an immediate pruning of redundant neurons.
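The abstract does not give the paper's exact update equations. As a rough illustration only, the sketch below assumes the "width" acts as a per-neuron trainable amplitude lam_j scaling a sigmoid, h_j = lam_j * sigmoid(a_j), and shows how a stochastic gradient step on lam follows from the chain rule alongside ordinary weight updates; the squared-error loss, the one-hidden-layer architecture, and names such as forward and sgd_step are assumptions for the sketch, not the paper's formulation.

```python
import numpy as np

# Illustrative sketch (not the paper's exact algorithm): a one-hidden-layer
# regression net whose hidden units have a trainable per-neuron "width"
# (output-range amplitude) lam_j, i.e. h_j = lam_j * sigmoid(a_j).
# The width gradient is dL/dlam_j = dL/dh_j * sigmoid(a_j) by the chain rule,
# and is updated by stochastic gradient descent together with the weights.

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2, lam):
    a = W1 @ x + b1          # hidden pre-activations
    s = sigmoid(a)           # unit-amplitude nonlinearity
    h = lam * s              # trainable width scales the output range
    y = W2 @ h + b2          # linear output layer
    return a, s, h, y

def sgd_step(x, t, W1, b1, W2, b2, lam, lr=0.05):
    """One stochastic gradient step on weights and widths (squared error)."""
    a, s, h, y = forward(x, W1, b1, W2, b2, lam)
    e = y - t                          # output error
    dh = W2.T @ e                      # dL/dh
    dlam = dh * s                      # width gradient: dL/dh_j * sigmoid(a_j)
    da = dh * lam * s * (1.0 - s)      # backprop through h_j = lam_j*sigmoid(a_j)
    W2 -= lr * np.outer(e, h); b2 -= lr * e
    W1 -= lr * np.outer(da, x); b1 -= lr * da
    lam -= lr * dlam
    # Variants from the abstract: a single network-wide width would use
    # dlam.sum(); one width per layer sums dlam over that layer's units.
    return 0.5 * float(e @ e), lam

# Toy usage: widths driven toward zero after training flag hidden units that
# contribute nothing, which is the pruning side effect mentioned above.
n_in, n_hidden, n_out = 3, 8, 1
W1 = rng.normal(scale=0.5, size=(n_hidden, n_in)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_out, n_hidden)); b2 = np.zeros(n_out)
lam = np.ones(n_hidden)                # start from the conventional fixed width
for step in range(200):
    x = rng.normal(size=n_in)
    t = np.array([np.sin(x.sum())])    # arbitrary smooth target
    loss, lam = sgd_step(x, t, W1, b1, W2, b2, lam)
print("final widths:", np.round(lam, 3))
```

Because the width enters the loss only through a product with the unit's sigmoid output, its gradient is cheap to compute, and sharing it across a layer or the whole network simply sums the per-neuron terms, which is presumably why the three variants differ only in how the gradient is pooled.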


Use this identifier to cite or link to this document: https://hdl.handle.net/11582/1373
