RNN – How it works
Recurrent neural networks work in a way that resembles how our brain works. Besides passing data from one node to the next, as a feed-forward network does, they also retain some form of memory between steps, similar to short-term memory. This cognitive approach to processing information draws an interesting parallel to the ongoing discussion on human intelligence vs artificial intelligence.
In an RNN, the independent activations of a traditional neural network become dependent activations: each output is passed as an input to the next layer, and the same weights and biases are shared across the layers.
Thus some form of memory is preserved throughout the layers, and the complexity of the parameters needed for each layer is considerably reduced.
Commonly used activation functions in RNN layers include sigmoid, tanh, and ReLU.
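To make this concrete, below is a minimal sketch of a single recurrent step in plain NumPy, using tanh as the activation; the weight names (W_xh, W_hh) and sizes are illustrative and not tied to any particular library.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: the new hidden state depends on both the
    current input x_t and the previous hidden state h_prev."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Illustrative dimensions: 4 input features, 8 hidden units.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 8)) * 0.1   # input-to-hidden weights, shared across steps
W_hh = rng.normal(size=(8, 8)) * 0.1   # hidden-to-hidden weights, shared across steps
b_h  = np.zeros(8)

h = np.zeros(8)                        # initial hidden state ("empty" memory)
for x_t in rng.normal(size=(5, 4)):    # a toy sequence of 5 time steps
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```

Note that the same W_xh and W_hh are reused at every step, which is exactly why the activations become dependent on what came before.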
This type of machine learning is well suited to sequential data. Because the order in which sequential data is presented carries extra meaning, the model needs some form of memory to hold that ordering information, and RNNs provide exactly this. Traditionally, few other algorithms have matched the results of RNNs on sequential data.
To train the network and assign the proper weights, a back-propagation algorithm is used.
Training an RNN model therefore differs somewhat from training a regular neural network. These are the steps involved in training an RNN machine-learning model:
- The first element of the input sequence is fed into the network
- The current state of the network is calculated from the current input and the previous state information
- The current state is then fed as input to the next step
- This process is repeated over the whole sequence until the final output can be calculated
Once the final output is calculated, it is compared with the target output, and an error is generated. This error is back-propagated through the network to fine-tune the recurrent-layer weights. Doing so can cause difficulties for RNNs when the gradients become too large or too small. Because the error is propagated backwards through every time step, this back-propagation technique is called Backpropagation Through Time (BPTT).
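As a rough illustration of this training loop, the sketch below uses PyTorch, whose autograd unrolls the backward pass through every time step (i.e. BPTT). The model, dimensions, and loss are placeholders chosen only for the example.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 10-dimensional inputs, 16 hidden units, 1 output.
rnn = nn.RNN(input_size=10, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(32, 20, 10)   # batch of 32 sequences, 20 time steps each
y = torch.randn(32, 1)        # one target value per sequence

outputs, h_n = rnn(x)                 # forward pass: the state is carried step to step
prediction = head(outputs[:, -1, :])  # use the final state to produce the output
loss = loss_fn(prediction, y)         # compare with the target to get the error

optimizer.zero_grad()
loss.backward()                       # back-propagation through time: gradients flow
                                      # backwards through every time step
optimizer.step()                      # adjust the recurrent-layer weights
```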
Tip:
Train your Recurrent Neural Network models efficiently by using high-quality data that can be provided by clickworker’s
Datasets for Machine Learning
Defining Characteristics of RNN
The structure of an RNN is quite different from that of other neural networks. While most traditional neural networks only pass data forward, an RNN feeds its hidden state back into itself and is trained with back-propagation through time. The RNN architecture thus differs from other neural networks, which have a linear, one-directional flow of data.
The hidden state of the RNN holds some information about the previous state and thus maintains a form of memory within the neural network.
The basic difference between a regular feed-forward neural network and a recurrent neural network is the route of information flow. In a regular feed-forward network, information flows in only one direction and never passes through a node a second time. With RNNs, however, information may pass through the same node more than once, so the flow is not a strictly straight route.
A good way to demonstrate how an RNN works is to discuss it with reference to an example application. If you feed a regular feed-forward network a word, say, ‘peacock,’ the model processes each letter one by one, and by the time it reaches the fourth letter, it has no memory of the previous letters. So it has no idea what the next letter should be and cannot make any prediction. In the case of an RNN, however, the previous characters are remembered by an internal memory mechanism, so the model can predict the next letter based on its training.
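A toy version of this next-letter setup might look like the sketch below; the vocabulary, layer sizes, and variable names are made up for illustration, and the model is untrained here, so its prediction is meaningless until it has been fitted to example sequences.

```python
import torch
import torch.nn as nn

# Toy character vocabulary built from the example word.
word = "peacock"
vocab = sorted(set(word))
char_to_idx = {c: i for i, c in enumerate(vocab)}

embed = nn.Embedding(len(vocab), 8)            # map characters to vectors
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
to_char = nn.Linear(16, len(vocab))            # score each possible next character

# Feed the first letters one by one; the hidden state remembers what it has seen.
prefix = torch.tensor([[char_to_idx[c] for c in "peaco"]])   # shape (1, 5)
outputs, h = rnn(embed(prefix))
next_char_scores = to_char(outputs[:, -1, :])  # predict the letter after "peaco"
predicted = vocab[next_char_scores.argmax(dim=-1).item()]
```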
Advantages of RNN
RNNs find great use in time-series prediction problems because they can retain information through each network step. Since it can remember the previous inputs, an RNN is said to have Long Short-Term Memory.
RNN can be used alongside CNN (Convolutional neural network) to optimize the results further. RNN helps to expand the effective pixel neighborhood further and thus improves the final results.
History of RNN
Recurrent neural networks were first conceptualized by David Rumelhart in 1986, while a similar network, the Hopfield network, had been introduced earlier by John Hopfield in 1982. Since then, there have been several developments in the RNN architecture, the most significant being the LSTM (Long Short-Term Memory) network developed in 1997 by Hochreiter and Schmidhuber.
LSTM is now a popular network used in applications such as speech recognition, handwriting recognition, machine translation, language modeling, and multilingual language processing. It is also used in Google Android for its text-to-speech synthesis application.
Challenges of RNN
Training an RNN can be challenging given the many back-propagation passes of the error needed to finalize the weights of the recurrent layers. It is a time-consuming process.
RNNs also suffer from exploding or vanishing gradients. As mentioned earlier, an RNN uses back-propagation through time and calculates a gradient with each pass to adjust the nodes’ weights. As the error travels back through many states, the gradients can keep shrinking until they effectively reach zero (vanishing), or, conversely, grow too large to handle (exploding). The exploding-gradient issue can be handled by clipping the gradients at a threshold value above which they are not allowed to grow, although a poorly chosen threshold can degrade the quality of the results.
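In practice, this thresholding is usually implemented as gradient clipping. A minimal sketch with PyTorch’s built-in utility is shown below; the model, loss, and the max_norm value are placeholders for illustration.

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=10, hidden_size=16, batch_first=True)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 30, 10)             # 8 toy sequences of 30 steps
output, _ = model(x)
loss = output.pow(2).mean()            # dummy loss purely for illustration

optimizer.zero_grad()
loss.backward()
# Rescale gradients so their overall norm never exceeds the chosen threshold;
# this keeps exploding gradients in check during back-propagation through time.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```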
A standard RNN also does not take future inputs into account when making decisions, which can lead to inaccurate predictions.
Types of RNN
Several variations of RNN have been developed, each focusing on a particular problem to be solved or on some optimization. The major RNN variants developed to deal with the challenges described above are:
- Long Short-Term Memory Networks
This type of RNN is designed to retain relevant information as it flows through the neural network with the help of function layers called gates. The memory blocks used in this type of neural network, where the information is stored, are called cells. The gates handle the memory manipulation of retaining relevant information while discarding irrelevant information. There are three gates used in LSTM networks, namely the forget gate, the input gate, and the output gate.
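The sketch below shows one LSTM cell step with these three gates and the cell state, written in NumPy; the weight names and dimensions are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the parameters of the forget (f),
    input (i), output (o) gates and the candidate cell update (g)."""
    f = sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])   # forget gate: what to discard
    i = sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])   # input gate: what to store
    o = sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])   # output gate: what to expose
    g = np.tanh(x_t @ W["g"] + h_prev @ U["g"] + b["g"])   # candidate cell content
    c = f * c_prev + i * g          # cell state: old memory kept + new memory added
    h = o * np.tanh(c)              # hidden state exposed to the next step
    return h, c

# Illustrative sizes: 4 input features, 8 cell units.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 8)) * 0.1 for k in "fiog"}
U = {k: rng.normal(size=(8, 8)) * 0.1 for k in "fiog"}
b = {k: np.zeros(8) for k in "fiog"}
h, c = np.zeros(8), np.zeros(8)
for x_t in rng.normal(size=(5, 4)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The key point is that the forget and input gates decide how much old memory to keep and how much new information to write into the cell, while the output gate controls what part of that memory is exposed at each step.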
LSTM finds its use in applications such as:
- Language modeling, where a sequence of words is given as input and predictions can be made at the character, sentence, and paragraph level.
- Image processing – LSTM networks can be trained to process image data and identify the objects in an image. They can also be used to recognize handwriting.
- Speech recognition
- Music generation can be achieved by analyzing input notes and creating a musical piece using the LSTM network.
- Machine translation
- Image captioning
- Handwriting generation
- Question answering chatbots
LSTM deals with the vanishing gradient problem very effectively and is better at handling noise, continuous values, and distributed data values when compared to a regular RNN.
There have also been several variations of the basic LSTM architecture, with improvements to the cell designs and gate layers.
Even though LSTMs offer a great improvement over regular RNNs, they also come with certain difficulties.
- The vanishing gradient problem could still cause a performance issue with LSTM
- LSTM can be hardware expensive as the cells require high memory bandwidth
- With huge data volumes in data mining, the short-term memory provided by LSTM often proves insufficient.
- LSTM networks often face the overfitting problem and have to employ proper regularization methods such as the Dropout method.
- Gated Recurrent Unit Networks
The Gated Recurrent Unit network is another variation of the basic RNN. It also uses gates, but it does not maintain a separate internal cell state as seen in the LSTM network.
The three gates used are:
- Update Gate: This gate is responsible for deciding which information needs to be retained through the network.
- Reset Gate: This gate is responsible for discarding or forgetting irrelevant information.
- Current Memory Gate: This gate is incorporated into the reset gate and computes the candidate (current) memory content; it introduces non-linearity into the input and helps make it zero-mean.
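Under those descriptions, one GRU step can be sketched roughly as follows in NumPy; the weight names and sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step: no separate cell state, just a hidden state h."""
    z = sigmoid(x_t @ W["z"] + h_prev @ U["z"] + b["z"])              # update gate: keep vs. replace
    r = sigmoid(x_t @ W["r"] + h_prev @ U["r"] + b["r"])              # reset gate: drop irrelevant history
    h_tilde = np.tanh(x_t @ W["h"] + (r * h_prev) @ U["h"] + b["h"])  # candidate (current memory) content
    return (1.0 - z) * h_prev + z * h_tilde                           # blend old and new information

# Illustrative sizes: 4 input features, 8 hidden units.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 8)) * 0.1 for k in "zrh"}
U = {k: rng.normal(size=(8, 8)) * 0.1 for k in "zrh"}
b = {k: np.zeros(8) for k in "zrh"}
h = np.zeros(8)
for x_t in rng.normal(size=(5, 4)):
    h = gru_step(x_t, h, W, U, b)
```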
The GRU (Gated Recurrent Unit) is often used as an alternative to the LSTM. It is faster and less memory-intensive, and it also tackles the vanishing gradient problem efficiently with the help of its update and reset gate mechanisms.
However, it does not generally surpass the accuracy produced by LSTM networks.
- Bidirectional recurrent neural networks
In bidirectional RNNs, the nodes can gather inputs from both previous states and future data to calculate the current state.
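A quick sketch of this with PyTorch’s bidirectional flag is shown below; the sizes are illustrative. Because one pass runs forward and one runs backward, the per-step output concatenates both directions and is twice the hidden size.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 10 input features, 16 hidden units per direction.
birnn = nn.RNN(input_size=10, hidden_size=16, batch_first=True, bidirectional=True)

x = torch.randn(4, 25, 10)       # 4 sequences, 25 time steps each
outputs, h_n = birnn(x)

print(outputs.shape)  # torch.Size([4, 25, 32]) – forward and backward states concatenated
print(h_n.shape)      # torch.Size([2, 4, 16]) – one final state per direction
```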
Besides these popular network architectures, the RNN networks can be broadly classified into the following types based on the way the nodes are connected:
- One-to-one – This is the usual architecture followed by traditional neural networks.
- One-to-many – In this type of architecture, a single input can be mapped to multiple outputs. This architecture is used in applications like music generation.
- Many-to-one – In a many-to-one architecture, multiple inputs are used to create a single output. This applies to sentiment analysis and emotion detection applications, where multiple words and sentences are used to arrive at one conclusion.
- Many-to-many – Many-to-many networks could have several variations of input and output mappings. It is often used for language translation applications.
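The sketch below illustrates, with placeholder shapes, how the many-to-one and many-to-many mappings differ when reading the outputs of the same recurrent layer; the one-to-many case is only outlined in a comment.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=16, batch_first=True)
head = nn.Linear(16, 3)                      # maps a hidden state to a 3-class output

x_seq = torch.randn(1, 12, 10)               # one sequence of 12 steps
outputs, h_n = rnn(x_seq)

# Many-to-one (e.g. sentiment analysis): read the whole sequence, emit one output.
one_output = head(outputs[:, -1, :])         # shape (1, 3)

# Many-to-many (e.g. translation or tagging): emit an output at every step.
per_step_outputs = head(outputs)             # shape (1, 12, 3)

# One-to-many (e.g. music generation): feed one input, then feed each output
# back in as the next step's input to unroll a whole sequence (not shown here).
```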
Conclusion
RNNs are a distinct kind of neural network in that they have a form of memory associated with them. They use back-propagation through time for model training and thus face challenges such as exploding and vanishing gradients. Advanced RNNs such as the LSTM help solve these issues and are widely preferred in applications such as speech synthesis, sentence prediction, translation, music generation, and more. RNNs are integral to many AI applications, such as the chatbots in use today.
Despite the widespread use of RNNs, they still have their limitations when dealing with long-range dependencies where data relations are several steps apart.