March 12, 2022

About the AIKA Neural Network

AIKA (Artificial Intelligence for Knowledge Acquisition) is a new type of artificial neural network designed to more closely mimic the behavior of a biological brain and to bridge the gap to classical AI. A key design decision in the AIKA network is to conceptually separate the activations from their neurons, meaning that there are two separate graphs. One graph consisting of neurons and synapses representing the knowledge the network has already acquired and another graph consisting of activations and links describing the information the network was able to infer about a concrete input data set. There is a one-to-many relation between the neurons and the activations. For example, there might be a neuron representing a word or a specific meaning of a word, but there might be several activations of this neuron, each representing an occurrence of this word within the input data set. A consequence of this decision is that we must give up on the idea of a fixed layered topology for the network, since the sequence in which the activations are fired depends on the input data set. Within the activation network, each activation is grounded within the input data set, even if there are several activations in between. This means links between activations server multiple purposes:

They propagate the activation value.
They propagate the binding-signal, that is used for the linking process.
They establish an approximate causal relation through the fired timestamps of their input and output activations.
They allow the training gradient to be propagated backwards.
Negative feedback links create mutually exclusive branches within the activations network.
Positive feedback links allow the binding neurons of a pattern neuron ensemble to support each other, by feeding the activation value of the patten neuron back to its input binding-neurons.

The AIKA network uses four different types of neurons:

Pattern-Neurons (PN)
Binding-Neurons (BN)
Inhibitory-Neurons (IN)
Category-Neurons (CN)

The Pattern-Neurons and the Binding-Neurons are both conjunctive in nature while the Inhibitory-Neurons and the Category-Neurons are disjunctive. The Binding-Neurons are kind of the glue code of the whole network. On the one hand, they bind the input-features of a pattern to the pattern-neuron and on the other hand receive negative feedback synapses from the inhibitory neurons which allow them to either be suppressed by an opposing pattern or allow themselves suppress another conflicting pattern. Similar to the neuron types there are also several different types of synapses, depending on wich types of neurons they connect. For example, the input synapses of an inhibitory neuron are always linked to Binding-Neurons, while the input synapses of Category-Neurons are always linked to pattern-neurons.

The following types of synapses exist within the AIKA network:

PrimaryInputBNSynapse ((PN|CN) -> BN)
RelatedInputBNSynapse (BN -> BN)
SamePatternBNSynapse (BN -> BN)
PositiveFeedbackSynapse (PN -> BN)
NegativeFeedbackSynapse (IN -> BN)
PatternSynapse (BN -> PN)
CategorySynapse (PN -> CN)
InhibitorySynapse (BN -> IN)

The binding-signal that is propagated along linked synapses carries a state consisting of either of these three values: SAME, INPUT, BRANCH
SAME indicates that the binding signal has not yet left its originating neuron pattern ensemble. INPUT indicates, that the binding signal was propagated to a dependant pattern neuron ensemble, for instance through the PrimaryInputSynapse or the RelatedInputSynapse. BRANCH indicates, that the binding signal originated from a binding activation instead of a pattern activation.

As already mentioned, the binding-neurons of a pattern neuron ensemble are used to bind this pattern to its input features. To verify that all the input-features occurred in the correct relation to each other the SamePatternSynapse is used. The SamePatternSynapse connects two binding-neurons within the same pattern neuron ensemble. The SamePatternSynapse connects two binding-neurons within the same pattern neuron ensemble and is only linked both ends of the synapse have been reached by the same binding-signal. Therefore, the SamePatternSynapse is used to avoid what is called the superposition catastrophe.

Since the category-neuron passes on the binding-signal of its input pattern-neuron, it can act as a category slot, therefore allowing the network great flexibility in abstracting concepts.

Initially, the network starts out empty and is then gradually populated during training. The induction of new neurons and synapses is guided by a network of template neurons and synapses.

Since this type of network contains cycles, the usual backpropagation algorithm will not work very well here. Also, relying on handcrafted labels that are applied to the output of the network can be highly error-prone and can create a large distance between our training signal and the weights that we would like to adjust. This is the reason for the huge number of training examples required for classical neural networks. Hence, we would like to train the network more locally from the patterns that occur in the input data without relying on supervised training labels. This is where Shannon entropy comes in quite handy. Take, for example, a word whose input features are its individual letters. In this example, we can measure the amount of information of each letter by calculating the Shannon entropy. Then we can look at the word pattern neuron as a way of compressing the information given by the individual letter neurons. The word neuron requires a lot less information to communicate the same message as the sum of the individual letters. This compression, or information gain, can be formalized using the mutual information, which can then be used to derive an objective function for our training algorithm.

A consequence of using entropy as a source for the training signal is that we need to know what the underlying probability distribution is for each neuron and for each synapse. That is, we need to count how often each neuron is fired. But determining this statistic involves some challenges as well:

Not all the neurons exist right from the start. Therefore, we need to keep track of an offset for each neuron, so that we can compute how many training examples this neuron has already seen.
Not all activations cover the same space within the input data set. For example there are a lot more letters than there are words in a given text. This has to be taken into account when determining the event space for the probability.
Then there is the problem that right after a new neuron is introduced, there are very few training instances for this statistic, yet we still want to adjust the weights of this neuron. Hence we need a conservative estimate of what we already know for certain about the probabilities. Since entropy is based on the surprisal (-log(p)), which gets large when the probability gets close to zero, we can use the cumulative beta distribution to determine what the maximum probability is that can be explained by the observed frequencies.
Another problem is related to concept drift. As we are using the distribution to adjust the synapse weights, this leads to changes in the activation patterns and therefore to changes in the distributions again.