January 31, 2021
Work in progress

How information is propagated through the network

General Architecture

In the AIKA neural network, neurons and their activations are represented separately. Thus, each neuron may have an arbitrary number of activations. For example, if the input data set is a text and a neuron represents a specific word, then there might be several occurrences of this word and therefore several activations of this neuron. The human brain probably does something similar through the timing of activation spikes.
These activations represent the information that the neural network was able to infer about the input data set. Every activation is directly or indirectly grounded in the input data set. That is, by following the input links of an activation down to the input layer of the network, one can determine the atomic input information this activation is referring to, similar to an annotation.
Since we have a one-to-many relationship between neurons and activations, we also need a one-to-many relationship between the synapses and the links between the activations. Therefore, we need a linking process to determine which activations are going to be connected with which others. Roughly speaking, activations can be linked if they are grounded in the same input data.
A consequence of separating the neurons and their activations is that we cannot rely on the network topology to give us the chronological sequence in which each activation is processed. Therefore, each activation carries a fired timestamp that describes the point in time at which this activation becomes visible to other neurons. In contrast to conventional neural networks, with their predefined, layered architecture in which one layer after another is processed by hardware-accelerated matrix operations, AIKA follows a different approach. The core of the AIKA algorithm is a single queue to which primitive operations within the graph of the neural network are added and from which they are processed in a predefined order. The advantage of this approach is that the activation network can be very sparse in comparison to the actual neural network. Only the tiny minority of neurons that are relevant to a given input data set is actually activated, and the order in which inferences are performed within the network is determined entirely by the input data set rather than by a hardwired network architecture.
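The following sketch illustrates this queue-driven processing model. It is a minimal illustration only; the class and method names used here (Fired, QueueEntry, EventQueue) are assumptions made for this example and do not correspond to the actual AIKA code base.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Minimal sketch of the queue-driven processing model. The names Fired, QueueEntry
// and EventQueue are illustrative and not taken from the AIKA code base.
class Fired implements Comparable<Fired> {
    final long timestamp;                       // point in time at which an activation becomes visible

    Fired(long timestamp) { this.timestamp = timestamp; }

    @Override
    public int compareTo(Fired other) { return Long.compare(timestamp, other.timestamp); }
}

interface QueueEntry {
    Fired getFired();                           // ordering criterion for the queue
    void process();                             // a primitive operation on the activation graph
}

class EventQueue {
    private final PriorityQueue<QueueEntry> queue =
            new PriorityQueue<>(Comparator.comparing(QueueEntry::getFired));

    void add(QueueEntry entry) { queue.add(entry); }

    // Processes the primitive operations strictly in the order of their fired timestamps.
    // Each processed operation may enqueue follow-up operations for later activations.
    void processAll() {
        while (!queue.isEmpty()) {
            queue.poll().process();
        }
    }
}
```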
Another difference from classical neural networks is that the complete structure of the network is inferred during training. The AIKA network starts out completely empty. In a first step, input neurons and their activations are added to the network. These represent either letters or words. Some relational input activations are then added to capture the correct sequence in which the input tokens occur. Then, during training, new neurons and synapses are instantiated and added to the network. These, however, are not completely created from scratch. Instead they are instantiated based on neurons and synapses in a template network. This template network contains several different types of neurons and synapses that are needed for specific tasks within the network. A similar approach can be observed in the human brain, where different types of neurons serve different purposes. Pyramidal neurons, for instance, possess long axons through which these neurons are able to connect parts of the brain that are far from each other. Interneurons, on the other hand, create local circuits. Depending on the type of neuron, different neurotransmitters are used, such as gamma-Aminobutyric acid (GABA) in some of the inhibitory neurons.
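A rough sketch of the template-based instantiation idea is shown below. The class names and fields are assumptions made for this illustration and do not mirror the actual AIKA classes.

```java
// Illustrative sketch only: new neurons are not created from scratch but are
// instantiated from a template neuron that defines their type-specific behaviour.
class TemplateNeuron {
    final String type;                          // e.g. "PATTERN", "PATTERN_PART", "INHIBITORY"

    TemplateNeuron(String type) { this.type = type; }

    // Creates a concrete neuron that inherits its type (and with it the
    // activation function and allowed synapse types) from this template.
    ConcreteNeuron instantiate(String label) {
        return new ConcreteNeuron(this, label);
    }
}

class ConcreteNeuron {
    final TemplateNeuron template;
    final String label;
    double bias = 0.0;                          // adjusted later during training

    ConcreteNeuron(TemplateNeuron template, String label) {
        this.template = template;
        this.label = label;
    }
}
```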

The weighted sum and the activation function

As in other artificial neural networks, the synapses in AIKA are weighted. To compute the activation value of a neuron, the weighted sum over its input synapses is computed. Then the bias value \(b\) is added to this sum and the result is passed through an activation function \(\varphi\).

$$net_j = b_j + \sum\limits_{i=0}^N x_i w_{ij}$$

$$y_j = \varphi(net_j)$$

Depending on the type of neuron, different activation functions are used. One commonly used activation function in the AIKA network is the rectified hyperbolic tangent function, which is basically the positive half of the \(\tanh()\) function.

\[\varphi(x) = \begin{cases} 0 & : x \leq 0 \\ \tanh(x) & : x > 0 \end{cases}\]
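
As a simple numeric illustration of these formulas, the following sketch computes the weighted sum and applies the rectified tanh function. The values and class names are made up for this example.

```java
// Minimal sketch: weighted sum plus bias, followed by the rectified tanh function.
class ActivationFunctionExample {

    // net_j = b_j + sum_i x_i * w_ij
    static double net(double bias, double[] x, double[] w) {
        double sum = bias;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * w[i];
        }
        return sum;
    }

    // Rectified tanh: 0 for net <= 0, tanh(net) otherwise.
    static double rectifiedTanh(double net) {
        return net <= 0.0 ? 0.0 : Math.tanh(net);
    }

    public static void main(String[] args) {
        double[] x = {0.9, 0.8};                // input activation values
        double[] w = {1.0, 1.0};                // synapse weights
        double bias = -1.0;
        double y = rectifiedTanh(net(bias, x, w));
        System.out.println(y);                  // tanh(0.7), roughly 0.604
    }
}
```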

The activation functions are chosen in such a way that they clearly distinguish between active and inactive neurons. Only activated neurons are processed and are thus visible to other neurons. These activations are expressed not only by a real-valued number but also by an activation object.

Neuron Types

By choosing the weights and the threshold (i.e., the bias) accordingly, neurons can take on the characteristics of boolean logic gates, such as an and-gate or an or-gate. These two kinds of logic gates can be implemented by two different types of neurons. Excitatory neurons take on the characteristics of an and-gate and are used to detect patterns in the input data set. The second type is the inhibitory neuron, which acts like an or-gate. Inhibitory neurons are able to group a large number of excitatory neurons together and thus form a category of them. They are called inhibitory because they can form negative feedback loops with their input neurons, so that only one of their inputs is allowed to stay active; all other input neurons are then suppressed.
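
As a simple illustration (the weights and biases here are example values, not taken from a trained network), two binary inputs with weights \(w_1 = w_2 = 1\) yield an and-gate for a bias of \(-1.5\) and an or-gate for a bias of \(-0.5\):

$$net_{\text{and}} = -1.5 + x_1 + x_2 \qquad net_{\text{or}} = -0.5 + x_1 + x_2$$

With \(x_1, x_2 \in \{0, 1\}\), \(net_{\text{and}}\) only becomes positive if both inputs are active, whereas \(net_{\text{or}}\) becomes positive as soon as at least one input is active.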

The biological role models for these neurons are the spiny pyramidal cell and the aspiny stellate cell in the cerebral cortex. The pyramidal cells usually exhibit an excitatory characteristic and some of them possess long-ranging axons that connect to other parts of the brain. On the other hand, the stellate cells are usually inhibitory interneurons with short axons that form circuits with nearby neurons.
These two types of neurons also have different electrical signatures. Stellate cells usually react to a constant depolarising current by firing action potentials at a relatively constant frequency for the entire duration of the stimulus. In contrast, most pyramidal cells are unable to maintain a constant firing rate. Instead, they fire quickly at the beginning of the stimulus, but the firing frequency then decreases even if the stimulus stays strong. This slowdown over time is called "adaptation".
AIKA tries to mimic this behaviour by using different activation functions for the different types of neurons. Since AIKA is not a spiking neural network like its biological counterpart, we only have the neuron's activation value, which can roughly be interpreted as the firing frequency of a spiking neuron. In a sense, the activation function based on the rectified tanh function described earlier captures the adaptation behaviour of a pyramidal cell quite nicely: an increase of a weak signal has a strong effect on the neuron's output, while an increase of an already strong signal has almost no effect. Furthermore, if the input of the neuron does not exceed a certain threshold, it will not fire at all. For inhibitory neurons, AIKA uses the rectified linear unit function (ReLU).

$$y = \max(0, x)$$

Especially for strongly disjunctive neurons like the inhibitory neuron, ReLU has the advantage of propagating its input signal exactly as it is, without distortion or loss of information.
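
As an example (with made-up values), if an inhibitory neuron has a single active input with value \(0.8\), weight \(1\) and bias \(0\), the ReLU function passes this value on unchanged, whereas the rectified tanh function would compress it to about \(0.66\):

$$y = \max(0, 0 + 1 \cdot 0.8) = 0.8 \qquad \tanh(0.8) \approx 0.66$$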

The ability to make exact inferences using formal logic is the strength of classical AI. But neurons can do more than that: they can also incorporate many weak inputs that, only when taken together, give a clear picture of whether the neuron should fire or not. Therefore, given a fitting training rule, neurons are able to combine the strengths of classical symbolic AI and connectionist sub-symbolic AI.

The excitatory neurons can be further split up into two sub-types: pattern neurons and pattern part neurons. Both are conjunctive in nature. The pattern part neurons describe which lower-level input patterns are part of the current pattern. Each pattern part neuron receives a positive feedback synapse from the pattern to which it belongs. The pattern neuron, on the other hand, is only activated if the pattern as a whole is detected. For example, if we consider a word whose individual letters are the lower-level patterns, then we have a corresponding pattern part neuron for each letter, whose meaning is that this letter occurred as part of this word. The pattern part neurons also receive negative feedback synapses from the inhibitory neurons, so that competing patterns are able to suppress each other.
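
The following sketch wires up this structure for the word pattern "cat". It only illustrates the connectivity described above; the classes and the chosen weights are assumptions for this example and do not correspond to the AIKA implementation.

```java
import java.util.List;

// Illustrative sketch of the excitatory sub-types for the word pattern "cat".
class PatternStructureExample {
    static class Neuron {
        final String label;
        Neuron(String label) { this.label = label; }
    }

    static class Synapse {
        final Neuron input, output;
        final double weight;
        final boolean feedback;                 // feedback synapses ignore the causal firing order
        Synapse(Neuron input, Neuron output, double weight, boolean feedback) {
            this.input = input; this.output = output; this.weight = weight; this.feedback = feedback;
        }
    }

    public static void main(String[] args) {
        Neuron pCat = new Neuron("P-cat");      // pattern neuron: fires for the word as a whole
        Neuron ppC = new Neuron("PP-c(cat)");   // pattern part: letter 'c' occurring as part of "cat"
        Neuron ppA = new Neuron("PP-a(cat)");
        Neuron ppT = new Neuron("PP-t(cat)");

        List<Synapse> synapses = List.of(
                // The pattern part neurons feed into the pattern neuron ...
                new Synapse(ppC, pCat, 1.0, false),
                new Synapse(ppA, pCat, 1.0, false),
                new Synapse(ppT, pCat, 1.0, false),
                // ... and each receives a positive feedback synapse from the pattern it belongs to.
                new Synapse(pCat, ppC, 1.0, true),
                new Synapse(pCat, ppA, 1.0, true),
                new Synapse(pCat, ppT, 1.0, true));

        System.out.println(synapses.size() + " synapses created");
    }
}
```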

Positive and Negative Feedback Loops

Another crucial insight is the need for positive and negative feedback loops. These are synapses that ignore the causal sequence of fired activations. The negative feedback synapses are especially interesting because they require the introduction of mutually shielded branches for the activations that follow. These branches create independent interpretations of parts of the input data set, and only later is the decision made which of these interpretations is selected. This is very similar to what a parse tree does, except that a parse tree is limited to syntactic information. Another way to relate it to classical logic is to consider it a form of non-monotonic inference, which classical AI could not solve properly because the weak influences that neural networks are able to capture are missing from classical logic.
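
A minimal sketch of this branching idea is shown below. The classes and the score-based selection used here are assumptions made for this illustration, not the actual AIKA mechanism.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: activations that contradict each other via negative feedback
// synapses are kept in mutually shielded branches; a later decision selects one of
// the competing interpretations.
class BranchExample {
    static class Activation {
        final String label;
        Activation(String label) { this.label = label; }
    }

    static class Branch {
        final List<Activation> activations = new ArrayList<>();
        double score;                           // e.g. an estimate of how well this interpretation fits
    }

    // Selects the interpretation with the highest score; all other branches stay suppressed.
    static Branch select(List<Branch> branches) {
        Branch best = branches.get(0);
        for (Branch branch : branches) {
            if (branch.score > best.score) {
                best = branch;
            }
        }
        return best;
    }
}
```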

Relational Pattern Part Neurons

In a text, the sequence of words is not arbitrary; it is required to carry the meaning of that text. Thus, we also need a way to represent this sequence in the activation network. Let us consider the example of the two-word phrase "the cat". Here we have an input pattern neuron representing each word. Additionally, we have two relational pattern part neurons per word that allow us to establish the connections to the previous and the next word. Since the pattern part neurons belong to the pattern of a specific word, we also need an inhibitory neuron to avoid having to create a quadratic number of synapses to connect the corresponding relational neurons.
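
To make this size argument concrete (assuming, purely for illustration, a vocabulary of \(V\) word patterns): connecting the "next word" relational neuron of every word directly to the "previous word" relational neuron of every other word would require on the order of \(V^2\) synapses, whereas routing these connections through a single shared inhibitory neuron requires only on the order of \(2V\) synapses.

$$\underbrace{V \cdot V}_{\text{direct connections}} \qquad \text{vs.} \qquad \underbrace{2 \cdot V}_{\text{via one inhibitory neuron}}$$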

Now, how can we use these relational neurons to match the phrase pattern neuron "P-the cat"? For this we need to ensure that the pattern part neurons of the pattern "P-the cat" do not map to any unrelated words in the text. Two things are needed for this. First, there needs to be a synapse from the pattern part neuron "PP-the" to the pattern part neuron "PP-cat" that establishes the requirement of a relationship between these two neurons. The second part is to verify this relationship during the linking process for the activations. This is done by requiring a connecting path through the activation network between the input and the output activation of a potential new link. This verification is performed by a 'visitor' class that steps through the activation network in search of such a connection. At each activation and each link, a transition operation is performed in order to check that all of the conditions for the respective neuron or synapse type hold. When the visitor is able to close a loop, a new link is created.
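
The following sketch shows the loop-closing check in its simplest form. The class and method names are assumptions for this illustration; in particular, the type-specific transition checks performed at each activation and link are omitted here.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the visitor-based linking check: starting from the input
// activation of a candidate link, walk the existing links of the activation network
// and test whether the output activation can be reached, i.e. whether a loop closes.
class VisitorExample {
    static class Activation {
        final Set<Activation> linkedActivations = new HashSet<>();   // already established links
    }

    // Returns true if a connecting path from 'current' to 'target' exists. In AIKA, each
    // step would additionally perform a transition operation that verifies the conditions
    // of the neuron or synapse type being traversed.
    static boolean canCloseLoop(Activation current, Activation target, Set<Activation> visited) {
        if (current == target) {
            return true;
        }
        if (!visited.add(current)) {
            return false;                        // this activation was already visited
        }
        for (Activation next : current.linkedActivations) {
            if (canCloseLoop(next, target, visited)) {
                return true;
            }
        }
        return false;
    }

    // Only if the loop can be closed is the new link between input and output created.
    static boolean tryLink(Activation input, Activation output) {
        return canCloseLoop(input, output, new HashSet<>());
    }
}
```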

The red arrows depict the path of the visitor.