How it works

The Artificial Neural Network

Aika is a neural network simulation algorithm, that uses neurons to represent a wide variety of linguistic concepts and connects them via synapses. In contrast to most other neural network architectures, Aika does not employ layers to structure its network. Synapses can connect arbitrary neurons with each other, but the network is not fully connected either.
Like other artificial neural networks (ANN) the synapses are weighted. By choosing the weights and the threshold (i.e. the bias) accordingly, neurons can take on the characteristics of boolean logic gates such as an and-gate or an or-gate. To compute the activation value of a neuron, the weighted sum over its input synapses is computed. Then the bias value \(b\) is added to this sum and the result is sent through an activation function \(\varphi\).

$$net_j = {b_j + \sum\limits_{i=0}^N{x_i w_{ij}}}$$ $$y_j = \varphi (net_j)$$ Depending on the type of neuron, different activation functions are used. One commonly used activation function in Aika is the rectified hyperbolic tangent function, which is basically the positive half of the \(\tanh()\) function. \[\varphi(x) = \Bigg \{ {0 \atop \tanh(x)} {: x \leq 0 \atop : x > 0}\]

The activation functions are chosen in a way that they clearly distinguish between active and inactive neurons. Only activated neurons are processed. These activations are expressed not only by a real valued number but also by an activation object.


The advantage of having activation objects is that, through them, Aika is able to cope with the relational structure of natural language text by making the activation relate to a specific segment of text. In a way these activations can be seen as text annotations that specify either the start and end character or the position of a word (relational id).
Words, phrases and sentences are in a relation to each other through their sequential order. The assignment of text ranges and word positions to activations is a simple yet powerful representation of the relational structure of text and avoids some of the shortcomings of other representations such as bag of words or sliding window. Since the activations are propagated along through the network, synapses need to be able to manipulate the text range and the word position while the activations are passed on to the next neuron.


One common problem when processing text is that of cyclic dependencies. In the example 'jackson cook' it is impossible to decide which word has to be resolved first, the forename or the surname, since both depend on each other. The word jackson can be recognized as a forename when the next word is a surname and the word cook can be recognized as a surname when the previous word is a forename. To tackle this problem Aika employs non-monotonic logic and is therefore able to derive multiple interpretations that are mutually exclusive. These interpretations are then weighted and only the strongest interpretation is returned as the result.
Consider the following network, which is able to determine whether a word, which has been recognized in a text, is a forename, surname, city name, or profession. If for instance the word "jackson" has been recognized in a text, it will trigger further activations in the two jackson entity neurons. Since both are connected through the inhibiting neuron only one of them can be active in the end. Aika will therefore generate two interpretations. But these interpretations are not limited to a single neuron. For instance if the word neuron cook gets activated too, the jackson forename entity and the cook surname entity will be part of the same interpretation. The forename and surname category links in this example form a positive feedback loop which reinforces this interpretation.

New interpretations are spawned if both input and output of a negative recurrent synapse get activated. In this case a conflict is generated. An interpretation is a conflict free set of activations. Therefore, if there are no conflicts during the processing of an input data set, only one interpretation will exist and the search for the best interpretation will end immediately. On the other hand, if there are conflicts between activations, a search needs to be performed which selects or excludes individual activations and tests how these changes affect the overall weights sum of all activations. This sum is also called the objective function \(f\) and can be stated in the following way: $$f = \sum\limits_{j \in Acts}{\min (-g_j, net_j)}$$ $$g_j = \sum\limits_{i = 0, w_{ij} < 0, w_{ij} \in Recurrent}^N{w_{ij}}$$ The value \(g_j\) is simply the sum of all negative feedback synapses. The intention behind this objective function is to measure a neurons ability to overcome the inhibiting input signals of other neurons.

During the search, the individual activation values of each neuron are adjusted according to the current search path. Selected activations are computed and propagated through the network, excluded activations are set to zero. To reduce costs of the interpretation search several optimizations have been implemented, so that the search avoids to follow branches that have already been computed. Furthermore, the computation results in each search node are cached if possible. A stack trace of the resulting binary search tree looks as shown in the box below. This list shows a single path from a leaf node to the root node. Each search node is associated with a single activation. The nodes are ordered in a way, that the causality of non recurrent synapses is respected. Depending on the path each activation is either selected or excluded. Decisions on lower level nodes might limit the available choices for higher level nodes. The values DW and DN are the weight delta and norm delta for this search node. The values AW and AN are the accumulated weights and norms up to this node. The values LIMITED, CACHED, EXPLORE, SELECTED, EXCLUDED, SIM-CACHED, SIM-COMPUTED and MODIFIED are simply statistics for different events during the search.

	7 RANGE:(12,16) C-surname
	AW:9.998 AN:20.3 DW:9.998 DN:0.0

	6 RANGE:(12,16) INHIBIT
	AW:0.0 AN:20.3 DW:0.0 DN:0.0

	5 RANGE:(12,16) E-cook (surname)
	AW:0.0 AN:20.3 DW:0.0 DN:5.0

	4 RANGE:(12,16) E-cook (profession)
	AW:0.0 AN:15.3 DW:0.0 DN:4.5

	3 RANGE:(4,11)  C-forename
	AW:0.0 AN:10.8 DW:0.0 DN:0.0

	2 RANGE:(4,11)  INHIBIT
	AW:0.0 AN:10.8 DW:0.0 DN:0.0

	1 RANGE:(4,11)  E-jackson (forename)
	AW:0.0 AN:10.8 DW:0.0 DN:5.0

	0 RANGE:(4,11)  E-jackson (city)
	AW:0.0 AN:5.8 DW:0.0 DN:5.8


Synapses in Aika consist not just of a weight value but also properties that specify relations between synapses. These relations can either be used to imply an constrained on the matching input text ranges or the dependency structure of the input activations. Biological neurons seem to achieve such a relation matching through the timing of the firing patterns of their action potentials.
The complete list of synapse properties looks as follows:

                new Synapse.Builder()
                new Synapse.Builder()
                new Relation.Builder()
                        .setRelation(new Equals(END, BEGIN)),
                new Relation.Builder()
                        .setRelation(new Equals(BEGIN, BEGIN)),
                new Relation.Builder()
                        .setRelation(new Equals(END, END))

Unlike other ANNs, Aika allows to specify a bias value per synapse. These bias values are simply summed up and added to the neurons bias value.
The property recurrent states whether this synapse is a feedback loop or not. Depending on the weight of this synapse such a feedback loop might either be positive or negative. This property is an important indicator for the interpretation search.
Range relations define a relation either to another synapse or to the output range of this activation. The range relations consist of four comparison operations between the between the begin and end of the current synapses input activation and the linked synapses input activation.

The next property is the range output which consists of two boolean values. If these are set to true, the range begin and the range end are propagated to the output activation.
The last two properties are used to establish a dependency structure among activations. The instance relation here defines whether this synapse and the linked synapse have a common ancestor or are depending on each other in either direction. During the evaluation of this relation only those synapses are followed which are marked with the identity flag. Depending on the weight of the synapse and the bias sum of the outgoing neuron, Aika distinguishes two types synapses: conjunctive synapses and disjunctive synapses. Conjunctive synapses are stored in the output neuron as inputs and disjunctive synapses are stored in the input neuron as outputs. The reason for this is that disjunctive neurons like the inhibitory neuron or the category neurons may have a huge number of input synapses, which makes it expensive to load them from disk. By storing those synapses on the input neuron side, only those synapses need to stay in memory, that are needed by an currently activated input neuron.

Neuron Types

There are two main types of neurons in Aika: excitatory neurons and inhibitory neurons. The biological role models for those neurons are the spiny pyramidal cell and the aspiny stellate cell in the cerebral cortex. The pyramidal cells usually exhibit an excitatory characteristic and some of them possess long ranging axons that connect to other parts of the brain. The stellate cells on the other hand are usually inhibitory interneurons with short axons which form circuits with nearby neurons.
Those two types of neurons also have a different electrical signature. Stellate cells usually react to a constant depolarising current by firing action potentials. This occurs with a relatively constant frequency during the entire stimulus. In contrast, most pyramidal cells are unable to maintain a constant firing rate. Instead, they are firing quickly at the beginning of the stimulus and then reduce the frequency even if the stimulus stays strong. This slowdown over time is called adaption.
Aika tries to mimic this behaviour by using different activation functions for the different types of neurons. Since Aika is not a spiking neural network like the biological counterpart, we only have the neurons activation value which can roughly be interpreted as the firing frequency of a spiking neuron. In a sense the earlier described activation function based on the rectified tanh function quite nicely captures the adaption behaviour of a pyramidal cell. An increase of a weak signal has a strong effect on the neurons output, while an increase on an already strong signal has almost no effect. Furthermore, if the input of the neuron does not surpass a certain threshold then the neuron will not fire at all. For inhibitory neurons Aika uses the rectified linear unit function (ReLU).

$$y = \max(0, x)$$

Especially for strongly disjunctive neurons like the inhibitory neuron, ReLU has the advantage of propagating its input signal exactly as it is, without distortion or loss of information.

The Pattern Lattice

One design goal of the Aika algorithm is, to be able to efficiently simulate huge networks with millions of neurons. However, this is only possible if only a very small part of the neurons become active for any given input data set. One prerequisite for achieving this goal is the earlier mentioned distinction between active and inactive neurons. If one neuron gets activated, only those outgoing neurons are required to be computed, that would exceed their threshold based on this new input. Depending on the synapse weights and bias settings of the outgoing neuron it may either fire directly (for or-gate like neurons) or it may require other input neurons to fire first (for and-gate like neurons). If the neuron is an and-gate like neuron then there is an efficient way to determine only those outgoing neurons that might be activated by a new input signal.
This is where the pattern lattice comes into play. The pattern lattice is used to store all sub patterns of a given input pattern. If for instance a neuron requires the inputs \(A\), \(B\) and \(C\) to be activated before the neuron itself can be fired, then the following pattern lattice would emerge.

As with the neural network, activations can be propagated within the pattern lattice as well. If only the two inputs \(A\) and \(B\) are activated by an incoming signal, then the pattern node \(AB\) would be activated as well, but not the pattern node \(ABC\). The advantage of this data structure is that only minimal refinements have to be tested from one level in the pattern lattice to the next. If all neurons dependent on input \(A\) were tested directly, the number of outgoing neurons might be so large that many unnecessary calculations would be required. But before we can build the pattern lattice, we first have to convert the weighted inputs of the neuron into a boolean representation. Here we only need to distinguish between active and inactive neurons. Hence let us consider a neuron with a heaviside activation function (\(\theta(x) = 1\) if \(x > 0\) otherwise \(0\)).

$$b = -0.5$$ $$w_0 = 1.0$$ $$w_1 = 0.3$$ $$w_2 = 0.3$$ $$y = \theta \Bigg({b + \sum\limits_{i=0}^N{x_i w_i}}\Bigg)$$

This example neuron can then be reformulated into a disjunctive normal form (-0.5 + 1.0 > 0 OR -0.5 + 0.3 + 0.3 > 0) boolean expression:

$$y = x_0 \vee (x_1 \wedge x_2)$$

A boolean expression in disjunctive normal form consists of a set of conjunctions that are connected by a disjunction. That means we can store these conjunctions as nodes within the pattern lattice and connect them by a disjunction node. The activation of this disjunction node is then the precondition before the associated neuron can be activated. So the weighted sum of the neuron still needs to be computed, but only when the disjunction node gets activated. Another advantage of the pattern lattice is that large parts of the lattice can be shared by multiple neurons.