Wednesday, November 28, 2012

Positive or negative feedback?

The main difference between the predictive-coding model (PC) and the adaptive-resonance model (ART) is that PC is based on negative feedback of prediction errors while ART is based on positive-feedback resonance. These are very different ideas about how feedback modulates responses, but they may be reconcilable. Each has support: PC is Bayesian, and the net effect of feedback seems to be negative; ART predicts bursting, and all the pyramidal cell synapses are excitatory.

So how might they be reconcilable? The positive feedback system in ART could turn out to be overall negative, since the responses are normalized. I'm still trying to work out the effects of positive feedback signals and how they might be used. Grossberg compares the top-down signals to attention, and the attention literature suggests that attentional modulations affect the gain of responses. So does this mean activation of the apical tuft via top-down signals results in a scaling of the ultimate IO function? Do different neurons then receive different amounts of scaling depending on the feedback signals?

One way of looking at it is that the top-down layer (2nd layer) develops a normalized population code just like the first layer. The 2nd layer then sends back a normalized response vector to the 1st layer. If the top-down signal were perfect at matching the bottom-up signal, and these signals were multiplicative, then it would be as if you were squaring the first layer's activity. The population would go from x to x^2 (after re-normalization). This means roughly 2/3 of the neurons will be inhibited and 1/3 will be excited. This may lead to some weird effects at steady state, as the population code will change. This would then change the 2nd layer and then further alter the first layer.

What if it were additive instead? The second layer sends back to the first layer the same normalized vector that the first layer was producing. This would be like multiplying the first layer by 2, which after re-normalization would lead the first layer back to the same state. This seems better: the population code is maintained, and the well-predicted first layer doesn't change. This could also look like multiplication in the grand scheme.
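A quick numpy sketch of these two cases (everything here is illustrative - a random vector standing in for the population, L2 normalization standing in for the divisive inhibition):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(12)                 # first-layer population vector
x /= np.linalg.norm(x)             # divisive normalization to unit length

# Multiplicative feedback: a perfectly matched top-down vector
# multiplies the bottom-up one, i.e. the population gets squared.
mult = x * x
mult /= np.linalg.norm(mult)

# Additive feedback: the same vector is added back (x -> 2x), and
# renormalization returns exactly the original population code.
add = x + x
add /= np.linalg.norm(add)

print("fraction suppressed by multiplicative feedback:", np.mean(mult < x))
print("additive feedback preserves the code:", np.allclose(add, x))
```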

Imagine that the goal of learning is for the top-down layer to send back to the first layer the same population vector. Differences between the population vectors would then drive some form of plasticity. If a neuron received more top-down input than bottom-up input, then the top-down synapses should get weaker. If it received more bottom-up input than top-down, then they should get stronger.
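As a toy version of that rule (the function name, shapes, and learning rate are made up):

```python
import numpy as np

def update_topdown(w, bottom_up, top_down, lr=0.01):
    # Per-neuron mismatch drives plasticity of top-down synapses:
    # weaker where top-down overshoots, stronger where it undershoots.
    mismatch = bottom_up - top_down
    return w + lr * mismatch[:, None]   # shift each neuron's row of synapses

# Toy usage: 4 first-layer neurons, each receiving 3 top-down synapses.
rng = np.random.default_rng(0)
w = rng.random((4, 3))
w = update_topdown(w, bottom_up=np.array([0.9, 0.1, 0.5, 0.3]),
                   top_down=np.array([0.5, 0.5, 0.5, 0.5]))
```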

Layer 4 is then the data layer; layer 2/3 is a classification layer. These interact as described (2/3 is trying to predict 4). L2/3 is chunky - imagine L4 as data points in a high-dimensional space, and L2/3 as the boundaries that classify these points. Sometimes different classification boundaries overlap; if there is no extra evidence, then L2/3 goes through hypothesis testing, cycling through different possible classifications. Higher inputs or lateral inputs can favor one hypothesis over another.

Perhaps another layer is somehow a parameterization layer (maybe L6 or L5?). This layer describes the transformation of the classification back to the data - as in, it is like the principal component scores of the clusters. Let's imagine language, since it is a good hierarchical system. So this part of cortex is classifying the word "grape". The data layer gets the input sounds, and one part of it is representing the 'a' sound. Imagine that the 'a' sound has a lot of variability - it can be said quickly or stretched out. L2/3 classifies it as 'a', and L6 describes the quickness or stretchiness of the 'a' sound. This helps remap the classification back to the data, and describes a parameterization of the classification.

L4 receives data and sets up a population vector that is equivalent to a point in high-dimensional space. L2/3 creates a cluster (more lateral connections, perhaps even binary activity). If the L4 point is within two clusters, then the clusters will turn on and off based on the probabilities of each (over time). L6 then describes the data based on the clustering - like the principal components of each cluster. (I'm not sure this is necessarily L6, but a PCA parameterization of the clusters seems like it would be useful somewhere.)

There may not even need to be a different layer. The cluster is like the 0th principal component (describing the mean of the data). So you could imagine how some neurons in L2/3 are calculating the 0th component, some are calculating the first, and so on. L2/3 could just be the low-dimensional parameterization.
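A sketch of this picture, using off-the-shelf k-means for the L2/3 clusters and per-cluster PCA for the parameterization (the layer assignments are just the speculation above, not anything established):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# "L4": data points in a high-dimensional space (two noisy clusters).
X = np.vstack([rng.normal(0.0, 0.5, (50, 10)),
               rng.normal(2.0, 0.5, (50, 10))])

# "L2/3": classification -> cluster assignments.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# "L6": per-cluster PCA scores parameterize each point relative to its
# cluster; the cluster mean itself plays the role of the 0th component.
for k in range(2):
    pts = X[labels == k]
    scores = PCA(n_components=2).fit_transform(pts)
    print(f"cluster {k}: n={len(pts)}, score std={scores.std(axis=0).round(2)}")
```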

Monday, November 26, 2012

Canonical Microcircuits for Predictive Coding

Bastos, AM. Usrey, WM. Adams, RA. Mangun, GR. Fries, P. Friston, KJ. (2012) Canonical Microcircuits for Predictive Coding. Neuron 76: 695-711.

This looks like a good review that covers some papers I've been meaning to get to.

Predictive coding is the most plausible candidate for making generative models.

Superficial layers of cortex show neuronal synchronization in the gamma range; deep layers prefer alpha or beta (Maier 2010, Buffalo 2011). Feedforward connections originate from superficial layers, feedback from deep layers.

Statistical analyses of connections show that most are "feedforward": L4→L2/3→L5. There are fewer feedback connections, and feedback was typically seen when pyramidal cells in one layer targeted inhibitory cells in another.

Feedforward connections are thought to be driving and can cause spiking; feedback connections are thought to modulate receptive-field characteristics according to context. Feedforward connections have strong, depressing EPSPs; feedback connections have weak, facilitating EPSPs. Sherman 2011 - retinal input to LGN is driving, cortical input is modulatory. But other studies suggest that feedback and feedforward can both have driving and modulatory effects.

Feedback connections convey predictions; feedforward connections convey prediction errors. Effective feedback "connectivity is generally assumed to be inhibitory." Prediction errors lead to more gamma activity - from superficial layers failing to suppress deeper layers (Todorovic 2011, Wacongne 2011). Imaging studies also show less activity when stimuli are predictable. (It seems that inhibition has its biggest influence in the surround.)

Most long-range feedback connections are glutamatergic, although some may be inhibitory. L1 inhibitory neurons could be mediating this inhibition.

Simple cells in L4, complex cells in L2/3 and deep layers. Simple cells have driving effects on complex cells.

Feedforward signals are sent through the gamma band; feedback is sent through alpha-beta frequencies.

Predictive coding = Bayesian inference. Hierarchical. Biology is minimizing surprise (entropy), which means maximizing Bayesian evidence for its generative model. Can build an entire model based on predictive coding equations, subtractive errors, etc.
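A minimal sketch of that kind of model - a single-level linear predictive coder in the Rao & Ballard spirit, with subtractive errors; this is just a toy, not the paper's full hierarchical scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden = 20, 5
W = rng.normal(0, 0.1, (n_in, n_hidden))   # generative (top-down) weights
x = rng.normal(size=n_in)                  # sensory input

r = np.zeros(n_hidden)                     # expectations (deep layers)
for _ in range(200):
    e = x - W @ r                          # prediction error (superficial)
    r += 0.05 * (W.T @ e)                  # errors update expectations

print("residual error norm:", round(float(np.linalg.norm(x - W @ r)), 3))
```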

Figure 5: Left: the canonical microcircuit based on Haeusler and Maass (2007), in which we have removed inhibitory cells from the deep layers because they have very little interlaminar connectivity. The numbers denote connection strengths (mean amplitude of PSPs measured at soma in mV) and connection probabilities (in parentheses) according to Thomson et al. (2002). Right: the proposed cortical microcircuit for predictive coding, in which the quantities of the previous figure have been associated with various cell types. Here, prediction error populations are highlighted in pink. Inhibitory connections are shown in red, while excitatory connections are in black. The dotted lines refer to connections that are not present in the microcircuit on the left (but see Figure 2). In this scheme, expectations (about causes and states) are assigned to (excitatory and inhibitory) interneurons in the supragranular layers, which are passed to infragranular layers. The corresponding prediction errors occupy granular layers, while superficial pyramidal cells encode prediction errors that are sent forward to the next hierarchical level. Conditional expectations and prediction errors on hidden causes are associated with excitatory cell types, while the corresponding quantities for hidden states are assigned to inhibitory cells. Dark circles indicate pyramidal cells. Finally, we have placed the precision of the feedforward prediction errors against the superficial pyramidal cells. This quantity controls the postsynaptic sensitivity or gain to (intrinsic and top-down) presynaptic inputs. We have previously discussed this in terms of attentional modulation, which may be intimately linked to the synchronization of presynaptic inputs and ensuing postsynaptic responses (Feldman and Friston, 2010; Fries et al., 2001).

This is based on equation 1 of the paper.


And they mathematically describe how the different frequencies would dominate in the different layers based on these equations.

Sunday, November 25, 2012

Hippocampal Pyramidal Neurons Comprise Two Distinct Cell Types that Are Countermodulated by Metabotropic Receptors

Graves, AR. Moore, SJ. Bloss, EB. Mensh, BD. Kath, WL. Spruston, N. (2012) Hippocampal Pyramidal Neurons Comprise Two Distinct Cell Types that Are Countermodulated by Metabotropic Receptors. Neuron 76: 776-789.

Two pyramidal cell types? "our results support a model of hippocampal processing in which the two pyramidal cell types are predominantly segregated into two parallel pathways that process distinct modalities of information"

Rat hippocampal slices. Suprathreshold step current evoked one of two patterns: regular spiking or bursting. They extracted several (~30) features and clustered them. Only two clusters came out of the analysis.

The bursting class has a more extensive tuft, whereas regular-spiking cells have more extensive basal dendrites.

They made EPSC-like current injections. All neurons responded with a mixture of single spikes and bursts, but the cells were still distinguishable by the temporal pattern of bursting: regular-spiking neurons responded with single spikes early and bursts later, while bursting neurons fired bursts early and single spikes later.

Burstiness can be modulated by activity - theta-burst stimulation can make both cell types more bursty. This burst plasticity can be modulated by glutamate and acetylcholine antagonists. Burst plasticity does not interconvert one cell type to the other. Thus there are two stable cell-type pathways out of CA1.


So there's a separate spatial and non-spatial loop that doesn't go through DG, and the trisynaptic pathway through DG combines the info. The two types are separated: CA1p has more of the late-bursting cells (with larger basal trees), and CA1d has more early-bursting cells.

Temporal receptive chunking

One thing that was really interesting from SfN that I meant to put on here a while back was this illusion with speech sounds. Basically, a recording of speech was broken up into small segments, and each segment was then played backwards. So every, say, 20 ms block is flipped around, but the blocks stay in forward order.

When the chunk was below a certain threshold, you couldn't even tell a difference between the normal speech and the reversed version (I think 20 ms sounded normal). Above the threshold it sounded like complete, unidentifiable garble. So there must be some temporal chunking going on in the receptive fields: a set of frequencies falling together within a small temporal window sounds identical regardless of order. The smallest chunks - the 20 ms windows - are likely something like primary cortical temporal smearing (since it sounds basically normal - sufficient for higher-order areas to recognize the speech and help make it sound correct).
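The manipulation itself is easy to write down (a sketch; the sampling rate and chunk sizes are just plausible values):

```python
import numpy as np

def reverse_chunks(signal, chunk_ms, fs=16000):
    # Flip each chunk in place while keeping the chunks in forward order.
    n = int(fs * chunk_ms / 1000)
    return np.concatenate([signal[i:i + n][::-1]
                           for i in range(0, len(signal), n)])

# Toy usage on a synthetic signal; with chunk_ms around 20 the
# manipulation was reportedly inaudible, with larger chunks it
# turned to garble.
fs = 16000
t = np.arange(fs) / fs
fake_speech = np.sin(2 * np.pi * 150 * t) * np.sin(2 * np.pi * 3 * t)
garbled = reverse_chunks(fake_speech, chunk_ms=100, fs=fs)
```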

And then you imagine that the temporal receptive fields start increasing in their chunk sizes. I just have a hard time thinking about how that information could be encoded. It makes me think about theta in hippocampus. Every cycle of theta, the place cells fire in the same order based on how close the animal is to their receptive fields. This sets up a sequence pattern. So I guess these types of sequences can be learned and reliably replicated. The sequence is around 7 place cells long, and then it could repeat, or it could move up a few place cells - i.e., the first sequence is [1,2,3,4,5,6,7], and then the next sequence is [3,4,5,6,7,8,9].

Well then this makes it sound like a polychronous pattern can be repeated with a lower-frequency signal (like theta), and that keeps timing over a longer scale. And it's like one polychronous pattern primes the next polychronous pattern in a top-down fashion. Imagine bottom-up inputs are driving the cells, and if a certain bottom-up input stays stationary, then a theta rhythm of the same polychronous pattern at the higher level starts repeating. These send learned top-down inputs to the next patterns in line - top-down makes them more excitable, but they don't fire without bottom-up drive. Once the bottom-up shifts, the expected patterns will be more easily excited. So the order gets set up by how excited they are in the population code. But then the receptive-field fall-offs must be drastic - red and orange just disappear, instead of falling behind, e.g., to the cyan spot (looking at the bottom of the figure).

So, some kind of STDP rule could learn this. And if it gets repeated because the bottom-up inputs stay constant, then the theta chunk will learn the chunk cycle. Then as the chunk shifts, a new chunk cycle will be oscillating. I'm not sure what is happening on the up-stroke of the theta wave. Are there just more cells there (i.e., do cells fire at each gamma trough on both the down- and up-stroke of theta), or is there really a gap after the bottom of the theta wave? A gap here may prevent the chunks from learning a chunk loop, but this would also be unlearned by STDP.

And the other interesting thing is that there is a general time window - slightly less than one theta cycle - in which the cells that fired before your spike will fire again after it. If the delays were as long as a full theta cycle, then loops would definitely get formed. These neurons would be learning temporal chunks at more of a theta frequency.

Wednesday, November 21, 2012

How to create a mind 4-6

So chapter 7 looks to be the meat of the book. He alluded to it several times in these chapters. These were basically an overview of some topics from biology and neuroscience. He went over all the other brain structures besides neocortex and talked about some other interesting things about neocortex - like the idea that visual areas can process auditory information in blind people.

He dismisses everything that doesn't have a "neocortex". And reptiles are just basically useless. I don't know what he thinks about turtle dorsal cortex-like structures.

He mentions that his pattern recognizers are on the order of hundreds of neurons - basically a cortical column. He says they are intertwined in such a way that you wouldn't see them (for whatever reason). He talked about how most of the synaptic structure is hard-coded within the columns, and that learning is really about changing connections between columns. Connections are available and then utilized or pruned - not completely regrown.

One thing he talked about that I was just thinking about was the temporal expansion of receptive fields. He cited some work by Uri Hasson that pertains to the increasing temporal extent of higher-order receptive fields. It will be important to consider how this is implemented by neurons. At the highest level it could be like Mongillo - different amounts of short-term potentiation/depression could change the temporal responses of the circuits. It would be interesting to know whether neurons with longer receptive fields fire constantly while their preferred stimulus is shown.

Monday, November 19, 2012

Direct control of firing rate gain by dendritic shunting inhibition.

Capaday, C. Van Vreeswijk, C. (2006) Direct control of firing rate gain by dendritic shunting inhibition. Journal of Integrative Neuroscience 5(2): 199-222.

Ok, crap. Just skimming over this paper, he basically gets to the same model that I have in the local-bend system. Ugh, I knew someone must have done this. We shall see what he came up with. He makes his compartments dendrite and soma - which are equivalent to my soma and axon, respectively. I've never heard of this journal, and this paper is only cited by one other paper in PubMed, and that paper is about something completely different.

The intro has a nice review of all the noise papers: Holt & Koch, Chance et al., Mitchell & Silver, Prescott & De Koninck. There are slight differences in the noise mechanisms across these papers.

"soma acts as an IF neuron attached, by a coupling conductance, to a passive dendritic compartment."

He takes into account the current from the action potential, which is normally neglected. He makes I_spike like a delta function with some integral S, then derives a way of incorporating the spike current as a value for the reset. Then he jumps to his alpha-motor-neuron model, where he adds in some more conductances - an AHP and a K.

He then analytically derives the IFR in the two-compartment model. It's quite complicated, but he gets to the firing rate being:

R = (I_S + [g_C / (g_C + g_D)] I_D) / (C_S (V_T - V_r))

R is firing rate, C_S soma capacitance, V_T threshold, V_r reset, I_S current injected into the soma (axon, for mine), I_D current injected into the dendrite (soma, for mine), g_C the coupling conductance, and g_D the approximate conductance of the dendrite. He then derives, similarly to mine, how Holt & Koch works, and how you can get division if current is injected in the dendrite and shunting is in the dendrite.
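That equation is easy to play with directly (parameter values here are arbitrary):

```python
import numpy as np

def firing_rate(I_S, I_D, g_C, g_D, C_S=1.0, V_T=1.0, V_r=0.0):
    # Dendritic current is attenuated by the conductance divider
    # g_C / (g_C + g_D) before reaching the spike-initiation site.
    return (I_S + (g_C / (g_C + g_D)) * I_D) / (C_S * (V_T - V_r))

# Increasing the dendritic (shunting) conductance g_D divides the
# slope of rate vs. dendritic current -- gain control on I_D.
I_D = np.linspace(0, 10, 6)
for g_D in (0.5, 1.0, 2.0):
    print(g_D, np.round(firing_rate(I_S=1.0, I_D=I_D, g_C=1.0, g_D=g_D), 2))
```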


So, A is just like Holt & Koch (with a passive dendrite attached). B is just an IF and you slightly increase the conductance through the dendrite to ground. C is equivalent to A, just the current is going through an extra resistor. D is the gain-control type.

He next considers synaptic conductances instead of currents. He talks about the upper limit where somatic shunting prevents the neuron from firing at all - due to the voltage saturation and the excitatory reversal potential.

"The observation emphasizes that it is the net current reaching the SIZ at the soma which determines firing rate." ala my figure 5B. 

So, yes, basically the same. He doesn't catch the trick of changing the reset potential such that the zero-crossing is the same and thus you get pure scaling. And in general I think the separation of soma and axon is better than dendrite and soma. All the conductance stuff he talks about is the same as what happened with my model; I just ignored it because of the complicated properties of the conductance curves - it was hard to make something work based on those functions. And treating the dendrite as one compartment that can actually saturate seems unlikely anyway; currents summing from dendrites is more appropriate.

Sunday, November 18, 2012

How to create a mind 1-3

So, Ray Kurzweil has just published a book called "How to Create a Mind". I've read the first three chapters so far. It's pretty interesting, but it feels like it's missing the details, much the way Grossberg's work was.

He makes some interesting points about Einstein and Darwin and what their thought-processes were when they were making their big discoveries. Einstein is famous for his thought experiments and Kurzweil walked through his version of Einstein's thinking process. He makes the case that to understand the brain we have to make these types of thought experiments - thought experiments about thought.

It makes sense. There's definitely a lot of insight one could get by "introspecting". I mean it's a really interesting question: What information does my consciousness have access to? It's not everything (we have little access to the "Where" pathway, we see illusory contours, so we don't have access to the raw data.)

Kurzweil's big thing is hierarchical pattern recognizers. His whole "unifying theory" is called PRTM - the pattern recognition theory of mind. He goes into explaining hierarchies - illustrating how certain patterns make up letters, letters make up words, words make up sentences, and so on. He says that, in a way, language is crafted by the brain's hierarchical organization - our hierarchical pattern recognizers led us to create a hierarchical pattern communication system. Oh yeah, he calls spoken language our "first invention", and written language our "second invention"...

So far it's pretty simple by AI standards - some "pattern recognizer", let's say a neuron, gets bottom-up inputs that have size (firing rate), variance (spike timing), and a weight (synapse strength). He even talks about dendrites and axons, but he does it in a strange way - either he's just too much into the AI field and not actual neuro, or he is maybe hinting at the idea that each dendrite can be a pattern recognizer (I'll probably find out later).

So the neuron recognizes its pattern and outputs its own pattern carrying size and variance information, which goes on to the next level. In the next level the same thing happens, and so a hierarchy is arranged; ultimately it could also be recursive. He talks about how top-down signals are coming back down constantly, and that it's all feedback, etc. Top-down is predicting bottom-up.

So we'll see where he goes. It seems simple now, but it's still just the beginning of the book - maybe 1/5th. What he has said so far seems obviously true in a vague sense; the real question is in the details.

I'm still really digging the resonance idea. One thing I've been thinking about recently is a simple system that could do an instant PCA on its data set, and then, as more data get collected, pull out the best PCA dimensions. Or it could equivalently be done for clustering.
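One standard way to get an online PCA like that is Oja's rule - a sketch, not anything from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5
C = np.diag([4.0, 2.0, 1.0, 0.5, 0.25])         # true covariance
w = rng.normal(size=d)
w /= np.linalg.norm(w)

for _ in range(5000):
    x = rng.multivariate_normal(np.zeros(d), C)  # one new data point
    y = w @ x                                    # current projection
    w += 0.01 * y * (x - y * w)                  # Oja's update

print(np.round(w, 2))   # converges to +/- the first eigenvector
```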

I'm struggling with the question of how the timescales change as we go up the hierarchy. What is happening at the top? I can see how visual cortex basically operates as an online system, processing and basically memorizing all of the inputs coming in constantly. But as you go up the visual hierarchy, the neural representations seem to be for things that move more slowly and can occupy a larger area. For the object-recognition game, it definitely seems like spatial invariance is being done by simple and complex cells that stack up in a hierarchy. The complex cells are important to spatially invariant object recognition. I can see how you could build a hierarchical pattern recognizer with layers of simple and complex cells to build up to a spatially invariant object classifier. But at the same time that you go up in space, it also feels like you should go up in time. There's something to objects being both spatially and temporally invariant, and I think this temporal invariance needs to be encoded.

Cortex is tasked with simultaneously making a representation of its inputs and figuring out how to make the lowest-dimensional representation. Low dimensionality is something to strive for, as it forces you to come up with simple explanations. But when there is no good answer, you have to be high-dimensional in the description. So each layer in the hierarchy is a pattern classifier (or maybe each column is).

So a cortical column gets as inputs a bottom-up signal from the level below, bottom-up-like signals from the neighboring columns, and top-down signals from higher levels. A cortical column is then like a few thousand simple cells in layer 4, plus complex cells in layer 2/3. The simple cells make the representation: they are forced to be more excitable (hence increasing dimensionality) when the representation is poor, and pushed toward the lowest dimensionality they can manage once it starts to work. The complex cells use the simple cells to pattern-complete, making a spatially invariant pattern out of the simple-cell representation.

So it seems that one thing that would help shape the feedback signals is to look for temporal invariance. I think back to a paper I remember about how the brain learns slowness - things that are temporally invariant. It would just be a useful signal to learn from: if your higher level is changing less, then your representation is good...
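A minimal linear version of that slowness idea (whiten, then take the direction whose temporal derivative has the least variance - in the spirit of slow feature analysis, and just a toy stand-in for whatever that paper did):

```python
import numpy as np

rng = np.random.default_rng(0)

# Mix one slow source with two fast noise sources.
T, d = 2000, 3
slow = np.sin(np.linspace(0, 4 * np.pi, T))
X = np.column_stack([slow, rng.normal(size=T), rng.normal(size=T)])
X = X @ rng.normal(size=(d, d))              # unknown linear mixing
X -= X.mean(axis=0)

# Whiten, then find the projection whose derivative has the
# smallest variance: the "slowest" feature.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z = U * np.sqrt(T)                           # whitened signals
dZ = np.diff(Z, axis=0)
evals, evecs = np.linalg.eigh(dZ.T @ dZ)     # ascending eigenvalues
slow_feature = Z @ evecs[:, 0]

print("|corr| with true slow source:",
      round(abs(np.corrcoef(slow_feature, slow)[0, 1]), 2))
```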

Wednesday, November 14, 2012

Neuronal arithmetic

Silver, RA. (2010) Neuronal arithmetic. Nature Reviews Neuroscience 11: 474-489.

So, basically this is the guy all about the need for synaptic noise in order to get multiplicative effects from shunting inhibition.

But he has some basic good ideas about population codes and the need for making certain types of computations.




Figure 1 | The rate-coded neuronal input–output relationship and possible arithmetic operations performed by modulatory inputs. a | For rate-coded neuronal signalling, a driving input typically consists of asynchronous excitatory synaptic input from multiple presynaptic neurons firing in a sustained manner (shown in red). A neuron may also receive a modulatory input, such as inhibition (shown in green), that alters the way the neuron transforms its synaptic input into output firing rate (shown in blue). b | The input–output (I–O) relationship between the total (or mean) driving input rate (d) and the response that is represented by the output firing rate (R). The arrow indicates the rheobase (minimum synaptic input that generates an action potential). c | Rate-coded I–O relationships can be altered by changing the strength of the modulatory input (m), which may be mediated by a different inhibitory or excitatory input. If this shifts the I–O relationship along the x-axis to the right or left, changing the rheobase but not the shape of the curve, an additive operation has been performed on the input (shown by orange curves). This input modulation is often referred to as linear integration because the synaptic inputs are being summed. d | An additive operation can also be performed on output firing. In this case a modulatory input shifts the I–O relationship up or down along the y-axis (shown by orange curves). e,f | If the driving and modulatory inputs are multiplied together by the neuron, changing the strength of a modulatory input will change the slope, or gain, of the I–O relationship without changing the rheobase. A multiplicative operation can produce a scaling of the I–O relationship along either the x-axis (input modulation; e) or the y-axis (output modulation; f). Although both of these modulations change the gain of the I–O relationship, only output gain modulation scales the neuronal dynamic range (f).

So he talks about "input gain modulation" where the max value doesn't change (e), and he talks about "output gain modulation" where the max value is scaled (f). And, so basically due to the sigmoidal shape, the slope is changed in both scenarios. And he says that this is gain control in both cases.

So yeah, all the experimental work with shunting inhibition was just based on Ohm's law and currents. He makes it sound amazing. But then he says, yeah, Holt and Koch showed it doesn't work. But then he makes an argument about how it could work if there were "synaptic noise", and he does show some experimental evidence to back it up. I need to go back and look at this.

But right now his mechanism seems strange to me (and I think, as of now, useless, but let me explain it the best I can). The theoretical idea is explained in the Larry Abbott paper - I'll give it a try and read that paper later. So... the idea is that you have balanced excitation and inhibition, and basically what that means is that the noise is increased. This results in a higher variance of current fluxes, and then, through an integrate-and-fire type mechanism, the higher variance results in the spiking I-O function getting scaled.
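Here's my attempt at a toy version of that mechanism - a leaky integrate-and-fire neuron where "balanced background" is modeled as extra conductance plus membrane noise that grows with it. This is my simplification, not the Chance/Abbott model:

```python
import numpy as np

rng = np.random.default_rng(0)

def lif_rate(I, g_bg, T=20.0, dt=1e-3):
    # Leaky integrate-and-fire; balanced background adds conductance
    # AND voltage fluctuations, which together rescale the I-O curve.
    C, g_L, V_th, V_reset = 1.0, 1.0, 1.0, 0.0
    g_tot = g_L + g_bg                       # background adds conductance
    sigma = 0.3 * np.sqrt(1.0 + g_bg)        # ...and noise grows with it
    V, spikes = 0.0, 0
    for _ in range(int(T / dt)):
        V += dt * (I - g_tot * V) / C + sigma * np.sqrt(dt) * rng.standard_normal()
        if V >= V_th:
            V, spikes = V_reset, spikes + 1
    return spikes / T

# The slope (gain) of the rate vs. drive curve should drop as the
# balanced background is turned up, rather than just shifting.
for g_bg in (0.0, 1.0, 3.0):
    print(g_bg, [round(lif_rate(I, g_bg), 2) for I in (0.5, 1.0, 2.0, 4.0)])
```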

But wouldn't a higher variance make it spike more often, not less? That would seem like backwards gain control - the network has a lot of activity, so increase activity even faster? Hmm... but looking at their figure it sounds like it goes down... And they also show some data: with dynamic clamp they add in excitatory and shunting currents and show that it behaves as they predict.

They basically explain the whole "derive Holt and Koch" thing in this paper. It's not as pretty as my derivation, but they do explain why it works that way mathematically (they don't actually derive the linear equation, though). But I need to look at the experimental work more carefully.

Right, so I think the problem with this idea is that it isn't really controllable. How would one turn the gain up or down? It's like the circuit gets noisy and the gain goes down, but how would synapses keep that in shape - plasticity rules? I'm not sure; I'm confused thinking about it.




Monday, November 12, 2012

Normalization as a canonical neural computation

Carandini, M. Heeger, DJ. (2012) Normalization as a canonical neural computation. Nature Reviews Neuroscience 13: 51-62.

The normalization equation:

R = D / (s + N)

D is the non-normalized drive of the neuron, s prevents divide by zero, and N is the normalization factor.

Normalization is seen in a ton of areas. It can be done in many different ways, and have different mechanisms. Here are the areas he talks about:

  • Invertebrate olfactory system (Drosophila)
  • Retina
  • V1
  • MT, MST
He also includes exponents in the normalization equation, which can change the shapes of the curves. He takes the equation and fits it to a large amount of data pretty nicely. What's interesting is that many of the figures he describes are not population-code normalization as purely as I've been modeling it. He often describes the normalization as a rightward shift of the IO function on a log scale. This means the input can eventually reach the saturation level if strong enough (but it has to be multiplicatively larger).


So you can see in B and D that the IO functions aren't just being purely scaled. However, the normalization equation he describes fits the data quite well.

He then talks about attention and gain control. Attention "multiplicatively enhances the stimulus drive before normalization". 

He makes a distinction between two types of gain control: "contrast-gain" is a right-left shift of the IO function on a log scale (horizontal stretching); "response-gain" is an up-down scaling of the IO function.
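A sketch of what that looks like with the hyperbolic-ratio form of the normalization equation (parameter values are arbitrary):

```python
import numpy as np

def contrast_response(c, c50, n=2.0, Rmax=1.0):
    # Hyperbolic-ratio form: R = Rmax * c^n / (c50^n + c^n).
    return Rmax * c**n / (c50**n + c**n)

c = np.logspace(-2, 0, 5)        # input drive ("contrast"), log-spaced
# Contrast-gain: raising c50 shifts the curve rightward on a log axis;
# a multiplicatively larger input still reaches saturation.
for c50 in (0.05, 0.1, 0.2):
    print("c50 =", c50, np.round(contrast_response(c, c50), 2))
# Response-gain would instead scale Rmax, compressing the output range.
```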

There are clues that both feed-forward and feed-back circuitry are involved in divisive processes.

He briefly mentions Holt and Koch. Then he says: "It is now agreed that the effect of conductance increases on firing rates is divisive, but only if the source of increased conductance varies in time" and cites Silver, Chance and Abbott.


So I think what they call contrast normalization and response normalization may be nice to have in the paper. I can also talk more about the temporal gain-control stuff, as it is not needed in my model.

Also, this makes me think about ARBMs. I was wondering what the effect of the top-down signals should be on the output, and if top-down is equivalent to attention, then these papers say attention is gain-increasing. So the top-down effects in the ARBMs should increase the gain of the population that they feed back to. I should look into whether burst firing is somehow like a multiplication of spiking...

Friday, November 9, 2012

ARBM

ARTBM? AR-RBM? ART-RBM?

Anyway: the adaptive resonance restricted Boltzmann machine. Let's go even simpler and try to put the ART rules in the context of an RBM. Let's consider just two layers, where stimuli are presented to the first layer through the basal tree. The firing of this layer stimulates the basal tree of the second layer. The spiking of the second layer influences the apical tree of the first layer (and would influence the basal tree of a third layer). We are ignoring any lateral connections for now.

So I want to consider establishing a resonance, learning, and representation dimensionality in the context of an RBM. For a regular RBM, I like to think of the learning rule as basically back-propagation in a feed-forward neural network where the supervised learning signal is the input. Although I feel like this is not quite the case (need to do an RBM refresher - RBMs are binary?).

The first layer receives a stimulus and, let's say, is normalized to a constant length. At first the second layer will have low activity (nothing has been learned), and over time the excitability of the second layer increases. Eventually enough neurons will be activated that the feedback starts causing the first layer to burst. One parameter is: how much stronger is a burst than a spike? The bursts and the spikes will keep being normalized by the inhibition to a constant length. This should create a positive feedback loop that leads to more bursting in the first layer and more excitation in the second layer. The second layer will also need to be normalized. Synapses from the second layer that cause a burst in the first layer (spike before burst) are strengthened; synapses that cause a spike in layer 2 from a burst (burst before spike) are also strengthened.

If this kept going up a hierarchy, then the second layer would receive both top-down and bottom-up signals, which would cause the second layer to burst. So the question is: should there be learning if and only if both pre and post are bursting? Or is one enough? And what ultimately happens at the top layer, which will not have top-down signals?

The valuable thing about this set-up is that the second layer can increase or decrease the dimensionality of the representation as it learns. So there need to be two opposing forces - one that increases the dimensionality when the matching is poor, and one that decreases it when the matching is good. The increasing excitability with a poor match will naturally increase the dimensionality. It could be that the divisive normalization/competition turns off the weaker-firing neurons when the match is good. So there probably needs to be an LTD rule for when there is a burst and no spiking.

A big question is the relationship between the burst and the spike. Consider bottom-up inputs causing a neuron to fire at 10 Hz; then top-down inputs come in and increase activity. How is this activity increased? Does it just increase the gain of the output - i.e., all bottom-up rates are multiplied by 2? Or does it add a constant amount - say another 10 Hz, so a 10 Hz burster looks like a 20 Hz spiker? Is the top-down signal to burst binary, or is it graded?

I would say the first thing to do is start with a rate-coded ARBM. Each neuron has two "compartments", where each compartment just sums its inputs and passes that through some kind of sigmoid-like function. The bottom-up compartment sets the firing rate based on the inputs. All the firing rates are normalized by the recurrent multiplicative inhibition (or just constantly normalized by the program - perhaps the vector length can be lower, but it ultimately has some maximum). The top-down compartment, let's say, increases the gain of the output within some range - like 1x to 3x. The top-down compartment being over some threshold level would indicate learning signals: if both pre and post are active, then LTP; if only one side is bursting, then LTD.
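Here's a rough sketch of one step of that recipe (all the constants - the gain range, thresholds, learning rate - are the guesses above or made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def normalize(v, max_len=1.0):
    # Divisive normalization: cap the population vector length.
    n = np.linalg.norm(v)
    return v if n <= max_len else v * (max_len / n)

n1, n2 = 16, 8
W = rng.normal(0, 0.3, (n2, n1))       # layer 1 -> layer 2 (bottom-up)
U = rng.normal(0, 0.3, (n1, n2))       # layer 2 -> layer 1 (top-down)

x = normalize(rng.random(n1))          # stimulus into layer 1's basal tree
r2 = normalize(sigmoid(W @ x))         # layer 2 rates from bottom-up input
gain = 1.0 + 2.0 * sigmoid(U @ r2)     # top-down compartment: 1x to 3x gain
r1 = normalize(gain * x)               # layer 1 output after feedback

burst1 = gain > 2.0                    # "bursting": top-down over threshold
active2 = r2 > r2.mean()               # crude activity criterion in layer 2

# LTP where both sides are active, LTD where only one side is.
lr = 0.05
both = np.outer(active2, burst1).astype(float)
one_side = (np.outer(active2, ~burst1) | np.outer(~active2, burst1)).astype(float)
W += lr * (both - one_side) * np.outer(r2, r1)
```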

Tuesday, November 6, 2012

ART Orienting

The other idea in ART is that there exists an "orienting system". This system is necessary to provide the signals to learn new things. The idea is that with a novel stimulus, the top-down will have a poor match with the bottom-up, so the orienting system basically "resets" the top-down matching signal. It simultaneously increases the excitability of some neurons such that newly activated neurons can be learned as part of the grouping.

So let's just start back with the resonances. Part of the idea about the resonance state is that learning is occurring, but also that the resonance state is good - it means what you are doing worked. So I guess that makes sense: learn better the classification states that are good. And this can probably be done quite rapidly in the brain. A good amount of synchronous bursting would, from what we see experimentally, probably become a pretty solid encoding in a short time. Remember, resonances are part of consciousness, so we have to think about what we are conscious of and what it might mean for how the brain works.

So my "declarative memory" is my life experience. I can look around me and see and remember what is happening to me basically all the time. There is a tremendous amount of information being encoded in my memories all the time. Some of these memories fade, some are more attended to and code better, some are remembered soon after encoding - reinforcing the memory further. These have real biophysical correspondences in how are brain works. Plasticity rules in ART would suggest that these encodings are done throughout cortex and through plasticity driven by resonances (aka synchronous bursting). Attention and motivational system can perhaps increase and decrease the amount of plasticity - both through direct modulatory mechansims of plasticity (Dopamine-like) and through increased excitability leading to stronger resonances and more burstiness.

But how do you do the learning in the first place? How do you get to the really nice resonance states? We have a good rule for what to do while in the resonance states, but what about when we aren't in one? What happens if there is a poor match between top-down and bottom-up?

So it's as if the orienting system is at first turning up the volume of the bottom-up inputs when the top-down activity is not very strong. Or maybe this could happen from inhibitory feedback being reduced by the low activity. Soon the neurons begin to fire, and as more and more neurons fire, eventually it settles on a resonant state? I guess it's like you are constantly increasing the dimensionality of the model, and eventually you activate enough neurons that some of the bottom-up neurons begin to burst. Bursting = learning. So you could start to learn a resonance in that manner. The search process increases the dimensionality of the representation.

The next time you experience a similar input, a similar pathway will be activated, which now activates a large population pretty strongly - perhaps strongly enough that divisive normalization quiets some neurons down. But the input will be different in some ways: some of the top-down is not activated. This suggests that new plasticity would form at the synapses between the neurons that are bursting together.

First let's clarify: in a way we're trying to build something hierarchical. So we consider the input layer to be a layer of pyramidal cells that receive "bottom-up" input (from LGN, and I guess neighboring pyramidal cells). When a stimulus arrives, the bottom-up input will activate a population - this population is the "representation" space and should have all the information about the stimulus. These bottom-up signals only cause spiking. The spikes are relayed through a layer of tunable synaptic weights (think feed-forward neural network) to a "prototype" layer. This is again a layer of pyramidal cells, and it gets its inputs in the basal tree from the first layer. (Let's call the representation layer L4 and the prototype layer L2/3; it could also be CA3/CA1.) So again, bottom-up is from L4 and neighbors. (What if neighbors connected to the apical tuft of self?) Ok, so L2/3 has more lateral connections and likes to excite itself. This will lead to a more pattern-completing type of layer - so probably more STDP between L2/3 cells (controlled by gamma oscillatory inhibition). Now, those signals feed back onto L4 and excite the apical dendrites of L4 neurons. This causes the L4 cells with both bottom-up and top-down inputs to begin bursting. The bursting is a signal that means: do some learning, and maybe listen to me a little more closely (how much more?).

But let's pay attention to learning. Ok, so now we have L2/3 neurons spiking from input by L4, and now many of those L4 cells start bursting. This means spiking in L2/3 with bursting in L4 drives learning at those synaptic weights (make the current pattern in L2/3 more likely given the pattern in L4). So let's say those weights get some LTP, and LTD when there is a bursting presynapse but no spiking in the post. Does L2/3 at this point begin bursting as well? I imagined that L2/3 would be modulated by some top-down signal - perhaps the next L2/3 in the hierarchy (making two cortical hierarchies: pattern-completing and pattern-separating), or possibly L4 of the next level (making something that is interleaving(?)).

Well, it would seem that there should be learning in the weights that feed back from L2/3 to L4. (I know Grossberg has an L6 in between, but just go along.) This would reinforce the prototype as having the characteristics of the neurons that are firing. So perhaps, in the apical tree, if there was a calcium spike and the neuron began bursting, then the active synapses should get stronger (making it more likely to burst the next time the same prototype is drawn up). LTD if there was a calcium spike and no burst.

So L2/3 is two fighting feedback loops. The bottom-up inputs are all trying to excite the neurons via these learning rules, and the neurons are exciting each other. These synapses are being reinforced by positive feedback from inputs, except there is some normalization procedure - like homeostasis - in the synaptic weights. Still, the neurons can be driven a lot. Now the neurons turn on each other, as well as the inhibitory feedback gain control. This forces the neurons to compete with each other. The gain control caps the allowable length of the population vector, which prevents L2/3 from just maxing out.

Monday, November 5, 2012

ART and feedback

One insight from ART that I was struggling with conceptually is the idea of positive vs. negative feedback in forming a percept. I always imagined that cortex would form a model of the pixels within thalamus. The feedback from cortex to thalamus predicts the state of thalamus. It seemed reasonable to me that this feedback was negative - once some pixels were explained by a model in cortex, those pixels would be subtracted off, and cortex would try to figure out the pixels that are left over.

ART, however, is based on positive feedback. Adaptive resonance means that the top-down and bottom-up signals are in agreement, and then those pixels are amplified. This fits in anatomically, as I was having trouble figuring out how cortex could be doing negative feedback to thalamus in a specific way. The key to working with positive feedback as a neural computational mechanism, however, is the necessity of multiplicative gain control.

Without any inhibition, a positive feedback loop would obviously explode or die off to zero in a short amount of time. But one of the keys to ART is that there is divisive inhibition/contrast normalization/gain control in the representation space. This leads to feedback systems that are controlled by additive levels of excitation and multiplicative types of inhibition. The positive feedback may not even come across as excitatory: it will enhance the signals that are matched by top-down and bottom-up, but the divisive inhibition will keep the population normalized. Thus enhancement of some neurons will lead to more inhibition of other neurons. And because the inhibition is multiplicative, the information stored in the population code is not lost.

So in a way the feedback from cortex can seem negative. But really the negative part of the feedback is coming from feedforward inhibition and local feedback inhibition. Cortex excites further the neurons which it predicts correctly, and the population gets normalized by the inhibitory neurons.

So, how do we actually make something that is based on ART that can actually do what we want it to do? The problem with ART so far is that it isn't completely written out. It is a conceptual framework for fitting in neural circuits and solving some learning problems, but I feel it hasn't been fully fleshed out.

First, let's make some assumptions. ART is based on top-down excitation being modulatory. All the evidence, much of which is discussed in this blog, suggests that top-down modulatory excitation is done via the apical tuft. This implies that pyramidal cells have two modes of sending signals: spiking and bursting. We see from the literature that the apical tree can generate a calcium spike when signals from layer 1 excite the tree enough. Top-down signals tend to come through layer 1. The calcium spike will cause a burst of spikes if the basal tree (bottom-up) is also excited. However, what ART requires is that the calcium spike does not actually cause any action potentials if the bottom-up inputs are not present, which is fine biophysically.

The top-down and bottom-up types of excitation can set up learning rules for the system. The simplest way to think about it is that there are 4 binary combinations of possibilities: 1. no top-down or bottom-up input, 2. bottom-up but no top-down, 3. top-down but no bottom-up, 4. both top-down and bottom-up. There are different outputs from a pyramidal cell under these conditions: 1. nothing, 2. spiking, 3. nothing (but the neuron is extra excitable), 4. bursting.
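As a trivial summary of that truth table (hypothetical function, labels straight from the list above):

```python
def pyramidal_output(bottom_up: bool, top_down: bool) -> str:
    if bottom_up and top_down:
        return "bursting"            # case 4: resonance
    if bottom_up:
        return "spiking"             # case 2
    if top_down:
        return "nothing (primed)"    # case 3: extra excitable, no spikes
    return "nothing"                 # case 1
```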

Now, ART talks about learning based on case 4, with no plasticity in the other 3 cases (4 is a resonance). However, there could be some plasticity in cases 2 and 3 that follows different rules - case 3, for instance, could be an indication for LTD, as the top-down predictions were incorrect at classifying a bottom-up input.

One question that needs to be answered is: what is the relative impact of bursting over spiking on the normalized network? If we imagine that the normalization process, at its simplest approximation, makes the population vector length 1, then what does a bursting neuron do in comparison to a spiking neuron? Is a burst twice the length of a spike? Three times? One? Does it even matter - can the network, through learning, be operational with different relative impacts of bursting and spiking?

Friday, November 2, 2012

ART thoughts

Ok, so now that I've gone through half this paper in detail, it seems that the rest of it is more or less a conceptual model of different meso-scale brain circuits. It feels like he explains a lot of data with his model, but sometimes it's like he's just fitting his model to all the data, because it has so many parameters.

Not to say that I don't think he has very good ideas, but they don't seem to be developed on a functional level. SMART seems to just be like here's what we know about the anatomy of early visual areas, and magic magic ART. 

I think that we need to explore the mathematical details of his ideas. He talks about using resonances to make a learning system, but I'm just not really sure if he ever actually does it. I guess I need to look at SMART in more detail. 

Part of what I think is missing is actually showing how ART can solve a learning problem. He makes lots of claims about data, but doesn't really explain what they mean. It would be nice to see an ART model that can actually be used for learning something, and how it works.


Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world IV

Grossberg, S. (2012) Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world. Neural Networks.

SMART: Synchronous Matching ART. Now getting into spiking models, STDP, acetylcholine modulation, hierarchical laminar thalamic and cortical circuit designs and their interactions, oscillations.

SMART divides thalamus up into specific first-order (LGN), specific second-order (pulvinar), nonspecific, and thalamic reticular nucleus. Here's the figure:
"Figure 6: The SMART model clarifies how laminar neocortical circuits in multiple cortical areas interact with specific and nonspecific thalamic nuclei to regulate learning on multiple organizational levels, ranging from spikes to cognitive dynamics. The thalamus is subdivided into specific first-order and second-order nuclei, nonspecific nucleus, and thalamic reticular nucleus (TRN). The first-order thalamic matrix cells (shown as an open ring) provide nonspecific excitatory priming to layer 1 in response to bottom-up input, priming layer 5 cells and allowing them to respond to layer 2/3 input. This allows layer 5 to close the intracortical loop and activate the pulvinar (PULV). V1 layer 4 receives inputs from two parallel bottom-up thalamocortical pathways: a direct LGN→4 excitatory input, and a 6I→4 modulatory on-center, off-surround network that contrast-normalizes the pattern of layer 4 activation via the recurrent 4→2/3→5→6I→4 loop. V1 activates the bottom-up V1→V2 corticocortical pathways from V1 layer 2/3 to V2 layers 6I and 4, as well as the bottom-up cortico thalamocortical pathway from V1 layer 5 to the PULV, which projects to V2 layers 6I and 4. In V2, as in V1, the layer 6I→4 pathway provides divisive contrast normalization to V2 layer 4 cells. Corticocortical feedback from V2 layer 6II reaches V1 layer 1, where it activates apical dendrites of layer 5 cells. Layer 5 cells, in turn, activate the modulatory 6I→4 pathway in V1, which projects a V1 top-down expectation to the LGN. TRN cells of the two thalamic sectors are linked via gap junctions, which synchronize activation across the two thalamocortical sectors when processing bottom-up stimuli. The nonspecific thalamic nucleus receives convergent bottom-up excitatory input from specific thalamic nuclei and inhibition from the TRN, and projects to layer 1 of the laminar cortical circuit, where it regulates mismatch-activated reset and hypothesis testing in the cortical circuit. Corticocortical feedback connections from layer 6II of the higher cortical area terminate in layer 1 of the lower cortical area, whereas corticothalamic feedback from layer 6II terminates in its specific thalamus and on the TRN. This corticothalamic feedback is matched against bottom-up input in the specific thalamus. [Reprinted with permission from Grossberg and Versace (2008).]"

Wow. Yeah. Some highlights: it's like thalamus is doing a loop with layer 5 - layer 5 to pulvinar is like retina to LGN. Nonspecific thalamus is the orienting mechanism; it somehow causes a reset by going through the L5-L6 pathway, and it involves the habituative transmitters in L6. A reset would be indicated by beta oscillations; a resonant match results in gamma oscillations. Vigilance is controlled by acetylcholine - vigilance promotes finer categorical separation.

Ok, here's how he explains how learning novel environments works, with respect to beta oscillations. The first exploration of a track does not cause much beta - this is because the top-down expectation in a novel environment is usually broadly tuned, so that resonance eventually begins. (I guess in a novel environment the search procedure increases top-down excitability, eventually leading to a large number of top-down neurons being activated and associated with bottom-up states.) The top-down inputs are broadly tuned to match feature patterns. So the real beta-level learning happens while pruning the top-down expectations: using mismatch-based reset events, the categories are fine-tuned. This results in beta, so you see beta over a few more trials. (So I guess the second time a category is activated, it's not activated as fully, so it may not lead to a resonance at first. But then the orienting system increases excitability, leading to a larger share of the pattern, but still not the full one. Then maybe you need 90% of the pattern to get a resonance, but then that 90% will get reinforced. Then next time you need only 90% of that pattern.)

"Such an inverted-U in beta power through time is thus a signature of ART category learning in any
environment."

He explains about how place cells are learned through ART from grid cells. Place cells are just categories of space. Self-organizing maps can do it, so can ART, but top-down attention is needed.

next is section 41, page 39/98...

Thursday, November 1, 2012

Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world III

Grossberg, S. (2012) Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world. Neural Networks.

Section 18.

Now we are getting into how ART is embedded in the microcircuitry of cortex and thalamus. This also begins to combine ART with FACADE (Form-And-Color-And-DEpth), which introduces horizontal interactions.

Boundaries are completed inwardly between oriented and collinear cell populations. Surfaces are filled in outwardly, in an unoriented manner, until reaching boundaries. Boundaries form through the interblobs of V1 to the pale stripes of V2 and beyond to V4. Surfaces go through the blobs of V1 to the thin stripes of V2 and beyond to V4.

All boundaries are invisible (in a consciousness sense); surfaces are visible. This is because surfaces are part of surface-shroud resonances, and consciousness needs a resonant state. (I guess boundaries aren't resonances? Or they are resonances, but not all resonances are conscious.)

Cortico-cortical feedback tends to preferentially originate in layer 6 of the higher area and terminate in layer 1 of the lower area. This top-down feedback is "modulatory". It goes through the apical dendrites of layer 5 to layer 6, and then is "folded" back up into layer 4.

5a: LGN has two excitatory pathways to L4. L6 activates L4 through a modulatory on-center, off-surround network, while L4 is driven by direct LGN inputs. This circuit "contrast-normalizes" the inputs that layer 4 receives from LGN.

5b/d: cortical feedback signals are carried through L6 back to L4. A similar attentional feedback occurs between L6 of V1 and LGN.

5c: Layer 2/3 possesses long-range horizontal connections that are used for perceptual grouping of contours, textures, and shading (pattern completion). They ensure "boundaries are invisible" (not sure what this means). L2/3 also sends back to the "folded-feedback path", with direct routes to L6 and indirect routes through L5.

5e: hierarchical propagation of priming: V2 repeats laminar pattern of V1, but at larger spatial scale. Since top-down signals are modulatory then the top of the hierarchy (e.g. prefrontal cortex) can potentially modulate all the way down.

Pre-attentive grouping. Both intercortical attentional feedback and intracortical grouping feedback share the same competitive selection circuit from L6 to L4. L2/3 acts as the attentional prime needed for learning, without feedback from higher cortical areas. Both L2/3 and higher areas act on the same L6-to-L4 selection circuit. "The pre-attentive grouping is its own attentional prime".

Balance of excitation and inhibition. L2/3 cells fire only if they get direct bottom-up input, or if they get collinear inputs from pairs or more of bipole populations. (You only see the edge if you have pac-men at both ends.)

Cortex exists at a "cusp of excitability" in the resting state.

When an unambiguous scene is processed, there is a fast sweep up the cortical hierarchy, directly through L4 to L2/3 and then on to L4 and L2/3 of higher areas. With multiple possible groupings, feedback competition arises through inhibitory interactions in L4 and L2/3. The competitive circuits are self-normalizing: "they tend to conserve the total activity in the circuit".

Self-normalizing circuits carry out a type of "real-time probability theory". The amplitude of cell activity covaries with the certainty of the network's selection/decision about a grouping. Amplitude, in turn, is translated into processing speed and coherence of cell activations.

next is section 31.