Monday, December 17, 2012

Distinct functions for direct and transthalamic corticocortical connections

Sherman, SM. Guillery, RW. (2011) Distinct functions for direct and transthalamic corticocortical connections. Journal of Neurophysiology 106: 1068-1077.

The view that glutamatergic signaling is functionally uniform needs to change; there are different classes of glutamatergic signaling.

Drivers and modulators: class 1 and class 2 glutamatergic pathways. First seen in thalamus - the distinction between retinal feedforward and cortical feedback. Drivers produce a larger initial excitation and show paired-pulse depression; modulators produce a smaller initial excitation and show paired-pulse facilitation (see the short-term plasticity sketch after Table 1 below).


Table 1. Properties of class 1 and class 2 pathways

  • Class 1/Driver (e.g., Retinal)
    • Large EPSPs
    • Synapses show paired-pulse depression
    • Less convergence onto target
    • Dense terminal arbors (type 2)
    • Thick axons
    • Large terminals
    • Contacts target cell proximally
    • Activates only iGluRs


  • Class 2/Modulator (e.g., Layer 6)
    • Small EPSPs
    • Synapses show paired-pulse facilitation
    • More convergence onto target
    • Sparse terminal arbors (type 1)
    • Thin axons
    • Small terminals
    • Contacts target cell peripherally
    • Activates iGluRs and mGluRs
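
A minimal sketch of the paired-pulse difference using a Tsodyks-Markram-style short-term plasticity model. The parameters and the helper epsp_train() are my own toy choices, not values from the paper: a high initial release probability with slow resource recovery gives driver-like depression, while a low initial release probability with strong facilitation gives modulator-like facilitation.

```python
import numpy as np

def epsp_train(U, tau_rec, tau_facil, isi=0.05, n_pulses=5):
    """Tsodyks-Markram-style short-term plasticity: relative EPSP amplitudes
    for a regular pulse train (inter-spike interval isi, in seconds)."""
    x, u = 1.0, U          # x = available resources, u = release probability
    amps = []
    for _ in range(n_pulses):
        amp = u * x        # EPSP amplitude is proportional to u * x
        amps.append(amp)
        x -= amp           # deplete the resources used by this pulse
        # recover / decay during the interval to the next pulse
        x = 1.0 - (1.0 - x) * np.exp(-isi / tau_rec)
        u = U + (u - U) * np.exp(-isi / tau_facil)
        u += U * (1.0 - u) # facilitation applied at the next pulse
    return np.array(amps) / amps[0]   # normalize to the first EPSP

# Class 1 / driver-like: high initial release, strong depression
print(epsp_train(U=0.7, tau_rec=0.5, tau_facil=0.01))   # amplitudes decrease
# Class 2 / modulator-like: low initial release, facilitation dominates
print(epsp_train(U=0.1, tau_rec=0.05, tau_facil=0.5))   # amplitudes increase
```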




Fig. 1. Distinguishing driver (class 1) from modulator (class 2) inputs. A: light microscopic tracings of a driver (class 1) afferent (a retinogeniculate axon from the cat) and a modulator (class 2) afferent (a corticogeniculate axon from layer 6 of the cat). [Redrawn from Sherman and Guillery 2006.] B: modulators (red) shown contacting more peripheral dendrites than do drivers (green). Also, drivers activate only ionotropic glutamate receptors, whereas modulators also activate metabotropic glutamate receptors. C: effects of repetitive stimulation on excitatory postsynaptic potential (EPSP) amplitude: for modulators it produces paired-pulse facilitation (increasing EPSP amplitudes during the stimulus train), whereas for drivers it produces paired-pulse depression (decreasing EPSP amplitudes during the stimulus train). Also, increasing stimulus intensity for modulators (shown as different line styles) produces increasing EPSP amplitudes overall, whereas for drivers it does not; this indicates more convergence of modulator inputs compared with driver inputs.

Spikes and bursts in thalamus are caused by driving inputs from retina (single spikes in tonic mode, bursts in burst mode). Cortex should interpret these spikes as arising from retina.

Class 1 (drivers) and class 2 (modulators) are also seen throughout cortex. The major parameters of these classes separate and stay clustered within class. They stay clustered even across cortical and thalamic areas.

Class 2 can act on metabotropic GluRs. These act on longer time scales and can also be inhibitory.

Higher-order thalamus gets its driving input from layer 5 of cortex. Class 2 input primarily comes from layer 6.

Fig. 3. Direct and transthalamic corticocortical pathways. Information relayed to cortex through thalamus is brought to thalamus via class 1 axons, most or all of which branch, with the extrathalamic branch innervating brain stem or spinal cord motor centers. This applies to inputs to both first-order (FO) and higher order (HO) thalamic relays. Thus the branches innervating thalamus (green) can be regarded as efference copies. The schematic diagram also shows the layer 6 class 2 feedback from each cortical area to thalamus, and this is contrasted with the layer 5 feed-forward corticothalamic pathways. Note that this shows cortical areas connected by 2 parallel paths: a direct one and a transthalamic one.

The class 1 inputs (both from retina and layer 5) have a common feature: they branch and project to brainstem as well as thalamus. Since thalamus is relay-like, these are efference copies. Every cortical area so far studied has a layer 5 projection to subcortical motor centers, many of which branch to thalamus.

There is not much overlap between the parallel pathways: the direct corticocortical pathway does not branch subcortically, and the layer 5 cells giving rise to the transthalamic pathway do not also project directly to other cortical areas.

Thalamus could be responding to unexpected motor instruction, blocking conflicting motor commands, or dynamically coupling different cortical areas. This is because of modulation by reticular nucleus and other modulatory signals.



Friday, December 14, 2012

The Brain Activity Map Project and the Challenge of Functional Connectomics

Alivisatos, AP. Chun, M. Church, GM. Greenspan, RJ. Roukes, ML. Yuste, R. (2012) The Brain Activity Map Project and the Challenge of Functional Connectomics. Neuron 74:970-974.

Brain Activity Map Project (BAM) is aimed at reconstructing the full record of neural activity across complete neural circuits. The general idea is to crack the emergent properties of the neural circuits. I like their opening quote:


"The behavior of large and complex aggregates of elementary particles, it turns out, is not to be understood in terms of a simple extrapolation of the properties of a few particles. Instead, at each level of complexity entirely new properties appear." - P.W. Anderson, More Is Different

Record every action potential from every neuron within a circuit. Need better voltage dyes, better multi-electrode recordings (3-dimensional probes), use wireless electrodes. 

So just a call for this progress. I was thinking that this applies nicely to my voltage-dye studies of the leech. We are probably closer than anyone to actually imaging the activity of an entire neural circuit during behaviorally relevant states. This type of mapping project fits nicely into how I'm thinking of telling the VSD story for my thesis. 

Bayesian inference with probabilistic population codes

Ma, WJ. Beck, JM. Latham, PE. Pouget, A. (2006) Bayesian inference with probabilistic population codes. Nature Neuroscience 9(11): 1432-1438.

To get Bayes-optimal performance, neurons must be doing a computation that is a close approximation to Bayes' rule. Neuronal variability implies that populations of neurons automatically represent probability distributions over the stimulus - a code called "probabilistic population codes".

Any paper that mentions death by piranha in the first paragraph has got to be good.

Poisson-like variability seen in neuronal responses allows neurons to represent probability distributions in a format that reduces optimal Bayesian inference to simple linear combinations of neural activities.

Equations 2 and 3 describe how to combine two Gaussian distributions (i.e., sensory integration) optimally according to Bayes. This is their definition of optimal:
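
These are the standard Gaussian cue-combination formulas (my reconstruction of what equations 2 and 3 express; the paper's exact notation may differ):

mu_3 = (sigma_2^2 / (sigma_1^2 + sigma_2^2)) * mu_1 + (sigma_1^2 / (sigma_1^2 + sigma_2^2)) * mu_2
1/sigma_3^2 = 1/sigma_1^2 + 1/sigma_2^2

In words: the means are combined weighted by their reliabilities, and the inverse variances add.
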
So the gain of the population code reflects the (inverse) variance of the distribution. Simply adding two neural population codes can lead to optimal Bayesian inference.

Figure 2. Inference with probabilistic population codes for Gaussian probability distributions and Poisson variability. The left plots correspond to population codes for two cues, c1 and c2, related to the same variable s. Each of these encodes a probability distribution with a variance inversely proportional to the gains, g1 and g2, of the population codes (K is a constant depending on the width of the tuning curve and the number of neurons). Adding these two population codes leads to the output population activity shown on the right. This output also encodes a probability distribution with a variance inversely proportional to the gain. Because the gain of this code is g1 + g2, and g1 and g2 are inversely proportional to σ1^2 and σ2^2, respectively, the inverse variance of the output population code is the sum of the inverse variances associated with c1 and c2. This is precisely the variance expected from an optimal Bayesian inference (equation (3)). In other words, taking the sum of two population codes is equivalent to taking the product of their encoded distributions.
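
A small numerical check of this claim (toy Gaussian tuning curves, independent Poisson spiking, and a flat prior; all parameters and the posterior() helper are mine, not the paper's): decoding the summed population gives the same posterior as multiplying the two individually decoded posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)
s_grid = np.linspace(-10, 10, 401)        # stimulus values to evaluate
pref = np.linspace(-10, 10, 50)           # preferred stimuli of 50 neurons
sigma_tc = 2.0                            # tuning-curve width

def tuning(s, gain):
    """Gaussian tuning curves evaluated at stimulus s."""
    return gain * np.exp(-(s - pref) ** 2 / (2 * sigma_tc ** 2))

def posterior(r):
    """Posterior over s implied by independent Poisson counts r (flat prior).
    log p(s|r) ~ sum_i r_i log f_i(s); the sum of tuning curves is roughly
    constant in s for a dense code, so it is dropped, and the gain only adds
    a constant."""
    logp = np.array([np.sum(r * np.log(tuning(s, 1.0) + 1e-12)) for s in s_grid])
    p = np.exp(logp - logp.max())
    return p / p.sum()

s_true = 1.5
r1 = rng.poisson(tuning(s_true, gain=5.0))    # cue 1: low gain -> broad posterior
r2 = rng.poisson(tuning(s_true, gain=20.0))   # cue 2: high gain -> narrow posterior

p_sum = posterior(r1 + r2)                    # decode the summed population
p_prod = posterior(r1) * posterior(r2)        # product of the two posteriors
p_prod /= p_prod.sum()

print(np.max(np.abs(p_sum - p_prod)))   # ~0: summing codes = multiplying distributions
```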

They derive generalizations of this - i.e., tuning curves and distributions that are not Gaussian. Essentially, optimality can be obtained even if the neurons are not independent, or if their receptive fields are not of the same form (i.e., some Gaussian, some sigmoidal). The covariance matrix of the neural responses must be proportional to the gain.

Can also incorporate a prior distribution that is not flat.

They do a simulation of integrate-and-fire neurons that is similar to figure 2, and show that it works.

The population code not only reflects the value, but also the uncertainty - based on the gain of the population.

Need divisive normalization to prevent saturation.


Pretty cool stuff. Another example of why divisive normalization is an essential computation for the brain. I also like how they create separate populations that represent their distributions and then are combined in a higher-level population.

Wednesday, December 12, 2012

Normalization for probabilistic inference with neurons

Eliasmith, C. Martens, J. (2011) Normalization for probabilistic inference with neurons. Biological Cybernetics 104:251-262.

A solution that maintains a probability density for inference that does not depend on division.



"the NEF approach:

1. Neural representations are defined by the combination of
nonlinear encoding (exemplified by neuron tuning curves,
and neural spiking) and weighted linear decoding (over
populations of neurons and over time).
2. Transformations of neural representations are functions
of the variables represented by neural populations. Trans-
formations are determined using an alternately weighted
linear decoding.
3. Neural dynamics are characterized by considering neural
representations as control theoretic state variables. Thus,
the dynamics of neurobiological systems can be analyzed
using control theory."

So he shows some math that converts between vector spaces and function spaces and shows how these can be considered equivalent. Basically you are parameterizing the function with a vector representation.

He derives a bias function that is supposed to compensate for the errors in the integral (the integral is supposed to be 1 for probabilities). It captures distortions of the representation from projecting it to the neuron-like encoding. Basically the bias gets factored into the connection strengths, and can account for the non-linearities. (Not so much what I thought this was going to be about).

Also, this looks like an interesting paper for the gain control stuff: Ma et al. 2006

Tuesday, December 11, 2012

Eliasmith 2012 Supplemental II

The working memory hierarchy is based on recurrent attractor neural networks. This can store semantic pointers.

Circular convolution to perform compression - binding two vectors together - bind the current semantic pointer vector with a position. The position is an internally generated position index semantic pointer:
MemoryTrace = Position1 ⊗ Item1 + Position2 ⊗ Item2 ...  
Position1 = Base
Position2 = Position1 ⊗ Base  
Position3 = Position2 ⊗ Base  

Conceptual semantic pointers for numbers are constructed similarly to position - circular convolution of the base and the operator AddOne:
One = Base
Two = One ⊗ AddOne  
Three = Two ⊗ AddOne
The vectors are unitary - don't change length when convolved with themselves.
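
A quick numpy sketch of the binding scheme. The dimensionality, the helper names (cconv, ccorr, unitary_vec), and the use of circular correlation for unbinding are my assumptions for illustration, not necessarily Spaun's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 512   # dimensionality of the semantic pointer vectors (assumed)

def cconv(a, b):
    """Circular convolution (the ⊗ binding operator), via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, b):
    """Circular correlation: unbinds a from b (approximate inverse of ⊗)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

def unitary_vec():
    """Random unitary vector: unit-magnitude Fourier coefficients, so it keeps
    its length when convolved with itself (as noted above)."""
    half = np.exp(1j * rng.uniform(-np.pi, np.pi, D // 2 + 1))
    half[0], half[-1] = 1.0, 1.0          # DC/Nyquist terms must be real
    return np.fft.irfft(half, n=D)

def random_vec():
    v = rng.normal(size=D)
    return v / np.linalg.norm(v)

Base = unitary_vec()
Item1, Item2 = random_vec(), random_vec()
Pos1 = Base
Pos2 = cconv(Pos1, Base)

trace = cconv(Pos1, Item1) + cconv(Pos2, Item2)       # MemoryTrace

probe = ccorr(Pos2, trace)                            # "what was at position 2?"
print("similarity to Item1:", np.dot(probe, Item1))   # ~0 (crosstalk only)
print("similarity to Item2:", np.dot(probe, Item2))   # ~1 (Item2 recovered)
```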

Reward evaluation is done via a dopamine like RL system. 

Neurons are LIF: 20 ms membrane time constant, absolute refractory period of 2 ms, random maximum firing rates between 100-200 Hz. Encoding vectors are randomly chosen from the unit hypersphere. Most projections are 10 ms AMPA; recurrent projections are 50 ms NMDA.

Model occupies 24GB of RAM. 2.5 Hours of processing for 1s of simulated time. 

The main learning aspect of Spaun is changing weights during the RL task. This is not changing the visual/motor hierarchies, but only weights that project to the value system, which are modulated by TD learning. More on learning: the learning requires an error signal, which he is not sure how to implement (top-down-like vs bottom-up-like inputs in the basal and apical trees).

Dynamics of the model are highly constrained by the time-constants of the synapses. 

Definitely more papers to read: visual system learning: Shinomoto (2003), spike learning: MacNeil (2011), normalization for probabilistic inference: Eliasmith (2010).

Also, get his book.

Monday, December 10, 2012

Eliasmith 2012 supplemental I

Eliasmith et al. 2012 -  Supplemental.

"The central kinds of representation employed in the [Semantic Pointer Architecture] (SPA) are "semantic ponters". Semantic pointers are neurally realized representations of a vector space generated through a compression method." They are generated through compression of the data they represent. They carry info that is derived from their source. Point to more info. Lower dimensional representation of data that they point to. The compression can be learned or defined explicitly.

Interesting way of stating this. It sounds similar to the symbol-data binding. The pointer is the low-dimensional symbol that points to the high-dimensional data.

The Neural Engineering Framework is a set of methods that can compute functions on neural vectors (how to connect populations of neurons). Each neuron has a preferred direction vector. The spiking activity is written as:
a_i(x) = G_i[alpha_i (e_i · x) + J_i^bias]
where a_i is the spike train, G_i is the neuron nonlinearity, alpha_i is the gain, e_i is the preferred direction vector, and J_i^bias is a bias current. He uses LIF neurons in Spaun.

Then you can derive a linear decoder from the activity of a population. This can be optimized in a least-squares sense. Can take decoder to calculate weights to come up with transformation function.
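
A toy version of that encode/decode procedure, using rate-based rectified-linear neurons as a stand-in for LIF spiking neurons (all gains, biases, and the example transformation x^2 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100                                  # neurons in the population

# Encoding: a_i(x) = G[alpha_i * (e_i . x) + J_bias_i]; here G is a rectifier
# (Spaun uses LIF neurons; this is just a rate-based stand-in).
e = rng.choice([-1.0, 1.0], size=N)      # preferred directions (1-D stimulus)
alpha = rng.uniform(0.5, 2.0, size=N)    # gains
j_bias = rng.uniform(-1.0, 1.0, size=N)  # bias currents

def activity(x):
    """Population firing rates for stimulus value(s) x."""
    return np.maximum(0.0, alpha * e * np.asarray(x)[..., None] + j_bias)

# Least-squares optimal linear decoders for representing x itself
xs = np.linspace(-1, 1, 200)
A = activity(xs)                                  # 200 x N activity matrix
d = np.linalg.lstsq(A, xs, rcond=None)[0]

# Decoders for a transformation f(x) = x**2; connection weights to a
# downstream population would be built from these ("alternately weighted
# linear decoding").
d_sq = np.linalg.lstsq(A, xs ** 2, rcond=None)[0]

x_test = 0.3
print(activity(x_test) @ d)       # ~0.3   (decoded representation)
print(activity(x_test) @ d_sq)    # ~0.09  (decoded transformation x^2)
```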

The visual hierarchy in Spaun is constructed by training RBM-based autoencoders on natural images. For Spaun the input layer is 784-dimensional (28x28 pixel images), with consecutive hidden layers of 1000, 500, 300 and 50 nodes. The first hidden layer is higher dimensional than the actual input image. It learns many Gabor-like filters in V1. In Spaun the visual hierarchy does not have spiking neurons until IT (the top).

[on page 12, Working memory is next.]

Wednesday, December 5, 2012

A Large-Scale Model of the Functioning Brain

Eliasmith, C. Stewart, TC. Choo, X. Bekolay, T. DeWolf, T. Tang, Y. Rasmussen, D. (2012) A Large-Scale Model of the Functioning Brain. Science 338: 1202.

videos
supplemental

2.5 million neuron simulation of brain called "Spaun". They taught it to do 8 different tasks without changing any configurations of the network. Spaun takes a 28x28 pixel image as input and controls a simulated arm as output.

"Compression is a natural way to understand much of neural processing." higher-dimensional space in V1 (image-based) lower-dimensional space in IT (feature).

Fig. 1. Anatomical and functional architecture of Spaun. (A) The anatomical architecture of Spaun shows the major brain structures included in the model and their connectivity. Lines terminating in circles indicate GABAergic connections. Lines terminating in open squares indicate modulatory dopaminergic connections. Box styles and colors indicate the relationship with the functional architecture in (B). PPC, posterior parietal cortex; M1, primary motor cortex; SMA, supplementary motor area; PM, premotor cortex; VLPFC, ventrolateral prefrontal cortex; OFC, orbitofrontal cortex; AIT, anterior inferior temporal cortex; Str, striatum; vStr, ventral striatum; STN, subthalamic nucleus; GPe, globus pallidus externus; GPi, globus pallidus internus; SNr, substantia nigra pars reticulata; SNc, substantia nigra pars compacta; VTA, ventral tegmental area; V2, secondary visual cortex; V4, extrastriate visual cortex. (B) The functional architecture of Spaun. Thick black lines indicate communication between elements of the cortex; thin lines indicate communication between the action-selection mechanism (basal ganglia) and the cortex. Boxes with rounded edges indicate that the action-selection mechanism can use activity changes to manipulate the flow of information into a subsystem. The open-square end of the line connecting reward evaluation and action selection denotes that this connection modulates connection weights. See table S1 for more detailed definitions of abbreviations, a summary of the function to anatomy mapping, and references supporting Spaun's anatomical and functional assumptions.

The motor output is also hierarchical going from a low-dimensional goal representation to a high-dimensional representation that is in muscle space.

The spiking neurons are implementing a neural representation called "semantic pointers". From Eliasmith's website: "Higher-level cognitive functions in biological systems are made possible by semantic pointers. Semantic pointers are neural representations that carry partial semantic content and are composable into the representational structures necessary to support complex cognition."

Eliasmith is also about to publish a book called: "How to build a brain." due out in 2013.

I'm pretty impressed by this. I'm going to spend some time looking at his papers.

Sunday, December 2, 2012

Temporal vs. Parietal Cortex

The two competing hypotheses about cortical function are based on the differences between positive and negative feedback signals. Both of these types of feedback may be useful for different cortical areas, and they may have deeper relations. ART (Grossberg) describes the mechanisms of the positive feedback system, whereas predictive-coding (Maass) describes the negative feedback system.

We know that the visual system is basically split up into the "What" pathway and the "Where" pathway. The "What" pathway goes down temporal cortex, and as you move from V1 to IT neurons become object recognizers and gain spatial invariance. Going up the "Where" pathway (this is less studied) neurons are binding the objects to information about their spatial properties (position, movement, momentum, etc.).

So it seems that going up the parietal cortex, having a system that can predict the future and is constantly minimizing error signals based on these predictions would be very powerful. The negative feedback seems like an ideal way to derive the laws of mechanics and understand how objects move throughout the world. Grossberg says that you need resonance to have consciousness. This seems to fit, as you are not really conscious of parietal cortex (your conscious mind does not have as much spatial information as your subconscious motor system).

Temporal cortex, however, is part of your conscious awareness. This is because, according to Grossberg, there are resonant states being created by the positive feedback system (you are not necessarily conscious of all resonant states). Resonant states are a binding of the data with a representation, and this binding is the key to conscious awareness. The negative-feedback states are not bound - they propagate prediction errors.

Both types of these feedback systems will have rules that can be based on the two-input pyramidal cell model. Each pyramidal cell can receive different types of inputs - bottom-up (data) inputs through the basal tree, and top-down (symbol/class) inputs through the apical tree. Plasticity rules can be established to create both the positive and negative feedback systems.

The positive feedback system will strengthen synapses when the symbols are a good match to the data. The synapses will grow in strength each time a pattern is introduced, limited by some maximum strength or a normalization procedure.

The negative feedback system will compare a top-down prediction with a bottom-up state, and use the difference to modulate synaptic strengths. This will create negative feedback loops and the system will be driven to local minima.

How to create a mind 7

Chapter 7 is definitely where he gets into more detail about how he would actually create a mind. He basically explains that Siri and other similar speech recognition systems are based on hierarchical hidden Markov models (HHMMs). These are states with transition probabilities, where each state would be like a pattern recognizer, and the transitions are the effective synaptic strengths.

Every learning algorithm has what Kurzweil calls "God parameters". In his HHMM model, there were many parameters that were required at the initialization of the system. In order to optimally choose those parameters he used a genetic algorithm. This would lead to unexpected optimizations.

What was really fascinating was that they then altered the model in subtle ways - like adding leakage across markov states. They would then repeat the GA, and get comparable prediction quality (maybe even better), but the GA optimized parameters were totally different. If they used the GA parameters of the original configuration, performance would go down.

This has some important insights into the biology of intelligence. If our brain has some leak problems, or some unintentional side effects of an implementation based on proteins, then the genetic algorithm will pick out parameters that can offset these consequences, and potentially even use them to its advantage. So when looking at the brain, there's the mathematically beautiful thing it is trying to do (some sort of hierarchical learning) and then there's what it actually did (hierarchical learning with some tweaks). The tweaks in many ways could help the system, but would be reflected in a potentially counterintuitive selection of the parameters.

Another thing he mentioned was the overfitting problem. He said that adding noise to the inputs actually aided learning, as it prevented overfitting to the examples given.

The ultimate conclusion of the chapter is building hierarchical pattern recognizers. He says that there are multiple types of learning algorithms that could do it, but he prefers HHMMs as he is most familiar with them and they are well characterized. But there are other options. Regardless of the choice, there are always "God" parameters that will need to be optimized via a GA.

He briefly mentions some other ideas that would go into the brain - a system that checks for inconsistencies, a system that looks for new problems, a goal system (i.e. pleasure and pain signals from the old brain). And he describes some limitations of the biological cortex that will not be in a digital cortex - like how many things you can keep in memory, or the number of active lists you can operate on.


So HHMMs seem like an interesting idea. I don't think it's the full picture of neocortex. What the make-up of each pattern recognizer is will be important. HHMMs may be useful to study just to understand how they work, and they may give us some insight into how to handle the temporal side of cortex. And he still doesn't really say anything about the top of the hierarchy. He mentions that we would want the cortex to build as many levels as it would want/need, but how to make an arbitrary hierarchy that can change is a problem in itself. It seems like there must be some point where the hierarchy goes back down (like the top is PFC and this feeds back down to a language area, which allows you to think and build arbitrary hierarchies).


Wednesday, November 28, 2012

Positive or negative feedback?

The main difference between the predictive-coding model (PC) and the adaptive-resonance model (ART) is that PC is based on negative feedback of prediction errors while ART is based on a positive-feedback resonance. These are very different ideas about how feedback is modulating responses, but they may be reconcilable. Each method has support - PC is Bayesian and it seems that feedback is overall negative. ART predicts bursting and all the pyramidal cell synapses are excitatory.

So how might they be reconcilable? The positive feedback system in ART could turn out to be overall negative, since the responses are normalized. I'm still trying to reconcile the effects of positive feedback signals and how they might be used. Grossberg compares the top-down signals to attention, and the attention literature suggests that attentional modulations affect the gain of responses. So does this mean activation of the apical tuft via top-down signals results in a scaling of the ultimate IO function? Do different neurons then receive different amounts of scaling depending on the feedback signals?

One way of looking at it is that the top-down layer (2nd layer) develops a normalized population code just like the first layer. The 2nd layer then sends back a normalized response vector to the 1st layer. If the top-down signal was perfect at matching the bottom-up signal, and these signals were multiplicative, then it would be as if you were squaring the first-layer. The population would go from x to x^2 (after re-normalization). This means 2/3 of the neurons will be inhibited and 1/3 will be excited. This may lead to some weird effects in a steady-state, as the population-code will change. This would then change the 2nd layer and then further alter the first layer.

What if it were additive instead? The second layer sends back to the first layer the same normalized vector that the first layer was producing. This would be like multiplying the first layer by 2, which after re-normalization would lead the first layer back to the same state. This seems better: the population code is maintained, and the well-predicted first layer doesn't change. This could also look like multiplication in the grand scheme.
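
A tiny numerical check of the two cases above, with a made-up 3-neuron population vector:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

x = normalize(np.array([3.0, 2.0, 1.0]))   # first-layer population vector

# Multiplicative feedback of a perfectly matched top-down vector: the
# population is elementwise squared, then renormalized -> the code changes
# (it sharpens: strong units gain relative weight, weak units lose it).
mult = normalize(x * x)

# Additive feedback of the same vector: x + x = 2x, which renormalizes
# right back to x -> the well-predicted code is left unchanged.
add = normalize(x + x)

print(x)      # [0.80 0.53 0.27]
print(mult)   # [0.91 0.40 0.10]  sharper: relative pattern altered
print(add)    # [0.80 0.53 0.27]  identical to x
```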

Imagine that the goal of learning is for the top-down layer to send back to the first layer the same population vector. This means that differences in the population vectors would lead to some form of plasticity. If a neuron received more top-down input than bottom-up input, then the top-down synapses should get weaker. If it received more bottom-up input than top-down then the synapses should get stronger.

Layer 4 is then the data layer, and layer 2/3 is a classification layer. These interact as described (2/3 is trying to predict 4). L2/3 is chunky - imagine L4 as data points in a high-dimensional space, and L2/3 as the boundaries that classify these points. Sometimes different classification boundaries overlap; if there is no extra evidence then L2/3 goes through hypothesis testing, cycling through different possible classifications. Higher inputs or lateral inputs can favor one hypothesis over another.

Perhaps another layer is somehow a parameterization layer (maybe L6 or L5?). This layer describes the transformation of the classification back to the data - it is like the principal component scores of the clusters. Let's imagine language, since it is a good hierarchical system. So this part of cortex is classifying the word "grape". The data layer gets the input sounds, and one part of it is representing the "a" sound. Imagine that the "a" sound has a lot of variability - it can be said quickly or stretched out. L2/3 classifies it as "a", and L6 describes the amount of quickness or stretchiness of the "a" sound. This helps remap the classification back to the data, and describes a parameterization of the classification.

L4 receives data and sets up a population vector that is equivalent to a point in high-dimensional space. L2/3 creates a cluster (more lateral connections, perhaps is even binary activity). If the L4 point is within 2 clusters, then the clusters will turn on and off based on the probabilities of each (over time). L6 then describes the data based on the clustering - as in like the principal components of each cluster. (I'm not sure if this is necessarily L6, but it seems like the PCA parameterization of the clusters would be useful somewhere).

There may not need to even be a different layer. The cluster is like the 0th principal component (describing the mean of the data). So you could imagine how some neurons in L2/3 are calculating the 0th component, some are calculating the first etc. L2/3 could just be the low dimensional parameterization.

Monday, November 26, 2012

Canonical Microcircuits for Predictive Coding

Bastos, AM. Usrey, WM. Adams, RA. Mangun, GR. Fries, P. Friston, KJ. (2012) Canonical Microcircuits for Predictive Coding. Neuron 76: 695-711.

This looks like a good review that covers some papers I've been meaning to get to.

Predictive coding is the most plausible candidate for making generative models.

Superficial layers of cortex show neuronal synchronization in the gamma range, deep layers prefer alpha or beta - Maier 2010, Buffalo 2011. Feedforward connections originate from superficial layers, feedback from deep layers.

Statistical connectivity analyses show that most connections are "feedforward" (L4 → L2/3 → L5); there are fewer feedback connections. Feedback connections were typically seen when pyramidal cells in one layer targeted inhibitory cells in another.

Feedforward connections are thought to be driving and can cause spiking; feedback connections are thought to modulate receptive field characteristics according to the context. Feedforward connections have strong, depressing EPSPs; feedback connections have weak, facilitating EPSPs. Sherman 2011 - retinal input to LGN is driving, cortical input is modulatory. But other studies suggest that feedback and feedforward can both have driving and modulatory effects.

Feedback connections convey predictions, feedforward connections convey prediction errors. Effective feedback "connectivity is generally assumed to be inhibitory." Prediction errors lead to more gamma activity - from superficial layers failing to suppress deeper layers. Todorovic 2011, Wacongne 2011. Imaging studies also show less activity when stimuli are predictable. (It seems that inhibition has its biggest influence in the surround.)

Most long-range feedback connections are glutamatergic, although some may be inhibitory. L1 inhibitory neurons could be mediating this inhibition.

Simple cells in L4, complex cells in L2/3 and deep layers. Simple cells have driving effects on complex cells.

Feedforward signals are sent in the gamma band; feedback is sent in alpha-beta frequencies.

Predictive coding = Bayesian inference. Hierarchical. Biology is minimizing surprise (entropy), which means maximizing Bayesian evidence for the generative model. You can build an entire model based on predictive coding equations, subtractive errors, etc.
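
Not the paper's equations, but a minimal Rao-Ballard-style sketch of the ingredients (top-down predictions, subtractive prediction errors, and expectations updated by the forwarded errors). The weights W, learning rate, and dimensions are placeholders; W is held fixed here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Generative model: sensory data x is predicted from hidden causes r via x ~ W r.
n_x, n_r = 20, 5
W = rng.normal(size=(n_x, n_r)) * 0.3                         # top-down weights (fixed)
x = W @ rng.normal(size=n_r) + 0.05 * rng.normal(size=n_x)    # one sensory input

r = np.zeros(n_r)                          # expectations ("deep layer" units)
eta = 0.1
for _ in range(200):
    pred = W @ r                           # top-down prediction (feedback)
    err = x - pred                         # subtractive prediction error (superficial units)
    r += eta * (W.T @ err - 0.01 * r)      # error passed forward updates the expectations

print(np.linalg.norm(x - W @ r))           # small: the predictions now explain the input
```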

Figure 5: Left: the canonical microcircuit based on Haeusler and Maass (2007), in which we have removed inhibitory cells from the deep layers because they have very little interlaminar connectivity. The numbers denote connection strengths (mean amplitude of PSPs measured at soma in mV) and connection probabilities (in parentheses) according to Thomson et al. (2002). Right: the proposed cortical microcircuit for predictive coding, in which the quantities of the previous figure have been associated with various cell types. Here, prediction error populations are highlighted in pink. Inhibitory connections are shown in red, while excitatory connections are in black. The dotted lines refer to connections that are not present in the microcircuit on the left (but see Figure 2). In this scheme, expectations (about causes and states) are assigned to (excitatory and inhibitory) interneurons in the supragranular layers, which are passed to infragranular layers. The corresponding prediction errors occupy granular layers, while superficial pyramidal cells encode prediction errors that are sent forward to the next hierarchical level. Conditional expectations and prediction errors on hidden causes are associated with excitatory cell types, while the corresponding quantities for hidden states are assigned to inhibitory cells. Dark circles indicate pyramidal cells. Finally, we have placed the precision of the feedforward prediction errors against the superficial pyramidal cells. This quantity controls the postsynaptic sensitivity or gain to (intrinsic and top-down) presynaptic inputs. We have previously discussed this in terms of attentional modulation, which may be intimately linked to the synchronization of presynaptic inputs and ensuing postsynaptic responses (Feldman and Friston, 2010; Fries et al., 2001).

This is based on equation 1:


And they mathematically describe how the different frequencies would dominate in the different layers based on these equations.

Sunday, November 25, 2012

Hippocampal Pyramidal Neurons Comprise Two Distinct Cell Types that Are Countermodulated by Metabotropic Receptors

Graves, AR. Moore, SJ. Bloss, EB, Mensh, BD. Kath, WL. Spruston, N. (2012) Hippocampal Pyramidal Neurons Comprise Two Distinct Cell Types that Are Countermodulated by Metabotropic Receptors. Neuron 76: 776-789.

Two pyramidal cell types? "our results support a model of hippocampal processing in which the two pyramidal cell types are predominantly segregated into two parallel pathways that process distinct modalities of information"

Rat hippocampal slices. Suprathreshold step currents evoked one of two patterns: regular spiking or bursting. They extracted several (~30) features and clustered them; only two clusters came out of the analysis.

The bursting class has a more extensive tuft, whereas regular-spiking cells have more extensive basal dendrites.

They made EPSC-like current injections. All neurons responded with a mixture of single spikes and bursts, but the cell types were still distinguishable from the temporal pattern of bursting. Regular-spiking neurons responded with single spikes early and bursts later; bursting neurons fired bursts early and single spikes later.

Burstiness can be modulated based on activity - theta-burst stimulation can make both cell types more bursty. This burst plasticity can be modulated by glutamate and acetylcholine antagonists. Burst plasticity does not interconvert one cell type to the other. Thus there are two stable cell-type pathways out of CA1.


So there's a separate spatial and non-spatial loop that doesn't go through DG, and the trisynaptic pathway through DG combines the info. The two types are separated - CA1p has more of the late-bursting cells (with larger basal trees), and CA1d has more early-bursting cells.

Temporal receptive chunking

One thing that was really interesting from SfN, which I meant to put on here a while back, was this illusion about speech sounds. Basically, a recording of speech was broken up into small segments, and these segments were then played backwards. So every, say, 20 ms block is flipped around, but each block stays in the same forward order.

When the chunk was below a certain threshold, you couldn't even tell a difference between the normal speech and the reversed version (I think it was 20 ms that sounded normal). Above the threshold it sounded like complete unidentifiable garble. So it sounds like there must be some temporal chunking going on in the receptive fields: a set of frequencies together in a small temporal window will sound identical. The smallest chunks - the 20 ms windows - are likely to be something like primary cortical temporal smearing (since it sounds basically normal - sufficient for higher-order areas to recognize the speech and help make it sound correct).
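
The manipulation itself is easy to reproduce on any audio array. A sketch, assuming a mono signal and a known sample rate; the helper name, the chunk length, and the synthetic stand-in signal are all placeholders:

```python
import numpy as np

def reverse_in_chunks(signal, fs, chunk_ms=20):
    """Flip every chunk_ms-long block of `signal` (sampled at fs Hz) in time,
    while keeping the blocks themselves in forward order."""
    n = int(round(fs * chunk_ms / 1000.0))
    out = np.copy(signal)
    for start in range(0, len(signal), n):
        out[start:start + n] = signal[start:start + n][::-1]
    return out

# Example with a synthetic "signal"; with real speech, ~20 ms chunks reportedly
# sound nearly normal while much longer chunks become unintelligible.
fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
speech_like = np.sin(2 * np.pi * 200 * t) * np.sin(2 * np.pi * 3 * t)
subtle = reverse_in_chunks(speech_like, fs, chunk_ms=20)
garbled = reverse_in_chunks(speech_like, fs, chunk_ms=200)
```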

And then you imagine that the temporal receptive fields start increasing in their chunk sizes. I just have a hard time thinking about how that information could be encoded. It makes me think about theta in hippocampus. It's like every cycle of theta the place cells fire in the same order based on how close to their receptive fields the animal is. This sets up a sequence pattern. So I guess these types of sequences can be learned and reliably replicated. The sequence is around 7 place cells long, and then it could repeat, or it could move up a few place cells - i.e., the first sequence is [1,2,3,4,5,6,7], and then the next sequence is [3,4,5,6,7,8,9].

Well, then this makes it sound like a polychronous pattern can be repeated with a lower-frequency signal (like theta), and that keeps timing over a longer scale. And it's like one polychronous pattern primes the next polychronous pattern in a top-down fashion. Imagine bottom-up inputs are driving the cells, and if a certain bottom-up input stays stationary then a theta rhythm of the same polychronous pattern at the higher level starts repeating. These send learned top-down inputs to the next patterns in line - top-down makes them more excitable, but they don't fire without bottom-up drive. Once the bottom-up shifts, the expected patterns will be more easily excited.
So the order gets set up by how excited they are in the population code. But then the receptive field fall-offs must be drastic - red and orange just disappear, instead of falling behind - e.g., to the cyan spot (looking at the bottom of the figure).

So, some kind of STDP rule could learn this. And if it gets repeated from the bottom-up inputs staying constant, then the theta chunk will learn the chunk cycle. Then as the chunk shifts, a new chunk cycle will be oscillating. I'm not sure what is happening on the up-stroke of the theta wave. Are there just more cells here (i.e., do cells fire at each gamma trough on both the down- and up-stroke of theta), or is there really a gap after the bottom of the theta wave? A gap here may prevent the chunks from learning a chunk loop, but this would also be unlearned by STDP.

And the other interesting thing is that there is a general time-window in which the things before you spike after your spike - slightly less than one theta cycle. If the delays were as long as a theta, then loops would definitely get formed. These neurons would be learning temporal chunks at more of a theta frequency.

Wednesday, November 21, 2012

How to create a mind 4-6

So chapter 7 looks to be the meat of the book. He alluded to it several times in these chapters. This was basically an overview of some topics from biology and neuroscience. He went over all the other brain structures besides neocortex and talked about some other interesting things about neocortex - like the idea that visual areas can process auditory information in blind people.

He dismisses everything that doesn't have a "neocortex". And reptiles are just basically useless. I don't know what he thinks about turtle dorsal cortex-like structures.

He mentions that his pattern recognizers are on the order of hundreds of neurons, and each is basically a cortical column. He says that they are intertwined in such a way that you wouldn't see them (for whatever reason). He talked about how most of the synaptic structure is hard-coded within the columns, and that learning is really about changing connections between columns. Connections are available and then utilized or pruned - not completely regrown.

One thing he talked about that I was just thinking about was the temporal expansion of receptive fields. There was some work by Uri Hasson he cited that pertains to increasing temporal expansion of the higher-order receptive fields. It will be important to consider how this is implemented by neurons. At the highest level it could be like Mongillo, there could be different amounts of like STP/STD that changes the temporal responses of the circuits. It would be interesting to know if neurons with longer receptive fields are firing constantly when their preferred stimulus is shown.

Monday, November 19, 2012

Direct control of firing rate gain by dendritic shunting inhibition.

Capaday, C. Van Vreeswijk, C. (2006) Direct control of firing rate gain by dendritic shunting inhibition. Journal of Integrative Neuroscience 5(2): 199-222.

Ok, crap. Just skimming over this paper he basically gets to the same model that I have in the local-bend system. Ugh, I knew someone must have done this. We shall see what he came up with. He makes his compartments dendrite and soma - which are equivalent to my soma and axon respectively. I've never heard of this journal, and this paper is only cited by one other paper in pubmed, and that paper is something completely different.

Intro has a nice review of all the noise papers: Holt & Koch, Chance et al., Mitchell & Silver, Prescott & De Koninck. There are slight differences in the noise mechanism across these papers.

"soma acts as an IF neuron attached, by a coupling conductance, to a passive dendritic compartment."

He is taking into account the current from the action potential, which is normally neglected. He makes I_spike like a delta function with some integral S. Then he derives a way of incorporating the spike current as a value for the reset. He then jumps to his alpha-motor neuron model, where he adds in some more conductances - an AHP and a K conductance.

He then analytically derives the IFR in the two-compartment model. It's quite complicated. But then he gets to the firing rate being:
R = (I_S + g_C/(g_C + g_D) * I_D) / (C_S * (V_T - V_r))
R is firing rate, C_S soma capacitance, V_T threshold, V_r reset, I_S current injected into soma (axon, for mine), I_D is current injected into dendrite (soma, for mine). g_C is coupling conductance, g_D is the approximate conductance of the dendrite. He then derives, similarly to mine, how Holt & Koch works, and how you can get division if current is injected in dendrite and shunting is in dendrite.
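
Evaluating that expression directly shows the divisive effect. Units and parameter values below are arbitrary, just to illustrate the relationship:

```python
import numpy as np

def rate(I_S, I_D, g_D, g_C=10.0, C_S=1.0, V_T=1.0, V_r=0.0):
    """Firing rate of the two-compartment IF model:
    R = (I_S + g_C/(g_C + g_D) * I_D) / (C_S * (V_T - V_r))."""
    return (I_S + g_C / (g_C + g_D) * I_D) / (C_S * (V_T - V_r))

I_D = np.linspace(0.0, 10.0, 5)       # current injected into the dendrite
for g_D in (1.0, 10.0, 40.0):         # increasing dendritic shunt
    print(g_D, rate(I_S=0.0, I_D=I_D, g_D=g_D))
# The slope of R vs I_D shrinks as g_D grows (a divisive gain change),
# whereas current injected directly into the soma (I_S) is unaffected.
```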


So, A is just like Holt & Koch (with a passive dendrite attached). B is just an IF and you slightly increase the conductance through the dendrite to ground. C is equivalent to A, just the current is going through an extra resistor. D is the gain-control type.

He next is considering synaptic conductances instead of currents. He talks about the upper-limit when soma shunting prevents the neuron from firing at all - due to the Voltage saturation and the excitatory reversal potential. 

"The observation emphasizes that it is the net current reaching the SIZ at the soma which determines firing rate." ala my figure 5B. 

So, yes, basically the same. He doesn't catch the trick of changing the reset potential such that the zero-crossing is the same and thus you get pure scaling. And in general I think the separation of soma and axon is better than dendrite and soma. All the conductance stuff he talks about is the same as what happened with my model; I just ignored it because of the complicated properties of the conductance curves - it was hard to make something work based on those functions. And acting like the dendrite is one compartment that can actually saturate seems unlikely anyway; currents summing from the dendrites is more appropriate.

Sunday, November 18, 2012

How to create a mind 1-3

So, Ray Kurzweil has just published a book called "How to Create a Mind". I've read the first three chapters so far. It's pretty interesting, but it feels like it's missing the details in much the same way Grossberg's work was.

He makes some interesting points about Einstein and Darwin and what their thought-processes were when they were making their big discoveries. Einstein is famous for his thought experiments and Kurzweil walked through his version of Einstein's thinking process. He makes the case that to understand the brain we have to make these types of thought experiments - thought experiments about thought.

It makes sense. There's definitely a lot of insight one could get by "introspecting". I mean it's a really interesting question: What information does my consciousness have access to? It's not everything (we have little access to the "Where" pathway, we see illusory contours, so we don't have access to the raw data.)

Kurzweil's big thing is talking about hierarchical pattern recognizers. His whole "unifying theory" is called PRTM - the pattern recognition theory of mind. He goes into explaining hierarchies - illustrating how certain patterns make up letters, letters make up words, words make up sentences and so on. He says that in a way language is crafted by the brain's hierarchical organization - our hierarchical pattern recognizers led us to create a hierarchical pattern communication system. Oh yeah, he calls spoken language our "first invention", and written language our "second invention"...

So far it's pretty simple in AI circles - some "pattern recognizer", let's say a neuron, gets bottom-up inputs that have size (firing rate), variance (spike timing), and a weight (synapse strength). He even talks about dendrites and axons, but he does it in a strange way - either he's just too much into the AI field and not actual neuro, or he is maybe hinting at the idea that dendrites can each be a pattern recognizer (I'll probably find out later).

So the neuron recognizes its pattern and outputs its own pattern having size and variance information, which goes on to the next level. In the next level the same thing happens, and so a hierarchy is arranged, and ultimately it could also be recursive. He talks about how top-down signals are coming back down constantly, and that it's all feedback etc. Top-down is predicting bottom-up, in that style.

So we'll see where he goes. It seems simple now, but its still just the beginning of the book - maybe 1/5th. What he has said so far seems just obviously true in such a vague sense, the real question is in the details.

I'm still really digging the resonance idea. One thing I've been thinking about recently is a simple system that could do something like an instant PCA on its data set, where as more data get collected the best PCA dimensions are pulled out. Or it could equivalently be done for clustering.

I'm struggling with the question of how the timescales change as we go up the hierarchy. What is happening at the top? I can see how visual cortex is basically operating as an online system, processing and basically memorizing all of the inputs coming in constantly. But then as you go up the visual hierarchy, the neural representations seem to be for things that move more slowly and can be in a larger area. For the object recognition game, it definitely seems like spatial invariance is being done by simple and complex cells that stack up in a hierarchy. The complex cells are important for spatially invariant object recognition. I can see how you could build a hierarchical pattern recognizer with layers of simple and complex cells to build up to a spatially invariant object classifier. But at the same time that you go up in space, it also feels that you should go up in time. There's something to objects being both spatially and temporally invariant, and I think that this temporal invariance needs to be encoded.

Cortex is tasked with the job of simultaneously making a representation of its inputs, but also figuring out how to make the lowest-dimensional representation. Low dimensionality is something to strive for, as it forces you to come up with simple explanations. But when there is no good answer you have to be high-dimensional in the description. So each layer in the hierarchy is a pattern classifier (or maybe each column is).

So a cortical column gets as inputs a bottom-up signal from the level below, bottom-up-like signals from the neighboring columns, and then top-down signals from higher levels. A cortical column is then like a few thousand simple cells that are like layer 4, and complex cells that are like layer 2/3. The simple cells make the representation; they are forced to become more excitable (hence increasing dimensionality) when the representation is poor, and to settle to the lowest dimensionality they can when it starts to work. The complex cells use the simple cells to pattern-complete, making a spatially invariant pattern of the simple-cell representation.

So it just seems that one thing that would help shape the feedback signals is to look for temporal invariance. I think back to a paper I remember about how the brain learns slowness - things that are temporally invariant. It would just be a useful signal to learn from - if your higher level is changing less, then your representation is good...

Wednesday, November 14, 2012

Neuronal arithmetic

Silver, RA. (2010) Neuronal arithmetic. Nature Reviews Neuroscience 11: 474-489.

So, basically this is the guy all about the need for synaptic noise in order to get multiplicative effects from shunting inhibition.

But he has some basic good ideas about population codes and the need for making certain types of computations.




Figure 1 | The rate-coded neuronal input–output relationship and possible arithmetic operations performed by modulatory inputs. a | For rate-coded neuronal signalling, a driving input typically consists of asynchronous excitatory synaptic input from multiple presynaptic neurons firing in a sustained manner (shown in red). A neuron may also receive a modulatory input, such as inhibition (shown in green), that alters the way the neuron transforms its synaptic input into output firing rate (shown in blue). b | The input–output (I–O) relationship between the total (or mean) driving input rate (d) and the response that is represented by the output firing rate (R). The arrow indicates the rheobase (minimum synaptic input that generates an action potential). c | Rate-coded I–O relationships can be altered by changing the strength of the modulatory input (m), which may be mediated by a different inhibitory or excitatory input. If this shifts the I–O relationship along the x-axis to the right or left, changing the rheobase but not the shape of the curve, an additive operation has been performed on the input (shown by orange curves). This input modulation is often referred to as linear integration because the synaptic inputs are being summed. d | An additive operation can also be performed on output firing. In this case a modulatory input shifts the I–O relationship up or down along the y-axis (shown by orange curves). e,f | If the driving and modulatory inputs are multiplied together by the neuron, changing the strength of a modulatory input will change the slope, or gain, of the I–O relationship without changing the rheobase. A multiplicative operation can produce a scaling of the I–O relationship along either the x-axis (input modulation; e) or the y-axis (output modulation; f). Although both of these modulations change the gain of the I–O relationship, only output gain modulation scales the neuronal dynamic range (f).

So he talks about "input gain modulation" where the max value doesn't change (e), and he talks about "output gain modulation" where the max value is scaled (f). And, so basically due to the sigmoidal shape, the slope is changed in both scenarios. And he says that this is gain control in both cases.

So yeah, all the experimental work with shunting inhibition was just based on Ohm's law and currents. He makes it sound amazing. But then he says, yeah, Holt and Koch showed it doesn't work. But then he makes an argument about how it could work if there is "synaptic noise", and he does show some experimental evidence to back it up. I need to go back and look at this.

But right, his mechanism seems strange to me (and I think as of now useless, but let me explain it the best I can). The theoretical idea is explained in the Larry Abbott paper, which I'll read later. So... the idea is that you have balanced excitation and inhibition, and basically what that means is that the noise is being increased. This results in a higher variance of current fluctuations, and then, due to the integrate-and-fire mechanism, the higher variance results in the spiking I-O function getting scaled.

But wouldn't a higher variance make it spike more often and not less? That would seem like backwards gain control - the network has a lot of activity, so increase activity even faster? Hmm... but looking at their figure it sounds like the gain goes down... And they also show some data: with dynamic clamp they add in excitatory and shunting currents and show that it behaves like they predict.

They basically explain the whole "derive Holt and Koch" thing in this paper. It's not as pretty as my derivation, but they explain why it works that way mathematically (they don't actually derive the linear equation, though). But I need to look at the experimental work more closely.

Right, so I think the problem with this idea is that it isn't really controllable. How would one turn the gain up or down? It's like the circuit gets noisy and the gain goes down, but how would synapses keep that in shape - plasticity rules? I'm not sure; I'm confused thinking about it.




Monday, November 12, 2012

Normalization as a canonical neural computation

Carandini, M. Heeger, DJ. (2012) Normalization as a canonical neural computation. Nature Reviews Neuroscience 13: 51-62.

The normalization equation:

R = D / (s + N)

D is the non-normalized drive of the neuron, s prevents divide by zero, and N is the normalization factor.

Normalization is seen in a ton of areas. It can be done in many different ways and can have different mechanisms. Here are the areas he talks about:

  • Invertebrate olfactory system (Drosophila)
  • Retina
  • V1
  • MT, MST

He also includes exponents in the normalization equation, which can change the shapes of the curves. He takes the equation and fits it to a large amount of data pretty nicely. What's interesting is that many of the figures he describes are not population-code normalization as purely as I've been modeling. He often describes the normalization function as a rightward shift in the IO function on a log scale. This means that the input can eventually reach the saturation level if it is strong enough (but it has to be multiplicatively larger).
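
A quick sketch of the normalization equation with exponents, in generic form (the exponent, sigma, and the drive values are placeholders, not fit to any data set):

```python
import numpy as np

def normalized_response(drive, sigma=1.0, n=2.0):
    """R_i = D_i^n / (sigma^n + sum_j D_j^n): each neuron's drive is divided
    by a signal pooled over the whole population."""
    d = np.asarray(drive, dtype=float) ** n
    return d / (sigma ** n + d.sum())

weak = normalized_response([1.0, 0.5, 0.2])
strong = normalized_response([10.0, 5.0, 2.0])   # 10x the drive
print(weak, weak.sum())       # responses far from saturation
print(strong, strong.sum())   # same relative pattern, total bounded below 1
```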


So you can see in B and D that the IO functions aren't just being purely scaled. However, the normalization equation he describes fits the data quite well.

He then talks about attention and gain control. Attention "multiplicatively enhances the stimulus drive before normalization". 

He makes a distinction between two types of gain control: "Contrast-gain" is right-left shift of IO function on log scale (horizontal stretching), "Response-gain" is up-down scaling of IO function.

There are clues that feed-forward and feed-back circuitry is involved in divisive processes.

He briefly mentions Holt and Koch. Then he says: "It is now agreed that the effect of conductance increases on firing rates is divisive, but only if the source of increased conductance varies in time" and cites Silver, Chance and Abbott.


So I think what they call contrast normalization and response normalization may be nice to have in the paper. I can also talk about the temporal gain control stuff more, as it is not needed in my model. 

Also, this makes me think about ARBMs. I was wondering what the effect of the top-down signals should be on the output, and if the top-down are equivalent to attention, then these papers say attention is gain increasing. So the top-down effects of the ARBMs should increase the gain of the population that they feedback to. Should look into if burst firing is like a multiplication somehow of spiking...

Friday, November 9, 2012

ARBM

ARTBM? AR-RBM? ART-RBM?

Anyway, adaptive resonance restricted Boltzmann machine. Let's go even simpler and try to put the ART rules in the context of an RBM. Let's consider just two layers where stimuli are presented to the first layer through the basal tree. The firing of this layer stimulates the basal tree of the second layer. The spiking of the second layer influences the apical tree of the first layer (and would influence the basal tree of the third layer). We are ignoring any lateral connections for now.

So, I want to consider establishing a resonance, learning, and representation dimensionality in the context of an RBM. For just a regular RBM, I like to think of the learning rule as basically back-propagation in a feed-forward neural network, where the supervised learning signal is the input. Although, I feel like this is not quite the case (need to do an RBM refresher - RBMs are binary?).

The first layer receives a stimulus, and let's say it is normalized to a constant length. So at first the second layer will have low activity (nothing has been learned), and over time the excitability of the second layer increases. Eventually, enough neurons will be activated that the feedback starts causing the first layer to begin bursting. One parameter is how much longer a burst is than a spike. The bursts and the spikes will keep being normalized by the inhibition to a constant length. This should create a positive feedback loop which leads to more bursting in the first layer and more excitation in the second layer. The second layer will also need to be normalized. Synapses from the second layer that cause a burst in the first layer (spike before burst) are strengthened; synapses that cause a spike in layer 2 from a burst (burst before spike) are also strengthened.

If this kept going up a hierarchy, then the second layer would receive top-down signals and bottom-up signals. This will cause the second layer to burst. So the question is should there be learning if and only if both pre and post are bursting? Or is one enough? What ultimately happens to the top layer, as this layer will not have top-down signals.

The valuable thing about this set-up is that the second layer can increase or decrease the dimensionality of the representation as it learns. So there need to be two opposing forces - one that increases the dimensionality when the matching is poor, and one that decreases the dimensionality when the matching is good. The increasing excitability with a poor match will naturally increase the dimensionality. It could be that the divisive normalization/competition turns off the weaker-firing neurons when the match is good. So there probably needs to be an LTD rule when there is a burst and no spiking.

A big question is what is the relationship between the burst and the spike. Consider bottom-up inputs causing a neuron to fire at 10Hz, then top-down inputs come in and increase activity. How is this activity increased? Does it just increase the gain of the output - i.e. all bottom-up rates are multiplied by 2? Or does it add a constant amount - say another 10Hz, thus a 10Hz burster looks like a 20Hz spiker? Is the top-down signal to burst binary or is it graded?

I would say the first thing to do is start with a rate-coded ARBM. Each neuron has two "compartments", where each compartment just sums its inputs and passes that through some kind of sigmoid-like function. The bottom-up compartment sets the firing rate based on the inputs. All the firing rates are normalized by the recurrent multiplicative inhibition (or just constantly normalized by the program - perhaps the vector length can be lower, but ultimately it has some maximum). The top-down compartment, let's say, increases the gain of the output, and has some range - like 1x - 3x max. The top-down compartment being over some threshold level would indicate learning signals. If there is pre and post activity then LTP; if only one side is bursting then LTD.
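
Here's a minimal sketch of that rate-coded version, mostly to pin down the moving parts. All the layer sizes, thresholds, and the exact LTP/LTD terms are placeholders I made up, not worked-out choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 8, 4                        # layer sizes
W_ff = 0.1 * rng.random((n1, n2))    # layer 1 -> layer 2 (basal, bottom-up)
W_fb = 0.1 * rng.random((n2, n1))    # layer 2 -> layer 1 (apical, top-down)

def normalize(r, max_len=1.0):
    """Divisive normalization: cap the population vector length."""
    length = np.linalg.norm(r)
    return r if length <= max_len else r * (max_len / length)

def arbm_step(stim, r2_prev, gain_max=3.0, burst_thresh=0.5, lr=0.05):
    global W_ff, W_fb
    # Layer 1 bottom-up compartment = the stimulus; its apical compartment is
    # driven by layer 2 feedback and sets an output gain between 1x and gain_max.
    apical1 = r2_prev @ W_fb
    gain1 = 1.0 + (gain_max - 1.0) * np.tanh(apical1)
    r1 = normalize(stim * gain1)
    burst1 = gain1 > (1.0 + burst_thresh)      # "bursting" layer-1 cells
    # Layer 2 bottom-up compartment sums layer-1 output, then is normalized.
    r2 = normalize(np.tanh(r1 @ W_ff))
    # Burst-gated plasticity: LTP from bursting cells onto active partners,
    # LTD from bursting cells onto silent partners.
    active2 = (r2 > 1e-3).astype(float)
    W_ff += lr * (np.outer(r1 * burst1, r2) - np.outer(r1 * burst1, 1.0 - active2))
    W_fb += lr * np.outer(r2, r1 * burst1)
    return r1, r2

stim = normalize(rng.random(n1))
r2 = np.zeros(n2)
for _ in range(20):
    r1, r2 = arbm_step(stim, r2)
```

The open questions from above - how much longer a burst is than a spike, and whether the top-down effect is graded or binary - would all live in gain_max and burst_thresh here.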

Tuesday, November 6, 2012

ART Orienting

The other idea about ART was that there exists an "orienting system". This system is necessary to provide the signals to learn new things. The idea of the orienting system is that with a novel stimulus the top-down will have a poor match with the bottom-up. So the orienting system basically "resets" the top-down matching signal. It simultaneously increases the excitability of some neurons such that newly activated neurons can be learned as part of the grouping.

So let's just start back with the resonances. Part of the idea about the resonance state is that learning is occurring, but also the resonance state is good - it means what you are doing worked. So I guess that makes sense: learn better the classification states that are good. And this can probably be done quite rapidly in the brain. A good amount of synchronous bursting would, from what we see experimentally, probably become a pretty solid encoding in a short time. Remember resonances are part of consciousness, so we have to think about what we are conscious of and what it might mean for how the brain works.

So my "declarative memory" is my life experience. I can look around me and see and remember what is happening to me basically all the time. There is a tremendous amount of information being encoded in my memories all the time. Some of these memories fade, some are more attended to and code better, some are remembered soon after encoding - reinforcing the memory further. These have real biophysical correspondences in how are brain works. Plasticity rules in ART would suggest that these encodings are done throughout cortex and through plasticity driven by resonances (aka synchronous bursting). Attention and motivational system can perhaps increase and decrease the amount of plasticity - both through direct modulatory mechansims of plasticity (Dopamine-like) and through increased excitability leading to stronger resonances and more burstiness.

But how do you do the learning in the first place? How do you get to the really nice resonance states? We have a good rule for what to do while in the resonance states, but what about when we aren't in one? What happens if there is a poor match between top-down and bottom-up?

So it's as if the orienting system is at first turning up the volume of the bottom-up inputs when the top-down activity is not very strong. Or maybe this could happen from inhibitory feedback being reduced from low activity. So soon the neurons begin to fire. As more and more neurons begin to fire, eventually it settles on a resonant state? I guess it's like you are constantly increasing the dimensionality of the model, and eventually you would activate enough neurons that some of the bottom-up neurons would begin to burst. Bursting = learning. So you could start to learn a resonance in that manner. The search process increases the dimensionality of the representation.

So the next time you experience a similar input, a similar pathway will be activated, which now activates a large population pretty strongly. Perhaps strongly enough that divisive normalization quiets some down. But the input will be different in some ways. Some of the top-down is not activated. This would suggest that perhaps new plasticity would then form between the bursting neurons, at the synapses that are bursting together.

First let's clarify: in a way we're trying to build something hierarchical. So we consider the input layer to be a layer of pyramidal cells that receive "bottom-up" input (from LGN, and I guess neighboring pyramidal cells). So when a stimulus arrives the bottom-up input will activate a population - this population is the "representation" space and should have all the information about the stimulus. These bottom-up signals only cause spiking. The spikes are relayed through a layer of tunable synaptic weights (think feed-forward neural network) to a "prototype" layer. This is again a layer of pyramidal cells; this layer gets its inputs in the basal tree from the first layer. (So let's call the representation layer L4 and the prototype layer L2/3 - it could also be CA3/CA1.) So again bottom-up is from L4 and neighbors. (What if neighbors connected to the apical tuft of the same cell?) Ok, so L2/3 has more lateral connections and likes to excite itself. This will lead to a more pattern-completing type of layer. So then probably more STDP between L2/3 cells (controlled by gamma oscillatory inhibition). Now, those signals feed back onto L4 and excite the apical dendrites of L4 neurons. This causes the L4 cells with both bottom-up and top-down inputs to begin bursting. Now the bursting is more of a signal that means do some learning and maybe listen to me a little more closely (how much more?).

But let's pay attention to learning. Ok, so now we have L2/3 neurons spiking from input by L4, and now many of those L4 cells start bursting. This means that we have spiking in L2/3 and bursting in L4, with learning at those synaptic weights (make the current pattern in L2/3 more likely given the pattern in L4). So let's say those weights get some LTP. LTD when there is a bursting presynapse but no spiking in the post. Does L2/3 at this point begin bursting as well? I imagined that L2/3 would be modulated by some top-down signal. Perhaps the next L2/3 in the hierarchy (making two cortical hierarchies - pattern completing and pattern separating), or possibly by L4 of the next level (making something that is interleaving(?)).

Well it would seem that there should be learning in the weights that feed back from L2/3 to L4. (I know Grossberg has an L6 in between, but just go along.) This would reinforce the prototype as having the characteristics of the neurons that are firing. So perhaps in the apical tree, if there was a calcium spike and the neuron began bursting, then the active synapses should get stronger (making it more likely to be a burst next time the same prototype is drawn up). LTD if there was a calcium spike and no burst.
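
Just to write the case analysis from the last two paragraphs down in one place, here's the rule set as I'm currently imagining it (my own bookkeeping, not anything from Grossberg):

```python
def ff_plasticity(pre_l4_bursting, post_l23_spiking):
    """L4 -> L2/3 weights: strengthen when an L4 burst drives an L2/3 spike,
    weaken when an L4 burst finds a silent postsynaptic cell."""
    if pre_l4_bursting and post_l23_spiking:
        return "LTP"
    if pre_l4_bursting and not post_l23_spiking:
        return "LTD"
    return "no change"

def fb_plasticity(apical_ca_spike, post_l4_bursting):
    """L2/3 -> L4 feedback weights: strengthen active apical synapses when the
    calcium spike turns into a burst, weaken them when it does not."""
    if apical_ca_spike and post_l4_bursting:
        return "LTP"
    if apical_ca_spike and not post_l4_bursting:
        return "LTD"
    return "no change"
```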

So L2/3 has two fighting feedback loops. The bottom-up inputs are all trying to excite the neurons through these learning rules, and the neurons are exciting each other. These synapses are being reinforced by positive feedback from the inputs, except there is some normalization procedure - like homeostasis - in the synaptic weights. Still, the neuron can be driven a lot. Now the neurons drive each other as well as the inhibitory feedback gain control. This forces the neurons to also compete with each other. The gain control caps the allowable length of the population vector. This prevents L2/3 from just maxing out.

Monday, November 5, 2012

ART and feedback

One insight from ART that I was struggling with conceptually is the idea of positive vs. negative feedback in forming a perception. I always imagined it as cortex forming a model of the "pixels" that are within thalamus. The feedback coming from cortex to thalamus is predicting the state of thalamus. It seemed reasonable to me that this feedback was negative - once some pixels were explained by a model in cortex, then those pixels would be subtracted off and cortex would be trying to figure out the pixels that are left over.

ART, however, is based on positive feedback. Adaptive resonance means that the top-down and bottom-up signals are in agreement, and then those pixels are amplified. This fits in anatomically, as I was having trouble figuring out how cortex could be doing negative feedback to thalamus in a specific way. The key to working with positive feedback as a neural computational mechanism, however, is the necessity of multiplicative gain control.

Without any inhibition, a positive feedback loop would obviously explode or die off to zero in a short amount of time. But one of the keys to ART is that there is divisive inhibition/contrast normalization/gain-control in the representation space. This leads to feedback systems that are controlled by additive levels of excitation and multiplicative types of inhibition. The positive feedback may not even come across as excitatory. The positive feedback will enhance the signals that are matched by top-down and bottom-up, but the divisive inhibition will keep the population normalized. Thus enhancement of some neurons will lead to more inhibition in other neurons. And by making sure the inhibition is multiplicative, the information stored in the population code is not lost.

So in a way the feedback from cortex can seem negative. But really the negative part of the feedback is coming from feedforward inhibition and local feedback inhibition. Cortex excites further the neurons which it predicts correctly, and the population gets normalized by the inhibitory neurons.
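
A toy example of why the multiplicative form matters (numbers made up): boost the matched units, renormalize divisively, and the unmatched units go down without their relative pattern being destroyed.

```python
import numpy as np

rates = np.array([4.0, 2.0, 1.0, 1.0])   # bottom-up population activity
match = np.array([1, 1, 0, 0])           # units predicted by top-down feedback

boosted = rates * (1.0 + 0.5 * match)              # positive feedback on matched units
normed = boosted * (rates.sum() / boosted.sum())   # divisive renormalization

print(normed)                 # matched units go up, unmatched units go down...
print(normed[2] / normed[3])  # ...but ratios among the unmatched units are unchanged
```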

So, how do we actually make something that is based on ART that can actually do what we want it to do? The problem with ART so far is that it isn't completely written out. It is a conceptual framework for fitting in neural circuits and solving some learning problems, but I feel it hasn't been fully fleshed out.

First, let's make some assumptions. ART is based on top-down excitation being modulatory. Based on all the evidence, much of which is talked about in this blog, top-down modulatory excitation is done via the apical tuft. This implies that pyramidal cells have two modes of sending signals - spiking and bursting. We see from the literature that the apical tree can generate a calcium spike when signals from layer 1 excite the tree enough. Top-down signals tend to come from layer 1. The calcium spike will cause a burst of spikes if the basal tree (bottom-up) is also excited. However, what ART requires is that the calcium spike does not actually cause any action potentials if the bottom-up inputs are not present, which is fine biophysically.

The top-down and bottom-up types of excitation can set up learning rules for the system. The simplest way to think about it is that there are 4 binary combinations of possibilities: 1. No top-down or bottom-up input, 2. Bottom-up but no top-down, 3. Top-down but no bottom-up, 4. Both top-down and bottom-up. There are different outputs from a pyramidal cell under these conditions: 1. Nothing, 2. Spiking, 3. Nothing (but the neuron is extra excitable), 4. Bursting.

Now, ART talks about learning based on case 4, with no plasticity in the other 3 cases (4 is a resonance). However, there could be some plasticity based on cases 2 and 3, but following different rules - case 3, for instance, could be an indication for LTD, as the top-down predictions were incorrect at classifying a bottom-up input.
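
Writing the four cases from the last two paragraphs out as a toy lookup (my own summary; the labels are placeholders):

```python
def pyramidal_output(bottom_up, top_down):
    """Output mode of a pyramidal cell for the 4 binary input combinations."""
    if bottom_up and top_down:
        return "burst"            # case 4: resonance
    if bottom_up:
        return "spike"            # case 2
    if top_down:
        return "silent (primed)"  # case 3: calcium spike but no action potentials
    return "silent"               # case 1

def plasticity(bottom_up, top_down):
    """Case 4 is the ART resonance/learning case; case 3 is a candidate for LTD
    since the top-down prediction found no bottom-up support."""
    if bottom_up and top_down:
        return "LTP"
    if top_down and not bottom_up:
        return "LTD (maybe)"
    return "no change"
```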

One question that needs to be answered is: what is the relative impact of bursting over spiking on the normalized network? If we imagine that the normalization process, at its most simple approximation, is to make the population vector length 1, then what does a bursting neuron do in comparison to a spiking neuron? Is a burst twice the length of a spike? Three times? One? Does it even matter - can the network, through learning, be operational with different impacts of bursting and spiking?
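
One way to make that question concrete (numbers arbitrary): pick a burst weight, normalize the population vector, and see how much of the vector a burster claims relative to a spiker.

```python
import numpy as np

def normalized_pattern(spiking, bursting, burst_weight):
    """Population vector with 1 per spiking cell and burst_weight per bursting
    cell, rescaled to unit length."""
    v = np.concatenate([np.ones(spiking), burst_weight * np.ones(bursting)])
    return v / np.linalg.norm(v)

for w in (1.0, 2.0, 3.0):
    p = normalized_pattern(spiking=6, bursting=2, burst_weight=w)
    print(w, np.round(p[0], 3), np.round(p[-1], 3))  # a spiker vs a burster
```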

Friday, November 2, 2012

ART thoughts

Ok, so now that I've gone through half this paper in detail, it seems that the rest of it is more or less just a conceptual model of different meso-brain circuits. It feels like he explains a lot of data with his model, but sometimes it's like he's just fitting his model to all data, because it has so many parameters. 

Not to say that I don't think he has very good ideas, but they don't seem to be developed on a functional level. SMART seems to just be like here's what we know about the anatomy of early visual areas, and magic magic ART. 

I think that we need to explore the mathematical details of his ideas. He talks about using resonances to make a learning system, but I'm just not really sure if he ever actually does it. I guess I need to look at SMART in more detail. 

Part of what I think is missing is actually showing how ART can solve a learning problem. He makes lots of claims about data, but doesn't really explain what it means. It would be nice to see an ART model that can actually be used for learning something, and how it works.


Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world IV

Grossberg, S. (2012) Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world. Neural Networks.

SMART: Synchronous Matching ART. Now getting into spiking models, STDP, acetylcholine modulation, hierarchical laminar thalamic and cortical circuit designs and their interactions, oscillations.

SMART divides thalamus up into specific first-order (LGN), specific second-order (pulvinar), nonspecific, and thalamic reticular nucleus. Here's the figure:
"Figure 6: The SMART model clarifies how laminar neocortical circuits in multiple cortical areas interact with specific and nonspecific thalamic nuclei to regulate learning on multiple organizational levels, ranging from spikes to cognitive dynamics. The thalamus is subdivided into specific first-order and second-order nuclei, nonspecific nucleus, and thalamic reticular nucleus (TRN). The first-order thalamic matrix cells (shown as an open ring) provide nonspecific excitatory priming to layer 1 in response to bottom-up input, priming layer 5 cells and allowing them to respond to layer 2/3 input. This allows layer 5 to close the intracortical loop and activate the pulvinar (PULV). V1 layer 4 receives inputs from two parallel bottom-up thalamocortical pathways: a direct LGN→4 excitatory input, and a 6I→4 modulatory on-center, off-surround network that contrast-normalizes the pattern of layer 4 activation via the recurrent 4→2/3→5→6I→4 loop. V1 activates the bottom-up V1→V2 corticocortical pathways from V1 layer 2/3 to V2 layers 6I and 4, as well as the bottom-up cortico thalamocortical pathway from V1 layer 5 to the PULV, which projects to V2 layers 6I and 4. In V2, as in V1, the layer 6I→4 pathway provides divisive contrast normalization to V2 layer 4 cells. Corticocortical feedback from V2 layer 6II reaches V1 layer 1, where it activates apical dendrites of layer 5 cells. Layer 5 cells, in turn, activate the modulatory 6I→4 pathway in V1, which projects a V1 top-down expectation to the LGN. TRN cells of the two thalamic sectors are linked via gap junctions, which synchronize activation across the two thalamocortical sectors when processing bottom-up stimuli. The nonspecific thalamic nucleus receives convergent bottom-up excitatory input from specific thalamic nuclei and inhibition from the TRN, and projects to layer 1 of the laminar cortical circuit, where it regulates mismatch-activated reset and hypothesis testing in the cortical circuit. Corticocortical feedback connections from layer 6II of the higher cortical area terminate in layer 1 of the lower cortical area, whereas corticothalamic feedback from layer 6II terminates in its specific thalamus and on the TRN. This corticothalamic feedback is matched against bottom-up input in the specific thalamus. [Reprinted with permission from Grossberg and Versace (2008).]"

Wow. Yeah. Some highlights: It's like thalamus is doing a loop with layer 5 - layer 5 to pulvinar is like retina to LGN. Nonspecific thalamus is the orienting mechanism. It somehow causes a reset by going through the L5-L6 pathway - and it involves the habituative transmitters in L6. Reset would be implicated by beta oscillations; a resonant match results in gamma oscillations. Vigilance is controlled by acetylcholine - vigilance promotes finer categorical separation.

Ok, here's how he explains how learning novel environments works, with respect to beta oscillations. So the first exploration of a track does not cause much beta - this is because the top-down expectation of a novel environment is usually broadly tuned, so that resonance eventually begins. (I guess, in a novel environment the search procedure increases top-down excitability, eventually leading to a large number of top-down neurons being activated and associated with bottom-up states.) The top-down inputs are broadly tuned to match feature patterns. So the real beta-level learning happens while pruning the top-down expectations. Using mismatch-based reset events, the categories are fine-tuned. This results in beta. So you see beta in the next few trials. (So I guess the second time a category is activated, it's not activated as fully, so it may not lead to a resonance at first. But then the orienting system increases excitability, leading to a larger share of the pattern, but still not full. Then maybe you need 90% of the pattern to get a resonance, but then that 90% will get reinforced. Then next time you need only 90% of that pattern.)

"Such an inverted-U in beta power through time is thus a signature of ART category learning in any
environment."

He explains how place cells are learned through ART from grid cells. Place cells are just categories of space. Self-organizing maps can do it, and so can ART, but top-down attention is needed.

next is section 41, page 39/98...

Thursday, November 1, 2012

Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world III

Grossberg, S. (2012) Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world. Neural Networks.

Section 18.

Now we are getting into how ART is embedded in the microcircuitry of cortex and thalamus. This also begins to combine ART with FACADE (Form-And-Color-And-DEpth), which introduces horizontal interactions.

Boundaries are completed inwardly between oriented and collinear cell populations. Surfaces are completed outwardly in an unoriented manner until reaching boundaries. Boundaries form through interblobs in V1 to pale stripes in V2 and beyond to V4. Surfaces go through blobs in V1 to thin stripes in V2 and beyond to V4.

All boundaries are invisible (in a consciousness sense). Surfaces are visible. This is because surfaces are part of surface-shroud resonances, and consciousness needs a resonant state. (I guess boundaries aren't resonances? Or they are resonances, but not all resonances are conscious.)

Cortico-cortical feedback tends to preferentially originate in layer 6 of a higher area and terminate in layer 1 of a lower area. This top-down feedback is "modulatory". It goes through the apical dendrites of layer 5, to layer 6, and then is "folded" back up into layer 4.

5a: LGN has two excitatory pathways to L4. L6 activates L4 through a modulatory on-center, off-surround network. L4 is driven by direct LGN inputs. This circuit "contrast-normalizes" the inputs that layer 4 receives from LGN.
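
This is the usual shunting on-center, off-surround normalization; at steady state each cell's activity is its input divided by the total input, something like the sketch below (A, B, and the input values are placeholders).

```python
import numpy as np

def shunting_steady_state(I, A=1.0, B=1.0):
    """Steady state of dx_i/dt = -A*x_i + (B - x_i)*I_i - x_i*sum_{j!=i} I_j,
    which works out to x_i = B*I_i / (A + sum_j I_j): the input pattern is
    preserved but divisively normalized by the total input."""
    return B * I / (A + I.sum())

I = np.array([8.0, 4.0, 2.0, 2.0])
print(shunting_steady_state(I))        # low total input
print(shunting_steady_state(10 * I))   # 10x the input, nearly the same pattern
```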

5b/d: cortical feedback signals are carried through L6 back to L4. A similar attentional feedback occurs between L6 of V1 and LGN.

5c: Layer 2/3 possesses long-range horizontal connections that are used for perceptual grouping of contours, textures, and shading (pattern completion). They ensure "boundaries are invisible" (not sure what this means). L2/3 also sends back through the "folded-feedback" path, with direct routes to L6 and indirect routes via L5.

5e: hierarchical propagation of priming: V2 repeats the laminar pattern of V1, but at a larger spatial scale. Since top-down signals are modulatory, the top of the hierarchy (e.g. prefrontal cortex) can potentially modulate all the way down.

Pre-attentive grouping. Both intercortical attentional feedback and intracortical grouping feedback share the same competitive selection circuit from L6-to-L4. L2/3 acts as the attentional prime needed for learning, without feedback from higher cortical areas. Both L2/3 and higher areas act on the same L6-to-L4 selection circuit. "The pre-attentive grouping is its own attentional prime".

Balance of excitation and inhibition. L2/3 cells fire only if they get direct bottom-up input, or if they get collinear inputs from pairs (or more) of bipole populations. (You only see the edge if you have pac-men at both ends).

Cortex exists at a "cusp of excitability" in the resting state.

An unambiguous scene is processed by a fast sweep up the cortical hierarchy, directly through L4 to L2/3 and then on to L4 to L2/3 in higher areas. With multiple possible groupings, feedback competition arises due to inhibitory interactions in L4 and L2/3. Competitive circuits are self-normalizing: "they tend to conserve the total activity in the circuit".

Self-normalizing circuits carry out a type of "real-time probability theory". The amplitude of cell activity covaries with the certainty of the network's selection/decision about a grouping. Amplitude, in turn, is translated into processing speed and coherence of cell activations.
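
If the total activity is conserved, the "real-time probability" reading is easy to see: each grouping's share of the conserved total behaves like a probability, and the winner's amplitude tracks how unambiguous the scene is. A toy illustration (numbers invented):

```python
import numpy as np

def settle(evidence, total=1.0):
    """Toy self-normalizing competition: activities conserve a fixed total, so
    each grouping's amplitude doubles as its 'probability'."""
    e = np.asarray(evidence, dtype=float)
    return total * e / e.sum()

print(settle([5.0, 5.0]))   # ambiguous scene: the winner's amplitude stays low
print(settle([9.0, 1.0]))   # unambiguous scene: the winner takes most of the total
```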

next is section 31.

Wednesday, October 31, 2012

Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world II

Grossberg, S. (2012) Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world. Neural Networks.

Starting back with section 9.

He links consciousness to resonance by the idea that the "contents" of the experience - conscious qualia - are linked through the "symbolic", compressed representation. The feedback in the brain/ART binds the pixels and the symbols together, which is the basis of a conscious experience.

Learning occurs in the resonant state. When there is resonance, the bottom-up adaptive filter and the top-down expectation pathways have learning activated (the weights going up and down the hierarchy).

Match learning causes gamma (ergo gamma is consciousness); mismatch/reset leads to beta oscillations.

The attentional system knows how inputs are categorized, but not whether the categorization is correct; the orienting system knows whether the categorization is correct, but not what is being categorized. This means the orienting system's activation needs to be nonspecific. Medium-term memory (synaptic depression) can be used to lower the chances of getting stuck in the same local-minimum category during the search process. The self-normalizing network is essential - it can act as a real-time probability distribution. The search cycle is probabilistic hypothesis testing and decision making.

ART prototypes are not averages, but the actively selected critical feature patterns upon which the top-down expectations of the category focus attention. "Vigilance" is the level of acceptable matching - low vigilance learns general categories with abstract prototypes. High vigilance forces a prototype to encode an individual exemplar.

ρ is the vigilance parameter in figure 2. This controls how bad a match can be before search for a new category is initiated. Vigilance can be controlled by a process of match tracking. Vigilance "tracks" the degree of match between the input exemplar and the matched prototype. The vigilance parameter is constantly being increased just enough to trigger a reset.
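
The standard ART matching rule makes this concrete: accept the selected category only if the matched fraction of the input clears ρ, otherwise reset and keep searching; match tracking then nudges ρ just above the current match after a predictive error. A sketch with binary patterns (generic fuzzy-ART-style min/AND matching, not taken from this paper):

```python
import numpy as np

def match_score(input_pattern, prototype):
    """|I AND w| / |I| for binary patterns."""
    I = np.asarray(input_pattern, dtype=float)
    w = np.asarray(prototype, dtype=float)
    return np.minimum(I, w).sum() / I.sum()

def art_search(input_pattern, prototypes, rho):
    """Accept the first prototype whose match clears vigilance rho; otherwise
    signal that a new category should be recruited."""
    for k, w in enumerate(prototypes):
        if match_score(input_pattern, w) >= rho:
            return k
    return None   # mismatch everywhere -> recruit a new category

I = [1, 1, 1, 0, 0]
protos = [[1, 0, 0, 0, 1], [1, 1, 0, 0, 0]]
print(art_search(I, protos, rho=0.5))   # low vigilance: a coarse category is accepted
print(art_search(I, protos, rho=0.9))   # high vigilance: forces a new category

# Match tracking after a predictive error: raise rho just above the match of the
# offending category so the same mismatch triggers a reset next time.
rho = match_score(I, protos[1]) + 0.01
```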

The What stream learns spatially invariant object categories; the Where stream knows object positions and how to move. Interactions between What and Where overcome these informational deficiencies. The What and Where streams interact to bind view-invariant and positionally-invariant object categories.
A view-specific category of a novel object is learned and activates cells at a higher level that will become a view-invariant object category as multiple view-specific categories are associated with it. As the eyes move around an object surface, multiple view-specific categories are learned and associated with the emerging invariant category. An attentional shroud prevents the view-invariant category from getting reset, even while new view-specific categories are reset, as the eyes explore an object. This is done by inhibiting the ITa reset mechanism.

The surface-shroud resonance is formed between the surface representation (V4) and spatial attention (PPC), and focuses attention on the object to be learned. When the shroud collapses, the view-invariant category can be reset, and the eyes can move to a new object.

Next is section 18, page 23.