## Systems neuroscience beyond the tuning curve

John Pearson

[pearsonlab.github.io/beyond-tuning-curve](https://pearsonlab.github.io/beyond-tuning-curve)

How we decide

  • Explore/exploit
    • Pearson, Hayden, Raghavachari, Platt (Curr. Bio., 2009)
    • Addicott, Pearson, Froeliger, Platt, McClernon (Psych. Res., 2014)
  • Temporal discounting
    • Pearson, Hayden, Platt (Frontiers, 2010)
    • Blanchard, Pearson, Hayden (PNAS, 2013)
  • Learning under uncertainty
    • Hayden, Heilbronner, Pearson, Platt (J. Neuro., 2011)
    • Pearson, Heilbronner, Barack, Hayden, Platt (Trends in Cog. Sci., 2011)
  • Foraging
    • Hayden, Pearson, Platt (Nat. Neuro., 2011)
    • Pearson, Watson, Platt (Neuron, 2014)
  • Social Reward
    • Pearson, Watson, Klein, Ebitz, Platt (Frontiers, 2013)
    • Chang, Fagan, Toda, Utevsky, Pearson, Platt (PNAS, 2015)

But how do you turn this...

...into this?

## The tuning curve paradigm

  • Subject repeats the same behavior many times
  • Average neural firing across repeats
  • Neuronal populations are characterized by tuning

## The prefrontal reality

  • Complex behaviors are never quite repeated
  • Dynamics not quite regular enough to average
  • Where are the tuning curves?

Why machine learning?

  • Models that actually fit
    • is the model really a good model?
    • structured black box
  • The ANN as model system
    • algorithm as homology
    • what details matter?
    • takes distributed, asynchronous computation seriously

What we do

Organization of complex foraging, social behavior.
Neural mechanisms from data-driven models
Whole-brain dynamics
### Today's plan: Three exercises in removing constraints

  • Learning stimulus space without labels ([link](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005645))
  • Complex behavior without trial averaging ([arXiv](https://arxiv.org/abs/1702.07319))
  • Single-trial analysis of neural spiking (in prep)

How do neurons see the world?

Freiwald, Tsao, Livingstone (Nature Neuroscience, 2009)

But what if we use this?

Adams, Pearson, and Platt (in prep)

Xin Chen

Jeff Beck

Let's run an experiment

Chen, Beck, Pearson (PLoS Comp. Bio, 2017)

Let's imagine a model

### We are not the first

  • Gallant lab (fMRI) ([Huth 2012](http://www.sciencedirect.com/science/article/pii/S0896627312009348), [Stansbury 2013](http://www.sciencedirect.com/science/article/pii/S0896627313005503))
  • Continuous latent states ([Park 2014](http://www.nature.com/neuro/journal/v17/n10/abs/nn.3800.html), [Buesing 2014](http://papers.nips.cc/paper/5339-clustered-factor-analysis-of-multineuronal-spike-data), [Archer 2015](https://arxiv.org/abs/1511.07367), [Park 2015](http://papers.nips.cc/paper/5790-unlocking-neural-population-non-stationarities-using-hierarchical-dynamics-models))
  • Discrete latent states ([Escola 2011](http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00118#.WNqSexLythE), [Putzky 2014](http://papers.nips.cc/paper/5338-a-bayesian-model-for-identifying-hierarchically-organised-states-in-neural-population-activity))
  • ...and many more
### So what's different?

  • Previous models: latents capture *internal* dynamics
    • latents can be driven by stimuli
    • but vary for presentations of the same stimulus
  • Our model: latents capture *stimulus* dynamics
    • each stimulus frame has a set of binary tags
    • tags follow a Hidden (semi-)Markov Model
    • latents are *the same* for repeated stimulus presentations

Let's put that in math

$$ \begin{align} N_{tu} &\sim \mathrm{Poisson}(\Lambda_{tu}\cdot\theta) \\ \theta &\sim \mathrm{Gamma}(s, s) \\ \Lambda_{tu} &= \lambda_{0u} \prod_{k=1}^K (\lambda_{zuk})^{z_{tk}} \prod_{r=1}^R (\lambda_{xur})^{x_{tr}} \end{align} $$

firing rate = baseline * latents * externals * noise
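
In code, a minimal NumPy sketch of this generative model (sizes, variable names, and hyperparameters are illustrative, not from the paper's implementation):

```python
# Minimal generative sketch of the tagged-stimulus firing-rate model above.
import numpy as np

rng = np.random.default_rng(0)
T, U, K, R, s = 500, 20, 3, 2, 4.0          # time bins, units, tags, externals, overdispersion

z = rng.integers(0, 2, size=(T, K))          # binary stimulus tags z_{tk}
x = rng.integers(0, 2, size=(T, R))          # external (task) regressors x_{tr}
lam0 = rng.gamma(2.0, 1.0, size=U)           # baseline rates lambda_{0u}
lam_z = rng.gamma(2.0, 1.0, size=(U, K))     # multiplicative tag effects lambda_{zuk}
lam_x = rng.gamma(2.0, 1.0, size=(U, R))     # multiplicative external effects lambda_{xur}

# Lambda_{tu} = lambda_{0u} * prod_k lambda_{zuk}^{z_{tk}} * prod_r lambda_{xur}^{x_{tr}}
log_rate = (np.log(lam0)[None, :]
            + z @ np.log(lam_z).T
            + x @ np.log(lam_x).T)
theta = rng.gamma(s, 1.0 / s, size=(T, U))   # Gamma(s, s) noise, mean 1
N = rng.poisson(np.exp(log_rate) * theta)    # observed spike counts N_{tu}
```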

### Model fitting

We have $p(N \mid Z, \Theta)$, where $Z$ are the latent variables and $\Theta$ the model parameters.

We want
$$ p(Z, \Theta \mid N) \propto p(N \mid Z, \Theta)\, p(Z) \, p(\Theta) $$

But it's too hard to apply Bayes' rule exactly!
> Do you want the wrong answer to the right question or the right answer to the wrong question? I think you want the former.
>
> — David Blei
### Variational Bayesian (VB) Inference

  • Replace the true posterior $p$ with an *approximate* posterior $q$
  • Minimize the "distance" $KL(q \Vert p)$ between the true and approximate posteriors
  • Equivalent to maximizing the evidence lower bound (ELBO), a lower bound on $\log p(N)$ (toy example below)
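
As a toy illustration (a hypothetical one-dimensional Gaussian example, not this paper's model): the gap between the ELBO and the log evidence is exactly $KL(q \Vert p)$, so the bound is tight when $q$ equals the true posterior.

```python
# Toy conjugate-Gaussian example: ELBO(q) = log p(N) - KL(q || p(z|N)).
import numpy as np
from scipy.stats import multivariate_normal

N_obs = np.array([1.2, 0.8, 1.5])              # data, N_i ~ Normal(z, 1), prior z ~ Normal(0, 1)
n = len(N_obs)

post_var = 1.0 / (1.0 + n)                     # exact (conjugate) posterior over z
post_mean = post_var * N_obs.sum()
log_evidence = multivariate_normal.logpdf(N_obs, mean=np.zeros(n),
                                          cov=np.eye(n) + np.ones((n, n)))

def elbo(m, v):
    """ELBO for q(z) = Normal(m, v): E_q[log p(N, z)] + H[q]."""
    e_loglik = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * ((N_obs - m) ** 2 + v))
    e_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (m ** 2 + v)
    entropy = 0.5 * np.log(2 * np.pi * np.e * v)
    return e_loglik + e_logprior + entropy

print(log_evidence, elbo(post_mean, post_var))  # equal: KL(q || p) = 0 at the true posterior
print(elbo(0.0, 1.0))                           # any other q gives a strictly smaller ELBO
```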

Experiment I: Synthetic data

Experiment II: Temporal Cortex

McMahon et al. (PNAS, 2014)

Face, monkey, and body part cells!

Experiment II: Temporal Cortex

Viewpoint selectivity!

### What did we do?

  • Given spike counts, *what features drive firing?*
  • Each stimulus frame gets multiple binary "tags"
  • Model recovers features even from modest data sizes when the signal is strong
  • Goal is to look for patterns that **suggest new experiments.**

From movement to strategy

Shariq Iqbal

Caroline Drucker

Jean-Francois Gariépy

Michael Platt

Penalty Shot

Iqbal and Pearson (arXiv)

Complexity tax

  • each trial a different length
  • how to average, align?
  • need to "reduce" dynamics

Real trials

### What we want

  • ~~Joystick censoring~~
  • ~~Details of motor execution~~
  • Model of latent cognitive state
  • Capture interaction between players
### Our approach

  • Borrow from control theory, time series
  • Structured black box models (pieces make sense)
  • Neural networks for flexible fitting

Modeling I

Observed positions at each time ($y_t$): $$ y_t = \begin{bmatrix} y_{\mathrm{goalie}} & x_{\mathrm{puck}} & y_{\mathrm{puck}} \end{bmatrix}^\top $$
Control inputs ($u_t$) drive changes in observed positions: $$y_{t + 1} = y_t + v_{\max} \sigma(u_t)$$
Goal: predict control inputs from trial history: $$u_t = F(y_{1:t})$$
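
A minimal sketch of this observation model; I assume $\sigma$ is a tanh squashing (the slide only writes $\sigma$), so each velocity component is limited to $[-v_{\max}, v_{\max}]$:

```python
# Positions advance by a speed-limited, squashed control signal.
import numpy as np

v_max = 0.05                                  # per-timestep speed limit (illustrative)

def advance(y_t, u_t):
    """y_t, u_t: arrays [y_goalie, x_puck, y_puck]; returns y_{t+1}."""
    return y_t + v_max * np.tanh(u_t)         # assumes sigma = tanh
```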

Modeling II

Assumption: PID control $$ u_t = u_{t-1} + L (g_{t} - y_{t}) + \epsilon_t $$
  • linear control model: $L$
  • goal (set point): $g_{t}$
  • error signal: $e_t \equiv g_{t} - y_{t}$
  • control noise: $\epsilon_t$
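
A self-contained rollout under these assumptions, with an illustrative gain matrix and a hand-picked goal trajectory; this is a sketch of the assumed dynamics, not the fitted model:

```python
# Simulate the control model: u_t = u_{t-1} + L (g_t - y_t) + eps_t,
# driving the squashed position update from Modeling I (sigma = tanh assumed).
import numpy as np

rng = np.random.default_rng(0)
T, v_max = 100, 0.05
L = 0.5 * np.eye(3)                                   # control gains (illustrative)
g = np.stack([np.zeros(T),                            # goalie goal: hold center
              np.linspace(-1.0, 1.0, T),              # puck goal: drive rightward
              0.3 * np.sin(np.linspace(0.0, 6.0, T))  # puck goal: weave vertically
              ], axis=1)

y = np.zeros((T, 3))                                  # [y_goalie, x_puck, y_puck]
u = np.zeros(3)
for t in range(T - 1):
    u = u + L @ (g[t] - y[t]) + 0.01 * rng.standard_normal(3)   # control update with noise
    y[t + 1] = y[t] + v_max * np.tanh(u)              # squashed, speed-limited position update
```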

Modeling III

Goal model:
$$ \log p(g) = -\beta E(g|s) - \log Z \\ E(g|s) = \sum_t \left[ \frac{1}{2} \Vert g_t - g_{t-1}\Vert^2 + U(g_t, s_t)\right] $$
How do we interpret this?
  • Goals minimize an "energy"
  • "Kinetic" energy favors smoothness
  • "Potential" $U$ captures player interaction

Modeling IV

  • $U$ is a problem
  • What if $U$ were just quadratic?
  • Model $e^U$ as a *mixture* of normals
  • Use a Gaussian Mixture:
    • $U(g, s) = \sum_k w_k\mathcal{N}(g | \mu_k(s), \lambda_k^{-1}(s))$
    • i.e., goal mixture depends on current state
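
A direct transcription of this potential as code, assuming diagonal precisions and treating $\mu_k(s)$ and $\lambda_k(s)$ as generic callables (in the model these would be learned, state-dependent functions):

```python
# U(g, s) = sum_k w_k * Normal(g | mu_k(s), lam_k(s)^{-1}), per the slide above.
import numpy as np
from scipy.stats import norm

def gmm_potential(g, s, w, mu_fns, lam_fns):
    """g: (D,) goal; s: current state; w: (K,) weights; mu_fns, lam_fns: lists of callables s -> (D,)."""
    total = 0.0
    for w_k, mu_k, lam_k in zip(w, mu_fns, lam_fns):
        mu, prec = mu_k(s), lam_k(s)                              # state-dependent mean and precision
        total += w_k * np.prod(norm.pdf(g, loc=mu, scale=prec ** -0.5))
    return total

# usage: plug into goal_energy via U = lambda g_t, s_t: gmm_potential(g_t, s_t, w, mu_fns, lam_fns)
```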

Our model

Model fitting

Variational Bayes autoencoder
  • Encoding model:
    • goals: GMM
    • latent control: PID + Gaussian noise
    • observed control: soft censoring

  • Decoding model:

Implementation

  • Variational Inference: $$ \max_{\phi, \theta} \mathbb{E}_q[\log p_\theta(x, z)] + \mathbb{H}[q_\phi(z)] \le \log p_\theta(x) $$
  • Stochastic Gradients: $$ \mathbb{E}_q\left[f(z)\right] \approx f(z^*) \quad z^* \sim q_\phi(z) $$
  • Reparameterization Trick: $$ Z = h(\epsilon, \phi) \quad \epsilon \sim \mathcal{N}(0, 1) $$
  • Code in TensorFlow and Edward
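
The reparameterization trick in a few lines of NumPy (illustrative only; the actual model was fit with TensorFlow and Edward): sample $z$ by transforming parameter-free noise so that Monte Carlo estimates of $\mathbb{E}_q[f(z)]$ are differentiable in $\phi$.

```python
# Reparameterized sampling from q_phi(z): z = h(eps, phi), eps ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
phi = {"mu": 0.3, "log_sigma": -1.0}              # variational parameters (illustrative)

eps = rng.standard_normal(1000)                   # noise is independent of phi
z = phi["mu"] + np.exp(phi["log_sigma"]) * eps    # z = h(eps, phi) ~ q_phi(z)

def f(z):                                         # any integrand, e.g. log p_theta(x, z)
    return -0.5 * z ** 2

elbo_term = f(z).mean()                           # Monte Carlo estimate of E_q[f(z)]
```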

It fits!

Generated Trials

Generated trials

A sample trial

Inferred goals

Potential energy function

### What did we do?

  • Dynamic control tasks let us leverage motor behavior to study cognitive and social decisions.
  • Structured black-box models allow us to carve behavior into interpretable pieces.
  • We inferred a value function capable of explaining behavior in terms of goals.

Modeling neural data

Outcomes matter

(Even more than they should)

|              | DMPFC | DLPFC |
|--------------|-------|-------|
| Modulated    | 58%   | 43%   |
| Close ≠ Easy | 20%   | 18%   |

Neural modeling

We use LFADS:
  • Firing rates are linear combinations of factors
  • Factors form a low-d dynamical system
  • Factors shared across neurons, sessions
  • (Sparse) regression of factors on behavior
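
A cartoon of this factor picture (not the actual LFADS architecture, which uses recurrent networks): factors evolve as a low-dimensional dynamical system, and each neuron's rate is read out from the shared factors, here through an exponential link to keep rates positive.

```python
# Low-d latent factors -> shared readout -> Poisson spike counts (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
T, U, dt = 200, 40, 0.01                          # time bins, neurons, bin width (s)
theta = 0.2                                       # rotation per bin in factor space
A = 0.98 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])   # slowly decaying rotation: low-d dynamics

f = np.zeros((T, 2))                              # shared latent factors
f[0] = [1.0, 0.0]
for t in range(T - 1):
    f[t + 1] = A @ f[t] + 0.05 * rng.standard_normal(2)

W = rng.standard_normal((U, 2))                   # per-neuron readout of the shared factors
b = np.log(5.0)                                   # baseline log firing rate (5 Hz)
rates = np.exp(f @ W.T + b)                       # firing rates (Hz) from factors
spikes = rng.poisson(rates * dt)                  # Poisson spike counts per bin
```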

Neural modeling

What does this suggest?

  • Post-trial D(M|L)PFC reflects more than reward
  • Within-trial firing reflects progress toward outcome
  • Within-trial firing reflects across-trial variance
    • Strategy complexity?
    • Decision/control difficulty?
## Future directions

Individual differences

McDonald, Broderick, Huettel, and Pearson

Pacman

Yoo, Iqbal, Hayden, and Pearson

Large-scale time series

  • Intractable epilepsy
  • ~5 days 24/7 monitoring
  • ~100 ECoG sensors + A/V
  • ~100 GB per patient
  • Deidentified data to Amazon
  • Spark for in-memory cluster computing
## Conclusions

Modeling is a tool for doing the *right* experiment

  • Model single trials instead of averaging across trials
  • Data-driven firing patterns, not tuning curves
  • Population dynamics, not static single units

Sponsors

A social brain?

Mars et al. (PNAS 2014)

Same behavior, different mechanisms

Adams, Watson, Pearson, and Platt (2012)

Foraging, for instance

Pearson, Watson, and Platt (2014)

> What I cannot create, I do not understand.
>
> — Richard Feynman
### A reverse engineering approach

  • Work "outside-in"
  • Focus on computational constraints
  • "Structured black box" modeling

And in diagram form

Experiment II: Parietal Cortex

Roitman and Shadlen (J. Neuro., 2002)

Experiment III: Temporal Cortex

Predicting final target

Previous state of the art

Variational Autoencoder

Variational Recurrent Neural Network

A potential training signal



|              | DMPFC | DLPFC |
|--------------|-------|-------|
| Win > Loss   | 33%   | 25%   |
| Both effects | 15%   | 9%    |

Probing the model

Initial goal distribution

Let's view that strategically

Opponents Converging

Opponents Diverging