Blog | Pearson Lab at NYU

NIH's Big Data Push

13 November 2015 by John Pearson

NIH Big Data to Knowledge (BD2K) banner

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…

— Dan Ariely

Phil Bourne is on a mission. Phil is the Associate Director for Data Science at the National Institutes of Health, and he has only a few years to prove to the biomedical research community that picking its pocket to pay programmers, statisticians, and data curators is not a huge boondoggle, but a bold way forward for science.

Yesterday and today, Phil has been preaching to the choir. I’m sitting in an auditorium at NIH holding somewhere in the neighborhood of 400 people, and even for a group of scientists, it’s astonishing how many people are staring at laptop screens. But it’s what’s on those screens that’s really different: GitHub, Sublime Text, Atom, RStudio. These are data people — correction, we are data people — and it’s all hands on deck for the Big Data To Knowledge Initiative meeting.

What does this mean? A lot of discussion about curation, metadata and standards. Lots of centers and centers to coordinate centers. Money to train people. What does it mean for neuroscience, where our “big” data is not so big by the standards of genomics? Probably not much in the short term, but in the long term, a few changes are clearly on the horizon:

Yes, you are going to have to share your data. And you’re going to have to make sure it’s usable.
People here are serious, really serious, about moving to some sort of preprint server model for knowledge dissemination.

I would argue this is a good thing. It’s what we do anyway. But as so many have pointed out over the last couple of days, the real problems with this sort of change are cultural, not technical. And they run deep:

The incentive structure in research is a bad fit for the new data ecosystem. We don’t credit researchers for software, let alone data management. Worse, there is no career path for our really good scientific programmers and team scientists. We are losing our best people because they do not fit easily into traditional faculty research categories.
Biomedical scientists work like artisans. We (I do still run some experiments) invest in our data, we sweat for our data, and so we guard our data. We have a natural distrust of computational types, who have shorter, smaller projects and want to pillage our life’s work for abstruse ideas we don’t understand.

I am personally optimistic that both of these are going to change. At P[λ]ab, we have a diverse and wonderful group of collaborators. But I suspect these issues of culture and trust are primarily sociological, and so it’s up to us, to the quants, to show the rest of our colleagues that we can provide real breakthroughs. That data tools really do make better science.

The NIH is giving us a few short years to prove that Big Data is more than hype. We’re going to need every bit of that time, a lot of help, and a lot of friends in the experimental world, helping us formulate the questions.

It’s going to be an exciting time.

PS: I presented a poster on my work with Jeff Beck on inferring features in complex scenes from neural responses.

Please note: This work is preliminary. Code is not yet up and results may change.
