Getting up to speed

How do I get started?

I’m frequently asked by students, especially neuroscience students, how they should go about improving their {programming, computing, statistics} skills. This page is partly an answer to that. It’s mostly my opinions, with no claim to being comprehensive. The wonderful upside of learning to program in the internet age is that there is so much information and so many options that you don’t have to go with my recommendations.

Contents

  1. Learning to program
    1. General comments
    2. Choosing your first language
    3. Learning your first language
    4. In addition
  2. Python for Data Science
  3. Statistics
  4. Machine Learning: Classic
  5. Machine Learning: Deep Learning
  6. Notes

Learning to program

General comments

Choosing your first language

Learning your first language

I’ll be vague here for one reason: there are too many choices, and none is a clear winner. All you really want at this initial phase is an acquaintance with basic programming: variables, control flow, functions, etc.
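As a rough illustration of what that acquaintance looks like in practice, here is a minimal Python sketch that touches each of those ideas. The function name `mean_above` and the numbers are invented for this example; nothing here comes from a particular course or book.

```python
# Hypothetical example: the function name and data are made up for illustration.

def mean_above(values, threshold):
    """Return the mean of the values strictly greater than threshold,
    or None if no value qualifies."""
    kept = []                  # a variable holding a list
    for v in values:           # control flow: a loop
        if v > threshold:      # control flow: a conditional
            kept.append(v)
    if not kept:               # nothing passed the threshold
        return None
    return sum(kept) / len(kept)


measurements = [2.0, 8.0, 4.0, 12.0]
result = mean_above(measurements, 3.0)
print(result)
```

If you can read a snippet like this and predict what it prints, you likely have enough of the basics to move on.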

Some people prefer books here, but in the cases of Python and R there are also lots of free video series and online courses. Which you choose doesn’t matter so long as:

If you’re coming to Python from a different language and want a quick overview, I highly recommend Jake Vanderplas’s Whirlwind Tour of Python. It’s perhaps a little more than what many scientists need to know to get started, but it’s free and excellent.

In addition

Python for Data Science

Most programming material online is targeted either at students learning their first programming language or at professionals learning a new tool for software development. However, programming for science (writing code that runs, simulates, or analyzes experiments) carries its own challenges and is distinct from general-purpose programming. Likewise, learning to program in Python is distinct from learning “scientific Python,” the suite of packages, tools, and practices that surround Python as used in (data) science.

This is why I make every new student in my lab read Jake Vanderplas’s Python Data Science Handbook cover to cover. The book covers exactly the toolset we use: IPython, Jupyter, NumPy, SciPy, pandas, Matplotlib, and scikit-learn. I don’t know of a better, more comprehensive introduction to modern scientific Python.

Statistics

Professional disclaimer: I recommend a good grounding in statistical theory. It’s worth the investment.

But we’re all busy people. What I usually end up recommending to students:

Machine Learning: Classic

There are lots of great references. The current deep learning phase notwithstanding, machine learning is actually a very broad field, and what is old now will eventually be new again. Some references worth checking out:

Machine Learning: Deep Learning

So Deep Learning (aka neural networks) is eating the world. Briefly:

Notes

  1. Note that the amount of information on Stack Overflow tends to be proportional to the popularity of a given tool. So information on R and Python is extensive, while MATLAB has comparatively less support.

  2. To be fair, MATLAB is now an old language and was designed to ease the burden of engineers who were coding in C and FORTRAN for a living. By those standards, it is highly successful, and new features are being added to the language all the time.

  3. Keep in mind that these classes are great at introducing the material, but they tend to be very light on theory and more focused on simple applications. While they’re a great starting point for high school students, undergraduates, or graduate students in other fields, students interested in machine learning research will be expected to engage with these ideas at a much higher mathematical level.