Via Paul, a very interesting recent article in Science about the need to improve the quality of scientific computing in modelling-based life sciences:
In a nutshell, the authors surveyed a bunch of ecologists doing species distribution modeling and found that only a small fraction were competent at ‘syntax-based’ model programming, or were otherwise scientifically rigorous about the modeling software they chose; many were simply making decisions based on which software was easiest to use and/or most common.
Software pervades every domain of science, perhaps nowhere more decisively than in modeling. In key scientific areas of great societal importance, models and the software that implement them define both how science is done and what science is done…
Scientific considerations of the consequences of [modelling software] adoption generally occur late in the process, if at all. This may be appropriate when deciding which smartphone application one uses. But we must hold scientific inquiry and adoption of scientific software to higher standards. Does use of modeling software conform to basic tenets of scientific methods? We describe survey findings suggesting that many scientists adopt and use software critical to their research for nonscientific reasons. These reasons are scientifically limiting.
The authors conclude with a series of recommendations for improving the quality of quantitative analysis in the biological sciences, including multi-disciplinary graduate education with a heavy math/stats/computing component, and peer review and publication of scientific software code alongside the associated manuscript, an intriguing and forward-thinking idea, though perhaps of questionable practicality.
I’m particularly interested in this topic because I recently set out to improve my computing skills. Though I’ve had some initial publication success based on simple, custom spreadsheet analysis, I find myself falling far behind many of my peers in terms of programming skill. Up to this point, I’ve done a lot of analysis with point-and-click programs like Excel & Access, but I see the need to transition toward code-based solutions in order to a) streamline & automate complex multi-step analyses, and b) improve the documentation and reproducibility of my work. After a long discussion with other students and co-workers, I decided to focus my efforts on learning some Python, along with some basic *NIX shell and database skills, and have found two great resources geared towards those in the natural sciences:
- Software Carpentry (online course materials based on this curriculum available here from Dr. Ethan White at Utah State University)
- How to Think Like a Computer Scientist (online course materials based on this curriculum available here from Dr. Asa Ben-Hur at Colorado State University)
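To give a flavor of the kind of transition I mean, here is a minimal sketch of a spreadsheet-style summary done in code instead. Everything in it is invented for illustration (the column names `site`, `species`, and `count`, and the toy data itself); the point is just that the steps are written down, repeatable, and easy to rerun on new data, which is exactly what a pivot table in Excel doesn’t give you.

```python
# A toy version of a spreadsheet summary ("average count per species"),
# rewritten as a small, rerunnable script. Data and columns are made up.
import csv
import io
from collections import defaultdict
from statistics import mean

# In a real workflow this would be a file on disk, e.g. open("plots.csv");
# an in-memory string keeps the example self-contained.
raw = """site,species,count
A,wren,3
A,wren,5
B,wren,2
B,finch,7
"""

def mean_count_by_species(csv_text):
    """Group rows by the species column and average the count column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["species"]].append(int(row["count"]))
    return {sp: mean(vals) for sp, vals in groups.items()}

summary = mean_count_by_species(raw)
print(summary)
```

Unlike the equivalent sequence of clicks, this documents every step of the analysis and can be rerun unchanged when the data file is updated.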
I’m starting with Software Carpentry, and will try to comment on it more in future posts.
However, my newfound enthusiasm for coding is tempered a bit by some recent writings from academics I respect about the relative value of creative inspiration versus methodological/computational firepower in driving high-impact research. Dr. E.O. Wilson, endowed chair at Harvard and one of the most famous biologists alive today, has a recent commentary in the Wall Street Journal (Great Scientists Don’t Need Math) in which he admits to taking algebra and calculus about a decade later in life than most students would today, and merely muddling his way through:
Fortunately, exceptional mathematical fluency is required in only a few disciplines, such as particle physics, astrophysics and information theory. Far more important throughout the rest of science is the ability to form concepts, during which the researcher conjures images and processes by intuition…
Pioneers in science only rarely make discoveries by extracting ideas from pure mathematics…
If your level of mathematical competence is low, plan to raise it, but meanwhile, know that you can do outstanding scientific work with what you have. Think twice, though, about specializing in fields that require a close alternation of experiment and quantitative analysis… For every scientist, there exists a discipline for which his or her level of mathematical competence is enough to achieve excellence.
At the same time, he highlights the benefits of his past collaborations with mathematicians and statisticians in adding the necessary rigor to his observationally derived insights. Not long after reading that, I came across another article by Paul Krugman, Nobel Prize-winning theoretical economist and widely influential blogger, that sounded many similar themes. This one is a somewhat longer read, but there is the same emphasis on creative thinking over sophisticated methods & models:
Most young economists today enter the field from the technical end… It is not, however, where I come from. My first love was history; I studied little math, picking up what I needed as I went along… always try to express your ideas in the simplest possible model… I have used the “minimum necessary model” approach over and over again… In each case the effect has been to allow me to tackle a subject widely viewed as formidably difficult with what appears, at first sight, to be ridiculous simplicity.
So, stepping back, my take-away from all this is that proper scientific computing is an indispensable tool for the modern researcher…
…AND there are tons of great resources out there for anyone wanting to up their game…
…BUT sophisticated model code is only useful to the extent that it is well-implemented and well-understood, and there’s still plenty of paradigm-shifting stuff coming from bolts of inspiration rather than program output.