Scientific programming – Is this really necessary and important?

Basic modules in scientific computing used to be included in forest science degree programmes. For a number of years they seem to have disappeared in many countries, and I am not sure why. Perhaps this has something to do with the Bologna Process, and such key-skill modules no longer fitted in? This seems odd, as many research topics in forest science and in the natural sciences in general focus on quantitative research, which requires some level of programming skill. Quantitative research almost always requires modelling work and simulations. As part of this it is often necessary to define one’s own loss function for more sophisticated regressions that, for example, involve spatial models of plant interactions.

Computers increasingly dominate everyone’s work life, and in specialised academic fields such as forest science it is crucial to be able to produce code that makes computers process specific scientific jobs efficiently. And anyway, have you ever had the fantastic and inspiring experience of creating your very own software and of getting the computer to perform your own, special vision of science? If not, then you have definitely missed something!

Certainly, with the advent of R it has generally become much easier for students to implement even challenging quantification methods. The scripting language of this software is not too hard to learn and existing packages give easy access to more complicated methods. Thus almost anybody can design quite complex analyses and program her or his own model. Particularly in the design phase of modelling R is great, because you can easily try and compare different options and use the wealth of visualisation and statistical tools for assessing their performance. Still, not every degree programme in forest science includes an R course, although the software has become a bit of a standard in plant sciences. Even among the PhD students I teach there are still quite a few without prior knowledge of R.

Then, of course, there are also programming situations where R hits its limits. For example, R has problems with processing loops. Loops are certainly part of the language syntax, but they can severely slow down computations. This problem can be fixed by externalising code that takes too long to process in R. Basically you program parts of your R script in a different way and deposit this code in a separate file. Sounds easy and like a good idea. However, the external code has to be implemented in a compiled programming language. Often C/C++ is chosen for this purpose. Oh dear, another steep learning curve ahead! And actually, how do I decide which parts of my code to externalise and which to leave behind in my R script?
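
As a rough illustration of the kind of loop that is often worth externalising, here is a minimal C++ sketch. The function name `count_neighbours` and the task itself (counting, for each plant, the other plants within a given radius) are assumptions chosen for illustration only. To call such a function from R one would additionally need an interface layer, for example the Rcpp package, which is left out here so that the sketch remains plain standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: for each plant, count how many other plants lie
// within a given radius. A nested loop over all pairs like this one is the
// kind of code that tends to run slowly as explicit loops in R.
std::vector<int> count_neighbours(const std::vector<double>& x,
                                  const std::vector<double>& y,
                                  double radius) {
    assert(x.size() == y.size());  // coordinates must come in pairs
    const std::size_t n = x.size();
    const double r2 = radius * radius;  // compare squared distances, no sqrt
    std::vector<int> counts(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        // Visit each pair (i, j) once and credit both plants.
        for (std::size_t j = i + 1; j < n; ++j) {
            const double dx = x[i] - x[j];
            const double dy = y[i] - y[j];
            if (dx * dx + dy * dy <= r2) {
                ++counts[i];
                ++counts[j];
            }
        }
    }
    return counts;
}
```

A loop of this type is a typical candidate for externalising: it is short, self-contained and dominated by simple arithmetic. Profiling the R script first, for instance with R’s built-in `Rprof()`, can help to identify which parts actually take the time.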

This is where the problems (or the fun?) begin. Many PhD students (not only those with a forest science background) struggle when their supervisors ask them to study a problem that involves programming skills. I have seen a great many of them coming to my office for consultation, half panicking …

And then there are also other external software tools, which in combination with R could make your life much easier. If you only knew …

To address this problem, I am offering a course on scientific programming & simulation to research students and young researchers for the first time this year at Umeå. The course is part of the Research School in Applied Forest Statistics and Scientific Computing based at my Chair. Initial application numbers seem to confirm that the lack of scientific programming skills is really a problem across a wide range of subject areas. Let’s see how this course goes. I am really looking forward to it.

Perhaps it is time to re-think the role of scientific programming skills in the academic education that we provide and to introduce this subject earlier? This is important not only in research: many employers also expect advanced computing skills along with a knowledge of GIS and other software packages. Basic programming skills also allow you to learn new statistical and simulation methods more easily. You develop a better understanding of different data types and of how best to handle them, and you can more easily try new methods yourself. Worth considering, I think.

By Arne Pommerening

My background is in forest science with a PhD in forest biometrics (from Göttingen University, Germany) and a Habilitation in forest biometrics (from BOKU University Vienna, Austria). For eleven years I worked in the fields of quantitative forest management and quantitative ecology at Bangor University (North Wales, UK) before working for a short while in Switzerland. Since 2014 I have worked as a Professor in Mathematical Statistics Applied to Forest Science at the Swedish University of Agricultural Sciences (SLU) in Umeå. My research areas include woodland structure analysis and modelling, spatio-temporal dynamics of plant point patterns, individual-based modelling with a focus on plant interactions, plant growth analysis, methods of quantifying and monitoring biodiversity, and the analysis of human behaviour in selecting trees. Much of my research is computer-based, using simulation experiments, and it is strongly interdisciplinary and international.

2 comments

  1. Hi Professor. Firstly, thank you for writing this blog. I completely agree that scientific computing is not given sufficient attention at undergraduate level, but part of the problem lies in the fact that few academics in the environmental/biological sciences have sufficient expertise in this area to educate the students. Many academics use Microsoft Excel or proprietary statistical packages such as SPSS and Minitab, which primarily use a graphical user interface. These are not ideally suited to the analysis of large, complex datasets.

    I primarily use Python for my scientific work (the remote sensing of forests). However, I only became comfortable with Python at MSc and PhD level. I was not given any formal training at undergraduate level, and I often feel that I would have been better off studying computer science before moving into the environmental sciences at postgraduate level.

    Perhaps we need to retrain the teachers as well as the students?

    1. S’mae Osian? (I used to work in North Wales for a while.) Thanks for your comment. We delivered the first course in scientific programming and simulation in October last year and the response was overwhelming. There were 25 applicants from a range of European countries, and the problem of being required to produce computer code without knowing how to do this seems to be very widespread. The feedback we received was very encouraging. Our course focuses on combining R and C++ but also includes aspects of automated reporting and versioning. The next course will be held in June 2017 at Umeå. Yes, I think we need to raise awareness.
