Scientific programming – Is this really necessary and important?

Basic modules in scientific computing used to be included in forest science degree programmes. For a number of years they seem to have disappeared in many countries, I am not sure why. Perhaps this has something to do with the Bologna Process and such key-skill modules didn’t fit in any longer? This seems odd as many research topics in forest science and in natural sciences in general focus on quantitative research, which requires some level of programming skills. Quantitative research almost always requires modelling work and simulations. As part of this it is often necessary to define one’s own loss function for more sophisticated regressions that, for example, involve spatial models of plant interactions.

Computers increasingly dominate everyone’s work life, and in specialised academic fields such as forest science it is crucial to be able to write code that makes computers process specific scientific tasks efficiently. And anyway, have you ever had the fantastic and inspiring experience of creating your very own software and of getting the computer to perform your own, special vision of science? If not, then you have definitely missed something!

Surely with the advent of R it has generally become much easier for students to implement even challenging quantification methods. The script language of this software is not too hard to learn and existing packages give easy access to more complicated methods. Thus almost anybody can design quite complex analyses and program her or his own model. Particularly in the design-phase of modelling R is great, because you can easily try and compare different options and use the wealth of visualisation and statistical tools for assessing their performance. Still not every degree programme in forest science includes an R course, although the software has become a bit of a standard in plant sciences. Even among the PhD students I teach there are still quite a few without prior knowledge of R.

Then, of course, there are also programming situations where R hits its limits. R has, for example, problems with processing loops. Loops are of course part of the language syntax, but they can severely slow down computations. This problem can be fixed by externalising code that takes too long to process in R. Basically you program parts of your R script in a different way and deposit this code in a separate file. Sounds easy and like a good idea. However, the external code has to be written in a compiled programming language. Often C/C++ is chosen for this purpose. Oh dear, another steep learning curve ahead! And actually, how do I decide which parts of a code to externalise and which to leave behind in my R script?
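
One way to approach the question of what to externalise is simply to time the candidate parts of a script first. The following is only a minimal sketch in base R with made-up data: it compares an explicit loop with the built-in vectorised cumsum(). The function name cumsum_loop is hypothetical and serves as a stand-in for any loop-heavy part of your own code.

```r
# Synthetic data, for illustration only
set.seed(42)
x <- runif(1e5)

# Hypothetical loop-based running sum (stand-in for a slow part of a script)
cumsum_loop <- function(x) {
  out <- numeric(length(x))  # pre-allocate the result vector
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    out[i] <- total
  }
  out
}

# Time both variants; system.time() reports elapsed seconds
t_loop <- system.time(res_loop <- cumsum_loop(x))
t_vec <- system.time(res_vec <- cumsum(x))
timings <- c(loop = t_loop[["elapsed"]], vectorised = t_vec[["elapsed"]])
print(timings)

# Both variants should agree up to numerical tolerance
same_result <- isTRUE(all.equal(res_loop, res_vec))
```

If a vectorised alternative like this exists, it is often worth trying before moving code to C/C++. Parts that remain slow after such checks are the natural candidates for externalising; for longer scripts, the base R profiler Rprof() can help to locate them.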

This is where the problems (or the fun?) begin(s). Many PhD students (not only those with a Forest Science background) struggle when their supervisors ask them to study a problem that involves programming skills. I have seen a great many of them coming for consultation to my office, half panicking …

And then there are also other external software tools, which in combination with R could make your life much easier. If you only knew …

To address this problem, I am offering a course on scientific programming & simulation to research students and young researchers for the first time this year at Umeå. The course is part of the Research School in Applied Forest Statistics and Scientific Computing based at my Chair. Initial application numbers seem to confirm that the lack of scientific programming skills is really a problem across a wide range of subject areas. Let’s see how this course goes, I am really looking forward to it.

Perhaps it is time to re-think the role of scientific programming skills in the academic education that we provide and to introduce this subject earlier? This is important not only in research: many employers also expect advanced computing skills along with a knowledge of GIS and other software packages. Basic programming skills also allow you to learn new statistical and simulation methods more easily. You develop a better understanding of different data types and of how best to handle them, and you can more easily try new methods yourself. Worth considering, I think.

R script for computing growth dominance

A summary statistic that crossed my path a lot this year is growth dominance. Growth dominance is a concentration measure based on the Lorenz curve and was invented by Dan Binkley and his lab at Colorado State University, USA. Together we have worked with this characteristic this year and it has been great fun. We applied it to data from a virgin forest and learned a lot more about growth processes in such forest types.

Growth dominance characterises the contributions of different tree sizes to total population growth (Binkley et al., 2006). West (2014) provides a good statistical description of the growth dominance concept in Forest Science.

The growth dominance curve is related to different development phases of forest stands. Young, middle-aged and old stands have characteristic curves and the statistic is therefore a useful indicator. In the context of continuous cover forestry, it is likely that growth dominance can prove a useful characteristic for quantifying the progress of transforming tree plantations to uneven-aged forests.

In this “Christmas blog”, I would like to share a simple R script for the calculation of this statistic. Reading and using it is also a good way of understanding how the growth dominance statistic works. The blog is intended for readers with a basic understanding of R, and unknown bits of syntax can easily be tracked down on the internet. Here are the first three code chunks:

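# Load data (filePath is assumed to hold the folder of the data file, ending in "/")
dFile <- paste(filePath, "Clog1.txt", sep = "")
xdata <- read.table(dFile, header = TRUE)
# Create the size and growth-rate vectors: diameters (cm) are converted
# to basal areas (m^2) and the 5-year increment to a mean annual rate
d <- xdata$d2002
g <- pi * (d / 200)^2
ig <- pi * (xdata$d2007 / 200)^2 - g
ig <- ig / 5
# Merge size and growth-rate vectors in one data frame
xdata <- data.frame(g, ig)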

The first three chunks of code are almost self-explanatory: data are loaded including two columns with a size variable measured at two different points in time. In this case it is stem diameter (at 1.3 m above ground level) measured in centimetres both in 2002 and 2007. The stem diameters are converted to basal areas (cross-sectional areas) and the mean annual (absolute) basal-area growth rate is calculated. Finally the size vector (here initial basal area in 2002) and the growth-rate vector are merged in one data frame.

The next step is important and is key to the interpretation of the growth dominance statistic. The whole data frame is ordered according to size from small to large trees, so that the size and the growth rate vectors contain corresponding pairs of values.

xdata <- xdata[order(xdata$g, decreasing = FALSE), ]

Now we calculate cumulative relative tree sizes and cumulative relative growth rates:

cumG <- cumsum(xdata$g) / sum(xdata$g) 
cumInc <- cumsum(xdata$ig) / sum(xdata$ig)

Based on the cumulative vectors we can now estimate a characteristic similar to the Gini coefficient of the Lorenz curve.

area <- 0
for (i in 2:length(cumG)) {
  # Area of the trapezoid between two consecutive points of the curve
  area[i] <- (cumG[i] - cumG[i - 1]) *
    ((cumInc[i] - cumInc[i - 1]) / 2 + cumInc[i - 1])
}
gc <- 1 - sum(area) / 0.5
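
For readers who prefer a compact version, the ordering, cumulation and area steps can be wrapped in one vectorised function. This is only a sketch and not part of my original script: the function name growth_dominance is hypothetical, the toy vectors are made up, and, unlike the loop above, the sketch also includes the first segment from the origin (0, 0) to the first point of the curve.

```r
# Hypothetical helper: growth dominance coefficient via the trapezoid rule
growth_dominance <- function(size, growth) {
  stopifnot(length(size) == length(growth), length(size) > 1)
  ord <- order(size)  # sort trees from small to large
  # Cumulative relative size and growth, each starting at the origin
  cum_size <- c(0, cumsum(size[ord]) / sum(size))
  cum_growth <- c(0, cumsum(growth[ord]) / sum(growth))
  # Area under the growth dominance curve (sum of trapezoids)
  area <- sum(diff(cum_size) *
                (head(cum_growth, -1) + tail(cum_growth, -1)) / 2)
  1 - area / 0.5
}

# Toy data (made up): four trees
size <- c(1, 2, 3, 4)
gd_proportional <- growth_dominance(size, 0.1 * size)   # growth proportional to size
gd_large <- growth_dominance(size, c(0, 0, 0, 1))        # only the largest tree grows
gd_small <- growth_dominance(size, c(1, 0, 0, 0))        # only the smallest tree grows
```

With growth proportional to size the curve follows the 1:1 line and the coefficient is zero. In this sketch it becomes positive when large trees contribute more than their share of size and negative when small trees do.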

cumInc can now be plotted over cumG to give the growth dominance curve. To obtain smooth curves, it is, however, advisable to calculate percentiles corresponding to selected points on the size axis:

x.values <- seq(0, 1, by = 0.05)
rx <- ecdf(cumG) (x.values)
sx <- quantile(cumInc, rx)

The smoothed curve is obtained by plotting sx over x.values. The figure below gives an impression of the data from a roughly 55-year-old Sitka spruce plantation in transformation to Continuous Cover Forestry.
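
The plotting step could look roughly as follows. Please note that this is a self-contained sketch with synthetic diameters (not the Sitka spruce data), so the resulting curve only illustrates the mechanics; the axis labels and colours are my own choices here.

```r
# Synthetic stem diameters in cm (made up for illustration)
set.seed(1)
d2002 <- runif(200, 10, 40)
d2007 <- d2002 + runif(200, 0.5, 3)

# Basal area (m^2) and mean annual basal-area growth over 5 years
g <- pi * (d2002 / 200)^2
ig <- (pi * (d2007 / 200)^2 - g) / 5

# Order by size and cumulate
ord <- order(g)
cumG <- cumsum(g[ord]) / sum(g)
cumInc <- cumsum(ig[ord]) / sum(ig)

# Percentiles of cumInc at selected points on the size axis
x.values <- seq(0, 1, by = 0.05)
rx <- ecdf(cumG)(x.values)
sx <- quantile(cumInc, rx, names = FALSE)

# Growth dominance curve (green) with the 1:1 reference line (dashed)
plot(x.values, sx, type = "l", col = "darkgreen",
     xlab = "Cumulative relative basal area",
     ylab = "Cumulative relative basal-area growth")
abline(0, 1, lty = 2)
```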


Apparently the observed growth dominance curve (continuous green line) is almost symmetric and very close to the 1:1 (dashed) line. This is indicative of development phase 1, where each tree’s contribution to total stand growth is proportional to its size. Usually this pattern can be found in young stands before canopy closure but apparently it is also true for middle-aged plantations at the beginning of transformation.

Intrigued? I find growth dominance quite fascinating and for me it is definitely my personal growth characteristic of the year 2015. Perhaps you would like to try this characteristic with your own data. (I can send you the full version of my R script and the data on request.) Have fun and … Merry Christmas and a Happy New Year!

Forest Biometrics in the Southern US

Being on an extended research visit to one of the world’s Meccas of Forest Biometrics, Virginia Tech at Blacksburg, I cannot help but notice quite a few differences from European Forest Biometrics. I am currently based on this vast and beautiful campus, where in autumn the tree leaves almost seem to be on fire and the warm, orange-red colour of the leaves forms a nice contrast to the limestone facades of the campus buildings. The sunny, warm weather at this time of the year is just awesome. Through the window of the coffee shop, where I am writing this blog, I can see the students passing by. Many wear T-shirts and sweaters with Virginia Tech logos on them. Maroon and orange everywhere. The identification with this University is amazing, also among the general public in town across all age classes. This is my second visit to Blacksburg, this time representing the Swedish University of Agricultural Sciences.

Obviously the US is a big country and there are considerable differences in how Forest Biometrics is run up and down the country. However, a lot of the research over here appears to be tailored to the needs of the forest industry. Interestingly, this does not necessarily mean that research remains at a practical level or that research methods are simplified. On the contrary!

Private funding plays an important role in US research compared to the largely state-funded research in Europe (possibly with the exception of the UK). As a consequence industry and research representatives join their efforts in so-called research cooperatives. The industry partners pay a kind of membership fee that university researchers are free to use as they see fit, and the coop members benefit from any new releases in terms of publications, models and software. This is a set-up I have admired ever since I first heard about it. The industry-driven funding stream also contributes to the fact that growth and yield science still plays a large role in the US, very much in contrast to Europe. Flexible, computer-implemented growth and yield models addressing current issues such as climate change and genetic tree breeding programmes but also traditional topics such as spacings and fertilisation are in high demand. Often these growth and yield models are required to go far beyond the level of growth simulators to become decision-support systems that allow the precise management planning of huge forestry companies. For collecting data, forest biometricians and even silviculturists work more with statistically designed experiments over here and less with observational plots as in Europe. In the south of the country, most research work is in the context of plantation forestry rather than in natural forests. And forest management in these plantations is highly optimised, running like clockwork.

Forest Biometrics in many parts of the US largely focuses on synthesising growth patterns by using the latest statistical techniques for empirical, data-driven modelling. Research into fundamental growth processes and ecological questions is therefore often just a small byproduct. Regular conferences in this subject area (often traditionally associated with the term “mensuration”) are also attended by private consultants and company representatives. As a result there is an interesting exchange of opinions and thoughts in discussions.

Forest biometricians are well respected in America and are even explicitly mentioned in books of forest history as I have recently discovered. Blacksburg, for example, is the location of the school (of thought) of Prof. Harold Burkhart, who in turn is a student of Prof. Jerry Clutter (University of Georgia). Like in Europe, there are interesting academic lines of succession or academic family trees. A recently published book on the history of forestry in the southern US describes the development of forest biometrics in this part of the world including all the aforementioned relationships on several pages. This is indeed very remarkable and unprecedented in similar European publications.

It has been great fun here so far thanks to my hosts and colleagues at Virginia Tech including Harold Burkhart, Phil Radtke and Randy Wynne. I started by participating in a conference of the Southern Mensurationists (where I gave talks on my recent growth dominance research at Bialowieza Forest and on human tree selection behaviour) and then stayed on to carry out joint research with Harold Burkhart. Together we are currently working on modelling size-dependent relative growth rates. This is very inspiring and takes the work my colleague Anders Muszta and I previously did in this field to a new level.

Do humans behave in woodlands?

We are using woodlands for recreation, we pick berries and mushrooms and we select picnic sites at spots in the forest we like. Certain woodlands we tend to avoid, because the tree species and tree densities make them look dark and hostile or we hate prickly brambles. Forest structures that do not allow you to clearly see through the stems for at least 50-70 m can have an intimidating effect on some people.  Surely we all have experienced the heart-lifting effect of a great day outdoors in the woods.

So what is this all about? How does this interaction tree-human, woodland-human really work? Is it the spatial structure of a woodland that creates an effect in our mind similar to the effect of a piece of artwork or of a building? Or is it the microclimate, a combination of light, humidity, noises and smells?

For most of these questions it is probably fair to admit that we simply don’t know. “Humans as part of the forest ecosystem” is not a topic that has been widely studied so far. Socio-economic relationships – yes, but not so much human behaviour and human health in relation to trees and woodlands.

In this context, I find two topics particularly interesting, the relationship between (spatial) forest structure and human health and human behaviour in terms of how trees are selected for various purposes.

Humans select trees for various purposes – e.g. as habitat trees (i.e. for their ecological value), for their timber value, for picnics, for playing, as Christmas trees and for burials among other reasons. In all these cases humans make decisions that depend on a wide range of different factors. One quite important factor, of course, is the trees themselves, which attract attention through their size, the straightness of their stems and a certain aesthetic feeling they stimulate. Human selection behaviour may vary a lot between different people and also within the same person at different times and in different situations.

For a long time it was assumed that forestry students would select trees according to their university textbooks or to the instructions of their professors throughout their professional lifetime. Variation between and within individuals was therefore ignored. Only recently has it started to dawn on some people that it is worth considering this variation a bit more carefully and systematically. What are the reasons for this variation, does it matter, is it possibly even an advantage?

This research into human selection behaviour is part of a larger field of research where for example the voting behaviour of humans or the selection of products in supermarkets are analysed. How do humans distinguish between trees they like and those they don’t? How do they perceive the trees, which trees catch their eyes? This is simultaneous research into humans and trees. The data used in this research are often collected by non-scientists in forestry training programmes and then passed on to researchers for detailed analysis. Therefore this field is also an example of citizen science.

I found inspiration for this research direction in the work of my mentor Klaus von Gadow and the colleagues who worked with me at his Chair in Göttingen. At that time many shook their heads and could not understand how such a topic could ever matter. Now, 20 years later, the research question has eventually matured and attracted a lot of interest in a number of European countries. Xin Zhao, a PhD student, and Carlos Pallares Ramos, an MSc student, are currently studying the statistical side of this topic at my Chair and we are planning new experiments to better understand human tree selection behaviour. Perhaps this is something you would like to join in?



Relative growth rates

It happens from time to time, doesn’t it,  that you are digging into what you believe is a specific set of limited methods and suddenly you uncover a whole universe that stretches over a vast number of research fields.  Recently this happened to me again when preparing a review on the concept of relative growth that is related to the important concept of allometry.

The first time I came across this concept was when attending lectures in forest growth and yield given by Prof. Günter Wenk at Dresden University as a visiting student back in 1992. Later in 1995 I re-programmed his forest stand model as a young research assistant and from 2000 to 2008 we developed a close friendship and Prof. Wenk mentored some of my research activities at Bangor University in North Wales.

Relative growth rate (RGR) is simply absolute growth rate divided by the corresponding size variable. Assuming that function $y(t)$ represents the state of a plant characteristic at time $t$, for example the biomass of a plant, instantaneous relative growth rate can be expressed as

$$p(t) = \frac{1}{y(t)} \frac{\mathrm{d}y(t)}{\mathrm{d}t} = \frac{\mathrm{d} \ln y(t)}{\mathrm{d}t}.$$

Since relative growth rate is equivalent to the derivative of $\ln y(t)$ with respect to time $t$, studying the relative growth of $y(t)$ is equivalent to studying the absolute growth of $\ln y(t)$.

In empirical studies, we commonly deal with discrete time, e.g. $t_1, t_2, \ldots, t_n$, which are our scheduled survey days or years. The period between two discrete instances of time can be denoted by $\Delta t$ with $\Delta t = t_k - t_{k-1}$. For simplification we can now set $y_k = y(t_k)$ and $y_{k-1} = y(t_{k-1})$.

For empirical data observed at discrete times we can now calculate the mean relative growth rate as

$$\bar{p}_k = \frac{\ln y_k - \ln y_{k-1}}{\Delta t}.$$

In forestry, $\bar{p}_k$ is also known as mean periodic relative increment, though the concept has not often been used in this field. Relative growth rates are always useful, when the initial size of organisms varies. Then relative growth rates allow a better comparison. This reminds us of the analysis of covariance with initial size as covariate and indeed the two ideas are related. Still, relative growth rates are also size dependent and this can sometimes cause problems in plant growth research.

It is quite amazing to see how many different fields have independently used the concept of relative growth rate, developed their own separate terminology and modelling approaches. For example, a characteristic derived from RGR is the efficiency index, also referred to as growth coefficient and growth multiplier, $M_k$:

$$M_k = \frac{y_k}{y_{k-1}} = e^{\bar{p}_k \Delta t}$$


The growth coefficient or growth multiplier plays a crucial role in projecting future growth based on relative growth rates and has been “re-invented” several times in various separate fields of application.
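
To make the projection idea concrete, here is a small R sketch. It assumes the usual definitions, i.e. mean relative growth rate as the difference of log sizes divided by the period length, and the growth multiplier as the exponential of mean relative growth rate times period length. The function names mean_rgr and growth_multiplier as well as the toy numbers are hypothetical and for illustration only.

```r
# Hypothetical helpers (names are illustrative, not from a package)
mean_rgr <- function(y_prev, y_now, dt) {
  # Mean relative growth rate over a period of length dt
  (log(y_now) - log(y_prev)) / dt
}
growth_multiplier <- function(rgr, dt) {
  # Factor by which size is multiplied over a period of length dt
  exp(rgr * dt)
}

# Toy data (made up): sizes of three plants at two surveys, 5 years apart
y_prev <- c(2.0, 5.0, 12.0)
y_now <- c(3.0, 6.5, 14.0)
dt <- 5

p <- mean_rgr(y_prev, y_now, dt)   # mean RGR per year
M <- growth_multiplier(p, dt)      # growth multiplier for the period
projected <- y_prev * M            # projecting size from the earlier survey
```

In this toy example the multiplier equals the ratio of the two sizes, so projecting the earlier sizes with it simply reproduces the later survey. In a real application the relative growth rate would come from a model rather than from the observations being predicted.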

The vast number of publications from different subject areas on this topic calls for a standardisation of notation and terminology. They also in a way suggest that there are many more similar research topics that would benefit from a more systematic approach. The use of relative growth rates is widespread in general plant growth science but less common in forest science. Interestingly Brand et al. (1987) mention in their paper in Annals of Botany that growth analysis (involving relative growth rates) fills a gap in crop yield research between strictly mechanistic studies of plant physiology and strictly empirical studies of growth and yield.

Creativity – where do we get it from?

Haven’t we all experienced this? – We can sit for hours in our offices trying to come up with a brilliant new idea, to solve a problem, to shape an important text or to find a nagging error in our computer code. And then out of nowhere, once we have set off to go home or for a coffee – there is the solution, straight and neat. Apparently when we let things go and don’t pursue them, the solution comes to us. Intriguingly there is an old story in the Welsh legends of the Mabinogi describing how a king tried to pursue a beautiful woman on horseback, but the faster he rode the larger the distance between them became. Finally the king figured out that he had to ride more slowly, not faster. This did the trick and he eventually caught up with the lady of his heart.

Often even a small change of perspective helps when you stand up,  go for a short walk or talk to a colleague in the corridor. Recently I read that someone had reviewed where researchers said they regularly have their best ideas. Not surprisingly the various locations mentioned rarely include the office.

Obviously we are sharing this experience with many others who have creative professions outside research and higher education.

How can we make better use of such flashes of creativity? Is there something wrong about our offices? Is it the noise, the disturbance or is it simply the change of locations and situations that fuel our creativity? Probably not easy to say and quite dependent on everyone’s personality. In any event, the culture of occasionally working from home, at other universities and abroad is certainly something that stimulates research output.  Quite frequently I even experience fits of creativity in airport cafes and while travelling in planes and trains packed with people. Also blocking other activities such as teaching and administration opens up windows of quality research time that can be used to think things thoroughly through – a rare commodity in this day and age.

Increasingly I am enjoying the chats I share with my staff and other colleagues at lunch, at coffee breaks or in the corridor. After each of them many research ideas appear in a new light, thoughts have become deeper and above all – a renewed flow of inspiration and love for my research field has filled my heart and on this wave of energy I get carried away to new shores.

An intriguing question in this context is whether we can actually “teach” creativity to our students or whether we can only inspire and promote it.

It is probably an important part of our research quest to find out for ourselves what works best for us. Still, there may be some common “laws” and “principles” that work for many and that we can adopt to improve our research culture. I am curious to discover more of them as I experiment on myself.

Where to start?

As this is my first posting on this weblog, I am starting it with a “smooth” thought. Forest Sciences as a scientific field have seen many changes since I was a student in the 1990s. It used to be a world of its own, pretty much uncontested and a microcosm of general science including such fields as, for example, history, law and politics on top of natural sciences. Forest Biometrics was one of the scientific fields, much respected both by students and academics, and usually honoured by a university chair. Professorships in Forest Biometrics were usually responsible for the mathematical and statistical education of students, for consultation and for quantitative research.

Since then Forest Sciences have been absorbed by natural sciences, environmental science or natural resource management. This worldwide development was often coupled with a re-naming of forestry faculties.  This process of change has most likely not come to an end and is tied into university politics favouring basic rather than applied sciences.

In the current constellation, (Forest) Biometrics (also referred to as mathematical or computational forestry or mathematical natural-resource science, see Cieszewski and Strub, 2009 in MCFNS) is seen as a field of basic science by some and as an applied study area by others. In quite a few university chairs, Forest Biometrics is viewed as a synonym of statistics and all teaching and research is orientated towards it. Others have a broader approach and include mathematical topics, plant growth analysis and modelling along with other subjects such as sampling and forest inventory.

Causton and Venus (1981), for example, wrote in their book “The Biometry of Plant Growth”: “We, however, take the view that biometry is a subject in its own right. The aspects of biology requiring quantitative study should form an integral part of biometry, and not merely dismissed once the problem has been put into quantitative form and attention turned to mathematical and statistical theory and methods.” In his book “Mathematics of Life”, Stewart (2012) is of a similar opinion when he writes: “Mathematics is being used not just to help biologists manage their data, but on a deeper level to provide significant insights into the science itself, to help explain how life works. Biomathematics is not merely a new application for existing mathematical methods. You can’t just pull an established mathematical technique off the shelf and put it to use: it has to be tailored to fit the question. Biology requires – indeed demands – entirely new mathematical concepts and techniques, and it raises new and fascinating problems for mathematical research.”

I find this view quite agreeable, since Forest Biometrics in my opinion should be about interdisciplinary work bringing biology/ecology and mathematics/statistics together. Forest biometricians are meant to act as mediators between mathematical statistics and forest science, able to speak and understand both “languages”. On the websites of my Chair you can see a few examples. Naturally, it is thrilling and re-assuring to see that there is a great diversity of research visions for Forest Biometrics. I am convinced that we need this pluralism of ideas to make real progress in quantitative research. This topic is also considered in Joel E. Cohen’s 2004 essay “Mathematics is biology’s next microscope, only better; biology is mathematics’ next physics, only better” (PLoS Biol. 2, e439), whose title has apparently become a sort of proverb in biomathematics.

Cieszewski and Strub (2009) among others also pointed out that the advances in computer technology form another important pillar of research in Forest Biometrics, in the same way as this technological development has resulted in other specialised fields such as Computational Physics and Computational Genetics.

As someone who has recently taken up a University Chair in Forest Biometrics, I am wondering how others – whether they are in a similar situation or not – feel about this?