Spatial species mingling

$M_i = \frac{1}{k} \sum_{j=1}^{k} \mathbf{1}(\text{species}_i \neq \text{species}_j)$

Here j denotes the jth nearest neighbour of plant i. The indicator function 1(A) = 1 if condition A is true, otherwise 1(A) = 0. In principle, k can take any number and can also vary between plants of the same population or research plot. For convenience, k has often been set to a fixed value of 3 or 4, although this choice lacks scientific justification.

What does it mean?

The mingling index $M_i$ is defined as the fraction of the k nearest neighbours of a given plant i that are heterospecific, i.e. that belong to a species different from that of plant i.
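As a minimal sketch of this definition, the mingling value of a single plant can be computed in one line of R. The species names and the variable names `speciesOfPlant` and `speciesOfNeighbours` are hypothetical and only serve as an illustration.

```r
# Hypothetical example: species of plant i and of its k = 4 nearest neighbours
speciesOfPlant <- "oak"
speciesOfNeighbours <- c("oak", "beech", "birch", "oak")

# The comparison returns TRUE/FALSE, i.e. the indicator 1(A); its mean is the
# fraction of heterospecific neighbours among the k neighbours
mingling <- mean(speciesOfNeighbours != speciesOfPlant)
mingling
```

Here two of the four neighbours differ in species from plant i, so the mingling value is 0.5.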

Where does it come from?

An early strategy and benchmark for quantifying spatial species mingling is Pielou’s segregation index (Pielou, 1977) comparing pairs of points formed by the locations of an arbitrary plant and its nearest neighbour. For all plants in a certain observation window (e.g. a research plot) these pairs are determined. Pielou (1977) defined the segregation index as the ratio of the observed probability that an arbitrary plant and its first nearest neighbour are conspecific and the same probability with independent species marks (i.e. a completely random dispersal of species). This segregation index was originally developed for bivariate species patterns and describes the neighbourhood structure of a subject plant of one species in terms of others. An individual has a high degree of mingling if its neighbourhood is highly diverse, i.e. if many heterospecific neighbours are located in its vicinity. The opposite of a largely conspecific neighbourhood is referred to as segregation.

Gadow (1993) and Aguirre et al. (2003) extended this concept to general multivariate species patterns involving k neighbours.

Why is it important?

Species diversity is an important and most commonly considered aspect of biodiversity worldwide (Kimmins, 2004). As a part of species diversity, spatial mingling of plants concerns the question of how plants of the same and different species mix in space. This refined concept extends beyond the idea of species richness to take the individuals’ perception of local diversity into account. Spatial mingling specifically relates to the microstructure of ecosystems, particularly to neighbourhood relationships, which play a vital role in plant ecology.

How can it be used?

With k neighbours the mingling index $M_i$ can take k + 1 different discrete values (0, 1/k, ..., 1) depending on the neighbourhood situation. This is shown for k = 4 in the following illustration, where the colours of the circular plant objects represent different species:

These discrete mingling values can be used to construct an empirical mingling distribution, which collects the relative numbers of plants with one of the k + 1 $M_i$ values and presents them as bars.
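A possible way of constructing such a distribution in R is sketched below. The ten mingling values are made up for illustration and `minglingDist` is a hypothetical name.

```r
k <- 4
# Hypothetical mingling values of ten plants (multiples of 1 / k)
sming <- c(0, 0.25, 0.25, 0.5, 0.5, 0.5, 0.75, 1, 1, 0.25)

# A factor with all k + 1 classes ensures that empty classes appear with 0
classes <- factor(sming, levels = (0 : k) / k)
minglingDist <- table(classes) / length(sming)
minglingDist

# Present the relative frequencies as bars
barplot(minglingDist, xlab = "Mingling", ylab = "Relative frequency")
```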

It is also possible to estimate arithmetic mean mingling, $\overline{M}$, at population level using an appropriate edge correction method such as NN1 (Pommerening and Stoyan, 2006).

Empirical mingling distributions and mean arithmetic mingling can also be calculated separately for individual species populations of the same plant population.

According to Lewandowski and Pommerening (1997) expected mingling (implying independent species marks), EM, is independent of the number of nearest neighbours, k, and can be calculated as

$EM = \sum_{i=1}^{s} \frac{N_i (N - N_i)}{N (N - 1)}$

with s, the number of species, N, the total number of plants in the observation window and $N_i$, the number of plants of species i.
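As a small worked example, expected mingling can be computed in vectorised form from the species counts alone. The species composition below (50, 30 and 20 plants) is hypothetical.

```r
# Hypothetical population: 100 plants of three species
species <- rep(c("spruce", "beech", "birch"), times = c(50, 30, 20))

ni <- as.numeric(table(species))  # Number of plants per species, N_i
n <- length(species)              # Total number of plants, N

# Expected mingling for independent species marks
em <- sum(ni * (n - ni)) / (n * (n - 1))
em
```

For these counts EM = (50 · 50 + 30 · 70 + 20 · 80) / (100 · 99) = 6200 / 9900.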

In analogy to Pielou’s segregation index, mean mingling $\overline{M}$ and expected mingling EM can be arranged in an index M expressing the relationship between observed species mingling and completely random species mingling according to

$M = 1 - \frac{\overline{M}}{EM}$.

I refer to this index as the species segregation index. Consequently, M = 0 if the species marks are independent or random. If the nearest neighbours and plant i always share the same species, M = 1 (attraction of similar species, segregation in Pielou’s terminology). If all neighbours always have a species different from that of plant i, M = -1 (attraction of different species, aggregation in Pielou’s terminology).

R code

To calculate M two auxiliary functions are necessary, the first one calculating EM and the second one calculating the Euclidean distance.

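```
# Expected mingling EM, assuming independent species marks
calcExpectedMinglingAllSpecies <- function(species) {
  ta <- table(species)   # Number of plants per species
  s <- length(ta)        # Number of species
  ka <- length(species)  # Total number of plants
  swm <- 0
  for (i in 1 : s)
    swm <- swm + ta[[i]] * (ka - ta[[i]]) / (ka * (ka - 1))
  return(swm)
}

# Euclidean distance between the points (x1, y1) and (x2, y2)
euclid <- function(x1, y1, x2, y2) {
  dx <- abs(x2 - x1)
  dy <- abs(y2 - y1)
  dz <- dx^2 + dy^2
  return(dz^0.5)
}
```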

The main calculations are done in the following loop and additional lines of code. Here the NN1 edge correction is used for estimating arithmetic mean mingling.

```
myData$sming <- NA
myData$dist <- NA
myData$rf <- NA

k <- 4
for (i in 1 : length(myData$x)) {
  sums <- 0
  dn <- findNeighboursOfOnePoint(myData$x, myData$y, k, i)
  for (j in 1 : k) {
    index <- dn[j] + 1 # Correcting C++ indices
    if (myData$species[i] != myData$species[index])
      sums <- sums + 1
    if (j == k) {
      # Distance to the kth neighbour, passed on to the NN1 edge correction
      dist <- euclid(myData$x[i], myData$y[i], myData$x[index],
        myData$y[index])
      myData$dist[i] <- dist
      myData$rf[i] <- calcRepFactor(xmax, ymax, myData$x[i], myData$y[i],
        dist)
    }
  }
  myData$sming[i] <- sums / k
}

sums <- sum(myData$sming * myData$rf)
sumRF <- sum(myData$rf)
(mm <- sums / sumRF)                                    # Mean mingling
(em <- calcExpectedMinglingAllSpecies(myData$species))  # Expected mingling
(m <- 1 - mm / em)                                      # Index M
```

“xmax” and “ymax” define the boundary of the observation window whose bottom left corner coincides with the origin of the system of coordinates. The functions “findNeighboursOfOnePoint” and “calcRepFactor” are implemented in external C++ files that can be made available on request and are loaded in the following way:

```
library(Rcpp)
sourceCpp(paste(filePath, "findNeighboursOfOnePoint.cpp", sep = ""))
sourceCpp(paste(filePath, "NN1.cpp", sep = ""))
```
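For readers without access to the C++ files, the following is a rough base-R sketch of the same calculation. It is not the code used above: the neighbour search is done by brute force with `dist()`, and no edge correction is applied, so mean mingling is a plain unweighted mean rather than the NN1 estimate. The simulated data and all names other than `myData`, `k`, `mm`, `em` and `m` are assumptions for illustration.

```r
# Hypothetical example data: 200 plants of three species at random locations
set.seed(1)
myData <- data.frame(x = runif(200, 0, 50), y = runif(200, 0, 50),
  species = sample(c("A", "B", "C"), 200, replace = TRUE),
  stringsAsFactors = FALSE)

k <- 4
d <- as.matrix(dist(myData[, c("x", "y")]))  # All pairwise distances
diag(d) <- Inf                               # A plant is not its own neighbour
myData$sming <- NA
for (i in 1 : nrow(myData)) {
  nn <- order(d[i, ])[1 : k]                 # Indices of the k nearest neighbours
  myData$sming[i] <- mean(myData$species[nn] != myData$species[i])
}

# Unweighted mean mingling (no edge correction), expected mingling and index M
mm <- mean(myData$sming)
ta <- as.numeric(table(myData$species))
n <- nrow(myData)
em <- sum(ta * (n - ta)) / (n * (n - 1))
(m <- 1 - mm / em)
```

Since the species are assigned at random in this example, M should turn out close to 0. For large data sets the full distance matrix becomes expensive, which is one reason for using compiled code.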

Literature

Aguirre, O., Hui, G. Y., Gadow, K. and Jiménez, J., 2003. An analysis of spatial forest structure using neighbourhood-based variables. Forest Ecology and Management 183: 137-145.

Gadow, K. v., 1993. Zur Bestandesbeschreibung in der Forsteinrichtung. [New variables for describing stands of trees.] Forst und Holz 48: 602-606.

Kimmins, J. P., 2004. Forest ecology. A Foundation for sustainable forest management and environmental ethics in forestry. 3rd edition. Pearson Education, Inc., Upper Saddle River.

Lewandowski, A. and Pommerening, A., 1997. Zur Beschreibung der Waldstruktur – Erwartete und beobachtete Arten-Durchmischung. [On the description of forest structure – Expected and observed mingling of species.] Forstwissenschaftliches Centralblatt 116: 129-139.

Pielou, E. C., 1977. Mathematical ecology. John Wiley & Sons, New York.

Pommerening, A. and Stoyan, D., 2006. Edge-correction needs in estimating indices of spatial forest structure. Canadian Journal of Forest Research 36: 1723-1739.

Basal area in larger trees and the growth compensation point

$BAL_i(t) = G(t) \cdot (1 - p_i(t))$ with $p_i(t) = \frac{\sum_{ba_j(t) \leq ba_i(t)} ba_j(t)}{\sum_{j} ba_j(t)}$

What does it mean?

The formula quantifies the basal area of all trees of a forest that are larger in size than a given tree i at time t. It is the complement of the basal area percentile $p_i(t)$ of tree i denoting relative dominance, where G(t) is basal area per hectare of a given forest stand at time t. Basal area of an individual tree is the cross-sectional area of its stem, usually measured at 1.3 m above ground level. For convenience basal area is often calculated from stem diameter using the area formula of a circle. Basal area per hectare is a density measure taking both number of trees and their sizes in a certain area into account.

BAL is related to available light, since with increasing basal area of larger trees there is less light available for smaller trees. In a sense BAL is a surrogate for light measurements with the benefit that stem diameters and basal area are easier to measure.

Where does it come from?

Basal area in larger trees is also referred to as overtopping basal area suggesting the nature of a kind of competition index. In fact this was precisely the context in which Schütz (1975), Wykoff et al. (1982) and Wykoff (1990) suggested this measure.

Why is it important?

Basal area in larger trees is a simple and effective measure that simultaneously considers relative dominance of a tree and density. It is very flexible and can easily be modified to a spatially explicit measure of competition by calculating it specifically for an influence zone around a tree. BAL is also a very suitable competition index for trees in small-sized sample plots. Naturally it is also possible to distinguish between different species and inter- and intraspecific versions of BAL. It is also easy to modify basal area in larger trees in such a way that it better explains growth rate, e.g. BAL(t)/G(t) (Wykoff, 1990). Schröder and Gadow (1999) suggested  a modification which they referred to as BALMOD(t):

$Latex formula$ with $Latex formula$

RS(t) is referred to as relative spacing, another density measure linking average growing space with top or dominant height, $Latex formula$, a characteristic of growth and site quality. Top or dominant height is often calculated as the mean height of the hundred largest trees per hectare. BALMOD(t) is useful in situations where a competition index is needed for fitting a function estimating growth rates from a multiple number of different time series scattered across a certain larger geographic entity, for example a country or even a continent. In such a context, Schröder and Gadow (1999) found that  BALMOD(t) is superior to BAL(t). Other authors have applied the BAL concept to crown variables and Burkhart and Tomé (2012, p. 202ff.) give a good overview.

How can it be used?

Another important purpose of the BAL concept is its use for growth analysis. Scatter plots of absolute stem-diameter growth rate, AGR(t), of the trees of a given stand for a given time, t, over BAL(t) can be depicted. Following Schütz (1975) this relationship can then be described, for example, by a simple power function:

$Latex formula$

a, b and c are regression coefficients. However, it is best to set c to a fixed value between 1.5 and 4. The larger the value of c the stronger the saturation effect towards the lower range of BAL and the larger the value of the growth compensation point. For comparisons within the same time series it is probably best to set c to an optimised constant value. A value of c = 2 is a good starting point.

The regression coefficient a is then the intercept describing the stem-diameter growth rate of large, dominant trees (almost open-grown trees) whilst the function has a root at $Latex formula$. This is the growth compensation point in analogy to the light compensation point. The growth compensation point is an expression of carrying capacity and site quality. The graph shown above is an example of this relationship for the Sitka spruce time series 2068 in the Brecon Beacons (Wales, UK, between 1961 and 1966).
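To illustrate how such a curve and its growth compensation point might be estimated, here is a sketch with simulated data. It assumes that the power function has the form AGR = a - b · BAL^c, which is consistent with a being the intercept but is an assumption on my part here, as are the data and all variable names. With c fixed, the model is linear in BAL^c, so ordinary linear regression suffices.

```r
set.seed(42)
cFixed <- 2                                   # Fixed exponent c
balValues <- runif(100, 0, 40)                # Simulated BAL in m^2 per hectare
# Simulated growth rates following the assumed form a - b * BAL^c plus noise
agr <- 0.8 - 0.0006 * balValues^cFixed + rnorm(100, sd = 0.05)

# With c fixed, a and b can be estimated by linear regression on BAL^c
fit <- lm(agr ~ I(balValues^cFixed))
a <- coef(fit)[[1]]
b <- -coef(fit)[[2]]

# Root of a - b * BAL^c: the growth compensation point under this assumption
(gcp <- (a / b)^(1 / cFixed))
```

If c is to be optimised as well, the same model can be passed to a nonlinear regression routine, with the linear fit providing start values.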

Such graphs can offer valuable insights into the growth patterns of forest ecosystems in relation to density and size. Comparisons between different forest stands and tree species in the same stand (Schütz and Pommerening, 2013) are helpful in understanding the relationships between environment, size and growth. The graph below, for example, has been produced using the data from the birch-plantation time series Bagshot (England, UK). The data points represent the total time series. For fitting the saturation curves the growth data of only two subsequent survey years were used. Model parameter c was set to 2. The numbers under the dashed horizontal line give the base calendar years of the corresponding survey period.

Apparently, growth compensation point and maximum growth rate (parameter a) vary throughout the years with hardly any particular pattern. Partly this can be attributed to forest management activities and partly to changing environmental conditions.

In natural, unmanaged forests, such as an Interior-Douglas fir stand in the Alex Fraser Experimental Forest (plot 3, see graph below) in British Columbia (Canada), the saturation curves and the growth compensation points are typically much closer together and show a stronger saturation effect.

Another way to illustrate and analyse the BAL-AGR relationship is to use the growth dominance characteristic that I introduced in an earlier blog in 2015.

As mentioned above, BAL(t) of an individual tree can also be used as a competition index for estimating growth rate. In that case basal area of larger trees is one of a number of explanatory variables that is either used directly in a growth function or as part of a modifier function (usually an exponential function). In the potential-modifier approach (see Weiskittel et al., 2011, p. 89ff.), first potential growth rate is estimated (e.g. growth of open-grown trees or the growth of a certain, upper percentile) and then reduced by a number of modifiers, one of which usually is competition.

R code

For calculating basal area in larger trees it is best to define a function in R that you can use at multiple instances in your R script. Such a function could look like the code given in the box below:

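```
bal <- function(ba, area) {
  sumba <- sum(ba)                    # Total basal area of all trees
  balValues <- numeric(length(ba))
  for (i in seq_along(ba)) {
    # Basal area of all trees not larger than tree i (including tree i)
    basmaller <- sum(ba[ba <= ba[i]])
    pix <- basmaller / sumba          # Basal area percentile of tree i
    balValues[i] <- sumba * (1 - pix) / area
  }
  return(balValues)
}
```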

Here “ba” is a vector of individual-tree basal area values and “area” is the area of the forest stand in hectares. You can then call this function in your script in the following way:

```
dataOneYear$ba <- pi * (dataOneYear$DBH / 200)^2
xarea <- plotSize / 10000
dataOneYear$bal <- bal(dataOneYear$ba, xarea)
```

“dataOneYear” is an arbitrary data frame, “dataOneYear$DBH” is a vector of stem diameters measured in cm. “plotSize” is the area of the forest stand in square metres.
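For larger data sets the loop can be replaced by a vectorised calculation. The following sketch is one possible variant and not part of the code above; the function name `balVectorised` and the example diameters are hypothetical. Like the loop, it treats trees of equal size as not larger.

```r
balVectorised <- function(ba, area) {
  baSorted <- sort(ba)
  # For each tree: cumulative basal area of all trees that are not larger
  # (findInterval gives the position of the last sorted value <= ba[i])
  basmaller <- cumsum(baSorted)[findInterval(ba, baSorted)]
  (sum(ba) - basmaller) / area
}

# Hypothetical example: four trees on a plot of 0.1 ha, diameters in cm
dbh <- c(10, 20, 30, 20)
ba <- pi * (dbh / 200)^2
balVectorised(ba, 0.1)
```

The largest tree has a BAL value of 0 and the two trees with a diameter of 20 cm receive the same value, namely the basal area per hectare of the 30 cm tree.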

Literature

Burkhart, H. and Tomé, M., 2012. Modeling forest trees and stands. Springer, Dordrecht.

Schütz, J. P., 1975. Dynamique et conditions d’équilibre de peuplements jardinés sur les stations de la hêtraie à sapin. Schweizerische Zeitschrift für Forstwesen 126: 637-671.

Schütz, J. P. and Pommerening, A., 2013. Can Douglas fir (Pseudotsuga menziesii (Mirb.) Franco) sustainably grow in complex forest structures? Forest Ecology and Management 303: 175-183.

Schröder, J. and Gadow, K. v., 1999. Testing a new competition index for maritime pine in northwestern Spain. Canadian Journal of Forest Research 29: 280-283.

Weiskittel, A. R., Hann, D. W., Kershaw, J. A. and Vanclay, J. K., 2011. Forest growth and yield modeling. John Wiley & Sons, Chichester.

Wykoff, W. R., Crookston, N. L. and Stage, A. R., 1982. User’s guide to the stand prognosis model. USDA Forest Service, Intermountain Forest and Range Experiment Station, Ogden, General Technical Report INT-133.

Wykoff, W. R., 1990. A basal area increment model for individual conifers in the northern Rocky Mountains. Forest Science 36: 1077-1104.

The Chapman-Richards growth function

$y(t) = y_{max} \cdot (1 - e^{-kt})^p$

What does it mean?

Growth functions in general describe the change in size of an individual or population with time (Burkhart and Tomé, 2012). Assume that $y(t)$ is a tree growth variable, e.g. total tree height or tree volume, and $y_{max}$ is the maximum value this growth variable can take (in absolute terms for a given species in general or for a given species on a given site). The term $(1 - e^{-kt})^p$ is then a modifier reducing the maximum growth variable to its current state at time $t$. $k$ is an empirical growth parameter scaling the absolute growth rate. The empirical parameter $p$ is related to catabolism (destructive metabolism), which is said to be proportional to an organism’s mass. Therefore it is often restricted to a value of three for theoretical, biological reasons.

The Chapman-Richards growth function can be applied both to individual organisms and to the growth of whole populations and describes cumulative growth over time. As such the function has an inflection point and an upper asymptote at $y_{max}$, reflecting a so-called sigmoid growth curve typical of growth processes that are influenced by biotic and abiotic factors.

Where does it come from?

The Chapman-Richards growth function is based on the seminal work by Bertalanffy on animal growth and was published by Richards in 1959; Pienaar and Turnbull (1973) introduced it to forestry applications. The model has a reputation for being very flexible at the slight expense of biological realism. It is valued for its accuracy, although there can be problems in the process of parameter estimation when $p$ is allowed to vary (Clutter et al., 1983; Pienaar and Turnbull, 1973; Zeide, 1993).

Why is it important?

The Chapman-Richards growth function has been a popular model for describing the growth of various tree and forest stand growth variables, e.g. tree and stand height, diameter at breast height, basal area and volume. As such it has been and is still widely used in many empirical forest growth simulators, particularly where the accuracy of model prediction is crucial (Zeide, 1993). The Chapman-Richards growth function has been used extensively to model site index development, i.e. the mean height development of the most dominant trees of a forest as a population characteristic for describing site quality (Burkhart and Tomé, 2012), leading to a so-called polymorphic height growth model.

How can it be used?

Assume you have access to sample data providing several combinations of a growth variable, e.g. tree height, and age. Using nonlinear regression methods you can estimate the parameters of the Chapman-Richards growth function. After estimating the growth parameters the model can be used for interpolation and for predicting past and future growth.

Absolute growth rate (AGR) is essentially the first derivative of the Chapman-Richards growth function:

$\frac{dy(t)}{dt} = y_{max} \cdot k \cdot p \cdot e^{-kt} \cdot (1 - e^{-kt})^{p-1}$

The AGR function can be employed to model current annual increment or instantaneous growth. For relative growth rate (RGR), i.e. AGR divided by size, we lose one model parameter (Pommerening and Muszta, 2016) and the function term simplifies to

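$\frac{1}{y(t)} \cdot \frac{dy(t)}{dt} = \frac{k \cdot p \cdot e^{-kt}}{1 - e^{-kt}} = \frac{k \cdot p}{e^{kt} - 1}$.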
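The following sketch implements the growth function together with AGR and RGR and checks the analytical derivative against a numerical one. It uses the parameterisation A · (1 - exp(-k · t))^p of the regression code further below; the function names and the parameter values are hypothetical.

```r
# Chapman-Richards growth function, absolute and relative growth rate
cr <- function(age, A, k, p) A * (1 - exp(-k * age))^p
crAGR <- function(age, A, k, p)
  A * k * p * exp(-k * age) * (1 - exp(-k * age))^(p - 1)
crRGR <- function(age, k, p) k * p * exp(-k * age) / (1 - exp(-k * age))

# Hypothetical parameter values and age
A <- 35; k <- 0.03; p <- 1.5
age <- 40

# Central finite difference as a numerical check of the analytical derivative
h <- 1e-5
agrNumeric <- (cr(age + h, A, k, p) - cr(age - h, A, k, p)) / (2 * h)
c(analytical = crAGR(age, A, k, p), numerical = agrNumeric)

# RGR equals AGR divided by size and does not depend on the asymptote A
c(crRGR(age, k, p), crAGR(age, A, k, p) / cr(age, A, k, p))
```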

Also the algebraic difference form (ADA) of the Chapman-Richards growth function is often applied (Burkhart and Tomé, 2012),

$y(t_2) = y(t_1) \cdot \left(\frac{1 - e^{-k t_2}}{1 - e^{-k t_1}}\right)^p$,

where the fraction constitutes a growth multiplier (Pommerening and Muszta, 2016) and the asymptote $y_{max}$ disappears. The algebraic difference form allows estimating the current value of a growth variable from a value in the past (anamorphic model).
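A short sketch of the algebraic difference form follows. The function name `crADA`, the ages and the parameter values are hypothetical; the example only demonstrates that projecting a value from age t1 to age t2 gives the same result as evaluating the original growth function at t2, without using the asymptote.

```r
# Algebraic difference form: project size y1 at age t1 to age t2
crADA <- function(y1, t1, t2, k, p)
  y1 * ((1 - exp(-k * t2)) / (1 - exp(-k * t1)))^p

# Hypothetical parameters; A is only needed to generate the starting value
A <- 35; k <- 0.03; p <- 1.5
t1 <- 30; t2 <- 50
y1 <- A * (1 - exp(-k * t1))^p

# Projection without A, compared with the growth function evaluated at t2
y2 <- crADA(y1, t1, t2, k, p)
c(projected = y2, direct = A * (1 - exp(-k * t2))^p)
```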

R code

In R it is quite straightforward to estimate the parameters of the Chapman-Richards growth function through nonlinear regression. First we need some sample data and I have taken pairs of top height (the mean height of dominant trees, a population characteristic) and the corresponding age (assuming an even-aged forest) from a British yield table. In more interesting applications, similar data would naturally stem from field observations.

```
# Using some data from the British yield table for Scots pine, YC 14.
topHeight <- c(8.9, 11.6, 13.9, 15.9, 17.8, 19.6, 21.3, 22.8, 24.2,
  25.4, 26.5, 27.4, 28.3, 29.0, 29.7, 30.3, 30.7, 31.1)
age <- c(17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82, 87,
  92, 97, 102)
```

Then we load a package for robust regression (as we know that the Chapman-Richards model can sometimes “play up”).

`library(robustbase)`

Finally we enter the actual regression code specifying the model, the data and the start parameters. The summary command provides the regression outputs including the estimated model parameters. (Parameter A in the code corresponds to parameter $y_{max}$ in the above equation.)

```
nlsout <- nlrob(topHeight ~ A * (1 - exp(-k * age))^p,
  data = data.frame(age, topHeight),
  start = list(A = 83, k = 0.03, p = 4), trace = TRUE)
summary(nlsout)
summary(nlsout)$coefficients[1 : 3]
```

It is always a good idea to check the value of parameter A against the maximum value of the growth variable: both values shouldn’t be very far off, because A is the upper asymptote of the growth function.
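As a sketch of this check and of using the fitted model for prediction, the following variant relies on `nls()` from base R instead of `nlrob()`, so that it runs without additional packages. This is a swapped-in, non-robust method, and the start values are assumptions chosen close to a plausible solution.

```r
# Same yield-table data as above
topHeight <- c(8.9, 11.6, 13.9, 15.9, 17.8, 19.6, 21.3, 22.8, 24.2,
  25.4, 26.5, 27.4, 28.3, 29.0, 29.7, 30.3, 30.7, 31.1)
age <- seq(17, 102, by = 5)

# Ordinary nonlinear least squares; start values are assumptions
fit <- nls(topHeight ~ A * (1 - exp(-k * age))^p,
  start = list(A = 35, k = 0.03, p = 1.5))
coef(fit)

# Compare the estimated asymptote with the largest observed value
c(asymptote = coef(fit)[["A"]], maxObserved = max(topHeight))

# Interpolation (age 50) and extrapolation (age 120) with the fitted model
pred <- predict(fit, newdata = data.frame(age = c(50, 120)))
pred
```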

Literature

Burkhart, H. and Tomé, M., 2012. Modeling forest trees and stands. Springer, Dordrecht.

Clutter, J. L., Fortson, J. C., Pienaar, L. V., Brister, G. H. and Bailey, R. L., 1983. Timber management. A quantitative approach. John Wiley & Sons, New York.

Pienaar, L. V. and Turnbull, K. J., 1973. The Chapman-Richards generalization of von Bertalanffy’s growth model for basal area growth and yield in even-aged stands. Forest Science 19: 2-22.

Pommerening, A. and Muszta, A., 2016. Relative plant growth revisited: Towards a mathematical standardisation of separate approaches. Ecological Modelling 320: 383-392.

Zeide, B., 1993. Analysis of growth equations. Forest Science 39: 594-616.

Statistical consultation work at universities

As part of my responsibilities I manage a statistical consultation unit in my Chair at Umeå and also take care of a university-wide Centre for consultation. SLU has in fact a good centrally organised and paid consultation service with dedicated staff. Every major campus at Umeå, Ultuna and Alnarp has a unit supporting students and staff with statistical advice. Apart from that the statistical staff is often also involved in statistical teaching. Recently statistical consultation has also stretched to consultation in R programming, as skills in scientific programming and computing are actually not as widespread in natural sciences as one would think. All statistical consultants are organised in a centre referred to as Statistics@SLU. This centre was originally founded in 2002 as Biostokastikum by Bo Ranneby and Dietrich von Rosen with an initial remit for research but then changed its focus towards consultation.

Statistics@SLU has been a new experience for me during the last three years, as statistical consultation mostly meant additional “homework” at the other places I worked before and was therefore often avoided by those charged with it. Indeed, statistical consultation on top of research, teaching and administration can be a burden and a major distraction.

Therefore it is a good idea to organise this important mission properly by setting aside funds and dedicated staff. Still a centre of statistics like Statistics@SLU is not without its challenges: not everybody values statistical consultation and some take this kind of service for granted. Internal funding is always scarce at any university and retiring statisticians are not always replaced. And the statisticians engaging in consultation are often not well recognised for their work and face problems when applying for promotion, because their workload often does not allow them to accumulate the necessary publications.

That is why it is so important to have a Centre where all consultants are united and can exchange their experience but also support each other. The Centre can effectively negotiate terms with the heads of departments, the deans and the vice chancellor. In this effort we are advised and guided by a steering committee and I am grateful for their commitment. This is also helpful to gain different perspectives.

This year we have again received good feedback for our work from both students and staff. For the consultants involved it is valuable to know that their efforts are appreciated. My Department has supported my consultation unit at Umeå by providing the funds to employ another statistician and the Faculty of Forest Sciences is kindly contributing towards funding the space charges of a dedicated consultation room that we intend to name after one of my predecessors, Prof. Bertil Matérn, an eminent Swedish statistician. The development of statistical consultation looks promising and we hope to secure continued central support for the good work that is done for students and staff.

Open access for public data – Reality or wishful thinking?

Terms like open source software, open access journals and public data have recently been much discussed in scientific communities and are in fact buzzwords. In the last few months I happened to get involved in issues of data transparency and of the availability of raw data. As part of this involvement I had to learn that open access to public data, i.e. data collected at the expense of tax-payers’ money, is not taken for granted everywhere in Europe. This came to me as a surprise, since many scientific journals, particularly in ecology, have adopted the policy of making the public availability of data (and often also of R scripts) a prerequisite for publishing.

To shed light on this, I grabbed the phone and got in contact with a specialised lawyer from the Deutsche Hochschulverband in Bonn. He confirmed that European laws do indeed require data collected by public institutions to be publicly available. Such data are in fact not subject to intellectual property rights. Similar laws are apparently also in place in countries outside the EU, in North America and in Australia. In the UK and in the US these laws are referred to as the Freedom of Information Act. The purpose of these laws is to ensure the transparency and the reuse of data, particularly those that are gathered in a university context. Such raw data are generally free and Simon Hoggart adds that the British Ecological Society explicitly encourages data reuse through the open data policy they have adopted for their journals. The national research councils in every country are charged with the task of watching over the open-data legislation. Open access to data is also an important principle of the European Horizon 2020 research programme.

Given these trends and the clear legislation it is difficult to understand that research groups in some countries still try to sneak their way into good publications by making the use of their data dependent on a co-authorship without contributing to either the analysis or the text. It is usually weak research groups that try to enforce such practices, which constitute a clear breach of international authorship agreements such as the Vancouver protocol. Other groups try to raise money by arguing that data provision creates chargeable costs.

These practices are attempts not to comply with the current international legislation and with the requirements of the respective funding bodies. The OECD paper on “Principles and Guidelines for Access to Research Data from Public Funding” cites the US National Research Council: “The value of data lies in their use. Full and open access to scientific data should be adopted as the international norm for the exchange of scientific data derived from publicly funded research.” This is a very good expression of a vision that many of us share. In my view open data and methods (in the form of, for example, R or SAS scripts) stimulate research cooperation and build trust in research results. Open access to research data is a fundamental principle of the international research community similar to human rights and democracy. We should all live up to this ethos and remind others to do the same.

Can’t see the wood for the trees – The journey of mathematical forestry

Today I am going to take you on a journey, the journey of mathematical forestry, which is also a bit of my own journey. When state-organised forestry was set up in different parts of Europe around 1750 the pioneers in this new subject area must have looked at vast areas of partly untouched (similar to the coastal forests in Northern Estonia near Narva shown in the photo below) and partly devastated forest land and been puzzled by the trees they looked at. What kind of story were they telling? Even after collecting some data things would not immediately improve when they looked at lengthy columns of data. Can’t see the wood for the trees. Can’t see the story behind the data. This was the point where statistics came in, not to torture students, but to make sense of data. Of course, at that time statistics was only beginning its own journey.

In his textbook on forest mensuration Kilkki (1989) described the forest ecosystem as an object system and how different systems and processes eventually lead to a decision system. In the mensuration system, it is defined what to measure and how. The data are then statistically processed in the computing system. The computed summary characteristics are processed in the planning system, and the planning system leads to decisions and feeds back into the object system, i.e. the forest ecosystem.

The early pioneers in forest science assigned professorships to these boxes, e.g. a professor in forest mensuration, a professor in forest inventory, in forest biometrics and in forest planning.

The development of general statistics happened at approximately the same time as the development of forest mathematics. General statistics was often influenced by applications, for example in agriculture. The work of W. S. Gosset (“Student”) and R. A. Fisher was often inspired by some of their practical applications in agriculture.

Already in 1729 J. F. Penther published Praxis geometriae, a book on the surveying of forest areas. Around 1766 J. E. Vierenklee and C. C. Oettelt published textbooks on mathematics for foresters and forest mensuration. Something remarkable was achieved when J. C. Paulsen in 1795 published the first yield tables ever, i.e. models of biological production long before modelling started in biological science. In 1835 G. König published his famous book on forest mathematics that was translated into many languages. University chairs in forest biometrics were only set up much later, between the 1960s and the 1980s.

There are many different names for the academic subject area of forest biometrics. These include numerical forestry, quantitative forestry, forest statistics, computational forestry and others. At Umeå my Chair is referred to as mathematical statistics applied to forest sciences. All of these professorships accomplish similar things, which can be described as the “study of mathematical-statistical methods for effectively describing and modelling patterns and processes in forest science”. For me it is important that forest biometrics belongs to the realm of basic sciences and is a strong research area. Forest biometricians are mediators between mathematical statistics and forest science. And forest science as our application has also the potential of inspiring the development of general mathematical statistics.

Professors have specific lifetimes and their academic subject areas have lifetimes, too. Traditional subject areas may change over time, some vanish and new ones come into existence. Silviculture and forest economics have for example a long tradition, whilst soil science and forest ecology are comparatively young subject areas.

Forest biometrics is quite a young subject area and there is still a lot to expect from this field of forest science.

Mathematics is about patterns, natural sciences, too. Often mathematical patterns and patterns in natural sciences come together. In the Fibonacci sequence of numbers any subsequent number is the sum of the two previous ones. In some trees, the horizontal layering of branches is organised according to the Fibonacci sequence of numbers. This sequence of numbers is also related to another pattern, the golden section, which plays an important role both in mathematics and in biology.
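The link between the Fibonacci sequence and the golden section can be made visible with a few lines of R. This is only a small numerical illustration; the variable names are arbitrary.

```r
# First 15 Fibonacci numbers: each number is the sum of the two previous ones
fib <- numeric(15)
fib[1 : 2] <- 1
for (i in 3 : 15) fib[i] <- fib[i - 1] + fib[i - 2]
fib

# Ratios of subsequent numbers approach the golden section (1 + sqrt(5)) / 2
ratios <- fib[-1] / fib[-15]
goldenSection <- (1 + sqrt(5)) / 2
c(lastRatio = ratios[14], goldenSection = goldenSection)
```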

The job of a professor is not only a job, it is something special, a vocation and a quest. In the same way the research topics we are dealing with should be something that intrigues us so much that we cannot simply let the important questions go. In a way they keep us awake at night. What are the topics that have kept me awake? There are certainly three that have kept me busy in the past years, i.e. woodland structure analysis & modelling, the biometry of plant growth and human behaviour in forests. There are a few things that link these three topics, one of them is space.

Two examples. A couple of years ago I wondered what might be the most important macro-processes influencing the formation of spatial patterns. Growth (G) is for example a very important process and recently evidence has been found suggesting that trees always need to grow at least a little for their survival. Interaction is another important process and the most prominent representative of tree interactions is competition (C). Then there are birth and death processes (B&D) where trees are either added to or deleted from the point pattern formed by the tree locations. Initially I believed the combined birth and death process would have the greatest influence on the formation of spatial patterns.

We developed an individual-based model for two forest time series. The model was inspired by spatial statistics and is good at representing spatial tree interactions. Each of the three macro-processes was represented by three model parameters. We then applied a sensitivity analysis to study how the variance of mark variograms, i.e. the model output, responds to systematic variations of the nine input parameters. To this end we used a global sensitivity measure, the extended Fourier Amplitude Sensitivity Test. The results showed that birth and death processes are of minor importance compared to growth and competition. We also learned that the importance of processes depends on the development stage of a forest. In the graph below (from Pommerening et al., 2011) the results are shown for a young Norway spruce forest in Austria and for a natural Douglas-fir forest in British Columbia.

A quite different example is the behaviour of professional humans in forests. Forestry professionals select trees for a number of reasons, e.g. as Christmas trees, as habitat trees and in thinnings (to evict them from the forest for the benefit of others). A master student of mine has recently studied how much agreement there is among forestry professionals when it comes to tree selection. In the past it was always believed that forestry staff select trees exactly according to silvicultural guidelines. Therefore tree selection in forest simulators has been modelled quite deterministically.

My student studied the behaviour of test persons in twelve different forests throughout Great Britain from Northern Scotland to Cornwall. The test persons were asked to mark trees for two different thinning strategies, i.e. thinning from below and crown thinning. The latter technique is now often discussed in conjunction with continuous cover forestry (CCF). For assessing agreement we used Fleiss’ $\kappa$, a measure that is often applied in medicine and particularly in psychology. The values of this measure usually range between 0 and 1, 0 indicating chance agreement and 1 denoting perfect agreement.
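To make the agreement measure more tangible, here is a minimal R sketch of Fleiss’ kappa for the two-category case (tree marked or not marked). The function name `fleissKappa`, the layout of the `marks` matrix and the example values are assumptions made for illustration; they are not the data or the code of the study.

```r
# Fleiss' kappa for a 0/1 matrix of marks:
# rows = trees, columns = test persons, 1 = tree marked, 0 = not marked
fleissKappa <- function(marks) {
  nRaters <- ncol(marks)
  # Number of test persons per tree who chose each of the two categories
  counts <- cbind(marked = rowSums(marks),
                  unmarked = nRaters - rowSums(marks))
  # Observed agreement for each tree and its mean over all trees
  pTree <- (rowSums(counts^2) - nRaters) / (nRaters * (nRaters - 1))
  pObs <- mean(pTree)
  # Agreement expected by chance from the overall category proportions
  pCat <- colSums(counts) / (nrow(marks) * nRaters)
  pExp <- sum(pCat^2)
  (pObs - pExp) / (1 - pExp)
}

# Hypothetical marks of four test persons for six trees
marks <- matrix(c(1, 1, 1, 1,
                  0, 0, 0, 0,
                  1, 1, 0, 0,
                  1, 0, 0, 0,
                  0, 0, 0, 1,
                  1, 1, 1, 0), ncol = 4, byrow = TRUE)
kappaMarks <- fleissKappa(marks)
print(kappaMarks)
```

If all test persons mark exactly the same trees, the function returns 1. Mixed marks as in the hypothetical matrix give a clearly lower value.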

It turned out that the overall agreement was low with maximum values of $\kappa$ of around 0.5. The agreement was consistently and markedly higher in low thinnings than in crown thinnings. This was somewhat unexpected as crown thinnings usually come with a didactic element that narrows down the choice of trees. We tried to understand this outcome and it turned out that low thinnings have a long tradition in Britain and many professionals are used to them while crown thinnings are somewhat new. New techniques initially often create confusion and confusion leads to low levels of agreement.

This type of work in forest biometrics requires interdisciplinary teams and a necessary pre-requisite for recruiting such teams is a solid education in quantitative methods, particularly in statistics. Often we enter discussions at faculties about new module structures and the number of hours. Of course, all subject areas are important. Still at the end of the day, all research boils down to some numbers that we need to make sense of and we end up in a similar situation as the pioneers of forest science.

(Abbreviated inaugural lecture on 13 May 2016)

Scientific programming – Is this really necessary and important?

Basic modules in scientific computing used to be included in forest science degree programmes. For a number of years they seem to have disappeared in many countries, I am not sure why. Perhaps this has something to do with Bologna and such key-skill modules didn’t fit in any longer? This seems odd as many research topics in forest science and in natural sciences in general focus on quantitative research, which requires some level of programming skills. Quantitative research almost always requires modelling work and simulations. As part of this it is often necessary to define one’s own loss function for more sophisticated regressions that for example involve spatial models of plant interactions.
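A minimal R sketch of what defining one’s own loss function can look like is given below. It fits a simple power function to simulated data by minimising the sum of absolute deviations with `optim()`. The model, the simulated data and all names (`lossAbs`, `size`, `growth`) are assumptions chosen for illustration and are much simpler than a spatial model of plant interactions.

```r
# Simulated (hypothetical) data: growth as a power function of size plus noise
set.seed(1)
size <- runif(50, 5, 50)
growth <- 0.8 * size^0.6 + rnorm(50, sd = 0.3)

# User-defined loss: sum of absolute deviations between data and model
lossAbs <- function(par, x, y) {
  pred <- par[1] * x^par[2]
  sum(abs(y - pred))
}

# Minimise the loss, starting from a rough guess of the two parameters
start <- c(1, 0.5)
fit <- optim(par = start, fn = lossAbs, x = size, y = growth)
print(fit$par)    # estimated parameters
print(fit$value)  # loss at the estimated parameters
```

Replacing `lossAbs` by any other function of the parameters and the data is all that is needed to try a different fitting criterion.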

Computers increasingly dominate everyone’s work life and in specialised academic fields such as forest science it is crucial to be able to produce code that makes computers process specific scientific jobs efficiently. And anyway, have you ever felt the fantastic and inspiring experience of creating your very own software and of getting the computer to perform your own, special vision of science? If not, then you have missed something, definitely!

Surely with the advent of R it has generally become much easier for students to implement even challenging quantification methods. The script language of this software is not too hard to learn and existing packages give easy access to more complicated methods. Thus almost anybody can design quite complex analyses and program her or his own model. Particularly in the design-phase of modelling R is great, because you can easily try and compare different options and use the wealth of visualisation and statistical tools for assessing their performance. Still not every degree programme in forest science includes an R course, although the software has become a bit of a standard in plant sciences. Even among the PhD students I teach there are still quite a few without prior knowledge of R.

Then, of course, there are also programming situations where R hits its limits: R has for example problems with processing loops. Loops are of course part of the language syntax; however, they can severely slow down computations. This problem can be fixed by externalising the code that takes too long to process in R. Basically you program parts of your R script in a different way and deposit this code in a separate file. Sounds easy and like a good idea. However, the external code has to be written in a compiled, lower-level programming language. Often C/C++ is chosen for this purpose. Oh dear, another steep learning curve ahead! And actually, how do I decide which parts of a code to externalise and which to leave behind in my R script?
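One way of approaching the last question is to time candidate parts of a script before externalising anything. The R sketch below does this with `system.time()` for a loop and for a vectorised version of the same calculation. Note that vectorisation is a different remedy from externalising code to C/C++; it is shown here only because it is often worth trying first. The function names and the simulated diameters are assumptions for illustration.

```r
# Hypothetical stem diameters in cm
set.seed(1)
d <- runif(1e5, 5, 60)

# Basal area (m^2) computed tree by tree in a loop
baLoop <- function(d) {
  ba <- numeric(length(d))
  for (i in seq_along(d)) {
    ba[i] <- pi * (d[i] / 200)^2
  }
  ba
}

# The same calculation applied to the whole vector at once
baVec <- function(d) pi * (d / 200)^2

# Time both versions and check that they give the same result
tLoop <- system.time(ba1 <- baLoop(d))
tVec <- system.time(ba2 <- baVec(d))
print(c(loop = tLoop[["elapsed"]], vectorised = tVec[["elapsed"]]))
print(all.equal(ba1, ba2))
```

Parts of a script that remain slow after such checks are the natural candidates for externalising.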

This is where the problems (or the fun?) begin(s). Many PhD students (not only those with a Forest Science background) struggle when their supervisors ask them to study a problem that involves programming skills. I have seen a great many of them coming for consultation to my office, half panicking …

And then there are also other external software tools, which in combination with R could make your life much easier. If you only knew …

To address this problem, I am offering a course on scientific programming & simulation to research students and young researchers for the first time this year at Umeå. The course is part of the Research School in Applied Forest Statistics and Scientific Computing based at my Chair. Initial application numbers seem to confirm that the lack of scientific programming skills is really a problem across a wide range of subject areas. Let’s see how this course goes, I am really looking forward to it.

Perhaps it is time to re-think the role of scientific programming skills in the academic education that we provide and to introduce this subject earlier? Not only in research is this important but also many employers expect advanced computing skills along with a knowledge of GIS and other software packages. Basic programming skills also allow you to learn new statistical and simulation methods more easily. You develop a better understanding of different data types, how best to handle them and you can more easily try new methods yourself. Worth considering, I think.

R script for computing growth dominance

A summary statistic that crossed my path a lot this year is growth dominance. Growth dominance is a concentration measure based on the Lorenz curve and was invented by Dan Binkley and his lab at Colorado State University, USA. Together we have worked with this characteristic this year and it has been great fun. We applied it to data from a virgin forest and learned a lot more about growth processes in such forest types.

Growth dominance characterises the contributions of different tree sizes to total population growth (Binkley et al., 2006). West (2014) provides a good statistical description of the growth dominance concept in Forest Science.

The growth dominance curve is related to different development phases of forest stands. Young, middle-aged and old stands have characteristic curves and the statistic is therefore a useful indicator. In the context of continuous cover forestry, it is likely that growth dominance can prove a useful characteristic for quantifying the progress of transforming tree plantations to uneven-aged forests.

In this “Christmas blog”, I would like to share a simple R script for the calculation of this statistic. Reading and using it is also a good way of understanding how the growth dominance statistic works. The blog is intended for readers with a basic understanding of R and unknown bits of syntax can easily be tracked down on the internet. Here are the first three code chunks:

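```
# Load data (filePath is assumed to hold the data directory, ending in "/")
dFile <- paste(filePath, "Clog1.txt", sep = "")
# Assumed step: the read call is missing from this listing; the file is
# taken to be a text table with a header row and columns d2002 and d2007
xdata <- read.table(dFile, header = TRUE)
```

```
# Creating the size and growth rate vector
d <- xdata$d2002                       # stem diameter in 2002 (cm)
g <- pi * (d / 200)^2                  # basal area in 2002 (m^2)
ig <- pi * (xdata$d2007 / 200)^2 - g   # basal-area growth 2002-2007
rm(xdata)
ig <- ig / 5                           # mean annual growth rate
```

```
# Merging size and growth rate vector in a data frame
xdata <- data.frame(g, ig)
```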

The first three chunks of code are almost self-explanatory: Data is loaded including two columns with a size variable measured at two different points in time. In this case it is stem diameter (at 1.3 m above ground level) measured in centimetres both in 2002 and 2007. The stem diameters are converted to basal areas (cross-sectional areas) and the mean annual basal-area (absolute) growth rate is calculated. Finally, the size vector (here initial basal area in 2002) and the growth-rate vector are merged into one data frame.

The next step is important and is key to the interpretation of the growth dominance statistic. The whole data frame is ordered according to size from small to large trees, so that the size and the growth rate vectors contain corresponding pairs of values.

`xdata <- xdata[order(xdata$g, decreasing = FALSE), ]`

Now we calculate cumulative relative tree sizes and cumulative relative growth rates:

```
cumG <- cumsum(xdata$g) / sum(xdata$g)
cumInc <- cumsum(xdata$ig) / sum(xdata$ig)
```

Based on the cumulative vectors we can now estimate a characteristic similar to the Gini coefficient of the Lorenz curve.

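```
# Area under the growth dominance curve by the trapezoidal rule.
# area[1] stays 0, i.e. the small triangle between the origin and the
# first point of the curve is not included.
area <- 0
for (i in 2:length(cumG)) {
  area[i] <- (cumG[i] - cumG[i - 1]) *
    ((cumInc[i] - cumInc[i - 1]) / 2 + cumInc[i - 1])
}
# Growth dominance coefficient: 1 minus the area relative to the area (0.5)
# under the 1:1 line
gc <- 1 - sum(area) / 0.5
```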
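The steps so far can also be wrapped in a single function and tried on simulated data. The following is a sketch under stated assumptions and not the original script: the function name `growthDominance` and the example values are hypothetical, the loop is replaced by a vectorised trapezoidal rule and, in contrast to the loop above, the origin (0, 0) is prepended to both cumulative vectors so that the first segment of the curve is included.

```r
# Growth dominance coefficient from sizes g and growth rates ig
growthDominance <- function(g, ig) {
  ord <- order(g)  # sort trees from small to large
  # Cumulative relative sizes and growth rates, starting at the origin
  cumG <- c(0, cumsum(g[ord]) / sum(g))
  cumInc <- c(0, cumsum(ig[ord]) / sum(ig))
  n <- length(cumG)
  # Trapezoidal rule: segment width times mean height of the segment ends
  area <- sum(diff(cumG) * (cumInc[-n] + cumInc[-1]) / 2)
  1 - area / 0.5
}

# Hypothetical basal areas (m^2)
g <- c(0.01, 0.02, 0.05, 0.10, 0.20)

# Growth proportional to size: the curve follows the 1:1 line
gcProp <- growthDominance(g, 0.03 * g)

# Large trees contribute more than proportionally: the curve lies below it
gcLarge <- growthDominance(g, g^2)

print(c(proportional = gcProp, largeTreesDominate = gcLarge))
```

With proportional growth the coefficient is zero (up to rounding), and it becomes positive when large trees contribute more to total growth than their share of size.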

`cumInc` can now be plotted over `cumG` to give the growth dominance curve. To obtain smooth curves, it is, however, advisable to calculate percentiles corresponding to selected points on the size axis:

```
x.values <- seq(0, 1, by = 0.05)
rx <- ecdf(cumG)(x.values)
sx <- quantile(cumInc, rx)
```
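The interpolation step above can be tried without the original data. The sketch below repeats it on simulated values; the sizes, growth rates and the name `gdCurve` are assumptions made for illustration.

```r
# Hypothetical data: 200 trees, sorted by basal area, with positive growth
set.seed(42)
g <- sort(runif(200, 0.005, 0.3))
ig <- 0.02 * g * runif(200, 0.5, 1.5)

# Cumulative relative sizes and growth rates
cumG <- cumsum(g) / sum(g)
cumInc <- cumsum(ig) / sum(ig)

# Evenly spaced points on the size axis
x.values <- seq(0, 1, by = 0.05)
# Share of trees whose cumulative relative size does not exceed each point
rx <- ecdf(cumG)(x.values)
# Cumulative relative growth reached at that share of trees
sx <- quantile(cumInc, rx)

# Points of the smoothed growth dominance curve
gdCurve <- data.frame(x = x.values, y = as.numeric(sx))
print(head(gdCurve))
```

The curve can then be drawn with `plot(gdCurve$x, gdCurve$y, type = "l")`, and `abline(0, 1, lty = 2)` adds the dashed 1:1 line.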

To obtain smooth curves, `sx` is plotted over `x.values`. The figure below gives an impression of the data from a roughly 55-year-old Sitka spruce plantation in transformation to Continuous Cover Forestry.

Apparently the observed growth dominance curve (continuous green line) is almost symmetric and very close to the 1:1 (dashed) line. This is indicative of development phase 1, where each tree’s contribution to total stand growth is proportional to its size. Usually this pattern can be found in young stands before canopy closure but apparently it is also true for middle-aged plantations at the beginning of transformation.

Got intrigued? I find growth dominance quite fascinating and for me it is definitely my personal growth characteristic of the year 2015. Perhaps you would like to try this characteristic with your own data. (I can send you the full version of my R script and the data on request.) Have fun and … Merry Christmas and a Happy New Year!

Forest Biometrics in the Southern US

Being on an extended research visit to one of the world’s Meccas of Forest Biometrics, Virginia Tech at Blacksburg, I cannot help but notice quite a few differences to European Forest Biometrics. I am currently based on this vast and beautiful campus, where in autumn the tree leaves almost seem to be on fire and the warm, orange-red colour of the leaves forms a nice contrast to the limestone facades of the campus buildings. The sunny, warm weather at this time of the year is just awesome. Through the window of the coffee shop, where I am writing this blog, I can see the students passing by. Many wear T-shirts and sweaters with Virginia Tech logos on them. Maroon and orange everywhere. The identification with this University is amazing, also among the general public in town across all age classes. This is my second visit to Blacksburg, this time representing the Swedish University of Agricultural Sciences.

Obviously the US is a big country and there are quite a few differences in how Forest Biometrics is run up and down the country. However, a lot of the research over here appears to be tailored to the needs of forest industry. Interestingly this does not necessarily mean that research remains at a practical level or that research methods are simplified, on the contrary!

Private funding plays an important role in US research compared to the largely state-funded research in Europe (possibly with the exception of the UK). As a consequence industry and research representatives join their efforts in so-called research cooperatives. The industry partners pay a kind of membership fee that university researchers are free to use for the best intents and purposes and the coop members benefit from any new releases in terms of publications, models and software. This is a set-up I have always admired since I heard about it for the first time. The industry-driven funding stream also contributes to the fact that growth and yield science still plays a large role in the US, very much in contrast to Europe. Flexible, computer-implemented growth and yield models addressing current issues such as climate change and genetic tree breeding programmes but also traditional topics such as spacings and fertilisation are in high demand. Often these growth and yield models are required to go far beyond the level of growth simulators to become decision-support systems that allow the precise management planning of huge forestry companies. For collecting data, forest biometricians and even silviculturists work more with statistically designed experiments over here and less with observational plots as in Europe. In the south of the country, most research work is in the context of plantation forestry rather than in natural forests. And forest management in these plantations is highly optimised, running by the clock.

Forest Biometrics in many parts of the US largely focuses on synthesising growth patterns by using the latest statistical techniques for empirical, data-driven modelling. Research into fundamental growth processes and ecological questions is therefore often just a small byproduct. Regular conferences in this subject area (often traditionally associated with the term “mensuration”) are also visited by private consultants and company representatives. As a result there is an interesting exchange of opinions and thoughts in discussions.

Forest biometricians are well respected in America and are even explicitly mentioned in books of forest history as I have recently discovered. Blacksburg, for example, is the location of the school (of thought) of Prof. Harold Burkhart, who in turn is a student of Prof. Jerry Clutter (University of Georgia). Like in Europe, there are interesting academic lines of succession or academic family trees. A recently published book on the history of forestry in the southern US describes the development of forest biometrics in this part of the world including all the aforementioned relationships on several pages. This is indeed very remarkable and unprecedented in similar European publications.

It has been great fun here so far thanks to my hosts and colleagues at Virginia Tech including Harold Burkhart, Phil Radtke and Randy Wynne. I started with participating in a conference of the Southern Mensurationists (where I gave talks on my recent growth dominance research at Bialowieza Forest and on human tree selection behaviour) and then stayed on to carry out joint research with Harold Burkhart. Together we are currently working on modelling size-dependent relative growth rates. This is very inspiring and takes the work my colleague Anders Muszta and I previously did in this field to a new level.

Do humans behave in woodlands?

We are using woodlands for recreation, we pick berries and mushrooms and we select picnic sites at spots in the forest we like. Certain woodlands we tend to avoid, because the tree species and tree densities make them look dark and hostile or we hate prickly brambles. Forest structures that do not allow you to clearly see through the stems for at least 50-70 m can have an intimidating effect on some people. Surely we all have experienced the heart-lifting effect of a great day outdoors in the woods.

So what is this all about? How does this interaction tree-human, woodland-human really work? Is it the spatial structure of a woodland that creates an effect in our mind similar to the effect of a piece of artwork or of a building? Or is it the microclimate, a combination of light, humidity, noises and smells?

For most of these questions it is probably fair to admit that we simply don’t know. “Humans as part of the forest ecosystem” is not a topic that has been widely studied so far. Socio-economic relationships – yes, but not so much human behaviour and human health in relation to trees and woodlands.

In this context, I find two topics particularly interesting, the relationship between (spatial) forest structure and human health and human behaviour in terms of how trees are selected for various purposes.

Humans select trees for various purposes – e.g. as habitat trees (i.e. for their ecological value), for their timber value, for picnics, for playing, as Christmas trees and for burials among other reasons. In all these cases humans make decisions that depend on a wide range of different factors. One quite important factor, of course, is the trees themselves, which attract attention through their size, the straightness of their stems and a certain aesthetic feeling they stimulate. Human selection behaviour may vary a lot between different people and also within the same persons at different times and in different situations.

For a long time it was assumed that forestry students would select trees according to their university textbooks or to the instructions of their professors throughout their professional lifetime. Variation between and within individuals was therefore ignored. Only recently has it started to dawn on some people that it is worth considering this variation a bit more carefully and systematically. What are the reasons for this variation, does it matter, is it possibly even an advantage?

This research into human selection behaviour is part of a larger field of research where for example the voting behaviour of humans or the selection of products in supermarkets are analysed. How do humans distinguish between trees they like and those they don’t? How do they perceive the trees, which trees catch their eyes? This is simultaneous research into humans and trees. The data used in this research are often collected by non-scientists in forestry training programmes and then passed on to researchers for detailed analysis. Therefore this field is also an example of citizen science.

I found inspiration for this research direction in the work of my mentor Klaus von Gadow and the colleagues who worked with me at his Chair in Göttingen. At that time many shook their heads and could not understand how such a topic could ever matter. Now, 20 years later, the research question has eventually matured and attracted a lot of interest in a number of European countries. Xin Zhao, a PhD student, and Carlos Pallares Ramos, an MSc student, are currently studying the statistical side of this topic at my Chair and we are planning new experiments to better understand human tree selection behaviour. Perhaps this is something you would like to join in?