Another important, more ecological application is the reconstruction of spatial forest structure. All data collected in forest ecosystems have a temporal as well as a spatial dimension. The properties of the whole forest ecosystem, e.g. wood production, habitat and recreational values, depend to a large degree on the underlying ecosystem structure, particularly on its microstructure. This microstructure is typically shaped by physiological and ecological interactions, but it also influences these interactions in turn. These structure-property relationships play a crucial role in providing ecosystem goods and services and in maintaining biodiversity. Usually data on spatial forest structure are available only on a sample basis, but subsequent research requires full information, which then needs to be reconstructed.

Reconstruction can even be employed for habitat modelling: Imagine a set of summary statistics that describe the requirements of a certain endangered animal species well. These can then be used to modify an existing landscape on the computer to meet these requirements. In considering this option, we have actually moved on from reconstruction to construction, because an active change of landscape structure is modelled. Bäuerle and Nothdurft (2011) used spatial reconstruction to model habitat trees.

Another important purpose of reconstruction is the testing of competing summary characteristics. There are usually a number of statistics describing the same aspect of spatial forest structure or of a certain material. Which one is better, and which one should be used in a given analysis? One way of shedding light on this is to reverse the analysis: each of the competing summary characteristics is used in a separate simulation to reconstruct the original data from the values of that characteristic alone. This synthesis of a given analysis is nothing other than reconstruction and is widely applied in materials science.

So how does it work?

Well, it turns out (re)construction is not difficult at all. First you need one or more summary characteristics describing the structure you intend to reconstruct. This can be a simple index based on a mean, it can be a histogram or a function. You can also choose a number of different summary characteristics.

Then you select a stochastic optimisation method. For this, other researchers and I have successfully used simulated annealing. This method was first developed in physics and relates to thermodynamics, particularly to how the energy state of materials such as metals changes when they anneal. One way to start is by randomly dispersing tree locations in a given observation window. Then, iteratively, one of these points is randomly selected and shifted to a new random location within the observation window. After this change the summary characteristic(s) are re-calculated and compared with a target, e.g. the same summary characteristic for some ideal or for a reference. If the change leads to a better approximation of the target, it is made permanent; otherwise it is rejected and the old state is restored (improvements-only algorithm). In either case, this is followed by the random selection and shift of another point. The process stops when the difference between target and observed characteristic(s) becomes very small or after a certain number of iterations (Torquato, 2002).
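The improvements-only loop described above can be sketched in a few lines of R. This is a minimal illustration and not the code from pommerening.org: it assumes the aggregation index by Clark and Evans, computed without edge correction, as the only summary characteristic. The function names `clarkEvans` and `constructPattern`, the tolerance and the target value of 1.4 are made up for the example.

```r
# Clark-Evans aggregation index, here WITHOUT edge correction (simplifying assumption)
clarkEvans <- function(x, y, area) {
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf                                  # exclude distance of a point to itself
  meanNN <- mean(apply(d, 1, min))                # mean nearest-neighbour distance
  meanNN / (0.5 / sqrt(length(x) / area))         # observed / expected under randomness
}

# Improvements-only construction of a point pattern (hypothetical helper)
constructPattern <- function(n, xmax, ymax, target, maxIter = 2000, tol = 0.001) {
  x <- runif(n, 0, xmax)                          # random start configuration
  y <- runif(n, 0, ymax)
  area <- xmax * ymax
  startEnergy <- energy <- abs(clarkEvans(x, y, area) - target)
  for (iter in seq_len(maxIter)) {
    if (energy < tol) break                       # close enough to the target
    i <- sample.int(n, 1)                         # pick one point at random
    oldX <- x[i]
    oldY <- y[i]
    x[i] <- runif(1, 0, xmax)                     # shift it to a new random location
    y[i] <- runif(1, 0, ymax)
    newEnergy <- abs(clarkEvans(x, y, area) - target)
    if (newEnergy < energy) {
      energy <- newEnergy                         # improvement: keep the change
    } else {
      x[i] <- oldX                                # no improvement: restore old state
      y[i] <- oldY
    }
  }
  list(x = x, y = y, energy = energy, startEnergy = startEnergy)
}

set.seed(1)
result <- constructPattern(n = 50, xmax = 100, ymax = 100, target = 1.4)
clarkEvans(result$x, result$y, 100 * 100)
```

A full simulated-annealing variant would also accept some deteriorations with a probability that decreases over time; the sketch deliberately leaves this out to match the improvements-only description.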

In previous publications I employed this method to reconstruct the structure of a whole forest stand from sample data by interpolating the unmeasured structure between the sample plots (Pommerening and Stoyan, 2008). This even led to the astonishing result that in some cases certain summary characteristics were better estimated from the reconstruction than from the sample, although no additional information was added. Nothdurft et al. (2010) in fact corrected a biased sampling design through spatial reconstruction. Together with Estonian colleagues we also used reconstruction for simulating off-plot edge correction buffers (Lilleleht et al., 2014). It is also possible to include existing, measured objects in the reconstruction through conditional simulation (Pommerening and Stoyan, 2008).

On my website pommerening.org you can find example code in R and C++ for construction, where you can essentially model a spatial point pattern that leads to a certain pre-set value of the aggregation index by Clark and Evans. Along similar lines it is also possible to set a certain species mingling index (see one of my previous blogs) for an existing point pattern and then to swap pairs of trees of different species. Here the point locations remain the same, only the species marks are re-allocated (see below for some results). Applications to remote-sensing and other image data are also known. There is really no limit to anybody’s resourcefulness in inventing new ways of spatial reconstruction.

*Example of species mingling construction: A random point pattern was simulated using a Poisson process in a window of 100 x 100 m. The two species (red and yellow) were randomly assigned to the points with a probability of 0.5 each (left). Through construction the species dispersal was optimised to achieve a mingling index J of -0.20, where among the four nearest neighbours heterospecific points attract each other (right). Note that the point locations have remained the same.*
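The mark-swapping idea can be illustrated with a short, self-contained R sketch. It is not the code from the website: for simplicity it assumes mean mingling among the *k* = 4 nearest neighbours (without edge correction) as the target instead of the index used in the figure, and the helper `meanMingling` and the target value of 0.7 are hypothetical.

```r
# Mean mingling for a fixed neighbour structure (hypothetical helper).
# nnIndex: n x k matrix holding the row indices of each point's k nearest neighbours
meanMingling <- function(nnIndex, species) {
  mean(species[nnIndex] != species[row(nnIndex)])
}

set.seed(11)
n <- 100
k <- 4
x <- runif(n, 0, 100)
y <- runif(n, 0, 100)
species <- sample(c("red", "yellow"), n, replace = TRUE)
speciesCounts <- table(species)                   # kept to show that swaps preserve abundances

# Point locations never change, so the neighbours are determined only once
d <- as.matrix(dist(cbind(x, y)))
diag(d) <- Inf
nnIndex <- t(apply(d, 1, function(r) order(r)[1:k]))

target <- 0.7                                     # assumed target mean mingling
startEnergy <- energy <- abs(meanMingling(nnIndex, species) - target)
for (iter in 1:5000) {
  if (energy < 0.005) break
  i <- sample.int(n, 1)                           # random tree
  others <- which(species != species[i])          # trees of a different species
  j <- others[sample.int(length(others), 1)]
  species[c(i, j)] <- species[c(j, i)]            # swap the two species marks
  newEnergy <- abs(meanMingling(nnIndex, species) - target)
  if (newEnergy < energy) {
    energy <- newEnergy                           # keep the swap
  } else {
    species[c(i, j)] <- species[c(j, i)]          # undo the swap
  }
}
meanMingling(nnIndex, species)
```

Because only marks are exchanged, the number of trees per species stays the same throughout, which is the main difference to shifting point locations.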

Interested? Any questions? Don’t hesitate to get in touch.

**Literature**

**Bäuerle, H. and Nothdurft, A., 2011.** Spatial modeling of habitat trees based on line transect sampling and point pattern reconstruction. *Canadian Journal of Forest Research* **41**, 715-727.

**Lilleleht, A., Sims, A. and Pommerening, A., 2014.** Spatial forest structure reconstruction as a strategy for mitigating edge-bias in circular monitoring plots.

**Nothdurft, A., Saborowski, J., Nuske, R. S. and Stoyan, D., 2010.** Density estimation based on k-tree sampling and point pattern reconstruction. *Canadian Journal of Forest Research* **40**, 953-967.

**Pommerening, A. and Stoyan, D., 2008.** Reconstructing spatial tree point patterns from nearest neighbour summary statistics measured in small subwindows.

**Torquato, S., 2002.** *Random heterogeneous materials*. *Interdisciplinary applied mathematics* **16**, Springer, New York.

**Tscheschel, A. and Stoyan, D., 2006.** Statistical reconstruction of random point patterns. *Computational Statistics & Data Analysis* **51**, 859-871.


Stem growth of trees can, for example, be considered a performance variable. A common assumption based on ecological theories is that trees with few competitors, i.e. growing in quite open conditions, respond more in terms of growth than those that grow in tight neighbourhoods. Research in tree mechanics, however, suggests that what is interpreted as a competition effect has more to do with a tree’s need to adjust to environmental challenges, particularly to withstand the forces of wind. A tree protected by neighbouring trees in a tight neighbourhood does not need to invest much in stem growth because it is sheltered from the wind, whilst another tree growing in more open conditions is strongly affected by wind. Considering this, the growth response merits a very different interpretation.

Promising research is currently underway attempting to uncover the physiological receptors and processes related to interaction. This will undoubtedly put us into a better situation to review and select appropriate criteria for identifying the mode and intensity of interactions.

In this context, individual-based models assume an important role in exploring plant interactions. This model type in fact specialises in handling interactions between individuals or agents. The behaviour of a system, for example a forest ecosystem, is the result of individuals acting and interacting in a given environment. The same applies to the projection of growth or mortality, which is no longer the result of statistical estimation but a consequence of individual “decisions” and interactions. Following this modelling philosophy, interaction is thus handled in a very mechanistic way and can also be linked with physiological receptors and processes. Individual-based modelling is also very intuitive and creative. Many software packages such as NetLogo and the Java Agent-Based Modelling Toolkit have been developed to support this modelling approach. However, R and Python are also good choices to begin with. Individual-based modelling is not limited to any particular field of science and shares interesting links with physics and point process statistics.

An excellent opportunity for exploring the world of A/IBMs is a workshop that my group is offering at Umeå (Sweden) on 21/22 August 2018, see our workshop website. In this workshop we will have a few keynote talks followed by talks of the workshop participants. Every participant can highlight anything they consider important, e.g. why they are interested in IB modelling, what they perceive as a challenge, or programming problems they ran into and want to share. The workshop is not limited to any particular ecosystem or agent and I believe the exchange across (eco)systems and specific applications can be very inspiring and eye-opening. If you haven’t signed up yet, please consider the opportunity and get in touch!

Every piece of research usually has quite a bit of background history that doesn’t get published. Of course, not every single bit of our research is a hit or as fascinating as the discovery of the DNA double helix by Watson and Crick. But still, I am sure that there are a great many fascinating background stories that would make our lives richer if they were actually told. Imagine what encouragement and boost in inspiration this could potentially give to young researchers and particularly to PhD students!

In this spirit, feeling committed to my own appeal, I would like to say a few words on our recent paper “Do large forest trees tend towards high species mingling?” published in *Ecological Informatics*.

The story of this paper probably started off during my field trips to Bialowieza Forest in 2011 and 2016, when Bogdan Brzeziecki and his colleagues from Warsaw University of Life Sciences kindly showed me a number of forest sites where a species change had occurred and large remnant trees of the previous woodland community were dominating clusters of different species (see the photo below showing a remnant Norway spruce in a cluster of young hornbeam trees). The image of these remnant trees somehow stuck in my head, thank you Bogdan, and kept me thinking about the possible underlying processes that led to this situation.

Then in October 2016 I was reviewing somebody else’s manuscript that had been submitted for publication. In this work, the author hinted that there may be a tendency for larger forest trees to be surrounded by heterospecific neighbours in mixed-species forests. This really was the trigger and the hint struck me like lightning: I thought to myself that this would actually be a fascinating research question. I shoved the review papers onto another pile of papers on my desk and dug a bit deeper into ecological theories. From earlier I recalled the Janzen-Connell hypothesis, the herd immunity hypothesis and size-hierarchy processes that describe natural processes of maintaining species coexistence and biodiversity.

One observation in many natural plant populations is that self-thinning leads to local size hierarchies (also referred to as size inequality), where heterospecific stands include dominant trees emerging from a first colonisation cohort, which are often surrounded by patches of smaller sized trees of the same cohort. Small trees of these early colonisers are initially often of the same species as the dominant trees and according to the mechanisms of both the Janzen-Connell and herd immunity hypotheses later decrease in numbers due to self-thinning processes. Eventually the small early colonisers are partially or completely replaced by even smaller individuals of other species from subsequent colonisation cohorts. This combined effect of species and size replacement enforces both local size hierarchies and the mingling of different tree species in a given area or patch and prevents the development of monocultures (see B-D in the figure below).

With every new paper that I discovered on this subject the story and research question got even better. It was like tasting new wine all the time. Finally it became clear to me that it would be interesting to study this question for a wide range of different forest types in the temperate and boreal climate zones across the world. In terms of the aforementioned ecological theories, forest ecosystems in these climate zones have to date not been studied as intensively as tropical forests.

It took me a while then to devise a good method for proving the point that there is a trend for larger forest trees towards high species mingling. At first I was thinking about using so-called second-order characteristics from point process statistics. I tried a few things there, but none of the outcomes really was to the point. Then I remembered the species mingling index that was originally devised by my mentor Prof. Klaus von Gadow back in 1993. Of course, this index was about spatial species mingling, so it should actually suit the research question quite nicely. Still, I needed to establish a relationship between species mingling and size. That line of thinking eventually led me to logistic regression, since I recalled that we often model tree selection probabilities (for thinnings) using logistic regression, where size is an important explanatory variable. Tree mortality modelling is a related topic, where logistic regression is often applied. However, the discrete mingling index can take *k* + 1 different values depending on the number of nearest neighbours, *k*. For using logistic regression I thus had to convert the mingling index to a binary variable. Instead of using only *k* = 1 neighbour I stuck to the idea of including a slightly larger neighbourhood for a better description of species mingling. Finally, after much thinking and trial calculations, I settled on the idea of using *k* = 3 neighbours leading to four discrete mingling values, i.e. 0, 0.33, 0.67 and 1. I argued that a situation where no or only one neighbour is of a different species (values of 0 and 0.33) can be summarised as “low mingling”. On the other hand, values of 0.67 and 1, indicating two and three heterospecific neighbours, describe a situation of “high mingling”. A few validation checks have later shown that this was a good choice. My definition of low and high mingling also allows a straightforward mapping and detection of clusters of high and low mingling in a forest, see Fig. 2 in the publication. I still remember that this basic idea came to me out of the blue while actually travelling between Umeå and Göttingen.
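The conversion of the mingling index to a binary variable and the subsequent logistic regression on tree size can be sketched in R as follows. This is only an illustration of the principle and not the analysis from the paper: the data are simulated, and the column names `dbh`, `mingling` and `highMingling` as well as the simulation parameters are assumptions.

```r
set.seed(7)
n <- 300
trees <- data.frame(dbh = runif(n, 7, 80))        # simulated stem diameters (cm)

# Simulated mingling values for k = 3 neighbours (illustration only, not real data):
# the chance of a heterospecific neighbour is made to increase with tree size
pHetero <- plogis(-1.5 + 0.04 * trees$dbh)
trees$mingling <- rbinom(n, size = 3, prob = pHetero) / 3

# Binary response: 0 and 1/3 count as low mingling, 2/3 and 1 as high mingling
trees$highMingling <- as.integer(trees$mingling > 0.5)

# Logistic regression of high mingling on tree size
fit <- glm(highMingling ~ dbh, family = binomial, data = trees)
coef(summary(fit))

# Estimated probability of high mingling for a small and a large tree
predict(fit, newdata = data.frame(dbh = c(10, 70)), type = "response")
```

With real data, `mingling` would of course be calculated from the three nearest neighbours of each tree rather than simulated.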

In January 2017, I gave a first talk on this research to a larger audience at Dresden Technical University (Germany). The discussion following my presentation was interesting and varied. There was some scepticism in the audience, as you often get with new ideas, particularly with regard to the finding that the investigated trend should really hold for many different forest types. This kind of scepticism also came up in the review process. It took much discussion and re-wording to bring the point of the paper across in an appropriate way. Overall we found significant support for the expectation that large trees and trees growing at low local densities often have a tendency towards high species mingling. An explanation for this can be the aforementioned theories, but also disturbances and even forest management. We also found that size differentiation and species mingling are strongly related in local neighbourhoods. This is a very interesting and useful finding for conservation.

Now it would be great to follow these relationships up in detailed studies of specific forest ecosystems using spatio-temporal data. And while working on this topic two new research ideas came to me out of nowhere. An MSc student at Umeå is currently pursuing one of them and the other one is still sitting in my folder of new ideas. Perhaps I will post their background stories here one day, too, who knows.

Experience is, for example, a term that often comes up in the context of human resources. It is generally perceived as something positive that complements knowledge received through education. In some countries there is even the widespread belief that experience can compensate for a lack of education, but I tend to disagree here: The value of experience gained in a certain field depends very much on the prior knowledge received from school or university education. The more you know about a certain field, the more you tend to gain from experience.

However, my recent work with Lucie Vítková has shown that experience can sometimes also prevent people from taking in new skills as part of training: People with many years of practical work and associated experience are somewhat unwilling to leave their comfort zone and to try new ways. Our research also showed that people with little experience, on the other hand, tend to be open to new suggestions and are ready to implement them.

Research on human tree selection behaviour has a lot to do with assessing agreement. The question of agreement is crucial to the forest industry, perhaps today even more so, as environmentally friendly ways of forest management are meant to maintain biodiversity and to mitigate adverse effects of climate change among other things. It is therefore important to know how much agreement and chance there is when professional staff mark trees for various purposes.

Outside forest and agricultural sciences there is a large body of literature dealing with agreement. In medical science and in psychology there are many applications where humans vote for something or rate items that they are confronted with. Even the binary case with “1” meaning “approval” or “selection” and “0” denoting “rejection” is quite common. A popular statistic used to quantify agreement is for example Fleiss’ kappa that was designed to express agreement in a single number based on matrix data. This statistic has become a kind of standard, although there are also other alternatives.
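For the binary case, Fleiss’ kappa can be computed directly from such a matrix with a few lines of R. The sketch below follows the standard textbook definition of the statistic; the function name `fleissKappaBinary` and the small example matrix are assumptions made for illustration, with trees in rows and raters in columns.

```r
# Fleiss' kappa for a binary marking matrix (1 = selected, 0 = not selected);
# rows are trees, columns are raters (hypothetical helper)
fleissKappaBinary <- function(marks) {
  marks <- as.matrix(marks)
  nRaters <- ncol(marks)
  selected <- rowSums(marks)
  counts <- cbind(selected, nRaters - selected)   # ratings per tree and category
  pj <- colSums(counts) / (nrow(marks) * nRaters) # overall share of each category
  Pi <- (rowSums(counts^2) - nRaters) / (nRaters * (nRaters - 1))  # agreement per tree
  Pbar <- mean(Pi)                                # mean observed agreement
  Pe <- sum(pj^2)                                 # agreement expected by chance
  (Pbar - Pe) / (1 - Pe)
}

# Five trees rated by four raters (made-up data)
marks <- rbind(c(1, 1, 1, 0),
               c(0, 0, 0, 0),
               c(1, 1, 0, 0),
               c(1, 1, 1, 1),
               c(0, 1, 0, 0))
fleissKappaBinary(marks)
```

Values close to 1 indicate strong agreement, values around 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.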

However, expressing human behaviour in a single number is difficult. The more we look into this matter, the more we understand how complex this type of behaviour actually is. Apart from a large data matrix with trees in rows and raters in columns there is an interaction between raters and trees that is hard to disentangle. We can distinguish between an active behaviour of the raters and a passive rater behaviour, which is influenced by the trees they rate. The former is easier to understand and describes the marking behaviour of each rater, comparatively independent of the trees, i.e. whether they tend to mark many or few trees. The passive tree selection behaviour is consciously and subconsciously influenced by the trees, i.e. some of them are more attractive to one rater than to another.

Both processes influence the level of agreement. A good characteristic should take both processes into account to arrive at a balanced assessment of agreement. However, currently it is mostly the passive behaviour that the existing characteristics, including Fleiss’ kappa, quantify. My group and I are currently looking more into these processes in order to better understand them and to find ways to quantify them. This is an intriguing quest, since this work also tells us more about ourselves and about how our minds function.

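$$M_i = \frac{1}{k} \sum_{j=1}^{k} \mathbf{1}(\text{species}_i \neq \text{species}_j)$$

*j* denotes the *j*-th nearest neighbour of plant *i*, and the indicator function $\mathbf{1}(\cdot)$ is 1 if the two plants belong to different species and 0 otherwise.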

**What does it mean?**

The mingling index of a given plant *i* is defined as the fraction of its *k* nearest neighbours that belong to a species different from that of plant *i*, i.e. the fraction of heterospecific neighbours.

**Where does it come from?**

An early strategy and benchmark for quantifying spatial species mingling is Pielou’s segregation index (Pielou, 1977) comparing pairs of points formed by the locations of an arbitrary plant and its nearest neighbour. For all plants in a certain observation window (e.g. a research plot) these pairs are determined. Pielou (1977) defined the segregation index as the ratio of the observed probability that an arbitrary plant and its first nearest neighbour are conspecific and the same probability with independent species marks (i.e. a completely random dispersal of species). This segregation index was originally developed for bivariate species patterns and describes the neighbourhood structure of a subject plant of one species in terms of others. An individual has a high degree of mingling if its neighbourhood is highly diverse, i.e. if many heterospecific neighbours are located in its vicinity. The opposite of a largely conspecific neighbourhood is referred to as segregation.

Gadow (1993) and Aguirre *et al*. (2003) extended this concept to general multivariate species patterns involving *k* neighbours.

**Why is it important?**

Species diversity is an important and most commonly considered aspect of biodiversity worldwide (Kimmins, 2004). As a part of species diversity, spatial mingling of plants concerns the question of how plants of the same and different species mix in space. This refined concept extends beyond the idea of species richness to take the individuals’ perception of local diversity into account. Spatial mingling specifically relates to the microstructure of ecosystems, particularly to neighbourhood relationships, which play a vital role in plant ecology.

**How can it be used?**

With *k* neighbours the mingling index can take *k* + 1 different discrete values depending on the neighbourhood situation. This is shown for *k* = 4 in the following illustration, where the colours of the circular plant objects represent different species:

These discrete mingling values can be used to construct an empirical mingling distribution, which collects the relative numbers of plants with one of the k + 1 values and presents them as bars.

It is also possible to estimate arithmetic mean mingling, $\overline{M}$, at population level using an appropriate edge correction method such as NN1 (Pommerening and Stoyan, 2006).

Empirical mingling distributions and mean arithmetic mingling can also be calculated separately for individual species populations of the same plant population.

According to Lewandowski and Pommerening (1997) expected mingling (implying independent species marks), **E***M*, is independent of the number of nearest neighbours, *k*, and can be calculated as

$$\mathbf{E}M = \sum_{i=1}^{s} \frac{N_i (N - N_i)}{N (N - 1)}$$

with *s*, the number of species, *N*, the total number of plants in the observation window and $N_i$, the number of plants of species *i*.

In analogy to Pielou’s segregation index, observed mean mingling, $\overline{M}$, and **E***M* can be arranged in an index *M* expressing the relationship between observed species mingling and completely random species mingling according to

$$M = 1 - \frac{\overline{M}}{\mathbf{E}M}.$$

I refer to this index as the *species segregation index*. Consequently, *M* = 0 if the species marks are independent or random. If the nearest neighbours and plant *i* always share the same species, *M* = 1 (attraction of similar species, segregation in Pielou’s terminology). If all neighbours always have a species different from that of plant *i*, *M* = -1 (attraction of different species, aggregation in Pielou’s terminology).

**R code**

To calculate *M* two auxiliary functions are necessary, the first one calculating **E***M* and the second one calculating the Euclidean distance.

    calcExpectedMinglingAllSpecies <- function(species) {
      ta <- table(species)
      s <- length(ta)
      ka <- length(species)
      swm <- 0
      for (i in 1 : s)
        swm <- swm + ta[[i]] * (ka - ta[[i]]) / (ka * (ka - 1))
      return(swm)
    }

    euclid <- function(x1, y1, x2, y2) {
      dx <- abs(x2 - x1)
      dy <- abs(y2 - y1)
      dz <- dx^2 + dy^2
      return(dz^0.5)
    }

The main calculations are done in the following loop and additional lines of code. Here the NN1 edge correction is used for estimating arithmetic mean mingling.

    myData$sming <- NA
    myData$dist <- NA
    myData$rf <- NA
    k <- 4
    for (i in 1 : length(myData$x)) {
      sums <- 0
      dn <- findNeighboursOfOnePoint(myData$x, myData$y, k, i)
      for (j in 1 : k) {
        index <- dn[j] + 1 # Correcting C++ indices
        if(myData$species[i] != myData$species[index])
          sums <- sums + 1
        if(j == k) {
          dist <- euclid(myData$x[i], myData$y[i], myData$x[index], myData$y[index])
          myData$dist[i] <- dist
          myData$rf[i] <- calcRepFactor(xmax, ymax, myData$x[i], myData$y[i], dist)
        }
      }
      myData$sming[i] <- sums / k
    }
    sums <- sum(myData$sming * myData$rf)
    sumRF <- sum(myData$rf)
    (mm <- sums / sumRF)
    (em <- calcExpectedMinglingAllSpecies(myData$species))
    (m <- 1 - mm / em)

“xmax” and “ymax” define the boundary of the observation window whose bottom left corner coincides with the origin of the system of coordinates. The functions “findNeighboursOfOnePoint” and “calcRepFactor” are implemented in external C++ files that can be made available on request and are loaded in the following way:

    library(Rcpp)
    sourceCpp(paste(filePath, "findNeighboursOfOnePoint.cpp", sep = ""))
    sourceCpp(paste(filePath, "NN1.cpp", sep = ""))
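For a quick trial without the external C++ files, the calculations can also be sketched in base R alone. The version below is a simplification and not a replacement for the code above: it applies no edge correction (so mean mingling is a plain average instead of the NN1 estimate), the neighbour search uses a full distance matrix, the data are simulated, and the function names `expectedMingling` and `minglingIndex` are assumptions.

```r
# Expected mingling for independent species marks (same formula as in the text)
expectedMingling <- function(species) {
  ta <- as.numeric(table(species))
  n <- length(species)
  sum(ta * (n - ta)) / (n * (n - 1))
}

# Mingling index of every plant among its k nearest neighbours (no edge correction)
minglingIndex <- function(x, y, species, k = 4) {
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf                                  # a plant is not its own neighbour
  sapply(seq_along(x), function(i) {
    nn <- order(d[i, ])[1:k]                      # indices of the k nearest neighbours
    mean(species[nn] != species[i])               # fraction of heterospecific neighbours
  })
}

# Simulated example: 200 plants of two species in a 100 x 100 m window
set.seed(42)
n <- 200
x <- runif(n, 0, 100)
y <- runif(n, 0, 100)
species <- sample(c("red", "yellow"), n, replace = TRUE)

mi <- minglingIndex(x, y, species, k = 4)
table(mi) / n                                     # empirical mingling distribution
(mm <- mean(mi))                                  # mean mingling, uncorrected
(em <- expectedMingling(species))
(m <- 1 - mm / em)                                # species segregation index
```

Since the species marks are assigned at random here, the resulting value of `m` should lie close to zero.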

**Literature**

**Aguirre, O., Hui, G. Y., Gadow, K. and Jiménez, J., 2003.** An analysis of spatial forest structure using neighbourhood-based variables. *Forest Ecology and Management* **183**: 137-145.

**Gadow, K. v., 1993.** Zur Bestandesbeschreibung in der Forsteinrichtung. [New variables for describing stands of trees.] *Forst und Holz* **48**: 602-606.

**Kimmins, J. P., 2004.** *Forest ecology*. *A Foundation for sustainable forest management and environmental ethics in forestry*. 3rd edition. Pearson Education, Inc., Upper Saddle River.

**Lewandowski, A. and Pommerening, A., 1997.** Zur Beschreibung der Waldstruktur – Erwartete und beobachtete Arten-Durchmischung. [On the description of forest structure – Expected and observed mingling of species.] *Forstwissenschaftliches Centralblatt* **116**: 129-139.

**Pielou, E. C., 1977.** *Mathematical ecology*. John Wiley & Sons, New York.

**Pommerening, A. and Stoyan, D., 2006.** Edge-correction needs in estimating indices of spatial forest structure. *Canadian Journal of Forest Research* **36**: 1723-1739.

**What does it mean?**

The formula quantifies the sum of the basal areas of all trees that are larger or equal in basal area compared to that of a given tree *i* at time *t*. It is the complement value of the basal area percentile of tree *i* denoting relative dominance, where *G*(*t*) is basal area per hectare of a given forest stand at time *t*. Basal area of an individual tree is the cross-sectional area of its stem usually measured at 1.3 m above ground level. For convenience basal area is often calculated from stem diameter using the area formula of a circle. Basal area per hectare is a density measure taking both number of trees and their sizes in a certain area into account.

*BAL* is related to available light, since with increasing basal area of larger trees there is less light available for smaller trees. In a sense *BAL* is a surrogate for light measurements with the benefit that stem diameters and basal area are easier to measure.

**Where does it come from?**

Basal area in larger trees is also referred to as overtopping basal area suggesting the nature of a kind of competition index. In fact this was precisely the context in which Schütz (1975), Wykoff *et al*. (1982) and Wykoff (1990) suggested this measure.

**Why is it important?**

Basal area in larger trees is a simple and effective measure that simultaneously considers relative dominance of a tree and density. It is very flexible and can easily be modified to a spatially explicit measure of competition by calculating it specifically for an influence zone around a tree. *BAL* is also a very suitable competition index for trees in small-sized sample plots. Naturally it is also possible to distinguish between different species and inter- and intraspecific versions of *BAL*. It is also easy to modify basal area in larger trees in such a way that it better explains growth rate, e.g. *BAL*(*t*)/*G*(*t*) (Wykoff, 1990). Schröder and Gadow (1999) suggested a modification which they referred to as *BALMOD*(*t*):

$$BALMOD(t) = \frac{1}{RS(t)} \cdot \frac{BAL(t)}{G(t)}$$

with

$$RS(t) = \frac{\sqrt{10000 / N(t)}}{h_{100}(t)}$$

*RS*(*t*) is referred to as relative spacing, another density measure linking average growing space with top or dominant height, $h_{100}(t)$, a characteristic of growth and site quality. *N*(*t*) is the number of trees per hectare at time *t*. Top or dominant height is often calculated as the mean height of the hundred largest trees per hectare. *BALMOD*(*t*) is useful in situations where a competition index is needed for fitting a function estimating growth rates from a number of different time series scattered across a larger geographic entity, for example a country or even a continent. In such a context, Schröder and Gadow (1999) found that *BALMOD*(*t*) is superior to *BAL*(*t*). Other authors have applied the *BAL* concept to crown variables and Burkhart and Tomé (2012, p. 202ff.) give a good overview.

**How can it be used?**

Another important purpose of the *BAL* concept is its use for growth analysis. The absolute stem-diameter growth rates, *AGR*(*t*), of the trees of a given stand at a given time, *t*, can be plotted against *BAL*(*t*) in a scatter plot. Following Schütz (1975) this relationship can then be described, for example, by a simple power function:

$$AGR(t) = a - b \cdot BAL(t)^{c}$$

*a*, *b* and *c* are regression coefficients. However, it is best to set *c* to a fixed value between 1.5 and 4. The larger the value of *c* the stronger the saturation effect towards the lower range of *BAL* and the larger the value of the growth compensation point. For comparisons within the same time series it is probably best to set *c* to an optimised constant value. A value of *c* = 2 is a good starting point.

The regression coefficient *a* is then the intercept describing the stem-diameter growth rate of large, dominant trees (almost open-grown trees) whilst the function has a root at $BAL(t) = (a/b)^{1/c}$. This is the *growth compensation point* in analogy to the light compensation point. The growth compensation point is an expression of carrying capacity and site quality. The graph shown above is an example of this relationship for the Sitka spruce time series 2068 in the Brecon Beacons (Wales, UK, between 1961 and 1966).
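Fitting such a saturation curve with a fixed exponent is straightforward in R. The sketch below assumes the functional form *AGR* = *a* - *b* · *BAL*^*c* with intercept *a* and a root at (*a*/*b*)^(1/*c*); with *c* fixed the model is linear in *a* and *b*, so `lm()` is sufficient. The data are simulated and all variable names and parameter values are assumptions for illustration.

```r
set.seed(3)
cFixed <- 2                                        # exponent c set to a fixed value

# Simulated data (not real measurements): BAL in m^2/ha, growth rate in cm/year
balValues <- runif(100, 0, 40)
agr <- 0.8 - 0.0005 * balValues^cFixed + rnorm(100, sd = 0.05)

# With c fixed, AGR = a - b * BAL^c is linear in a and b
fit <- lm(agr ~ I(balValues^cFixed))
a <- unname(coef(fit)[1])                          # growth rate of dominant trees
b <- -unname(coef(fit)[2])                         # slope enters with a negative sign

# Growth compensation point: the BAL value at which estimated growth is zero
compensationPoint <- (a / b)^(1 / cFixed)
c(a = a, b = b, compensationPoint = compensationPoint)
```

If *c* is to be optimised as well, the same model could be fitted with `nls()`, or `lm()` could be repeated over a grid of *c* values and the fit with the smallest residual sum of squares selected.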

Such graphs can offer valuable insights into the growth patterns of forest ecosystems in relation to density and size. Comparisons between different forest stands and tree species in the same stand (Schütz and Pommerening, 2013) are helpful in understanding the relationships between environment, size and growth. The graph below, for example, has been produced using the data from the birch-plantation time series Bagshot (England, UK). The data points represent the total time series. For fitting the saturation curves the growth data of only two subsequent survey years were used. Model parameter *c* was set to 2. The numbers under the dashed horizontal line give the base calendar years of the corresponding survey period.

Apparently, growth compensation point and maximum growth rate (parameter *a*) vary throughout the years with hardly any particular pattern. Partly this can be attributed to forest management activities and partly to changing environmental conditions.

In natural, unmanaged forests, such as an Interior-Douglas fir stand in the Alex Fraser Experimental Forest (plot 3, see graph below) in British Columbia (Canada), the saturation curves and the growth compensation points are typically much closer together and show a stronger saturation effect.

Another way to illustrate and analyse the *BAL*-*AGR* relationship is to use the growth dominance characteristic that I introduced in an earlier blog in 2015.

As mentioned above, *BAL*(*t*) of an individual tree can also be used as a competition index for estimating growth rate. In that case basal area of larger trees is one of a number of explanatory variables that is either used directly in a growth function or as part of a modifier function (usually an exponential function). In the potential-modifier approach (see Weiskittel et al., 2011, p. 89ff.), first potential growth rate is estimated (e.g. growth of open-grown trees or the growth of a certain, upper percentile) and then reduced by a number of modifiers, one of which usually is competition.

**R code**

For calculating basal area in larger trees it is best to define a function in R that you can reuse in several places in your R script. Such a function could look like the code given in the box below:

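    # Basal area of larger trees (BAL) for each tree, per hectare.
    # ba: vector of individual-tree basal areas; area: stand area in hectares.
    bal <- function(ba, area) {
      sumba <- sum(ba)
      bal <- numeric(length(ba))
      for (i in seq_along(ba)) {
        # Basal area of all trees that are not larger than tree i (tree i included)
        basmaller <- sum(ba[ba <= ba[i]])
        # Proportion of total basal area held by these trees
        pix <- basmaller / sumba
        # Remaining basal area belongs to larger trees; scale to one hectare
        bal[i] <- sumba * (1 - pix) / area
      }
      return(bal)
    }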

Here “ba” is a vector of individual-tree basal area values and “area” is the area of the forest stand in hectares. You can then call this function in your script in the following way:

    dataOneYear$ba <- pi * (dataOneYear$DBH / 200)^2
    xarea <- plotSize / 10000
    dataOneYear$bal <- bal(dataOneYear$ba, xarea)

“dataOneYear” is an arbitrary data frame, “dataOneYear$DBH” is a vector of stem diameters measured in cm. “plotSize” is the area of the forest stand in square metres.
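To see what the function returns, it can help to run it on a tiny plot and compare the result with the direct definition of BAL, i.e. the summed basal area of all strictly larger trees per hectare. The sketch below uses a compact copy of the loop version so that it runs on its own; the data frame `trees`, the helper `bal_direct` and the plot size are hypothetical and chosen only for illustration.

```r
# Loop version as described in this post (compact copy for a self-contained check).
bal <- function(ba, area) {
  sumba <- sum(ba)
  out <- numeric(length(ba))
  for (i in seq_along(ba)) {
    pix <- sum(ba[ba <= ba[i]]) / sumba
    out[i] <- sumba * (1 - pix) / area
  }
  out
}

# Direct definition: summed basal area of strictly larger trees per hectare.
bal_direct <- function(ba, area) {
  sapply(ba, function(x) sum(ba[ba > x])) / area
}

# Hypothetical plot of 500 m^2 with four trees (DBH in cm).
trees <- data.frame(DBH = c(10, 20, 30, 20))
plotSize <- 500
trees$ba <- pi * (trees$DBH / 200)^2            # basal area in m^2
trees$bal <- bal(trees$ba, plotSize / 10000)    # m^2 per hectare
trees$balDirect <- bal_direct(trees$ba, plotSize / 10000)
print(trees)
```

The largest tree has a BAL of zero, and trees of equal size share the same value because ties are not counted as larger trees.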

**Literature**

**Burkhart, H. and Tomé, M., 2012.** *Modeling forest trees and stands*. Springer, Dordrecht.

**Schütz, J. P., 1975.** Dynamique et conditions d’équilibre de peuplements jardinés sur les stations de la hêtraie à sapin. *Schweizerische Zeitschrift für Forstwesen* **126**: 637-671.

**Schütz, J. P. and Pommerening, A., 2013.** Can Douglas fir (*Pseudotsuga menziesii* (Mirb.) Franco) sustainably grow in complex forest structures? *Forest Ecology and Management* **303**: 175-183.

**Schröder, J. and Gadow, K. v., 1999.** Testing a new competition index for maritime pine in northwestern Spain. *Canadian Journal of Forest Research* **29**: 280-283.

**Weiskittel, A. R., Hann, D. W., Kershaw, J. A. and Vanclay, J. K., 2011.** *Forest growth and yield modeling*. John Wiley & Sons, Chichester.

**Wykoff, W. R., Crookston, N. L. and Stage, A. R., 1982.** *User’s guide to the stand prognosis model*. USDA Forest Service, Intermountain Forest and Range Experiment Station, Ogden, General Technical Report INT-133.

**Wykoff, W. R., 1990.** A basal area increment model for individual conifers in the northern Rocky Mountains. *Forest Science* **36**: 1077-1104.

**What does it mean?**

Growth functions in general describe the change in size of an individual or population with time (Burkhart and Tomé, 2012). The Chapman-Richards growth function can be written as

$$y(t) = y_{\max} \left(1 - e^{-kt}\right)^{p}.$$

Assume that $y(t)$ is a tree growth variable, e.g. tree total height or tree volume, and $y_{\max}$ is the maximum value this growth variable can take (in absolute terms for a given species in general or for a given species on a given site). Then the term $\left(1 - e^{-kt}\right)^{p}$ is a modifier reducing the maximum growth variable to its current state at time $t$. $k$ is an empirical growth parameter scaling the absolute growth rate. The empirical parameter $p$ is related to catabolism (destructive metabolism), which is said to be proportional to an organism’s mass. Therefore it is often restricted to a value of three for theoretical, biological reasons.

The Chapman-Richards growth function can be applied both to individual organisms and to the growth of whole populations and describes cumulative growth over time. As such the function has an inflection point and an upper asymptote at $y_{\max}$, reflecting a so-called sigmoid growth curve typical of growth processes that are influenced by biotic and abiotic factors.

**Where does it come from?**

The Chapman-Richards growth function is based on the seminal work by von Bertalanffy on animal growth. It was published by Richards in 1959, and Pienaar and Turnbull (1973) introduced it to forestry applications. The model has a reputation of being very flexible at the slight expense of biological realism. It is valued for its accuracy, although there can be problems in the process of parameter estimation when the exponent $p$ is allowed to vary (Clutter et al., 1983; Pienaar and Turnbull, 1973; Zeide, 1993).

**Why is it important?**

The Chapman-Richards growth function has been a popular model for describing the growth of various tree and forest stand growth variables, e.g. tree and stand height, diameter at breast height, basal area and volume. As such it has been and is still widely used in many empirical forest growth simulators, particularly where the accuracy of model prediction is crucial (Zeide, 1993). The Chapman-Richards growth function has been used extensively to model site index development, i.e. the mean height development of the most dominant trees of a forest as a population characteristic for describing site quality (Burkhart and Tomé, 2012), leading to a so-called polymorphic height growth model.

**How can it be used?**

Assume you have access to sample data providing several combinations of a growth variable, e.g. tree height, and age. Using nonlinear regression methods you can estimate the parameters of the Chapman-Richards growth function. After estimating the growth parameters the model can be used for interpolation and for predicting past and future growth.

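Absolute growth rate (AGR) is essentially the first derivative of the Chapman-Richards growth function $y(t) = y_{\max} \left(1 - e^{-kt}\right)^{p}$ with asymptote $y_{\max}$ and empirical parameters $k$ and $p$:

$$\frac{\mathrm{d}y(t)}{\mathrm{d}t} = y_{\max}\, k\, p\, e^{-kt} \left(1 - e^{-kt}\right)^{p-1}.$$

The AGR function can be employed to model current annual increment or instantaneous growth. For relative growth rate (RGR) we lose one model parameter (Pommerening and Muszta, 2016) and the function term simplifies to

$$\frac{1}{y(t)} \frac{\mathrm{d}y(t)}{\mathrm{d}t} = \frac{k\, p\, e^{-kt}}{1 - e^{-kt}} = \frac{k\, p}{e^{kt} - 1}.$$

Also the algebraic difference form (ADA) of the Chapman-Richards growth function is often applied (Burkhart and Tomé, 2012),

$$y(t_2) = y(t_1) \left(\frac{1 - e^{-k t_2}}{1 - e^{-k t_1}}\right)^{p},$$

where the fraction constitutes a growth multiplier (Pommerening and Muszta, 2016) and the asymptote disappears. The algebraic difference form allows estimating the current value of a growth variable from a value in the past (anamorphic model).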
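The growth function and the forms derived from it in this section can be written as small R functions. This is a minimal sketch assuming the parameterisation `A * (1 - exp(-k * t))^p` that is also used in the regression code of this post; the function names and the parameter values are hypothetical and serve only to show how the forms relate to each other.

```r
# Chapman-Richards growth function: cumulative size at time t.
cr <- function(t, A, k, p) A * (1 - exp(-k * t))^p

# Absolute growth rate: first derivative of cr() with respect to t.
cr_agr <- function(t, A, k, p) {
  A * k * p * exp(-k * t) * (1 - exp(-k * t))^(p - 1)
}

# Relative growth rate: AGR divided by size; the asymptote A cancels.
cr_rgr <- function(t, k, p) k * p / (exp(k * t) - 1)

# Algebraic difference form: size at t2 projected from size y1 at t1.
cr_ada <- function(y1, t1, t2, k, p) {
  y1 * ((1 - exp(-k * t2)) / (1 - exp(-k * t1)))^p
}

# Hypothetical parameter values for a top height curve (m, years).
A <- 35; k <- 0.025; p <- 1.3
ages <- c(20, 40, 60)

print(cr(ages, A, k, p))          # cumulative height
print(cr_agr(ages, A, k, p))      # height increment per year
print(cr_rgr(ages, k, p))         # increment relative to current height
print(cr_ada(cr(20, A, k, p), t1 = 20, t2 = 40, k = k, p = p))
```

Projecting the height at age 20 forward with `cr_ada()` reproduces `cr(40, A, k, p)` without using the asymptote, which is the point of the difference form.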

**R code**

In R it is quite straightforward to estimate the parameters of the Chapman-Richards growth function through nonlinear regression. First we need some sample data and I have taken pairs of top height (the mean height of dominant trees, a population characteristic) and the corresponding age (assuming an even-aged forest) from a British yield table. In more interesting applications, similar data would naturally stem from field observations.

    # Using some data from the British yield table for Scots pine, YC 14.
    topHeight <- c(8.9, 11.6, 13.9, 15.9, 17.8, 19.6, 21.3, 22.8, 24.2,
                   25.4, 26.5, 27.4, 28.3, 29.0, 29.7, 30.3, 30.7, 31.1)
    age <- c(17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82, 87,
             92, 97, 102)

Then we load a package for robust regression (as we know that the Chapman-Richards model can sometimes “play up”).

library(robustbase)

Finally we enter the actual regression code specifying the model, the data and the start parameters. The summary command provides the regression outputs including the estimated model parameters. (Parameter `A` is the upper asymptote of the growth function, i.e. the maximum value of the growth variable in the above equation.)

    nlsout <- nlrob(topHeight ~ A * (1 - exp(-k * age))^p,
                    data = data.frame(age, topHeight),
                    start = list(A = 83, k = 0.03, p = 4),
                    trace = TRUE)
    summary(nlsout)
    summary(nlsout)$coefficients[1 : 3]

It is always a good idea to check the value of parameter `A` against the maximum value of the growth variable in the data: the two values shouldn’t be very far apart, because `A` is the upper asymptote of the growth function.
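The asymptote check and the use of the fitted model for interpolation and prediction can be sketched as follows. To keep the example free of add-on packages, base R `nls()` is swapped in here for the robust `nlrob()` fit, and the start values are assumptions chosen close to a plausible solution; the ages used for prediction are arbitrary.

```r
# Yield table data as used in this post (Scots pine, YC 14).
topHeight <- c(8.9, 11.6, 13.9, 15.9, 17.8, 19.6, 21.3, 22.8, 24.2,
               25.4, 26.5, 27.4, 28.3, 29.0, 29.7, 30.3, 30.7, 31.1)
age <- c(17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82, 87,
         92, 97, 102)
dat <- data.frame(age, topHeight)

# Ordinary nonlinear least squares (base R) instead of nlrob();
# the start values are assumptions, not taken from the original script.
fit <- nls(topHeight ~ A * (1 - exp(-k * age))^p, data = dat,
           start = list(A = 35, k = 0.025, p = 1.3))
est <- coef(fit)
print(est)

# Asymptote check: difference between A and the largest observed height.
print(est[["A"]] - max(topHeight))

# Interpolation (age 50) and extrapolation (ages 10 and 120).
newAges <- data.frame(age = c(10, 50, 120))
pred <- predict(fit, newdata = newAges)
print(cbind(newAges, topHeight = round(pred, 1)))
```

Predictions far outside the observed age range depend strongly on the estimated asymptote and should be treated with caution.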

**Literature**

**Burkhart, H. and Tomé, M., 2012.** *Modeling forest trees and stands*. Springer, Dordrecht.

**Clutter, J. L., Fortson, J. C., Pienaar, L. V., Brister, G. H. and Bailey, R. L., 1983.** *Timber management: A quantitative approach*. John Wiley & Sons, New York.

**Pienaar, L. V. and Turnbull, K. J., 1973.** The Chapman-Richards generalization of von Bertalanffy’s growth model for basal area growth and yield in even-aged stands. *Forest Science* **19**: 2-22.

**Pommerening, A. and Muszta, A., 2016.** Relative plant growth revisited: Towards a mathematical standardisation of separate approaches. *Ecological Modelling* **320**: 383-392.

**Zeide, B., 1993.** Analysis of growth equations. *Forest Science* **39**: 594-616.


Statistics@SLU has been a new experience for me during the last three years, as statistical consultation mostly meant additional “homework” at the other places I worked before and was therefore often avoided by those charged with it. Indeed, statistical consultation on top of research, teaching and administration can be a burden and a major distraction.

Therefore it is a good idea to organise this important mission properly by setting aside funds and dedicated staff. Still, a centre of statistics like Statistics@SLU is not without its challenges: Not everybody values statistical consultation, and some take this kind of service for granted. Internal funding is always scarce at any university and retiring statisticians are not always replaced. And the statisticians engaging in consultation are often not well recognised for their work and face problems when applying for promotion, because their workload often does not allow them to accumulate the necessary publications.

That is why it is so important to have a Centre where all consultants are united and can exchange their experience but also support each other. The Centre can effectively negotiate terms with the heads of departments, the deans and the vice chancellor. In this effort we are advised and guided by a steering committee and I am grateful for their commitment. This is also helpful to gain different perspectives.

This year we have again received good feedback for our work from both students and staff. For the consultants involved it is valuable to know that their efforts are appreciated. My Department has supported my consultation unit at Umeå by providing the funds to employ another statistician and the Faculty of Forest Sciences is kindly contributing towards funding the space charges of a dedicated consultation room that we intend to name after one of my predecessors, Prof. Bertil Matérn, an eminent Swedish statistician. The development of statistical consultation looks promising and we hope to secure continued central support for the good work that is done for students and staff.

To shed light on this, I grabbed the phone and got in contact with a specialised lawyer from the Deutsche Hochschulverband in Bonn. He confirmed that European laws do indeed require data collected by public institutions to be publicly available. Such data are in fact not subject to intellectual property rights. Similar laws are apparently also in place in countries outside the EU, in North America and in Australia. In the UK and in the US these laws are referred to as the Freedom of Information Act. The purpose of these laws is to ensure the transparency and the reuse of data, particularly those that are gathered in a university context. Such raw data are generally free and Simon Hoggart adds that the British Ecological Society explicitly encourages data reuse through the open data policy it has adopted for its journals. The national research councils in every country are charged with the task of watching over the open-data legislation. Open access to data is also an important principle of the European Horizon 2020 research programme.

Given these trends and the clear legislation it is difficult to understand that research groups in some countries still try to sneak their way into good publications by making the use of their data dependent on a co-authorship without contributing to either the analysis or the text. It is usually weak research groups that try to enforce such practices, which constitute a clear breach of international authorship agreements such as the Vancouver protocol. Other groups try to raise money by arguing that the data provision creates chargeable costs.

These practices are attempts not to comply with the current international legislation and with the requirements of the respective funding bodies. The OECD paper on “Principles and Guidelines for Access to Research Data from Public Funding” cites the US National Research Council: “The value of data lies in their use. Full and open access to scientific data should be adopted as the international norm for the exchange of scientific data derived from publicly funded research.” This is a very good expression of a vision that many of us share. In my view open data and methods (in the form of, for example, R or SAS scripts) stimulate research cooperation and build trust in research results. Open access to research data is a fundamental principle of the international research community similar to human rights and democracy. We should all live up to this ethos and remind others to do the same.

In his textbook on forest mensuration Kilkki (1989) described the forest ecosystem as an object system and how different systems and processes eventually lead to a decision system. In the mensuration system, it is defined what to measure and how. The data are then statistically processed in the computing system. The computed summary characteristics are processed in the planning system, and the planning system leads to decisions and feeds back into the object system, i.e. the forest ecosystem.

The early pioneers in forest science assigned professorships to these boxes, e.g. a professor in forest mensuration, a professor in forest inventory, in forest biometrics and in forest planning.

The development of general statistics happened at approximately the same time as the development of forest mathematics. General statistics was often influenced by applications, for example in agriculture. The work of W. S. Gosset (“Student”) and R. A. Fisher was often inspired by some of their practical applications in agriculture.

Already in 1729 J. F. Penther published *Praxis geometriae*, a book on the surveying of forest areas. Around 1766 J. E. Vierenklee and C. C. Oettelt published textbooks on mathematics for foresters and forest mensuration. Something remarkable was achieved when J. C. Paulsen in 1795 published the first yield tables ever, i.e. models of biological production long before modelling started in biological science. In 1835 G. König published his famous book on forest mathematics that was translated into many languages. University chairs in forest biometrics were only set up much later, between the 1960s and the 1980s.

There are many different names for the academic subject area of forest biometrics. These include numerical forestry, quantitative forestry, forest statistics, computational forestry and others. At Umeå my Chair is referred to as mathematical statistics applied to forest sciences. All of these professorships accomplish similar things, which can be described as the “study of mathematical-statistical methods for effectively describing and modelling patterns and processes in forest science”. For me it is important that forest biometrics belongs to the realm of basic sciences and is a strong research area. Forest biometricians are mediators between mathematical statistics and forest science. And forest science as our application has also the potential of inspiring the development of general mathematical statistics.

Professors have specific lifetimes and their academic subject areas have lifetimes, too. Traditional subject areas may change over time, some vanish and new ones come into existence. Silviculture and forest economics have for example a long tradition, whilst soil science and forest ecology are comparatively young subject areas.

Forest biometrics is a quite young subject area and there is still a lot to expect from this field of forest science.

Mathematics is about patterns, natural sciences, too. Often mathematical patterns and patterns in natural sciences come together. In the Fibonacci sequence of numbers any subsequent number is the sum of the two previous ones. In some trees, the horizontal layering of branches is organised according to the Fibonacci sequence of numbers. This sequence of numbers is also related to another pattern, the golden section, which plays an important role both in mathematics and in biology.

The job of a professor is not only a job, it is something special, a vocation and a quest. In the same way the research topics we are dealing with should be something that intrigues us so much that we cannot simply let the important questions go. In a way they keep us awake at night. What are the topics that have kept me awake? There are certainly three that have kept me busy in the past years, i.e. woodland structure analysis & modelling, the biometry of plant growth and human behaviour in forests. There are a few things that link these three topics, one of them is space.

Two examples. A couple of years ago I wondered what might be the most important macro-processes influencing the formation of spatial patterns. Growth (G) is for example a very important process and recently evidence has been found suggesting that trees always need to grow at least a little for their survival. Interaction is another important process and the most prominent representative of tree interactions is competition (C). Then there are birth and death processes (B&D) where trees are either added to or deleted from the point pattern formed by the tree locations. Initially I believed the combined birth and death process would have the greatest influence on the formation of spatial patterns.

We developed an individual-based model for two forest time series. The model was inspired by spatial statistics and is good at representing spatial tree interactions. Each of the three macro-processes was represented by three model parameters. We then applied a sensitivity analysis to study the variance of mark variograms, i.e. the model output, in terms of systematic variations of the nine input parameters. To this end we used a global sensitivity measure, the extended Fourier Amplitude Sensitivity Test. The results have shown us that birth and death processes are of minor importance compared to growth and competition. We also learned that the importance of processes depends on the development stage of a forest. In the graph below (from Pommerening et al., 2011) the results are shown for a young Norway-spruce forest in Austria and for a natural Douglas-fir forest in British Columbia.

A quite different example is the behaviour of professional humans in forests. Forestry professionals select trees for a number of reasons, e.g. as Christmas trees, as habitat trees and in thinnings (to evict them from the forest for the benefit of others). A master student of mine has recently studied how much agreement there is among forestry professionals when it comes to tree selection. In the past it was always believed that forestry staff select trees exactly according to silvicultural guidelines. Therefore tree selection in forest simulators has been modelled quite deterministically.

My student studied the behaviour of test persons in twelve different forests throughout Great Britain from Northern Scotland to Cornwall. The test persons were asked to mark trees for two different thinning strategies, i.e. thinning from below and crown thinning. The latter technique is now often discussed in conjunction with continuous cover forestry (CCF). For assessing agreement we used Fleiss’ kappa (κ), a measure that is often applied in medicine and particularly in psychology. The values of this measure usually range between 0 and 1, 0 indicating chance agreement and 1 denoting perfect agreement.
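For readers who want to see how such an agreement measure is computed, here is a minimal sketch of the standard Fleiss' kappa formula in R. It is not the code used in the study; the function name and the small marking table (three trees, four test persons, categories “marked” and “not marked”) are hypothetical.

```r
# Fleiss' kappa for a count matrix: rows = trees (subjects),
# columns = categories, entries = number of raters choosing that category.
fleiss_kappa <- function(counts) {
  counts <- as.matrix(counts)
  n <- rowSums(counts)
  # Every tree must be assessed by the same number of raters (at least two).
  stopifnot(length(unique(n)) == 1, n[1] >= 2)
  n <- n[[1]]
  N <- nrow(counts)
  p_j <- colSums(counts) / (N * n)                  # category proportions
  P_i <- (rowSums(counts^2) - n) / (n * (n - 1))    # agreement per tree
  P_bar <- mean(P_i)                                # observed agreement
  P_e <- sum(p_j^2)                                 # agreement expected by chance
  (P_bar - P_e) / (1 - P_e)
}

# Hypothetical marking results: columns are "marked" and "not marked".
marks <- rbind(c(4, 0),   # all four test persons mark tree 1
               c(2, 2),   # split decision for tree 2
               c(0, 4))   # nobody marks tree 3
kappa <- fleiss_kappa(marks)
print(kappa)
```

The two unanimous trees pull the value up, while the split decision for the second tree pulls it down.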

It turned out that the overall agreement was low with maximum values of κ of around 0.5. The agreement was consistently and markedly higher in low thinnings than in crown thinnings. This was somewhat unexpected as crown thinnings usually come with a didactic element that narrows down the choice of trees. We tried to understand this outcome and it turned out that low thinnings have a long tradition in Britain and many professionals are used to them while crown thinnings are somewhat new. New techniques initially often create confusion and confusion leads to low levels of agreement.

This type of work in forest biometrics requires interdisciplinary teams and a necessary pre-requisite for recruiting such teams is a solid education in quantitative methods, particularly in statistics. Often we enter discussions at faculties about new module structures and the number of hours. Of course, all subject areas are important. Still at the end of the day, all research boils down to some numbers that we need to make sense of and we end up in a similar situation as the pioneers of forest science.

(Abbreviated inaugural lecture on 13 May 2016)
