It’s all in the Geese!

May 19, 2017 by Species Ecology

Mohammed Ashraf

Wild Goose

Wildlife ecologists are often interested in the parameters that influence species distribution and population size. These parameters range from intrinsic ecological factors (for example, density-dependent population regulation) to extrinsic anthropogenic disturbances (for example, man-made greenhouse gas emissions). Within this broad spectrum, wildlife ecologists often need to uncover the underlying trend or mechanism that influences the population parameters of the species of concern. Many recently graduated wildlife biologists find that the statistical tools they need for ecological data analysis are out of reach, because economic and social factors prevent them from accessing cutting-edge scientific tools and resources. The problem is more intense in developing nations, where technical and academic support is often scarce owing to weak economic and social conditions.

In both developed and developing nations, students are trained to carry out statistical tests under conceptually unified mathematical rigor, and they learn to do so with a handful of statistical and mathematical software packages introduced during their undergraduate education. These usually include Minitab, SPSS and JMP for statistical analysis, and MATLAB and Maple for mathematical programming. They are commercial, easy-to-use, high-priced packages built around a graphical user interface (GUI), and once students graduate they struggle to get hold of them for economic and social reasons beyond their control. One of the main reasons is that the software is expensive and also closed-source, meaning that one cannot access the source code to modify or rebuild the software, or work with it in complete freedom. Hence students who finish their undergraduate study in wildlife science or ecology often find it hard to keep up their academic and research work. This in turn reduces the pool of scientific scholars coming out of the biodiversity conservation arena.

On the other hand, human existence is deeply rooted in sustaining and conserving the remaining biodiversity of our planet. Simply put, if concerted and proactive (as opposed to retrospective) measures to conserve (if not preserve) the remaining ecological diversity are not taken within the next 25 to 50 years at most, I believe our planet will face a doomsday scenario that could eliminate the human species in the blink of an eye. Our planet is very old (about 4.5 billion years old), and although the extinction of the human species is to be expected on an evolutionary time scale, I fear it could happen within the next 100 years or so if we fail to curb species extinction in the face of capitalistic free-market resource consumption and exploitation across the hemispheres.


Although many students from conservation biology and related disciplines possess the necessary scientific skills, it is unfortunate that they lack the technical tools to apply their newly developed analytical skills in ecology, because a profit-driven market allows only the wealthy section of society to access its products, in this case scientific software. However, things have changed: many people have started to boycott these commercial enterprises and to write their own source code, developing open-source scientific software for their ecological research. Therefore, in this article I introduce the R programming language, which is open-source and completely free of charge (free as in "free beer", as the joke goes), and which I consider more powerful and streamlined than the commercial software controlled by profit-driven market tycoons. R can be installed quickly, and if you are using a UNIX-like operating system such as Linux or BSD, base R is usually available from the system's package repository. I will not discuss the steps required to download and install R, as this information is readily available on the net (use your favorite search engine). Instead, I will dive into using R to analyze ecological datasets, which are critically important for conserving ecological units from genes to populations, from communities to landscapes, and from ecosystems to the entire planet (commonly known as the biosphere or ecosphere).

Mommy goose with her gosling

I have always been fascinated by ducks and geese, and I am often curious about their populations and what influences them. Hence I am going to give you a simple and clean example of how R (the things it can do for you are astounding) can help you develop an ecological model from simple data that you can collect right after finishing your undergraduate study in wildlife ecology or a related discipline (ecology, conservation biology, landscape ecology and so on). Remember, lack of technical resources due to capitalism is not your fault, so do not let capitalists stop you from doing good things for yourself and, more importantly, for our planet, which needs more conservation scientists than MBAs. Recently I went to the mangrove ecosystem of a tropical estuarine landscape, where I wanted to find out how the goose population (greylag goose in particular) is influenced by the presence of crow and eagle nests in its vicinity. Although the goose is a relatively big bird, it has its fair share of enemies: eagles and crows are predatory birds that either kill goslings or severely disturb the roosting and grazing habits of geese. The problem is more pronounced in tropical mangroves, which geese visit along their long-haul winter migratory routes from temperate and tundra ecosystems as far away as the Himalaya and Siberia. Here I am interested to see whether there is any relationship between the number of geese and the number of eagle or crow nests. I also want to count male and female geese separately and see how the male and female population sizes are influenced by crow nests around their roosting points.

I collected a sample of 49 observations of goose population size over a two-week ecological survey in the mangrove ecosystem. My sampling sites were randomly selected and no site was sampled twice. The total survey area was 100 sq km of estuarine mangrove. I first carried out a feasibility study to find out how much area I could cover to count geese in one go. Based on my energy and logistic resources, I worked out that if I can cover 2 sq km a day, I can divide the area into fifty 2-sq-km blocks (50 times 2 equals 100 sq km) for my sampling survey. The block design is critically important. It is a prerequisite for a random sampling design, in which I must ensure that each 2-sq-km block has an equal chance of being selected, so that I do not end up choosing blocks based on favoritism (picking whatever I want or like makes for an ad hoc study with no ecological or scientific bearing). Each of my 2-sq-km blocks therefore has a probability of 1/50, or 0.02 (a 2 percent chance), of being picked on any single draw, so the selected blocks form a valid representation of the entire 100 sq km area. It is worth pointing out that random sampling is very robust, so the exact number of sample blocks matters less than you might think (although one rule of thumb is to sample no less than 10 percent of the total area). What does matter is whether you have selected your blocks randomly. Therefore, even if I carry out my greylag goose survey on only 10 sq km (10 percent of my 100 sq km study area), which works out to five 2-sq-km blocks out of the fifty blocks that make up my potential survey unit, I can still come up with an ecologically valid dataset on goose population and eagle nests from which to infer or generalize how goose numbers are influenced by eagle nests (you could class this as my working ecological question at this point).
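The arithmetic behind the block design can be checked in a few lines of R. This is only a sketch built from the figures quoted in the text (100 sq km, 2-sq-km blocks, a 10 percent rule of thumb); the variable names are my own and are not part of the original survey.

```r
# Survey design figures taken from the text
total_area_sqkm <- 100
block_area_sqkm <- 2

n_blocks <- total_area_sqkm / block_area_sqkm   # number of 2-sq-km blocks
p_select <- 1 / n_blocks                        # chance of a given block on a single draw

# Rule of thumb from the text: sample at least 10 percent of the area
n_sampled         <- round(0.10 * n_blocks)     # number of blocks to survey
sampled_area_sqkm <- n_sampled * block_area_sqkm

# Chance that a particular block ends up among the surveyed blocks
# when blocks are drawn without replacement
p_included <- n_sampled / n_blocks

c(blocks = n_blocks, p_single_draw = p_select,
  sampled = n_sampled, p_included = p_included)
```

Note the two different probabilities: 1/50 is the chance of a block on one draw, while 5/50 is the chance that a block ends up in a five-block sample.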

Sampling Blocks

Let's do some work in R to start with. I need to choose five random blocks of 2 sq km each from the total of 50 blocks that make up my 100 sq km survey area. Please note that I created a grid of cell blocks (see the matrix diagram), with each block assigned a serial number starting from 1 and ending with 50.

Sample of randomly selected blocks

I will now ask R to randomly select 5 blocks out of 50 and present me with a set of five random numbers, which will be my sampling blocks. The simple code that R uses to pick five random blocks out of the 50 in my five-by-ten matrix (5 rows and 10 columns) is provided below.

sample(x = 1:50, size = 5, replace = FALSE)

[1] 40  7  5  9  2

With this simple code I asked R to generate five random numbers out of 50, so I can write my sampling blocks in set-builder notation as $\{x \mid 1 \leq x \leq 50,\ x \in \{40, 7, 5, 9, 2\}\}$ (pronounced as the set of all x between 1 and 50 such that x is 40, 7, 5, 9 or 2). These five numbers (40, 7, 5, 9, 2) are my ecological survey units, comprising a total of 10 sq km out of the 100 sq km potential survey area. I have also redrawn the matrix, this time highlighting the random blocks in which I will investigate the greylag goose population size and how it relates to eagle and crow nests in or around the roosting, grazing and resting sites.

Before we go further, a quick note on my simple R code. I asked R to randomly select numbers between 1 and 50 by passing them as the argument x (hence $x = 1, \dots, 50$). I then gave R my sample size, which is 5, meaning R will randomly select five blocks out of the 50 in my sample matrix. Finally, I told R not to replace the selected blocks by writing FALSE. Sampling with replacement means that R picks a number between 1 and 50 at random and then puts it back into the pool, so the same number could be picked again. We do not want to select the same block twice, so I ask R not to replace the selected number, which in code is replace = FALSE. (This is in fact the default behavior of sample, but writing it out makes the intention explicit.)
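Because sample draws different numbers each time it is run, it can help to fix the random seed and to look up where each chosen block sits in the grid. The sketch below assumes the blocks are numbered row by row in the 5-by-10 grid (the original diagram is not reproduced here, so that layout is an assumption), and the seed value is arbitrary.

```r
set.seed(2017)  # hypothetical seed, chosen only so the draw can be repeated

# 5 x 10 grid of block numbers, assumed to be filled row by row
block_grid <- matrix(1:50, nrow = 5, ncol = 10, byrow = TRUE)

# Draw five distinct blocks
chosen <- sample(x = 1:50, size = 5, replace = FALSE)

# Row and column of each chosen block inside the grid
positions <- t(sapply(chosen, function(b) which(block_grid == b, arr.ind = TRUE)))
colnames(positions) <- c("row", "col")

cbind(block = chosen, positions)
```

With a different seed (or none at all) the five blocks will differ, which is exactly the point of random selection; the seed only makes a given draw reproducible for your field notes.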

Now that I have my blocks randomly selected, I can begin my survey work (the fun part). I visited the blocks every morning and every evening for two weeks and collected data on greylag goose population size. I also recorded the distribution of greylag geese by gender (male and female). I then carried out line transect surveys in the same blocks every morning and evening to count eagle and crow nests in or around the grazing, roosting and resting sites of the geese. My line transects were roughly half a km long, although some were a km long owing to the high density of crow nests near the greylag geese. My dataset is presented below. Can we make anything out of this data? Can we answer a few statistically valid ecological questions from it? Possibly not, because a dataset is often useless on its own unless we make it meaningful. How are we going to gain a high-level understanding from this freshly collected ecological data on the greylag goose population in relation to crow and eagle nests? The answer lies in a solid command of statistics and in harnessing the power of statistical tools and modeling, which we will do through R programming. Hence the remaining part of the essay will focus on R code for gaining a high-level understanding of our dataset.

When we are presented with a dataset of two numerical variables, as is the case with my data, we are often interested in whether these variables are related to each other. Here I want to find out whether there is any relationship between the number of geese and the number of eagle nests. I also want to find out whether the distribution of male and female geese differs in relation to eagle and crow nests. Furthermore, I am ecologically motivated to develop a model that provides a statistically valid summary, which we can use to generalize and make predictions about goose population and eagle nests. Does this make sense so far, and if so, how do we go about it? It's simple: we let R answer all these interesting questions. All we have to do is ask R by writing code (a language) that R can understand. It's as simple as that.

As I was saying before, when we have two numerical variables (fashionably known as bivariate data), the first thing we want to do is create a scatter plot to see at a glance what our data look like graphically. This is our first step towards a high-level understanding of ecological data. I am going to write fairly simple code that asks R to generate a scatter plot for me. But before we do anything, let me give you some brief background on how R plots graphs. R is a powerful and sophisticated statistical programming language that hosts over 5000 packages. These packages are developed by scientists from various backgrounds, from mathematicians to wildlife ecologists, from academic scholars to computer programmers.

Packages are like restaurants. You can choose to go to a Pakistani restaurant to enjoy Pakistani cuisine or to a Bangladeshi restaurant to enjoy Bangladeshi cuisine, and in R you choose a package in much the same way. You can install a package that generates sophisticated, data-rich graphs for your analytical modeling. You can install a package that does algebraic calculations or solves advanced problems in calculus. You can install a package for cladistic and principal component analysis and more advanced work. You can also install a package that carries out geographic (GIS) analysis for you. So it is like going to different restaurants, and there are over 5000 different restaurants (packages) in R town. Making sense so far?

I also mentioned ordering your favorite dish from the menu, and surely there are many dishes to choose from. In R we call them functions: each package comes with many functions that we use to write our programs, that is, to instruct R to carry out specific numerical and statistical tasks. Hence a package is like a restaurant and its functions are like the dishes on the menu. If you go to a Pakistani restaurant, you do not expect to order Vietnamese dishes, right? Likewise, if you are working with the ggplot2 package, you do not expect it to perform matrix algebra or principal component analysis (PCA). In other words, a set of related R functions is grouped together in a specific package. You may come across functions (dishes) that overlap between one package (restaurant) and another, but generally each package hosts functions for specific mathematical and statistical tasks. I have already named one of the packages that we are going to use to analyze our geese data. It is called ggplot2, and it hosts the functions we will use to gain a high-level understanding of our data through insightful graphs. Now that you have some basic background on how R packages and their functions work together, we can start the analytical part of our ecological study. It really is fun when you learn to harness the power of R code to understand your hard-earned ecological field data.
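The restaurant-and-menu picture can be checked directly in R. The short sketch below uses only packages that ship with base R, so nothing needs to be installed; the two commented lines at the end show the usual install-once, load-every-session pattern for an add-on package such as ggplot2.

```r
# Base R attaches several packages at start-up, for example 'stats'
"package:stats" %in% search()

# Every function belongs to the package that defines it
environmentName(environment(prcomp))   # principal component analysis lives in 'stats'
environmentName(environment(sample))   # random sampling lives in 'base'

# Add-on packages are installed once and then loaded in each session
# install.packages("ggplot2")   # run once; needs an internet connection
# library(ggplot2)              # run at the start of every session
```

This also illustrates the point about not ordering the wrong dish: PCA via prcomp comes from the stats package, not from ggplot2.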

As mentioned earlier, the first thing I would like to do is generate a scatter plot to see how my two variables are laid out, that is, how the number of geese is influenced by eagle and crow nests. A scatter plot is really a point graph with eagle/crow nests on the x-axis and geese numbers on the y-axis. ggplot will do the job of drawing a point for each pair of x and y values in my geese data. First I load the ggplot2 package and then develop a framework to which I will simply add layers to enrich the plot as we go along. It's pretty simple. It's like baking a cake. You make a plain cake and then add toppings, from strawberry cream to vanilla or chocolate, maybe even ice cream, so the options are unlimited. It's the same with ggplot. We first ask R to build the framework and then add layers to enrich our graph and gain statistical insight.


library(ggplot2)

geese_plot <- ggplot (geese, aes(eagleCrowNest, numberOfGeese))

geese_plot + geom_point()

Scatter plot
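To try the plotting code above without the original field data, a stand-in data frame can be simulated. Everything in the sketch below is made up for illustration: only the column names (eagleCrowNest, numberOfGeese, Gender) and the number of observations (49) come from the text, while the seed, the distributions and their parameters are assumptions and do not reproduce the real survey.

```r
set.seed(1)   # hypothetical seed so the made-up data are repeatable

n_obs <- 49   # number of observations reported in the text

# Stand-in data frame: NOT the real survey data
geese <- data.frame(
  eagleCrowNest = rpois(n_obs, lambda = 2),   # made-up nest counts
  Gender        = factor(sample(c("Female", "Male"), n_obs, replace = TRUE))
)

# Made-up geese counts that tend to fall as the number of nests rises
geese$numberOfGeese <- rpois(n_obs, lambda = pmax(10 - 1.5 * geese$eagleCrowNest, 0.5))

str(geese)
```

With a data frame of this shape in the workspace, each ggplot call in this essay should run as written, although the resulting pictures will of course differ from the figures shown here.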

Now, this is our very basic scatter plot. At a quick glance we can see that almost all our geese numbers fall where crow nests range from 0 to 2.5. In other words, simply by glancing at our scatter plot we have already gained valuable information about how our geese numbers are influenced by crow and eagle nests. We can also see that there is considerable variation in geese numbers, from 0 to 20, within the nest range of 0 to 2.5. But have you spotted one or two things yet? Have you noticed that our scatter plot does not reveal any information about gender? Remember, I collected geese numbers for both male and female geese. So the question I am now curious about is how the male and female geese populations are distributed within the crow/eagle nest range of 0 to 2.5. All I am going to do is add a little code to my original framework to instruct R to show the population distribution by gender, as follows:

geese_plot <- ggplot (geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

geese_plot + geom_point()

Scatter plot color coded

I just typed color equals Gender and ggplot did the rest. It pulled together my variables, matched them to generate the points, and color-coded the points by gender so that I can see how geese numbers differ between males and females. How nice and powerful is that? Now my data are making more sense. Not only do I know that geese numbers do well when eagle nests are fewer than 2.5, but I can also pinpoint that female numbers are more sensitive to crow nests than male numbers. In particular, most of our female geese do well when there is no crow nest at all. Take a good look at the first column of our scatter plot, where you will find the pink dots (female geese) lined up vertically where the crow nest count on the x-axis is 0. Interestingly, we found only one male observation where the crow nest count is 0, and the rest are all females. In the second column we see considerable variation in geese numbers, from as low as 0 to as high as 20. We do not really know why there is such high variation, but we do know that there are more males than females within it.

Now, have you spotted something else? Have you counted my observations? I collected a total of 49 observations of geese population from my two-week field survey, but if you count the points, the number does not match. Can you tell why not? It is because some points overlap other points, meaning they share the same nest count and the same geese count. Therefore we need to ask R to disentangle the overlapping points so that all of them are visible in the graph. This should give us a clearer perspective on how the population is actually influenced by the eagle nests. Because the positions of our data points overlap, all I will do is write position equals jitter. What jitter does is separate any observation that overlaps another. The procedure is pretty simple: R adds a small amount of random noise to the position of each point, so that points sharing the same coordinates are nudged slightly apart and become individually visible. The code and the output are provided below:

geese_plot <- ggplot (geese, aes(eagleCrowNest, numberOfGeese, color= Gender))

geese_plot + geom_point(position = "jitter")

Scatter plot with randomly assigned numbers
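The idea behind jittering can be illustrated with base R alone. The sketch below uses a handful of made-up nest and goose counts (not the survey data) and the base function jitter, which adds uniform random noise; it is a stand-in for the same idea and is not meant to reproduce ggplot2's internals or default noise width. The seed and the noise amount of 0.2 are arbitrary assumptions.

```r
# Made-up (x, y) pairs; several are exact duplicates and would be
# drawn on top of each other in a plain scatter plot
nests <- c(0, 0, 0, 1, 1, 2, 2, 2)
count <- c(7, 7, 8, 5, 5, 3, 3, 3)

# Number of points hidden behind an identical point
n_hidden <- sum(duplicated(data.frame(nests, count)))

set.seed(42)  # hypothetical seed
nests_j <- jitter(nests, amount = 0.2)   # adds uniform noise in [-0.2, 0.2]
count_j <- jitter(count, amount = 0.2)

# After jittering, no two points share exactly the same position
n_hidden_after <- sum(duplicated(data.frame(nests_j, count_j)))

c(hidden_before = n_hidden, hidden_after = n_hidden_after)
```

The price of jittering is that the plotted positions are no longer the exact recorded values, which is worth remembering when reading counts off a jittered plot.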

Now this shows all our data points. If you go ahead and count the points, they should match my 49 observations. It also reveals that almost half of our observations overlapped, which is why we did not see them in the previous graph. This non-overlapping jitter plot gives the full picture of our geese population distribution. We can say with some confidence that the female geese population is very sensitive to even a small increase in eagle or crow nests. As you can see from the graph, there is a distinct separation in female population size according to the number of eagle/crow nests. Females are almost absent (see the base of the x-axis) even when the crow nest count is less than 4. I still think the data are clumped together. Although the plot has revealed all our data points, at a quick glance some points still touch one another, and that is due to the size of the point (the circle). What I would like to do now is change the size of the circle to get a slightly improved version of our jitter plot. The code and the output are provided below:

geese_plot <- ggplot (geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

geese_plot + geom_point(size = 1.5, position = "jitter")

Scatter plot with point size change

Now this looks a lot better, so it will be our standard scatter plot for further data exploration and analysis. By now you have probably started to realize the power of R, and more precisely the flexibility, freedom and options that come with writing your own code to explore, analyze and manipulate data under conceptually unified mathematical and statistical rigor. ggplot is extremely flexible and powerful, and if you are planning to become a full-blown scientist or academic, regardless of discipline, I believe you will be far better off harnessing the power of the R programming language than relying on the commercial, profit-driven products you probably used in your statistics courses at undergraduate or graduate school.

Before we carry out further data analysis based on the scatter plot that ggplot has enabled us to create, did you notice something we could change at this point? If we look at the labels on the x and y axes, we could improve them by adding a layer. As mentioned earlier, once you have built the skeleton of the plot with the ggplot command, all you have to do afterwards is keep adding layers to improve the plot. So let's improve the labels of our plot by adding a layer called labs. The command and the improved plot are provided below:

geese_plot <- ggplot (geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

geese_plot + geom_point(size = 1.5, position = "jitter") + labs(x = "Number of Eagle/Crow Nests", y = "Number of Geese")

Scatter plot with streamlined labels

Now this improved version gives a clearer understanding of what our x-axis and y-axis represent in terms of our bivariate data. You may have noticed that I keep typing the backbone code that builds our skeleton:

geese_plot <- ggplot (geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

However, when you are actually writing R code, you only have to do this once. After that, you just keep adding layers, the way I added the labels as a layer on top of our original skeleton, which R saved as the object geese_plot.
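The write-once, add-layers workflow might look like the sketch below. It assumes the ggplot2 package is installed, and it uses a tiny made-up data frame (not the survey data) with the column names from the text; the intermediate object names points_plot and labelled_plot are my own.

```r
library(ggplot2)

# Tiny made-up data frame using the column names from the text
geese <- data.frame(
  eagleCrowNest = c(0, 0, 1, 1, 2, 3, 4, 5),
  numberOfGeese = c(8, 7, 12, 5, 9, 4, 6, 2),
  Gender        = c("Female", "Female", "Male", "Female",
                    "Male", "Female", "Male", "Male")
)

# Skeleton: written once and stored as an R object
geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

# Layers are added to the stored object; the skeleton is not retyped
points_plot   <- geese_plot + geom_point(size = 1.5, position = "jitter")
labelled_plot <- points_plot +
  labs(x = "Number of Eagle/Crow Nests", y = "Number of Geese")

labelled_plot   # printing the object draws the plot
```

Storing intermediate plots under their own names is optional; it simply makes it easy to go back one layer if a later addition does not look right.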

Now that we have covered quite a bit, from collecting and organizing our data to generating a scatter plot with ggplot to make some sense of the dataset, we will step back and focus on the statistical method underpinning our data. It is worth emphasizing that when we work with bivariate datasets such as our geese data, we are usually interested in a few aspects of our variables. 1. A scatter plot, to get a first glance at how our data are behaving and to form a first impression of our ecological variables. 2. Whether the relationship in our data is linear, that is, whether our scatter plot looks like it can be fitted with a straight line. The statistical technique for this is known as regression, so our next job is to conduct a regression analysis on our scatter plot.

At first glance, it is pretty evident that our data do not form a straight line, as most of our data points are clustered between 0 and 4 on the x-axis. Nevertheless, this opens up a question: what proportion of the variation in our data can be explained by a fitted line, or regression line as it is known? A regression line is simply a straight line that helps us predict values within the range of our original data, so it is pretty helpful for making predictions. For example, if I had to predict geese numbers from the variation in crow nests along the x-axis, I would want to know what percentage of the variation in our data can be explained or predicted by the regression line. Does all this make sense? I am not going to go into the details of the statistical mechanism, as I intend to treat regression analysis separately in other articles.

For now, we will simply ask R to fit a regression line to our scatter plot. Again, the procedure is pretty simple: we add another layer. In ggplot2, the layer that draws a regression line is called geom_smooth. The rationale behind the name is that it smooths our data by finding the best-fitting line through all the data points in the scatter diagram; with method = "lm" that line is the straight line of a linear model. Of course, R does not pull this out of thin air: the mathematical procedure is rooted in conceptually unified statistical rigor. R finds the best-fitting line using the least-squares criterion, a statistical and algebraic procedure that picks the line minimizing the sum of squared vertical distances between the data points and the line. You do not need to focus on how this line is derived mathematically, as this article is more about appreciating R programming and its implications for ecological study. Hence I am going to add another layer, geom_smooth, and the code and the output are as follows:

geese_plot <- ggplot (geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

geese_plot + geom_point(size = 1.5, position = "jitter") + labs(x = "Number of Eagle/Crow Nests", y = "Number of Geese") + geom_smooth(method = "lm")

Scatter plot with regression line modeled into it
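The line that geom_smooth draws with method = "lm" comes from a linear model, and the same kind of fit can be inspected numerically with the base function lm. The counts below are made up for illustration and are not the survey data; the sketch shows the fitted intercept and slope, the proportion of variance explained (R-squared), and a check of the slope against the least-squares formula cov(x, y) / var(x).

```r
# Made-up counts, for illustration only
nests       <- c(0, 0, 1, 1, 2, 2, 3, 4, 5, 6)
geese_count <- c(9, 7, 8, 5, 6, 10, 3, 4, 2, 1)

fit <- lm(geese_count ~ nests)   # ordinary least-squares fit

coef(fit)                        # intercept and slope of the fitted line

# Share of the variance in geese_count explained by the line
r_squared <- summary(fit)$r.squared

# The same slope from the least-squares formula cov(x, y) / var(x)
slope_by_hand <- cov(nests, geese_count) / var(nests)

c(r_squared = r_squared, slope_by_hand = slope_by_hand)
```

R-squared is the number that answers the question raised above, namely what proportion of the variation in geese numbers a straight line can account for.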

Here we have our linear model, and as you can see, R has found the best-fitting lines for both our male and female geese. As suspected, most of the data points do not lie close to either line, which intuitively answers my question: only a small proportion of the variation in our data can be explained or predicted by our best-fitting linear model, or regression line. Nevertheless, it still provides a lot of solid insight. For example, I told you before that our female geese are really sensitive to eagle or crow nests, and a quick glance at our regression lines (the red line is for the female data points) confirms that. As you may know from elementary coordinate geometry, the slope of a straight line is defined as the ratio of rise to run, where rise is the difference between two points along the y-axis and run is the difference between the same two points along the x-axis. If you look at the female regression line (the red line), you see that its slope is larger in magnitude (the red line is a lot steeper) than that of the blue line representing our male geese. In other words, even though our regression line does not provide a robust linear model for making ecological predictions, its steepness tells us that female numbers change faster with nest numbers, which in turn suggests that our female geese are highly sensitive to eagle nests in or around their vicinity. This is to be expected, as females exhibit brooding behavior and a strong motherly instinct to protect their eggs and goslings. Therefore the ecological and conservation management implication is to ensure crow nests are removed if our management goal is to safeguard migratory female geese in a specific estuarine mangrove ecosystem or freshwater wetland, for example.
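The rise-over-run comparison can also be made with numbers rather than by eye. In the sketch below the data are made up (they are not the survey data) and the helper slope_two_points is a hypothetical function written for this illustration; the fitted slopes come from a separate least-squares fit for each gender.

```r
# Made-up data, for illustration only
geese <- data.frame(
  eagleCrowNest = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
  numberOfGeese = c(9, 7, 4, 2, 1, 10, 9, 9, 8, 7),
  Gender        = rep(c("Female", "Male"), each = 5)
)

# Rise over run between two points (x1, y1) and (x2, y2) on a line
slope_two_points <- function(x1, y1, x2, y2) (y2 - y1) / (x2 - x1)

# Least-squares slope fitted separately for each gender
slopes <- sapply(split(geese, geese$Gender), function(d) {
  unname(coef(lm(numberOfGeese ~ eagleCrowNest, data = d))[2])
})

slopes                          # geese lost per additional nest, by gender
slope_two_points(0, 9, 4, 1)    # first and last female points in the made-up data
```

A more negative slope means that goose numbers drop faster for each additional nest, which is the numerical counterpart of a steeper line in the plot.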

Now, let's ask R to improve our regression lines further. You may have noticed that our regression lines also have shaded areas. What are these shaded areas? They are 95% confidence intervals. A 95% confidence interval is a statistical measure of how precisely the regression line itself has been estimated: it is the band within which we are 95% confident the true average line lies, not a band that is expected to contain 95% of the individual data points. Not surprisingly, since our regression models are pretty weak (only a small proportion of our data points lie close to the lines), most of our data points fall outside the shaded areas. However, the male and female confidence intervals overlap. The middle portion (slightly darker) is the region where the male and female bands overlap, and this has conservation and ecological significance. Before we do any further analysis, we would like to get rid of the common gray shading and ask R to assign separate colors to the female and male confidence intervals. This will let us appreciate the overlapping part better and help us understand the overlapped confidence intervals when we make predictions.

Did you notice that this is the first time I have brought up the option of making predictions from our weak linear models? Can you tell me why? It is because, even though we are dealing with only two variables, geese numbers and crow nests, we in fact have two groups or levels in our geese data: male and female. Hence we have overlapping confidence intervals, with a decent proportion of male and female data points within our data range. As you can see, even a weak regression model can give us valuable insight into our data, provided we have a grouped (male and female) scatter plot. Before we draw ecological insight from our grouped, overlapping linear models, let's write a simple piece of code that eliminates the gray color and separates the confidence intervals of our male and female geese samples. The code and the output are provided below:

geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

geese_plot + geom_point(size = 1.5, position = "jitter") + labs(x = "Number of Eagle/Crow Nests", y = "Number of Geese") + geom_smooth(method = "lm", aes(fill = Gender))

Scatter plot with regression models color coded

Now this is a lot better. R has got rid of the gray shading and added distinct colors for the female (pink) and male (light blue) geese, each reflecting a 95% confidence interval. The overlapping middle portion is also much clearer, and it reveals that a substantial number of data points fall within the overlap. However, to ensure that no data points are hidden under the shading, we can do better by asking R to lighten the colors. The code is simple. Inside geom_smooth, which draws our regression lines, we simply add alpha with a numeric value. alpha takes a value between 0 (fully transparent) and 1 (fully opaque), so smaller values give lighter shading; how light or dark you want the shaded area depends on your plot. I usually stick to values between 0.1 and 0.3 to reveal any data points that may previously have been hidden behind dark shading.

geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

geese_plot + geom_point(size = 1.5, position = "jitter") + labs(x = "Number of Eagle/Crow Nests", y = "Number of Geese") + geom_smooth(method = "lm", aes(fill = Gender), alpha = 0.2)

Streamlined and final version of our regression models

This has greatly improved our regression graph, and we can now clearly see that a good deal of our data points fall within the overlapping 95% confidence intervals. At first glance, female geese numbers range from 6 to 9 when crow or eagle nests are totally absent. On that basis, we can predict with 95% confidence that in any comparable estuarine mangrove ecosystem, female geese numbers will range between 6 and 9 when nests of predatory birds such as eagles or crows are completely absent. Male geese, generally speaking, exhibit less sensitivity to crow nests. Our male data points show considerable variation, ranging from 1 to 25, and, most importantly, within this range almost all male geese tolerate a predatory bird (crow/eagle) presence of 0 to 6 nests. Our data also reveal two extreme observations among the males. On the y axis, we have an observation of 25 male geese (an extreme point) against 5 predatory nests. Although unusual, this is intuitively plausible: large numbers of males in a flock may be brave enough to tackle a predatory presence ranging from 0 to almost 7 nests. On the x axis, although a good number of our female geese are absent when crow nests vary from 1 to 12, we see only one extreme male observation, the only one in which males are absent when predatory nests range from 0 to 12. In other words, male geese were present in numbers varying from 1 to 25 across the predatory range of 1 to 12 nests, with the exception of that one observation, which is our outlier.
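The comparison of male and female sensitivity can also be read off as numbers rather than from the plot alone. The sketch below is only an illustration: it uses simulated data in which female counts are assumed to fall faster with nest numbers than male counts (an assumption built into the simulation, not a result from the field data), fits one linear model per gender in the same way geom_smooth does for each color group, and tabulates the intercept, slope and R-squared of each line.

```r
# Simulated stand-in for `geese`; the numbers are illustrative only.
set.seed(42)
n <- 30
geese <- data.frame(
  eagleCrowNest = sample(0:12, size = 2 * n, replace = TRUE),
  Gender = rep(c("Female", "Male"), each = n)
)

# Assumption for illustration: female counts fall faster with nests than male counts.
true_slope <- ifelse(geese$Gender == "Female", -0.7, -0.2)
geese$numberOfGeese <- pmax(
  0,
  round(9 + true_slope * geese$eagleCrowNest + rnorm(2 * n, sd = 2))
)

# One linear model per gender, mirroring the two lines drawn by geom_smooth().
fits <- lapply(
  split(geese, geese$Gender),
  function(d) lm(numberOfGeese ~ eagleCrowNest, data = d)
)

# Intercept, slope and R-squared for each group, one row per gender.
summary_table <- t(sapply(
  fits,
  function(f) c(coef(f), r_squared = summary(f)$r.squared)
))
print(round(summary_table, 2))
```

A slope closer to zero means the group's numbers change less as nests increase, which is one way to express "less sensitivity" quantitatively.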

As you may realize, the ecological study of any species is rooted in a conceptually unified, statistically valid sampling design, followed by data collection bound to that design, and finally data analysis that harnesses the sophisticated and powerful statistical packages at our disposal. In this essay, I demonstrated the power of the R programming language through a basic ecological study of how predatory bird populations influence the greylag goose population in an estuarine mangrove ecosystem. The study demonstrates the power of R by harnessing statistical tools such as the regression model, and shows their implications for ecological and conservation management.

Finally, in this article I did not attempt to cover the statistical procedures for developing the regression line, nor did I attempt to explain the statistical mechanisms that underpin the study. More precisely, the study is rooted in developing the regression equation \hat{y} = b_{0} + b_{1}x, then estimating the coefficient of determination, which reflects what proportion of the variation in our data is explained by the regression line, and finally calculating the correlation coefficient, also known as the Pearson coefficient (named in honor of Karl Pearson, who originally developed the method). These three statistical procedures underlie my study of the geese population and provide the conceptual framework of the essay. In my next essay, I intend to present these statistical methods and a full treatment of their analytic procedures, drawing on the same geese population data set. This essay is primarily intended to serve two purposes: 1. To show the power of the R programming language. 2. To understand and appreciate ecological study and its close association with statistics, and with R as a powerful and sophisticated mathematical package for answering simple but interesting ecological questions about animal population sampling and estimation methods.
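As a brief preview of those three procedures, the following sketch computes them by hand in base R and checks the result against lm(). The x and y vectors are simulated placeholders for nest counts and geese counts, not the data set used in this essay. The slope and intercept come from the standard least-squares formulas b_{1} = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2 and b_{0} = \bar{y} - b_{1}\bar{x}.

```r
# Simulated nest and goose counts; placeholders, not the essay's data set.
set.seed(7)
x <- sample(0:12, size = 40, replace = TRUE)            # predatory nests
y <- pmax(0, round(10 - 0.5 * x + rnorm(40, sd = 3)))   # geese counted

# Least-squares estimates for y-hat = b0 + b1 * x.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Coefficient of determination: 1 - residual sum of squares / total sum of squares.
y_hat <- b0 + b1 * x
r_squared <- 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)

# Pearson correlation coefficient; with a single predictor, r^2 equals R-squared.
r <- cor(x, y)

# Cross-check the hand calculation against R's built-in fit.
fit <- lm(y ~ x)
print(c(b0 = b0, b1 = b1, r_squared = r_squared, r = r))
print(coef(fit))
```

With only one predictor, squaring the Pearson coefficient gives the coefficient of determination, so the three quantities are closely tied to each other.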

This essay was prepared in \LaTeX, which is built on TeX, the brainchild of Donald Knuth, and was extended for mathematical typesetting by the American Mathematical Society (AMS); George Grätzer of the University of Manitoba Department of Mathematics wrote widely used guides to it. I also used both Python and the R programming language to develop the quadratic population model and to design the random sampling matrix. No commercial software was used in the preparation of this draft. Debian GNU/Linux, a Unix-like system, was used throughout as the core on which all software packages were run.

