Solving Biodiversity Crisis : Highlighting Math & Computer Programming at Kindergarten Schools

Mohammed Ashraf

The biological diversity of our planet is in deep crisis. The rate of extinction is overwhelming across all taxa, driven almost entirely by human perturbation, and the argument has been marshaled that Earth is already undergoing its sixth mass extinction event. The last mass extinction took place roughly 66 million years ago, when the non-avian dinosaurs went extinct after a cosmic cataclysm, long before the rise of modern mammals. The probability of a cosmic catastrophe, for example a very large asteroid striking our planet, is low in any given period, but such impacts do recur on time scales of millions of years. It is therefore safe to say that another extinction event will occur, that it could eliminate the human species, and that new species will arise over millions of years of evolutionary time. Yet human interference in Earth's biogeochemical cycles, ecosystems and biological diversity makes humans more vulnerable to extinction, and its pace surely outstrips that of any cosmic catastrophe in Earth's evolutionary timeline. These negative anthropogenic impacts have given rise to biodiversity and ecosystem conservation efforts across the tropical and subtropical belts, where species richness and diversity are highest. The science that governs ecosystem conservation and management is rooted in well-established principles of ecology, but both conservation biology and ecology remain crisis disciplines. For example, the charismatic large vertebrates that often serve as flagship or umbrella species across the tropics are seriously endangered by human actions, in spite of half a century of large-scale conservation projects meant to revive their dwindling populations. This mismatch between theoretical ecology and its practical use in conserving species (the fundamental ecological unit of any ecosystem) is partly due to a lack of mathematical underpinnings.
Modern, well-trained wildlife ecologists undoubtedly recognize the importance of developing cutting-edge mathematical algorithms that can solve many ecosystem and species diversity problems. Those algorithms are, nonetheless, impenetrable to the lay people with little scientific training who are in charge of developing conservation policy or who hold the political leverage to enforce species protection. Ultimately, thick bureaucracy coupled with weak administrative and political leverage cuts policy off from wildlife science altogether. Wildlife ecology and conservation are left as a standalone 'high-class' obsession with protecting biodiversity in the face of malnourished economies, social injustice, a burgeoning human population, capitalism and the exploitation that comes with it, hard-core poverty, and environmental degradation across the tropical and neotropical belts. To reverse the trend we need a bottom-up approach and, more importantly, a radical and revolutionary one: an approach that uses mathematics as the fundamental language for conveying messages across the scientific and non-scientific arenas. This can only be achieved by integrating mathematics widely in kindergarten and elementary schools, and not the ramshackle kind of mathematics bound by a dusty curriculum. For example, almost all kindergarten schools in ecologically rich but economically poor nations teach math in the old-fashioned way, where numerical problems are solved either with a tedious calculator or by hand. It is necessary to learn how to write out the steps of arithmetic and algebraic problems by hand, but it is more important to embrace computer programming and write code that solves them. This outruns the old-fashioned way of doing math with a scientific calculator and, more importantly, introduces programming skills to students at a very early stage.
Almost all ecological and environmental problems depend on devising cutting-edge mathematical models and simulations, and these can only be run by using the power of computer programs. In other words, the more efficiently one can write code, the better one's chance of solving complex problems. It is therefore of paramount importance to introduce computer programming to students at a very early stage of their education. The benefits are enormous: it brings in the concept of 'modern problem-solving approaches' and puts students at the center stage of scientific argument and consensus. For example, if I want to know the kill rate, that is, how many deer an individual tiger kills per year, I need to develop a mathematical algorithm after collecting my field data on deer kills by tigers. At first glance this may seem daunting for kindergarten teachers to teach, but it is in fact easier than plotting a function on a scientific calculator. The conservation implication of understanding the kill rate numerically is simple. It enables us to devise a management strategy for the deer stock, knowing how many deer an individual tiger needs per year to survive, breed and successfully raise its cubs. The kill-rate algorithm also lets me project future trends in deer population size versus tiger numbers, ecosystem health, the survival and mortality rates of both tiger and deer, and so on, all of which are critically important questions in the face of the rapid extinction of carnivores across tropical ecosystems. If students, and I mean all students, are taught math using the 21st-century approach of writing code, chances are high that we will end up with a generation that chops through the thick bureaucracy, assimilates science, and sees wildlife ecology through the lens of mathematical algorithms.
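To give a feel for how small such a kill-rate calculation can be, here is a minimal sketch in R. The tiger names, kill counts and monitoring periods are made-up illustrative values, not field data, and the simple "kills per day scaled to a year" formula is an assumption chosen for clarity rather than a full predator-prey model.

```r
# Hypothetical field records (illustrative values only, not real survey data):
# one row per monitored tiger.
kills <- data.frame(
  tiger          = c("T1", "T2", "T3"),
  deer_killed    = c(12, 9, 15),    # deer kills attributed to each tiger
  days_monitored = c(90, 60, 120)   # number of days each tiger was followed
)

# Annual kill rate per tiger: kills per day scaled to a 365-day year
kills$kills_per_year <- kills$deer_killed / kills$days_monitored * 365

# Pooled estimate across all tigers: total kills divided by total tiger-days
pooled_rate <- sum(kills$deer_killed) / sum(kills$days_monitored) * 365

# Deer needed per year to support a hypothetical population of 10 tigers
n_tigers    <- 10
deer_needed <- pooled_rate * n_tigers

print(kills)
pooled_rate
deer_needed
```

The same few lines can be rerun whenever new field records are added, which is the practical advantage over working the numbers out on a calculator.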
The lectures below, my first on R and Maxima programming, introduce the advantages of writing code to solve problems. If students watch the videos and are motivated to write their own code, this blog post has achieved its objective. Enjoy coding, folks.

Towards Sustainable Ecological Future!


Lecture on R Programming to Solve Mathematical Problems :

Lecture on Maxima Programming to Solve Mathematical Problems :

It’s all in the Geese!

Mohammed Ashraf

Wild Goose

Wildlife ecologists are often interested in the parameters that influence species distribution and population size. These parameters range from intrinsic ecological factors (for example, density-dependent population regulation) to extrinsic anthropogenic disturbances (for example, man-made greenhouse gas emissions). Within this broad spectrum, wildlife ecologists often need to uncover the underlying trend or mechanism that influences the population parameters of the species of concern. Many recently graduated wildlife biologists feel that the statistical tools they need for ecological data analysis are out of reach, because economic and social factors hinder their access to cutting-edge scientific tools and resources. The problem is more intense in developing nations, where technical and academic support is often scarce owing to weak economic and social conditions. In both developed and developing nations, students are trained to carry out statistical tests with mathematical rigor, using a handful of statistical and mathematical software packages to which they are introduced during their undergraduate education. These typically include Minitab, SPSS and JMP for statistical analysis, and MATLAB, Mathematica and Maple for mathematical programming. They are commercial, easy-to-use, high-priced packages, mostly built around a graphical user interface (GUI), that students struggle to get hold of once they graduate, for economic and social reasons beyond their control.
One of the main factors is that this software is expensive and closed-source, meaning one cannot access the source code to redevelop or reproduce the software, or to work with it in complete freedom. Hence students, once they finish their undergraduate study in wildlife science or ecology, often find it hard to keep up their academic and research work. This in turn undermines the supply of a healthy pool of scientific scholars in the biodiversity conservation arena. Yet human existence is deeply rooted in sustaining and conserving the remaining biodiversity of our planet. Simply put, if concerted and active (proactive as opposed to retrospective) measures to conserve, if not preserve, the remaining ecological diversity are not taken within the next 25 to 50 years at most, our planet will face a doomsday scenario that could eliminate the human species in the blink of an eye. Considering that our planet is very old (about 4.5 billion years), the extinction of the human species is to be expected on an evolutionary time scale, but it could happen within the next 100 years or so if we fail to curb species extinction in the face of capitalistic free-market resource consumption and exploitation across both hemispheres.


Although many students from disciplines related to conservation biology possess the necessary scientific skills, it is unfortunate that they lack the technical tools to master and apply their newly developed analytical skills in ecology. The reason is a capitalistic, profit-driven market that allows only the wealthy section of society to access its products, in this case scientific software. However, things have changed. Many people have started boycotting these commercial enterprises and writing their own source code, developing open-source scientific software to conduct their ecological research. In this article I therefore introduce the R programming language, an open-source and totally free (free as in 'free beer', as the joke goes) scientific software environment that is, in my view, more powerful and streamlined than the commercial software controlled by profit-driven market tycoons. R can be installed quickly, and if you are using a UNIX-like operating system such as Linux or BSD, base R is usually available from the system's package repositories. I will not discuss the steps required to download and install R, as this information is readily available on the net (use your favorite search engine). Instead, I will dive into using the R programming language to work with ecological datasets that are critically important for conserving ecological units from genes to populations, from communities to landscapes, and from ecosystems to the entire planet (commonly known as the biosphere or ecosphere).

Mommy goose with her gosling

I have always been fascinated by ducks and geese and am often interested to know more about their populations and what influences them. Hence I am going to give you a simple and clean example of how R (the things it can do for you are simply astounding) can help you develop an ecological model from simple data that you can collect right after finishing your undergraduate study in wildlife ecology or a related discipline (ecology, conservation biology, landscape ecology and so on). Remember, lack of technical resources due to capitalism is not your fault, so do not let capitalists stop you from doing good things for yourself and, more importantly, for our planet, which needs conservation scientists more than it needs MBAs. Recently I went to the mangrove ecosystem of a tropical estuarine landscape, where I wanted to find out how the goose population (greylag goose in particular) is influenced by the presence of crow and eagle nests in its vicinity. Although the goose is a relatively big bird, it has its fair share of enemies: eagles and crows are predatory birds that either kill goslings or severely disturb the roosting and grazing habits of geese. The problem is more pronounced in tropical mangroves, which geese often visit along their long-haul winter migratory routes from temperate and tundra ecosystems as far away as the Himalaya and Siberia. Here I am interested to see whether there is any relationship between the number of geese and the number of eagle or crow nests. I am also interested in counting male and female geese separately, to see how the male and female population sizes are influenced by crow nests around their roosting points.

I collected a sample of 49 observations of goose population size over two weeks of ecological survey in the mangrove ecosystem. My sampling sites were randomly selected and no site was repeated. The total survey area was 100 sq km of estuarine mangrove. I first carried out a feasibility study to find out how much area I could cover to count geese in one go. Based on my energy and logistic resources, I worked out that if I can cover 2 sq km a day, I can divide the area into fifty 2-sq-km blocks (50 times 2 equals 100 sq km) for my sampling survey. The block design is critically important. It is a prerequisite for a random sampling design, in which I must ensure that each 2-sq-km block has an equal chance of being selected, so that I do not end up choosing blocks based on favoritism (an ad hoc study of that kind has no ecological or scientific bearing). Each of my fifty 2-sq-km blocks therefore has a probability of 1/50 = 0.02, or a 2 percent chance, of being picked on any single draw, and the selected blocks form a valid representative sample of the entire 100 sq km area. It is worth pointing out that random sampling is very robust, so the exact number of sample blocks matters less than you might think (although one rule of thumb is to sample no less than 10 percent of the total). What does matter is whether you have selected your blocks randomly. Therefore, even if I carry out my greylag goose survey on only 10 sq km (10 percent of my total 100 sq km sampling area), which works out to five 2-sq-km blocks out of the fifty that make up my potential survey area, I can still come up with an ecologically valid dataset on goose population and eagle nests, from which to infer how goose numbers are influenced by eagle nests (you can regard this as my working ecological question at this point).
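The block arithmetic can be checked in a few lines of R. This is a minimal sketch that assumes the fifty 2-sq-km blocks described above; the variable names are introduced here purely for illustration.

```r
total_area_km2 <- 100   # whole survey area
block_area_km2 <- 2     # area that can be covered in one day

# Number of equal-sized blocks in the sampling frame
n_blocks <- total_area_km2 / block_area_km2

# Chance that a given block is picked on a single random draw
p_single_draw <- 1 / n_blocks

# With 5 blocks drawn without replacement, the chance that a given
# block ends up somewhere in the sample is n / N
n_sampled  <- 5
p_included <- n_sampled / n_blocks

# Share of the total area actually surveyed
area_sampled_km2 <- n_sampled * block_area_km2
share_sampled    <- area_sampled_km2 / total_area_km2

c(n_blocks = n_blocks, p_single_draw = p_single_draw,
  p_included = p_included, share_sampled = share_sampled)
```

Under these assumptions every block has the same inclusion probability, which is what makes the five selected blocks a fair sample of the whole area.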

Sampling Blocks

Let's do some work in R to start with. I need to choose five random blocks of 2 sq km each from the total of 50 blocks that make up my 100 sq km survey area. Please note that I created a grid of cells (see the matrix diagram) in which each block is assigned a serial number, starting from 1 and ending with 50.

Sample of randomly selected blocks

I will now ask R to randomly select 5 blocks out of 50 and give me a set of five random numbers, which will be my sampling blocks. A single line of code makes R draw five random blocks out of the 50 in my five-by-ten matrix (5 rows and 10 columns). The R code is provided below.

sample(x = 1:50, size = 5, replace = FALSE)

{40 7 5 9 2}

With the simple code above I asked R to generate five random numbers out of 50, so I can write my sampling blocks in set-builder notation as $\{\, x \mid 1 \leq x \leq 50,\ x \in \{40, 7, 5, 9, 2\} \,\}$ (read as: the set of all x between 1 and 50 such that x is one of 40, 7, 5, 9, 2). These five numbers (40, 7, 5, 9, 2) are my ecological survey units, comprising a total of 10 sq km out of the 100 sq km potential survey area. I have drawn the matrix again, this time highlighting the randomly selected blocks in which I will investigate the greylag goose population size and how it relates to eagle and crow nests in or around the roosting, grazing and resting sites.

Before we go further, a quick note on my simple R code. As you can see, I asked R to select randomly from the numbers 1 to 50 by passing them as the argument x (so $x = 1, \ldots, 50$). I then gave R my sample size, which is 5, meaning R will randomly select five blocks out of the 50 in my sample matrix. Finally, I told R not to replace a block once it has been drawn by writing replace = FALSE. When sampling with replacement (replace = TRUE), R picks a number between 1 and 50 and then puts it back into the pool, so the same number can be drawn again. We do not want to select the same block twice, so we sample without replacement. This is in fact the default behavior of sample(), so replace = FALSE is written out here only to make the intention explicit.
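A slightly extended sketch of the same draw is shown below. It fixes a random seed so that the selection can be reproduced, and it looks up where each selected block sits in the 5-by-10 grid. The seed value and the assumption that the blocks are numbered row by row are mine, not part of the original survey design, and the blocks drawn will differ from the five reported above.

```r
set.seed(2024)  # arbitrary seed, used only so the draw can be repeated exactly

# 5-by-10 grid of block numbers, assumed here to be numbered row by row
block_grid <- matrix(1:50, nrow = 5, ncol = 10, byrow = TRUE)

# Draw five distinct blocks (sampling without replacement)
selected <- sample(x = 1:50, size = 5, replace = FALSE)

# Row and column of each selected block in the grid
is_selected <- matrix(block_grid %in% selected, nrow = 5)
positions   <- which(is_selected, arr.ind = TRUE)

selected
positions

# Check that no block was drawn twice (0 means no duplicates)
anyDuplicated(selected)
```

Recording the seed alongside the field notes means anyone can regenerate exactly the same set of blocks later.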

Now that my blocks are randomly selected, I can begin my survey work (the fun part). I visited the blocks every morning and every evening for two weeks and collected data on greylag goose population size. I also recorded the distribution of greylag geese by gender (male and female). I then carried out line transect surveys in the same blocks every morning and evening to count eagle and crow nests in or around the grazing, roosting and resting sites of the geese. My line transects were roughly half a km long, although some were a km long where the density of crow nests near the geese was high. My dataset is presented below. Can we make anything out of this data? Can we answer a few statistically valid ecological questions from it? Possibly not yet, because a dataset is often useless on its own until we make it meaningful. How are we going to gain a high-level understanding from this freshly collected ecological data on greylag geese in relation to crow and eagle nests? The answer lies in a solid command of statistics and in harnessing the power of statistical tools and modeling, which we will do through R programming. The remaining part of the essay therefore focuses on R code for gaining a high-level understanding of our dataset.

When we are presented with a dataset of two numerical variables, as is the case with my data, we usually want to find out whether these variables are related to each other. Here I am interested in whether there is any relationship between the number of geese and the number of eagle nests. I am also interested in whether the distribution of male and female geese differs in relation to eagle and crow nests. Furthermore, I want to develop a model that provides a statistically valid summary, which we can use to generalize and make predictions about goose population and eagle nests. Does this make sense so far, and if so, how do we go about it? It's simple: we let R answer all these interesting questions. All we have to do is ask R by writing code in a language that R can understand. It's as simple as that.
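Before any plotting, it helps to see how such a dataset is laid out in R. The sketch below assumes a data frame called geese with the three columns used in the plotting code in this essay (eagleCrowNest, numberOfGeese and Gender). The values are simulated placeholders generated at random, not the survey data, and carry no ecological meaning.

```r
set.seed(1)  # arbitrary seed so the placeholder values are repeatable

n_obs <- 49  # same number of observations as the survey

# Simulated placeholder values only; replace with the real field records
geese <- data.frame(
  eagleCrowNest = rpois(n_obs, lambda = 1.5),   # nest counts per observation
  numberOfGeese = rpois(n_obs, lambda = 8),     # geese counted per observation
  Gender        = factor(sample(c("Female", "Male"), n_obs, replace = TRUE))
)

str(geese)      # column names and types
head(geese)     # first six rows
summary(geese)  # quick numerical summary of each column
```

With real data, the data frame would more likely be read from a file, for example with read.csv(), but the column layout would be the same.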

As I was saying, when we have two numerical variables (fashionably known as bivariate data), the first thing we want to do is create a scatter plot to see at a glance what our data look like graphically. This is our first step towards a high-level understanding of ecological data. I am going to write some fairly simple code to ask R for a scatter plot. But before we do anything, let me give you some brief background on how R plots graphs. R is a powerful and sophisticated statistical programming language that hosts over 5,000 packages. These packages are developed by scientists from various backgrounds, from mathematicians to wildlife ecologists and from academic scholars to computer programmers. In R, packages are like restaurants. You can go to a Pakistani restaurant to enjoy Pakistani cuisine or a Bangladeshi restaurant to enjoy Bangladeshi cuisine, and in the same way you choose a package for the job at hand. You can install a package that generates sophisticated, data-rich graphs for your analytical modeling. You can install a package that does algebraic calculations or solves advanced calculus problems. You can install a package that does cladistic analysis, principal component analysis and other advanced work, or one that carries out geographic (GIS) analysis for you. So it is like going to different restaurants, and there are over 5,000 different restaurants (packages) in R town. Making sense so far? Now, I also mentioned going to a restaurant and ordering from the menu, and surely there are many items on the menu to choose from.
In R, the items on the menu are called functions. Each package comes with many functions that we use to write our programs, that is, to instruct R to carry out specific numerical and statistical tasks. So a package is like a restaurant and its functions are like the dishes on its menu. If you go to a Pakistani restaurant, you do not expect to order Vietnamese dishes, right? Likewise, if you are working with the ggplot2 package, you do not expect it to carry out matrix algebra or principal component analysis (PCA). In other words, a set of R functions is grouped together in a package to serve a specific purpose. There may be cases where functions overlap between one package and another, but generally each package hosts a set of functions for a specific kind of mathematical or statistical task. I have already named one of the packages we are going to use to analyze our geese data. It is called ggplot2, and it provides the functions we will use to gain a high-level understanding of our data through insightful graphs. Now that you have some background on how R packages and their functions work together, we can start the analytical part of our ecological study. It's really the fun part, when you learn to harness the power of R to understand your hard-earned ecological field data.
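In practice, visiting a restaurant and reading its menu looks like the sketch below. It installs ggplot2 only if it is missing (the CRAN mirror URL is one common choice, not a requirement), loads it, and lists the functions the package provides.

```r
# Install ggplot2 only if it is not already available on this machine
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Load the package so that its functions can be called directly
library(ggplot2)

# The "menu": every function and object the package exports
ggplot2_functions <- ls("package:ggplot2")
length(ggplot2_functions)

# The functions used in this essay are all on that menu
c("ggplot", "aes", "geom_point", "geom_smooth", "labs") %in% ggplot2_functions
```

Installation is needed only once per machine, whereas library(ggplot2) has to be run at the start of every new R session.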

As mentioned earlier, the first thing I would like to do is generate a scatter plot to see how my two variables are laid out, that is, how the number of geese is influenced by eagle and crow nests. A scatter plot is a point graph with eagle/crow nests on the x-axis and goose numbers on the y-axis. ggplot does all the work of placing a point at the x and y coordinates of each observation. First I load the ggplot2 package and then build a framework to which I simply add layers to enrich the plot as we go along. It's pretty simple. It's like baking a cake. You make a plain cake and then add toppings, from strawberry cream to vanilla or chocolate, maybe even ice cream, so the options are unlimited. It's the same with ggplot. We first ask R to build the framework and then add layers to enrich the graph and gain statistical insight.


library(ggplot2)

geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese))

geese_plot + geom_point()

Scatter plot

This is our very basic scatter plot. At a quick glance we can see that almost all our goose counts fall where the number of crow nests is between 0 and 2.5. In other words, simply by glancing at the scatter plot we have already gained valuable information about how goose numbers relate to crow and eagle nests. We can also see considerable variation in goose numbers, ranging from 0 to 20, within the nest range of 0 to 2.5. But have you spotted one or two things yet? Have you noticed that the scatter plot reveals nothing about gender? Remember, I collected counts of both male and female geese. So the question I am now curious about is how the male and female populations are distributed within the nest range of 0 to 2.5. All I need to do is add a little code to my original framework to ask R for the distribution by gender. The code is as follows:

geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

geese_plot + geom_point()

Scatter plot color coded

I just typed color = Gender and ggplot did the rest. It pulled together my variables, matched them to generate the points, and color-coded the points by gender so that I can see how goose numbers differ between males and females. How nice and powerful is that? Now my data are making more sense. Not only do I know that geese do well when there are fewer than 2.5 eagle nests, I can also pinpoint that female numbers are more sensitive to crow nests than male numbers. In particular, most of our female geese do well when there is no crow nest at all. Take a good look at the first column of the scatter plot, where you will find mostly pink dots (female geese) lined up vertically where the crow nest count on the x-axis is 0. Interestingly, we found only one male goose where the nest count is 0; the rest are all females. In the second column we see considerable variation in goose numbers, from as low as 0 to as high as 20. We do not really know why there is such high variation, but we do know that there are more males than females within it. Now, have you spotted something else? Have you counted my observations? I collected a total of 49 samples during my two-week field survey, but if you count the points they do not add up. Can you say why not? It is because some points overlap others, meaning those observations have the same nest count and the same number of geese. We therefore need to ask R to disentangle the overlapping points so that all of them show up in the graph. This should give us a clearer perspective on how the population is actually influenced by eagle nests. Because the positions of some points coincide, all I will do is write position = "jitter".
What jitter does is separate observations that overlap with one another. The procedure R follows is simple: it nudges each point by a small random amount in the horizontal and vertical directions, so that observations sharing the same coordinates no longer sit exactly on top of each other. The values in the dataset are not changed; only the plotted positions shift slightly. The code and the output are provided below:
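The effect of overlapping points can be seen on a tiny made-up example. The sketch below uses five invented observations (not the survey data) that occupy only two distinct positions, counts how many are hidden, and then plots them with a small amount of jitter. The width, height and seed values are arbitrary choices for illustration.

```r
library(ggplot2)

# Tiny illustrative data set (made-up values): five observations, two positions
pts <- data.frame(
  nests = c(0, 0, 0, 1, 1),
  geese = c(5, 5, 5, 3, 3)
)

# Rows that repeat an earlier (nests, geese) pair are drawn on top of it
hidden <- sum(duplicated(pts))
hidden  # 3 of the 5 observations are hidden behind another point

# Without jitter only two dots are visible
p_plain <- ggplot(pts, aes(nests, geese)) + geom_point()

# With jitter every observation gets a small random offset; fixing the seed
# makes the offsets, and therefore the picture, reproducible
p_jitter <- ggplot(pts, aes(nests, geese)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.1, seed = 1))

print(p_jitter)
```

Writing position = "jitter" is the shorthand for position_jitter() with default settings; calling the function directly gives control over how far the points are moved.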

geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

geese_plot + geom_point(position = "jitter")

Scatter plot with randomly assigned numbers

Now the plot shows all our data points. If you count them, they should add up to my 49 observations. The plot also reveals that almost half of our observations were overlapping, which is why we could not see them in the previous graph. This jittered plot gives the full picture of our goose population distribution. We can say with some confidence that the female population is very sensitive to even a small increase in eagle or crow nests. As you can see from the graph, there is a distinct separation in female population size according to nest numbers. Females are almost absent in many cases (see the points near the base of the plot, along the x-axis) even when the number of crow nests is less than 4. I still think the data look clumped together. Although all our data points are now visible, some of them are still touching one another, and that is due to the size of the point (the circle). What I would like to do now is change the size of the circle to get a slightly improved version of our jitter plot. The code and the output are provided below:

geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

geese_plot + geom_point(size = 1.5, position = "jitter")

Scatter plot with point size change

Now this looks a lot better, so it will be our standard scatter plot for further data exploration and analysis. By now you have probably started to appreciate the power of R and, more precisely, the flexibility and freedom of writing your own code to explore, analyze and manipulate data with mathematical and statistical rigor. ggplot is extremely flexible and powerful. If you are planning to become a full-blown scientist or academic, whatever discipline your study and research focus on, you would be far better off harnessing the power of the R programming language than relying on the commercial, profit-driven products you probably used in your statistics course at undergraduate or graduate school.

Before we carry out further analysis based on the scatter plot that ggplot has helped us create, did you notice something we could improve at this point? If we look at the labels on the x and y axes, we can improve them by adding a layer. As mentioned earlier, once you have built the skeleton of the plot with the ggplot command, all you have to do afterwards is keep adding layers. So let's improve the axis labels by adding a layer called labs. The command and the improved plot are provided below:

geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

geese_plot + geom_point(size = 1.5, position = "jitter") + labs(x = "Number of Eagle/Crow Nests", y = "Number of Geese")

Scatter plot with streamlined labels

This improved version makes it much clearer what the x-axis and y-axis represent in terms of our bivariate data. You may have noticed that I keep retyping the backbone code that builds our skeleton:

geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

However, when you are actually writing R code, you only have to do this once. After that, you just keep adding layers, the way I added the labels, to the original skeleton, which R has saved as an object named geese_plot.

Now that we have covered quite a bit, from collecting and organizing the data to generating a scatter plot with ggplot, we will step back and focus on the statistical method underpinning our variables. It is worth emphasizing that when we work with bivariate datasets, such as our geese data, we are usually interested in three things. First, a scatter plot gives a first glance at how the data behave and a first impression of our ecological variables. Second, we want to determine whether the relationship between the two variables is linear, that is, whether the scatter plot looks as if a straight line could be fitted to it. The statistical technique for this is known as regression, so our next job is to conduct a regression analysis. At first glance it is evident that our data do not form a straight line, as most of the points are clustered between 0 and 4 on the x-axis. This raises the third question: what proportion of the variation in our data can be explained by a fitted line, also known as a regression line? A regression line is simply a straight line that helps us predict values within the range of our original data, so it is very helpful for making predictions. For example, if I had to predict goose numbers from the number of crow nests along the x-axis, I would want to know what percentage of the variation in goose numbers the regression line can explain. Does all this make sense? I am not going to go into the details of the statistical mechanism here, as I intend to treat regression analysis separately in other articles.
But for now, we will simply R to fit a regression line in our scatter plot. Again, the procedure is pretty simple. We will simply add another layer. In R programming, regression line is known as smooth. The rationale behind the name is, it makes our data variables smooth by finding the best fitted line based on all the data points we have in our scatter diagram. Of course, R does not pull this off in thin air…the mathematical procedure R use is rooted into conceptually unified statistical rigor. In other words, R will find the best fitted line based on least squared criterion which is an statistical and algebraic procedure to find the best line that can fit among our data. For now, you do not really need to focus on how this line is derived mathematically as this article is more about appreciating the R programming and it implications on ecological study. Hence, I am going to write a code for adding another layer as smooth and the code and the output are as follows:

geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

# method = "lm" asks geom_smooth to fit a straight line by least squares
geese_plot + geom_point(size = 1.5, position = "jitter") + labs(x = "Number of Eagle/Crow Nests", y = "Number of Geese") + geom_smooth(method = "lm")

Scatter plot with regression line modeled into it

Here we have our linear model, and as you can see R has found the best-fitting lines for both our male and female geese. As suspected, most of the data points do not lie close to either line, which intuitively answers my question: only a small proportion of the variation in our data can be explained or predicted by our fitted linear model, or regression line. Nevertheless, it still provides a lot of solid insight. For example, as I mentioned before, our female geese are very sensitive to eagle or crow nests, and a quick glance at our regression lines (the red line for the female data points) confirms that. As you may recall from coordinate geometry, the slope of a straight line is defined as the ratio of rise to run, where rise is the difference between two points along the y axis and run is the difference between the same two points along the x axis. If we look at the regression line for our female geese (the red line), we see that its slope is larger in magnitude (the red line is a lot steeper) than that of the blue line representing our male geese. In other words, even though our regression line does not provide a robust linear model for making ecological predictions, its steepness tells us that the number of female geese changes more sharply with the number of nests, which in turn suggests that our female geese are extremely sensitive to eagle or crow nests in their vicinity. This is to be expected, as females brood and show a strong instinct to protect their eggs and goslings. The ecological and conservation management implication is therefore to ensure that crow nests are removed, if our management goal is to help safeguard the migratory female geese population in, for example, a specific estuarine mangrove ecosystem or freshwater wetland.
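To make the slope comparison concrete, here is a minimal sketch in base R. It assumes a small made-up data frame, geese_demo, that only borrows the column names used above; it is not our actual geese dataset, and the numbers are chosen purely for illustration. The sketch fits one least squares line per gender with lm() and extracts each slope, which is the line geom_smooth(method = "lm") draws for each group.

```r
# Hypothetical data for illustration only; NOT the geese dataset from this essay.
geese_demo <- data.frame(
  eagleCrowNest = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
  numberOfGeese = c(9, 7, 5, 3, 1, 10, 9, 8, 7, 6),
  Gender = rep(c("Female", "Male"), each = 5)
)

# Fit one least squares line per group and keep its slope (rise over run).
slope_by_gender <- sapply(split(geese_demo, geese_demo$Gender), function(d) {
  coef(lm(numberOfGeese ~ eagleCrowNest, data = d))[["eagleCrowNest"]]
})

print(slope_by_gender)
```

With these illustrative numbers the female slope is more negative than the male slope, which is the numerical counterpart of a steeper line on the plot.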

Now, let’s ask R to improve our regression lines further. As you may have noticed, our regression lines also have shaded areas. What are these shaded areas? They are 95% confidence intervals. A 95% confidence interval is a statistical measure of how precisely the regression line itself has been estimated: at each value on the x axis, the band shows the range within which we are 95% confident the mean number of geese lies. It is not a band expected to contain 95% of the individual data points. Not surprisingly, as mentioned earlier, our regression models are pretty weak (only a small proportion of our data points lie close to the lines), and as you can see most of our data points are actually outside the shaded areas. However, the male and female confidence intervals overlap. The slightly darker middle portion is where the two bands overlap, and this has serious conservation and ecological significance. Before we do any further analysis, though, we would like to disentangle the common gray shading. We will ask R to assign separate colors to the female and male confidence intervals. This will let us see the overlapping part better and help us understand the overlapped confidence intervals well enough to make more robust predictions.
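The shaded band can also be computed directly, which may help in reading it. The sketch below is an illustration under stated assumptions: nests_demo and geese_demo_count are made-up vectors, not our field data. It uses predict() with interval = "confidence" to get the 95% band for the fitted mean (the quantity geom_smooth shades by default) and interval = "prediction" to get the wider band for individual future observations.

```r
# Made-up illustrative data; NOT the geese dataset used in this essay.
nests_demo <- c(0, 1, 2, 3, 4, 5, 6, 7)
geese_demo_count <- c(9, 8, 8, 6, 5, 5, 3, 2)

# Ordinary least squares fit of geese counts on nest counts.
fit_demo <- lm(geese_demo_count ~ nests_demo)

# Nest counts at which we want intervals.
new_nests <- data.frame(nests_demo = c(0, 3, 6))

# 95% confidence interval for the fitted mean at each nest count.
mean_band <- predict(fit_demo, newdata = new_nests,
                     interval = "confidence", level = 0.95)

# 95% prediction interval for a single new observation at each nest count.
obs_band <- predict(fit_demo, newdata = new_nests,
                    interval = "prediction", level = 0.95)

print(mean_band)
print(obs_band)
```

Each result has the columns fit, lwr and upr. The prediction interval is always wider than the confidence interval, which is why many individual points can fall outside the shaded band even when the line itself is well estimated.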

Did you notice that this is the first time I have raised the option of making predictions from our weak linear models? Can you tell me why? It is because, even though we are dealing with only two variables, geese numbers and crow nests, we in fact have two groups or levels in our geese data: male and female geese. Hence we have these overlapping confidence intervals, with a decent proportion of both male and female data points within our data range. As you can see, even a weak regression model can give us valuable insights into our data, provided we have a grouped (male and female) scatter plot. Before we draw ecological insight from our grouped, overlapping linear models, let’s write a simple piece of code that removes the gray color and separates the confidence intervals of our male and female geese samples. The code and the output are provided below:

geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

# aes(fill = Gender) colors each confidence band by group instead of gray
geese_plot + geom_point(size = 1.5, position = "jitter") + labs(x = "Number of Eagle/Crow Nests", y = "Number of Geese") + geom_smooth(method = "lm", aes(fill = Gender))

Scatter plot with regression models color coded

Now this is a lot better. R has got rid of the gray shading and added distinct colors for the 95% confidence intervals of the female (pink) and male (light blue) geese. The overlapping middle portion is also a lot clearer, and it reveals that a substantial number of data points fall in the overlap. However, to ensure that no data points are hiding under the overlapping colors, we can do better by asking R to lighten the shading so that any hidden points are revealed. The code is simple. Inside geom_smooth, which draws our regression line, we add alpha with a numeric value. Alpha takes a value between 0 (fully transparent) and 1 (fully opaque) and determines how light or dark the shaded area is. I usually stick to values between 0.1 and 0.3 to lighten the shaded area and reveal any data points that may previously have been hidden behind the dark shading.

geese_plot <- ggplot(geese, aes(eagleCrowNest, numberOfGeese, color = Gender))

# alpha = 0.2 makes the confidence bands more transparent
geese_plot + geom_point(size = 1.5, position = "jitter") + labs(x = "Number of Eagle/Crow Nests", y = "Number of Geese") + geom_smooth(method = "lm", aes(fill = Gender), alpha = 0.2)

Streamlined and final version of our regression models

Now, surely, this has greatly improved our regression graph, and we can clearly see that a good number of our data points fall within the overlapping 95% confidence intervals. At first glance, we can say that the fitted number of female geese ranges from 6 to 9 when crow or eagle nests are totally absent. Put differently, we are 95% confident that, in an estuarine mangrove ecosystem like ours, the average number of female geese lies between 6 and 9 when the nests of predatory birds such as eagles or crows are entirely absent. Male geese, generally speaking, exhibit less sensitivity towards crow nests. Our male data points show considerable variation, ranging from 1 to 25, and most importantly, within this range almost all male geese tolerate the presence of between 0 and 6 predatory bird (crow/eagle) nests. Our data also reveal two extreme observations among the males. On the y axis we have 25 male geese (an extreme observation) against 5 predatory nests; although unusual, this intuitively suggests that large numbers of males in a flock are bold enough to tolerate a predatory presence ranging from 0 to almost 7 nests. On the other hand, while a good number of our female geese are absent when the number of crow nests varies from 1 to 12, on the x axis we see one extreme observation: a single male, and the only one, that is absent when predatory nests range from 0 to 12. In other words, all our male geese were present, in numbers varying from 1 to 25, across the predatory range of 1 to 12 nests, except for this one male, which is our outlier.

As you may realize, the ecological study of any species is rooted in a statistically valid sampling design, followed by data collection within that design and then data analysis with the powerful statistical packages at our disposal. In this essay, I demonstrated the power of the R programming language through a basic ecological study of how predatory bird populations influence a greylag geese population in an estuarine mangrove ecosystem. The study shows how R puts statistical tools such as the regression model to work, and what that implies for ecological and conservation management.

Finally, in this article I did not attempt to cover the statistical procedures for deriving the regression line, nor did I attempt to explain the statistical mechanisms that underpin this study. More precisely, the study is rooted in developing the regression equation \hat{y} = b_{0} + b_{1}x, then estimating the coefficient of determination, which reflects what proportion of the variation in our data is explained by the regression line, and finally calculating the correlation coefficient, also known as the Pearson coefficient (named in honor of Karl Pearson, who developed the method). These three statistical procedures underlie my study of the geese population and provide the conceptual framework of this essay. In my next essay, I intend to present these statistical methods and a full treatment of their analytic procedures using the same geese dataset. This essay is primarily intended to serve two purposes: 1. To show the power of the R programming language. 2. To understand and appreciate ecological study and its close association with statistics, with R serving as a powerful and sophisticated mathematical package for answering simple but interesting ecological questions about animal population sampling and estimation.
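As a small preview of those three procedures, the following sketch computes them by hand in base R and checks the results against lm() and cor(). The vectors x_demo and y_demo are made-up numbers for illustration, not the geese data, and the variable names are my own.

```r
# Made-up illustrative data; NOT the geese dataset used in this essay.
x_demo <- c(0, 1, 2, 3, 4, 5)   # e.g. number of nests
y_demo <- c(9, 8, 6, 7, 4, 2)   # e.g. number of geese

# 1. Regression equation y-hat = b0 + b1 * x by least squares.
b1 <- sum((x_demo - mean(x_demo)) * (y_demo - mean(y_demo))) /
  sum((x_demo - mean(x_demo))^2)
b0 <- mean(y_demo) - b1 * mean(x_demo)

# 2. Coefficient of determination: 1 - residual SS / total SS.
y_hat <- b0 + b1 * x_demo
r_squared <- 1 - sum((y_demo - y_hat)^2) / sum((y_demo - mean(y_demo))^2)

# 3. Pearson correlation coefficient.
r <- cor(x_demo, y_demo)

# Cross-check the hand calculation against R's built-in fit.
fit_check <- lm(y_demo ~ x_demo)
print(c(b0 = b0, b1 = b1, r_squared = r_squared, r = r))
print(all.equal(unname(coef(fit_check)), c(b0, b1)))
print(all.equal(summary(fit_check)$r.squared, r_squared))
```

For a simple regression with one predictor, the coefficient of determination equals the square of the Pearson coefficient, so r_squared and r^2 agree.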

This essay was prepared in \LaTeX, the document preparation system built on the TeX typesetting engine, the brainchild of Donald Knuth, with mathematical extensions developed by the American Mathematical Society (AMS) and popularized in the guides of George Grätzer of the University of Manitoba Department of Mathematics. I also used both Python and the R programming language to develop the quadratic population model and to design the random sampling matrix. No commercial, proprietary software was used in the preparation of this draft. Debian GNU/Linux, a Unix-like operating system, was used throughout to run all software packages.

Small carnivores and their contribution to pollination

Small carnivores and their contribution to pollination, seed dispersal and forest regeneration in tropical ecosystems

Mohammed Ashraf

Civet facing human persecution across tropics

Medium to small sized carnivores in the tropical guild face subtle and unprecedented threats that are often overlooked, both in ecological research and in the human dimension of managing the species. Thirty eight species of viverrids have so far been discovered and named, and all of them live in tropical and semi-tropical ecosystems that have been undergoing rapid deforestation for the past few decades. Viverridae is a family of carnivorous mammals whose members look like a cross between a cat and a hyena, while the smaller ones resemble mongooses. Mongooses previously belonged to the same family, but because of various anatomical and physiological differences they have been removed from Viverridae and now belong to their own family, Herpestidae, another group of mammalian carnivores that also includes the suricate, commonly known as the meerkat. Viverridae, comprising thirty eight species including five subspecies, is thus an Old World (i.e. Afrotropic and Indo-Malayan) family that now lives in increasingly human dominated tropical rainforests facing anthropogenic encroachment at a dramatic rate.

Malayan Civet

Viverridae comprises four notable types of animals: genets, binturong, linsangs (the name is not Chinese, although it sounds as such) and civets. Civets make up most of the species, whilst there are a little over a dozen species of genets and only one species of binturong left in the wild. Little study has been conducted to understand their basic ecology in terms of habitat preferences, hunting regimes, diet, predatory behavior and so forth. There are, however, practical reasons why many carnivore biologists avoid studying viverrids. One is their secretive, nocturnal, arboreal and dense-forest-dwelling habits, which make it difficult for biologists to develop a survey method to study them in the wild. Compounding that, conservation funding for mammal ecology tends to go to the ‘high profile’ colorful mega-fauna that often serve as flagship species for overall biodiversity conservation in tropical nations. Hence viverrids are not the kind of carnivores that sit at the top of the list for conservation grants to collect basic ecological data. Purely from that standpoint, one may argue that less attractive species face conservation discrimination and research bias, despite the fact that these species contribute significantly to maintaining ecosystem processes and services in tropical forests. For example, the majority of viverrids are omnivorous and eat fruits, seeds and even leaves. Seed dispersal and pollination are two fundamental keys to forest regeneration and natural succession, be it in tropical or temperate forests. Rainforest research measuring seed dispersal and pollination by small mammals has shed interesting light on the food habits of viverrids and how they help regenerate the forest.
It has been postulated that a full ecological study of viverrids could help us understand rainforest ecosystem processes and the services they provide to humans. Anecdotal and ‘ad hoc’ studies suggest that viverrid populations are decreasing all over the Old World, where deforestation correlates with their decline.

The Power of Linux – Resources for Wildlife Ecologists

The Power of Linux and its Utilization- Free and Open Source Software (FOSS) Resources for Wildlife Ecologists & Conservation Biologists

Mohammed Ashraf

I want to tell you a story. No, not the story of how, in 1991, a guy from Finland called Linus Torvalds wrote the first version of the Linux kernel. You can read that story in lots of Linux books. Nor am I going to tell you the story of how, some years earlier, Richard Stallman began the GNU Project (GNU stands for GNU’s Not Unix, a recursive acronym) to create a free Unix-like operating system. That’s an important story too, but most other Linux books have that one, as well. No, I want to tell you the story of how you can take back control of your computer. When I began working with computers as a school student in the mid 1980s, there was a revolution going on. The invention of the microprocessor had made it possible for ordinary people like you and me to actually own a computer. It’s hard for many people today to imagine what the world was like when only big business (American Telephone and Telegraph, AT&T) and big government (the Pentagon or the FBI) ran all the computers. Let’s just say you couldn’t get much done. Today, the world is very different. Computers are everywhere, from iPhones to giant data centers to everything in between. In addition to ubiquitous computers, we also have a ubiquitous network connecting them together. This has created a wondrous new age of personal empowerment and creative freedom, but over the last couple of decades something else has been happening. A single giant corporation (guess who?) has been imposing its control over most of the world’s computers and deciding what you can and cannot do with them.

Fortunately, people from all over the world are doing something about it. They are fighting to maintain control of their computers by writing their own software. They are building Linux. Many people speak of “freedom” with regard to Linux, but I don’t think most people know what this freedom really means. Freedom is the power to decide what your computer does, and the only way to have this freedom is to know what your computer is doing. Freedom is a computer that is without secrets, one where everything can be known if you care enough to find out.

Why Use the Command Line?

Have you ever noticed in the movies when the “super hacker”—you know, the guy who can break into the ultra-secure military computer in under 30 seconds—sits down at the computer, he never touches a mouse? It’s because movie makers realize that we, as human beings, instinctively know the only way to really get anything done on a computer is by typing on a keyboard. Most computer users today are familiar with only the graphical user interface (GUI) and have been taught by vendors and pundits that the command line interface (CLI) is a terrifying thing of the past. This is unfortunate, because a good command line interface is a marvelously expressive way of communicating with a computer in much the same way the written word is for human beings. It’s been said that “graphical user interfaces make easy tasks easy, while command line interfaces make difficult tasks possible,” and this is still very true today. Since Linux is modeled after the Unix family of operating systems, it shares the same rich heritage of command line tools as Unix. Unix came into prominence during the early 1980s (although it was first developed a decade earlier), before the widespread adoption of the graphical user interface and, as a result, developed an extensive command line interface instead.

Linux on Conservation Science

If your work pivots around ecological science under the broader rubric of conservation biology, you are bound to carry out research that is deeply grounded in mathematical modeling. Modern ecologists of the 21st century cannot escape hard core mathematical programming when estimating ecological parameters, be it a demographic or niche model of endangered vertebrates or a phylogenetic analysis of a species that is extinct in the wild. Modern wildlife biologists require tools that broadly fall under the mathematical underpinnings of numerical modeling, tasks that demand a strong command of a computer programming language. For example, if you wish to conduct ecological research to estimate the distribution parameters of tigers in the Sundarbans mangrove ecosystem in Bangladesh, you would need skills in environmental/ecological statistics, first and second derivatives in calculus, matrix algebra, spatial, algebraic and statistical modeling, and so forth. None of these areas can be explored or understood without the power and freedom of computers that allow you to perform advanced mathematical programming tasks or help you generate visually engrossing, highly sophisticated graphs. If you are from a tropical developing nation, where biodiversity is richest but economic resources are poorest, you face a difficult balancing act: managing the meager funds at your disposal while prioritizing the tasks you can do as a serious wildlife ecologist, without compromising the quality and breadth of the rigorous science of ecology, wildlife biology and mathematics.
Unless you are backed by ‘gigantic’ corporate-funded conservation research, for, let’s say, transnational corporations protecting their vested interests, your work will largely be dominated by the key issue of how good you are at managing your scarce funding in order to publish your invaluable research in ‘high impact’ journals (e.g. Journal of Wildlife Management or Conservation Biology). Evidently you will require software with the utilities, tools and power to provide robust mathematical algorithms with which you can extrapolate, overlay data at spatial and temporal scales, and generate stochastic models and scenarios under a rigorous statistical framework. Linux has the power and the necessary tools to provide you with the mathematical packages and programming power that enable you to accomplish your research without taxing your scarce research grant. Yes, much of the software shipped with Linux comes under the GNU (a recursive acronym for GNU’s Not Unix) GPL (General Public License), which grew out of the free software movement and the GNU Project that the legendary computer scientist Richard Stallman started back in the 1980s. Therefore you are not liable for copyright infringement, nor are you liable to be harassed by corporations such as Microsoft that take your hard-earned cash for their own profit. In a Linux operating system (OS), everything that comes with it is free, and it is all neatly packaged in most of the Linux distributions at your disposal. Below are the GNU/Linux based software packages that will help wildlife ecologists and conservation biologists alike to carry out solid statistical, algebraic, spatial and temporal modeling, with the power of geographic information system (GIS) software and structured query language (SQL), without paying thousands of dollars for software on a Windows based operating system. All these Linux packages are free.

For statistical modeling of ecological data and for spatial and temporal map production, use the R programming environment. For wildlife science, ecology, natural resource economics and conservation biology work that strongly integrates the tools, formulas, principles and theorems of mathematical modeling, including matrix algebra, calculus, trigonometry, analytic geometry, stochastic simulation, Monte Carlo simulation, Markov chain models and Boolean algebra, you can use both the R and Python programming languages. If you already program in Java, Python will be very easy for you to pick up. For Species Distribution Models (SDM), which borrow tools from statistics and other branches of mathematics, Python, R and the Octave programming environment are ideal, along with QGIS for GIS based modeling and mapping of large ecological or hydrological landscapes.

All this highly sophisticated mathematical software is available free of cost on Linux under free licenses such as the GNU GPL, and users are free to copy, modify and distribute it indefinitely. Below are the top five Linux distributions (also known as Linux OS) that I believe would be of benefit to serious wildlife ecologists and wildlife science students.

1. Debian Linux

2. Open Mandriva

3. Ubuntu Linux

4. PCLinuxOS

5. Linux Mint