Ok, so I have successfully got my model to work in batch mode! :)
Am VERY happy!
The key thing seems to be that, from the batch_params.xml file, you can only access parameters that have been explicitly set up within the model to be controlled by the user (i.e. included within the parameters.xml file in the 'projectName'.rs folder of the project).
Also, if there are any parameters set up in this way that are not included in the batch_params.xml file, the model will not run in batch mode. This only seems to apply to parameters included in the contextCreator file, not those set up within other classes of the model.
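A quick way to catch this before launching a batch is to compare the parameter names in the two files. The following is only a minimal sketch, not part of Repast: it assumes both files declare parameters as `<parameter name="...">` elements, and the class name, method names and the example parameter `jaguarCount` are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists model parameters that the batch file does not set.
final class BatchParamCheck {

    // Collects the "name" attribute of every <parameter> element (at any depth).
    static Set<String> parameterNames(String xml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("parameter");
            Set<String> names = new TreeSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Names declared in the model's parameter file but absent from the batch file.
    static Set<String> missingFromBatch(String modelXml, String batchXml) {
        Set<String> missing = parameterNames(modelXml);
        missing.removeAll(parameterNames(batchXml));
        return missing;
    }

    public static void main(String[] args) {
        // Inline stand-ins for parameters.xml and batch_params.xml (assumed layout).
        String model = "<parameters>"
                + "<parameter name=\"randomSeed\"/>"
                + "<parameter name=\"jaguarCount\"/>"
                + "</parameters>";
        String batch = "<sweep runs=\"5\">"
                + "<parameter name=\"randomSeed\"/>"
                + "</sweep>";
        System.out.println("Missing from batch file: " + missingFromBatch(model, batch));
    }
}
```

In practice the two strings would be read from parameters.xml and batch_params.xml; an empty result means every user-controlled parameter is also set in the batch file.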
Some information seems to be misleading. In the previous batch post I alluded to a webpage on the repast.sourceforge website that might be useful. It does cover running a model in batch mode, but it goes through a quite complicated scenario of installing GridGain and using it to run the model.
This is NOT required to run a simple batch run within eclipse on the local computer. It appears to be most useful for distributing the model to other sources, and using grid computing to run batches...
I doubt I'll need anything quite this complicated as of yet....
Today is a good repast day!
My PhD research on spatially explicit modelling of habitat permeability for mammalian wildlife
19.1.12
17.1.12
Model Decision-Making
Ok, so following on from the 'model dilemma' post, I have decided, rather than choosing between the two separate movement decision options, to combine them together.
So, instead of having my agents move around looking for food and ignoring the type of habitat, they now make a decision based on 4 factors:
1. amount of food
2. habitat cost
3. pheromone cost
4. road cost
Decisions are now based on a weighting of these 4 parameters, with the need to move still driven by food requirements, but the decision inherently based on weighing up pros of food availability vs the cons of habitat/pheromone/road cost (which implicitly include a range of other factors that may be important to movement decision making by jaguars).
A total 'cell quality' value now exists for each neighbour and the jaguar chooses the cell with the best cell quality, calculated as follows:
cellQuality = food - (0.1*habitatCost) - (0.1*pheromoneCost) - roadCost
Food is explicitly linked to habitat, with the food reaching a maximum of 10 in good quality preferential forest habitats.
habitatCost and pheromoneCost reach a maximum level of 100 and so are multiplied by 0.1 to standardise the relative values to that of food.
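As a rough sketch of how this calculation and the 'pick the best neighbour' step could look in code (this is not the actual model code; the class and method names are illustrative, and breaking ties in favour of the first neighbour found is an assumption):

```java
// Illustrative sketch of the cell quality formula from this post.
final class CellQuality {

    // habitatCost and pheromoneCost run up to 100, food up to 10, hence the 0.1 scaling.
    static final double COST_SCALE = 0.1;

    static double cellQuality(double food, double habitatCost,
                              double pheromoneCost, double roadCost) {
        return food - COST_SCALE * habitatCost - COST_SCALE * pheromoneCost - roadCost;
    }

    // Each row is one neighbour: {food, habitatCost, pheromoneCost, roadCost}.
    // Returns the index of the highest-quality neighbour, or -1 if there are none.
    // Ties go to the first neighbour encountered (an assumption, not from the post).
    static int bestNeighbour(double[][] neighbours) {
        int best = -1;
        double bestQuality = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < neighbours.length; i++) {
            double[] n = neighbours[i];
            double quality = cellQuality(n[0], n[1], n[2], n[3]);
            if (quality > bestQuality) {
                bestQuality = quality;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] neighbours = {
            {10, 50, 20, 1},  // rich in food but costly habitat and a road
            {8, 10, 0, 0},    // slightly less food, cheap to enter
            {2, 0, 0, 0}      // poor food, no cost
        };
        System.out.println("Best neighbour index: " + bestNeighbour(neighbours));
    }
}
```

With the example values the three neighbours score 2, 7 and 2, so the second cell wins even though the first has more food.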
roadCost is a tricky one. I haven't yet decided whether to go to a maximum cost of 10 here or to 100, to provide a better relative difference between road types. The main tricky aspect here is that I want to encourage males to use the trails (classed as a type of road) but discourage the females from doing the same. To do this, males view the smaller forest trails as having either no cost or a negative cost (TBC), whereas females see them as having some low cost value.
This method of combining the food and cost decision making loops takes out around 3/4 of the steps required in the food loop, and adds only one step to the cost loop: a biased choice towards maximising food if the jaguar has low food reserves.
Adding in different costs for the trails also provides some inherent decision making towards using trails or not, replacing the hard-coded probabilistic choice to avoid or prefer trails that was used in both the food and cost loops.
12.1.12
Publication
My jaguar model has been published via the European Conference on Artificial Life, at which I also presented the work.
Details:
Watkins, A., Noble, J. and Doncaster, C. P. (2011) An agent-based model of jaguar movement through conservation corridors. In: Advances in Artificial Life, ECAL 2011: Proceedings of the Eleventh European Conference on the Synthesis and Simulation of Living Systems, pp. 846-853, MIT Press. ISBN 978-0-262-29714-1
found here
ECAL took place 8th to 12th August 2011 in Paris, France.
11.1.12
Jaguars: the next step
So what I should have explained by now is that I've moved on from the initial 'let's get a model of jaguar movement in an abstract world working' to 'let's validate how jaguars move around in a real landscape using real data'.
But I should first add some results from the initial 'abstract' model. Not only are territories clearly defined (see post 'habitat preference and territory development' from September '11) and habitat preferences clearly working, but I also made a first attempt to investigate some questions about landscape-level population effects of fragmentation and landscape structure. I outlined the model in brief in a previous post ('1st jaguar simulation model' from November '11) and chose to focus on two key questions:
1. What is the effect of landscape structure on the population?
2. How well do corridors connect discrete habitat patches?
So, to answer question 1, I looked at the 'happiness' of individuals in the population over the 9 different landscape designs. 'Happiness' here is defined as the inverse of the cost per time step per individual, where cost is the combination of the cost of the habitat (via the least-cost model) and the cost of encountering another individual's territory (defined via 'pheromones', which are deposited when an individual moves into a cell and degrade over time). The lower the cost per time step, the happier the individual.
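To make the definition concrete, here is a minimal sketch of a pheromone layer and the happiness measure. It is not the model's real code: the deposit amount, the geometric form of the decay (a fixed fraction retained each step) and all names are assumptions for illustration, since the post only says that pheromones are deposited on entry and degrade over time.

```java
// Illustrative pheromone grid: deposit on entry, geometric decay each time step.
final class PheromoneField {
    private final double[][] level;
    private final double depositAmount; // hypothetical value chosen by the caller
    private final double retained;      // hypothetical fraction kept per step, 0..1

    PheromoneField(int width, int height, double depositAmount, double retained) {
        this.level = new double[width][height];
        this.depositAmount = depositAmount;
        this.retained = retained;
    }

    // Called when an individual moves into cell (x, y).
    void deposit(int x, int y) {
        level[x][y] += depositAmount;
    }

    // Called once per time step: every cell keeps only a fraction of its pheromone.
    void decayStep() {
        for (double[] column : level) {
            for (int y = 0; y < column.length; y++) {
                column[y] *= retained;
            }
        }
    }

    double at(int x, int y) {
        return level[x][y];
    }

    // Happiness as defined in the post: the inverse of the mean cost per time step.
    // A total cost of zero yields positive infinity.
    static double happiness(double totalCost, int steps) {
        return steps / totalCost;
    }

    public static void main(String[] args) {
        PheromoneField field = new PheromoneField(3, 3, 10.0, 0.5);
        field.deposit(1, 1);
        field.decayStep();
        System.out.println("Pheromone after one step: " + field.at(1, 1));
        System.out.println("Happiness: " + happiness(20.0, 10));
    }
}
```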
Figure 1. The mean cost per time step for populations within each of the 9 landscapes.
The primeval design is the clear winner here, but this tells us nothing interesting. This design is all forest and so tells us nothing about the effect of landscape structure on the population. By removing this from the figure, we see the results much more clearly:
Figure 2. The mean cost per time step for populations within each of the 8 landscapes that have some amount of matrix habitat.
Now we can see that there is a much clearer difference between those landscapes with a corridor and those without. But what is really noticeable is that the happiness of the population relies on how much pure forest habitat there is rather than on the structure of the landscape. The Random Islands design includes islands of forest that are so small that they are effectively all edge, and so this landscape contains less pure forest than all of the other landscapes except for the No Corridor design.
To answer question 2, I looked at the proportion of individuals that were able to migrate from one side of the world to the other; in short, the proportion that started in one habitat patch and ended up in the other. This represents the ability of the corridor design to facilitate movement of individuals through the landscape.
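The measure itself is simple to compute. A small sketch, assuming each individual's start and end patch has been recorded as an integer id (the names and the id encoding are illustrative, not taken from the model):

```java
// Illustrative calculation of the proportion of individuals that switched patches.
final class MigrationStats {

    // startPatch[i] and endPatch[i] are the patch ids of individual i
    // at the first and last time step of a run.
    static double proportionMigrated(int[] startPatch, int[] endPatch) {
        if (startPatch.length != endPatch.length) {
            throw new IllegalArgumentException("Arrays must describe the same individuals");
        }
        if (startPatch.length == 0) {
            return 0.0;
        }
        int moved = 0;
        for (int i = 0; i < startPatch.length; i++) {
            if (startPatch[i] != endPatch[i]) {
                moved++;
            }
        }
        return (double) moved / startPatch.length;
    }

    public static void main(String[] args) {
        int[] start = {0, 0, 1, 1};
        int[] end = {0, 1, 1, 0};
        System.out.println("Proportion migrated: " + proportionMigrated(start, end));
    }
}
```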
Figure 3. The proportion of individuals that moved from one side to the other by the end of the simulation.
Again, clearly the primeval landscape wins, but that tells us nothing interesting. There are no barriers to movement in this design. But there are clearly some big differences between the other 8 landscapes. In fact there is a three-fold difference in the proportions vs a 10% difference found when looking at the happiness of individuals. Clearly, it is the ability to facilitate movement across the landscape that really differentiates landscapes in terms of their structure. Connected corridors (represented by One Corridor, Three Corridors and Five Corridors) all increase the proportion of individuals that successfully switch sides during the simulation. What is interesting is that there is such a big difference between these three corridor designs. This surprised me at first until I started to think more about the problem.
One big fat corridor limits the ability of an individual to randomly find it, given that individuals are spread through the entire forest block. Having more corridors increases the chances that any individual will find one. Not only this, but having a corridor that is large enough to support resident individuals means that others may be prevented from travelling through the corridor. It seems to me that the optimum corridor design differs depending on what you want to maximise. A single fat corridor may be fine if you simply want to increase the amount of habitat available, but multiple narrower corridors that reduce the potential for resident individuals to set up territories within the corridor itself may maximise the potential for individuals to travel and increase migration between habitat patches.
It appears there then becomes a trade-off between maximising habitat and population size potential and maximising flow of individuals between habitat patches.
Conclusion
In conclusion, my findings so far suggest that not all landscape structures/corridor designs are the same and that the critical feature for maximising happiness/fitness of a population is to increase the amount of preferred habitat, no matter where this habitat is (with certain constraints on size of patches to be expected here). However, the key difference between corridor designs and in fact landscape structure per se, is the ability to promote and facilitate migration of individuals from one habitat patch to another. This is enhanced with a design of physically connected corridors that are more numerous and narrower in design (again given certain limits due to edge effects/territory size of the species in question etc).
Model Dilemma
So now my model is nearing completion (sort of), I have one main dilemma.
Following a presentation of my work to fellow biologists/ecologists, it seemed that what people wanted to see was a motivation for why individuals move around, rather than simply hard-coding them to move around x% of the time. This meant a shift away from the least-cost model to something more intuitive - food.
At first glance, this seems a relatively straightforward shift to make. Instead of individuals choosing a location based on some negative cost value (and opting for the least-cost move), they would now move around according to their food requirements. In principle, the landscape effect is similar. Forest is still good whilst agricultural areas/urban areas etc are not so good. However, the shift to 'emergent' behaviour, i.e. individuals moving around to meet some food requirement, involves quite a lot of behind-the-scenes work within the code itself. In brief, the two flowcharts outline the decisions made by individuals within each given scenario: 1. moving to find enough food and 2. moving according to least-cost principles. Circles in red are decisions; those in thick red are standard default decisions.
Pros and Cons
1. Food
Now it becomes clear: whilst moving for food may be more intuitive and more 'realistic', it is a lot more complicated, involves a large number of additional parameters, is more likely to cause some unforeseen bias in the results, and hence is far more sensitive to initial conditions and far less robust than a simpler movement process. There is also no explicit recognition of mortality risk, protection, mating opportunities, physiological costs etc that are encompassed within the single cost value used in the least-cost model. In short, the only factor affecting where a jaguar moves is the availability of food.
In addition, mortality risk is modelled separately from movement choice. Individuals are not 'aware' of the mortality risk of any particular location and are unable to choose a location according to this risk. Of course, this could be added as another factor, but this would increase the complexity of the model, add further parameters and make it even more sensitive.
Some additional parameters include:
1. Amount of food required by an individual (per step or as an average per day still to be decided)
2. Reduction in food per time step - should this be different depending on if the individual physically moves compared to staying in the same cell? should there be a reduction per time step and another depending on how the individual moves? Should this be a set amount or a proportion of the total food resource/food required per step/day?
3. Increase in food availability in cells that have been visited by a jaguar - should this be proportional, an incremental increase or a logistic increase? - i.e. type I, type II or type III response to predation?
4. Should the food be homogenous or clumped/variable in distribution?
5. Should the food change in spatial representation over time?
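To show how much the choice in point 3 matters, here is a hedged sketch of three candidate regrowth rules for the food in a cell. None of this is decided in the model: the rates are hypothetical inputs, the names are illustrative, and the cap of 10 is borrowed from the maximum food value mentioned in the Model Decision-Making post.

```java
// Three candidate per-time-step food regrowth rules, each capped at MAX_FOOD.
final class FoodRegrowth {

    // Assumed cap, taken from the maximum food value of 10 used elsewhere on this blog.
    static final double MAX_FOOD = 10.0;

    // Incremental: add a fixed amount each step.
    static double incremental(double food, double amount) {
        return Math.min(MAX_FOOD, food + amount);
    }

    // Proportional: grow by a fixed fraction of the current food level.
    static double proportional(double food, double rate) {
        return Math.min(MAX_FOOD, food * (1.0 + rate));
    }

    // Logistic: growth slows as the cell approaches MAX_FOOD.
    static double logistic(double food, double rate) {
        return Math.min(MAX_FOOD, food + rate * food * (1.0 - food / MAX_FOOD));
    }

    public static void main(String[] args) {
        double food = 5.0;
        System.out.println("incremental:  " + incremental(food, 1.0));
        System.out.println("proportional: " + proportional(food, 0.2));
        System.out.println("logistic:     " + logistic(food, 0.2));
    }
}
```

One design point worth noting: under the proportional and logistic rules a cell that has been eaten down to zero never recovers, whereas the incremental rule always refills it, so the rules behave very differently in heavily used cells.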
2. Cost
But then again, cost doesn't necessarily make the best option either. Here, whilst the model decision making is simple and straight-forward there is now no motivation for individuals to move. Movement is not 'emergent' but hard-coded. Males move x% of the time, females y% of the time. However, there is much less risk of the model being sensitive to initial conditions as there are only two parameters to be concerned about: relative differences in cost between habitats and movement rates for males and females.
In addition to this, movement decisions are now based on a range of factors, implicitly outlined by the cost of any particular habitat. Not only does food get represented in the cost, but also mortality risk, cover, mating opportunities, physiological costs of moving through the terrain.
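For comparison, the whole cost-based decision fits in a few lines. This is a sketch under assumptions rather than the model's code: the names are illustrative, the movement probability stands in for the 'x%' and 'y%' above, and ties between equally cheap neighbours go to the first one found.

```java
import java.util.Random;

// Illustrative least-cost mover with a hard-coded movement probability.
final class LeastCostMover {

    // Returns the index of the cheapest neighbour, or -1 if the individual stays put.
    // moveProbability would be the male or female movement rate (0..1).
    static int chooseMove(double[] neighbourCosts, double moveProbability, Random rng) {
        if (neighbourCosts.length == 0 || rng.nextDouble() >= moveProbability) {
            return -1; // no movement this time step
        }
        int best = 0;
        for (int i = 1; i < neighbourCosts.length; i++) {
            if (neighbourCosts[i] < neighbourCosts[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] costs = {5.0, 2.0, 9.0};
        Random rng = new Random(42);
        // With probability 1.0 the individual always moves to the cheapest cell.
        System.out.println("Chosen neighbour: " + chooseMove(costs, 1.0, rng));
    }
}
```

The only tunable inputs are the neighbour costs and the movement probability, which matches the two parameters named above.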
Repast Development and Batch Runs
So it's back to the grindstone following a not-long-enough Christmas and New Year break. This month it's all about finalising my model and getting it to a state where I can send off a batch run of the model to Southampton University's supercomputer. Not only will this mean my model runs much more efficiently, but also that I won't have to sit around for ages watching the GUI display and waiting... and waiting.... and waiting... whilst tediously repeating... repeating... and repeating runs whilst manually changing parameters.
Excellent.
Only problem is, Repast, once again, seems to be letting me down on documentation! This has to be THE major disadvantage of using this software over others such as MASON (which several people have suggested I use, but I'm loath to, considering the time and effort I've spent in learning Repast and the fact that MASON seems to be heavily social simulation focused).
However, there are some other kind people out there who, also noticing the severe lack of Repast documentation, have been amazingly generous by recording and allowing public access to their findings. Pamela Toman, I am very grateful to you for this post about doing batch runs in Repast. Some other useful pages exist, including one of Repast Simphony's own online pages found here that outlines the format of how to amend the batch_params.xml file, even if it doesn't go into detail about how to get this to work, or what the file should look like once it's been constructed.
Basically, it's all a bit more complicated than it needs to be. Repast has an option of a Batch run from the run menu in Eclipse. This requires you to show it where it can find the batch parameters run file (batch_params.xml, located in the batch folder within the model architecture in Eclipse). I've yet to get to the point of trying this out, but it seems that if you just want to run a batch run within Eclipse, this MAY be all you need to do.
Exporting the model to run in batch via an external source (e.g. a supercomputer) seems much more complicated, and there are several files that need amending in order for this to work. So far, I've done the amending, but not been able to test if the export works. That's next on my list of jobs to do.
Pamela Toman's post outlines all of this, but in brief:
1. The main amendments involve the start_model.bat or start_model.command files (.bat in Windows, .command in Mac/Linux). The model needs to know explicitly where to look for all necessary files, which need to be manually added (running in GUI doesn't require this step). The repast.simphony.repast.RepastMain command also needs to be changed to repast.simphony.batch.BatchMain so it knows to run in batch mode (see Pamela Toman's post for more details on how to amend this file).
2. Actually, what you need to do is create a new start_model.command/bat file. You can call it whatever you want, but if you want the model to be able to run in both GUI and batch, you can't amend the original start_model.command file. If you do, it will no longer run in GUI mode.
3. Now you have added a new file for batch, you need to add this to two other files, so the model knows to look at this file. Here you need to find the installation_coordinator.xml and installation_components.xml files (both within the installer folder of the model architecture in Eclipse). Again, more info on Pamela Toman's post.
I have a slight issue with this step however. My installation_coordinator.xml file has a slightly different format to that suggested by Pamela (and others when I googled online). I'm still trying to work out where to add the necessary information..... The install_components.xml file was straightforward to amend.
4. Ok, now you should be ready to update the batch_params.xml file. Again, Pamela has a good example on her page. I'm currently working through this to get ready to test with a simple one-parameter change and only 5 runs. I'll update once I know more and post my final .xml file.
5. Now comes the tricky bit as far as I can tell. If the batch is to run within Repast then you should just be able to choose the batch model run and show it the batch_params.xml file. Again I should be testing this soon.... If the batch is to be exported and go to a supercomputer, then the batch file itself needs to be amended. Not sure where to find this or how to run the batch file outside of Repast. Should have some answers here within the month. Fingers crossed.
Useful webpages
Pamela Toman - http://www.pamelatoman.net/blog/tag/batch-runs/
Repast Simphony batch runs - http://repast.sourceforge.net/docs/reference/SIM/Batch%20Runs.html
Repast Simphony batch parameters - http://repast.sourceforge.net/docs/reference/SIM/Batch%20Parameters.html
Repast Parameter Sweeps Getting Started - http://repast.sourceforge.net/docs/RepastParameterSweepsGettingStarted.pdf