Building a Better Wumpus Hunter: Evaluating Memory in the World
Of the Wumpus Using Genetic Programming

Jim Lesko
jjlesko[at]martianwind.com

ABSTRACT

The classic computer game "Hunt the Wumpus" provides an interesting environment in which to evaluate agents evolved using Genetic Programming (GP). Three types of agents are developed: a purely reactive type, a version that uses memory with a one-to-one correspondence to the environment, and a form of agent that utilizes an abstract random access memory. The results show that the simple reactive agent works best in this environment despite the apparent advantages of using memory.

INTRODUCTION

Evolving truly intelligent agents through Genetic Programming requires agents to discover and remember features of their environment. For a complex search space, memory becomes an important aspect of the agent’s ability to navigate successfully. Earlier research using memory in Genetic Programming has concentrated on having an agent navigate a square grid of coordinates, using a mental model to navigate from point to point to achieve some goal (Andre, 1994; Brave, 1996). The search space used in this paper consists of a more complex topology, a network of 20 nodes where each node is connected to exactly three other nodes, forming what has been referred to as a "squashed dodecahedron" (figure 1).

Figure 1 -- A "squashed dodecahedron"

This topology, used in the game "Hunt the Wumpus" (or simply "Wumpus"), was introduced to the world in the early 1970s by Gregory Yob and the People’s Computer Company. Within the game, the network of nodes is a cave of interconnected rooms in which a hunter must find and shoot the Wumpus, a mythical creature which apparently just wants to be left alone. Yob was disillusioned with games that were played on a grid and so created one that was played using his "favorite Platonic solid" (Ahl, 1978). This is the environment in which our evolved agent, acting as hunter, must learn to thrive.

PLAYING THE GAME

The goal of "Hunt the Wumpus" is for a hunter to find and kill the Wumpus, which is hidden at a random location in a cave. In addition to the Wumpus, the cave contains obstacles: bottomless pits, which kill the hunter, and "Superbats," which carry the hunter to a random room. When the hunter enters a room, a list of the three connecting rooms is provided, as well as an indication of the presence of any dangers (pits, bats, Wumpus) in those rooms. Based on this information the hunter may either move or shoot into an adjacent room. The game ends when the hunter has used all three of the provided arrows, the maximum number of moves has been made, the hunter has entered a room containing a pit, or the Wumpus has moved into the room containing the hunter.

All of the connections between rooms are bi-directional; there are two bottomless pits and two bat rooms distributed at random throughout the cave. The Wumpus is placed at a random location in the cave, as is the hunter. If the node that the hunter starts in contains one of the dangers, another node is selected until one is found that allows the hunter to start the search from a safe location.
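The cave setup described above can be sketched as follows. The adjacency data is the classic 20-room table from the original BASIC listing (also used by the C translation this simulator is based on); the function and variable names are illustrative assumptions, not the simulator's actual code, and whether pits, bats, and the Wumpus may share a room is not specified in the text, so this sketch allows overlap.

```python
import random

# Adjacency list for the 20-room "squashed dodecahedron", taken from the
# original BASIC listing: every room connects to exactly three others,
# and all connections are bi-directional.
CAVE = {
     1: (2, 5, 8),    2: (1, 3, 10),   3: (2, 4, 12),    4: (3, 5, 14),
     5: (1, 4, 6),    6: (5, 7, 15),   7: (6, 8, 17),    8: (1, 7, 9),
     9: (8, 10, 18), 10: (2, 9, 11),  11: (10, 12, 19), 12: (3, 11, 13),
    13: (12, 14, 20), 14: (4, 13, 15), 15: (6, 14, 16), 16: (15, 17, 20),
    17: (7, 16, 18), 18: (9, 17, 19), 19: (11, 18, 20), 20: (13, 16, 19),
}

def new_game(rng=random):
    """Place two pits, two bat rooms, the Wumpus, and the hunter; the
    hunter is re-drawn until it starts in a hazard-free room."""
    rooms = list(CAVE)
    pits = rng.sample(rooms, 2)
    bats = rng.sample(rooms, 2)
    wumpus = rng.choice(rooms)
    hazards = set(pits) | set(bats) | {wumpus}
    hunter = rng.choice(rooms)
    while hunter in hazards:
        hunter = rng.choice(rooms)
    return {"pits": pits, "bats": bats, "wumpus": wumpus,
            "hunter": hunter, "arrows": 3, "moves": 0}
```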

Three memory models were tested in this environment. The first is not a model but rather the lack of one: the agent must navigate the search space without any knowledge of prior moves and is therefore purely reactive. An "auto-mapper" is used for the second model: the information found at each node is automatically added to an indexed array of memory cells which maps one-to-one onto the environment (figure 2). The third memory model consists of an indexed array of memory locations (integers, in this case); the number of locations available is greater than the amount needed to provide an exhaustive list of the contents of the entire search space (figure 3).
 


Figure 2 -- Each room in the cave has
a corresponding location in memory.

Figure 3 -- The agent decides where to 
put what arbitrary information.

The least complicated of the memory models, the "auto-mapper," is designed to conform to the environment it will be used in, similar to the model presented in Andre (1994). The "gold digger" presented by Andre utilized a two-step process using a "MapMaker" and a "MapUser": the former created a model of the environment and the latter used that model to navigate. This split was intended to encourage the evolution and use of memory, since a MapUser without an accurate mental model of the environment will not be as fit as one provided with an accurate map.

For the Wumpus simulation, the split between the memory-writing and memory-reading steps is retained, but the two functions alternate within each step of the simulation instead of being run to completion separately. In the case of the auto-mapping agent, the information received regarding the environment is stored, as it is received, in the memory location corresponding to the room the agent currently occupies. The information presented to the agent at each step -- room choices and the presence of dangers in adjacent rooms -- is stored at a single reference point in memory, associating the information in a manner that is easy to understand. The agent has full access to this memory, allowing it to "remember" its entire history in the cave. The longer the agent survives, the better its chances of building an internal representation of the environment that can be referenced to help it succeed.

In the case of the agent with the large indexed memory space, the agent is responsible for storing and utilizing the information it receives. The data is not stored automatically, and what details the agent writes to memory is not limited in content or location. To access the complete history of its travels, the agent must evolve some mechanism for storing and retrieving the information it is presented with; no such mechanism is provided a priori by human hands.

PREPARING THE GENETIC PROGRAMMING ENVIRONMENT

The program used to evolve the agent and run the simulator was DGPC, a Genetic Programming algorithm written in C by D. Andre; it was run on an Intel Pentium computer running the Linux operating system (Red Hat 5.1) and a Sun Ultra-5 running Solaris 2.6. The parameters used in the simple no-memory case are given in table 1. Each agent is given 20 tries of a maximum of 25 moves to kill the Wumpus; each try confronts the agent with a new random cave.

The three CHOICE functions return the unique numbers of the connecting rooms. The PIT, BAT, and WUMPUS functions are "if…then…else…" statements (implemented as macros): if the indicated danger exists in an adjacent room, the first of two branches is evaluated; if the danger is not present, the second branch is evaluated instead, giving the agent the ability to react to its environment. The MOVE and SHOOT functions take an integer value as a parameter and set that as the destination for the act specified; they also set an "action variable" accessible to the simulator which indicates the action to be performed. Both functions return the parameter passed to them, so if, for example, the SHOOT function is passed the result of a MOVE CHOICE1, the end result will be SHOOT CHOICE1. The agent is not permitted to perform multiple actions in a single move. Evaluating the result producing branch (RPB) thus amounts to setting the action and destination variables.

The simulator uses the values of the action and destination variables to make a move in the game. If either of the variables contains unusable information, for instance if neither a MOVE nor a SHOOT was evaluated, or a destination was chosen that was not available to the agent, no action is performed and the move is wasted.
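A minimal Python sketch of these mechanics, with all names invented for illustration: MOVE and SHOOT set the shared action and destination variables and return their argument, a danger function such as WUMPUS acts as a one-branch-only macro (modelled here with thunks), and the simulator discards any move whose action or destination is unusable.

```python
class Hunter:
    def __init__(self, senses):
        self.senses = senses        # e.g. {"choices": (2, 5, 8), "wumpus": True}
        self.action = None          # written by MOVE/SHOOT, read by simulator
        self.destination = None

    def CHOICE(self, n):            # CHOICE1..CHOICE3 of the current room
        return self.senses["choices"][n - 1]

    def MOVE(self, room):           # returns its argument, so nested calls
        self.action, self.destination = "move", room    # compose: the
        return room                 # outermost MOVE/SHOOT runs last and wins

    def SHOOT(self, room):
        self.action, self.destination = "shoot", room
        return room

    def WUMPUS(self, then_branch, else_branch):
        # Implemented as a macro in the GP system: only one branch is
        # ever evaluated, mimicked here by passing thunks (lambdas).
        return then_branch() if self.senses["wumpus"] else else_branch()

def is_usable(action, destination, choices):
    # The simulator wastes the move if no action was set or the chosen
    # destination is not one of the three adjacent rooms.
    return action in ("move", "shoot") and destination in choices
```

Under this reading, an evolved S-expression such as (SHOOT (MOVE CHOICE1)) reduces to nested calls in which the outer SHOOT overwrites the inner MOVE, matching the "end result will be SHOOT CHOICE1" behavior described above.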

Two automatically defined functions (ADFs) are defined for the reactive agent, allowing oft-used functionality to be reused as described in Koza (1994). The ADFs are not given access to the MOVE or SHOOT functions; the RPB alone must determine the action to be taken during a move. The impact of this restriction was not studied; the motivation was to reduce simulation time by eliminating repeated settings of the action variable.

The 20 fitness cases used to test each agent were randomly generated from 27,907,200 different configurations of the cave, which reduce to 5,581,440 unique situations for the agent when the symmetry of the problem is considered. Once a "cave" is constructed, the simulation is run and the agent is sent exploring. When the simulation has ended, the number of moves the hunter made is added to a running total; if the agent died in the cave, the maximum number of moves allowed is added instead. After all 20 cases have finished, the total fitness for an individual is calculated by multiplying the total moves the hunter made over the 20 caves by the number of times the agent failed to kill the Wumpus. Thus, killing the Wumpus is heavily rewarded, while finding the Wumpus in a small number of moves is only peripherally considered. The perfect Wumpus hunter would have a fitness of 0, indicating that in all 20 cases the hunter was able to find and kill the Wumpus.
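The fitness computation just described can be written out directly; the tuple representation of a trial is an assumption made for this sketch.

```python
MAX_MOVES = 25   # per-try move limit given to each agent
NUM_CASES = 20   # fitness cases (caves) per individual

def fitness(trials):
    """trials: one (moves_used, killed_wumpus, hunter_died) tuple per cave.
    Returns total moves across all caves times the number of failures,
    so 0 means the Wumpus was killed in every fitness case."""
    total_moves = sum(MAX_MOVES if died else moves
                      for moves, _, died in trials)
    failures = sum(1 for _, killed, _ in trials if not killed)
    return total_moves * failures
```

Note the consequence of the multiplication: any perfect hunter scores exactly 0 regardless of how many moves it used, while a single failure keeps the fitness proportional to the total move count.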
 
Table 1: Tableau for agent simulation using no memory
Objective: Discover a program that allows a software agent (the hunter) to successfully navigate a search space of connected nodes to find and kill the Wumpus.
Function Set for the RPB: CHOICE1, CHOICE2, CHOICE3, MOVE, SHOOT, PIT, BATS, WUMPUS, ADF0, ADF1
Function Set for ADF0 and ADF1: Same as the set for the RPB, but without ADF0, ADF1, MOVE, or SHOOT.
Fitness Cases: A set of 20 randomly created caves
Fitness: Sum of moves used for each fitness case times the number of unsuccessful trials. If a hunter is killed in a trial, that trial’s fitness equals the maximum number of moves allowed.
Parameters: Population size (M): 5000
Maximum number of generations (G): 100
Maximum number of nodes per individual: Both the result producing branches and the automatically defined functions are allowed a maximum of 100 nodes.
Success Predicate: A fitness of 0, indicating the hunter found and killed the Wumpus in all 20 fitness cases.

As expected, the first generation of agents does not produce very fit individuals. The majority of individuals in early generations do nothing, as no action is set during their evaluation. The most fit members of early generations are those which simply fire all three arrows into a specific room and hit the Wumpus on a few of their 20 runs. An example LISP S-expression of the RPB of such an agent, which achieved 7 successful hits during its trials in the caves, is the following:

(SHOOT (CHOICE3))

Later generations evolve to utilize movement, such as this individual which achieved 13 hits:

(WUMPUS (SHOOT (BATS (CHOICE3) (CHOICE2))) (MOVE (CHOICE3)))

This agent, if it detects a Wumpus, shoots an arrow at either choice 3 or choice 2 (depending on the presence or absence of bats). If the Wumpus is not nearby, the agent moves to the third of the rooms presented to it. The best-of-run individuals from this set of runs averaged 12.2 hits over their 20 trials, which is probably as good as, and perhaps better than, a human player with no set strategy. The agents exhibit regular, non-random behavior, showing that an agent evolved through Genetic Programming can succeed in better than half the test cases.

EVALUATING THE UTILITY OF MEMORY

The parameters used for the simulation of agents with "auto-mapping" memory are identical to those used in the reactive case, with a few exceptions. The CHOICE functions accept a parameter which specifies which memory location the choice is for. For instance, if the agent has previously visited room 7 and CHOICE2 is passed 7, the function will return the second room choice for that room. If the room has not been visited, an unusable number (-1 in this case) is returned to indicate that no information exists for that room. The danger-detecting functions PIT, BATS, and WUMPUS work in the same way; they take an additional argument which is used to retrieve a value from the appropriate location in memory. The terminal CURRENT, which returns the current room number, has been added to the function set. Using it, the agent can access the state of its current location; for instance, PIT CURRENT is equivalent to the PIT function in the no-memory case. All other parameters -- population size, generations, etc. -- are the same as those used to evaluate the hunter with no memory available.
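A sketch of the auto-mapping memory under these rules; the dictionary representation and the method names are assumptions, not the DGPC implementation.

```python
UNKNOWN = -1   # returned for rooms the hunter has not yet visited

class AutoMap:
    def __init__(self):
        self.cells = {}     # room number -> percepts recorded there
        self.current = None

    def enter(self, room, choices, pit, bats, wumpus):
        # Called by the simulator on every step: the percepts for the
        # current room are stored at that room's index automatically,
        # with no effort required from the agent.
        self.cells[room] = {"choices": choices, "pit": pit,
                            "bats": bats, "wumpus": wumpus}
        self.current = room

    def choice(self, n, room):
        # CHOICEn with a memory argument: the n-th exit recorded for
        # `room`, or -1 if no information exists for that room.
        cell = self.cells.get(room)
        return cell["choices"][n - 1] if cell else UNKNOWN

    def danger(self, kind, room):
        # PIT/BATS/WUMPUS with a memory argument work the same way.
        cell = self.cells.get(room)
        return cell[kind] if cell else UNKNOWN
```

With this scheme, `danger("pit", current)` plays the role of the plain PIT test from the no-memory case, which is exactly the loophole that lets the evolved agents degenerate into reactive ones.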

A typical best-of-generation individual from the first generations is functionally equivalent to those found in the no-memory runs. For example, the individual:

RPB0:(SHOOT (CHOICE2 (ADF1)))

ADF1:(WUMPUS (CHOICE2 (CURRENT)) (CHOICE1 (CURRENT)) (WUMPUS (CURRENT) (CURRENT) (CURRENT)))

is functionally equivalent to the reactive agent:

(SHOOT (CHOICE2))

This degeneration to the no-memory case is typical of the results produced for the entire set of agents produced by this setup; the best-of-run individuals for the entire set of runs averaged only 10.8 hits.

The parameters used to evolve an agent with randomly accessible indexed memory are quite different from those used in the previous two examples, and are summarized in table 2. For this set of agents, an array of 200 integers was made available; while not a great deal larger than the 120 locations needed to store the state of all rooms in the cave, it does give the agent opportunities to develop a customized memory system.

Functions to read from and write to memory are provided, with parameters determining which memory locations are accessed. Specifically, the functions READ and WRITE are provided, taking one and two parameters respectively. READ returns the value stored at the memory location given by its parameter; WRITE sets the memory location determined by its first parameter to the value of its second parameter. To support the use of this memory model, the functions ADD, SUB, MULT, and DIV have been added to the function set, performing addition, subtraction, multiplication, and protected division -- the last as defined in Koza (1992). To broaden the range of values the agent can use as an index, the terminal CONST is provided, which takes no parameters and returns a random number between 0 and 199. A function IFGTZ is also provided, taking three parameters: if the first is greater than zero, the second is evaluated, otherwise the third. This function allows the agent to store and act on boolean values.
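These primitives can be sketched as follows. The modulo treatment of out-of-range indices and the use of integer division are assumptions -- the paper does not say how invalid addresses or fractional results were handled -- but the divide-by-zero behavior follows Koza's (1992) protected division, which returns 1 instead of failing.

```python
MEM_SIZE = 200
memory = [0] * MEM_SIZE    # the agent's indexed scratch space

def READ(i):
    return memory[i % MEM_SIZE]

def WRITE(i, value):
    memory[i % MEM_SIZE] = value
    return value

def ADD(a, b): return a + b
def SUB(a, b): return a - b
def MULT(a, b): return a * b

def DIV(a, b):
    # Protected division: dividing by zero yields 1 instead of an error.
    return 1 if b == 0 else a // b

def IFGTZ(cond, then_value, else_value):
    # Branch on "greater than zero", letting stored flags act as booleans.
    return then_value if cond > 0 else else_value
```

(CONST is omitted here; in the GP system it simply supplies constants in the range 0-199 for use as indices.)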

In addition to changes in the function set, an additional result producing branch is used in conjunction with an additional automatically defined function. These additional branches are given the ability to read from and write to memory, while the original three branches may only read from memory and set the action to SHOOT or MOVE. During the fitness evaluation of an individual, the memory-writing branch is evaluated first, followed by the branch that can read and act, maintaining the differentiation of memory writing and memory using steps.
 
Table 2: Tableau for the indexed-memory hunter
Function Set for RPB 1: CHOICE1, CHOICE2, CHOICE3, MOVE, SHOOT, PIT, BATS, WUMPUS, ADD, SUB, MULT, DIV, IFGTZ, CURRENT, CONST, READ, ADF0, ADF1
Function Set for ADF0 and ADF1: Same as the set for RPB 1, but without ADF0, ADF1, MOVE, or SHOOT.
Function Set for RPB 2: CHOICE1, CHOICE2, CHOICE3, CURRENT, READ, WRITE, CONST, PIT, BATS, WUMPUS, ADD, SUB, MULT, DIV, IFGTZ, ADF2
Function Set for ADF 2: Same as for RPB 2, sans ADF2

Initial hunters produced by this scheme are predictably poor, but because the RPB which sets the action has access to the current sensory values, it is able to reproduce the fitness levels found in the no-memory case. Unfortunately, the same logic applies to the best cases of later generations; the access the action-choosing RPB has to its environment allows it to degenerate into the reactive case. The average hits of the best-of-set individuals was a disappointing 11.1.

CONCLUSIONS

As stated in Andre (1994), "GP is known for exploiting loopholes," and that is obvious when one evaluates the simulation results of the memory-using agents presented here. The no-memory reactive model, whose agents react to their immediate environment with no regard for their past actions, produced a higher best-of-set average than either of the two memory-based sets; this suggests that in the latter sets, much of the computational effort was spent evolving mechanisms around the "impediment" of memory. In the "Hunt the Wumpus" game, there is no obvious way to produce a two-stage agent such as that used by Andre and Brave, so a more clever method must be found to force a genetically evolved agent to utilize memory to achieve a higher fitness.

The author of "Hunt the Wumpus" believed that the optimal method of finding and killing the Wumpus would involve making a map of the rooms traversed and, using information given to the player during the game, deducing the exact whereabouts of the Wumpus. Through analyzing these results and watching humans play the game, it becomes obvious that while such deviousness on the part of the player may produce a higher success rate, the rates of success obtained via a mildly clever strategy are high enough to deter all but the most dedicated from pursuing the optimal strategy. Finding a natural analogue to this situation is easy; of the millions of species on this planet, very few utilize more than a simple mental model, and the vast majority operate in a purely reactive mode.

FUTURE WORK

Associative memory such as that used by Brave (1996) may prevent the memory evading tricks discovered here from evolving. Finding a method of creating a two step planning/acting agent using sensory deprivation for the action-producing branch may also prove useful if one could be found that conforms to the rules of "Hunt the Wumpus." It may eventually be shown, however, that the cost of creating a complex memory model that produces the perfect hunter is too extreme and the best agent for this world is one that simply reacts to its environment.

ACKNOWLEDGEMENTS

The author would like to thank Eric S. Raymond, whose C translation of the original BASIC Wumpus code formed the basis for the simulator discussed in this paper.

BIBLIOGRAPHY

Ahl, David H. 1978. Basic Computer Games. New York, NY: Workman Publishing.

Andre, David. 1994. Evolution of Mapmaking: Learning, planning, and memory using Genetic Programming.

Brave, Scott. 1996. The Evolution of Memory and Mental Models Using Genetic Programming. Genetic Programming 1996: Proceedings of the First Annual Conference. Cambridge, MA: The MIT Press. Pages 261-266.

Koza, John. 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: The MIT Press.

Koza, John. 1994. Genetic Programming II: Automatic Discovery of Reusable Programs. Cambridge, MA: The MIT Press.


(C) Copyright 1999 by Jim Lesko