Second Generation Adventure Games Reprinted from The Journal of Computer Game Design Volume 1, number 2, (August 1987): pages 4-7 Copyright 1987 by David Graves dag@hpsemc.cup.hp.com Expanding the horizons of the traditional adventure game format has created a new genre in literary entertainment: the participant novel. In a participant novel, the player is more a character in a plot than a solver of puzzles. Implementing a participant novel demands special computer modeling techniques. As the program increases in complexity, use of the proper representation of the novel's simulated environment becomes critical. Without a strong model of the world and its physical laws, and without a dynamic interface to this world, the developer is doomed to producing a limited and unstimulating game. The field of Artificial Intelligence provides programming constructs useful for creating these games of greater complexity and sophistication. This paper will present a few ideas borrowed from the field of AI: use of lists and trees, attributes, natural language parsing, English generation, semantic frames, and goal-oriented routines, and show how they support the complex model that a participant novel requires. All of the ideas presented have been successfully applied to a working game system. LISTS AND TREES Given that a participant novel will be proliferated with a great many people, places, and things (hereafter called "objects"), there must be a dynamic method for tracking and altering the position and relationships of each physical object. One can create a model of the world as lists of containers: cities contain houses, houses contain rooms, rooms contain things, and things contain other things. The traditional method for representing complex relationships of objects is through lists of lists, also called trees. Under this arrangement, every object has pointers to its parent (the thing that contains it), its children (the things it contains), and its siblings (the things in the same place with it). This can be coded as three arrays: parent, child, and sibling. Each slot in the array contains either an object's identifying number or "nil" (a null pointer). An object that is empty would have nil in its child slot, and an object that is alone in a container would have nil in its sibling slot. (Object numbers would not be declared explicitly, of course; they would be assigned to each object's symbol name by a compiler). Objects containing several other objects are represented using the child and sibling arrays together. The container points to the first child and the first child points to his sibling, who points to the next sibling. The end of the list is indicated with a nil sibling pointer. For example, a chest containing a lamp and a rope would have the object number of the lamp in its child slot, and the object number of the rope would be in the lamp's sibling slot. The lamp's sibling slot would be nil. Both the lamp and rope would have the object number of the chest in their parent slots, and nil in their child slots. The sibling and child links define a formal tree. The links from a child back up to its parent result in a directed graph rather than a tree proper. The inclusion of the parent links allow easier traversal of the tree in either direction. Representing the world in this way makes it very easy to manipulate groups of objects. Moving a chest to new location involves changing only the pointers of the old and new locations and the chest. The contents of the chest never need be considered. No matter how large the object tree becomes, moving any object and all of its contents can be done by changing only a few pointers. Since people, places and things are all represented in the same type of data structure, only one routine is required for all types of object movement. It is important that the constructs used in the model be simple and consistent. The model should be general enough that special constructs will not be required each time the programmer thinks of a new type of object to implement. For example, supporting surfaces (such as tables and chairs) can be implemented using the same parent, child, sibling scheme. The objects resting on the table are its children. The internal representation of the table and its children is identical to the container representation, so supporting surfaces are easily added to the model. ATTRIBUTES OF OBJECTS All objects have characteristics or "attributes" that define that object's limitations. Some attributes are boolean (stating whether this object is flammable or immovable for example) and some may have a numeric value associated with them, such as the weight and volume of an object. Some attributes are unchangeable (such as flammability and mass) and some are variable, describing only the current state of an object (i.e. closed, locked, and burning). Using attributes to define how an object may be manipulated within the model world allows the programmer to define a much more consistent and dynamic world. Consider the case where the programmer creates a fireplace and some logs. He adds some special case logic to his program that will support burning the logs with the expectation that the player will attempt to burn the logs in the fireplace. However, if the player tries burning a chair in the fireplace, nothing happens because the luckless programmer didn't think of the player trying that. This mess can be avoided by building the game's verb routines around a system of attributes. By defining the attribute "flammable" and making the Burn verb check for the presence of this attribute, the programmer allows the player to burn down anything flammable, such as furniture, papers, doors, and fireplace logs. Thus, he defines the generalized phenomenon of "burning" rather than coding individual cases. Once the programmer has decided on the set of attributes he wishes to support, he need only write individual action routines that manipulate those attributes, then decide which attributes apply to each of his objects. Thus, some whiskey could be defined as having the attributes liquid, edible, and flammable. Groups of attributes can work together in subtle and wonderful ways, allowing the player to perform activities in ways not even considered by the programmer. Would you think of putting moonshine whiskey in a lantern if you had no kerosene? RECOGNIZING NATURAL LANGUAGE Understanding natural language has proven to be a difficult problem in AI, but much can be accomplished in a participant novel by recognizing relatively simple statements in the following syntactic form: [Subject] VerbClause [DirectObject] [Preposition IndirectObject] Optional sections of the syntax are enclosed in brackets. Each word in a command typed by the player is looked up by a dictionary routine and is identified as a verb, noun, adjective, or whatever. The most simple command or "statement of intent" is simply a verb. Instructions to another person are indicated by using that person's name as the subject of the sentence, as in "Sam, run!". Omitting the subject of the sentence implies that the player himself is the actor. Direct and indirect objects are noun clauses, which consist of some optional adjectives followed by a noun. Prepositions like "at," "in," and "on" separate a direct object from an indirect object. Conjunctions (such as "and"), punctuation, and articles (such as "a" and "the") can be tossed out by the parser as "noise", or verified to appear in the expected positions. This syntax will accept input strings such as "Put the gray stone into the large oak chest". By further extending the parser to allow a list of objects as a direct object and allowing sentences to be strung together, the program can recognize statements like "Open the chest, take the lamp, the knife, and the long rope, then climb the stairs". Since everyone has a slightly different way of saying what they mean, the program must have a very large dictionary with links between synonyms. Thus, "Set the lamp on the table" and "Place the lantern onto the table" could be recognized as having the same meaning. Travel is something that happens frequently in this type of game, so it should be very easy for the player to express what is desired. The parsing scheme above will recognize travel syntax such as "Climb the tree" or "Enter the cottage," but not "Go east" or its abbreviated forms, "East" and "e". Recognition of these few irregular forms can be added to the parser as a special case. The parser should be able to recognize modifiers for verbs. The simple case is the use of an adverb, as in "Carefully open the door". A more complex case is where the player uses a verb and a preposition together in a way that changes the meaning of the verb. For example "put down" means drop, "put out" means extinguish, and "put on" means wear. This can be implemented by creating a table with verb/preposition pairs followed by the resulting verb. When a verb/preposition pair in the command string matches an entry in the table, it can be replaced with the resulting verb. Parsing then continues as if no preposition has been seen yet. This technique also allows recognizing a command with a dangling preposition such as "Take the cloak off" which would otherwise be rejected by the parser as an incomplete sentence. We are all accustomed to using pronouns (such as "it") and expecting our listener to resolve what we mean by reviewing the preceding context. A natural language parser might be expected to do the same. A simple rule for implementing parsing of "it" could be "Replace the 'it' with the last direct object named". This allows proper parsing of "Put the lamp on the table and light it," but not "Put the lamp into the chest, then close it". In the second statement, the player probably meant to close the chest, not the lamp. Changing the rule to use the last direct or indirect object in place of "it" won't work in all cases either. The effect of this rule on the command "Put the lamp on the table and light it" would be to set the table on fire. While it is possible to define the semantics of "it" for each of these cases through complex programming, it might make more sense to choose a simple rule, such as "use the last direct object in place of 'it' in all cases," then inform the user of the limitations of the parser through the game documentation. The parser could be expanded to allow words such as "everything" or "all" in place of a direct object. This lets the player make statements like "Drop everything into the chest". Note that the semantics of "everything" change with the verb used. In "Drop everything," "everything" really means "everything that I'm holding," while in "Get everything," it means "everything that I'm not holding". For some verbs, such as "Examine," the scope of the word "everything" is "all visible objects, whether held or not". The player may also wish to set the scope of "everything" to a limited domain, as in "Look at everything on the table". Regardless of how "everything" is used in a sentence, it can be implemented by simply calling the verb action routine once for every object within the the command's scope. GENERATING ENGLISH While interacting with a participant novel, the player is going to read vast quantities of text. Some of this may be "canned text" and some will need to be generated. The constant attributes of a location could be described with fixed text, but certainly not the descriptions of the objects there. Since most objects can be moved around, placed in and on each other, and change state, the program must be capable of dynamically producing the text to describe them. The key to generating interesting text is to vary the length of the sentences generated, vary the sentence structure, and use different synonyms whenever possible. For example, to describe the "children" of a table the program might generate: "There is a knife, some bread, and a key on the table," or "A knife, some bread, and a key are on the table," or "Resting on the table are a knife, some bread, and a key". The description generator would have a list of synonymous phrases such as "resting on," "lying on," and "sitting on". A text generator must pay careful attention to the issue of grammar. When a sentence contains a list of objects, the objects must be counted so that the conjunction "and" can be inserted before the last object. Lists containing more than one object will use "is" instead of "are" (except when using the form "There is a ...."). Each object in a list should be followed by a comma (except the last two) and preceded by a the appropriate article ("a," "the" or "some"). Selecting the article for an object depends on the attributes of the object and the type of sentence. When introducing an ordinary object, the article is "a" or "an". An object that has the attribute of being "a quantity" (bread, wine, cheese) the article would be "some". Familiar people would be mentioned by name, without any preceding article. Objects that were mentioned recently would have the article "the". The attributes of an object will also dictate the type of preposition used in the generated text. For instance, describing the children of a surface (such as a table) would dictate the use of "on," while containers would have things "in" them. The result of the player's actions can be reported via a text generation facility as well. When the player enters a vague command such as "Drink," the program would give all the details of what happened, like "You drink some cool water from the stream," thus informing the player that his canteen has not been taxed. This reporting mechanism should make use of pronouns (like "he," "she," or "it") when an object is mentioned more than once in a sentence, to avoid generating text that sounds too mechanical. An example of this would be "You give the bowling ball to Sam, but he drops it". As before, the attributes of an object will aid in selecting the appropriate pronoun. FRAME-BASED EXCEPTIONS Any non-trivial model of reality will contain a large number of objects whose characteristics and behaviors differ from normal expectations. These exceptions might range from personality quirks to magical or "high- technology" physical properties that deviate from the norm. A computer model of such a world needs a simple method to define, organize, and deal with all these exceptions. One method is to attach to each exceptional object some information about the context in which an exception applies, and the effect of invoking that exceptional behavior. The effect is defined as a fragment of executable code. The context is defined in a semantic frame. In AI, a frame may be defined as "an instantiation of a context". In this scheme, a frame is a simple template for a single semantic action using a specific verb type and specific objects. Any object can point to a unique frame, which points to a code fragment. This way, a frame defines the semantic context in which an exception handling code fragment applies. [Addendum, August, 1991: Note that in object-oriented languages, frames are a built-in construct (usually called "selectors"). Thus an object-oriented language makes it very easy to implement a base of general rules with hierarchies of exceptional rules (methods)]. Assume we had modeled the mouth of a volcano as a large container. Normal semantics for an object tossed into container dictate that the object will come to rest inside the container. In this case, however, we want to have the object be completely destroyed, thus eliminating any possibility of getting it back. The volcano could then be defined with a frame that reads "Throw anything into the volcano", to provide the context in which the exception will be invoked. Whenever an actor throws any object into the volcano, this frame is selected as matching the actor's action. After the default processing for this action, the code fragment attached to the frame is executed. The code fragment would cause the object to be destroyed, and perhaps display a colorful description of the event. Frames may specify a specific context (by requiring a specific verb, direct object, and indirect object) or a more general context (with a range of verbs, and "wild card" matching of direct or indirect objects). Moreover, some frames must take precedence over others. Any action might match the frames attached to the direct object, the indirect object, the location, or the character performing the action. Since an action might invoke several conflicting exceptions, the game program must select the one frame that is the best match to this action. In most cases, "best match" means selecting the matching frame with the most specific statement of action. (Other criteria for precedence may be used as well, such as selecting the frame attached to the direct object rather than the indirect object). Consider the case of an explosive with a context frame stating "Throw the explosive into the volcano". This frame is more specific than the volcano's "Throw anything into the volcano", so the explosive's frame is selected. The code fragment can then define some effect that is an exception to the "normal" exception. An object may have any number of frames attached to it, and multiple frames may point to the same code fragment. A code fragment may override the default semantics for an action completely, or simply append additional semantics. Finally, objects may be defined which inherit frames from other objects, which speeds development and lowers the chance of coding error when defining objects with similar exceptional properties. It is important to note that the use of semantic frames is intended only for those actions or properties that are exceptional, and should therefore be used as sparingly as possible. Attaching the same exception frame to many objects can be an indication that it is not so exceptional after all. In this case, one may want to define a new object attribute for this "non-exceptional" exception, and handle those objects via the normal semantics routines. Incidentally, one can use a scheme of frames and code fragments to control behavior for simulated actors in the story, but the results are very limited. The actor becomes only a "stimulis-response" being; he responds only to actions or commands from the player, and speaks only when spoken to. For example, an actor named Sam might have a frame attached to him, stating "Throw anything at Sam". If the player (or another actor) throws any object at Sam, the attached code fragment is executed, which might put out a message like: "Sam is outraged by the insult, and draws his weapon". Clearly, the actor can only demonstrate those behaviors that were pre-programmed. Things become much more interesting when actors are capable of original behavior. Each computer-controlled actor should be able to recognize a need, then create a plan to fulfill it. Different personality types (based on attributes, once again) would give rise to different types of behavior. GOALS AND SUB-GOALS A standard feature in many adventure games is the requirement that the player remember long sequences of actions in minute detail and type these sequences when needed. Forgetting to perform one of the steps in a sequence results in an error message. For example, if the player sees a bottle of beer on the table and says "Drink the beer," he is likely to get an error message like "You aren't holding the beer" or "The beer isn't open". Memorizing detailed sequences of instructions is not what most people would call fun. Wouldn't it be much nicer if the computer could just "understand" the player's intent? It turns out that it is relatively easy to create a system where the programmer defines a corrective action for such an error, instead of giving an error message. By defining the prerequisite state attributes for any action, the system can automatically break a goal into sub-goals. Thus, when the Drink routine finds that the player is not holding the beer, a new sub- goal is set: get the beer. Upon resolving that sub-goal, the Drink routine is re-entered. It then checks to see if this container is open and if not, it sets a new goal: open the container. In the end the player is told of how all these events proceeded with a message like: "You take the beer from the table, open it, and drink from it". Filling in the implied prerequisite actions is a simple form of planning, called backwards chaining. Any new goals must be stored on a stack rather than in a static structure, since any goal can create new sub-goals, which may create others. Each goal must be resolved in order, the latest goal first. The game program simply attempts the action on the top of the stack, pops it off if successful, or pushes new sub-goals if needed. It is important to note that only errors concerning variable attributes are correctable. Errors caused by conflicts in fixed attributes (such as attempting to set fire to a stone) are not correctable by creating a new sub-goal. If a non-correctable error occurs, the player is informed and his goal stack is cleared. Note that once all of the verb routine have all their prerequisite states defined as sub-goals, it becomes very easy to simulate intelligent behavior by other actors in the story. For example, if the player asks another character to "Go out, get the rope and return," the character appears to make intelligent decisions like opening the door before attempting to exit, and untying the rope from a tree before attempting to walk away with it. The pace of the game remains brisk since the player need not specify each detailed step in a process. The software "understands" what is implied by the player. SUMMARY By selecting a simple, yet powerful representation for the system, the designer can create a much more dynamic, consistent model of the novel's fantasy world. Building each verb routine around a set of attributes instead of coding special cases allows the player to perform actions that were not thought of by the designer. A well defined parsing scheme, aided by a large dictionary, allows a player to form his commands many different ways, instead of just the one "right way". Dynamic generation of text in a variety of formats can make the computer generated text much more interesting for the player to read. Defining exceptions to normal semantics via frame-activated code fragments aids in managing the complexity of such exceptions. Routines that can detect simple state errors and create correcting sub-goals can fill in the implied actions, and thereby help the game keep pace. Of course, even the best software needs a well written story to make an interesting participant novel, but that a topic for another paper altogether.