Interactive movies, in which interaction capabilities are introduced into movies, are considered a new type of media that integrates various media, including telecommunications, movies, and video games. In interactive movies, people enter cyberspace and enjoy the development of a story there by interacting with the characters in the story. In this paper, we first explain the concept of interactive movies using examples of movies developed on a prototype system, and then describe techniques for improving interactivity.
The current system incorporates two significant improvements for multimedia interactivity: the introduction of interaction at any time and two-person participation through the use of network communications. The software and hardware configurations of the system are briefly summarized. The paper concludes with an example of an interactive story installed in this system and briefly describes the interaction between the participants and the system.
Ever since the Lumière brothers created the Cinématographe at the end of the 19th century[1], motion pictures have undergone various advances in both technology and content. Today, motion pictures, or movies, have established themselves as a composite art form in a wide domain extending from fine arts to entertainment. Movies have the power to draw viewers into a virtual world where they can actually "experience" the development of a story. Building on the storytelling power embedded in novels and other forms of literature, movies add visual images and sound to create a virtual world (cyberspace).
The use of interaction technology, on the other hand, gives movies far greater possibilities than their current forms offer. Conventional movies present predetermined scenes and story settings unilaterally, so audiences take no part in them and make no choices in story development. As a further step, interaction technology makes it possible for the viewer to "become" the main character in a movie and enjoy a first-hand experience. We believe that this approach would allow producers to explore the possibilities of a new class of movies.
Based on this viewpoint, we have been conducting research on interactive movie production by applying interaction technology to conventional movie-making techniques. As an initial step in creating a new type of movie, we produced a prototype system[4]. We are currently developing a second prototype system with many improvements. This paper briefly describes the first prototype system and outlines its problem areas and the improvements it requires. The paper also introduces the configuration of the second prototype system, which is now under development and incorporates the described improvements.
An interactive movie consists of the following elements:
(1) An interactive story that develops differently depending on the interaction of the audience;
(2) An audience that becomes the main character and experiences the world created by the interactive story;
(3) Characters who interact with the main character (audience) in the story.
Interactive movies have the following functions:
(1) The use of CG technology and the generation of three-dimensional imagery create a virtual reality that the audience perceives as the actual surroundings;
(2) The audience can enjoy the story development by interacting with the characters in cyberspace through talking and gesturing.
(1) Telecommunications
Research has been conducted on reproducing and displaying a three-dimensional image of a distant person and background so that another person can communicate with that person in a seemingly face-to-face situation. An example of the results achieved by this research is a teleconferencing system with true-to-life presence[5]. This communication concept is an advanced form of teleconferencing and thus does not include a story development feature.
(2) Movies
Movies use dynamic images and sound to provide strong input to the human visual and auditory systems and draw audiences into a cyberspace. The rapid advance of CG technology in recent years has enabled the production of very realistic images of extraordinary worlds and phenomena that do not exist or happen in the real world. There have been attempts to add an interactive function to such movies. However, these efforts were limited to a primitive level: several different story developments were prepared, and the audience selected from the available options.
(3) Video games
Video games, particularly role playing games (RPGs), provide a reality similar to that created by novels but in a game format. In an RPG, a basic story is preset. The player controls the story development by manipulating the main character. In a sense, video games have many similarities to interactive movies. In a video game, however, interactions are carried out by operating buttons, and this is a major difference from the natural interaction that an interactive movie allows an audience to engage in.
(4) Other media
Various experiments have been conducted to create a cyberspace in which an audience can interact with a virtual reality or with movie characters. These include computer-generated characters[2][9] that interact with people through gestures and emotional expressions, and interactive art[8]. However, the interaction provided by these media is short in duration, and no story is constructed.
The script manager stores the data of the state transition network and controls scene transitions according to the interaction results. The scene manager contains descriptive data for individual scenes, including background scenes/music with their starting times, character animations/dialogs with their starting times, and so on. The scene manager generates each scene by referring to the descriptive data of the scene specified by the script manager. The interaction manager operates under the control of the script manager and scene manager and manages the interaction in each scene. Interaction is achieved through the speech recognition and gesture recognition functions.

The handlers are controlled by the scene manager and interaction manager, and they control the various input and output functions. The speech recognition handler controls the speech recognition function; the speech recognition program provides speaker-independent, continuous speech recognition based on HMM[7]. The gesture recognition handler controls the gesture recognition function; the gesture recognition software detects several characteristic points of a human figure in an image captured by a camera[10]. The image handler controls the output of visual images, such as background images and character animation. Finally, the sound handler controls the output of sounds, such as background music, sound effects and character dialogs.
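The paper describes this architecture only in prose; the following minimal Python sketch illustrates the top-down control flow. All class names, data layouts, and handler interfaces are our own assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the first prototype's top-down control flow.
# All names and data formats here are assumptions; the paper
# publishes no code.

class ImageHandler:
    def show(self, asset, at):
        print(f"[image t={at}] {asset}")

class SoundHandler:
    def play(self, asset, at):
        print(f"[sound t={at}] {asset}")

class SceneManager:
    """Generates one scene from its descriptive data via the handlers."""
    def __init__(self, scene_data):
        # scene_data: {scene_id: [(start_time, kind, asset), ...]}
        self.scene_data = scene_data
        self.image, self.sound = ImageHandler(), SoundHandler()

    def play(self, scene_id):
        for start_time, kind, asset in sorted(self.scene_data[scene_id]):
            if kind in ("background", "animation"):
                self.image.show(asset, at=start_time)
            else:  # music, sound effect, or dialog
                self.sound.play(asset, at=start_time)

class ScriptManager:
    """Holds the state transition network and selects the next scene."""
    def __init__(self, network, start):
        # network: {scene_id: {interaction_result: next_scene_id}}
        self.network, self.current = network, start

    def transition(self, interaction_result):
        self.current = self.network[self.current][interaction_result]
        return self.current
```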
(1) Participation in cyberspace
a) Number of participants
In the first system, the basic concept was a story with just one player acting the role of the hero. However, the first system lacks certain functions needed for a story that takes place in cyberspace: since cyberspace will be created over a network, the story must develop not from just one player but from several players participating at the same time.
b) Presence or absence of an avatar
When players participate in a story in cyberspace, the question then becomes whether to give the players a persona (avatar). The first system did not use avatars because the single player always acted the leading role. Even so, players never really felt that they were active participants in the system, because aspects such as their clothing never matched the situation in cyberspace (being in the past or on an unknown planet). To address this, we conducted research in areas such as a virtual Kabuki system[6] that uses avatars to transform players into forms (a Kabuki actor in this case) that fit the cyberspace, and the results suggested the benefits of using avatars.
(2) Interaction
a) Frequency of interaction
Interaction in the first system was generally limited to change points in the story, so the story progressed linearly along a predetermined course, like a movie, except at these points. Although this technique has certain advantages, such as the ability to use movie story development technology and expertise, fixed story elements created as in a movie leave the player as little more than a spectator, who finds it difficult to participate interactively at points where interaction is clearly required. The limited opportunities for interaction create other drawbacks for the player: the experience differs little from watching a movie, and the sense of involvement is very limited.
b) Types of interaction
The interaction technologies used in the first system were voice and gesture recognition. However, only exaggerated gestures, rather than minute gestures, could be recognized because of the low lighting in the area where the system was operated. As a result, the system ran almost exclusively on voice recognition, and the available modalities allowed players only simple interaction.
(1) Participation in cyberspace
a) System for multiple players
Our initial effort to develop a system for multiple players allowed two players to participate in the development of a story in cyberspace. The ultimate goal is to create a multi-player system operating across a network, but as the first step, the present study developed a prototype multi-player system with two systems connected by a LAN.
b) Avatar representation
We used a system that showed avatars as alter egos of the players on screen. There were several advantages to this system, as outlined below.
System for representing the avatar: The relationship of the player to the avatar and the relationship of the avatar to other characters in the movie can be controlled in various ways by changing the representational form of the avatar.
System for controlling the avatar: The basic control system captures player movement with magnetic sensors and maps that movement onto the avatar, as in a motion capture system. Giving the avatar autonomous movement enables complex motion that combines autonomous movement with player movement. By varying the proportion of each with time and circumstance, player movement can be used directly or desired movement can be introduced; together these add diversity and depth to the player's relationship with cyberspace.
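As an illustration of this time-varying mixture, the short sketch below blends a motion-captured player pose with an autonomous pose using a weight alpha; the linear blend and flat joint-angle vectors are our assumptions, not the authors' stated formula.

```python
# Blend captured player motion with autonomous avatar motion.
# Linear interpolation over flat joint-angle vectors is assumed.

def blend_avatar_pose(player_pose, autonomous_pose, alpha):
    """alpha = 1.0 -> pure motion capture; alpha = 0.0 -> fully autonomous.
    Varying alpha over time and circumstance mixes the two sources."""
    return [alpha * p + (1.0 - alpha) * a
            for p, a in zip(player_pose, autonomous_pose)]

# Example: halfway blend of a three-joint pose.
print(blend_avatar_pose([0.0, 1.0, 2.0], [1.0, 1.0, 0.0], alpha=0.5))
```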
(2) Interaction
a) Introduction of interaction at any time
To increase the frequency of interaction between the participants and the system, we devised a way for players to interact with cyberspace residents at any point in time. Basically, these impromptu interactions, called story unconscious interactions (SUI), occur between the players and characters and generally do not affect story development. On the other hand, there are sometimes interactions that affect story development. This kind of interaction, called story conscious interaction (SCI), occurs at branch points in the story, and the results of such an interaction determine the future development of the story.
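The SUI/SCI distinction can be made concrete with a small routing sketch; the interface names below (at_branch_point, take_branch, react_to) are hypothetical, not part of the described system.

```python
# Hypothetical routing of a recognition result into SUI vs. SCI.
from dataclasses import dataclass

@dataclass
class RecognizedInput:
    modality: str   # "speech", "gesture", or "emotion"
    label: str      # recognition result

def handle_interaction(event, story):
    if story.at_branch_point():
        # SCI: at a branch point, the result decides future development.
        story.take_branch(event.label)
    else:
        # SUI: characters react with speech/animation; story state
        # is left untouched.
        for character in story.nearby_characters():
            character.react_to(event)
```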
b) Introduction of multimodal interaction
The following interactive functions were added to the primary interactive function of voice recognition.
Emotion recognition: To realize interaction at any time, an emotion recognition capability was introduced. When players speak spontaneously, the characters react with their own utterances and animations according to the emotion recognition result. Emotion recognition is achieved with a neural-network-based algorithm[9].
Motion capture: We used a motion capture system based on magnetic sensors attached to relevant parts of the player's body in order to reflect player movement in avatar movement. Data from the magnetic sensors is input into the system to move the computer graphics avatar, so players feel they are controlling its movement. This provides another form of interaction at any time.
Gesture recognition: We captured motion with magnetic sensors and used an HMM to process data from the sensors in order to recognize 3-D gestures, minute gestures and gestures under low-light conditions. Gesture recognition results are used for SCI. Fig. 3 shows an overview of the second system equipped with these functions.
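A common way to realize such HMM-based gesture recognition is to train one HMM per gesture on sensor trajectories and classify by maximum likelihood. The sketch below does this with the third-party hmmlearn package, which is our substitution for illustration, not the recognizer the authors used.

```python
# One-HMM-per-gesture classification over magnetic-sensor sequences.
# hmmlearn is a stand-in for the authors' own HMM recognizer.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_gesture_models(training_data, n_states=5):
    """training_data: {gesture_name: [array of shape (T_i, D), ...]}."""
    models = {}
    for name, sequences in training_data.items():
        X = np.concatenate(sequences)              # stack all frames
        lengths = [len(seq) for seq in sequences]  # per-sequence lengths
        model = GaussianHMM(n_components=n_states)
        model.fit(X, lengths)
        models[name] = model
    return models

def recognize_gesture(models, sequence):
    """Return the gesture whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda name: models[name].score(sequence))
```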
(1) System structure concept
While the first system stressed story development, the second system had to achieve a good balance between story development and impromptu interaction by incorporating the concept of interaction at any time. This required building a distributed control system instead of a top-down system structure.
There is some variety in the architectures available for distributed control systems, but we chose an action selection network[3], in which multiple nodes send and receive activation levels among themselves. When the activation level accumulated at a node exceeds a threshold, the node is activated and triggers the process associated with it.
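A minimal node of such a network can be sketched as follows. The threshold test, downstream spreading, and firing follow the description above, while the names and the reset-on-firing policy are assumptions.

```python
# Minimal action selection network node: accumulate activation,
# fire the associated process past the threshold, spread activation.

class Node:
    def __init__(self, name, threshold, action):
        self.name = name
        self.threshold = threshold
        self.action = action        # callable executed on firing
        self.level = 0.0            # accumulated activation
        self.links = []             # [(target_node, weight), ...]

    def connect(self, target, weight):
        self.links.append((target, weight))

    def receive(self, amount):
        self.level += amount
        if self.level > self.threshold:
            self.fire()

    def fire(self):
        self.action()
        self.level = 0.0            # reset after firing (assumed policy)
        for target, weight in self.links:
            target.receive(weight)  # spread activation downstream
```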
(2) Script manager
The role of the script manager is to control transitions between scenes, just as in the first system. An interactive story consists of various kinds of scenes and transitions among them. The functions of the script manager are to define the elements of each scene and to control scene transitions based on a finite automaton (Fig. 5). The transition from a scene to one of its possible successor scenes is decided based on the SCI result sent from the scene manager.
(3) Scene manager
The scene manager controls the scene script as well as the progress of the story in a scene. Action related to the progress of the story in a scene is called an event, and event transitions are controlled by the scene manager. Events for each scene derive from the following elements.
1) Scene images
2) Background music
3) Sound effects
4) Character animation and character speech
5) Player and character interaction
The script for each scene is stored ahead of time in an event network, and the scene manager generates each scene based on data from the script manager via a script in the format shown in Fig. 6. In the first system, the timing of the transition from one event to the next was controlled by the scene manager; in the second system, absolute time cannot be controlled because the system incorporates the concept of interaction at any time. Relative time, or time order, can still be controlled, however, so the action selection network was applied here as well. The following list (and the sketch after it) describes how this works.
1) Activation levels are sent and received among events, as well as from external events.
2) An event activates when the cumulative activation level exceeds the threshold.
3) On activation of an event, a predetermined action corresponding to the event occurs. At the same time, activation levels are sent to other events, and the activation level of the activated event is reset. The order of events can be preset, and variation as well as ambiguity can be introduced into that order by predetermining the direction in which activation levels are sent and their strengths.
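Using the Node sketch from the earlier example, a preset event order for one scene can be encoded by the direction and strength of the activation links; the events, weights, and thresholds here are arbitrary illustrations.

```python
# Chaining three scene events so they fire in order; weights and
# thresholds are illustrative placeholder values.
music = Node("background_music", 0.5, lambda: print("start music"))
entry = Node("character_entry",  0.5, lambda: print("character enters"))
line  = Node("opening_dialog",   0.5, lambda: print("speak opening line"))

music.connect(entry, 0.6)   # entry fires once music has fired
entry.connect(line, 0.6)    # dialog follows the entry
music.receive(1.0)          # external trigger starts the chain
```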
(4) Interaction manager
The interaction manager is the most critical component for achieving anytime interaction. Figure 7 shows the structure of the interaction manager. The basis for anytime interaction is a structure in which each character (including the player's avatar) is allotted an emotional state; interaction input from the player, as well as interaction between the characters, determines each character's emotional state and its response to that state. Some leeway is given to how a response is expressed, depending on the character's personality and circumstances. The interaction manager is designed based on the concepts outlined below.
1) Defining an emotional state
The state and intensity of the emotion of player $i$ ($i = 1, 2, \ldots$) at time $T$ are defined as $E_p(i, T)$ and $s_p(i, T)$, where $s_p(i, T) = 0$ or $1$ (0 indicates no input and 1 indicates an input).
Similarly, the state and intensity of the emotion of object $i$ ($i = 1, 2, \ldots$) at time $T$ are defined as $E_o(i, T)$ and $s_o(i, T)$.
2) Defining the emotional state of an object
For the sake of simplicity, the emotional state of an object is determined by the players' emotional states when the player interaction results come from emotion recognition:
$$\{E_p(i, T)\} \rightarrow \{E_o(j, T+1)\}$$
Activation levels are sent to each object when emotion recognition results are input:
$$s_p(i, T) \rightarrow s_p(i, j, T),$$
where $s_p(i, j, T)$ is the activation level sent to object $j$ when the emotion of player $i$ is recognized. The activation level of object $j$ is the total of all activation levels received by the object:
$$s_o(j, T+1) = \sum_i s_p(i, j, T)$$
3) Exhibiting action
An object whose activation level exceeds the threshold performs an action $A_o(i, T)$ based on its emotional state. More specifically, the action is a character's movement and speech in reaction to the emotional state of the player. At the same time, activation levels $s_o(i, j, T)$ are sent to other objects:
$$\text{if } s_o(i, T) > TH_i \text{ then } E_o(i, T) \rightarrow A_o(i, T), \quad E_o(i, T) \rightarrow s_o(i, j, T),$$
$$s_o(j, T+1) = \sum_i s_o(i, j, T)$$
This mechanism creates interaction between objects and enables more diverse interaction than a simple one-to-one correspondence between emotion recognition results and object reactions.
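The formulas above translate into a small propagation loop. The sketch below implements them under assumed names, with placeholder thresholds, weights, and action tables; it is an illustration of the described mechanism, not the authors' code.

```python
# Emotion-driven activation spreading between objects (characters),
# following the formulas above; all concrete values are placeholders.

class EmotionalObject:
    def __init__(self, name, threshold, actions):
        self.name = name
        self.threshold = threshold  # TH_i
        self.actions = actions      # {emotion: reaction description}
        self.emotion = "neutral"    # E_o(i, T)
        self.level = 0.0            # s_o(i, T)
        self.links = []             # [(target, weight)] for s_o(i, j, T)

    def receive(self, emotion, amount):
        self.emotion = emotion      # adopt the incoming emotional state
        self.level += amount
        if self.level > self.threshold:
            self.act()

    def act(self):
        # A_o(i, T): movement/speech reacting to the emotional state.
        print(self.name, "->", self.actions.get(self.emotion, "idle"))
        self.level = 0.0            # reset after acting
        for target, weight in self.links:
            target.receive(self.emotion, weight)  # object-to-object spread

def on_player_emotion(emotion, objects, weight=0.6):
    """Emotion recognition result: send s_p(i, j, T) to every object j."""
    for obj in objects:
        obj.receive(emotion, weight)
```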
(1) Image output subsystem
Two workstations (Onyx Infinite Reality and Indigo 2 Impact) capable of generating computer graphics at high speed are used to output images. The Onyx workstation is used to run the script manager, scene manager, interaction manager and all image output software. Character images are stored on the workstations ahead of time in the form of computer graphic animation data in order to generate computer graphics in real time. Background computer graphic images are also stored as digital data so background images can be generated in real time. Some background images are real photographic images stored on an external laser disc. The multiple character computer graphics, background computer graphics and background photographic images are processed simultaneously through video boards on both the Onyx and Indigo 2 workstations.
Computer graphics are displayed in 3-D for more realistic images, and a curved screen is used to envelop the players with images and immerse them in the interactive movie world. Image data for the left and right eyes, created on the workstations ahead of time, are integrated by stereoscopic vision control and projected on the curved screen with two projectors (Fig. 9). On the Indigo 2 side, however, images are output on an ordinary large-screen display without stereoscopic vision because of processing speed limitations.
(2) Voice and emotion recognition subsystem
Voice and emotion are recognized with two workstations (Sun SS20s) that run the voice and emotion recognition handlers. Voice input via microphone is converted from analog to digital by the sound board built into each Sun workstation, and recognition software on the workstation recognizes voice and emotion. For the recognition of meaning, a speaker-independent speech recognition algorithm based on HMM is also adopted in the second system. Each workstation processes voice input from one player.
(3) Gesture recognition subsystem
Gestures are recognized with two SGI Indy workstations that run the gesture recognition handlers. Each workstation takes output from the magnetic sensors attached to a player and uses that data both to control the avatar and to recognize gestures.
(4) Sound output subsystem
The sound output subsystem comprises several personal computers because background music, sound effects, and speech for each character must be output simultaneously. Sound effects and character speech are stored as digital data and converted from digital to analog as needed; multiple personal computers enable simultaneous digital-to-analog conversion of multiple channels so these sounds can be output at the same time. Background music is stored ahead of time on an external compact disc whose output is also controlled by a personal computer. The multiple sound output channels are mixed and output with a computer-controllable mixer (Yamaha 02R).
We selected "Romeo and Juliet" as the basis of the interactive story for the second system, for the following reasons:
a) There are two main characters, Romeo and Juliet, so the story supplies a good example of multi-person participation.
b) "Romeo and Juliet" is a very well known story, and people have a strong desire to act out the role of hero or heroin. Therefore, it is expected that people can easily get involved in the movie world and experience the story.
The main plot of the story is as follows. After their tragic suicides, the souls of Romeo and Juliet are sent to Hades, where they find that they have totally lost their memories. They then start a journey to rediscover who they are and what their relationship is. Through various experiences, and with the help and guidance of the characters in Hades, they gradually find themselves again and finally return to the real world.