Merging Real and Virtual Worlds

Paul Milgram, David Drascic, Julius J. Grodski*, Anu Rastogi, Shumin Zhai, Chin Zhou

Ergonomics in Teleoperation and Control Laboratory (ETC-Lab)
Industrial Engineering, University of Toronto
4 Taddle Creek Road, Toronto, Ontario, Canada M5S 1A4

* Defence and Civil Institute of Environmental Medicine,
PO Box 2000, Station A, Downsview, Ontario, Canada M3M 3B9


Proceedings of IMAGINA'95, Monte Carlo, Feb 1-3 1995.

(c) Copyright 1995.


Abstract

An introduction is given to the concept of Augmented Reality within the context of the continuum spanning completely real environments and completely virtual ones. Augmented Reality (for visual displays) is defined as pertaining to any otherwise completely real scene which is somehow enhanced by means of computer graphics. Specific research on monitor-based (as opposed to head-mounted) Augmented Reality is then summarised, based on extensive experience with the ARGOS (Augmented Reality through Graphic Overlays on Stereovideo) system. Particular implementations and applications discussed include virtual pointers, virtual tape-measures, virtual tethers and virtual control. Illustrations include two sets of stereo-pair Augmented Reality images suitable for free viewing.

Keywords: Augmented reality, mixed reality, virtual reality, augmented virtuality, virtual control, telerobotic control, stereoscopic displays.


1. Introduction

In this paper we present a survey of our research on monitor-based Augmented Reality (AR), a display technique which "does not oppose real and virtual worlds, but fuses them in an intimate symbiosis" (Imagina95). We commence by discussing the concept of Augmented Reality and its relationship to Virtual Reality within the broader framework of Mixed Reality. We then review a number of particular applications of our system for Augmented Reality through Graphic Overlays on Stereovideo (ARGOS).

The topic of Augmented Reality has begun to appear in the literature with increasing frequency, usually in conjunction with some treatment of the better known subject of Virtual Reality (VR). However, there is currently little consensus on precise definitions of either VR or AR. VR, for example, is used to refer to systems ranging from totally immersive computer-generated virtual environments, to interactive desktop computer graphics applications, to text-only "Adventure"-style computer games. The term "Augmented Reality" is likewise used in different ways by different people, with no consistent definition. We use AR to refer to real scenes that are enhanced, or "augmented", with computer graphics. Although VR and AR may appear quite different in their fundamental properties, they face many of the same issues, and much of the research and technology of one pertains to the other.

In general, a Virtual Reality environment is one in which the user is immersed in a completely synthetic world, which mimics the properties of a real-world environment to a certain extent, and which may also exceed the bounds of physical reality by creating a world in which the physical laws governing gravity, time and material properties no longer hold. In contrast, the real-world environments of Augmented Reality systems are obviously constrained by the laws of physics, which necessarily impose certain restrictions on one's ability to interact with the world. AR tools are designed to facilitate such interactions. Rather than regard the concepts of VR and AR as antitheses, however, it is more instructive to view them as lying at opposite ends of a continuum, which we refer to as the Reality-Virtuality (RV) continuum (Milgram et al, 1994).

Figure 1: Schematic Representation of the Reality-Virtuality (RV) Continuum.
Augmented Reality (AR) and Augmented Virtuality (AV) are special cases of Mixed Reality (MR), within the RV continuum.

Fig 1a: Completely Real
Fig 1b: Augmented Reality
Fig 1c: Augmented Virtuality
Fig 1d: Completely Virtual

Figures 1c and 1d were kindly provided by the ATR Communication Systems Research Laboratories, Kyoto, Japan, as an illustration of a portion of their Virtual Space Teleconferencing system.

The RV continuum concept is illustrated in Figure 1, where the "completely real environment" shown at the left side of the RV continuum defines any environment consisting solely of real objects, and includes whatever is observed when viewing a real-world scene either directly (i.e. in person) or by means of a video display. An illustration of this case is given in Fig. 1a. The "completely virtual environment" case at the right defines any world consisting solely of virtual objects, examples of which would include conventional computer graphic simulations, either monitor-based or immersive. An illustration of this case is given in Fig. 1d. Between the extremes of the RV continuum lies the range of Mixed Reality (MR) environments in which real and virtual worlds are combined in various proportions and presented as a unified whole.

Using the framework of the RV continuum, the definition of Augmented Reality, as indicated in Fig. 1b, pertains to any otherwise completely real environment which is somehow enhanced by means of computer graphics. (Although in this paper our treatment of Augmented Reality is limited strictly to visual displays, analogous concepts apply also to both auditory and haptic display modalities.) That is, the image in Fig. 1b comprises essentially the same objects as the real-world image of Fig. 1a, with the addition of the virtual robot shown in the foreground.

Another important class of MR displays, also shown along the RV continuum in Fig. 1, is labelled Augmented Virtuality (AV). This class is analogous to AR, but comprises the enhancement of otherwise completely virtual environments with real images and objects. One example of such a display is shown in Fig. 1c, which is to be compared with Fig. 1d. Another example of what might be considered Augmented Virtuality occurs in haptic display research, in which subjects viewing virtual objects can touch corresponding real objects, so that the real object takes the place of an advanced haptic display of the virtual object.

In the completely virtual environment shown in Fig. 1d, we have a modelled participant in a virtual space teleconference (Takemura & Kishino, 1992), manipulating a collection of modelled (virtual) objects. In Fig. 1c we see that, through the addition of an (unmodelled) background video scene, a significant degree of richness and realistic detail has been added, at minimal computational expense, in place of the otherwise plain backdrop of the completely modelled environment.

2. ARGOS (TM): Augmented Reality through Graphic Overlays on Stereovideo

Since 1987 the University of Toronto Ergonomics in Teleoperation and Control (ETC) Lab has been developing technology and researching issues related to one particular class of AR displays: monitor- or desktop-based systems, in which computer-generated graphic images are superimposed onto live or pre-recorded video images of a remote scene. The main feature that distinguishes our work from other graphics-enhanced video display research is that we use stereoscopic displays for both the video and the computer graphic portions of the display. This has culminated in the development of ARGOS (TM), a system for Augmented Reality through Graphic Overlays on Stereovideo (Drascic, Grodski, Milgram, Ruffo, Wong and Zhai, 1993; Milgram, Drascic & Grodski, 1992). Although much is known about stereoscopic video displays (e.g. Diner and Fender, 1993; Drascic, 1991), and although stereoscopic computer graphics is quickly becoming a mature technology (e.g. McAllister, 1993), several new capabilities and significant performance advantages for existing tasks can be achieved by combining these two display technologies (Milgram, Drascic & Grodski, 1991; Milgram, Zhai, Drascic & Grodski, 1993). In the following sections we review some of these advantages.

A block diagram of the ARGOS (TM) system is given in Fig. 2. Separate left and right camera images are combined into one video signal using the alternating-field method (Milgram, Drascic & Grodski, 1990). The heart of the computer-generated imaging system is the graphics workstation, which creates the stereographic (SG) images; these can in turn be manipulated interactively in 3D space by means of any of a variety of six-degree-of-freedom input devices (Zhai and Milgram, 1993). Stereoscopic images can be presented either as conventional 60 Hz NTSC images (in North America), using alternating left and right fields, or as 120 Hz non-interlaced flicker-free images (Lipton and Meyer, 1984). The Stereo Format Conversion System can convert stereoscopic video back and forth between the 60 and 120 Hz formats, allowing us to generate images for 120 Hz display on a graphics workstation, store them on a regular 60 Hz VCR, and then view them again during playback on a(nother) 120 Hz flicker-free system.
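The alternating-field principle and the 60/120 Hz conversion can be illustrated in software. The sketch below is only an illustration under stated assumptions, not the ARGOS stereoscopic encoder or the Stereo Format Conversion System, which are hardware: an image is modelled as a list of pixel rows, the left view is assumed to occupy the even rows (first field) and the right view the odd rows, and simple line doubling is assumed as the way a half-height field is expanded for full-frame 120 Hz display. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical image model: a list of rows of grey-level pixel values.
Image = List[List[int]]


def interleave_fields(left: Image, right: Image) -> Image:
    """Pack a stereo pair into one interlaced 60 Hz frame.

    Assumption: even rows (first field) carry the left view and odd rows
    (second field) carry the right view.
    """
    if len(left) != len(right):
        raise ValueError("left and right images must have the same height")
    return [list(left[r]) if r % 2 == 0 else list(right[r]) for r in range(len(left))]


def split_fields(frame: Image) -> Tuple[Image, Image]:
    """Separate an interlaced frame into its left (even) and right (odd) fields."""
    return [list(row) for row in frame[0::2]], [list(row) for row in frame[1::2]]


def line_double(field: Image) -> Image:
    """Repeat every row so that a half-height field fills a full-height frame."""
    doubled: Image = []
    for row in field:
        doubled.append(list(row))
        doubled.append(list(row))
    return doubled


def to_120hz(frame: Image) -> List[Image]:
    """Turn one interlaced stereo frame into two full frames (left, then right)
    to be shown one after the other at 120 Hz."""
    left_field, right_field = split_fields(frame)
    return [line_double(left_field), line_double(right_field)]


def to_60hz(left_frame: Image, right_frame: Image) -> Image:
    """Reverse conversion, e.g. for recording on a 60 Hz VCR."""
    return interleave_fields(left_frame, right_frame)
```

For a frame with an even number of rows, `to_60hz(*to_120hz(frame))` returns the original frame, mirroring the record-at-60-Hz, replay-at-120-Hz round trip described above. Line doubling is the simplest possible choice; a real converter might interpolate between rows instead.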

One important related class of AR systems comprises those which use head-mounted displays rather than a monitor for viewing the remote scene, with the remote cameras slaved to the head motions of the user (e.g. Tachi, 1993). Such systems are typically used for manually controlling remote robotic manipulators or vehicles. Other immersive forms of AR use see-through head-mounted displays, in which the observer views the surrounding real world either directly through half-silvered mirrors, as in optical see-through displays (e.g. Caudell & Mizell, 1992; Janin, Mizell & Caudell, 1993; Feiner, MacIntyre & Seligmann, 1993), or by means of carefully aligned video see-through imaging (e.g. Rolland, Holloway & Fuchs, 1994; Edwards, Rolland & Keller, 1993). All of these immersive approaches aim to make observers feel that they are naturally part of the world being viewed, which brings clear advantages for applications in which a sense of presence is required.

Although immersive displays such as head-mounted video can be very effective for certain tasks requiring the full attention of the operator, they interfere with the operator's ability to attend to more than one task at a time. An advantage of the monitor-based approach is that operators can selectively attend to, and meet the demands of, a variety of tasks simultaneously; monitor-based displays are therefore well suited to both manual and supervisory control tasks. In addition, when computer augmentation (AR) is added to immersive displays, new technical difficulties arise with respect to body tracking and graphic calibration and registration (Rolland et al, 1994). A detailed taxonomy of immersive and non-immersive AR and other related MR displays is given elsewhere (Milgram & Kishino, 1994; Milgram et al, 1994).

Fig. 2: Block diagram of the ARGOS system. The signals from the two cameras are combined with a stereoscopic encoder, and images from the graphics workstation are combined with the video. The AR display is presented on a monitor at either 60 Hz or 120 Hz, depending on the specific hardware being used.

3. Virtual Pointer / Tape Measure

The first application of the ARGOS technology was to enhance the user-machine interface of a telerobotic system, by providing AR tools for interactively measuring objects within the remote world and for controlling the telerobot. The need for such enhancements derives from the nature of the task itself. In our original application, the telerobot was being used for bomb disposal, and the operator needed to be able to navigate and manipulate in an unknown, unstructured remote environment in an accurate and timely manner. The use of stereoscopic video displays went a long way towards improving teleoperation performance (Drascic & Grodski, 1993; Drascic, 1991). However, although stereoscopic displays enable users to perceive relative depth and size directly using their binocular visual system, they do not solve the problem of determining absolute depth and size. Cues such as colour and brightness can be directly perceived with one eye, relative depth can be directly perceived with two eyes, but absolute depth must be inferred using cues in the visual scene and knowledge of the objects being viewed. Making such judgements is a learned skill, which must be maintained with consistent practice. So while determining if object A is farther away than object B is much easier when using a stereoscopic display rather than a monoscopic display, it remains difficult to determine accurately how large the separation between the objects might be. Such absolute depth and size judgements are necessary for tasks such as path planning, estimating clearances, and range-finding. When communicating goals and destinations to semi-autonomous vehicles, absolute position information is vital.

The problem of absolute position determination in a video image can be solved by using ARGOS to transform it from an absolute position estimation task into a relative position estimation task. We present the viewer with a Virtual Pointer, which appears to "float" in the remote scene (Milgram et al, 1990). By aligning the Virtual Pointer with objects in the remote scene, the operator is able to specify precise three-dimensional coordinates. Because it is a relatively straightforward task to calibrate and align the stereoscopic graphics with the stereoscopic video, it is possible to animate the Virtual Pointer in such a way as to appear as realistic as desired, within the limits of the graphics computer. While our current implementations use a variety of Silicon Graphics workstations and IBM PC-compatibles, our early work used Commodore Amigas to animate the Virtual Pointer. Our studies with the earlier systems showed that operators can use the Virtual Pointer with essentially the same accuracy as a real pointer, with standard errors in depth corresponding to less than one pixel on the display (Drascic & Milgram, 1991).
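The geometry that lets a stereographic pointer map to absolute coordinates can be sketched for the simplest case. The code below assumes an idealised, calibrated stereo rig with parallel optical axes, a known camera separation (baseline B) and a focal length f expressed in pixels; the ARGOS camera arrangement and calibration procedure are not described here, so the class and its parameters are hypothetical. Under these assumptions the horizontal disparity of a point at depth Z is d = f*B/Z, so one pixel of disparity corresponds to a depth change of roughly Z^2/(f*B). `project` gives the left and right screen positions at which the pointer must be drawn to appear at a given 3D point, `triangulate` recovers the point from those positions, and `depth_step` evaluates the one-pixel depth resolution.

```python
from dataclasses import dataclass
from typing import Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]  # (column, row) in image coordinates


@dataclass
class StereoRig:
    """Hypothetical calibrated stereo pair with parallel optical axes.

    World coordinates are centred midway between the cameras (left camera at
    x = -baseline/2, right camera at x = +baseline/2), with z along the view.
    """
    focal_px: float  # focal length, in pixels
    baseline: float  # camera separation, in world units
    cx: float        # principal point, column
    cy: float        # principal point, row

    def project(self, point: Point3) -> Tuple[Pixel, Pixel]:
        """Left and right image positions at which to draw the pointer."""
        x, y, z = point
        if z <= 0:
            raise ValueError("the point must lie in front of the cameras")
        half = self.baseline / 2.0
        row = self.cy + self.focal_px * y / z
        left = (self.cx + self.focal_px * (x + half) / z, row)
        right = (self.cx + self.focal_px * (x - half) / z, row)
        return left, right

    def triangulate(self, left: Pixel, right: Pixel) -> Point3:
        """Recover the 3D point from matching left and right image positions."""
        disparity = left[0] - right[0]  # equals focal_px * baseline / z
        if disparity <= 0:
            raise ValueError("disparity must be positive for a point in front of the rig")
        z = self.focal_px * self.baseline / disparity
        x = (left[0] + right[0] - 2.0 * self.cx) * z / (2.0 * self.focal_px)
        y = (left[1] - self.cy) * z / self.focal_px
        return (x, y, z)

    def depth_step(self, z: float) -> float:
        """Approximate depth change produced by one pixel of disparity at depth z."""
        return z * z / (self.focal_px * self.baseline)
```

For example, with `StereoRig(focal_px=800, baseline=0.1, cx=320, cy=240)`, a pointer at (0.2, -0.05, 2.0) is drawn at columns 420 and 380 in the left and right images, and one pixel of disparity at that depth corresponds to about 0.05 world units. The quadratic growth of `depth_step` with Z is why a sub-pixel depth error is small near the cameras but increases rapidly with distance.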

A simple extension of the Virtual Pointer is the Virtual Tape-measure; both are illustrated in Fig. 3. The Virtual Tape-measure can be used to measure sizes and distances in the remote world. Operators use it by first specifying its starting point with the Virtual Pointer, and then dragging it out toward the end point. The ARGOS system can then report the measurement in a variety of ways, in addition to transmitting it to the telerobot. For example, it can display text floating in space at the depth of the end point, giving the absolute {x,y,z} locations of the start and end points and the total distance between them. Another option is to use speech synthesis to report the distance audibly.
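The measurement itself reduces to the Euclidean distance between two pointer positions. A minimal sketch, assuming the pointer already delivers calibrated {x,y,z} coordinates (however those are obtained); the class and method names are hypothetical, and drawing the label in the stereo image, speaking it, or sending it to the telerobot is left to the caller.

```python
import math
from typing import Optional, Tuple

Point3 = Tuple[float, float, float]


class VirtualTapeMeasure:
    """Hypothetical tape-measure anchored and dragged out with a 3D pointer."""

    def __init__(self) -> None:
        self.start: Optional[Point3] = None
        self.end: Optional[Point3] = None

    def anchor(self, pointer_position: Point3) -> None:
        """Fix the start point at the pointer's current 3D position."""
        self.start = pointer_position
        self.end = pointer_position

    def drag_to(self, pointer_position: Point3) -> float:
        """Move the free end with the pointer and return the current length."""
        if self.start is None:
            raise RuntimeError("anchor the tape-measure before dragging it")
        self.end = pointer_position
        return self.length()

    def length(self) -> float:
        """Euclidean distance between the start and end points (0 if unanchored)."""
        if self.start is None or self.end is None:
            return 0.0
        return math.dist(self.start, self.end)

    def label(self, unit: str = "m") -> str:
        """Text that could be drawn floating at the depth of the end point."""
        if self.start is None or self.end is None:
            return "tape-measure not anchored"
        sx, sy, sz = self.start
        ex, ey, ez = self.end
        return (f"start ({sx:.3f}, {sy:.3f}, {sz:.3f}) {unit}  "
                f"end ({ex:.3f}, {ey:.3f}, {ez:.3f}) {unit}  "
                f"distance {self.length():.3f} {unit}")
```

Usage: after `tape.anchor((0.0, 0.0, 1.0))`, calling `tape.drag_to((3.0, 4.0, 1.0))` returns 5.0, and `tape.label()` ends with `distance 5.000 m`.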

Because the user is free to position the Virtual Pointer at any location in space, it can also be used to specify a target or mark a path for an intelligent mobile telerobot to follow. (See Section 5.)

4. Virtual Tether

An interesting extension of the Virtual Tape-measure concept is illustrated in Fig. 4, which shows a small table-top manipulator and a superimposed line joining the end effector to a target object. If the prescribed task is to move the manipulator to the end of the highlighted pathway, the line shown in the figure can be considered a "Virtual Tether", provided that it appears to remain permanently attached to both the manipulator and the target as the manipulator moves about. The virtual tether is thus designed to provide an enhanced display of the position and orientation of the real manipulator relative to the real target in the video scene. We have implemented the virtual tether concept and investigated its effectiveness in a laboratory peg-in-hole task, as shown in Fig. 4. Those experiments examined the ability of subjects to integrate this particular calibrated stereographic augmentation with the surrounding stereovideo display. One experiment compared performance with real and virtual tether enhancements against a baseline condition with no tether, and found that the mean number of errors per trial decreased dramatically with the virtual tether compared with the no-tether condition (Ruffo & Milgram, 1992).
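A Virtual Tether is essentially a tape-measure whose endpoints are re-anchored on every frame. The sketch below assumes that the end-effector position is known in the same calibrated coordinates as the target (for instance from the robot's joint sensors and a kinematic model), which the text does not specify; the function names are hypothetical. Drawing each yielded segment in both eyes' images is what makes the tether appear attached to the moving manipulator.

```python
import math
from typing import Iterable, Iterator, Tuple

Point3 = Tuple[float, float, float]


def tether(effector: Point3, target: Point3) -> Tuple[float, Point3]:
    """Length of the tether and the unit vector pointing from the end effector
    towards the target (a zero vector once the two coincide)."""
    dx, dy, dz = (t - e for e, t in zip(effector, target))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        return 0.0, (0.0, 0.0, 0.0)
    return length, (dx / length, dy / length, dz / length)


def tether_frames(effector_positions: Iterable[Point3],
                  target: Point3) -> Iterator[Tuple[Point3, Point3, float]]:
    """Re-anchor the tether on every video frame: yield the two endpoints to be
    drawn stereographically, plus the current tether length."""
    for effector in effector_positions:
        length, _ = tether(effector, target)
        yield effector, target, length
```

As the manipulator approaches the target, the yielded lengths shrink towards zero, which is the cue the operator reads from the display.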

Figure 4. The Virtual Tether concept, for a peg-in-hole experiment. The Virtual Tether is shown joining the gripper to the cylindrical target tube.

5. Virtual Control

One of the most important application areas of Augmented Reality technology is the field of teleoperation. In such situations the human operator must control a robot from a remote location, primarily using visual feedback from the robot work environment. Telerobots are used when physical human presence is too dangerous, impractical, or uneconomical. In typical teleoperation systems, the operator controls the robot using one or more monoscopic video displays, although stereoscopic systems are becoming more common. In terms of the RV continuum (see Fig. 1), this case can be considered a "completely real environment". Some of the problems associated with operating the telerobot in such a way are: (1) even stereoscopic visual displays may not provide adequate depth perception for fine manipulation of the robot; (2) there generally is no protection system to prevent the robot from colliding with other objects in its vicinity; (3) any time delay between the operator's commands and the responses of the robot can degrade operator performance considerably; and (4) the operator must remain in the manual control loop continuously, which can be fatiguing.

Some of these limitations can be overcome by providing the operator with a graphical simulation of the robot operating in a modelled workspace. (In terms of the RV continuum this would be considered a "completely virtual environment".) With such a tool the operator can plan the robot's tasks by issuing high level "put-that-there" types of commands (Cannon & Leifer, 1994), while lower level task execution can be governed by local sensing and automatic control. For improved scene interpretation the operator could be provided with a variety of artificially generated cues about the relationships between objects in the workspace and the robot. Such a control scheme, however, requires complete and up-to-date knowledge about the robot and its workspace, which can be expensive to acquire and maintain. Such models are typically created only for repeatedly used sites, and are impractical for one-time sites and operations.

To overcome some of the difficulties of completely real and completely virtual environments, we have been developing a new application of Augmented Reality for telerobotics, which we call "Virtual Control". Using a stereoscopic video display, the operator views the robot workspace and receives continuously updated task information from the remote site. Because the robot itself remains invariant, it can be modelled beforehand; no model of the remote world is necessary. Initially, a 3D wireframe image of the robot is rendered stereoscopically and superimposed on the stereovideo display, conforming exactly in size and location to the video image of the real robot, thereby creating a highlighted outline around the real robot. An example of such a virtual robot (Rastogi, Milgram, Grodski & Drascic, 1993) is shown (monoscopically) in Fig. 5.
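The wireframe overlay combines a kinematic model of the robot with the stereo projection of the calibrated cameras. As a minimal sketch, the code below uses a hypothetical planar two-link arm whose plane of motion is parallel to the image plane and whose base position in camera coordinates is known, together with an idealised parallel-axis stereo projection. The robot model, link lengths and camera parameters are all assumptions for illustration, not the ARGOS implementation. Given joint angles, it returns the polylines to draw over the left and right video images.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def arm_joints(base: Point3, l1: float, l2: float, q1: float, q2: float) -> List[Point3]:
    """Forward kinematics of a hypothetical planar two-link arm.

    The arm moves in a plane of constant depth (z) in camera coordinates;
    q1 and q2 are the shoulder and elbow angles in radians.
    """
    bx, by, bz = base
    elbow = (bx + l1 * math.cos(q1), by + l1 * math.sin(q1), bz)
    tip = (elbow[0] + l2 * math.cos(q1 + q2), elbow[1] + l2 * math.sin(q1 + q2), bz)
    return [base, elbow, tip]


def project_stereo(p: Point3, f: float, baseline: float,
                   cx: float, cy: float) -> Tuple[Pixel, Pixel]:
    """Parallel-axis stereo projection; cameras at x = -baseline/2 and +baseline/2."""
    x, y, z = p
    half = baseline / 2.0
    row = cy + f * y / z
    return (cx + f * (x + half) / z, row), (cx + f * (x - half) / z, row)


def wireframe_overlay(joints: Sequence[Point3], f: float, baseline: float,
                      cx: float, cy: float) -> Tuple[List[Pixel], List[Pixel]]:
    """Polylines (one per eye) tracing the modelled links over the video image."""
    left: List[Pixel] = []
    right: List[Pixel] = []
    for p in joints:
        l_px, r_px = project_stereo(p, f, baseline, cx, cy)
        left.append(l_px)
        right.append(r_px)
    return left, right
```

Feeding `arm_joints` with the real robot's joint readings each frame keeps the outline registered with the video image of the robot; driving it with operator-commanded angles instead yields a virtual robot that moves independently of the real one.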


Figure 5: Operator's view of the remote workspace with a wireframe overlay of robot.


Figure 6: Operator's view of the remote workspace with a fully rendered overlay of the robot.

The virtual robot can also be rendered fully, as shown in Fig. 6. The operator controls the virtual robot, which appears to interact with objects in the real environment. The operator can define a task off-line by directing the virtual robot to different locations in the workspace; if desired, paths and trajectories can also be displayed. The planned task can then be executed by the real robot whenever the operator chooses.
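The off-line mode amounts to recording a sequence of poses for the virtual robot, previewing the path between them, and only then releasing it to the real robot. A minimal sketch, assuming poses are stored as joint-angle vectors and previewed by straight-line interpolation in joint space (a simple choice for illustration, not necessarily the one used in ARGOS); the class is hypothetical and `send_to_robot` is a hypothetical callback standing in for the real robot interface.

```python
from typing import Callable, List, Sequence

JointVector = List[float]


class OfflineTask:
    """Hypothetical record of a task defined with the virtual robot."""

    def __init__(self) -> None:
        self.waypoints: List[JointVector] = []

    def add_waypoint(self, joints: Sequence[float]) -> None:
        """Store the virtual robot's current joint angles as the next waypoint."""
        if self.waypoints and len(joints) != len(self.waypoints[0]):
            raise ValueError("all waypoints must have the same number of joints")
        self.waypoints.append(list(joints))

    def preview(self, steps_per_segment: int = 10) -> List[JointVector]:
        """Joint-space linear interpolation, e.g. for drawing the planned path."""
        if steps_per_segment < 1:
            raise ValueError("steps_per_segment must be at least 1")
        path: List[JointVector] = []
        for a, b in zip(self.waypoints, self.waypoints[1:]):
            for k in range(steps_per_segment):
                s = k / steps_per_segment
                path.append([qa + s * (qb - qa) for qa, qb in zip(a, b)])
        if self.waypoints:
            path.append(list(self.waypoints[-1]))
        return path

    def execute(self, send_to_robot: Callable[[JointVector], None]) -> None:
        """Hand the approved waypoints to the real robot, in order."""
        for joints in self.waypoints:
            send_to_robot(list(joints))
```

Keeping `preview` separate from `execute` reflects the workflow described above: the operator inspects the displayed path on the virtual robot before anything is sent to the real one.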

Another feature of our system allows the operator to use the virtual pointer to define virtual planes, which serve to specify constraints in the workspace to the robot control system. These boundaries can be displayed as opaque or transparent planes, if desired. For example, in Fig. 6, the four corners of the table on which the robot is operating can be selected interactively, creating a computer model of the surface of the table. This constraining boundary can be used to prevent collisions of the robot with the table. A related capability is for real objects to be interactively encapsulated within graphical wireframe boxes. These virtual encapsulators serve to prevent potential collisions of the real robot with the encapsulated real objects. With these Augmented Reality tools, the operator can create a partial world model of the robot workspace. This approach yields many of the benefits of full graphical simulation, without the difficulties of acquiring a complete virtual world database.
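The constraint geometry described above can be sketched in a few functions. Assumptions: the selected corners and object points are already available as calibrated 3D coordinates, the robot is represented by a handful of sample points (for instance joint positions from its kinematic model), and encapsulators are axis-aligned boxes. Newell's method is used here to fit the table plane because four hand-picked corners are rarely exactly coplanar; it is a technique chosen for this illustration, not a documented ARGOS method. All names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]
Plane = Tuple[Point3, float]  # (unit normal n, offset d) with n . p = d on the plane
Box = Tuple[Point3, Point3]   # (min corner, max corner), axis-aligned


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plane_from_corners(corners: Sequence[Point3], free_point: Point3) -> Plane:
    """Plane through pointer-selected corners, via Newell's method. The normal is
    flipped if needed so that free_point (known free space) is on its positive side."""
    n = len(corners)
    if n < 3:
        raise ValueError("need at least three corners")
    nx = ny = nz = 0.0
    for i in range(n):
        x1, y1, z1 = corners[i]
        x2, y2, z2 = corners[(i + 1) % n]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm == 0.0:
        raise ValueError("corners are collinear or degenerate")
    normal = (nx / norm, ny / norm, nz / norm)
    centroid = tuple(sum(p[k] for p in corners) / n for k in range(3))
    offset = _dot(normal, centroid)
    if _dot(normal, free_point) - offset < 0:
        normal, offset = (-normal[0], -normal[1], -normal[2]), -offset
    return normal, offset


def encapsulate(points: Sequence[Point3], margin: float = 0.0) -> Box:
    """Axis-aligned box around pointer-selected points on a real object."""
    lo = tuple(min(p[k] for p in points) - margin for k in range(3))
    hi = tuple(max(p[k] for p in points) + margin for k in range(3))
    return lo, hi


def violates(points: Sequence[Point3], plane: Plane, boxes: Sequence[Box],
             clearance: float = 0.0) -> bool:
    """True if any robot point is closer than `clearance` to the plane (or beyond
    it), or lies inside any encapsulating box."""
    normal, offset = plane
    for p in points:
        if _dot(normal, p) - offset < clearance:
            return True
        for lo, hi in boxes:
            if all(lo[k] <= p[k] <= hi[k] for k in range(3)):
                return True
    return False
```

A planner could call `violates` on every interpolated pose of the virtual robot and reject any motion that would bring it too close to the table or into an encapsulated object before the task is sent to the real robot.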

6. Current Projects / Further Research

Several new applications of ARGOS (TM) have been envisaged. A number of these are based on extensions of the basic principles of the virtual pointer and tape-measure. In particular, some of our preliminary investigations include interactively measuring and modelling images ranging from outdoor views for (landscape) architecture to electron micrographs of the inside of cells. Another class of applications exploits some of the potential advantages of overlaying wireframe outlines onto real object images in a video scene (Milgram et al, 1991). This will allow us to integrate task-related information into the stereoscopic video view of the remote site. For example, object images can be enhanced under conditions of poor visibility, provided that adequate information about the object and its location and orientation is available. One particularly promising application domain for this technology is space telerobotics (e.g. Maclean et al, 1990).

A final class of applications involves using overlaid stereographic objects to visualise how these modelled objects might appear if they were actually added to the scene. This is anticipated to be useful in architecture or interior design, for example, for visualising how changes or additions to existing rooms, buildings, neighbourhoods, or landscapes might appear. The same concept is illustrated in Fig. 7, which presents the potential of Augmented Reality as a tool for choreography. In this example, we assume that the choreographer has a means of interactively controlling the positions and motions of individual dancer mannequins, but does not necessarily have a sufficiently detailed (world) model of a particular stage. The AR system can then help the choreographer visualise how some dance combinations might appear, not within a crudely modelled virtual computer environment, but superimposed on a high quality stereoscopic video image of the actual stage.

Figure 7: Illustration of ARGOS as a tool for on-line virtual choreography.

7. References

Note: Many of the papers published by the authors are available from the ftp site vered.rose.utoronto.ca, or via the World Wide Web (Mosaic) at http://vered.rose.utoronto.ca.

[1] DJ Cannon and LJ Leifer. "Point-and-direct robotics". Proc. International Conference on Intelligent Teleoperation, Greensboro, NC, 95-106, 1991.

[2] TP Caudell and DW Mizell. "Augmented reality: An application of heads-up display technology to manual manufacturing processes". Proc. IEEE Hawaii International Conf. on Systems Sciences, 1992.

[3] DB Diner and DH Fender. Human Engineering in Stereoscopic Viewing Devices. Plenum Publishing, 1993.

[4] D Drascic. "Skill acquisition and task performance in teleoperation using monoscopic and stereoscopic video Remote Viewing", Proc. Human Factors Society 35th Annual Mtg, San Francisco, 1367-71, 1991.

[5] D Drascic and P Milgram. "Positioning accuracy of a virtual stereographic pointer in a real stereoscopic video world", SPIE Vol 1457 - Stereoscopic Displays and Applications II, San Jose, Calif., Feb. 1991.

[6] D Drascic and JJ Grodski. "Defence teleoperation and stereoscopic video", SPIE Volume 1915 - Stereoscopic Displays and Applications IV, San Jose California, 58-69, Feb. 1993.

[7] D Drascic, JJ Grodski, P Milgram, K Ruffo, P Wong and S Zhai. "ARGOS: A display system for augmenting reality", ACM SIGGRAPH Tech Video Review, Vol 88: InterCHI '93 Conf on Human Factors in Computing Systems, (Abstract in Proceedings of InterCHI '93, p 521), Amsterdam, April 1993.

[8] EK Edwards, JP Rolland and KP Keller. "Video see-through design for merging of real and virtual environments". Proc. IEEE Virtual Reality International Symp. (VRAIS'93), Seattle, WA, 223-233, 1993.

[9] S Feiner, B MacIntyre and D Seligmann. "Knowledge-based augmented reality". Communications of the ACM, 36(7), 52-62, 1993.

[10] Imagina 95, Programme notes, Monte Carlo, Feb. 1-3, 1995.

[11] AL Janin, DW Mizell and TP Caudell. "Calibration of head-mounted displays for augmented reality". Proc. IEEE Virtual Reality International Symposium (VRAIS'93), Seattle, WA, 246-255, 1993.

[12] L Lipton and L Meyer. "A flicker-free field-sequential stereoscopic video system". J. Society of Motion Picture & TV Engineers (SMPTE), 1047-1051, Nov. 1984.

[13] SG Maclean, M Rioux, F Blais, JJ Grodski, P Milgram, HFL Pinkney and BA Aikenhead. "Vision system development in a space simulation laboratory". Proc. ISPRS: Close Range Photogrammetry & Machine Vision, 1990.

[14] DF McAllister (ed). Stereo Computer Graphics and Other True 3D Technologies. Princeton University Press, Princeton, NJ, 1993.

[15] P Milgram, D Drascic and JJ Grodski. "A virtual stereographic pointer for a real three dimensional video world", in Human-Computer Interaction -- INTERACT'90, D Diaper, D Gilmore, G Cockton & B Shackel (eds), Elsevier, 695-700, 1990.

[16] P Milgram, D Drascic & JJ Grodski. "Enhancement of 3-D video displays by means of superimposed stereographics", Proc. Human Factors Soc. 35th Annual Meeting, San Francisco, 1457-1461, 1991.

[17] P Milgram, D Drascic and JJ Grodski. "Stereoscopic video-graphic coordinate specification system". US Patent No. 5,175,616; Dec. 29, 1992.

[18] P Milgram and F Kishino. "A taxonomy of mixed reality visual displays", IEICE (Institute of Electronics, Information and Communication Engineers) Transactions on Information and Systems, Special issue on Networked Reality, Dec. 1994.

[19] P Milgram, H Takemura, A Utsumi and F Kishino. "Augmented Reality: A class of displays on the reality-virtuality continuum". SPIE Vol. 2351-34, Telemanipulator and Telepresence Technologies, 1994.

[20] P Milgram, S Zhai, D Drascic & JJ Grodski. "Applications of augmented reality for human-robot communication", Proc. IROS'93: Int'l Conf. on Intelligent Robots and Systems, Yokohama, 1467-72, 1993.

[21] A Rastogi, P Milgram, JJ Grodski and D Drascic. "Virtual telerobotic control". Proc. DND Knowledge-based Systems and Robotics Workshop, Ottawa, Ontario, Canada, 1993.

[22] JP Rolland, RL Holloway and H Fuchs. "Comparison of optical and video see-through head-mounted displays". Proc. SPIE Vol. 2351-35, Telemanipulator and Telepresence Technologies, 1994.

[23] K Ruffo and P Milgram. "Effects of stereographic + stereovideo "tether" enhancement for a peg in hole task", Proc. 1992 IEEE Int'l Conf. on Systems Man and Cybernetics, 1992.

[24] S Tachi. "Virtual reality and tele-existence - Harmonious integration of synthesized worlds and the real world", Proc. Industrial Virtual Reality Conf. (IVR'93), Makuhari Messe, Japan, June 23-25, 1993.

[25] H Takemura and F Kishino. "Cooperative work environment using virtual workspace". Proc. Computer Supported Cooperative Work (CSCW'92), 226-232, 1992.

[26] S Zhai and P Milgram. "Human performance evaluation of manipulation schemes in virtual environments", VRAIS'93: 1st IEEE Virtual Reality Annual International Symposium, Seattle, Sept 1993.

8. Acknowledgements

This work has been supported by the Defence and Civil Institute of Environmental Medicine (DCIEM), Downsview, Ontario, Canada (contract W7711-7-7009/01-SE), the Manufacturing Research Corporation of Ontario (MRCO), and the Natural Sciences and Engineering Research Council of Canada. We also gratefully acknowledge Mr. Fumio Kishino of ATR Communication Systems Research Laboratories, Kyoto, Japan, for permission to use Figs. 1c and 1d. Special thanks are extended as well to Mr. Peter Lind, for his tireless assistance in preparing the images presented here.