
Stereoscopic Video and Augmented Reality

David Drascic

Human Engineering Research and Consulting
Toronto, Ontario, Canada

spike@vered.rose.utoronto.ca


This article appeared in a slightly different form as:

"Stereoscopic Vision and Augmented Reality", Scientific Computing and Automation, 9(7), 31-34, June 1993.

I have repaired some of the (minor) editing, and updated some terminology, but it is essentially identical to the published version.


Introduction

Future military operations will take place in increasingly hostile environments. As technology advances, many non-military operations are extending into hazardous environments as well, such as the ocean bed, the interior of volcanoes, and outer space. Efficient deployment of teleoperated and telemanaged robots will be essential for successful interaction with these environments. Autonomous robotics, where the robot is capable of acting without human intervention, is far from being achievable in unstructured environments such as battlefields, bomb disposal scenarios, weapons handling, and hazardous materials management. For the foreseeable future, remotely controlled systems will depend on human intelligence and perception.

The effectiveness of human-machine systems is often determined by the quality of the human-machine interface. Unfortunately, most existing telerobots are equipped with standard monoscopic video (MV) displays as the main source of information to the operator. MV displays eliminate all binocular depth cues (i.e. eye convergence and disparity), as well as several monocular depth cues (e.g. texture gradient). The loss of these important depth cues results in situations where the location of objects in the remote scene is ambiguous. While motion parallax or multiple views can sometimes resolve these ambiguities, operating conditions may render these options infeasible.

A related problem is the difficulty of estimating absolute sizes with an MV system. It is difficult to determine whether an obstacle is too steep to climb, or whether a depression is deep enough to present a hazard. One British study reported that using standard MV systems made bomb squad personnel reluctant to use their remote manipulator (Robinson, M., "Remote control vehicle guidance using stereoscopic displays", Proc. Human Factors Society Meeting, 1984).

Human Engineering Research and Consulting (HERC) recently investigated the benefits of using 3-D, or stereoscopic video (SV) for teleoperation applications in the Canadian Armed Forces. SV provides an immediate and compelling sense of depth, which can greatly simplify teleoperation tasks requiring delicate manipulation.


Stereoscopic Video Application Research

Stereoscopic video systems use two cameras to pick up images from two slightly different perspectives, one for each eye of the operator. The display system must channel these two different images to the appropriate eyes. The most practical system, employing standard television equipment, uses an alternating field approach. The images from the left and right cameras are displayed alternately on the monitor. Special glasses are equipped with liquid crystal shutters that switch between opaque and clear. These shutters are electronically synchronised with the monitor, so that the left eye sees only the image from the left camera, and the right eye sees only the image from the right camera.
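The alternating field idea can be sketched in a few lines of Python. This is only an illustration, not the circuitry of the system described here: frames are modelled as lists of scan lines, and the assignment of even lines to the left camera and odd lines to the right camera is an assumption (the opposite assignment works equally well). The function names `interleave_fields` and `open_shutter` are hypothetical.

```python
def interleave_fields(left_frame, right_frame):
    """Build one interlaced frame from two camera frames.

    Assumption: even scan lines (one field) come from the left camera and
    odd scan lines (the other field) come from the right camera.
    """
    if len(left_frame) != len(right_frame):
        raise ValueError("frames must have the same number of scan lines")
    return [
        left_frame[row] if row % 2 == 0 else right_frame[row]
        for row in range(len(left_frame))
    ]


def open_shutter(field_index):
    """Return which eye's shutter is clear while the given field is shown."""
    return "left" if field_index % 2 == 0 else "right"


# Tiny 4-line frames: every pixel is tagged with the camera it came from.
left = [["L"] * 4 for _ in range(4)]
right = [["R"] * 4 for _ in range(4)]

frame = interleave_fields(left, right)
print([row[0] for row in frame])            # source camera of each scan line
print([open_shutter(i) for i in range(4)])  # shutter sequence for four fields
```

Because the combined frame is an ordinary interlaced picture, it can pass through standard video equipment unchanged; only the glasses need to know which field belongs to which eye.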


Figure 1: David Drascic conducting research for the Canadian Forces, wearing StereoGraphics' eyewear

Since 1987, Prof. Paul Milgram of the Department of Industrial Engineering at the University of Toronto and David Drascic, under contract for the Defence and Civil Institute of Environmental Medicine (DCIEM), have conducted a number of experiments at the University of Toronto to investigate the benefits of SV for novice operators attempting typical defence-oriented telerobotic tasks. In one experiment, subjects performed a positioning task related to bomb disposal teleoperation that required careful alignment of the telerobot in depth. The difficulty of the task was varied by changing the precision requirements. The results indicate that operators need considerably less training to become proficient at this type of telerobotic task, and can perform faster and with fewer errors when using an SV display.

At the lowest level of difficulty, it was found that the benefit of SV faded as subjects repeated a single task again and again. However, whenever the task changed, the advantages of SV were once again immediately apparent. At the highest levels of difficulty, the performance advantages of SV were found even after subjects had performed the same task many times. Since defence-related teleoperation tasks, such as bomb disposal and hazardous materials management, are all characterised by an unpredictable and changing environment, operators will not have the luxury of repeating a task several times. Thus even for very simple tasks, it is reasonable to expect the benefits of SV to be significant and important. For difficult tasks, it can mean the difference between success and failure.

More recently, HERC, in conjunction with DCIEM, investigated the benefits of SV for experienced telerobot operators in the Canadian Armed Forces. Using several tasks related to bomb-disposal teleoperation, these experiments showed that even expert operators perform better when using SV. More importantly, the operators strongly preferred SV to MV, judging it highly desirable for a variety of tasks, and rating it more usable and more comfortable to use than a comparable MV display.


Stereoscopic Video Systems

All the research described above was performed using an NTSC-based SV system, originally developed by Milgram and Drascic, and later updated at DCIEM. This system uses standard cameras, monitors and video equipment. The SV signal is a standard video signal that can be recorded with any VCR. This system can be implemented for under US$4,000 without cameras. NTSC monitors have an image refresh rate of 60 Hz. Using the alternating field SV technique, each eye sees only half of these images, and thus has a 30 Hz image update rate. As a result there is a perceptible flicker in the image that many operators find distracting at first. Nonetheless, operators of all skill levels adapted very quickly to this SV system, most strongly preferring it to the MV system. No eye-strain attributable to the SV system was reported even after several hours of use; in fact, most operators rated the SV display more comfortable and more usable than the original MV display.

Until recently, the high cost and technical complexity of flickerless SV systems have limited their use, but the recent introduction of 120 Hz SV systems has made it possible to consider these systems for a wide range of new applications. Several different systems are available, ranging in price up to US$15,000. DCIEM has obtained one of these systems and is considering it as an alternative to the low-end NTSC SV system. It is expected that the flicker-free display will be more easily accepted by the operators and should result in greater user satisfaction with the display. Initial results are encouraging, but cross-talk (seeing the right image with the left eye, and vice versa) due to phosphor persistence in the 120 Hz monitor is distracting. It remains to be seen whether the lack of flicker will outweigh the greater cross-talk and considerably greater expense.
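The arithmetic behind the flicker comparison is simple enough to check directly. The short Python sketch below assumes only what is stated above, namely that the two eyes share the display's images equally; the function name `per_eye_rate_hz` is hypothetical.

```python
def per_eye_rate_hz(display_rate_hz):
    """Each eye sees every second image, so it gets half the display rate."""
    return display_rate_hz / 2


for display_rate in (60, 120):
    print(display_rate, "Hz display:", per_eye_rate_hz(display_rate), "Hz per eye")
```

With a 120 Hz system each eye is updated at 60 Hz, the same rate at which an ordinary NTSC monitor refreshes, which is why such systems are described as flickerless.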


Augmenting Reality with ARGOS

Improving the display of a telerobot is only one aspect of the human-machine interface. Another very important aspect is the method used to communicate human goals and instructions to the telerobot. Most telerobots in use today are almost entirely manual, requiring the constant attention of the operator. Great strides have been made in giving telerobots a certain degree of intelligence at executing low-level tasks. Robots have been created that are capable of driving from one location to another while avoiding obstacles, or reconfiguring a multi-joint manipulator to move the end-effector to a new location. In order to use one of these systems in an interactive telerobotic situation, the operator needs to be able to communicate precise three-dimensional co-ordinates to the telerobot. Such co-ordinates may be known or defined in well-specified environments, such as a laboratory, but until recently there was no practical technique available for specifying such co-ordinates in the field.

Since 1989, Drascic and Milgram have been breaking new ground by combining computer-generated stereoscopic graphics with live stereoscopic video (SV), a technology they dub ARGOS, which stands for "Augmented Reality through Graphic Overlays on Stereo-video". Using ARGOS it is possible to create virtual objects that appear to exist in the video image. By generating a carefully calibrated virtual pointer of some sort, and allowing the operator to adjust the position of this pointer in the three-dimensional video space, it is possible for the operator to indicate a precise destination for the telerobot, or to indicate a path for it to follow. Positioning a virtual pointer is a much simpler task than driving a telerobot. Using such an interface would reduce operator workload considerably.
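To make the idea of a calibrated virtual pointer concrete, the following Python sketch computes where a pointer at a given 3-D position would have to be drawn in the left and right images. It is not the ARGOS calibration itself, which is not described in this article. It assumes an idealised rig of two identical, parallel pinhole cameras, and the class `StereoRig`, the function `project_pointer`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StereoRig:
    """Idealised parallel stereo rig (all values are assumptions)."""
    baseline_m: float  # distance between the two camera centres, in metres
    focal_px: float    # focal length expressed in pixels
    cx: float          # image centre, x, in pixels
    cy: float          # image centre, y, in pixels


def project_pointer(rig, point):
    """Return ((x_left, y), (x_right, y)) pixel positions for a 3-D point.

    The point is given in metres, relative to the midpoint between the
    cameras, with z pointing away from the rig and y pointing down the image.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must be in front of the cameras")
    half = rig.baseline_m / 2
    row = rig.cy + rig.focal_px * y / z
    left = (rig.cx + rig.focal_px * (x + half) / z, row)
    right = (rig.cx + rig.focal_px * (x - half) / z, row)
    return left, right


rig = StereoRig(baseline_m=0.1, focal_px=800.0, cx=320.0, cy=240.0)
left_px, right_px = project_pointer(rig, (0.0, 0.0, 2.0))
print("draw in left image at ", left_px)
print("draw in right image at", right_px)
```

Moving the pointer in depth changes only the horizontal offset between the two drawn positions, which is what lets the operator place it at a chosen distance in the video space.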


Figure 2: Creating a graphic point which can be calibrated to determine the location of an object in three dimensions. An example of Augmented Reality. (The image shows both the left and right eye views super-imposed. It didn't scan well, but there is a small white line connecting the tip of the arrow with a joint of the mechanism on the left. This is the virtual tape-measure.)

An experiment was conducted to determine how accurately subjects could align a virtual pointer with real world targets. This experiment showed that the calibration of the graphics with the video was successful, and that subjects could align the virtual pointer essentially as well as they could a real pointer in the video space, at the limits of their depth perception as determined by the display system, i.e. one pixel.
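A rough feel for what a one-pixel limit means in depth can be had from the sketch below. It assumes an idealised pair of parallel pinhole cameras, for which depth equals focal length times baseline divided by disparity; the camera geometry actually used in the experiment is not given here, and the function names and numbers are hypothetical.

```python
def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Depth in metres for a given left/right pixel offset (parallel cameras)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_step_at(depth_m, baseline_m, focal_px):
    """Depth change caused by moving the pointer one pixel of disparity farther."""
    disparity_px = focal_px * baseline_m / depth_m
    if disparity_px <= 1:
        raise ValueError("depth is beyond the range resolvable with one pixel")
    return depth_from_disparity(disparity_px - 1, baseline_m, focal_px) - depth_m


# Assumed rig: 0.1 m camera separation, 800-pixel focal length.
for depth in (1.0, 2.0, 4.0):
    step = depth_step_at(depth, baseline_m=0.1, focal_px=800.0)
    print(f"at {depth} m, one pixel of disparity is about {step:.3f} m of depth")
```

Under these assumptions the depth step grows roughly with the square of the distance, so a one-pixel alignment error matters far more for distant targets than for near ones.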

ARGOS is the foundation of the University of Toronto's Augmented Reality system. Much media attention has been devoted to the phenomenon of Virtual Reality, which generally entails immersing people in completely artificial computer-generated worlds, using as many different senses as possible to complete the illusion. By contrast, Augmented Reality does not attempt to create a virtual world; instead, its goal is to allow the user to perceive the real world more clearly and with greater understanding than is possible using ordinary vision.

Several different kinds of Augmented Reality systems exist. ARGOS is one of the simplest and most robust, because it uses a standard monitor as the stereoscopic display device. Other augmented reality systems use immersive head-mounted displays, but there are many perceptual and calibration issues that remain to be resolved before these systems can be used by industry.

Since the virtual pointer can be used to specify single points in the remote space, it is a simple extension to create a virtual tape-measure, so that the operator can make measurements of the locations and sizes of remote objects.
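A virtual tape-measure of this kind reduces to two steps: recover the 3-D position of each pointer placement, then take the straight-line distance between them. The Python sketch below illustrates this under the assumption of an idealised pair of parallel pinhole cameras; it is not the ARGOS implementation, and the functions `triangulate` and `tape_measure` and all numeric values are hypothetical.

```python
import math


def triangulate(left_px, right_px, baseline_m, focal_px, cx, cy):
    """Recover (x, y, z) in metres from the pointer's left and right pixel positions."""
    x_left, y_left = left_px
    x_right, _ = right_px
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("left/right positions do not describe a point in front of the rig")
    z = focal_px * baseline_m / disparity
    x = ((x_left + x_right) / 2 - cx) * z / focal_px
    y = (y_left - cy) * z / focal_px
    return (x, y, z)


def tape_measure(point_a, point_b):
    """Straight-line distance in metres between two 3-D points."""
    return math.dist(point_a, point_b)


# Assumed rig: 0.1 m baseline, 800-pixel focal length, image centre (320, 240).
rig = dict(baseline_m=0.1, focal_px=800.0, cx=320.0, cy=240.0)

# Two pointer placements, each given as (left image position, right image position).
end_a = triangulate((340.0, 240.0), (300.0, 240.0), **rig)
end_b = triangulate((540.0, 240.0), (500.0, 240.0), **rig)
print("measured length in metres:", tape_measure(end_a, end_b))
```

The accuracy of such a measurement is bounded by how precisely each end point can be placed, so it inherits the one-pixel alignment limit of the pointer itself.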

As a further example of Augmented Reality, consider a space-going telerobot. All video images in space suffer the same problem with shadows: because there is no air in space to scatter light, shadows are completely black, and anything in shadow is completely invisible. However, since the dimensions of everything sent into space are very well known, it is possible to use ARGOS to generate the missing images, carefully drawn to appear at the correct location in the video image.

In other situations, objects that are invisible to normal vision may be detectable with other sensors. In many underwater situations, normal vision is good only for a very limited distance. While it is easier to see through murky depths with SV than with MV, operators are still very limited. However, using radar, sonar, and infra-red cameras, it is possible to sense objects that would otherwise be invisible. If the information from these sensors is sent to the ARGOS computer, appropriately shaped graphic objects can be drawn at the correct position in space, in effect making visible what is normally invisible.

Similarly, information from various medical imaging sensors, such as CAT, PET, and NMR scanners, can be used to generate graphic images of the interior of the human body. These images can be super-imposed onto a live video image of the body using ARGOS, and seen in three dimensions, providing a clear advantage over systems that use flat two-dimensional displays.

Improving the human-machine interface of telerobots will enable them to fulfil the myriad tasks they will be facing in the future. Stereoscopic Video and Augmented Reality can greatly improve the feedback of information from the remote machine to the human operator, and tools such as the Virtual Pointer can greatly facilitate the communication of human instructions to the machine.


The Author

David Drascic has been working in the field of telerobotics and stereoscopic displays since 1987. He received his MASc in Industrial Engineering from the University of Toronto in 1991. He founded Human Engineering Research and Consulting (HERC) in 1990. Further information on the research described herein can be obtained by contacting Prof. P. Milgram, Industrial Engineering, University of Toronto, 4 Taddle Creek Road, Toronto, Ontario, Canada, M5S 1A4.