Positioning Accuracy of a Virtual Stereographic Pointer in a Real Stereoscopic Video World



David Drascic & Paul Milgram

Department of Industrial Engineering
University of Toronto
4 Taddle Creek Road
Toronto, Ontario
Canada M5S 1A4

drascic@ie.utoronto.ca
milgram@ie.utoronto.ca



SPIE Volume 1457: Stereoscopic Displays and Applications II,
58-69, San Jose, California, September 1991.


(c) Copyright 1991.



ABSTRACT

Typical teleoperator display systems to date have relied upon standard monoscopic video as the primary feedback link. Research clearly shows that using stereoscopic video (SV) is generally a significant improvement over monoscopic video (MV) for most teleoperation tasks. Abundant as the benefits of SV are, they can be dramatically increased by superimposing stereoscopic graphics (SG) on the SV image. The resulting SV+SG system can greatly enhance the benefits of SV alone in the understanding of a remote world.

This combination of media results in an exploration tool uniquely suited for examining different environments, whether remote, microscopic, or artificial. The SV+SG technology can also serve as a command tool for controlling manipulators (remote, microscopic, or artificial) in those environments.

A SV+SG system was implemented, and an experiment was performed to determine whether or not SG images were actually perceived as existing at the desired location in the SV view. The experiment compared the performance of a virtual SV+SG Pointer with that of a real world pointer, and showed that the virtual pointer could be positioned as accurately as the real pointer, with only a slightly higher variance.



1. INTRODUCTION


1.1 Stereoscopic Video and Stereoscopic Graphics (SV+SG)


Comprehending the Remote World

There are many instances in which a person might want to effectively "be" in an environment where circumstances render this impossible, difficult, expensive, or hazardous. The circumstances that come to mind first are the obviously hazardous ones, such as working inside a nuclear reactor, disarming bombs, or working in hostile environments, such as the ocean floor, or in space. Although there is a great deal of interest in the question of "telepresence" with respect to these situations, i.e. how important it is to give operators a feeling of being remotely present versus simply giving them sufficient detail to comprehend the remote world, there is also the question of dealing with alternate worlds, such as microscopic areas, macroscopic environments, and completely artificial worlds. In any situation where there is a separation between the observer and the environment in question, whether physical, temporal, dimensional, or functional, there is the question of how to make the observer sufficiently understand the alternate environment. So although the phrase "remote environment" implies a teleoperation-like task, we use it to refer to any alternate environment.

Two of the main issues associated with acting in a remote environment are comprehending the environment (knowing what is present, and where everything is located), and accomplishing the desired task (using real or virtual manipulators with sufficient accuracy).


Available Techniques

The most common technique used to enable an observer to visualise a remote environment is the standard video display. In a great many situations this is more than adequate; television can provide a sense of reality and immediacy unequalled by any other medium. It has proven highly suitable for situations in which it is important only to observe the remote environment (recorded or live viewing, security monitoring, etc). When it is necessary not only to observe the remote world, but to actually influence it, to move in it, to understand it, and perhaps to change it, the limitations of standard video become quickly apparent.

Video cameras are transducers and therefore necessarily filter out information. Typical single camera monoscopic video (MV) systems not only restrict image resolution, but completely eliminate all stereoscopic depth cues, cues which are highly salient and which can be vitally important for understanding the remote environment. The lack of these cues can render comprehension of the remote world difficult, and can make interacting with it challenging, if not impossible.

Some authors have suggested adding monoscopic graphics (MG) to MV systems, to help replace some of these missing depth cues. Kim et al [8] found that in a computer graphic simulation, the addition of the MG cues to MV can improve performance with a simulated robot manipulator to levels approaching stereoscopic video (SV), under certain limited conditions. Kim et al [9] subsequently implemented a real system, and found that adding a vertical reference line, a wire-frame outline of the manipulator gripper and its projection, and a reference grid combined to greatly improve a teleoperation task. Others [7][16] have used MG to enhance low bandwidth or degraded MV signals.

Overlaid graphic depth cues are limited, however, by the fact that the graphics computer must have exact information about the location and geometry of objects in the remote environment. For purely artificial worlds, or highly structured ones, such as space operations [12], this is not a great problem, but for most real-world teleoperation tasks, graphic cues can be used only with some difficulty.

Using stereoscopic video (SV) preserves the stereoscopic depth cues. For this reason, it has proven to be very effective in improving teleoperation tasks in a wide variety of circumstances, because these cues provide a highly salient and immediate sense of the relative location in space of the objects in the remote environment. Many forms of SV displays have been developed, including those that use a video monitor or other fixed viewing device, and those mounted on the head of the observer, typically coupled with head-tracking devices. The latter are used to reinforce the sense of virtual presence and provide additional motion-related depth cues.

A SV system, whether fixed or head mounted, is still a transducer, and will change and distort the stereoscopic depth cues to some extent. Because of this, it is very difficult to use these cues to make reliable estimates of the absolute size and location of objects in the remote environment. Skilled operators working in well known environments can learn to make these kinds of judgements with fairly good accuracy, but in unknown or unfamiliar environments, and for SV systems with dynamically adjustable cameras or viewpoints, training alone is not sufficient.

In the same way that adding MG cues can improve teleoperation tasks using MV, adding stereoscopic graphics (SG) to SV systems can revolutionise the capabilities of those systems. SV+SG technology combines the benefits of SV with benefits of overlaid graphics, but with a far greater potential than a comparable MV+MG system.

The aim of this paper is to describe briefly some of the potential benefits of SV+SG, and to present the results of an initial experiment using a prototype SV+SG system developed in our laboratory.


1.2 The benefits of SV+SG


1.2.1 SV+SG as a display aid


Absolute Distance and Size Measurement

While well-designed stereoscopic displays convey a very compelling sense of depth, it is virtually impossible for the display to generate an image exactly like the real scene; there are almost always some differences in magnification and differences in binocular disparity. While it is comparatively easy to perceive relative differences in depth with SV, these differences of scale make absolute judgements of distance and size difficult.

By using a SV+SG system to generate a virtual three dimensional pointer, however, operators can use this SG Pointer to make accurate measurements of absolute distances and sizes in the remote video world by using their sensitive relative depth perception abilities. If the graphics computer has sufficiently detailed information about the configuration of the SV system (i.e. camera separation, convergence angle, focal length, etc), it can generate SG images of arbitrary objects in arbitrary locations in space as they would appear if they were physically present in the remote world. The computer need have no information about the remote environment at all; the alignment of the SG images with the SV objects is accomplished by the operator, using her own perception of relative depth. She need merely indicate to the graphics computer where she wants the SG Pointer to be located. The graphics computer can express the location of the Pointer in remote world coordinates, and so identify the location of the remote object. By indicating two or more points in the remote three-dimensional world, operators can also measure (or indicate) absolute lengths, areas, and volumes of real or virtual items.
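As a minimal sketch of the measurement idea described above: once the operator has indicated two points, the length between them follows directly from the Pointer coordinates reported by the graphics computer. The function name and the two example points are hypothetical and serve only as illustration.

```python
import math

def length_between(p, q):
    """Euclidean distance between two 3-D points given in metres."""
    return math.dist(p, q)

# Hypothetical Pointer positions (x, y, z) in remote world coordinates, in metres.
first_point = (0.00, 0.00, 0.965)
second_point = (0.03, 0.04, 0.965)

print(f"{length_between(first_point, second_point):.3f} m")
```

Areas and volumes could be built up in the same way from three or more indicated points.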

Situations in which this facility is expected to be particularly useful are those where current measurement techniques are inadequate, such as in complex telemanipulation tasks. Furthermore, on a microscopic level, for example, it should be possible to measure the various features of a photomicrograph using the SV+SG system as a photogrammetric device.


Other Benefits as a Display Aid

Using SV+SG technology should also prove advantageous in many other ways and for many purposes, including integrating information about the remote environment from other sensors and devices, serving as a medium for predictive displays, providing feedback and warning signals, improving degraded or low bandwidth images, as a simulation tool, and for modelling hypothetical constructs in real-world environments. See Grodski et al [6] and Milgram et al [14] for further discussions of these possibilities.


1.2.2 SV+SG as an input device

Most telerobots in operation today are manually operated, requiring continuous attention and control. As technology advances, it becomes more and more possible to assign parts of the robotic control task to computer intelligence, freeing the human operator for more important tasks. Hirzinger describes a space telerobotic system where "the human operator (via stereovision and 3D graphics) is enclosed in the feedback loop on a very high level by low bandwidth, while the low level sensor loops are closed on-board at the robot directly with high bandwidth. Thus we try to prepare a supervisory control technique that will shift more and more autonomy to the robot while always offering real-time human interference." [7]

As an extension of that supervisory control concept, the operator should be able to control the remote telerobots by using SG predictive displays and SG telerobot simulators. Another concept, discussed by Chavand et al [4], is Computer Aided Teleoperation, or CAT, in which the telerobot would automatically follow instructions given it by the operator. Without SV+SG technology, however, such a system would be handicapped by the lack of a suitable three dimensional indicator. Using SV+SG, the operator can indicate arbitrary points, and the telerobot can follow the prescribed path or manoeuvre its own path to a particular target.

For difficult telerobotic tasks, such as those with slow responses, as in underwater telerobotics, or long delays, as with manipulators in space operated from the ground, using SV+SG as an input device should ease control greatly, since the human operator is elevated from the inner control loop. The operator can simply move a pointer to a particular location with ease, and need not consider the control dynamics of the telerobot.



2. SV+SG IMPLEMENTATION

Two variations of the SV+SG system have been implemented. The first is designed to be a low cost system, using standard NTSC CCD cameras and an inexpensive micro-computer, the Amiga 2500. This was the system used for the experiment described below. The SV system is described in Drascic et al [5], while the SV+SG system is described in Milgram et al [14]. The second system also uses standard NTSC cameras, but is based on a far more powerful graphics computer, an IRIS 4D/310 GTX. This system is very much more expensive than the Amiga based system, but is capable of much more sophisticated real-time graphics.

Both systems use the following approach: the NTSC video signals from the two genlocked cameras are combined using the alternating field technique [15] into a single standard NTSC video signal. This signal is used to genlock the graphics computer, which operates in an NTSC display mode, with the cameras. A linear keying device is used to combine the computer graphics with the video image. By using appropriate software, the computer is able to generate alternating field stereoscopic graphics.
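The alternating field idea can be illustrated with a small sketch in which each frame is a list of scan lines. This is an illustration only, not the hardware implementation: the function name is hypothetical, and the assignment of the left view to the even-numbered lines is an assumption made here.

```python
def alternate_fields(left_frame, right_frame):
    """Build one interlaced frame from two views.

    Even-numbered scan lines are taken from the left view and odd-numbered
    scan lines from the right view (an assumed assignment).
    """
    if len(left_frame) != len(right_frame):
        raise ValueError("frames must have the same number of scan lines")
    return [
        left_frame[i] if i % 2 == 0 else right_frame[i]
        for i in range(len(left_frame))
    ]

# Toy frames with four scan lines each.
left = ["L0", "L1", "L2", "L3"]
right = ["R0", "R1", "R2", "R3"]
print(alternate_fields(left, right))
```

Each eye therefore receives only half of the vertical resolution of the frame, one field per view.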

Both of these systems have a 30 Hz image update rate, with a 30 Hz flicker rate. This flicker is usually perceptible under normal lighting conditions, but the use of neutral density filters and lower ambient light levels can reduce this perception to acceptable limits. Considerable lab experience has shown that the 30 Hz system can be used for several hours at a time without undue eye-strain.



3. VALIDATION OF SV+SG ALIGNMENT CAPABILITIES


3.1 Purpose of the experiment

As discussed previously, the technology of superimposing Stereoscopic Graphics (SG) onto a Stereoscopic Video (SV) image produces a tool with great potential power. Before this power can be exploited, it is necessary to first establish whether or not such a tool can be used accurately and reliably by human operators.

Because use of the SV+SG technology as a tool for measuring distances and sizes in the remote SV world was the first application of the system implemented, an experiment was designed that would examine the ability of operators to successfully place a SG Pointer at specific points in the three dimensional SV world. The aim of the experiment, in other words, was to investigate whether or not the SG Pointer can be used as an input and/or measuring device for SV images. This entails establishing whether or not the Pointer can be consistently and accurately aligned with real objects by human operators. The key objective is to discover if the means of a set of positioning trials correspond to the positions of a set of targets, and if the variances of these trials are sufficiently small.


3.2 Background of the experiment

There has been little work reported in the literature which is directly relevant to this particular question. Yeh & Silverstein [19,20] examined the resolution of depth perception of SG images, and reported that judgement errors tended to be one graphic pixel in magnitude.

Butts & McAllister [3] looked at the ability of people to align a SG pointer with some SG images, and found that very good alignment capabilities were possible, also within one pixel.

Spain [18] looked at the question of aligning a SV rod with a SV target, using a variety of stereoscopic camera configurations. He examined the relative effects of different camera configurations, and found that hyperstereoscopic camera configurations tended to give better performance, but not in a way consistent with simple geometric models of stereoscopic perception.

Beaton et al [2] studied the effects of using different input devices on a computer graphic cursor alignment task, and found that using a SG display resulted in a positioning error approximately 60% less than that of a similar task using a monoscopic perspective display.

Reinhardt [17] examined the effects of different depth cues on a variety of performance measures regarding depth judgements, using computer graphics. His third experiment looked at, among other things, the mean absolute depth error in aligning a SG cursor with a SG target. He found that the benefits of the depth cues of luminance, size, and stereoscopic disparity were additive in their effects on positioning accuracy. In particular, for monoscopic graphics, using luminance and size cues enabled mean absolute depth positioning errors (i.e. the first absolute moment of the distribution of positioning errors) to be as small as a separation of .12 pixels between the left and right eye images, where the pixel size was approximately 2 arc mins. Using stereoscopic graphics in addition to luminance and size cues enabled mean absolute depth positioning errors to be as small as a separation of .03 pixels.

It is important to point out that there is a considerable difference between images generated by a computer and those transduced by a video camera, despite the fact that both are presented on the same display device. Sophisticated computer graphics techniques and hardware are steadily closing the gap at the high-end (expensive) level for the production of still images (and non-real time recorded animations), but real time computer graphics still lags significantly behind real time video.

Video and graphic images differ in a variety of ways, particularly in aliasing, lack of surface detail (texture), and lack of reflections and shadows. The latter two shortcomings of computer graphics can be dealt with by using ray-tracing and texture-mapping techniques. A few high-end systems are now available that can perform these operations on near real-time graphic animations.

Aliasing is the result of being constrained to pixel boundaries. For example, in a graphic image consisting of 640 horizontal by 400 vertical pixels, such as the one used in our original implementation of the SV+SG Pointer, a vertical line one pixel wide can appear in exactly 640 different positions. When moving slowly across the screen horizontally, this line would be seen to jump from position to position. On the other hand, the image from a video camera of a correspondingly thin vertical rod, also one pixel wide, would be seen to move continuously across the screen. This is because the optical image cast on the camera sensor by the lens of the camera can activate more than one sensor pixel; as the image moves from one sensor pixel to the next, the first pixel grows dim as the next grows brighter, keeping the total luminance approximately constant. This creates the impression of continuous motion.
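The difference between the two cases can be sketched with a one-dimensional row of pixels. The linear sharing of luminance between neighbouring pixels is a simplifying assumption made here, not a model of the actual camera sensor, and the function names are hypothetical.

```python
import math

def aliased_line(position, width_px=8):
    """Graphic line: snaps to the nearest whole pixel."""
    row = [0.0] * width_px
    index = math.floor(position + 0.5)
    if 0 <= index < width_px:
        row[index] = 1.0
    return row

def sampled_line(position, width_px=8):
    """Camera-like line one pixel wide whose left edge lies at `position`.

    The luminance is shared between the two pixels the line overlaps,
    so the total stays constant as the line moves.
    """
    row = [0.0] * width_px
    first = math.floor(position)
    frac = position - first
    if 0 <= first < width_px:
        row[first] += 1.0 - frac
    if frac > 0 and 0 <= first + 1 < width_px:
        row[first + 1] += frac
    return row

print(aliased_line(2.25))
print(sampled_line(2.25))
```

In the first case the line sits entirely in one pixel until it jumps to the next; in the second, one pixel dims as its neighbour brightens.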

This process can be simulated in computer graphics by using a technique known as anti-aliasing. However, because a compatible real-time implementation of this technique was not available at the time of this research, the SG Pointer was unfortunately limited by aliasing. In practical terms, this means that, in its first implementation, the SG Pointer could not appear at any location within the SV world, but only in certain discrete locations. For example, since the apparent position in depth of the SG Pointer is a function of the disparity between the left and right images on the monitor, and since this disparity is restricted to integral multiples of one pixel, it follows that as the Pointer is moved in depth, it will appear to jump from one position to the next, rather than move continuously between them. Some authors [2][17] have referred to these different depths as depth planes. However, since a simple geometric analysis shows that the surfaces described by objects with uniform pixel separations are not planar at all, we avoid the use of this terminology.

Given the particular camera configuration used in this experiment, the Pointer may appear along the optical axis of the camera system at the distances shown in Table 1. Note how the change in apparent position due to a one pixel step increases with the pixel separation. This indicates a decreasing depth resolution with distance from the cameras. To a first approximation, geometric analysis shows that the step size in distance varies with the square of the distance.


Table 1: SG Pointer Resolution

These are the apparent SG Pointer distances from the cameras corresponding to the separation of the left and right images, measured in pixels. Negative separations imply crossed disparity, where images appear in front of the monitor surface. Positive separations imply uncrossed disparity, where images appear behind the monitor surface.

  Separation       Corresponding    |   Separation       Corresponding      
   (pixels)       Pointer Distance  |    (pixels)       Pointer Distance    
  ----------      ----------------  |   ----------      ----------------
     -16               .813 m       |       39              1.771 m         
     -15               .821 m       |       40              1.810 m         
     -14               .830 m       |       41              1.850 m         
      -1               .954 m       |       59              3.092 m         
       0               .965 m       |       60              3.212 m         
       1               .977 m       |       61              3.341 m         
      19              1.241 m       |       69              4.927 m         
      20              1.259 m       |       70              5.237 m         
      21              1.279 m       |       71              5.590 m         
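The pattern in Table 1 can be approximated by a simple model in which the reciprocal of the apparent distance falls linearly with pixel separation. The sketch below is such an approximation, not the calibration actually used: the constant K is an assumption chosen here to fit the tabulated values, and the fit degrades at the largest separations. Under this model a one pixel step changes the distance by roughly K times the square of the distance, which agrees with the first-order behaviour noted above.

```python
Z0 = 0.965   # distance at zero separation, in metres (Table 1)
K = 0.0121   # ASSUMED constant, in 1/(metre * pixel), fitted by eye to Table 1

def pointer_distance(separation_px, z0=Z0, k=K):
    """Approximate apparent Pointer distance (m) for a pixel separation."""
    inverse = 1.0 / z0 - k * separation_px
    if inverse <= 0:
        raise ValueError("separation too large for this model")
    return 1.0 / inverse

def step_size(distance_m, k=K):
    """First-order change in distance for a one pixel step: k * z**2."""
    return k * distance_m ** 2

for s in (-15, 0, 20, 40):
    z = pointer_distance(s)
    print(f"{s:4d} px  {z:.3f} m  step ~ {step_size(z):.3f} m")
```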

In order to avoid this technical implementation problem and examine SV+SG Pointer capabilities closer to what they may be in future implementations, the experiment was designed so that the target positions with which the Pointer was to be aligned corresponded exactly with these natural "quantum step" positions of the Pointer. This temporarily sidesteps the complicated issue of what people will do when aliasing makes it impossible to align the Pointer with the target exactly.


3.3 Method


3.3.1 Subjects

The subjects consisted of ten university students, nine male and one female, between the ages of 22 and 32. All had had some previous experience with stereoscopic displays. The subjects were screened using the Bausch and Lomb Modified Ortho-Rater, using the criteria of Standard #2: Inspection and Close Machine Work, with the additional requirement that the distance depth perception score be at least three. All subjects were paid $6 per hour for participation.


3.3.2 Camera Configuration

For the experiment, the cameras were separated 0.118 m, with a convergence point of 0.965 m. It was necessary to measure these distances to this degree of accuracy in order to be able to calibrate the SG images with SV. This particular configuration was used to balance the need for enough stereo resolution to be able to measure distances with fair accuracy, while avoiding very large convergences and divergences that might strain the subjects' eyes and render the experimental images unfusable. The cameras were Hitachi VK-C150 colour CCD cameras, equipped with 8 mm lenses.


3.3.3 Stereoscopic Resolution

The width of each pixel of the computer display was 0.39 mm. At a viewing distance of .7 m, a difference of one pixel in the horizontal separation between the left and right images of the target and the pointer results in a disparity of .03172 degrees, or 1'54" of arc. Since human foveal stereoscopic resolution is approximately 10 to 30 seconds of arc [1], it is reasonable to assume that a difference in disparity of one SG pixel step on our display should be easily visible to most people.
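The order of magnitude of this figure can be checked with a short calculation. The sketch below uses only the pixel width and viewing distance given above; it yields roughly 0.032 degrees, a little under two minutes of arc, which is close to the quoted value (the small difference presumably reflects rounding in the original calculation).

```python
import math

PIXEL_WIDTH_M = 0.39e-3      # width of one display pixel
VIEWING_DISTANCE_M = 0.7     # distance from the eyes to the monitor

def visual_angle_deg(size_m, distance_m):
    """Angle subtended at the eye by an extent `size_m` seen from `distance_m`."""
    return math.degrees(math.atan(size_m / distance_m))

one_pixel_deg = visual_angle_deg(PIXEL_WIDTH_M, VIEWING_DISTANCE_M)
one_pixel_arcsec = one_pixel_deg * 3600

print(f"{one_pixel_deg:.4f} degrees = {one_pixel_arcsec:.0f} seconds of arc")
print(f"ratio to a 30 arc second threshold: {one_pixel_arcsec / 30:.1f}")
```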


3.3.4 Target Distances

The four target distances selected for the experiment were chosen to give a reasonable range of convergence angles, and corresponded to left and right graphic image separations of -15 pixels (i.e. crossed disparity, where the image appeared in front of the screen), 0 pixels (i.e. no disparity, where the image appeared on the surface of the screen), 20 pixels, and 40 pixels (i.e. uncrossed disparity, where the image appeared behind the surface of the screen). Given the particular camera configuration, these corresponded to the distances given in Table 2.


Table 2: Target Distances and Disparity Resolution

  Disparity      Equivalent       Step Size    % Resolution    Convergence
  on Screen      Distance (z)     (Delta_z)      (Delta_z /       Angle
   (pixels)        (metres)        (metres)       z * 100%)     (degrees)
  ---------      ------------     ---------    ------------    ------------
   -15 pix          .821 m          .008 m          1.0 %       -0deg28'30"
     0 pix          .965 m          .012 m          1.2 %        0deg
    20 pix         1.259 m          .020 m          1.6 %        0deg38'
    40 pix         1.810 m          .040 m          2.2 %        1deg16'
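The % Resolution column of Table 2 follows directly from the distance and step size columns, as the short check below illustrates. The variable and function names are hypothetical.

```python
# (disparity in pixels, distance z in metres, step size Delta_z in metres)
TABLE_2 = [
    (-15, 0.821, 0.008),
    (0, 0.965, 0.012),
    (20, 1.259, 0.020),
    (40, 1.810, 0.040),
]

def percent_resolution(z_m, delta_z_m):
    """Step size expressed as a percentage of the distance."""
    return delta_z_m / z_m * 100.0

for disparity_px, z, dz in TABLE_2:
    print(f"{disparity_px:4d} pix  {percent_resolution(z, dz):.1f} %")
```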


3.3.5 Experimental Task

In order to determine whether the SV+SG Pointer can be used accurately and reliably, a standard psychophysical Method of Adjustment paradigm was used [11]. The subjects would repeatedly adjust the depth of the Virtual SG Pointer until satisfied that it corresponded with that of the Real Target.

In order to determine how successfully the Virtual Pointer could be used in practice, it was necessary to compare it with a suitable metric: the aligning performance of a similar Real Pointer with a Real Target.

Given the discontinuous motion of the Virtual Pointer, and the possibility that this might interfere with the alignment task, it was decided to investigate the inverse alignment problem as well, that of using a Real Pointer with a Virtual Target. Here the subjects still have to align a real object with a virtual one, but it is the real one which moves, rather than the virtual one.

Finally, in order to determine whether or not any biases which might occur were due to calibration problems, the performance of a Virtual Pointer with a Virtual Target was also measured. If systematic biases were found for the Virtual Pointer-Real Target and Real Pointer-Virtual Target combinations, but were not found for the Real Pointer-Real Target or Virtual Pointer-Virtual Target combinations, this would be strong evidence for errors in calibration. Conversely, the absence of such biases would imply a successful calibration.


3.3.6 Pointer Shape

Laboratory experience with the SG Pointer in a number of real SV worlds suggested that the most effective form of the Pointer was a downward pointing arrow. Of the alternatives considered, solid vertical lines were hard to see, given the limits of the hardware used to implement the SV+SG system; crosshairs proved difficult to use; and a solid triangle did not have enough vertical edges to localise it precisely in depth. Users seem to need to be able to adjust the Pointer's position freely in "open air" space, without intersecting any other body, in order to be able to align it accurately with a particular point in space. In real-world situations, it is not unusual for the space above an object to be clear.

Subjective reactions of a number of colleagues indicated that aligning the Pointer above a target was easier than aligning the Pointer to the side of an object. This may be due to nearby objects interfering with the alignment task. Further study is clearly necessary on all of these points. In order to commence this research, however, we selected the Pointer that proved most popular, the downward pointing arrow, shown in Figure 1a.


3.3.7 Target Shape

The many issues involved in determining the shape of the target led to the decision to use the simplest case: a target which is a mirror image of the pointer, exactly the same in size and shape, but pointing upwards rather than downwards, as shown in Figure 1a. In this way, the advantages of the pointer shape are repeated in the target. Furthermore, it was easy to implement, both in real life and with SG. Finally, any regular influences of object shape, flanking contours, and other such factors known to affect stereoacuity [1] would be controlled; that is, no particular bias in the results was expected. As a first test of the SV+SG Pointer, this was judged suitable.

It is important to point out that the identical sizes of the Pointer and the Target can provide a potentially strong monoscopic depth cue, a cue that would not be available under typical SV+SG situations, where Targets and Pointers are not identical. Thus our results may have slightly less operator error than would otherwise be expected. Given the nature of our particular task, however, where very fine adjustments are needed to align the pointer with the target, it was felt that it was unlikely that the size cue, which required significant eye movements to use, would provide significant benefit. Furthermore, the fact that the experimental task was designed so as to reduce all other monoscopic cues to a minimum counterbalances this factor. Additional research will be needed to determine the importance of such monoscopic cues for SV+SG tasks.


3.3.8 Controlling the Pointer

The real pointer was mounted on a sliding track, so that it could slide smoothly directly towards or away from the cameras. Movement of the pointer was accomplished using a closed loop of wire, illustrated in Figure 1b. By sliding the wire, the operator was able to move the pointer back and forth until it was in the desired position. There was an exact one to one correspondence between the movement of the wire and the movement of the pointer.

Figure 1: Experimental Task

The virtual pointer was controlled with a Microspeed FastTrap, a combination trackball and thumbwheel device. For the experiment, only one degree of freedom of the trackball was used to control the movement of the virtual pointer in depth. The movement of the trackball corresponded directly with the movement of the virtual pointer in real world coordinates, not in graphical screen units.

Since the experiment was designed using the Method of Adjustment, and since the subjects were under no time pressure to complete their adjustments, it was felt that the difference in control modes between the wire and the trackball would not significantly affect results.

Note that careful attention was paid to ensure that no horizontal or vertical movement of the real pointer could be seen on the monitor as it was moved in depth. These movements could provide additional depth cues for the real pointer, thus giving it an uncontrolled advantage over the virtual pointer.


3.3.9 Experimental Design

Three different factors were varied: the type of Pointer (real or virtual), the type of Target (real or virtual), and the Distance of the Target from the cameras (.821, .965, 1.259, or 1.810 m).

A full 2 * 2 * 4 factorial design was used, so that each subject completed all 16 possible combinations of the three factors. The presentation order of the Pointer, Target, and Distance combinations was randomised by computer. Pilot studies indicated that the variation in subject response was sufficiently large that good experimental power could be obtained if 10 repetitions were performed for each combination of Pointer, Target, and Distance.
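A trial schedule for such a design, with two Pointer types, two Target types, and the four target distances of Table 2, might be generated as in the sketch below. The exact randomisation scheme is not described here, so the single shuffle over all trials is an assumption, and the names are hypothetical.

```python
import itertools
import random

POINTERS = ("real", "virtual")
TARGETS = ("real", "virtual")
DISTANCES_M = (0.821, 0.965, 1.259, 1.810)

def build_schedule(repetitions=10, seed=None):
    """Return every Pointer/Target/Distance combination, repeated and shuffled."""
    conditions = list(itertools.product(POINTERS, TARGETS, DISTANCES_M))
    trials = conditions * repetitions
    random.Random(seed).shuffle(trials)  # ASSUMED: one shuffle over all trials
    return trials

schedule = build_schedule(seed=1)
print(len(schedule), "trials; first trial:", schedule[0])
```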


3.3.10 Experimental Procedure

At the start of the experiment, each subject was given a standard set of instructions, describing the nature of the experimental task. This was followed by the screening procedure mentioned above. All subjects passed this screening procedure.

The nature of the task made it sensitive to head position, particularly the distance to the monitor. The subjects were therefore seated in a comfortable position with their heads comfortably supported by a headrest.

Each subject was shown a two minute SV recording of a tour of the Human Factors laboratory. The wealth of detail and depth cues this demonstration provided allowed the subjects to easily adapt to viewing video stereoscopically. All subjects had prior experience with SV, so this demonstration was meant to serve only as a "warm-up". Subjects with no prior SV experience would likely need a longer familiarisation period before being able to perceive the experimental stimuli reliably.

The subjects then began the training procedure. Using the Virtual Pointer with a Virtual Target, the subjects practiced at each of the four distances in order, starting with the Virtual Target at the .821 m distance. The Virtual Pointer would appear at a random starting distance of between .1 and .25 m directly in front of or behind the Virtual Target, and the subjects would then adjust the Virtual Pointer position until the Pointer appeared immediately above the Target. The Target and the Pointer appeared as high contrast flat white arrows in an otherwise featureless black field. The tips of the Target and Pointer were separated by approximately 0.5 cm on the monitor, or approximately 24' of visual arc. Figure 1b shows the task layout.
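The random starting position of the Pointer can be sketched as follows. The uniform distribution of the offset and the equal chance of starting in front of or behind the Target are assumptions made here; only the .1 to .25 m range is taken from the description above.

```python
import random

def starting_distance(target_m, rng=random):
    """Pick a start 0.10 to 0.25 m in front of or behind the target."""
    offset = rng.uniform(0.10, 0.25)   # ASSUMED uniform over the stated range
    direction = rng.choice((-1, 1))    # -1: nearer the cameras, +1: farther away
    return target_m + direction * offset

print(f"{starting_distance(0.821, random.Random(0)):.3f} m")
```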

In order to reduce the considerable effects of learning on stereoacuity performance, no feedback was given to the subjects during the course of the experiment.

The subjects would continue practicing at each depth condition until they successfully completed three consecutive trials, each with a disparity error within one pixel. Most subjects completed the training with very few errors, indicating that the stereoscopic resolution of this SV+SG system was well within the limits of human stereoacuity.
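The training criterion can be expressed as a simple stopping rule. The sketch below is only an illustration of that rule, not the software used in the experiment; the function name, the use of the absolute error, and the treatment of "within one pixel" as "less than or equal to one pixel" are assumptions.

```python
def trials_until_criterion(errors_px, tolerance_px=1.0, run_length=3):
    """Return the trial number at which training would stop, or None.

    errors_px    : disparity error of each practice trial, in pixels
    tolerance_px : largest error that still counts as a success (assumed <=)
    run_length   : number of consecutive successes required
    """
    streak = 0
    for trial_number, error in enumerate(errors_px, start=1):
        if abs(error) <= tolerance_px:
            streak += 1          # another success in a row
        else:
            streak = 0           # any miss resets the run
        if streak == run_length:
            return trial_number
    return None                  # criterion not yet reached
```

For example, a subject whose practice errors were 2, 0, 1, 3, 0, 1 and -1 pixels would meet the criterion on the seventh trial.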

When the training was complete, the subjects would perform 10 repetitions for each combination of Pointer (real and virtual), Target (real and virtual), and Distance (.821, .965, 1.259, and 1.810 m), i.e. 160 trials. The sequence of the combinations was randomised to avoid order effects. Eight subjects completed all 160 trials in approximately two hours; the other two took approximately three hours.
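The structure of the 160 trials can be sketched as follows. This is a minimal illustration of a randomised 2 * 2 * 4 design with 10 repetitions, not the program used in the experiment. Whether the repetitions of each combination were run as a block or interleaved is not stated here, so the hypothetical `blocked` parameter covers both readings.

```python
import itertools
import random

POINTERS = ("real", "virtual")
TARGETS = ("real", "virtual")
DISTANCES_M = (0.821, 0.965, 1.259, 1.810)
REPETITIONS = 10


def make_trial_sequence(seed=None, blocked=True):
    """Return a list of (pointer, target, distance) tuples, one per trial."""
    rng = random.Random(seed)
    # All 16 combinations of the three factors.
    cells = list(itertools.product(POINTERS, TARGETS, DISTANCES_M))
    if blocked:
        # Randomise the order of the combinations, then repeat each in a block.
        rng.shuffle(cells)
        return [cell for cell in cells for _ in range(REPETITIONS)]
    # Otherwise randomise the order of all individual trials.
    trials = cells * REPETITIONS
    rng.shuffle(trials)
    return trials
```

Either way, each of the 16 combinations appears exactly 10 times, for 160 trials in total.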


3.4 Observations

The data recorded were the final position of the real and virtual pointer with respect to the target. These data were recorded in metres, and were converted to disparity angles in order to normalise the effects of target distance, and to express the results in general terms, not dependent on the particular camera configuration or equipment used. This will serve to make these results compatible with the results of future studies. The disparity between the final pointer position and the target is measured assuming that the eyes of the subjects are separated by 65 mm, and converge on the target. Thus negative disparities represent final pointer positions in front of the target (i.e. closer to the cameras), while positive disparities represent final pointer positions behind the target (i.e. further away from the cameras).
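The conversion from metres to disparity angle can be sketched as follows. This is a minimal illustration rather than the analysis code actually used. It assumes that the pointer and target lie on the midline straight ahead of the observer, that the distances are measured from the viewpoint, and that the eye separation is the 65 mm stated above. The function names are hypothetical.

```python
import math

IPD_M = 0.065  # eye separation assumed in the text (65 mm)


def vergence_angle(distance_m, ipd_m=IPD_M):
    """Angle in radians between the two lines of sight to a midline point."""
    return 2.0 * math.atan(ipd_m / (2.0 * distance_m))


def disparity_arcsec(pointer_m, target_m, ipd_m=IPD_M):
    """Disparity of the pointer relative to the target, in arc seconds.

    Negative when the pointer is nearer than the target and positive when
    it is farther away, matching the sign convention used in the text.
    """
    diff_rad = vergence_angle(target_m, ipd_m) - vergence_angle(pointer_m, ipd_m)
    return math.degrees(diff_rad) * 3600.0
```

Under these assumptions, a pointer placed 1 mm in front of a target at 1 m gives a disparity of roughly -13 arc seconds.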

Many researchers [2][17][18][20] have taken the absolute value of the error and conducted their analyses on those values. We did not follow this route, however, since an analysis of variance is meant to be conducted on the mean of a distribution, not on the first absolute moment. The first absolute moment does not have a normal distribution, and so this approach violates a fundamental assumption of analysis of variance. While anova is usually robust to small deviations from this assumption, it is doubtful that using the first absolute moment rather than actual trial measures can be considered a small departure from the assumption. Instead, two different measures were examined: the means of the trials in each of the 16 conditions, and the variances of those trials.

The hypotheses underlying our experiment were that:

  1. the variances of all conditions would be equal,

  2. the mean disparity of all conditions would be equal, and

  3. the mean disparity of all conditions would be zero.

In order to examine the first hypothesis, several tests for homogeneity of variance were conducted. The Hartley Fmax Test and Cochran's Test for Homogeneity of Variance both found the variances of the different conditions significantly different at the .001 level, as did the Box-Scheffé Test [10] (F(16,1420)=21.903, p<.0005). We must therefore reject the first hypothesis, and say that the variances of all conditions were not the same. In order to see what effects there may be, the standard deviations of the different cells are plotted in Figure 2. The four lines represent the four different combinations of pointers and targets, as a function of Target Distance from the cameras.
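The statistics behind the first two of these tests are simple ratios of cell variances. The sketch below shows how they are formed. It assumes equal cell sizes, as both tests conventionally do, and leaves the comparison against tabulated critical values to the reader. The function names and the example data are illustrative only.

```python
import statistics


def cell_variances(groups):
    """Unbiased sample variance of each cell."""
    return [statistics.variance(group) for group in groups]


def hartley_fmax(groups):
    """Hartley's Fmax: largest cell variance divided by the smallest."""
    variances = cell_variances(groups)
    return max(variances) / min(variances)


def cochran_c(groups):
    """Cochran's C: largest cell variance divided by the sum of all variances."""
    variances = cell_variances(groups)
    return max(variances) / sum(variances)


# Made-up data with cell variances of 1, 4 and 9.
example = [[1, 2, 3], [2, 4, 6], [0, 3, 6]]
fmax = hartley_fmax(example)   # 9 / 1
c = cochran_c(example)         # 9 / 14
```

Both statistics grow as one cell's variance departs from the rest, which is why large values lead to rejection of the hypothesis of equal variances.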

A post-hoc analysis using paired F-tests was conducted, looking for significant differences between all 120 possible pairs of cells. In order to reduce the risk of Type I error to a conservatively low value, a combined significance level of .05 was used (i.e. each F-test was conducted with alpha = .0004167).
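The per-test significance level follows from dividing the combined level by the number of pairwise comparisons. The sketch below reproduces that arithmetic and shows one way of forming a variance-ratio F statistic. Placing the larger variance in the numerator is an assumption made for the illustration, and the function name is hypothetical.

```python
import math
import statistics

N_CELLS = 16          # 2 pointers * 2 targets * 4 distances
FAMILY_ALPHA = 0.05   # combined significance level

n_pairs = math.comb(N_CELLS, 2)           # number of distinct pairs of cells
per_test_alpha = FAMILY_ALPHA / n_pairs   # level used for each F-test


def variance_ratio(sample_a, sample_b):
    """Return (F, (df_numerator, df_denominator)) for two samples.

    The larger sample variance is placed in the numerator (an assumption),
    so F is always at least 1.
    """
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, (len(sample_a) - 1, len(sample_b) - 1)
    return var_b / var_a, (len(sample_b) - 1, len(sample_a) - 1)
```

With 16 cells there are 120 pairs, so each test is run at .05 / 120, which rounds to .0004167.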

Looking at Figure 2, and considering first the effect of the factor Target Distance, we observe little effect due to Distance on the four curves shown, with the exception of the Virtual Pointer + Virtual Target condition, which shows a dramatic decrease in variance for the .821 m Target Distance. The paired F-tests confirm that this observation is significantly different from the others (F(98,98)=4.68, p<alpha). Otherwise, Target Distance does not appear to influence the variance of the disparity error. This was expected, due to the normalising effect of transforming the data into visual angles.

Next, examining the factor Target Type, we see that for the Real Pointer, there appears to be little difference in variance due to the Target Type, except at the Target Distance of 1.81 m. Again, the statistical tests confirmed this observation (F(99,99)=2.86, p<alpha). Similarly, for the Virtual Pointer, there appears to be little difference in variance due to the Target Type, except at the Target Distance of .821 m (F(98,99)=5.482, p<alpha).

Finally, examining the factor Pointer Type, we see that the Virtual Pointer has a consistently higher variance than the Real Pointer for all conditions except for that of the Virtual Pointer + Virtual Target at the .821 m Target Distance. The statistical tests again confirm these observations, and show that the standard deviation when using the Virtual Pointer is approximately 1.6 times that of using the Real Pointer.

In summary, therefore, it appears that in general neither Target Distance nor Target Type showed a consistent influence on the variance of the subject responses, while Pointer Type did.

We now consider the second and third hypotheses, that the means of the different conditions are all the same and that they are all zero. To do this we must first consider the propriety of conducting an analysis of variance given our rejection of the first hypothesis.

Since the non-homogeneity of the variances is significant, and since no transformation on the data served to reduce this non-homogeneity, the assumptions for performing an analysis of variance were violated. However, since the distributions are approximately normal, since the differences in the variances are not large, and since analysis of variance is robust to minor violations of the assumption of homogeneity, an anova was conducted using Subjects as a blocking factor and Target Type (real or virtual), Pointer Type (real or virtual), and Convergence Angle (4 values) as within-group factors. While a significant difference was found between Subjects (F(1,9)=5.8, p=.039), none of the main effects or interactions proved significant (at the p = .05 level). Since violations of the homogeneity of variance may serve to make the actual significance level appreciably larger than the nominal level [10], we feel confident in accepting the null hypothesis that the mean values of the 16 different cells are all equal, i.e. none of the factors is significant.

However, the third hypothesis, that the mean disparity is zero, is false: both the Real Pointer and the Virtual Pointer were positioned on average in front of the target. Although statistically significant (F(1,1595)=160.155, p<.0005), this forward bias is small: the mean disparity between the Pointer and the Target is -25 arc seconds.
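The computation behind this test is not described in detail. One common way to test whether a mean differs from zero is a one-sample t-test, whose squared statistic is an F ratio with one numerator degree of freedom. The sketch below assumes that approach and uses made-up disparity values, not the experimental data.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t, degrees_of_freedom) for H0: population mean == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / std_error, n - 1


# Hypothetical disparities in arc seconds (negative = in front of the target).
example = [-30.0, -20.0, -25.0, -35.0, -15.0]
t_value, df = one_sample_t(example)
f_value = t_value ** 2   # equivalent F(1, df) statistic
```

A small mean offset can produce a very large F when the number of trials is large, which is how a bias of only 25 arc seconds can be highly significant.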



4. DISCUSSION

The complete lack of any factor significantly affecting the mean disparity (that is, the pointer positioning error) implies that, under these laboratory conditions, there is no positioning bias attributable to whether the pointer or target is real or computer generated. This implies two important conclusions: that the calibration of the SV+SG was accurate, and that a SG Pointer can be used successfully with SV images.

The existence of the 25 arc second forward bias is interesting; human stereoacuity is estimated to be between 10 and 30 arc seconds under natural direct viewing conditions [1]. Using video as the display medium inherently degrades image clarity, and thus acuity, and thus probably stereoacuity. The 25 arc second bias is therefore presumably below the level of human stereoacuity using the SV+SG system, and yet it is statistically a significant result. Examination of videotapes of the experiment shows that most subjects appeared to use a Method of Limits [11] approach to the pointer positioning. That is, they would move the pointer in one direction until satisfied that it was too far, move it back in the other direction until again it was too far, and then return it to a position midway between these two. However, if the subjects had indeed accurately placed the pointer in the middle of this range, they should have shown a slight positive disparity bias, that is, behind the target, since the geometry of stereoscopic perception indicates that the threshold for a just noticeable difference in depth is smaller when moving towards the observer than when moving away.
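The asymmetry argument can be illustrated numerically. The sketch below is illustrative only. It assumes a symmetric disparity threshold of 30 arc seconds (a hypothetical value taken from the upper end of the range quoted above), a 65 mm eye separation, and a target on the midline at 1.259 m. The function names are not from the original study.

```python
import math

IPD_M = 0.065                            # assumed eye separation
ARCSEC = math.pi / (180.0 * 3600.0)      # one arc second in radians


def vergence_angle(distance_m, ipd_m=IPD_M):
    """Angle in radians between the lines of sight to a midline point."""
    return 2.0 * math.atan(ipd_m / (2.0 * distance_m))


def distance_for_angle(angle_rad, ipd_m=IPD_M):
    """Inverse of vergence_angle: midline distance giving that angle."""
    return ipd_m / (2.0 * math.tan(angle_rad / 2.0))


def depth_limits(target_m, threshold_arcsec, ipd_m=IPD_M):
    """Near and far distances at which the disparity reaches the threshold."""
    theta = vergence_angle(target_m, ipd_m)
    delta = threshold_arcsec * ARCSEC
    near = distance_for_angle(theta + delta, ipd_m)
    far = distance_for_angle(theta - delta, ipd_m)
    return near, far


TARGET_M = 1.259
near, far = depth_limits(TARGET_M, 30.0)
midpoint = (near + far) / 2.0   # where an ideal midway placement would land
```

Because equal disparity steps correspond to a shorter depth interval on the near side than on the far side, the midpoint of the two limits falls slightly behind the target, which is the positive bias the argument predicts.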

The fact that the subjects consistently placed the pointer in front of the target could be the result of an unaccounted-for perceptual issue influencing the results. Another possibility is that the subjects felt it preferable to err on the near side than to err on the far side; no effort was made to control such subject bias. We could find no record of any similar results anywhere in the literature. Typically, experimenters do not report any analyses of the means of their alignment tests; they instead consider only the first absolute moment of their results. Further research is necessary to understand this phenomenon.

The differences in the variances of the real and virtual pointers are not large, but they are statistically significant. There are several possible explanations for these differences. The most likely is that the real pointer had associated with it several monoscopic depth cues to aid in depth perception, cues that were not provided by the virtual pointer.

As discussed previously, this particular SV+SG system was not able to avoid aliasing. This meant that the size cue of the virtual pointer when moving could not be as realistic as the size cue of the real pointer, since the virtual SG pointer was constrained to change in size by whole pixel amounts, while the real SV pointer could change in size by partial pixel amounts. An additional shortcoming of the graphics software was that the SG Pointer size did not change by single pixel steps, but instead changed by double pixel steps. This means that the size cue for the SG Pointer was degraded even further than by simple aliasing.
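The effect of whole-pixel and double-pixel size steps can be sketched as follows. This is a hypothetical illustration, not the graphics software described above. It assumes a simple perspective model in which the ideal on-screen size is inversely proportional to distance, and the scale of 20 pixels at 1 m is invented for the example.

```python
import math


def quantise(size_px, step_px):
    """Round an ideal size to the nearest multiple of step_px (halves round up)."""
    return step_px * math.floor(size_px / step_px + 0.5)


def displayed_sizes(distances_m, size_at_1m_px, step_px):
    """Displayed pointer size at each distance for a given size step."""
    return [quantise(size_at_1m_px / d, step_px) for d in distances_m]


# Pointer moved from 0.80 m to 1.00 m in 1 cm increments (hypothetical values).
distances = [0.80 + 0.01 * i for i in range(21)]
one_px = displayed_sizes(distances, 20.0, 1)   # whole-pixel steps (aliasing)
two_px = displayed_sizes(distances, 20.0, 2)   # double-pixel steps
```

With double-pixel steps the pointer takes on fewer distinct sizes over the same range of motion, so it stays the same size over longer stretches of depth and the size cue is correspondingly coarser.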

In addition to the difference in size cues, it is likely that the real and virtual pointers differed in luminance as well. Although efforts were made to ensure that the illumination of the real pointer was as uniform as possible, it is likely that some variation in the illumination of the real pointer did take place as it moved in depth, while the virtual pointer maintained a uniform brightness.

These differences in the monoscopic depth cues of size and luminance most likely contributed to the difference in variance between the real and virtual pointers. This is consistent with Reinhart's finding that the use of the monoscopic depth cues of size and luminance can reduce the variability of a stereoscopic cursor alignment task [17].

A second factor that could have influenced the variance results is the difference in the control modes used for the two pointers: the wire control for the real pointer and the trackball control for the virtual pointer. While efforts were made to match the two control gains, no measurements were made to verify the success of this matching. It is possible that differences in control gains could have influenced the results.

The observed interaction effects on the variance are not easy to understand. Why the variance of the virtual pointer + virtual target combination at the one position in front of the monitor screen was so much lower than at the other positions is unknown. It may have been an artifact of the virtual pointer size changes discussed above. Further examination of this question is also necessary.



5. CONCLUSION

Combining stereoscopic video and stereoscopic computer graphics has resulted in a new technology with great potential power, with possible applications in a wide range of fields. An experiment was conducted to verify the usability of this technology. The experiment verified that a SG Pointer can be accurately aligned with real SV targets, with only slightly more variance than a similar real pointer. Improving the SV+SG technology with anti-aliasing techniques should reduce this difference in variance even further.



6. ACKNOWLEDGEMENT

Portions of the work described herein were carried out under contract W7711-7-7009/01-SE with Supply and Services Canada, for the Defence and Civil Institute of Environmental Medicine, Downsview, Ontario, Canada.



7. REFERENCES

1. A Arditi, "Binocular Vision", Chapter 23 of Handbook of Human Perception and Performance, edited by K R Boff, L Kaufman, J P Thomas; John Wiley & Sons, New York, 1986

2. R J Beaton, R J DeHoff, N Weiman, P W Hildebrandt, "An evaluation of input devices for 3-D computer display workstations", SPIE Vol 761 True 3D Imaging Techniques and Display Technologies, 94-101, 1987

3. D R W Butts, D F McAllister, "Implementation of True 3D cursors in computer graphics", SPIE Vol 902 Three-Dimensional Imaging and Remote Sensing Imaging, 74-84, 1988

4. F Chavand, E Colle, J P Gaillard, A Mallem, J P Stomboni, "Visual assistance to the operator in teleoperation and supervision situations", Proc. Int. Symposium on Teleoperation and Control, 237-248, July 1988

5. D Drascic, P Milgram, J Grodski, "Learning Effects in Teleoperation using Monoscopic Versus Stereoscopic Remote Viewing", Proceedings of the IEEE International Conference of the Systems, Man, and Cybernetics Society, 1989

6. J Grodski, P Milgram, D Drascic, "Real and Virtual World Stereoscopic Displays for Teleoperation", NATO DRG Seminar: Robotics in the Battlefield, March 1991

7. G Hirzinger, "The space and telerobotic concepts of DFVLR ROTEX", Proceedings of the IEEE International Conference of Robotics and Automation, 443-449, 1987

8. W S Kim, F Tendick, L W Stark, "Visual Enhancements in Pick-and-Place Tasks: Human Operators Controlling a Simulated Cylindrical Manipulator", IEEE Journal of Robotics and Automation, v RA-3, no 5, pp 418-425, 1987

9. W S Kim, M Takeda, L Stark, "On-The-Screen Visual Enhancements for a Telerobotic Vision System", Proceedings International Conference IEEE Systems, Man, and Cybernetics Society, Beijing, 126-130, Sep 1988

10. R E Kirk, Experimental Design, Procedures for the Behavioural Sciences, 2nd Ed., Brooks/Cole Publishing Company, Monterey, 1982

11. J W Kling, L A Riggs, Woodworth & Schlosberg's Experimental Psychology, 3rd Ed., Holt, Rinehart & Winston, Inc, Toronto, 1971

12. S G Maclean, M Rioux, F Blais, J Grodski, P Milgram, H F L Pinkney, B A Aikenhead, "Vision System Development in a Space Simulation Laboratory", Proceedings of the International Society for Photogrammetry & Remote Sensing, Commission Five: Close Range Photogrammetry & Machine Vision, Zürich, Sep 1990

13. P Milgram, D Drascic, J Grodski, "Stereoscopic Video + Superimposed Computer Stereographics: Applications in Teleoperation", Proc. Second Canadian Workshop on Military Robotic Applications, Kingston, Aug 1989

14. P Milgram, D Drascic, J Grodski, "A Virtual Stereographic Pointer for a Real Three Dimensional World", Interact '90: Third IFIP Conference on Human-Computer Interaction, Cambridge, UK, August 1990

15. P Milgram, R van der Horst, "Alternating-field stereoscopic displays using light-scattering liquid crystal spectacles", Displays: Technology & Applications, v 7, n 2, 67-72, April 1986

16. M V Noyes, Superposition of graphics on low bit rate video as an aid in teleoperation, MSME Dissertation, M.I.T. 1984

17. F W Reinhart, Effects of Depth Cues on Depth Judgements Using a Field-Sequential Stereoscopic CRT Display, PhD Dissertation, Industrial Engineering & Operations Research Dept, Virginia Polytechnic Institute & State University, 1990

18. E H Spain, A Psychophysical Investigation of the Perception of Depth with Stereoscopic Television Displays, PhD Dissertation, University of Hawaii, May 1984

19. Y Y Yeh, L D Silverstein, "Depth Discrimination in Stereoscopic Displays", SID '89 Digest, 372-375, 1989

20. Y Y Yeh, L D Silverstein, "Limits of fusion and depth judgment in stereoscopic colour displays", Human Factors, v 32, n 1, 45-60, Feb 1990