drascic@ie.utoronto.ca
To that end, a practical Stereoscopic Video (SV) system was developed that is compatible with standard video display and recording equipment. An experiment was conducted to examine the potential benefits of SV for teleoperation. The results showed that SV can aid teleoperation by reducing task execution times, reducing error rates, and reducing the time needed for training.
As Don Norman says in The Psychology of Everyday Things, "Nothing succeeds like a good display." (Norman, 1988) Presenting necessary information in a natural form facilitates all human-machine interactions. Unfortunately, MV displays act as a filter, removing all stereoscopic depth cues from a scene while retaining some of the monoscopic cues. Stereoscopic cues are a very important source of information about the spatial layout of the remote scene, and thus much of the "knowledge in the environment" is rendered inaccessible. While it is often possible to accomplish a teleoperation task using such a display, it is usually difficult, and takes considerable training to acquire sufficient skill. Robinson (1984) reports that standard two-dimensional video systems with their restricted amount of visual information resulted in Bomb Squad personnel being reluctant to use the telemanipulation vehicle.
The loss of stereoscopic depth information means that there are frequently times when the spatial locations of objects in a static image are ambiguous. While motion parallax or multiple views can sometimes resolve such ambiguities, operating conditions may render this option unfeasible.
Dependence on monoscopic cues.
MV displays filter out all stereoscopic cues, and several monoscopic cues as well, such as texture gradient. Operators therefore have a greater dependence on certain monoscopic depth cues than most stereoscopically-able persons are accustomed to.
It has been shown that when using a modified direct view, such as with a prism arrangement to artificially exaggerate eye separation, or magnifying lenses, monocular cues need to be learned, or recalibrated, a process which takes time (McGovern, 1987). Furthermore, it is known that binocular depth cues play a fundamental role in the calibration of the monocular depth cues, and that binocular disparity is perceived more quickly than any other visual cue (Clapp, 1986, Clapp, 1987).
Even after having learned how to interpret monocular cues for a considerable time, it remains a fairly weak depth cue, and is easily dominated by other cues such as perspective and occlusion. (Wickens, 1990)
On the other hand, as Baker notes: "Because the accommodation and convergence differs in stereoscopy and the physical world, the ability to see binocular depth on a CRT must be learned. On first occasion, many people adapt in a few seconds, while others may take several minutes to see the image comfortably." (Baker, 1987) This adaptation period, however, is likely considerably shorter than the time needed to master interpretation of monocular cues.
These facts imply that it will take a novice longer to become proficient in teleoperation using a monoscopic display than a stereoscopic display. Since the view from the video monitor is very different from a direct view of the real world with respect to the relationship between the monocular depth cues and the binocular depth cues, it is reasonable to expect that without constant practice, the temporary voluntary recalibration (or learning) of the depth cues of the monoscopic display will fade.
For degraded or complex displays, the monoscopic cues may not prove sufficient; other more radical methods of obtaining depth information may be required: "At present [using MV displays] robots used in the nuclear industry and elsewhere have to make contact with their surroundings for the operator to know exactly where they are. Aided only by a standard two-dimensional TV picture, the operator has to bash the robot arm around inside the reactor until the right position is somehow established. `This can cause damage both to the robot arm and the surroundings.'" (Macilwain, 1989)
Benefits of stereoscopic displays
The expected benefits of using stereoscopic displays include a faster and more accurate perception of the spatial layout of the remote scene, visual noise filtering, enhanced effective image quality, enhanced slope and depression detection, wider field of view, enhanced object recognition and image interpretation, increased user satisfaction, and fewer casual errors (critical movements are generally done with such care that there is very little room for improvement). (Drascic, 1991, Milgram et al, 1989, Merritt, 1988)
The literature shows that using stereoscopic video (SV) can greatly improve teleoperation performance and user satisfaction, particularly for tasks "which involve ballistic movement, recognition of unfamiliar scenes, analysis of three dimensionally complex scenes and the accurate placement of manipulators or tools within such scenes." (Dumbreck et al, 1987) SV can in fact make possible tasks that are otherwise impossible (Merritt, 1984).
Further, it has been quite clearly shown that as image quality is degraded or scene complexity increased, the advantages of SV are intensified. (Pepper, 1986)
As stated above, a certain amount of training is necessary to learn how to use the monoscopic depth cues on a MV display. This suggests another benefit of SV displays not widely discussed in the literature: they do not require the same degree of training and should be easier for novices to use, possibly reducing the training and practice time needed for skilled teleoperation. While the parameters affecting the utility of SV displays have been investigated for a variety of stereoscopic display formats (e.g. Pepper, 1983, Spain, 1984), most studies have examined the behaviour of well-trained operators. Very little work has been done to investigate the behaviour of relative novices to remote manipulation tasks, particularly with regard to the type of video system used. The findings of the studies that have been carried out have not been consistent. For instance, one study found that for a simple target positioning task, the advantages of using a SV system were not as great for novices as for experienced operators, implying that the relative benefits increase with experience (Pepper & Hightower, 1984), whereas another study, using a peg-in-the-hole task, found that the relative benefits of SV decreased with experience (Smith et al, 1979).
In order to investigate this question further, the author and his colleagues conducted two experiments in the Department of Industrial Engineering at the University of Toronto. The first (reported in Drascic, 1991) investigated the behaviour of novices to teleoperation, and found that for a very simple task with little need for depth cues of any kind, using SV displays provided an initial advantage over MV displays, an advantage which faded as subjects became more accomplished at using the MV displays for the highly repetitive task. The second experiment, reported here, examined skill acquisition for skilled operators under a variety of different conditions.
Driving a telerobot along a clear path would fall towards the SV-independent end of the spectrum: the information required in order to accomplish the task can easily be obtained without SV. The experiment alluded to above involved such a task, and found that the benefits of SV faded as subjects became more accustomed to using a MV display. (Drascic, 1991) It is reasonable to expect that all tasks near the SV-independent end of the spectrum will show similar results (e.g. Pepper et al, 1981).
On the other hand, at the SV-dependent end of the spectrum, there will likely exist tasks where the initial advantage of SV might even increase with experience, as skill with SV improves, while skill with MV is so handicapped that little improvement can occur.
Between these two extremes will exist the bulk of teleoperator tasks. Since most telerobots in use today are equipped with MV system(s) (Meieran, 1988), the only tasks possible have been those near the SV-independent end of the spectrum. As SV becomes more commonly used, the variety of telerobotic tasks will increase dramatically.
Based on these results and the discussion above, the hypotheses posed for this experiment were:
1. Subjects using SV will show an initial performance advantage over those using MV.
2. The performance difference between MV and SV will decrease as the subjects become more experienced, more so for the low difficulty SV-independent task conditions than the high difficulty SV-dependent task conditions.
Part A of the experiment was designed to look at the acquisition of highly task-specific skills for a very repetitive situation, as a function of experience (trial number), the video system used, and the difficulty of the task. Part B of the experiment was designed to look at the differences in performance of tasks in a non-repetitive situation, as a function of the video system used and of the difficulty of the task.
In Part A, the factors being examined were (1) task learning, by having the subjects repeat the same task 16 times consecutively; (2) video system, either SV or MV; and (3) task difficulty and SV-dependence, using four different target sizes (8, 16, 32, and 64 cm). A full-factorial design was used, so that each subject performed 16 * 2 * 4 error-free trials in Part A. Trials with errors were discarded (as explained below), so this meant that subjects with a high error rate performed more runs in total than those subjects with low error rates. Although this may influence the results somewhat, since some subjects therefore had more experience and practice than the others, it was felt that the additional experience from making errors would not contribute a great deal to the performance of the task, since errors were so disruptive (see below).
In Part B, the factors being examined were (1) video system, either SV or MV; and (2) task difficulty, with one of the four target sizes. Each of the 2 * 4 conditions was repeated 8 times, so each subject performed 2 * 4 * 8 error-free trials in Part B of the experiment. Again, those subjects with high error rates performed more trials than those with low error rates.
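The trial counts for the two parts follow directly from crossing the factors. The short Python sketch below enumerates the conditions to make the arithmetic explicit. The function and variable names are illustrative only, and the order in which conditions are listed is an assumption made for the sketch, not the presentation order used in the experiment.

```python
from itertools import product

VIDEO_SYSTEMS = ("SV", "MV")
TARGET_WIDTHS_CM = (64, 32, 16, 8)


def error_free_trials(repetitions):
    """List one (video system, target width, repetition) tuple per
    error-free trial. The listing order is arbitrary (an assumption)."""
    return [
        (video, width_cm, rep)
        for video, width_cm in product(VIDEO_SYSTEMS, TARGET_WIDTHS_CM)
        for rep in range(1, repetitions + 1)
    ]


part_a = error_free_trials(repetitions=16)  # 16 consecutive repeats per condition
part_b = error_free_trials(repetitions=8)   # 8 repeats per condition

print(len(part_a))  # 128 = 16 * 2 * 4
print(len(part_b))  # 64 = 2 * 4 * 8
```

These are counts of error-free trials only; since trials with errors were discarded and repeated, the number of runs actually performed by a subject was at least this large.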
Final Design.
Considering the role of experience and transfer effects, Part A had the following factors:
Part B had the following factors:
The task was derived from an X-raying procedure used by the Canadian Department of National Defence Explosive Ordnance Disposal team in examining suspected parcels. (See Drascic, 1991) The task was simplified, and a Fitts' Law approach was used to control and calibrate the level of difficulty and SV-dependence. The telerobot used for this experiment was the Remote Mobile Investigation unit (RMI), manufactured by Pedsco (Canada) Ltd. The RMI is a mobile platform with a three degree of freedom manipulator arm and an 80 m tether. Using various tools and attachments, the RMI can be used to remotely X-ray, disable, detonate, or transport a suspected explosive device, without risk to human life. The RMI is typically equipped with a single MV system. The one used for this experiment was modified to have a switchable MV-SV system.
The experimental task consisted of driving the RMI a distance of 3 m forward, and lowering a mock X-ray photographic plate between two "bombs", set a certain distance apart. The X-ray photographic plate was simulated by a freely swinging pointer suspended from the end of the RMI's forearm. The two "bombs" were flat black briefcases. The operators began each condition with the forearm of the RMI pointed upward, so that the target was not visible on the monitor screen. The operators had to lower the arm until the target was visible, drive forward until the hanging pointer was between the two briefcases, and lower the forearm until a buzzer on the pointer sounded, indicating the end of the trial. (See Figure 1)

According to Fitts' Law, movement time is linearly related to the Index of Difficulty (ID), measured in bits, where ID = log2 (2 * movement distance / target width). Using this formulation, the different target sizes (bomb separations) can be converted into Index of Difficulty bits. Given a movement distance of 3.0 m, and target widths of 0.64, 0.32, 0.16, and 0.08 m, the corresponding IDs are 3.2, 4.2, 5.2, and 6.2 bits.
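As a check on the conversion above, the following Python sketch evaluates the stated formula for the four target widths. The function and constant names are illustrative; only the formula, the 3.0 m movement distance, and the target widths come from the text.

```python
import math

MOVEMENT_DISTANCE_M = 3.0
TARGET_WIDTHS_M = (0.64, 0.32, 0.16, 0.08)


def index_of_difficulty(distance_m, width_m):
    """Fitts' Index of Difficulty in bits: log2(2 * distance / width)."""
    return math.log2(2 * distance_m / width_m)


ids = [round(index_of_difficulty(MOVEMENT_DISTANCE_M, w), 1)
       for w in TARGET_WIDTHS_M]
print(ids)  # [3.2, 4.2, 5.2, 6.2]
```

Because each target width is half the previous one, each step adds exactly one bit to the Index of Difficulty.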
Training.
At the start of the experiment, each subject received a standard set of instructions, describing the experiment and their task. It was emphasized that speed was important, but that every error meant they would have to perform another trial, so errors should be avoided.
This was followed by a familiarisation period with the controls of the RMI. The RMI was then placed in the standard starting position, and the briefcases were set to the training separation of 24 cm. Using direct view the subjects practiced the experimental task until they were able to pass a skill level criterion by performing four consecutive error-free trials in under six seconds. They then repeated the above familiarisation period and training procedure using the remote view.
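The skill level criterion can be expressed as a simple check over a sequence of practice trials. The Python sketch below is one possible reading, assuming that each of the four consecutive trials must individually be error-free and completed in under six seconds; the function name, the data layout, and the practice data are hypothetical.

```python
def passes_criterion(trials, required_streak=4, time_limit_s=6.0):
    """Return True once `required_streak` consecutive trials are both
    error-free and faster than `time_limit_s`.

    `trials` is an iterable of (duration_s, had_error) pairs in the
    order performed. Treating the time limit as a per-trial limit is
    an assumption about how the criterion was applied.
    """
    streak = 0
    for duration_s, had_error in trials:
        if not had_error and duration_s < time_limit_s:
            streak += 1
            if streak == required_streak:
                return True
        else:
            streak = 0  # an error or a slow trial resets the count
    return False


# Hypothetical practice session, not data from the experiment.
practice = [(7.1, False), (5.8, False), (5.5, True),
            (5.9, False), (5.2, False), (5.6, False), (5.0, False)]
print(passes_criterion(practice))  # True
```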
The subjects were eight university students, three female, five male. All were volunteers, and were paid $5 per hour for their participation.

Examining first the results of Day 1 of the experiment, we find considerable support for Hypothesis 1, that subjects using SV initially have a substantial performance advantage over those using MV.
Looking at the easiest condition (ID = 3.2) in Figures 3 and 4, we see that the trial times for SV are considerably shorter than those for MV. Furthermore, there is a considerable downward trend in the MV times. The error rates for MV and SV are relatively low and approximately equal. The advantage of SV is decreasing throughout the set of 16 trials.
At ID = 4.2, the next harder condition, we see that the trial times for both MV and SV are slower than for the previous condition, as expected. MV shows a decreasing trial time throughout the 16 trials, with a consistent error rate. SV shows a slightly decreasing trial time, but has an increasing error rate, suggesting that the subjects are exploring the speed-accuracy trade-off more than exhibiting signs of learning. Again, the advantage of SV appears to decrease throughout the set of 16 trials.
At ID = 5.2, there is an indication of learning with MV (the second group of four trials has a decreased error rate while trial time remains the same; the third and fourth groups show fewer errors still and slower trial times, which might suggest a simple speed-accuracy trade-off, or might indicate continued improvement in performance), while those using SV show no particular learning trend (decreasing error rates are matched by slower trial times, suggesting a speed-accuracy trade-off). The advantage of SV does not appear to decrease throughout this set of 16 trials, unlike the previous two conditions.
At ID = 6.2, both MV and SV show a strong learning trend in the error rate. The MV time drops for the second group of four trials, and then rises for the last two groups, while the error rate continues to drop, suggesting some exploration of the speed-accuracy trade-off. SV shows a small decrease in trial time with a large drop in error rate at first, followed by constant times and an increasing error rate. This could be an indication of fatigue, or simply a statistical artifact. Again, the SV advantage does not appear to decrease throughout this set of 16 trials.


Figure 4: Part A Day 1: Number of Errors versus Trial Number and Index of Difficulty with Std Error Bars
Furthermore, the advantage of SV decreases as the subjects become more experienced, especially for the easier conditions. This is consistent with Hypothesis 2.
Now consider the results of Day 2 of the experiment. The subjects are very much more experienced with the task on this day, albeit with the other type of video system.
These results are considerably different from those of Day 1. A cursory examination of the trial times shows no particular advantage of SV except for the most difficult condition. A glance at the error rates, however, shows that there is indeed a difference in performance between MV and SV.
At ID = 3.2, there is little difference in the times, but those using SV had a lower error rate for the first few trials. Those using MV showed some learning in both time and error rate, so that there was very little difference in performance by the end of the set of 16 trials.
At ID = 4.2, those using MV showed little change. Those using SV appear to get worse briefly, then slightly better. The small SV advantage at the beginning vanishes by the end of the 16 trials.
At ID = 5.2, those using MV anomalously perform considerably better during the first set of four trials than during the rest, getting considerably worse and then a little better. This suggests that the initial set of trials was unusually good, a statistical anomaly. Those using SV get consistently better, trading off time for errors a little. Although the performance of MV is better than SV at first, this situation is quickly reversed to the expected one, with SV performance being consistently better than MV. As on Day 1, the SV advantage does not appear to decrease with experience.
At ID = 6.2, those using MV show consistent learning, predominantly in error rates. Those using SV performed better at first than in the rest of the set of trials, similar to the MV performance at ID = 5.2. Here the SV advantage does decrease, but does not vanish, with experience.


Figure 6: Part A Day 2, Number of Errors versus Trial Number and Index of Difficulty, with Std Error Bars
In general, then, the results of Part A of the experiment strongly support Hypotheses 1 and 2.
Figure 7 shows the trial times for Part B of the experiment as a function of the Index of Difficulty, while Figure 8 shows the same for the error rates.


Figure 8: Part B Percent Errors
Considering Day 2 of the experiment, we find that performance for both MV and SV is much improved, thanks to the large amount of experience received on Day 1. There is no significant difference between the MV and SV conditions with regard to trial times, although those using MV appear to be somewhat faster, at the expense of higher error rates. The only significant difference between the MV and SV performance is at the highest level of difficulty. Thus we find that the SV advantage decreases quickly for SV-independent tasks, and persists for a longer period for SV-dependent tasks.
Analysis of variance on these results (not shown) confirms the observations made from the graphs above.
The experimental results support these hypotheses, and demonstrate that the benefits of SV, even after a great deal of practice, will still be apparent for difficult, SV-dependent tasks, long after the benefits have faded for easier, SV-independent tasks.
The implications for telerobotics are obvious. Given the nature of most telerobotics applications, operators have only a very few chances to accomplish the task correctly. The performance benefits of SV, even though they may fade with practice for highly repeatable tasks, should be very strongly evident in these single-attempt situations. Furthermore, given that operators can learn to use a SV display much more quickly than a MV display, operators should require less initial training and less constant practice in order to maintain their skills at a suitable level.
Clapp, R E, "Stereoscopic Displays and the human dual visual system", SPIE Vol 624 Advances in Display Technology VI, 41-52, 1986
Clapp, RE, "Stereoscopic Perception", SPIE Vol 761 True 3D Imaging Techniques and Display Technologies, 79-87, 1987
Drascic, D "An Investigation of Monoscopic and Stereoscopic Video for Teleoperation", MASc Dissertation, University of Toronto, June 1991
Dumbreck, A A, C W Smith, S P Murphy "The Development & Evaluation of a Stereoscopic Television System for Use in the Nuclear Industry", Int'l Workshop on Nuclear Robotic Technologies and Applications, University of Lancaster, June/July 1987
Macilwain, C "Remote control robots seen through 3D spectacles", THE ENGINEER, 35, 8 June 1989
McGovern, D E "Current developments needs in the control of teleoperated vehicles", SANDIA National Laboratories Report SAND87-0646 UC-15, Albuquerque, New Mexico, August 1987
Meieran, H B "Robotics and teleoperator-controlled devices", Health Physics, v 55, n 2, 215-222, Aug 1988
Merritt, J O, "Visual tasks requiring 3-D stereoscopic displays", SPIE Vol 462 Optics in Entertainment II, 56-59, 1984
Merritt, J O, "Often-overlooked advantages of 3-D displays", SPIE V. 902 Three-Dimensional Imaging and Remote Sensing Imaging, 46-47, 1988
Milgram, P, D Drascic, J Grodski "Stereoscopic Video + Superimposed Computer Stereographics: Applications in Teleoperation", Proc. Second Canadian Workshop on Military Robotic Applications, Kingston, Ontario, Aug 1989.
Milgram, P, R van der Horst "Alternating-field stereoscopic displays using light-scattering liquid crystal spectacles", Displays: Technology & Applications, v 7, n 2, 67-72, April 1986
Norman, D A, The Psychology of Everyday Things, Basic Books Inc, New York, 1988
Pepper, R L, "Human Factors in Remote Vehicle Control", 30th Annual Meeting of the Human Factors Society, Dayton, Ohio, Sep-Oct 1986
Pepper, R L, J D Hightower "Research Issues in Teleoperator Systems", Proceedings of the HFS 28th Annual Meetings, pp 803-807, 1984
Pepper, R L, D C Smith, R E Cole "Stereo TV improves operator performance under degraded visibility conditions", Optical Engineering, v 20, n 4, 579-585, July/Aug 1981
Pepper, R L, "Research issues involved in applying stereoscopic television to remotely operated vehicles", SPIE Proceedings, v 402, 170-173, 1983
Robinson, M "Remote control vehicle guidance using stereoscopic displays", Proceedings of the HFS 28th Annual Meeting, 809, 1984
Smith, D C, R E Cole, J O Merritt, R L Pepper "Remote Operator Performance Comparing Mono and Stereo TV Displays: the Effects of Visibility, Learning, and Task Factors", NOSC Technical Report No. 380, Feb 1979
Spain, E H, A Psychophysical Investigation of the Perception of Depth with Stereoscopic Television Displays, PhD Dissertation, U of Hawaii, 1984
Wickens, C D "Three-dimensional stereoscopic display implementation: Guidelines derived from human visual capabilities", SPIE Vol 1256 Stereoscopic Displays and Applications, 2-10, 1990