An Investigation of Monoscopic and Stereoscopic Video for Teleoperation


David Drascic

Department of Industrial Engineering
University of Toronto
4 Taddle Creek Road
Toronto, Ontario
Canada M5S 1A4

drascic@ie.utoronto.ca


A Thesis submitted in conformity with the requirements for the degree of Master of Applied Science in the University of Toronto.
(c) Copyright by David Drascic 1991.

Some of this material is of a sensitive nature and has been deleted at the request of the sponsors of my work. I have marked the deleted sections.



List of Figures

Figure 1.1: Remote Mobile Investigation Unit

Figure 1.2: RMI Monoscopic Viewing Aids

Figure 2.1: Dynamic Stereoscopic Camera Mount

Figure 3.1: SV Dependence Spectrum

Figure 3.2: Position on the SV-Dependence Spectrum of Experiments 1 and 2.

Figure 4.1: Driving Task for Experiment One

Figure 4.2: Pointing Device for Experiment One

Figure 4.3: Results of Experiment One

Figure 5.1: Fitts' Law Task for Experiment Two

Figure 5.2: Experiment Two: Training

Figure 5.3: Average Trial Times of Experiment Two,

Figure 5.4: Average Trial Times of Experiment Two

Figure 5.5: Experiment Two Part A Day 1 Trends

Figure 5.6: Experiment Two Part A Day 1: Number of Trials with Errors Made while completing Four Error-Free Trials, versus Trial Number and Index of Difficulty

Figure 5.7: Experiment Two Part A Day 2 Trends

Figure 5.8: Experiment Two Part A Day 2: Number of Trials with Errors Made while completing Four Error-Free Trials, versus Trial Number and Index of Difficulty

Figure 5.9: Results of Experiment Two Part B

Figure 5.10: Experiment Two Part B Number of Trials with Error made while completing Eight Error-Free Trials

Figure 5.12: Expt 2 Part B Performance Graphs, where # Errors refers to the number of errors made while completing the 8 successful runs of Experiment 2 Part B.

Figure 5.13: Expt 2 Part A Runs 9-16 Performance Graphs, where # Errors refers to the number of errors made while completing the last 8 successful runs of Experiment 2 Part A.


List of Tables

Table 4.1: ANOVA of Trial Time as a function of Set, Video, and Trial Number

Table 4.2: ANOVA of Trial Times for the First Video Condition

Table 4.3: ANOVA of Trial Times for the Second Video Condition

Table 5.1: Relationship between Target Width and Index of Difficulty

Table 5.2: Analysis of variance of the number of trials needed to complete the training procedure.

Table 5.3: ANOVA of Experiment 2 Part A Trial Times, as a function of Order (MV First or SV First), Video System (MV and SV), Difficulty (8, 16, 32, 64 cm), and Trial Number (16 trials per condition).

Table 5.4: Analysis of Variance of Trial Times for Experiment Two Part A Day 1, grouping the 16 trials into four sets of four ("Learn4") to elucidate any trends.

Table 5.5: Analysis of Variance of Trial Times for Experiment Two Part A Day 2, grouping the 16 trials into four sets of four ("Learn4") to elucidate any trends.

Table 5.6: Analysis of Variance for Errors for Experiment 2 Part A Day 1, grouping the 16 trials into four sets of four ("Learn4") to elucidate any trends.

Table 5.7: Analysis of Variance for Errors for Experiment 2 Part A Day 2, grouping the 16 trials into four sets of four ("Learn4") to elucidate any trends.

Table 5.8: ANOVA of Experiment 2 Part B Trial Times

Table 5.9: ANOVA of Experiment 2 Part B Percent Errors


Acknowledgements

The author is indebted to many people who provided guidance and assistance in the preparation of the material presented herein. First and foremost of these is Prof. Paul Milgram, my thesis supervisor and mentor, who provided the original ideas, suggestions, contracts, goals, and motivation for the work, who freely bestowed his time, guidance, and wisdom considerably beyond the call of duty, who was a model of professorial responsibility, professionalism, and thrift, and who is doubtlessly culpable for engendering perfectionist tendencies in his pupil.

None of this work would have been possible without the generous contributions of Dr. Julius Grodski, of the Defence and Civil Institute of Environmental Medicine, and the considerable resources he provided. His comments and suggestions regarding this work are gratefully acknowledged.

The valiant labours of Prof. Mike Carter in protecting me from raging bureaucracy ensured my continued existence as a graduate student and are much appreciated. His contributions to and criticisms of this work were very instructive.

Several other people assisted in the formulation of the ideas discussed herein, including Brian Fitzsimmons, Phil Roberts, Dianna Drascic, and my fellow students in the Department of Industrial Engineering. Suzanne Rochford's contribution as laboratory assistant and guinea pig is greatly appreciated.

The comments and advice from the EOD staff and team at DCIEM were critical to this work, and are greatly appreciated.

The invaluable participation and input of those who devoted many hours of their time to serve as subjects in these experiments are also greatly appreciated. These people are Robert Drascic, Angela Gaudio, Laura Logan (twice), Drew van Camp (twice), Farshad Jajarmi, Parminder Kalirai, Yan Xiao (twice), Rhea Plosker, Yi Kou, Prabir Sarkar, Antonella Arcaro, Sue Clouse-Jensen (twice), Paul McInerney, Randy Sollenberger, Hao Zhao, Dave Colter, Zoltan Leskowsky, John Ovcauk, Deniz Ulguray, and Margaret Campell.

The work described herein was carried out under contract W7711-7-7009/01-SE with Supply and Services Canada for the Defence and Civil Institute of Environmental Medicine. Considerable personal financial assistance and support were provided by my parents, Janet and Savino Drascic, without which this work would have been impossible. At least for me.


Abstract

There are many tasks hazardous to human life which can be accomplished remotely through telerobotic manipulation. Robotic technology has advanced to the stage where teleoperated manipulators are versatile and effective enough to be used successfully in a wide variety of circumstances. As telerobotic systems become more sophisticated, it is important to ensure that the human-machine interface is adequate for the task. One very important type of feedback information that is missing from standard telerobotic control stations is the immediate and compelling binocular coding of depth, which is thwarted through the use of a standard monoscopic ("2D") video system, making the operator dependent on other less salient visual cues. This is unfortunate, since most telemanipulation tasks require operators to have a good sense of the relative locations of objects in the remote world.

To that end, a practical Stereoscopic Video (SV) system was developed that is compatible with standard video display and recording equipment. Two experiments were conducted to examine the potential benefits of SV for teleoperation, with particular emphasis on the effect of experience.

The first experiment examined the issue of whether it was easier to learn how to interpret an SV display than a standard monoscopic video (MV) display. Using a task that had very little demand for binocular depth cues (i.e. was SV-independent), it was found that there was a benefit in performance due to SV that diminished as the operators learned how to use the monocular cues of the MV display. Furthermore, the first experiment provided evidence to suggest that SV can be used effectively with little or no training, while MV requires a period of adjustment and learning.

The first experiment also revealed an interesting transient effect that changing from one video condition to another can have on performance. Those who change from an SV to an MV display show a temporary but dramatic drop in performance, while those who change from an MV to an SV display show a large improvement in performance. The results of the experiment and the literature suggest that the differing appearances of "reality" of the two displays may affect the confidence of the operators in their abilities to perform the task, and therefore affect their performance.

The second experiment examined the second issue, that of how the transience of the benefits of SV is a function of the difficulty of the task and the dependence on binocular depth cues. It showed that the benefits of SV, even after a great deal of practice, will still be apparent for difficult tasks, long after the benefits have faded for easier tasks.


1. Teleoperation Display Systems

Telerobots, telemanipulators, and remotely operated vehicles belong to a class of machines used to accomplish a task remotely, without need for human presence. These machines are typically used in situations where human presence is undesirable due to the dangers to human life, and are employed by the nuclear industry, police and military forces, and in space and underwater operations. In each of these cases, the goal is to in some sense simulate the presence of a human operator, both by recreating enough manipulative ability at the remote site using remotely-controlled robots, and by providing sufficient feedback information of the remote site to the operator, who can then accomplish the necessary tasks.

Most of these telerobots must be controlled manually by a human operator because computer intelligence systems are not yet smart enough to control the robots autonomously, except under very restricted circumstances. The operators must make all of the high level decisions, about where to go and what to do, as well as all of the low level decisions involved in carrying out these high level decisions, using the limited amount of information available from the sensors on the telerobotic device.

The goal of this work is to make the job of the operators of these telerobots easier. In this paper we report some efforts to improve the display part of the human-machine interface.

As Don Norman says in The Psychology of Everyday Things, "Nothing succeeds like a good display." (Norman, 1988) Giving necessary information in a natural form facilitates all human-machine interactions. The majority of teleoperation devices employed around the world are equipped with one or more standard video displays for feedback. (Meieran, 1988) This means that one very important type of feedback information missing from the standard display is the immediate and compelling binocular coding of depth, which is thwarted through the use of a monoscopic video system, making the operator more dependent on the variety of other depth cues. This is unfortunate, since most telemanipulation requires the operators to have a good sense of the relative locations of objects in the remote world. This depth information must be obtained through such cues as size constancy, interposition, shadows, and so on. While this is possible, and is a skill that can be mastered (Clapp, 1986), it is not the most direct way of conveying this information to the operators. By presenting the depth information in a natural manner, through stereoscopic perception, the entire task is made simpler.

The work discussed herein focuses on one particular telerobotic task, namely Explosive Ordnance Disposal (EOD), but the research and its implications apply to a much broader range of problems.

1.1. Teleoperation for Explosive Ordnance Disposal

Much of this work was carried out under a research contract for the Defence and Civil Institute of Environmental Medicine (DCIEM). The purpose of the project was to investigate the potential use of stereoscopic video (SV) for Explosive Ordnance Disposal (EOD), more commonly known as bomb disposal.

In general, EOD means rendering suspected or known bombs inoperative with as much regard for human safety as possible. Before the advent of suitably advanced telerobots, most of this work was done manually, at great risk to the safety of the bomb disposal expert. It is now possible in many situations to use a telerobot to disable the bomb from a safe distance.

1.1.1. The Remote Mobile Investigation Unit (RMI)

The telerobot used for most Department of National Defence (DND) and police EOD operations in Canada is the Remote Mobile Investigation Unit (RMI), manufactured by Pedsco (Canada) Ltd. The RMI is the unit of choice for several reasons: (i) it is made in Canada; (ii) it is inexpensive; and (iii) it is simple and reliable.


Figure 1.1: Remote Mobile Investigation Unit
........5 paragraphs deleted: hardware overview of the RMI

1.1.2. The RMI Control Station

The control panel of the RMI is a simple box, with a single 2 degree of freedom joystick and several toggle switches. The driving of the RMI is done with the joystick, which controls the direction and speed of the robot, while the switches control the movement of the arm, the panning of the camera, and the activation of the various attachments.

It is possible to learn how to control the RMI without too much difficulty, although there are some violations of stereotypes with respect to the direction of operation of the toggle switches which control the movement of the arm. The actual driving of the device is relatively straightforward: in the course of this work, it seemed that most novices had only to master the technique of inching forward; the steering of the RMI came naturally (when moving forward, at least).

The feedback information provided by the RMI Control Station typically consists of a single black and white video monitor and a bi-directional sound system.

........1 paragraph and diagram deleted: operating procedure

1.1.3. The RMI Tools and Techniques

In order to properly begin the investigation, a brief task analysis was conducted. Seven EOD specialists working for the Department of National Defence were interviewed, one of whom was an instructor. A summary of the information drawn from these interviews follows:

A variety of tools can be attached to the RMI, and they are used in the following manner:

* x-ray unit: this consists of an x-ray projector mounted on the RMI base, and an x-ray film plate mounted on the end of the arm. The film plate, roughly 30 cm high and 50 cm wide, is mounted on a hinge so that it can swing freely, and thus remain vertical regardless of arm orientation. There is approximately a one metre gap between the x-ray projector and the film plate.

The plate is usually lowered behind the suspected parcel and slowly brought forward until it is as close to the parcel as possible. The closer the plate is to the parcel, the better the x-ray image. On the other hand, many bombs have anti-tampering mechanisms, and so it is important that the plate does not touch the parcel. Positioning the plate is a delicate task that must be done with a great deal of caution and that requires much training. It is highly dependent on such depth cues as are provided by lighting and shadows.

Once the x-ray has been taken, the RMI is returned to the control station where the film is developed to determine whether the parcel is indeed a bomb, and if so, where in the parcel the mechanism lies.

* claw: this is a simple hinged claw, driven by an electric motor.

........2 paragraphs deleted: claw limitations

* shotgun: this is mounted onto the "fore-arm" of the RMI.

........2 paragraphs deleted: shotgun use

* disruptor (centaur):

........1 paragraph deleted: disruptor description and use

The disruptor has an effective range of approximately 30 cm, with a spread of about 30 degrees. It must be positioned from 6 to 10 cm from the target, and should be aligned along the longest axis of the parcel to achieve greatest effectiveness.

In order to ensure that the disruptor is placed at the proper distance without disturbing the target in any way, the operators attach a loop of tape around the end of the disruptor, so that it sticks out about 8 cm. The operators then inch the RMI closer and closer to the target, until a movement in the tape observed on the remote monitor indicates that the RMI is at the proper distance. The tape is flexible enough that there is little risk of it triggering the explosive.

This task requires a very good perception of the relative distances of the target and the disruptor. In unknown environments, size cues are insufficient, and the operators must rely on lighting and shadows, and particularly on the loop of tape.

* horseshoe: The horseshoe is used to blast the cap off a pipe bomb and scatter its contents.

........2 paragraphs deleted: horseshoe description and use

* pipe-carrier:

........1 paragraph deleted: pipe-carrier description and use

The pipe carrier can only be used for pipe bombs that are known not to be motion sensitive. Even for such bombs, great care must be taken, since there is usually explosive powder in the threads of the screw-on caps, and a jar to the pipe may be enough to cause it to explode.

1.1.4. Summary of the Problems

Lack of Relative Depth Information

The standard monoscopic video (MV) display requires that the operator interpret the monocular cues available on the MV display in order to make judgements about the relative positions of objects in the remote world. For degraded or complex displays, the monocular cues provided by the MV systems may prove insufficient. Robinson (1984) reports that using standard two-dimensional video systems with their restricted amount of visual information resulted in Bomb Squad personnel being reluctant to use the telemanipulation vehicle.

Even when it is possible to make judgements about the relative position of items in the remote world, it usually takes a long time, and may require that the telerobot be driven around for a while in order to provide motion parallax information and a variety of views.

Lack of Absolute Depth Information

Under the best of circumstances using direct natural viewing, humans are not very accurate at making absolute visual judgements of distances in the real world. Using the standard MV display of a telerobotic system makes this task considerably more difficult. The operator is completely dependent upon familiarity with the objects in the remote world. Even in a highly familiar environment, it takes a great deal of practice to achieve and maintain any proficiency in making absolute depth judgements.

Additions to the MV display, such as the marks made by the operators on the monitor as described above (Section 1.1.2) can improve this situation, but such techniques are very limited in their scope and applicability.

Lack of Absolute Size Information

Operators are completely dependent on a familiarity with the objects in the remote environment in order to be able to estimate their absolute size. Some examples of where this ability is necessary for EOD involve estimating the size of obstacles: is it higher than the RMI can safely climb? Is the opening wide enough for the RMI to fit through? Is the depression deep enough for a wheel to get stuck in?

........1 paragraph deleted

Lack of a Colour Display

The RMIs in current service with DND are equipped with a single black and white MV system. In the course of the work, seven expert operators were interviewed, and all complained of the black and white display, calling it a hazard to their job. They reported that monochrome video makes reconnaissance difficult, especially when dealing with small objects outdoors, or in bright sunshine and shadow conditions.

The literature shows that using colour video displays can significantly improve obstacle recognition and course planning for terrestrial teleoperation (McGovern, 1987 b), and they are consistently rated higher on subjective scales of satisfaction than comparable monochrome displays (Miller, 1988). Changing to a colour display would be an obvious improvement with comparatively little extra expense. While this question was not pursued further, being beyond the scope of this study, it is mentioned here to explain why colour displays are used for the experimentation described herein.

1.2. Displaying Depth

1.2.1. Background

Visual Perception

Our understanding of human visual perception is still fairly poor. A variety of models and theories exist which can be used to describe some aspects of visual perception, to varying degrees of usefulness. One particularly poorly understood facet of visual perception is that of depth perception. It is known that the brain uses a variety of techniques to establish the spatial location of objects in the universe, including using features of the image and of the visual system known as depth cues.

Depth Cues

Depth cues can be divided into two different categories, the first being monocular depth cues. These are cues that are identical and available to both eyes of the viewer, and work just as effectively when seen with just a single eye. It is through these cues that a two dimensional image, such as on a photograph or television screen, can appear to the observer as possessing three dimensional qualities. These cues include interposition, motion parallax, linear perspective, vertical position, size constancy of familiar objects, light and shading, accommodation, brightness, and so on. (For details, see any good text on the human visual system. For a summary, see Boff & Lincoln, 1988.)

The second category of depth cues is binocular depth cues. These cues result from seeing with two eyes from slightly different viewpoints, and include (i) the convergence angle of the eyes, and (ii) retinal disparity, that is, the differences between the retinal images of the left eye and the right eye. (For a detailed explanation of binocular depth cues, see Arditi, 1986.)

It is the sum of all available monocular and binocular cues which results in the perception of depth.

A further advantage of binocular viewing is binocular parallax, which, while not a depth cue per se, provides additional visual information: since each eye views the world from a slightly different position, it is possible to see "around corners" (that is, what may be occluded from view with one eye may be visible to the other).

Tele-Vision

Many techniques are used to extend the natural range of human vision. These techniques vary greatly in fidelity. Presumably the earliest technique was that of drawing pictures; it allowed people to "see" places they had never been. In time this technology was refined through photography, cinema, and video.

Telescopes were another technology, effectively bringing close what was far away. But both pictures and telescopes were a limited representation of the remote view, in that they were monoscopic, or "2-D", and made it hard to perceive the spatial relationships of objects in the remote view. Thus both were refined to permit stereoscopic, or "3-D", viewing: telescopes became binoculars, and pictures were presented as stereoscopic pairs for use in such display devices as the "stereoscope". By presenting both eyes of the observer with a slightly different viewpoint, binocular depth coding could be used, and a higher fidelity presentation of the remote world was possible.

Seeing at a distance has progressed from static, low-fidelity monoscopic images to dynamic, high fidelity stereoscopic images. Future innovations already being developed include holography (Frey, 1986), head-tracking displays (Merritt, 1987), graphic aids (Kim et al, 1987, Drascic et al, 1991 b), computer-assisted perception (Grodski et al, 1991, Milgram et al, 1991), and interaction.

Importance of Binocular Depth Cues

Some researchers (e.g. Lippert et al, 1982) have argued that using a stereoscopic display rather than a monoscopic display provides little benefit at a great cost, based on the assumption that the only stereoscopic cue that provides an absolute sense of depth is binocular convergence. (The only monoscopic cue to do the same is accommodation.) Lippert et al concluded that:

"2D displays would provide all essential information for the formation of distance perception and estimation, and that 3D displays, adding no important distance cues, yet providing all 2D cues, would yield similar estimations." [page 325]

What they failed to take into account was that binocular disparity provides a very powerful perception of relative distance, and so they completely ignored the most fundamental advantage of stereoscopic displays. While it is true that eye convergence is not effective as a depth cue beyond a distance of approximately 2 metres, retinal disparity is effective to a much greater distance. Human resolution of retinal disparity is generally considered to be approximately 10 seconds of arc, although some individuals have been able to detect disparities as small as 4 seconds of arc under ideal conditions (Arditi, 1986). Assuming an estimate of 10 arc seconds for stereoacuity, this means that a difference in depth of one metre can be detected at 37 metres.

It therefore stands to reason that the benefits of stereoscopic display will apply to a variety of teleoperation tasks, involving both near work, such as the precise placement of tools, and far work, such as in reconnaissance or driving.
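The 37 metre figure quoted above can be checked with the usual small-angle approximation, in which a depth difference dz at viewing distance z produces a disparity of roughly I * dz / z^2 radians, where I is the separation of the eyes. The following is only a sketch of that arithmetic, not the calculation as originally performed: the 65 mm eye separation is an assumed typical value that is not stated in the text, and the function names are illustrative.

```python
import math

ARCSEC_TO_RAD = math.pi / (180 * 3600)  # one second of arc, in radians


def depth_resolution(distance_m, stereoacuity_arcsec=10.0, eye_separation_m=0.065):
    """Smallest detectable depth difference (m) at a given viewing distance.

    Small-angle approximation: disparity ~= I * dz / z**2.
    eye_separation_m = 0.065 is an ASSUMED typical value, not taken from the text.
    """
    eta = stereoacuity_arcsec * ARCSEC_TO_RAD
    return distance_m ** 2 * eta / eye_separation_m


def distance_for_depth_step(depth_step_m, stereoacuity_arcsec=10.0, eye_separation_m=0.065):
    """Viewing distance (m) at which depth_step_m is just detectable."""
    eta = stereoacuity_arcsec * ARCSEC_TO_RAD
    return math.sqrt(depth_step_m * eye_separation_m / eta)


print(round(distance_for_depth_step(1.0), 1))  # 36.6
print(round(depth_resolution(37.0), 2))        # 1.02
```

Under these assumptions a one metre depth difference becomes detectable at roughly 37 metres, in agreement with the figure above. Because the just-detectable depth step grows with the square of the distance, the same stereoacuity resolves far finer depth differences in near work.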

1.2.2. Techniques for Displaying Depth

There are several different techniques that can be used in teleoperation for displaying the location of objects in the remote world with sufficient clarity.

Multiple Camera Systems

Some researchers have found that good performance can be obtained for certain specific telemanipulation tasks using multi-camera MV systems (Kim et al, 1987). Unfortunately, these systems usually involve widely separated views, typically perpendicular views, which is rarely possible or practical in real world teleoperation tasks, especially for EOD and other work in unfamiliar environments. Furthermore, multi-camera views can be expected to be less advantageous as scene complexity and image degradation increase, thereby thrusting more of the function of integrating information about the remote environment onto the operator.

Kinetic Depth Effect Displays

Kinetic Depth Effect is often used as a depth cue for graphics systems for data visualisation and display purposes. The closest approximation to a continuous motion system available for real world video operation is the VISIDEP(TM) system (Jones et al, 1984). This system works by alternating images between two closely placed cameras at speeds between 5 and 15 Hz, giving the illusion that the remote world is rocking. While systems that utilise the Kinetic Depth Effect can indeed convey a strong sense of depth, and can give some information about the relative placement of objects in the remote world, they do not convey this information as unambiguously or as automatically as does a stereoscopic display, and they are difficult to use for any interactive purpose, where the operator must act within the moving display space.

Overlaid Monoscopic Graphics

When the operators make the calibrating marks on the monitor as described above (Figure 1.2), they are overlaying a static graphic image onto the monoscopic video. The horizontal line serves under certain conditions to indicate a particular distance on the ground in front of the RMI, and thus some sense of absolute distance can be accurately conveyed. Furthermore, the width markings give some indication of absolute horizontal size for objects at the distance indicated by the horizontal line. While this system is not flexible at all, it has proven to be serviceable.

This system could be improved by making the superimposed monoscopic graphics dynamic. If they were to be generated on-line by a computer, they could be drawn to correspond to any distance, not just one, and with the RMI in any configuration, not just the "home" position. In that way, they could be used for absolute depth estimation and absolute size estimation whenever the floor is flat and level and the target is clearly located above a visible spot on the floor. Unfortunately, if the target is located at some place where the floor is not visible, such as on a step, the operator will be required to guess.
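As an illustration of how such dynamic markings might be computed, the sketch below assumes an ideal pinhole camera at a known height above a flat, level floor, pitched downward by a known angle. None of these parameters or function names come from the RMI itself; they are hypothetical, and a real system would need the camera geometry to be measured or reported by the vehicle.

```python
import math


def ground_row(distance_m, cam_height_m, pitch_rad, focal_px, centre_row):
    """Image row (pixels, increasing downward) of a floor point distance_m ahead.

    Assumes an ideal pinhole camera over a flat, level floor (hypothetical model).
    """
    # Depth of the floor point along the optical axis, and its drop below that axis.
    depth = distance_m * math.cos(pitch_rad) + cam_height_m * math.sin(pitch_rad)
    drop = cam_height_m * math.cos(pitch_rad) - distance_m * math.sin(pitch_rad)
    return centre_row + focal_px * drop / depth


def width_in_pixels(width_m, distance_m, cam_height_m, pitch_rad, focal_px):
    """On-screen length of a horizontal width marking lying at distance_m."""
    depth = distance_m * math.cos(pitch_rad) + cam_height_m * math.sin(pitch_rad)
    return focal_px * width_m / depth


def distance_from_row(row, cam_height_m, pitch_rad, focal_px, centre_row):
    """Inverse mapping: floor distance indicated by a given image row."""
    angle_below_horizon = pitch_rad + math.atan((row - centre_row) / focal_px)
    if angle_below_horizon <= 0:
        raise ValueError("row is at or above the horizon; no floor point there")
    return cam_height_m / math.tan(angle_below_horizon)


# Hypothetical example: camera 1 m above the floor, looking straight ahead.
row = ground_row(5.0, cam_height_m=1.0, pitch_rad=0.0, focal_px=500.0, centre_row=240.0)
print(row)  # 340.0
```

The inverse mapping makes the limitation noted above explicit: a distance can be read from an image row only if the target actually rests on the visible floor plane, which is why a target on a step defeats the method.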

Kim, Takeda, and Stark (1988) report a system which superimposes a horizontal grid and vertical reference lines for objects in the remote world on top of a monoscopic view. They found that this enhanced monoscopic view enabled performance similar to that of two perpendicular monoscopic views or a single stereoscopic view. Unfortunately, this system requires that the operator individually identify all items of interest in the remote world, and enter the three dimensional position of those items into a computer database (using a relatively simple graphical procedure). It also requires that the telerobot be able to accurately report its current position relative to a known reference point. Refitting the RMI to permit this would be prohibitively expensive.

Stereoscopic Displays

The literature clearly shows that using stereoscopic video (SV) can greatly improve teleoperation performance and user satisfaction, and can in fact make possible tasks that are otherwise impossible, such as retrieving small objects from within a tangle of wire (Merritt, 1984). Dumbreck, Smith and Murphy (1987) report that "Remote handling tasks where 3-D viewing is particularly helpful are those which involve ballistic movement, recognition of unfamiliar scenes, analysis of three dimensionally complex scenes and the accurate placement of manipulators or tools within such scenes." Furthermore, it has been quite clearly shown that as image quality is degraded or scene complexity increased, the advantages of SV are intensified (Pepper, 1986). EOD teleoperation involves all of these factors.

1.2.3. Benefits of Stereoscopic Displays

In order to improve teleoperation performance, it is suggested that stereoscopic displays be used rather than monoscopic displays. The benefits of using stereoscopic displays include (Grodski et al, 1991, Merritt, 1988):

* Enhanced image interpretation, especially in unfamiliar or complex environments.

* Visual noise filtering. Random noise of any sort, whether through poor transmission of the signal, or through sediment underwater, will obscure different parts of each eye's view. Using the two eyes' views for stereoscopic perception, the brain automatically suppresses the uncorrelated noise, enhancing the observer's ability to identify objects in the scene.

* Enhanced effective image quality. Stereoscopic displays can improve the effective quality of images that are degraded by low resolution, lack of focus, motion blur, and so on.

* Wider effective field of view.

* Enhanced slope and depression detection. In situations where edges, textures and sizes vary randomly, such as when driving a telerobot on a field of grass, a change in slope may not be detectable with monocular cues, which may cause the telerobot to overbalance and fall.

1.2.4. Further Hypothesised Benefits of SV Displays

In addition to these benefits, there are several benefits not explicitly discussed, but implied by the literature.

* More accurate sense of depth. This will perhaps lead to reduced error rates for casual movements (critical movements are generally done with such care that there is little room for improvement), and faster execution of the task.

* Faster perception of spatial relationships in the remote world. This is due to a more salient presentation of the depth information; no search for non-salient monocular cues is needed. This should result in a faster execution of the task, as well as decreasing the amount of time needed to learn how to use the monoscopic display.

* More complete information about spatial relationships in the remote world. This should result in a more quickly perceived and more accurate mental image of the remote environment. This could reduce complex task execution time, and would certainly reduce execution time of a repetitive task.

In addition to these benefits, a further unproven benefit appeared likely prior to this research, and was in fact supported by the results of the experiments discussed herein:

Reduced Training and Practice Time. All operators must undergo extensive training in order to be qualified to conduct EOD operations, and all are required to partake in regular practice sessions with the RMI. Without these practice sessions, the operators' skill quickly deteriorates. It was our expectation, based on experience with the RMI and the reports of trained EOD specialists, that this deterioration of EOD operational skill is due primarily to the binocular deficiencies of the monoscopic display, and only to a lesser extent due to forgetting of the motor control skills.

It has been shown that when using a modified direct view, such as with a prism arrangement to artificially exaggerate eye separation, or magnifying lenses, monocular cues need to be learned, or recalibrated, a process which takes time (McGovern 1987 a). Furthermore, it is known that binocular depth cues play a fundamental role in the calibration of the monocular depth cues, and that binocular disparity is perceived more quickly than any other visual cue (Clapp 1986, 1987).

Even after having learned how to interpret monocular cues for a considerable time, it remains a fairly weak depth cue, and is easily dominated by other cues such as perspective and occlusion. (Wickens et al, 1990)

On the other hand, as Baker notes: "Because the accommodation/convergence differs in stereoscopy and the physical world, the ability to see binocular depth on a CRT must be learned. On first occasion, many people adapt in a few seconds, while others may take several minutes to see the image comfortably." (Baker, 1987) This short time period, however, is insignificant compared with the time needed to master interpretation of monocular cues.

These facts imply that it will take a novice longer to become proficient in teleoperation using a monoscopic display than a stereoscopic display. Since the view from the video monitor is very different from a direct view of the real world with respect to the relationship between the monocular depth cues and the binocular depth cues, it is reasonable to expect that without constant practice, the temporary voluntary recalibration (or learning) of the depth cues of the monoscopic display will fade.

These considerations suggest a benefit of stereoscopic displays not greatly discussed in the literature: they are easier for novices to use, and will thus require less time for training and practice than monoscopic displays. There has been little rigorous study of this suggestion, and so two experiments were designed to look at learning behaviour in monoscopic and stereoscopic displays.


2. Implementation of SV Displays

A great deal of work was invested in developing special hardware, electronics, and software, in order to realise the SV system. While a few commercial SV systems are available, they are very expensive and thus were considered inappropriate for use on an EOD telerobot, which runs the risk of being destroyed during every operation.

Although considerable time was spent developing this technology, this was not the main point of this thesis. There are various other ways of achieving the same goals (although they are almost universally more expensive). Because of this, what follows includes only a short description of the implementation developed. What is important to this thesis is the Human Factors research conducted. That will be described in considerably more detail in the following chapters.

2.1. Implementation of SV

2.1.1. Background

There are many different ways to present a binocular image to an observer. In this overview they have been grouped into four different categories: (i) dual optical path techniques, (ii) filter separation techniques, (iii) illusory 3D image techniques, and (iv) real 3D image techniques. (For summaries of these, see Lipton, 1984, Lipton & Meyer, 1987; Spain 1984; Lane 1982; Hart & Dalton 1990)

Dual Optical Path Techniques

Dual optical path techniques produce stereoscopic images by presenting the left and right eye views using two separate image sources and optical paths. The observer typically looks through a set of lenses, mirrors and/or prisms which present the separate images in appropriate locations in front of the eyes. Examples of such systems include the stereoscope of the 19th century (Lane, 1982), binoculars, and binocular microscopes. More modern versions of the stereoscope are the ViewMaster toy for children and the dual miniature video displays of a Head Mounted Display (e.g. Fisher et al 1986).

Dual optical path systems can produce high fidelity stereoscopic images, but in virtually all implementations can only be used by one observer at a time, since typically only one set of optics is provided. Furthermore, the lenses, mirrors and prisms required are often expensive and inconvenient. Since the operator must be looking directly into the optics of the system, use of such systems can interfere with the performance of other tasks.

Filter Separation Techniques

Filter separation techniques produce stereoscopic images by presenting the left and right images via the same optical path, where filters or shutters are used to separate the combined images at the eyes of the observer. In these systems the left and right images are presented in the same location on a display surface of some kind, such as a projection screen, video display, or printed page, or else are combined from separate displays into a single optical path by using semi-silvered mirrors. Some of these systems present the left and right images simultaneously, while others alternate rapidly between the two images.

One version of this type of system uses what is known as a chromatic anaglyph, where the left image is presented in one colour, say green, and the right in another colour, say red. The observer wears one red and one green filter in front of the left and right eyes respectively. The left eye, with the red filter, will not be able to distinguish the red image from the filtered white background, and yet the green will stand out as being considerably darker than the surrounding region. Similarly, the right eye with its green filter will not be able to see the green left image, but the red right eye image will stand out considerably darker than the background.

Chromatic anaglyphs are relatively simple and inexpensive to implement, and can be used to produce stereoscopic images from a printed page (for example, many books on drafting use this technique). Unfortunately, chromatic anaglyphs are generally not acceptable for extensive use, resulting in too much eye strain and user dissatisfaction. (Lane, 1982)

A similar technique using polarised light has replaced the chromatic anaglyphs in most applications, where the left eye image is presented using light polarised in one direction, and the right eye image is presented using light polarised in a perpendicular direction. Viewing the combined images directly, the observer would see both the left and right images superimposed. If the images were viewed through a polarising filter, however, only the light polarised in the direction of the filter can pass through it, and so the observer will see either the left or the right eye view, depending on the orientation of the filter. For stereoscopic viewing, the observer wears spectacles with appropriately polarised lenses, so that the combined left and right eye images are separated for the viewer.

When used for viewing stereoscopic slides and motion pictures, the typical implementation uses separate projectors or projector optics for the left and right eye views, each set of optics fitted with a polarising filter. The two images are simultaneously projected onto the same screen (which must be capable of preserving the direction of polarisation). When used for video applications, the left and right images are time-multiplexed, where the image on the monitor switches rapidly between the left and right camera views, and a liquid crystal polarising filter, capable of rapidly changing the direction of polarisation, is fitted over the surface of the monitor. When the left image is presented, the filter is polarised in one direction, and when the right image is presented, the filter is polarised in a perpendicular direction. When the alternating rate is fast enough the illusion of continuous presentation can be achieved.

Polarised filter separation techniques have the advantage of being full colour, and are generally more acceptable to users than colour filter separation techniques (chromatic anaglyphs). Linear polarisers place restrictions on head tilt; circular polarisers do not. Both types have some problems with light transmission and crosstalk. And while relatively inexpensive to implement for slide and movie projection (since simple, small polarising filters can be used), polarised filter separation techniques are expensive to implement for a video system, since the polarising filter must be large enough to cover the entire display surface, and must be dynamic, able to change polarising direction many times per second.

A third variation of the filter separation technique uses alternating dual images with active shutters, placed in front of the eyes of the observer, that alternately occlude one or the other eye. The presentation of the time-multiplexed left and right images is synchronised with the shutters, so that each eye is presented with only its corresponding image. A variety of shutters have been used, including mechanical, PLZT, and liquid crystal (Lipton & Meyer 1984, Lipton 1987, Milgram & van der Horst 1986). These systems, especially when using liquid crystal shutters, can give high quality stereoscopic images without complicated equipment. Some shutters suffer from low transmission problems and crosstalk. Depending on the configuration of the system, the stereoscopic image may be viewable by more than one observer. Unfortunately, observers with an oblique view will see a distorted image.

Stereoscopic Virtual Image Techniques

A third approach for producing binocular images depends on generating optically virtual images, where light rays are produced so that they appear to be coming from an image in three-dimensional space. The hologram is an example of this kind of display. The use of lenticular lenses is another (Butterfield, 1979). Other methods typically involve the deformation, vibration, and/or rotation of lenses and/or mirrors (Fajans, 1979, Lane, 1982).

The intended advantage of such systems is that no filters or shutters are necessary, and that observers with oblique views will not see a distorted image. Unfortunately, many holograms and lenticular displays can be seen only when the observer's head is in a particular location, limiting the number of viewers. Systems using dynamic mirrors or lenses usually have a restricted field of view. Furthermore, even though holographic technology is continually improving, it cannot yet be used as a live, interactive display, and the enormous bandwidth such a display would require makes a practical system extremely unlikely in the near future. Although some work has been done to use lenticular lenses for video display purposes (Butterfield, 1979), it has proven difficult and expensive to implement and use.

Those systems using dynamic mirrors or lenses have been successfully implemented for the display of simple computer generated images, but no implementation of such a system for a video display has been reported. The nature of this type of system imposes dramatic bandwidth limitations on such a display.

Stereoscopic Real Image Techniques

A fourth method for producing binocular images works by dynamically illuminating a real volume in space which can then be viewed by an observer using natural vision.

This can most easily be achieved by using carefully timed projection images or lasers to illuminate spots on an opaque surface that is sweeping through a particular volume (Lane, 1982, Williams & Garcia, 1988). Other techniques which have been investigated include using intersecting laser beams to excite small "spots" of gases in a display "tank", and sweeping a grid of light sources, such as LEDs, through a volume at a rapid rate (Fajans, 1979).

Such systems have the advantage of being viewable by many observers simultaneously, and suffer from no distortion due to oblique viewing angles. Unfortunately, stereoscopic real image displays are very costly and difficult to implement, and are still in their infancy. Some simple computer-generated images have been demonstrated, but there is considerable doubt that such systems could ever be used for the display of live video.

2.1.2. Present SV Implementation

Given currently available technology, the most economical and most easily implemented method for presenting three dimensional images is to use the filter separation technique outlined above, with active shutters to separate the time-multiplexed left and right video images.

This is consistent with the demands of this particular teleoperation task, and with teleoperation in general: under most circumstances only a single operator need view the display, so there is no viewing angle distortion, and properly designed shuttering spectacles need not interfere with the operation of the telerobot and associated equipment.

Alternating Field Technique

The full image on a conventional video display is updated 25 times per second for the PAL and SECAM standards, 30 times per second for the North American NTSC standard, and at rates up to 120 times per second for special computer graphics displays. In order to create a cost-effective system, it was clear that using NTSC standard video equipment would be the most appropriate: off-the-shelf cameras, monitors, and video recorders could be used.

The NTSC video standard has been described in considerable detail elsewhere (see Harshbarger, 1984, for a good summary). In brief, due to early limitations in broadcasting bandwidth, the NTSC system has 525 horizontal scan lines for each image, or frame, with an update rate of 30 frames per second. Since such a slow update rate would result in a visible flicker under normal viewing conditions, the lines are not drawn on the display in numerical order from top to bottom. Instead, the 263 odd-numbered lines are drawn from top to bottom, presenting the odd field, and then the 262 even-numbered lines are drawn from top to bottom, presenting the even field. Together, the odd and even fields constitute one frame. In this way the conflicting requirements of bandwidth limitations, sufficiently high image refresh rate to avoid visible flicker, desire for a high-quality image, and need for a frequent image-update rate to give the illusion of continuous motion have all been balanced fairly equitably.

In order to create a stereoscopic image with off-the-shelf video technology, it is necessary to alternate rapidly between the left and right images. When using the NTSC standard, it is most convenient to create the stereoscopic image by taking the odd field from one video camera, and the even field from another. This results in a standard video signal that can be displayed and recorded on standard video equipment. (For more details on field interlace techniques, see Milgram & van der Horst, 1986, Lipton & Meyer, 1984.)
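The field-interlace idea can be illustrated with a minimal sketch. It is not the circuitry used in this work, which combined the fields in hardware; it only shows the line bookkeeping. The representation of a frame as a list of scan lines, the function names, and the choice of which camera supplies which field are all assumptions made for illustration.

```python
def interleave_fields(left_frame, right_frame):
    """Build one frame whose odd-numbered scan lines come from the left
    camera and whose even-numbered scan lines come from the right camera.

    Frames are lists of scan lines. Scan lines are numbered from 1, so
    list index 0 holds line 1 (an odd line). Assigning the odd field to
    the left camera is an assumption of this sketch.
    """
    if len(left_frame) != len(right_frame):
        raise ValueError("both cameras must deliver the same number of lines")
    combined = []
    for index, (left_line, right_line) in enumerate(zip(left_frame, right_frame)):
        line_number = index + 1
        combined.append(left_line if line_number % 2 == 1 else right_line)
    return combined


def split_fields(frame):
    """Separate a frame into the two fields that are displayed in turn."""
    odd_field = frame[0::2]   # lines 1, 3, 5, ...
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


# Toy 525-line frames: each "scan line" is just a (camera, line number) label.
LINES_PER_FRAME = 525
left = [("L", n) for n in range(1, LINES_PER_FRAME + 1)]
right = [("R", n) for n in range(1, LINES_PER_FRAME + 1)]

stereo = interleave_fields(left, right)
odd_field, even_field = split_fields(stereo)
print(len(odd_field), len(even_field))
```

In this toy model the odd field carries only left-camera lines and the even field only right-camera lines, so each eye receives roughly half of the vertical resolution of a full frame.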

Cameras and Lenses

The two cameras used for the stereoscopic video system are made by Hitachi, model number VK-C150. These are MOS-sensor colour cameras with a width of 6 cm. They were purchased for $1500 each in 1987. It was important that the cameras be well matched for image distortion and colour, and that there be a good alignment between the optical axes and the housing of the cameras. Few commercial cameras are constructed with precision regarding the optical axes, however, and the two cameras purchased are significantly different.

The lenses used were 8 mm automatic iris TV Camera lenses manufactured by Cosmicar, model number C814DEX2.

Special Circuitry

Three different functions required the design and construction of special circuitry. The cameras needed a sync signal generator, which was created by the author. The odd and even fields from the two cameras needed to be combined, which was accomplished with circuitry originally provided by Paul Milgram and subsequently redesigned and rebuilt by DCIEM personnel. And finally, the shuttering spectacles needed to be driven in synchrony with the odd and even fields of the stereoscopic video signal, accomplished with additional circuitry originally provided by Paul Milgram and subsequently redesigned and rebuilt by DCIEM personnel.

Shuttering Spectacles

A variety of different technologies have been used for this purpose. We employed two different types.

The first technology used for shuttering glasses is based on cholesteric liquid crystal. In the OFF state, the liquid crystal lens has a milky white appearance which scatters the incident light, resulting in an extremely low contrast image. In the ON state, the liquid crystal lens becomes almost transparent, with a high transmission rate, resulting in a high contrast image. (See Milgram & van der Horst 1986 for a discussion of the merits of these shutters.) In order to reduce the apparent flicker when using a bright display, a neutral-density filter was added to the original shutters for use in the work described below.

The second technology used is based on twisted nematic liquid crystals. When in the OFF state, these liquid crystals cause incident polarised light to be rotated ninety degrees. When in the ON state, they simply transmit the incident light. Each liquid crystal cell is sandwiched between two aligned polarising filters. When the liquid crystal is in the ON state, incident light is polarised by the first filter, is transmitted through the liquid crystal cell, and is then transmitted (and attenuated further) by the second polariser. When the liquid crystal is in the OFF state, the liquid crystal rotates the polarised light from the first filter by ninety degrees, so that it cannot pass through the second filter; the lens effectively goes black.
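The behaviour of the twisted nematic shutter described above can be approximated with Malus's law, which gives the fraction of polarised light passed by a polariser as the squared cosine of the angle between the light's polarisation and the filter's axis. The sketch below assumes ideal polarisers and a perfect ninety degree rotation; real lenses, as noted above, attenuate the light further and leak some light when closed. The function name is hypothetical.

```python
import math


def tn_shutter_transmission(state):
    """Fraction of unpolarised incident light passed by one idealised lens.

    state is "ON" (cell transmits without rotation) or "OFF" (cell rotates
    the polarisation by ninety degrees). Ideal components are assumed.
    """
    if state not in ("ON", "OFF"):
        raise ValueError("state must be 'ON' or 'OFF'")
    # An ideal first polariser passes half of the unpolarised incident light.
    after_first_polariser = 0.5
    # Rotation applied by the liquid crystal cell in each state.
    rotation_deg = 90.0 if state == "OFF" else 0.0
    # Malus's law for the second polariser, which is aligned with the first.
    return after_first_polariser * math.cos(math.radians(rotation_deg)) ** 2


for state in ("ON", "OFF"):
    print(state, tn_shutter_transmission(state))
```

Under these idealised assumptions an open lens passes at most half of the incident light, which is consistent with the low transmission noted earlier for some shutters.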

The cholesteric liquid crystal glasses were constructed by Dr. Paul Milgram. The twisted nematic liquid crystal glasses are commercial units made for the Amiga computer by Haitex, with the trade name X-Specs.

2.2. Remotely Adjustable Dual Camera Mount

The literature discusses the question of stereoscopic camera configuration in some detail, but without any clear consensus. In order to conduct research on the issues involved, a computer-controlled stereoscopic camera mount was designed by the author and constructed by DCIEM personnel.

The mount was intended to be able to symmetrically adjust the convergence angle and separation of the two cameras. This is accomplished by using a dual roman-screw design (see Figure 2.1).
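The relationship between camera separation and convergence angle for a symmetric mount can be sketched with elementary trigonometry. The sketch assumes that both cameras are rotated inward by equal amounts so that their optical axes cross at a point on the centre line; the function names and the example values are hypothetical and do not describe the settings used in the experiments.

```python
import math


def toe_in_angle_deg(separation_m, convergence_distance_m):
    """Inward rotation of each camera, in degrees, so that both optical
    axes meet on the centre line at the given distance from the baseline."""
    if separation_m < 0 or convergence_distance_m <= 0:
        raise ValueError("separation must be >= 0 and distance must be > 0")
    return math.degrees(math.atan2(separation_m / 2.0, convergence_distance_m))


def convergence_angle_deg(separation_m, convergence_distance_m):
    """Total angle between the two optical axes for a symmetric mount."""
    return 2.0 * toe_in_angle_deg(separation_m, convergence_distance_m)


# Hypothetical example: cameras 10 cm apart, converged on a point 3 m away.
print(round(convergence_angle_deg(0.10, 3.0), 3), "degrees")
```

Because the adjustment is symmetric, each camera turns through half of the total convergence angle, and the convergence point stays on the centre line as the separation changes.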

The optional stepper motors and computer interface permit dynamic remote control. For the experimental work described herein, however, the mount was adjusted manually. Further research using this device is under way in this department to investigate the utility of dynamically adjustable camera configurations.


Figure 2.1: Dynamic Stereoscopic Camera Mount

3. Background for Experimental Investigation

3.1. Introduction

Information Needed to Control the Telerobot

As a framework for analysing the informational needs of the telerobot operator, employing the categorisation formulated by Rasmussen (1986), the following classification can be used:

Human Information Requirements for Telerobotics:

1) Knowledge & Rules (e.g. what procedures to follow, which techniques are most effective for a particular situation, special considerations)

2) Skills, which can be divided into the following categories:

i) Interpreting the Display (e.g. how to translate the information displayed into a suitably accurate internal model of the remote world)

ii) Controlling the Telerobot (e.g. familiarity with the control dynamics and behaviour of the telerobot, how to use the buttons, knobs, and switches to operate the telerobot, and knowing how the robot responds to a certain control action or to a certain external influence, such as a steep grade, etc.)

3) State Information, that is, information about the current:

i) Robot Configuration (e.g. whether the arm is extended or not, the claw open or not)

ii) Spatial Layout of the Remote Environment (e.g. what the objects in the remote environment are, and where they are in relation to each other)

iii) Location of the Telerobot within the Remote Environment

Effects of Video System on Knowledge and Rules

It is reasonable to expect that using stereoscopic video (SV) rather than monoscopic video (MV) will not greatly influence the acquisition or exercise of the Knowledge and Rule based information, if at all.

Effects of Video System on Skill Acquisition and Exercise

On the other hand, it is reasonable to expect that using SV rather than MV will affect the acquisition and exercising of Skill based information. Regarding (2i) Skill in Interpreting the Display: using binocular cues to determine depth is natural to most people, and most people can adapt to a well-designed stereoscopic display quickly. Learning how to interpret the reduced monocular depth cues on an MV display, on the other hand, takes training to acquire and practice to maintain (as discussed previously in Section 1.3.5). We can therefore justifiably hypothesise that operators will learn how to interpret an SV display more quickly than an MV display. Additionally, when they are experienced, operators will require less training to maintain their skills, and will generally take less time to interpret the visual information presented on the SV display. This should ultimately decrease task execution time, and possibly reduce some types of errors.

Regarding (2ii) Skill in Controlling the Telerobot: while binocular cues are very important for conveying depth information, they are even more critical for conveying information about motion in depth (Clapp, 1987). An SV display system can be expected to provide much better visual feedback of the motion of the telerobot to the operator, so it can be hypothesised that using SV can facilitate the acquisition of some of the skills needed to control the robot. There is little discussion in the literature about this point, and what there is is contradictory. Pepper & Hightower (1984) report that for a simple target positioning task, comparing the task execution time when using SV and MV display systems, the performance advantage of SV is greater for experienced operators than for novices, implying that the relative benefits of SV increase with experience. On the other hand, Smith et al (1979) found that the relative benefits of SV decrease with experience. In order to investigate these issues further, two experiments were designed and conducted, as discussed below.

Effects of Video System on State Information Acquisition and Precision

Binocular cues present depth information in a highly salient, relatively precise, and very robust manner, and so it is reasonable to expect that using SV rather than MV will also affect the acquisition and precision of (3) State Information. This is no doubt a contributing factor to the reported advantages of SV systems described above. SV can be relied on to operate under a wide range of operating conditions and in virtually any environment, whereas monocular cues can prove insufficient in a great many situations (Merritt 1984). Furthermore, since the binocular depth cue of disparity takes less time to interpret than all other depth cues (Clapp 1987), it can be expected that operators will be able to acquire spatial information about the remote environment faster using SV than MV.

Stereoscopic Video (In)Dependence

[Author's Note: Since I wrote this I have tried to figure out a better descriptive term, and have come up with "Depth Cue Dependence" which I used in a recent paper, and "Stereo Cue Dependence", which I think is probably the best. So please read "Stereo Cue Dependence" whenever you see "SV Dependence".]

Merritt states: "Certain visual tasks are trivially easy with 3D vision, but extremely difficult with only 2D vision. In some cases, the monocular cues do not carry the required depth information; in other cases, the monocular cues normally present are degraded by poor visibility or image quality." (Merritt, 1984) Extending this idea, teleoperation tasks can be seen as existing in a spectrum between the two extremes of not requiring any depth information (i.e. SV-independent), and being impossible without stereoscopic displays (i.e. SV-dependent). (See Figure 3.1) Driving a telerobot along a clear path would fall towards the SV-independent end of the spectrum: sufficient information to accomplish the task could be readily acquired without SV. Although there may be an initial advantage to using SV, as discussed above, it is reasonable to expect that for most such tasks, this initial advantage will fade with experience. (e.g. Pepper et al, 1981, commenting on Pesch, 1967)

At the SV-dependent end of the spectrum exist tasks that depend entirely on relative and absolute positions in depth, information that may only be available with SV displays, such as the precise placement task with an edge view that is shown in Figure 3.1. Depending on the exact location of the task on the spectrum, it may or may not be possible to learn how to use the monocular depth cues to accomplish the task. For tasks at the extreme SV-dependent end of the spectrum, it is not unreasonable to expect that the initial performance advantage of SV will actually increase with time.

Between these two extremes exist the bulk of teleoperator tasks. Since most telerobots in use today are equipped with MV system(s) (Meieran, 1988), the only tasks possible have been those near the SV-independent end of the spectrum. As SV becomes more commonly used, the variety of telerobotic tasks will increase dramatically.


Figure 3.1: SV Dependence Spectrum

Expected Benefits of SV for Telerobotics

The hypotheses discussed above can be summarised as follows:

Hypothesis 1: Operators will learn how to interpret SV displays faster than MV displays.

Hypothesis 2: Operators will be able to perceive the static and dynamic spatial relationships of the remote scene faster and more accurately using an SV display, and will therefore show a performance advantage for SV displays as a function of the SV-dependence of the task.

Hypothesis 3: The advantage of SV over MV will change as a function of experience; the more SV-dependent the task, the longer the SV advantage will last.

These hypotheses are particularly important for EOD operations, since actual operations are relatively rare in Canada, and considerable time and expense are devoted to maintaining the high skill levels of EOD experts. If it were to be shown that skill acquisition occurred at a faster rate when using SV, the practical implications would be obvious.

3.1.1. Overview of the Experiments

The first experiment was designed to examine the question of whether operators learn to interpret SV and MV displays at different rates.

In order to isolate this particular type of learning, all other learning of the Human Information Requirements for Telerobotics listed above had to be controlled. The Knowledge and Rule based information was controlled automatically, since all subjects received exactly the same training. The Skill in Controlling the Telerobot was controlled by having the subjects train with the robot under direct viewing conditions, and by using an extremely simple task. The State Information of the Configuration of the Telerobot was controlled by presetting the robot into a fixed position. The Spatial Layout of the Remote Environment was simplified and presented to the subjects using direct viewing. The Location of the Telerobot within the Remote Environment was restricted to a very small range. By using this approach, it was hoped that any learning effects would be attributable to learning how to interpret the SV and MV displays.

The second experiment was designed to examine the question of whether the different display systems affected the learning of the Control of the Telerobot, and the role of different levels of SV-dependence. The other learning factors were controlled using techniques similar to those above, with the addition that the learning of how to Interpret the Display was controlled with training. Four different levels of SV-dependence were used.

If we consider the SV-dependence spectrum, the two experiments could be characterised as shown in Figure 3.2.


Figure 3.2: Position on the SV-Dependence Spectrum of Experiments 1 and 2.

4. Experiment One: Learning SV versus MV

4.1. Introduction

In order to examine the different rates at which operators learn to interpret MV and SV displays, a task was designed that would be very simple to perform and highly repeatable. In order not to unfairly bias the results, a real world environment, rich in monocular cues, was simulated (in contrast to an environment in which as many depth cues as possible have been removed, which is often the case in laboratory experiments) so that subjects using the MV system for feedback would not be unduly handicapped by the display. That is, the task was designed to be at the SV-independent end of the SV-dependence spectrum.

4.2. Method

4.2.1. The Task

The task was based on the use of the disruptor in EOD operations (see Section 1.1.3). The RMI was used as the telerobot, and was configured as in Figure 4.1. The operators were required to drive the RMI a distance of three metres and to just touch the target within a 1 cm wide vertical strip.

Figure 4.1: Driving Task for Experiment One

The pointing device used is shown in Figure 4.2, and consisted of a spring-loaded rod which could slide freely back and forth along the base. Wrapped around the front end of the rod was a "motion indicator", a stiff piece of paper, approximately 7 cm wide. As soon as the rod touched the target, the paper moved visibly. As the RMI continued to move forward, the rod slid further back. If it reached the setpoint, a buzzer automatically sounded.

The EOD disruptor task was explained to the subjects, and they were told that they had to position the pointer 7 cm from the target. The loop of paper was 7 cm from the front of the pointer, and so they were to stop the RMI as soon as they observed any movement of the paper.

The subjects were warned that if they drove too far forward, the "disruptor" (pointer) would touch the "bomb" (target), causing it to "explode" (i.e. the buzzer sounded, and the image on the viewscreen was temporarily disrupted).


Figure 4.2: Pointing Device for Experiment One

The subjects were also told that the "bomb" was equipped with a timer that would cause it to explode within 30 seconds of the start of each trial. This was to give them a sense of urgency. When a subject's task completion time exceeded 15 seconds, a random timer of up to 15 seconds was started; if the subject did not complete the task before the timer ran out, the buzzer was sounded on the completion of the run; the run was not aborted or interfered with in any other way, so as not to invalidate the data.
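The timing rule for the simulated "bomb" can be expressed as a short sketch. It is an illustration only: in the experiment the timing was handled by the experimenter, the text states only that the additional delay was random, and the uniform distribution and all names used here are assumptions.

```python
import random

TIMER_START_S = 15.0        # elapsed trial time at which the random timer starts
MAX_RANDOM_DELAY_S = 15.0   # the random timer lasts at most this long


def draw_buzzer_deadline(rng):
    """Elapsed trial time (s) after which the run counts as an "explosion".

    A uniform distribution is assumed here for the random delay.
    """
    return TIMER_START_S + rng.uniform(0.0, MAX_RANDOM_DELAY_S)


def buzzer_on_completion(completion_time_s, deadline_s):
    """True if the buzzer should sound when the subject finishes the run.

    The run itself is never interrupted; the buzzer is only sounded
    once the subject has completed the trial.
    """
    return completion_time_s > deadline_s


rng = random.Random(0)  # seeded only so that the sketch is repeatable
deadline = draw_buzzer_deadline(rng)
for completion_time in (12.0, 31.0):
    print(completion_time, buzzer_on_completion(completion_time, deadline))
```

Under this rule a trial finished within 15 seconds never triggers the buzzer, and the deadline never exceeds the 30 seconds announced to the subjects.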

4.2.2. The Subjects

The nine subjects consisted of eight university students and one defence scientist. They were all volunteers, and the students were paid $5 per hour for their participation.

4.2.3. Experimental Design

We were interested in investigating the differences in performance and in learning trends for novices attributable to the video system used. Pilot studies revealed considerable individual variation, so a balanced within-groups design was employed, meaning that each subject would perform the task using both video conditions. In order to balance the need for enough repetitions of the experimental task to reveal the learning trends with the need to keep the number of repetitions small so as to minimise carry-over effects, an intermediate number of repetitions, 16, was selected.

Since the experience level of the subjects would be considerably different for their second set of trials than for their first, the order in which the subjects performed their trials had to be considered as another factor. This was a between-groups factor.

4.2.4. Experimental Procedure

At the start of the experiment, the subjects received a standard set of instructions and explanation as described above. This was followed by a short training procedure, where the subjects familiarised themselves with the RMI and its control interface, and practised the experimental task several times using direct viewing conditions, until they were able to complete the task successfully. This took an average of approximately five direct-view trials. Note that this minimal amount of training was intended only to familiarise the subjects with the operation of the RMI, and to ensure that they understood the experimental task and could successfully accomplish it. It was not intended to impart any great efficiency of operation or control skill.

When the training was complete, the subjects were placed in a darkened booth, with a single colour monitor for viewing and the control panel for the RMI. No effort was made to restrict viewing distances or head positions. Since we were interested in examining the learning of how to interpret the display, the subjects received no training using the remote view.

Each subject, considered a "novice" at this point of the experiment, then performed sixteen consecutive trials of the driving task, half the subjects using MV (the MF "mono first" Group), the other half using SV (the SF "stereo first" Group). The RMI was positioned at a fixed starting distance three metres from the target. The subject was given a three second countdown, and on the word "Go" would drive the RMI forward, aimed at the centre line of the target, until a small movement of the paper on the pointer indicated that the pointer had indeed touched the target. The subjects would indicate that they were satisfied with the position of the RMI by calling out "Stop". Trial times were recorded by the experimenter with a stop-watch.

When all sixteen runs under the first viewing condition (either MV or SV) were complete, the subjects had a five minute break, followed by an additional sixteen runs under the other viewing condition.

The entire procedure lasted approximately two hours.

4.2.5. Stereoscopic Camera Setting

The stereoscopic cameras were set approximately 10 cm apart, with their longitudinal axes converging on the tip of the pointer, approximately 80 cm away. This moderate stereoscopic baseline was used as a compromise between needs for high depth resolution and the desire not to tax the viewing abilities of SV-naive subjects.
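As a rough check on how moderate this configuration is, the angle between the two camera axes can be estimated from the stated figures. The sketch below assumes symmetric convergence, with the 80 cm measured from the midpoint of the camera baseline; the text does not state either point, and the function name is illustrative only.

```python
import math


def convergence_angle_deg(baseline_cm, distance_cm):
    """Angle between the two camera axes when both are aimed at a point
    distance_cm ahead of the midpoint of the baseline (assumed geometry)."""
    half_angle = math.atan((baseline_cm / 2.0) / distance_cm)
    return math.degrees(2.0 * half_angle)


angle = convergence_angle_deg(10.0, 80.0)
print(f"convergence angle: {angle:.1f} degrees")
```

With a 10 cm baseline and an 80 cm convergence distance this gives a little over 7 degrees.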

4.2.6. Measures Recorded

Three measures were recorded for each trial: (1) the time to complete the trial, (2) the distance the pointer was compressed, and (3) any horizontal aiming error.

4.3. Experimental Hypotheses

During each set of runs, the subjects were expected to learn (i) how to perform the telerobotic driving task, and (ii) how to interpret the information on the display. Since all of the subjects in both the MF Group ("Mono First") and SF Group ("Stereo First") must learn both, any differences in performance between these groups are likely attributable to the video system being used.

Pilot studies conducted in preparation for this experiment indicated that several simple SV-independent tasks, similar to the one used in this experiment, could be performed as quickly using MV as with SV once the operators had sufficient experience. It was therefore expected that as experience increased in this experiment, the difference between the MV and SV performance would decrease.

In the following discussion we use two different approaches to analysing this experiment, the first based on the information being presented to the subjects, the second based on the effects that this presentation of information may have on the confidence level of the subjects. From these two approaches are developed two sets of hypotheses. The results of the experiment will then be compared with these hypotheses to provide some understanding of the various performance issues involved.

4.3.1. Effects of Information Presentation

Analysing this experiment with respect to the information being presented, the naive subjects beginning their first trial are faced with the problem of sorting the visual data on the display into information and noise. This will take processing time, and task performance should be relatively slow. With repetition, however, the subjects can learn how to ignore the extraneous cues, thus reducing their processing time, and reducing overall task completion time. The MF Group using an MV display are presented only with monocular depth cues, and are therefore presented with less information than those using an SV display. The SF Group using the SV display are presented with all of the monocular depth cues, plus the much more salient and easily perceived binocular depth cues. Hypothesis 1 states that learning to use the SV display should be easier than learning to use the MV display. We should therefore find that subjects using SV either (i) show a very quick improvement in performance at the start of the experiment compared with those subjects using MV, or (ii) perform significantly better from the start of the experiment than those subjects using MV, having a natural ability to understand the binocular depth cues of the SV display.

Once the interpretation of the display is mastered, any performance differences between those using MV and those using SV will depend on the quantity and quality of visual information, as well as on the time required to process that information. With this particular task being SV-independent by design, the richness of the monocular cues should be similar in quantity and quality to the combined monocular and binocular cues of the SV display. The binocular cues are unnecessary and to a certain extent redundant. Therefore little or no long-term performance advantage due to SV is expected.

All of the subjects can be expected to have learned how to perform the telerobotic driving task to some extent during the first part of the experiment, and so task execution times should generally be somewhat shorter in the second half of the experiment than the first. On the other hand, changing from one display system to another should involve a certain amount of "overhead", since the subjects must learn how to interpret the new display. The MF Group ("Mono First") of subjects, who switch from MV to SV, must learn how to extract the information they are already experienced with from the new display, and must decide how to integrate the new binocular depth cues. This could theoretically increase task completion time, although given the natural ability to interpret SV displays that has been suggested above, it is more likely that performance will not change significantly. The SF Group ("Stereo First") of subjects, who switch from SV to MV, must learn how to do without the binocular depth cues they have presumably been relying upon, and must learn how to accomplish the task using only monocular cues. Given the experience they already have, however, this transition should not be very dramatic, and the task execution time of the SF Group using MV in the second set of trials should generally be shorter than that of the MF Group using MV in the first set of trials.

4.3.2. Effects of Confidence

There are many factors affecting task performance in addition to the information being presented, such as the important psychological factors of motivation and confidence.

When the subjects begin the first trial, they are performing an unfamiliar task under unfamiliar circumstances. Under such conditions, it would be reasonable to expect the subjects to approach the task with some degree of caution, and thus perform the task relatively slowly. As they repeat the task, they learn how to interpret the visual cues and how to control the telerobot more accurately, and can therefore perform the task more quickly. Furthermore, it is reasonable to expect that they will be more confident in their abilities, and that this too could contribute to a reduction in trial time.

The literature shows that using SV rather than MV increases subject confidence in ability and performance, even when no performance benefit can be seen (Chavand et al 1986, Lippert et al 1982). This difference in confidence may be particularly relevant for the first few trials of the experiment.

When beginning the second set of trials, the subjects must adapt to using a new display system. The MF Group are given an SV display, with more information and a greater sense of reality (Chavand et al 1986), and will likely experience an increase of confidence in their ability to accomplish a somewhat familiar task. This could serve to decrease task execution time below what was obtained in the first set of trials. The SF subjects, on the other hand, are suddenly deprived of information and are faced with a loss of reality. This contrast could serve to shake their confidence, and their task execution times may increase correspondingly.

4.3.3. Summary of Experimental Hypotheses

Based on the above discussion, the expected results of this experiment can be summarised as follows:

Effects of Information Presentation:

1A. Subjects using MV will perform slower than subjects using SV initially, due to extra processing demands in interpreting monocular cues.

2A. Subjects using MV will show significant improvement in performance due to learning how to interpret the display. Subjects using SV will show either a rapid improvement in performance, or very little improvement in performance; that is, they will learn how to interpret the SV display quickly, or they will already know how to interpret it and will show very little learning.

3A. Subjects switching from MV in the first set to SV in the second set will show no change or a small increase in task execution time, due to the overhead associated with re-interpreting the display. Task completion times should then continue to decrease.

4A. Subjects switching from SV in the first set to MV in the second set should show an increase in task execution time, since they must learn how to interpret the MV display. The slower task execution should still be faster than the MV results in the first set, however, since the subjects already have some skill in remote teleoperation and in using video displays for feedback.

Effects of Confidence

1B. Subjects using SV should perform faster than subjects using MV initially, due to greater confidence in their ability to perceive spatial relationships in the remote world. (Similar to 1A.)

2B. (Same as 2A.)

3B. Subjects switching from MV in the first set to SV in the second set should show a decrease in task execution time, since little or no learning of the display is needed, and since the subjects will have increased confidence in their ability to perceive spatial relationships in the remote world. (Different from 3A.)

4B. Subjects switching from SV in the first set to MV in the second set should show a large increase in task execution time due to the need to learn how to interpret the MV display, and a decreased confidence in their ability to perceive spatial relationships in the remote world. (Similar to, but slightly different from 4A.)

4.4. Experimental Results and Discussion

Three measures were recorded: Task Execution Time (or "Trial Time"), Pointer Compression, and Horizontal Aiming Error. Only the Trial Time showed any significant differences between trials, video conditions, and experience conditions. The measure of Horizontal Error was consistently very small, and analysis of variance showed no significant factors. The measure of Pointer Compression showed no significant factors.

Figure 4.3 shows the mean trial times of each of the two groups of subjects for all 16 trials of both video conditions. The left half of the figure shows the results of subjects completely naive to telemanipulation. Although they have had some practice using the telerobot with direct view (from an "outside-in" perspective), they have not yet had any experience using the telerobot with remote viewing (an "inside-out" perspective). The right half of the figure shows the mean trial times of the same two groups, now experienced in one video condition, performing the same task using the other video condition. In other words, the five subjects who started with the SV continued using the MV system (grey hollow boxes), while the four subjects who started with MV continued using the SV system (black solid circles).


Figure 4.3: Results of Experiment One

Notice that the mean trial times for the first trial for both groups of subjects are approximately the same (18.5 seconds), but that those subjects using SV appear to show a considerable reduction in trial time by the third trial, while those subjects using MV do not show the same improvement until the fifth trial. This tends to support hypotheses 1 and 2 (A and B) as stated in section 4.3.3. This difference is significant at the 10% level (F(1,7)=3.884, p=.089).

An analysis of variance (anova) on the full factorial balanced design, using Subject as a blocking factor, Order of video system used (MF or SF) as a between-groups factor, Video system used (MV or SV) as a within-groups factor, and Trial Number ("Learn16") as a within-groups factor, is given in Table 4.1.

Table 4.1: ANOVA of Trial Time as a function of Set, Video, and Trial Number
FACTOR:    Subject      Order      Video    Learn16       Time
LEVELS:          9          2          2         16        288
TYPE  :     RANDOM    BETWEEN     WITHIN     WITHIN       DATA

SOURCE            SS     df            MS          F        p
===============================================================
mean      60581.0078      1    60581.0078    607.614    0.000 ***
S/O         697.9219      7       99.7031
Order       339.7891      1      339.7891      3.408    0.107
S/O         697.9219      7       99.7031
Video       477.9180      1      477.9180     12.176    0.010 *
VS/O        274.7500      7       39.2500
OV          252.9258      1      252.9258      6.444    0.039 *
VS/O        274.7500      7       39.2500
Learn16     528.1641     15       35.2109      1.866    0.035 *
LS/O       1981.0000    105       18.8667
OL          475.6172     15       31.7078      1.681    0.066
LS/O       1981.0000    105       18.8667
VL          599.9102     15       39.9940      1.908    0.030 *
VLS/O      2201.1719    105       20.9635
OVL         248.8203     15       16.5880      0.791    0.685
VLS/O      2201.1719    105       20.9635

The marked difference between the two different sets of trials (as seen in Figure 4.3) and the results of anova suggest it would be appropriate to divide the data into the two separate sets for further analysis.
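In Table 4.1 each effect row is followed by the error term it is tested against, so every F value is simply the effect mean square divided by the error mean square on the next row. The short sketch below recomputes the ratios from the tabulated mean squares as a consistency check. The dictionary and function names are illustrative, and no p values are computed.

```python
# (effect MS, error MS, reported F), copied from Table 4.1
TABLE_4_1 = {
    "Order":   (339.7891, 99.7031, 3.408),
    "Video":   (477.9180, 39.2500, 12.176),
    "OV":      (252.9258, 39.2500, 6.444),
    "Learn16": (35.2109, 18.8667, 1.866),
    "OL":      (31.7078, 18.8667, 1.681),
    "VL":      (39.9940, 20.9635, 1.908),
    "OVL":     (16.5880, 20.9635, 0.791),
}


def f_ratio(ms_effect, ms_error):
    """F statistic for an effect tested against its own error term."""
    return ms_effect / ms_error


for source, (ms_effect, ms_error, reported) in TABLE_4_1.items():
    computed = f_ratio(ms_effect, ms_error)
    print(f"{source:8s} F = {computed:7.3f}   (table: {reported:7.3f})")
```

Each recomputed ratio agrees with the tabulated F value to three decimal places.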

An analysis of variance on the trial times for the first set, using subject as a blocking factor, and using Trial Number (Learn16) and Video system used as treatments, is presented in Table 4.2.

Table 4.2: ANOVA of Trial Times for the First Video Condition
FACTOR:    Subject   VidPart1    Learn16  TimePart1
LEVELS:          9          2         16        144
TYPE  :     RANDOM    BETWEEN     WITHIN       DATA

SOURCE            SS     df            MS          F        p
===============================================================
mean      33672.2500      1    33672.2500    374.315    0.000 ***
S/V         629.6992      7       89.9570
VidPart      12.8008      1       12.8008      0.142    0.717
S/V         629.6992      7       89.9570
Learn16     445.7500     15       29.7167      1.095    0.370
LS/V       2848.8984    105       27.1324
VL          458.6016     15       30.5734      1.127    0.342
LS/V       2848.8984    105       27.1324

Considering the results of the first set of 16 trials in Figure 4.3, where the subjects have a little previous experience in controlling the telerobot, but none in using the display system for telerobotic control, we find no significant difference in the overall mean trial times of the MV and the SV conditions, nor any evidence of significant learning effects. This is contrary to our expectations (1A and 1B above). One possible explanation is that the change between the "outside-in" training and the "inside-out" performance of the first set was much more significant than was expected, and resulted in a great deal of variability. A second possibility is that the subjects were being very cautious during their first set of trials, in order to avoid making any errors, since the importance of avoiding errors was stressed at the start of the experiment. If this were the case, then there need not be much difference in trial time at all. This issue will be discussed further in the next chapter.

Considering the results of the second set of trials, where the subjects are no longer totally naive with telemanipulation, but are naive with that particular display type, we find the following results:

Table 4.3: ANOVA of Trial Times for the Second Video Condition
FACTOR:    Subject   VidPart2    Learn16  TimePart2
LEVELS:          9          2         16        144
TYPE  :     RANDOM    BETWEEN     WITHIN       DATA

SOURCE            SS     df            MS          F        p
===============================================================
mean      27087.6738      1    27087.6738    552.854    0.000 ***
S/V         342.9727      7       48.9961
VidPart     878.9160      1      878.9160     17.938    0.004 **
S/V         342.9727      7       48.9961
Learn16     351.9961     15       23.4664      1.848    0.037 *
LS/V       1333.2754    105       12.6979
VL          596.1660     15       39.7444      3.130    0.000 ***
LS/V       1333.2754    105       12.6979

Considering these results regarding the second set of trials in Figure 4.3, we find that there is a significant difference in the overall mean trial time between the two video conditions, and that each video condition shows very different kinds of learning. There is considerably less variability, and the trends are much more obvious. The initial difference in performance and the different learning rates suggested in Hypotheses 1 and 2 (A and B) are apparent.

Considering the possible role that subject confidence might play, we see that the MF Group, who switched from MV to SV, shows a significant decrease in task completion time: the final trial of the first set was completed in a mean time of 15.25 seconds, while the first run of the second SV set was completed in a mean time of only 7.75 seconds; i.e. mean trial time was cut almost in half. This clearly supports expectation 3B and belies expectation 3A, lending credence to the argument of the important role which operator confidence plays in these results. Furthermore, the small upward trend in the SV results in the second set of trials may be due to the confidence level of the MF Group subjects, which is perhaps exaggerated as they switch to an SV display, returning to a more appropriate level.

(Examining the error data provides little insight into this issue, since the error rates were low, and no significant factors were found. On the other hand, this suggests that although the MF Group experienced an increase in confidence when switching from the MV to the SV, they were not over-confident, that is, they did not decrease their task completion times at the expense of increased errors.)

The SF Group, who switched from SV to MV, shows a large increase in task completion time: the final trial of the first SV set was completed in a mean time of 11.6 seconds, while the first trial of the second MV set was completed in a mean time of 23 seconds. The mean trial time was almost doubled, and was more than 5 seconds slower than the corresponding first MV trial in the first set (for the MF Group). (This difference in the first trial times of the two different MV sets is not significant, however: F(1,7)=1.337, p=.286.)

Learning of two kinds was expected: (i) learning how to control the RMI, and (ii) learning how to use the display. The first kind of learning is a global type of learning, and can be expected to carry over from the first set of trials to the second set. That such learning did take place is suggested by the decreased variability of the results in the second set of trials. In light of this, the fact that for the first few trials the MV performance in the second set is at least as poor as the MV performance in the first set is contrary to Hypothesis 4A, but in agreement with Hypothesis 4B. This suggests that, in addition to having to learn how to use the MV display, the SF Group did indeed suffer from a loss of confidence in their abilities, which temporarily made their performance even worse than it should have been.

Note that this effect of confidence (if indeed real, as the data suggest) in this situation is small and transient, and is in addition to the need to learn how to use the MV display.

4.5. Conclusion

The results of this experiment are not inconsistent with Hypothesis 1 (A and B), that there is an initial performance benefit for those subjects using SV compared with those using MV. This expectation does not hold true for the first set, with subjects who are complete novices to remotely-viewed teleoperation, but it is supported by the results of the second set of trials, where subjects do have some experience at remotely-viewed teleoperation. The initial difficulties associated with learning to control the telerobot from an "inside-out" perspective could have overwhelmed any differences due to video system.

The results of this experiment support Hypothesis 2 (A and B), that subjects using SV will rapidly settle down to a steady-state performance, with or without an initial learning phase, and that subjects using MV will show a much more gradual improvement before reaching a steady-state performance. In the first set of trials, the SV results show a very quick improvement, while in the second, they appear close to a steady-state performance from the beginning. The MV results in both sets show a much more gradual approach to a steady-state performance.

The results provide support for Hypothesis 3B, which suggests that subjects switching from MV to SV will show a marked improvement in performance due to an increase in confidence, and belie Hypothesis 3A, which suggests that performance should not change, since the added binocular cues are largely redundant and not necessary for the task at hand.

The results also provide support for Hypothesis 4B, which says that subjects switching from SV to MV will show a large increase in task execution time due both to a need to re-interpret the display and decreased confidence in their perception of the remote world. The results are somewhat in contradiction to Hypothesis 4A, which suggested that task execution times in the second set of trials should be shorter than the task execution times in the first set of trials due to a learning of the factors involved in controlling the telerobot.

In general, then, this experiment confirmed the hypothesis that SV can be used with little or no training, while considerably more training is necessary to use MV displays. The experiment also demonstrated the important role subject confidence can play in performance.


5. Experiment Two: Task Learning & Performance

5.1. Introduction

The goal of the second experiment was to elucidate the results of the first, to clarify the various learning factors to a greater degree, and to examine the question of what practical benefit SV can be expected to provide in a real-world situation.

In particular, the second experiment was designed to examine the issue of skill acquisition within the context of a highly repeatable task. Furthermore, in order to see if the benefits of SV were dependent on the difficulty of the task, the highly repeatable task was designed so as to have well-calibrated different difficulty levels.

5.2. Method

5.2.1. The Task

The task was designed with the following characteristics in mind: (1) it must be elementary with respect to control operation, to reduce training time and learning effects to a minimum; (2) it must be a task related to some aspect of EOD teleoperation, with a realistic requirement for depth perception; and (3) the difficulty of the task must be adjustable.

In order to fulfil the desired characteristics, the task was based on the procedure for using the X-Ray Unit for EOD (see section 1.1.3), using a Fitts' Law approach (Wickens, 1984) to control and calibrate the level of difficulty.

The task was to drive the RMI a distance of 3 metres forward, and to lower the mock "X-Ray plate" between two "bombs", set a particular distance apart. The X-Ray plate was simulated by using the pointer from Experiment 1, but hanging suspended from the end of the RMI's forearm, so that it could swing freely. The two "bombs" were flat black briefcases.

The operators began each condition with the forearm of the RMI pointed upward, so that the target was not visible on the monitor screen. The operators had to lower the arm until the target was visible, drive forward until the hanging pointer was between the two briefcases, and finally lower the forearm until the buzzer on the pointer sounded, indicating the end of the trial. (See Figure 5.1)

The separation of the suitcases for the training session was 24 cm. Employing a Fitts' Law paradigm, the separation between the suitcases was varied to control the difficulty of the task. The separations used for the experiment were 8 cm, 16 cm, 32 cm, and 64 cm. By the log2() relationship between the separation and the Fitts' Law Index of Difficulty, each separation is one log unit more difficult than the next larger one, and trial completion time should be a linear function of the Index of Difficulty.
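The one-log-unit spacing can be illustrated with the standard Fitts formulation, ID = log2(2A/W), where W is the target width and A the movement amplitude. The sketch below takes W to be the suitcase separation and, purely as an assumption, A to be the 3 metre driving distance. The value chosen for A shifts every index by the same constant and does not affect the spacing between levels.

```python
import math


def index_of_difficulty(amplitude_cm, width_cm):
    """Fitts' Index of Difficulty in bits: log2(2A / W)."""
    return math.log2(2.0 * amplitude_cm / width_cm)


AMPLITUDE_CM = 300.0             # assumption: the 3 m drive taken as amplitude
SEPARATIONS_CM = [64, 32, 16, 8]  # from the text

ids = [index_of_difficulty(AMPLITUDE_CM, w) for w in SEPARATIONS_CM]
for width, value in zip(SEPARATIONS_CM, ids):
    print(f"{width:2d} cm separation: ID = {value:.2f} bits")
```

Halving the separation raises the index by exactly one bit, so under Fitts' Law each step should add the same increment to the expected trial time.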

The subjects were told that the "bombs" were touch sensitive, so that touching either suitcase would be counted as an error. This was done because pilot studies revealed that errors such as accidentally touching either suitcase would very often require the operator to make complex recovery actions. More importantly, the design of the control panel for the RMI was ergonomically poor: the toggle switch controlling the movement of the forearm of the RMI was upside down with respect to the control-response compatibility stereotypes of most of the subjects. When most subjects in the pilot studies and in this experiment made an error such as lowering the arm onto one of the suitcases rather than between them, thus sounding the buzzer prematurely, their quick, instinctive response was to pull the toggle switch towards themselves, in the hope of raising the forearm of the robot. Unfortunately, this control action caused the RMI's forearm to lower even further, resulting in a disruption of the experimental apparatus. Because of the massive interference in task execution times caused by errors, they had to be considered separately from the successful runs. Therefore, every time a subject made an error, an additional run was added so that the total number of successful error-free runs was constant for all subjects.

Note that this poorly designed interface did not cause significantly more errors; rather, it interfered with recovery from other errors.
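The rule of adding a run after every error amounts to repeating trials until a fixed number of error-free runs has been collected. A minimal sketch follows; the `attempt` callable is a hypothetical stand-in for one run of the task, returning the trial time and whether an error occurred.

```python
def run_until_error_free(required, attempt):
    """Repeat trials until `required` error-free trials have been collected.

    `attempt` is a hypothetical callable returning (time_s, had_error).
    Returns the error-free trial times and the number of trials with errors.
    """
    clean_times = []
    error_trials = 0
    while len(clean_times) < required:
        time_s, had_error = attempt()
        if had_error:
            error_trials += 1      # counted separately; an extra run follows
        else:
            clean_times.append(time_s)
    return clean_times, error_trials


# Usage with made-up outcomes: the second run contains an error.
outcomes = iter([(6.1, False), (9.0, True), (5.8, False)])
times, errors = run_until_error_free(2, lambda: next(outcomes))
print(times, errors)
```

Because errors are tallied apart from the error-free times, the two can be analysed separately, as the text requires.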


Figure 5.1: Fitts' Law Task for Experiment Two

5.2.2. The Subjects

The subjects were eight university students, three female, five male. All were volunteers, and were paid $5 per hour for their participation. Four subjects had some previous experience with SV and telerobotics through participation in Experiment One four months previously, but all subjects were trained to a particular level of performance well beyond that studied in Experiment One, so this should not significantly affect the results.

5.2.3. Experimental Design

Design Goals

This experiment consisted of two parts. Part A was designed to look at the acquisition of highly task-specific skills for a very repetitive situation, as a function of experience (trial number), the video system used, and the difficulty of the task. Part B was designed to look at the differences in performance of tasks in a non-repetitive situation, as a function of the video system used and of the difficulty of the task.

In Part A, the factors being examined were (1) task learning, by having the subjects repeat the same task 16 times consecutively; (2) video system, either SV or MV; and (3) task difficulty and SV-dependence, with the briefcases separated by one of four distances (8 cm, 16 cm, 32 cm, 64 cm). A full-factorial design was used, so that each subject performed 16 * 2 * 4 error-free trials, plus an unknown (at the design stage of the experiment) number of trials with errors. This meant that subjects with a high error rate performed more runs in total than those subjects with low error rates. Although this may influence the results somewhat, since some subjects therefore had more experience and practice than the others, it was felt that the additional experience from making errors would not contribute a great deal to the performance of the task, since errors were so disruptive.

In Part B, the factors being examined were (1) video system, either SV or MV; and (2) task difficulty and SV-dependence, with one of the four separations. Each of the 2 * 4 conditions was repeated 8 times, so each subject performed 2 * 4 * 8 error-free trials in Part B of the experiment. Again, those subjects with high error rates performed more trials than those with low error rates.
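The per-subject counts of error-free trials follow directly from crossing the factors. The sketch below enumerates both designs to confirm the totals; the variable names are illustrative only.

```python
from itertools import product

VIDEO = ["MV", "SV"]
SEPARATION_CM = [8, 16, 32, 64]

# Part A: 16 consecutive repetitions of every video x separation condition.
part_a = list(product(VIDEO, SEPARATION_CM, range(1, 17)))
# Part B: 8 replications of every video x separation condition.
part_b = list(product(VIDEO, SEPARATION_CM, range(1, 9)))

print(len(part_a))  # 128 error-free trials per subject in Part A
print(len(part_b))  # 64 error-free trials per subject in Part B
```

Since each day used a single video condition, a subject completed half of each total, 64 and 32 error-free trials, on each of the two days.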

It took between two and three hours to complete both Part A and B using a single video condition, so each subject participated in the experiment on two separate non-consecutive days, performing both Part A and then Part B on each day, using the same video condition (MV or SV).

Order of Presentation

An additional factor inherent in the design was that of Order of Presentation. For a balanced full factorial design, the subjects were divided into two groups, the MF ("Mono First") Group and the SF ("Stereo First") Group. The MF Group used MV on the first day of the experiment, and SV on the second day. The SF Group used SV on the first day and MV on the second. Regrettably, there were no control groups who used the same video condition on both days of the experiment. The results of Experiment One indicate that there may be considerable transfer effects to be considered and examined, so the results of this experiment must be examined to determine what role these transfer effects play.

The Role of Experience

A very critical factor that must be controlled in order to be able to understand the results of the experiment is that of experience. It is reasonable to expect that, in general, task performance should improve with experience.

At the beginning of the experiment, each subject receives a certain amount of training in order to ensure all start with approximately the same ability. Throughout the course of Part A of the experiment, the subjects are practicing the task, and are most likely considerably more skilled for Part B of the experiment. On the second day of the experiment, they are presumably even more skilled.

By using a balanced design, it was expected that the grand effect of experience throughout the experiment could be averaged out. Within each group this is a reasonable expectation. However, given the results of Experiment 1, it is necessary to consider the two different groups of subjects as distinct, and their results should not be pooled without first establishing whether or not transfer effects are relevant.

Final Design

Considering the role of experience and transfer effects, Part A of Experiment Two had the following factors:

* Order (2 levels: MF ("Mono First") and SF): between groups

* Video (2 levels: MV ("Mono Video") and SV): within groups

* Difficulty (4 levels): within groups

* Learning (16 levels): within groups

Part B of Experiment Two had the following factors:

* Order (2 levels: MF and SF): between groups

* Video (2 levels: MV and SV): within groups

* Difficulty (4 levels): within groups

* Replication (8 repetitions in a randomised order)
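One way to produce the randomised presentation order for a single day of Part B is sketched below. The exact randomisation scheme used in the experiment is not described, so an unrestricted shuffle of all 32 trials (4 separations, 8 replications each) is assumed here, and the function name is hypothetical.

```python
import random


def part_b_schedule(seed=None):
    """Return one day's Part B order: each separation 8 times, shuffled.

    Assumption: a simple unrestricted shuffle of the 32 trials.
    """
    separations_cm = [8, 16, 32, 64]
    schedule = [s for s in separations_cm for _ in range(8)]
    random.Random(seed).shuffle(schedule)
    return schedule


order = part_b_schedule(seed=1)
print(order)
```

Passing a seed makes a given order reproducible, which is convenient if the same schedule must be reconstructed later.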

5.2.4. Experimental Procedure

All subjects participated in the experiment on two non-consecutive days. On the first day they performed the entire training and set of trials using one video system, and on the second day they repeated the training and the set of trials using the other video system. Four of the subjects (MF Group) began with the MV system, the other four (SF Group) with the SV system.

Training

At the start of the experiment, each subject received a standard set of instructions, describing the experiment and their task. It was emphasized that speed was important, but that every error meant they would have to perform another trial and thus should be avoided.

This was followed by a familiarisation period with the controls of the RMI. The subjects were given an opportunity to drive the RMI around the laboratory for several minutes, until they were able to manoeuvre the telerobot comfortably. The RMI was then placed in the standard starting position, and the briefcases were set to the training separation of 24 cm. Using direct view, the experimental task was demonstrated to the subjects by the experimenter. Still using direct view, the subjects were then made to practice the experimental task, with coaching on technique by the experimenter, until they were able to perform four consecutive trials in under six seconds.

Each trial was timed by the experimenter using a stopwatch, and began with a 3 second countdown. On the word "Go", the subjects would begin lowering the forearm of the RMI, and driving the robot towards the briefcases. As they approached the briefcases, the subjects would slow down, and adjust the height of the forearm to avoid accidentally touching them. When satisfied that the pointer was hanging between the two suitcases, the subjects then lowered the robot arm until a buzzer sounded. The pointer was spring loaded, and would sound the buzzer when sufficiently compressed. Suspended between the two suitcases, out of view of the subjects, was a flat board for the pointer to touch. Without this board, the arc described by the lowering of the robot arm would cause the pointer to accidentally touch a suitcase when the 8 cm separation was used, unless continual adjustment of the position of the RMI was made, which would result in an error. In order to avoid this problem, the arc angle was reduced sufficiently to minimise the unwanted displacement of the pointer.

When the subjects were able to perform four consecutive error-free trials in under six seconds using direct view, they repeated the above familiarisation period and training procedure using the remote view with the training separation of 24 cm, using the same training criteria: four consecutive error-free trials with trial times less than six seconds.
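The training criterion described above can be expressed as a short check. This is a minimal sketch, not the procedure actually used (trials were timed by hand with a stopwatch): the record format of (time in seconds, error flag) and the function name are assumptions, and the example log is invented.

```python
# Sketch of the training criterion: four consecutive error-free trials,
# each under six seconds. The trial record format is an assumption.
from typing import Iterable, Optional, Tuple


def trials_to_criterion(
    trials: Iterable[Tuple[float, bool]],
    run_length: int = 4,
    time_limit_s: float = 6.0,
) -> Optional[int]:
    """Return the 1-based trial number at which the criterion is first met.

    Each trial is (time_in_seconds, had_error). Returns None if the
    criterion is never met within the supplied trials.
    """
    streak = 0
    for number, (time_s, had_error) in enumerate(trials, start=1):
        if not had_error and time_s < time_limit_s:
            streak += 1
            if streak == run_length:
                return number
        else:
            streak = 0  # any error or slow trial restarts the run
    return None


# Hypothetical training log, not data from the experiment.
example_log = [(7.2, False), (5.8, True), (5.5, False), (5.9, False),
               (5.1, False), (6.3, False), (5.0, False), (4.8, False),
               (4.9, False), (4.7, False)]
print(trials_to_criterion(example_log))
```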

Experiment Two, Part A: Consecutive Presentation

When the training was complete, the suitcases were set to the appropriate separation based on the latin square layout. The subjects were not made aware of the separation.
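For readers unfamiliar with Latin square layouts, the sketch below builds one possible 4 x 4 square over the four separations. It is an assumption for illustration: the square actually used in the experiment, and the way its rows were assigned to subjects, are not given in the text.

```python
# A cyclic 4 x 4 Latin square of target separations. This is one possible
# layout only; the square used in the experiment is not given in the text.
SEPARATIONS_CM = [8, 16, 32, 64]


def cyclic_latin_square(items):
    """Row i is the item list rotated left by i positions."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


square = cyclic_latin_square(SEPARATIONS_CM)
for row in square:
    print(row)
```

Each separation appears exactly once in every row and every column, so each presentation position is shared equally among the separations.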

Using a three second countdown to mark the start of the trial, the subjects would drive the RMI forward and lower the pointer, hopefully between the briefcases, until the buzzer sounded. The subjects repeated the task until a total of 16 successful trials for each of the four separations were complete. They had a short break of approximately one minute between sets of trials.

When the subjects completed all four separations, that is, 64 successful trials, they would be given a longer break, of approximately five minutes, to rest their eyes and their hands. A few subjects complained of tension and eyestrain, and they were given longer breaks.

Experiment Two, Part B: Random Presentation

After their break the subjects would begin Part B of the experiment, which was to perform exactly the same task, but with the separation changing every trial. The order of the separations was generated randomly before the experimental session, and was adjusted by the experimenter during the course of the experiment to allow trials with errors to be redone without having many instances of the same separation consecutively. The subjects continued until they performed eight error-free trials for each separation.
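A random presentation order of this kind could be generated as sketched below. This is an illustration under stated assumptions, not the method used: the limit of two consecutive identical separations (`max_run`) is hypothetical, since the text says only that "many" consecutive instances were avoided, and the manual re-insertion of failed trials by the experimenter is not modelled.

```python
import random

SEPARATIONS_CM = [8, 16, 32, 64]


def longest_run(sequence):
    """Length of the longest run of identical consecutive values."""
    longest = current = 0
    previous = object()  # sentinel that equals no separation
    for value in sequence:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest


def random_order(repeats=8, max_run=2, seed=None):
    """Shuffle `repeats` copies of each separation, rejecting any order
    in which one separation occurs more than `max_run` times in a row."""
    rng = random.Random(seed)
    order = SEPARATIONS_CM * repeats
    while True:
        rng.shuffle(order)
        if longest_run(order) <= max_run:
            return list(order)


order = random_order(seed=1)
print(order)
```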

Fitts' Law

According to Fitts' Law (Wickens, 1984), the time to move one's hand from a starting position to a target of a known size at a known distance can be expressed as a function of the distance and width as follows. Fitts defines a measure known as the Index of Difficulty (ID), measured in bits, where:

Index of Difficulty = ID = log2( (2 * distance) / width ),

where distance means the distance from the starting point to the middle of the target, and width refers to the width of the target.

Fitts said that the Movement Time (MT) is a linear function of the Index of Difficulty:

Movement Time = MT = a + b * ID

Much research has been done to improve and extend Fitts' work, but his original formulation has proven robust and applicable to a wide variety of tasks. To that end, the separation of the targets is translated into Index of Difficulty bits:

Table 5.1: Relationship between Target Width and Index of Difficulty
      Width      Distance         Index of            
                                Difficulty (ID)     
       8 cm         3 m           6.2 bits            
      16 cm         3 m           5.2 bits            
      32 cm         3 m           4.2 bits            
      64 cm         3 m           3.2 bits            
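The values in Table 5.1 can be reproduced directly from the definition of the Index of Difficulty. The sketch below assumes the 3 m distance is expressed in the same units as the width (300 cm); the function names are illustrative, and `movement_time` simply evaluates the linear relation for caller-supplied coefficients a and b, which are not estimated here.

```python
import math


def index_of_difficulty(distance_cm: float, width_cm: float) -> float:
    """Fitts' Index of Difficulty in bits: log2(2 * distance / width)."""
    return math.log2(2.0 * distance_cm / width_cm)


def movement_time(a: float, b: float, id_bits: float) -> float:
    """Fitts' Law: MT = a + b * ID, for caller-supplied a and b."""
    return a + b * id_bits


DISTANCE_CM = 300.0  # 3 m, as in Table 5.1
ids = {width: round(index_of_difficulty(DISTANCE_CM, width), 1)
       for width in (8, 16, 32, 64)}

for width, id_bits in ids.items():
    print(f"{width:2d} cm -> {id_bits} bits")
```

Because each width is double the previous one, each step changes the Index of Difficulty by exactly one bit.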

5.2.5. Stereoscopic Camera Settings

The cameras for this experiment were set approximately 10 cm apart, with their longitudinal axes converging at the tip of the hanging pointer, approximately 95 cm away. This setting permitted good depth resolution without causing undue eyestrain.
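The convergence angle implied by these settings can be estimated with simple trigonometry. The sketch below assumes a symmetric arrangement, with both cameras toed in equally and the 95 cm distance measured from the midpoint of the line joining the cameras; neither detail is stated in the text, and the measurements themselves are approximate.

```python
import math


def convergence_angle_deg(camera_separation_cm: float,
                          convergence_distance_cm: float) -> float:
    """Angle between the two camera axes, assuming a symmetric toe-in and
    a distance measured from the midpoint of the camera baseline."""
    half_baseline = camera_separation_cm / 2.0
    half_angle = math.atan(half_baseline / convergence_distance_cm)
    return math.degrees(2.0 * half_angle)


angle = convergence_angle_deg(10.0, 95.0)
print(f"{angle:.1f} degrees")
```

Under these assumptions the camera axes meet at an angle of roughly 6 degrees.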

5.2.6. Measures Recorded

The two measures recorded were the trial execution times, and the trial successes.

It is important to stress that in this experiment both the trial execution time and the number of faulty trials are relevant measures of performance. This is a task subject to a speed-accuracy trade-off. In order to avoid making the subjects very cautious, the importance of speed was emphasized, while on the other hand the subjects were cautioned about the "dangers" of making errors (not the least of which was to have to repeat the trial). Because it is impossible to know exactly where on the speed-accuracy trade-off curve the subjects are at any given moment, it is important to examine both sets of results.

5.3. Experimental Hypotheses

This experimental task was designed to examine a broad cross-section of the SV-Dependence spectrum, as discussed in Chapter Three. The easier conditions are both very close to the SV-independent end, since there is almost no requirement for accurate depth perception. The most difficult condition is nearing the SV-dependent end of the spectrum, although the task was by no means impossible with MV.

A previous study investigating the effects of degraded image quality on a telerobotic task using MV and SV found that, for the easiest condition, those subjects using MV showed considerable learning, while those using SV displayed no clear learning effect (Pepper et al, 1981). The authors argue that since the task was designed to have very strong monocular cues (i.e. be near the SV-independent end of the spectrum), it is not unexpected that SV shows no improvement; the main improvement for the MV condition is likely through acquiring skill in using the MV display, skill they already had for SV displays.

Regarding the important role that confidence played in the first experiment, it was felt that performing the two experimental video conditions on two separate days would diminish this effect, and that any remaining confidence effects, seen to be fairly transient and not very large in Experiment One, would likely be "washed out" during the second day's training period.

Finally, since in Part A of the experiment the subjects would know what the target separation was for all trials save the first due to the consecutive presentation, it was felt that performance would be somewhat better than for Part B of the experiment, where the subjects would have to figure out what the target separation was for every trial.

Based on these ideas and those explored in previous chapters, the hypotheses posed for this experiment were:

Experiment Two Hypotheses

1. Subjects using SV will show an initial performance advantage over those using MV.

2. The performance difference between MV and SV will decrease as the subjects become more experienced, more so for the low difficulty task conditions than the high difficulty task conditions.

3. Performance will be better during the repetitive trials of Part A than the random trials of Part B.

5.4. Observations and Discussion

5.4.1. Training

All subjects were trained to the same time-based level of performance using both direct view and remote view. Based on Hypothesis 1 above, it was expected that those subjects using SV on their first day would finish their training faster. Figure 5.2 shows the number of trials needed to pass the training criterion of four consecutive error-free trials in less than six seconds.

An analysis of variance of these data is shown in Table 5.2. The three-way interaction of Order, Day and Video is significant at the 10% level (F(1,6)=4.402, p=.081). As we can see from Figure 5.2, the only difference between the two groups of subjects occurs on the first day, when the subjects are first learning how to use the telerobotic device remotely. Here we see that those subjects using SV take an average of 16 trials to pass the training criterion, while those using MV take an average of 28. This is clear support for Hypothesis 1, and suggests that SV can be of great aid for novices to telerobotics.


Figure 5.2: Experiment Two: Training

Table 5.2: Analysis of variance of the number of trials needed to complete the training procedure.
FACTOR:    Subject      Order        Day       View        Num 
LEVELS:          8          2          2          2         32
TYPE  :     RANDOM    BETWEEN     WITHIN     WITHIN       DATA

SOURCE         SS      df            MS             F       p
===============================================================
mean	7657.0313	1     7657.0313        84.813	0.000 ***
S/O	 541.6875	6	90.2813
Order	  52.5313	1	52.5313	        0.582   0.474 
S/O	 541.6875	6	90.2813
Day	 935.2813	1      935.2813		6.044	0.049 *
DS/O	 928.4375	6      154.7396
OD	  16.5313	1	16.5313	        0.107	0.755 
DS/O	 928.4375	6      154.7396
View	  52.5313	1	52.5313	        0.734	0.424 
VS/O	 429.1875	6	71.5313
OV	  94.5313	1	94.5313		1.322	0.294 
VS/O	 429.1875	6	71.5313
DV	   0.7813	1	 0.7813		0.017	0.901 
DVS/O	 279.4375	6	46.5729
ODV	 205.0313	1      205.0313		4.402	0.081 
DVS/O	 279.4375	6	46.5729
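In this and the following tables, each effect is listed directly above the error term against which it is tested. The F values can be recovered from the listed sums of squares and degrees of freedom, as the sketch below shows for two rows of Table 5.2. The function name is illustrative, and the p-values are not recomputed here.

```python
def f_ratio(ss_effect: float, df_effect: int,
            ss_error: float, df_error: int) -> float:
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


# Order x Day x View interaction, tested against DVS/O.
f_odv = f_ratio(205.0313, 1, 279.4375, 6)
# Day main effect, tested against DS/O.
f_day = f_ratio(935.2813, 1, 928.4375, 6)

print(round(f_odv, 3), round(f_day, 3))
```

Both ratios agree with the tabulated values of 4.402 and 6.044 to three decimal places.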

5.4.2. Experiment Two Part A: Consecutive Presentation

The average trial times of the two groups of subjects are presented for each day of the experiment, divided into four categories of difficulty (i.e. target size), and shown for each of the 16 consecutive trials, in Figure 5.3 and Figure 5.4.

Figure 5.3: Average Trial Times of Experiment Two,
Part A (Consecutive Presentation), Day 1


Figure 5.4: Average Trial Times of Experiment Two
Part A (Consecutive Presentation), Day 2

An analysis of variance of these results, considering the factors of Trial Number, Video, and Separation, is summarised in Table 5.3.

Table 5.3: ANOVA of Experiment 2 Part A Trial Times, as a function of Order (MV First or SV First), Video System (MV and SV), Difficulty (8, 16, 32, 64 cm), and Trial Number (16 trials per condition).
FACTOR:  Subject    Order    Video      Sep    Trial     Time 
LEVELS:        8        2        2        4       16     1024
TYPE  :   RANDOM  BETWEEN   WITHIN   WITHIN   WITHIN     DATA

SOURCE           SS     df           MS             F       p
===============================================================
mean	 22391.0937	 1   22391.0937	      623.746	0.000 ***
S/O	   215.3867	 6	35.8978
Order	    79.3242	 1	79.3242		2.210	0.188 
S/O	   215.3867	 6	35.8978
Video	   195.6973	 1     195.6973	       31.930	0.001 **
VS/O	    36.7734	 6	 6.1289
OV	   140.9453	 1     140.9453	       22.997	0.003 **
VS/O	    36.7734	 6 	 6.1289
Sep	  1035.9531	 3     345.3177	       22.580	0.000 ***
SS/O	   275.2695	18	15.2928
OS	     3.8711	 3	 1.2904		0.084	0.968 
SS/O	   275.2695	18	15.2928
VS	    46.0098	 3	15.3366		5.640	0.007 **
VSS/O	    48.9473	18	 2.7193
OVS	     1.0469	 3	 0.3490		0.128	0.942 
VSS/O	    48.9473	18	 2.7193
Trial	    47.6445	15	 3.1763		2.038	0.021 *
TS/O	   140.2969	90	 1.5589
OT	    22.5586	15	 1.5039		0.965	0.498 
TS/O	   140.2969	90	 1.5589
VT	    17.4219	15	 1.1615		0.954	0.509 
VTS/O	   109.5293	90	 1.2170
OVT	    22.4160	15	 1.4944		1.228	0.266 
VTS/O	   109.5293	90	 1.2170
ST	    54.6797	45	 1.2151		1.107	0.306 
STS/O	   296.2344    270	 1.0972
OST	    36.6465	45	 0.8144		0.742	0.886 
STS/O	   296.2344    270	 1.0972
VST	    48.6582	45	 1.0813		0.920	0.620 
VSTS/O	   317.2793    270	 1.1751
OVST	    58.1074	45	 1.2913		1.099	0.318 
VSTS/O	   317.2793    270	 1.1751
Considering for a moment just the Trial Times of Experiment Two Part A, we find that on the first day of the experiment (Figure 5.3), those subjects using SV performed consistently better than those using MV, at all difficulty levels (target sizes). This again confirms Hypothesis 1. On the second day of the experiment (Figure 5.4) there is no apparent difference between MV and SV except at the most difficult level.

In order to observe any trends, these "noisy" data are grouped into four sets of four trials, as with Experiment 1, and presented below in Figures 5.5 and 5.7. Furthermore, since subjects are able to trade speed for accuracy, the corresponding error rates are shown in Figures 5.6 and 5.8.
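The grouping step can be sketched as follows. This is a minimal illustration with invented trial times; it assumes that each group of four trials is summarised by its mean, which the text does not state explicitly.

```python
def group_means(values, group_size=4):
    """Average consecutive, non-overlapping groups of `group_size` values."""
    if len(values) % group_size != 0:
        raise ValueError("number of values must be a multiple of group_size")
    return [sum(values[start:start + group_size]) / group_size
            for start in range(0, len(values), group_size)]


# Hypothetical trial times in seconds for one subject and one condition.
example_times = [6.0, 5.0, 5.5, 5.5, 5.0, 4.5, 5.0, 4.5,
                 4.0, 4.5, 4.5, 4.0, 4.0, 4.0, 3.5, 4.5]
print(group_means(example_times))
```

Sixteen trials reduce to four values per condition, which correspond to the four levels of the "Learn4" factor in the analyses of variance for Part A.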


Figure 5.5: Experiment Two Part A Day 1 Trends

Figure 5.6: Experiment Two Part A Day 1: Number of Trials with Errors Made while completing Four Error-Free Trials, versus Trial Number and Index of Difficulty

Examining first the results of Day 1 of the experiment, we find considerable support for Hypothesis 1: subjects using SV initially have a clear performance advantage over those using MV.

Looking at the easiest condition, Index of Difficulty of 3.2, in Figures 5.5 and 5.6, we see that the trial times for SV are considerably shorter than those for MV. Furthermore, there is a clear downward trend in the MV times. The error rates for MV and SV are relatively low and approximately equal. The advantage of SV is decreasing throughout the set of 16 trials.

At ID = 4.2, the next harder condition, we see that the trial times for both MV and SV are slower than for the previous condition, as expected. MV shows a decreasing trial time throughout the 16 trials, with a consistent error rate. SV shows a slightly decreasing trial time, but has an increasing error rate, suggesting that the subjects are exploring the speed-accuracy trade-off more than exhibiting signs of learning. Again, the advantage of SV appears to decrease throughout the set of 16 trials.

At the Index of Difficulty level of 5.2, there is some indication of learning with MV (the second group of 4 trials have a decreased error rate while trial time remains the same; the third and fourth groups show fewer errors still and slower trial times which might suggest a simple speed-accuracy trade-off, or might indicate continued improvement in performance), while those using SV show no particular learning trend (decreasing error rates are matched by slower trial times, suggesting a speed-accuracy trade-off). The advantage of SV does not appear to decrease throughout this set of 16 trials, unlike the previous two conditions.

At the Index of Difficulty level of 6.2, both MV and SV show a strong learning trend in the error rate. The MV time drops for the second grouping of four trials, and then rises for the last two groups, while the error rate continues to drop, suggesting some exploration of the speed-accuracy trade-off. SV shows a small decrease in trial time with a large drop in error rate at first, followed by constant times and an increasing error rate. This could be an indication of fatigue, or simply a statistical artifact. Again, the SV advantage does not appear to decrease throughout this set of 16 trials.

In summary, then, we find that for the easier conditions there is considerable improvement in performance (as measured by both time and error rate) for the MV condition, with very little change in the SV condition. At the higher levels of difficulty the learning trends are less obvious for both MV and SV, and there appears to be more active exploration of the speed-accuracy trade-off.

Furthermore, the advantage of SV decreases as the subjects become more experienced, more so for the easier conditions. This is consistent with hypothesis 2.

We now consider the results of the second day of the experiment. The subjects are very much more experienced with the task on this day, albeit with the other type of video.


Figure 5.7: Experiment Two Part A Day 2 Trends

Figure 5.8: Experiment Two Part A Day 2: Number of Trials with Errors Made while completing Four Error-Free Trials, versus Trial Number and Index of Difficulty

These results are considerably different from those of day 1. A cursory examination of the trial times shows no particular advantage of SV except for the most difficult condition. A glance at the error rates, however, shows that there is indeed a difference in performance between MV and SV.

At the Index of Difficulty level of 3.2, there is little difference in the times, but those using SV had a lower error rate for the first few trials. Those using MV showed some learning in both time and error rate, so that there was very little difference in performance by the end of the set of 16 trials.

At the Index of Difficulty level of 4.2, those using MV showed little change. Those using SV appear to get worse briefly, then slightly better. The small SV advantage at the beginning vanishes by the end of the 16 trials.

At the Index of Difficulty level of 5.2, those using MV curiously perform considerably better during the first set of four trials than the rest, getting considerably worse and then a little better. This suggests that the initial set of trials were unusually good, a statistical anomaly. Those using SV get consistently better, trading off time for errors a little. Although the performance of MV is better than SV at first, this situation is quickly reversed to the expected situation with SV performance being consistently better than MV. As on day 1, the SV advantage does not appear to decrease with experience.

At the Index of Difficulty (ID) level of 6.2, those using MV show consistent learning, predominantly in error rates. Those using SV performed better at first than in the rest of the set of trials, similar to the MV performance at ID = 5.2. Here the SV advantage does decrease, but does not vanish, with experience.

Analyses of Variance

The following tables present the analyses of variance corresponding to the graphs above.
Table 5.4: Analysis of Variance of Trial Times for Experiment Two Part A Day 1, grouping the 16 trials into four sets of four ("Learn4") to elucidate any trends.
FACTOR:    Subject      Video Difficulty     Learn4  Day1Times 
LEVELS:          8          2          4          4        128
TYPE  :     RANDOM    BETWEEN     WITHIN     WITHIN       DATA

SOURCE           SS    df           MS            F         p
===============================================================
mean	  3260.7100	1    3260.7100	    442.966	0.000 ***
S/V	    44.1665	6	7.3611
Video	    65.5120	1      65.5120        8.900	0.025 *
S/V	    44.1665	6	7.3611
Diff	   123.6892	3      41.2297	     13.903	0.000 ***
SS/V	    53.3777    18	2.9654
VD	     3.3704	3	1.1235	      0.379	0.769 
SS/V	    53.3777    18	2.9654
Learn4	     7.9934	3	2.6645	      3.746	0.030 *
LS/V	    12.8037    18	0.7113
VL	     2.2695	3	0.7565	      1.064	0.389 
LS/V	    12.8037    18	0.7113
DL	     4.4180	9	0.4909	      0.927	0.510 
SLS/V	    28.5981    54	0.5296
VDL	     3.0137	9	0.3349	      0.632	0.764 
SLS/V	    28.5981    54	0.5296

Table 5.5: Analysis of Variance of Trial Times for Experiment Two Part A Day 2, grouping the 16 trials into four sets of four ("Learn4") to elucidate any trends.
FACTOR:    Subject      Video Difficulty     Learn4  Day2Times
LEVELS:          8          2          4          4        128
TYPE  :     RANDOM    BETWEEN     WITHIN     WITHIN       DATA

SOURCE         SS       df          MS        F         p
===============================================================
mean	2372.3132	 1   2372.3132	754.210	    0.000 ***
S/V	  18.8726	 6	3.1454
Video	   3.2312	 1	3.2312	  1.027	    0.350 
S/V	  18.8726	 6	3.1454
Sep	 135.5498	 3     45.1833	 29.387	    0.000 ***
SS/V	  27.6753	18	1.5375
VS	   9.1101	 3	3.0367	  1.975	    0.154 
SS/V	  27.6753	18	1.5375
Learn4	   2.7700	 3	0.9233	  2.103	    0.136 
LS/V	   7.9026	18	0.4390
VL	   1.5083	 3	0.5028	  1.145	    0.358 
LS/V	   7.9026	18	0.4390
SL	   2.0579	 9	0.2287	  1.022	    0.435 
SLS/V	  12.0869	54	0.2238
VSL	   1.8606	 9	0.2067	  0.924	    0.512 
SLS/V	  12.0869	54	0.2238
These tables confirm the statistical significance of our observations. On Day 1 there is indeed a consistent benefit from SV at all difficulty levels, seen in reduced trial times, but there is no indication of a similar benefit in the Day 2 trial times.

Table 5.6: Analysis of Variance for Errors for Experiment 2 Part A Day 1, grouping the 16 trials into four sets of four ("Learn4") to elucidate any trends.
FACTOR:    Subject      Video Difficulty     Learn4   #ErrDay1 
LEVELS:          8          2          4          4        128
TYPE  :     RANDOM    BETWEEN     WITHIN     WITHIN       DATA

SOURCE        SS        df           MS          F          p
===============================================================
mean	634.5703	 1     634.5703	    44.313	0.001 ***
S/V	 85.9219	 6	14.3203
Video	  2.8203	 1	 2.8203	     0.197	0.673 
S/V	 85.9219	 6	14.3203
Difficu	432.5234	 3     144.1745	    18.685	0.000 ***
DS/V	138.8906	18	 7.7161
VD	 25.5234	 3	 8.5078	     1.103	0.374 
DS/V	138.8906	18	 7.7161
Learn4	 40.3984	 3	13.4661	     3.493	0.037 *
LS/V	 69.3906	18	 3.8550
VL	  3.1484	 3	 1.0495	     0.272	0.845 
LS/V	 69.3906	18	 3.8550
DL	 54.3828	 9	 6.0425	     1.721	0.107 
DLS/V	189.5469	54	 3.5101
VDL	 17.8828	 9	 1.9870	     0.566	0.819 
DLS/V	189.5469	54	 3.5101

Table 5.7: Analysis of Variance for Errors for Experiment 2 Part A Day 2, grouping the 16 trials into four sets of four ("Learn4") to elucidate any trends.
FACTOR:    Subject      Video Difficulty     Learn4   #ErrDay2 
LEVELS:          8          2          4          4        128
TYPE  :     RANDOM    BETWEEN     WITHIN     WITHIN       DATA

SOURCE        SS       df           MS       F      p
===============================================================
mean	381.5703	1     381.5703	96.971	0.000 ***
S/V	 23.6094	6	3.9349
Video	 13.1328	1      13.1328	 3.338	0.117 
S/V	 23.6094	6	3.9349
Difficu	227.8984	3      75.9661	13.330	0.000 ***
DS/V	102.5781	18	5.6988
VD	 19.9609	3	6.6536	 1.168	0.350 
DS/V	102.5781	18	5.6988
Learn4	 18.3359	3	6.1120	 2.806	0.069 
LS/V	 39.2031	18	2.1780
VL	 13.8984	3	4.6328	 2.127	0.132 
LS/V	 39.2031	18	2.1780
DL	 24.8203	9	2.7578	 0.847	0.577 
DLS/V	175.8594	54	3.2567
VDL	 94.1328	9      10.4592	 3.212	0.003 **
DLS/V	175.8594	54	3.2567
As expected from the figures, there is no consistent effect due to video, but there is a consistent difference due to the task difficulty: the harder the task, the more errors are made. There is also a significant trend, or "learning", reported for Day 1. For Day 2, the same effect is significant at the 10% level. Given the strange behaviour of MV at the ID=5.2 level and SV at the ID=6.2 level, this lowering of significance is not surprising.

In general, then, the results of Part A of the experiment strongly support Hypotheses 1 and 2.

5.4.3. Experiment Two Part B: Random Presentation

Part B of Experiment Two, where the separation between the two briefcases was varied randomly with each trial, was designed to examine a more steady-state behaviour than that in Part A, with subjects who were well-skilled at the task, but had not necessarily memorised the particular motor movements necessary for that particular task. Although 8 trials for each separation and video combination were performed, no learning trend was expected (or subsequently found) in the data.

Figure 5.9 shows the trial times of Part B of the experiment as a function of the Index of Difficulty. Figure 5.10 shows the corresponding error rates, where the error rate is defined as the number of trials with errors made in the course of completing the 8 successful trials.


Figure 5.9: Results of Experiment Two Part B
(Random Presentation)


Figure 5.10: Experiment Two Part B Number of Trials with Error made while completing Eight Error-Free Trials
If we first examine the trial times of the first day of the experiment, we find the very unexpected result that the SV advantage for trial times actually decreases as the difficulty of the task increases. However, if we examine the corresponding error rates, we see that the MV and SV error rates are significantly different at the most difficult task level. Considering that performance is a combination of both trial time and error rate, we see that there is indeed a consistent benefit due to SV. It appears that the subjects considered speed to be far more important than accuracy. At that level of difficulty, the short trial times could only be obtained at the expense of a high error rate.

Considering the second day of the experiment, we find that performance for both MV and SV is much improved, thanks to the large amount of experience received on the first day. There is no significant difference between the MV and SV conditions with regard to trial times, although those using MV appear to be somewhat faster, at the expense of higher error rates. The only significant difference between the MV and SV performance is at the highest level of difficulty, and is found in the error rate. The SV advantage decreases with experience, though less so at the higher difficulty levels.

An analysis of variance on these results gives the statistics listed in Tables 5.8 and 5.9. Again these analyses confirm the observations made from the graphs above.

In order to consider the differences in performance between Part A of the experiment and Part B of the experiment, it would be useful to combine the trial time and error data into a single graph. Furthermore, since the results of Part A exhibit changes within each set of 16 trials, it is important to consider only the "steady-state" results. For the purpose of this comparison, the last 8 trials of each set are being used.

Figures 5.12 and 5.13 show percent errors on the vertical axes, and trial time on the horizontal axes. This way the performance of each particular experimental condition is represented by a single point. For example, Figure 5.12 shows the performance of the subjects on both days of the experiment. The performance, consisting of both trial time and percent errors, gets worse as it moves further from the origin.

Table 5.8: ANOVA of Experiment 2 Part B Trial Times
as a function of Order (MV First or SV First), Video System (MV and SV), and Separation (8, 16, 32, 64 cm), with 8 replications.
FACTOR:    Subject      Order      Video        Sep      BTime 
LEVELS:          8          2          2          4         64
TYPE  :     RANDOM    BETWEEN     WITHIN     WITHIN       DATA

SOURCE         SS       df          MS        F      p
===============================================================
mean	1461.4976	 1   1461.4976	291.827	  0.000 ***
S/O	  30.0486	 6	5.0081
Order	   2.4093	 1	2.4093	  0.481	  0.514 
S/O	  30.0486	 6	5.0081
Video	  27.4589	 1     27.4589	 30.033	  0.002 **
VS/O	   5.4857	 6	0.9143
OV	   0.2848	 1	0.2848	  0.311	  0.597 
VS/O	   5.4857	 6	0.9143
Sep	  28.0372	 3	9.3457	 28.084	  0.000 ***
SS/O	   5.9901	18	0.3328
OS	   0.7675	 3	0.2558	  0.769	  0.526 
SS/O	   5.9901	18	0.3328
VS	   0.3687	 3	0.1229	  1.252	  0.321 
VSS/O	   1.7673	18	0.0982
OVS	   0.8909	 3	0.2970	  3.024	  0.057 
VSS/O	   1.7673	18	0.0982

Table 5.9: ANOVA of Experiment 2 Part B Error Rates
as a function of Order (MV First or SV First), Video System (MV and SV), and Difficulty ("Sep"), with 8 replications.
FACTOR:    Subject      Order      Video        Sep      %ErrB 
LEVELS:          8          2          2          4         64
TYPE  :     RANDOM    BETWEEN     WITHIN     WITHIN       DATA

SOURCE           SS     df             MS           F       p
===============================================================
mean	202500.0000	 1    202500.0000      47.578	0.000 ***
S/O	 25537.1093	 6	4256.1851
Order	   244.1406	 1	 244.1406	0.057	0.819 
S/O	 25537.1093	 6	4256.1851
Video	 20664.0625	 1     20664.0625	7.960	0.030 *
VS/O	 15576.1718	 6	2596.0286
OV	  3525.3906	 1	3525.3906	1.358	0.288 
VS/O	 15576.1718	 6	2596.0286
Sep	227441.4370	 3     75813.8125      23.147	0.000 ***
SS/O	 58955.0625	18	3275.2813
OS	   322.2188	 3	 107.4063	0.033	0.992 
SS/O	 58955.0625	18	3275.2813
VS	 32871.0625	 3     10957.0205	4.420	0.017 *
VSS/O	 44619.1250	18	2478.8403
OVS	  1806.6875	 3	 602.2292	0.243	0.865 
VSS/O	 44619.1250	18	2478.8403
In Figure 5.12, the hollow circles indicate the MV performance on the first day of the experiment. The hollow boxes represent the SV performance. For the three easy conditions, we see that the difference in performance is predominantly in trial time, with MV trial times being longer than SV trial times. For the difficult condition (ID=6.2), however, the large performance difference is almost entirely in the percent errors. On day 2 of the experiment, we find that the performance difference for the three easier conditions is much smaller than on the previous day, and performance is consistently better. The difference at the most difficult level again is seen almost entirely in the error rate.

This is the same as we saw above. If we now compare these results to those shown in Figure 5.13, and look at the first day of the experiment, we find that the performance in Part A Runs 9-16 is slightly better than that in Part B except for the MV performance at the ID = 5.2 level, which is considerably worse for Part A than Part B.

For the second day of the experiment, we find that performance for the two easiest conditions is very similar for both parts of the experiment. Performance at the ID = 5.2 level shows a similar result to the first day, where MV is worse for Part A, but SV is very similar on both days. Finally, at the ID = 6.2 level, the SV is again very similar in performance, but Part A has a marked advantage.


Figure 5.12: Expt 2 Part B Performance Graphs, where # Errors refers to the number of errors made while completing the 8 successful runs of Experiment 2 Part B.


Figure 5.13: Expt 2 Part A Runs 9-16 Performance Graphs, where # Errors refers to the number of errors made while completing the last 8 successful runs of Experiment 2 Part A.
Hypothesis 3 was that performance would be better for Part A of the experiment, with the consecutive trials, than for Part B, with the randomly ordered trials. The results of the first day are in agreement with this hypothesis for the most part, the difference between the two presentation types being greater for the higher difficulty trials. The results of the second day, however, do not support this hypothesis for the easier conditions. It seems likely that the accumulated experience over the course of the experiment rendered the subjects relatively expert at the task, so that by the time they performed the second part of the experiment on the second day, they were so expert at the easier trials that they did not show a performance difference. This suggestion is supported by the fact that performance for the two easy conditions was approaching a physical limit based on the speed of the robot: the mean trial time was on the order of 3 to 4 seconds. Several measurements indicated that the shortest time possible was approximately 2.75 seconds. There was little room for improvement, no matter how much easier the task might be.

The anomalous behaviour at the ID = 5.2 level for Part A of the experiment may be due to an indecision observed in several subjects regarding how to treat the condition. The author's observation of the subjects' behaviour during the course of the experiment suggested that many could not decide whether the 16 cm separation was easy, and should therefore be approached quickly, or difficult, and should therefore be approached slowly. As a result of trying to accomplish the task quickly, their error rate rose considerably. When approaching the same condition in Part B of the experiment, however, the subjects seemed much less uncertain about their approach, and used a fairly conservative, and much more successful, technique. Unfortunately, there is no way to verify these speculations.

5.5. Conclusion

For the sake of the reader, the Hypotheses for Experiment Two are restated below:

Experiment Two Hypotheses

1. Subjects using SV will show an initial performance advantage over those using MV.

2. The performance difference between MV and SV will decrease as the subjects become more experienced, more so for the low difficulty task conditions than the high difficulty task conditions.

3. Performance will be better during the repetitive trials of Part A than the random trials of Part B.

The results of this experiment gave clear supporting evidence for the first two of these hypotheses. The results of the first day of trials supported Hypothesis 3 fairly consistently. However, the easy conditions of the experiment showed little difference on the second day of the experiment. This may be because the performance of the subjects in the easy conditions for both Part A and Part B was approaching the limit of the RMI, so that differences in difficulty were insignificant. At the highest level of difficulty, however, performance was consistently better during the repetitive trials of Part A than the randomly presented trials of Part B.


6. Conclusion

In the work described herein, various theoretical and practical aspects of using SV rather than MV for teleoperation were considered. It was hypothesised that SV is easier to learn than MV, and that the benefits in performance due to SV are a function of how dependent the task is on binocular depth cues, i.e. where the task lies on the SV-dependence spectrum. For situations where binocular depth cues are relatively unimportant, it was suggested that the benefits of SV would be temporary, and would last only as long as it would take operators to learn how to use the monocular depth cues of an MV display. Once the MV display has been mastered, no particular benefit to SV displays will be noted for SV-independent tasks. For situations where the binocular depth cues are important, it was suggested that the benefits of SV would be longer lasting.

The first experiment conducted examined the first issue. Using a task that had very little demand for binocular depth cues (i.e. was SV-independent), it was found that there was a short-lived benefit in performance for SV that quickly vanished as the operators learned how to use the monocular cues of the MV display. Furthermore, the first experiment provided evidence to suggest that SV can be used effectively with little or no training, while MV requires a period of adjustment and learning.

The first experiment also revealed an interesting transient effect that changing from one video condition to another can have on performance. Those who change from an SV to an MV display show a temporary but dramatic drop in performance, while those who change from an MV to an SV display show a large improvement in performance. The results of the experiment and the literature suggest that the differing appearances of "reality" of the two displays may affect the confidence of the operators in their abilities to perform the task, and therefore affect their performance.

The second experiment examined the second issue, that of how the transience of the benefits of SV is a function of the difficulty of the task and the dependence on binocular depth cues. It showed that the benefits of SV, even after a great deal of practice, will still be apparent for difficult tasks, long after the benefits have faded for easier tasks.

The implications for telerobotics and EOD are obvious. Given the nature of most telerobotics applications and all EOD tasks, operators have only a very few chances to accomplish the task correctly. The performance benefits of SV, even though they fade with practice for highly repeatable tasks, should be very strongly evident in these single-attempt situations. Furthermore, given that operators can learn to use an SV display much more quickly than an MV display, operators should require less initial training and less constant practice in order to maintain their skills at a suitable level.


7. References

Aries Arditi, "Binocular Vision", Chapter 23 of Handbook of Human Perception and Performance, edited by Kenneth R Boff, Lloyd Kaufman, James P Thomas; John Wiley & Sons, New York, 1986

John Baker, "Generating images for a time-multiplexed stereoscopic computer graphics system", SPIE Vol 761 True 3D Imaging Techniques and Display Technologies, 44-52, 1987

K R Boff, and J E Lincoln, Engineering Data Compendium: Human Perception and Performance, AAMRL, Wright-Patterson AFB, Ohio, 1988

James F Butterfield "Autostereoscopy delivers what holography promised", SPIE Vol 199 Advances in Display Technology, 42-46, 1979

F Chavand, E Colle, JP Gaillard, A Mallem, JP Stomboni "Visual assistance to the operator in teleoperation and supervision situations", Proc Int Symp Teleoperation and Control, 237-248, July 1988

Robert E Clapp, "Stereoscopic Displays and the human dual visual system", SPIE Vol 624 Advances in Display Technology VI, 41-52, 1986

Robert E Clapp, "Stereoscopic Perception", SPIE Vol 761 True 3D Imaging Techniques and Display Technologies, 79-87, 1987

David Drascic, "Skill Acquisition and Task Performance in Teleoperation Using Monoscopic and Stereoscopic Video Remote Viewing", Human Factors Society 35th Annual Meeting, 1991a

David Drascic, Paul Milgram "Positioning Accuracy of a virtual stereographic pointer in a real stereoscopic video world", SPIE Vol 1457: Stereoscopic Displays and Applications II, 1991b

A A Dumbreck, C W Smith, S P Murphy "The Development & Evaluation of a Stereoscopic Television System for Use in the Nuclear Industry", Int'l Workshop on Nuclear Robotic Technologies and Applications, University of Lancaster, June/July 1987

Joel Fajans "Three-dimensional display", SPIE Vol 199 Advances in Display Technology, 23-28, 1979

S S Fisher, M McGreevy, J Humphries, W Robinett "Virtual Environment Display System", ACM 1986 Workshop on Interactive 3D Graphics, Chapel Hill, North Carolina, 1986

Allan H Frey, "An evaluation of holograms in training and as job performance aids", SPIE Vol 615 Practical Holography, 57-63, 1986

Julius J. Grodski, Paul Milgram, David Drascic "Real and virtual world stereoscopic displays for teleoperation", NATO Defence Research Group Seminar: Robotics in the Battlefield, 6-8 March 1991

John H Harshbarger "Structure of the interlaced television raster", SPIE Vol 457 Advances in Display Technology IV, 80-84, 1984

Stephen J Hart, Michael N Dalton "Display holography for medical tomography", SPIE Vol 1212 Practical Holography IV, 116-135, 1990

Edwin R Jones Jr., A Porter McLaurin, LeConte Cathey "VISIDEP (TM): visual image depth enhancement by parallax induction", SPIE Vol 457 Advances in Display Technology IV, 16-19, 1984

Won S Kim, Munehisa Takeda, L W Stark "On-the-screen visual enhancements for a telerobotic vision system", Proceedings IEEE Systems, Man, and Cybernetics Conference, 126-130, 1988

Won S Kim, F Tendick, L W Stark "Visual Enhancements in Pick-and-Place Tasks: Human Operators Controlling a Simulated Cylindrical Manipulator", IEEE Journal of Robotics and Automation, v RA-3, no 5, pp 418-425, 1987

Bruce Lane "Stereoscopic displays", SPIE Vol 367 Processing and Display of Three-Dimensional Data, 20-32, 1982

Thomas M Lippert, David L Post, Robert J Beaton "A study of direct distance estimations to familiar objects in real-space, two-dimensional, and stereographic displays", Proceedings of the Human Factors Society 26th Annual Meeting, 324-328, 1982

Lenny Lipton "Factors affecting 'ghosting' in time-multiplexed plano-stereoscopic CRT display systems", SPIE Vol 761 True 3D Imaging Techniques and Display Technologies, 75-78, 1987

Lenny Lipton, Lhary Meyer "A Flicker-Free Field-Sequential Stereoscopic Video System", SMPTE Journal, v 93, n 11, 1047-1051, 1984

Colin Macilwain "Remote control robots seen through 3D spectacles", THE ENGINEER, 35, 8 June 1989

Douglas E McGovern, "Current development needs in the control of teleoperated vehicles", SANDIA National Laboratories Report SAND87-0646 UC-15, Albuquerque, New Mexico, August 1987a

Douglas E McGovern, "Experiences in Teleoperation of Land Vehicles", SANDIA National Laboratories Report SAND87-1908 UC-15, Albuquerque, New Mexico, October 1987b

H B Meieran "Robotics and Teleoperator-Controlled Devices", Health Physics, v 55, n 2, 215-222, Aug 1988

John O Merritt "Visual-motor realism in 3D teleoperator display systems", SPIE Vol 761 True 3D Imaging Techniques and Display Technologies, 88-93, 1987

John O Merritt, "Visual tasks requiring 3-D stereoscopic displays", SPIE Vol 462 Optics in Entertainment II, 56-59, 1984

John O Merritt, "Often-overlooked advantages of 3-D displays", SPIE Vol 902 Three-Dimensional Imaging and Remote Sensing Imaging, 46-47, 1988

Paul Milgram, David Drascic, Julius Grodski "Enhancement of 3-D video displays by means of superimposed stereo-graphics", Human Factors Society 35th Annual Meeting, 1991

Paul Milgram, David Drascic, Julius Grodski "A Virtual Stereographic Pointer for a Real Three Dimensional World", Interact '90, Third IFIP Conference on Human-Computer Interaction, Cambridge, UK, August 1990

Paul Milgram, David Drascic, Julius Grodski "Stereoscopic Video + Superimposed Computer Stereographics: Applications in Teleoperation", Proc. Second Canadian Workshop on Military Robotic Applications, Kingston, Ontario, Aug 1989.

Paul Milgram, R van der Horst "Alternating-field stereoscopic displays using light-scattering liquid crystal spectacles", Displays: Technology & Applications, v 7, n 2, 67-72, April 1986

Dwight P Miller, "Evaluation of vision systems for teleoperated land vehicles", IEEE Control Systems Magazine, p37-41, June 1988

Donald A Norman, The Psychology of Everyday Things, Basic Books Inc, New York, 1988

Ross L Pepper, "Human Factors in Remote Vehicle Control", 30th Annual Meeting of the Human Factors Society, Dayton, Ohio, Sep-Oct 1986

Ross L Pepper, J D Hightower "Research Issues in Teleoperator Systems", Proceedings of the Human Factors Society 28th Annual Meeting, 803-807, 1984

Ross L Pepper, David C Smith, Robert E Cole "Stereo TV improves operator performance under degraded visibility conditions", Optical Engineering, v 20, n 4, 579-585, July/Aug 1981

J Rasmussen, Human Information Processing, 1986

M Robinson "Remote control vehicle guidance using stereoscopic displays", Proceedings of the HFS 28th Annual Meeting, 809, 1984

D C Smith, R E Cole, J O Merritt, R L Pepper "Remote Operator Performance Comparing Mono and Stereo TV Displays: the Effects of Visibility, Learning, and Task Factors", Naval Ocean Systems Center Technical Report No. 380, Feb 1979

Edward H Spain, A Psychophysical Investigation of the Perception of Depth with Stereoscopic Television Displays, PhD Dissertation, University of Hawaii, May 1984

Christopher D Wickens, "Three-dimensional stereoscopic display implementation: Guidelines derived from human visual capabilities", SPIE Vol 1256 Stereoscopic Displays and Applications, 2-10, 1990

Christopher D Wickens, Engineering Psychology and Human Performance, Charles E Merrill Publishing Company, Toronto, 1984

Rodney Don Williams, Felix Garcia Jr. "A Real-Time Autostereoscopic Multiplanar 3D Display System", Proceedings of the Society for Information Display, Anaheim, California, 1988