This term paper was written for the course WR 327, Technical Report Writing, and uses the IEEE authoring style.
II. COMPUTER VISION BASICS
III. TYPES OF APPLICATIONS
IV. LIMITATIONS OF COMPUTER VISION
V. TRENDS IN RESEARCH
The present and future use of computer vision is examined. A brief overview describes computer vision basics. Areas of application in industry are detailed. Current limits of vision systems and implementation problems are examined. Current research in imaging methods, hardware, and software points to future trends.
I. INTRODUCTION

Computer vision is the technology of viewing objects with a video camera and processing the images with a computer to determine the nature of the objects. The basic processes involved are image capturing, image processing, feature extraction, and pattern recognition. Essential parts of any vision system are a video camera, a video frame grabber, a computer, and software algorithms to process the information.
Computer vision is used in industry to manufacture items precisely, detect defects, guide robots, and read characters. Other uses include mail sorting and missile and vehicle guidance. Some potential applications of computer vision are infeasible because of speed, reliability, and implementation limitations.
Research in many fields will directly and indirectly affect the technological development of computer vision. Parallel processing and faster microprocessors will increase speed. Developments in stereo imaging and color imaging will potentially enhance its versatility. Programming development environments are being devised to make the implementation of computer vision systems easier.
As an electrical engineering student at Oregon State University, I began by learning the fundamental concepts behind computer vision from books and periodicals on the subject. Next, I consulted professors who could point the way in my research, gathered relevant information, and returned to them with my questions. A major source of information was Ken Tubbs of Intelledex, located here in Corvallis, Oregon. In this way I gathered considerable knowledge of how computer vision works, where it is used, what its problems are, and what is being tried to solve those problems and advance the technology.
First comes a description of the basic parts of computer vision. Next, the types of applications are explained. Then, we look at computer vision's limitations and why it cannot always do the job. Lastly, we look at trends in future research that should help push computer vision forward and overcome some of the present limitations.
II. COMPUTER VISION BASICS

A. Capturing the Image
Under the right conditions, a video camera scans in an image of the object of interest and converts it into a digital format.

For good imaging, positioning and lighting must be properly controlled. The object must be in the location where the camera expects to view it, either stationary or moving slowly enough not to blur the image. Where possible, the object should be oriented to emphasize its most important visual features. The lighting should also be controlled to provide sufficient contrast, reduce glare, and emphasize the important features.
Most cameras used in industry provide an analog* measurement of the light intensity for each pixel* in a square pixel matrix*. These analog measurements are converted into discrete levels through A/D conversion* to provide the so-called gray scale* image. The device that does this is called a video frame grabber. An example is the frame grabber used by Intelledex in their vision systems. It uses a 512 × 512 pixel matrix with an 8-bit A/D converter, giving 256 different gray scale levels.
The imaging capability of the camera depends on the lens attached to it. Install a camera on an astronomical telescope and you will be analyzing the features of stars and galaxies. Hook a camera up through an electron microscope and you can capture the minute features of microorganisms. Of course, resolution is limited by the number of pixels used in the camera and the frame grabber.
B. Image Processing
An image in raw form is usually difficult for a computer to deal with. It likely contains noise* and probably has too much information (some of it being redundant or insignificant). For a sequence of objects, the angle, position, lighting conditions, and distance of the objects may vary. Furthermore, if the vision system is trying to analyze certain features, we would like to make those features stand out.
Noise represents a seemingly random distribution of pixels with gray scale values widely different from what we would expect: a very dark pixel here, a couple of very light pixels over there. If the image has a general structure to it, the effect of these few odd pixels can be eliminated. In this process, called smoothing, if a pixel's gray scale level differs markedly from those around it, its value is changed to an average value computed from the values of its neighbors.
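The smoothing step just described can be sketched in a few lines of Python. This is an illustrative sketch only; the threshold value and the 3 × 3 neighborhood are arbitrary choices, not taken from any particular vision system:

```python
def smooth(image, diff_threshold=100):
    """Simple noise removal: if a pixel's gray level differs markedly
    from the average of its neighbors, replace it with that average."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            neighbors = [image[nr][nc]
                         for nr in range(max(0, r - 1), min(rows, r + 2))
                         for nc in range(max(0, c - 1), min(cols, c + 2))
                         if (nr, nc) != (r, c)]
            average = sum(neighbors) / len(neighbors)
            if abs(image[r][c] - average) > diff_threshold:
                out[r][c] = round(average)
    return out
```

A lone very bright pixel surrounded by uniform dark pixels is pulled down to the neighborhood average, while pixels consistent with their surroundings are left alone.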
Often we only need a limited amount of information from an image. For example, to represent the alphabetic character 'T', we only need to recognize two line segments intersecting at right angles. If in the scanned image each line segment is three pixels thick, we can thin these segments to a thickness of only one pixel while still retaining the important information about this character.
Commonly, if the object of interest is of either a dark or a light shade, it is placed on a background of the opposite shade for maximum contrast while imaging. The resultant image will have one grouping of gray scale levels that represents the background's shade and another grouping that represents the object's shade. When we only need to distinguish the object from the background, binarization can be applied. In binarization, a threshold level in the gray scale is determined that best separates the background's shade from the object's shade. Say the object is much lighter than the background. Any pixel whose gray scale level is greater than the threshold will be assigned a binary 1 and be considered part of the object. Any pixel with a gray scale level below the threshold will be assigned a binary 0 and be considered part of the background. In this way, the amount of information is considerably reduced, from 8 bits per pixel (representing 256 gray scale levels) to 1 bit per pixel (indicating whether the pixel is part of the object or the background).
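Binarization itself is a single comparison per pixel. A minimal Python sketch, assuming (as in the example above) a light object on a dark background and a threshold chosen beforehand:

```python
def binarize(image, threshold):
    """Reduce an 8-bit gray scale image (0-255 per pixel) to 1 bit per
    pixel: 1 = object (lighter than the threshold), 0 = background."""
    return [[1 if pixel > threshold else 0 for pixel in row]
            for row in image]
```

For a dark object on a light background the comparison would simply be reversed.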
Normalization* is used when position, orientation, lighting conditions, and distance vary from one object to the next. Suppose the objects of interest are a quantity of squares. Image characterization and pattern recognition* are much easier if all of the squares viewed have identical images. Imagine the ideal square image, with the square centered in the screen at a specified size, corner orientation, and lighting. Now, for any image that has the potential of being a square, we adjust it to most closely resemble the ideal square image. If it is too small, it is magnified. If it is off to one side and turned at an angle, the image is shifted and rotated to provide the correct orientation and placement of its corners. Or, if the potential square has a different amount of contrast with its background than the ideal square does, its gray scale is shifted so that it matches as well. In this manner, candidate square images are more easily compared with the ideal square image to judge their worthiness as squares.
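Of the adjustments described, the gray-scale shift is the simplest to show in code. The sketch below linearly rescales an image's gray levels to a standard 0-255 range; geometric normalization (shifting, rotating, magnifying) follows the same pattern but with coordinate transformations instead of gray-level arithmetic:

```python
def normalize_contrast(image, lo=0, hi=255):
    """Linearly rescale gray levels so every image spans the same
    standard range, removing differences in lighting and contrast."""
    flat = [pixel for row in image for pixel in row]
    pmin, pmax = min(flat), max(flat)
    if pmax == pmin:                # flat image: nothing to stretch
        return [[lo for _ in row] for row in image]
    scale = (hi - lo) / (pmax - pmin)
    return [[round(lo + (pixel - pmin) * scale) for pixel in row]
            for row in image]
```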
C. Feature Extraction
The human sense of sight is remarkable in its ability to characterize an image by easily extracting its important features. Texture, color, size, shape, orientation, and distance are analyzed almost automatically. Determining these features is a complicated task for computers, requiring lots of processing time.
A fundamental feature in distinguishing objects is shape. In the two-dimensional world viewed with today's computer vision technology, shape is determined by an object's flat surface outline. This outline consists of a closed chain of the object's edges. In the process of edge detection, pixel gray scale levels are examined. Where the gray scale levels change dramatically from one set of pixels to their near neighbors, the change marks a transition from object to background (owing to their sharply differing shades of light or darkness) and thus an edge. Ideally, the many edges connect together to form a complete closed-loop outline.
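A bare-bones edge detector in this spirit simply thresholds the gray-level difference between neighboring pixels. This is an illustrative sketch; practical systems use more elaborate edge operators, but the principle is the same:

```python
def detect_edges(image, jump_threshold=50):
    """Mark a pixel as an edge point where the gray level changes
    sharply between it and its right-hand or lower neighbor."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < cols else 0
            down = abs(image[r][c] - image[r + 1][c]) if r + 1 < rows else 0
            if max(right, down) > jump_threshold:
                edges[r][c] = 1
    return edges
```

On a dark region meeting a light region, only the pixels along the boundary are marked.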
Once the object's outline is determined, it is an easier task to determine such features as the perimeter, area, and maximum width. The perimeter is the sum of the lengths of the individual line segments of the outline. The area is the total number of pixels contained within the outline. The maximum width is simply the farthest distance between any two points on the outline. A relatively simple calculation gives the center of mass of the object.
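For an outline given as an ordered list of corner points, these measurements reduce to short formulas. The sketch below uses the shoelace formula for the area; note that the simple vertex average used for the center is only the true center of mass for symmetric shapes such as squares:

```python
import math

def shape_features(outline):
    """Perimeter, area, and approximate center of an object whose
    outline is a closed chain of (x, y) vertices given in order."""
    n = len(outline)
    perimeter = sum(math.dist(outline[i], outline[(i + 1) % n])
                    for i in range(n))
    # Shoelace formula: twice the signed area of the polygon.
    twice_area = sum(outline[i][0] * outline[(i + 1) % n][1]
                     - outline[(i + 1) % n][0] * outline[i][1]
                     for i in range(n))
    area = abs(twice_area) / 2
    center = (sum(x for x, _ in outline) / n,
              sum(y for _, y in outline) / n)
    return perimeter, area, center
```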
To determine more complicated features, transformations* are applied and their results analyzed. For example, in deciding whether a set of points forms a straight line, the Hough Transformation is used. The formula for a straight line segment is y = mx + c. Rearranging gives c = y - mx, so each pixel of interest at (x, y) in xy-space maps to a straight line in mc-space. Points that lie along a common line segment produce mc-space lines that all intersect at one point, and that intersection gives the characteristics (slope and intercept) of the connecting line segment. Other features besides straight line segments can be determined through similar transformations.
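A toy version of this can be written as a voting scheme: each point votes for every candidate (slope, intercept) pair consistent with it, and a heavily voted cell indicates collinear points. The candidate slope list, rounding, and vote threshold below are arbitrary illustrative choices:

```python
from collections import Counter

def hough_lines(points, candidate_slopes, min_votes=3):
    """For each point (x, y), vote for every line y = m*x + c through
    it (i.e. c = y - m*x).  Cells in (m, c) space that collect many
    votes correspond to line segments present in the image."""
    votes = Counter()
    for x, y in points:
        for m in candidate_slopes:
            c = round(y - m * x, 6)   # quantize intercepts into cells
            votes[(m, c)] += 1
    return [cell for cell, count in votes.items() if count >= min_votes]
```

Three points lying on y = 2x + 1, for instance, all vote for the cell (2, 1), which is then reported as a detected line.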
D. Pattern Recognition
Pattern recognition is the final step in computer vision. We determine to the best of our ability what it is we are looking at. The two basic approaches used are the statistical approach and the deterministic approach.
1) Statistical Approach: This uses generalized decision theory to decide the classification of an object, where the classification classes are determined ahead of time from a set of representative samples for each class. Each object class has a statistical distribution of various features such as area, darkness, and the number of straight line segments. These features, assembled together from the representative samples, form a distribution in feature-space. Given the features from a scanned object, generalized decision theory assigns the object to the appropriate class.
It is important that a sufficient number of truly representative samples be used to define an object class. Say the class is eggs. Most eggs encountered are chicken eggs. So in your representative sample include several chicken eggs, but also include a duck egg and an ostrich egg so as not to exclude them from this class.
In generalized decision theory, a decision to classify an object based on certain features has an associated cost. For example, suppose you were in a bomb disposal squad and you came across a bomb where all you knew about it was that the bomb was made of plastic. Say that because the bomb was plastic it had a 50% chance of being a live bomb and a 50% chance of being a dud. If the bomb was either a live one or a dud and you made the correct choice, initiating the correct disposal technique, there would be zero associated cost. If the bomb was a dud and you chose it to be a live one and treated it accordingly, there would be a cost because you wasted your time. However, if the bomb was live and you chose it to be a dud and consequently got yourself killed, there would be a much greater associated cost. Lesson: if all you knew was that the bomb was plastic, consider it to be live. Generalized decision theory balances the probabilities of making likely choices with the consequences of making wrong choices.
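The bomb-squad reasoning can be stated as a minimum-expected-cost rule. The probabilities and cost figures below are the made-up ones from the example, not real data:

```python
def minimum_cost_action(probabilities, costs):
    """Choose the action whose expected cost, averaged over the
    possible true states, is lowest -- the core idea of
    decision-theoretic classification."""
    def expected_cost(action):
        return sum(p * costs[action][state]
                   for state, p in probabilities.items())
    return min(costs, key=expected_cost)

# The plastic-bomb example: equal odds, but misjudging a live bomb
# as a dud is far more costly than wasting time on a dud.
probabilities = {"live": 0.5, "dud": 0.5}
costs = {"treat_as_live": {"live": 0, "dud": 10},
         "treat_as_dud": {"live": 1000, "dud": 0}}
```

With these numbers the rule picks "treat_as_live", exactly the lesson drawn above.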
2) Deterministic Approach: Here an object either is or is not a member of a particular object class. All of the classes exist in feature space so that they do not overlap. If an object's features fall within the boundaries of a class, it is a member of that class; there is no possibility that it is a member of another class. This is the preferred method of classification when the distribution of object features for the different classes permits it.
Sometimes an object may seem not to fit into any class. The object may then be rejected by the machine and (depending on the application) require some human intervention.
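Deterministic classification amounts to checking which non-overlapping feature-space box, if any, contains the object. A sketch with invented classes and feature bounds (the names and numbers are purely hypothetical):

```python
def deterministic_classify(features, class_bounds):
    """Return the single class whose feature ranges all contain the
    object's measured features, or None to reject the object."""
    for name, bounds in class_bounds.items():
        if all(lo <= features[feat] <= hi
               for feat, (lo, hi) in bounds.items()):
            return name
    return None   # fits no class: reject for human inspection

# Hypothetical classes defined by non-overlapping feature ranges.
classes = {"washer": {"area": (10, 20), "width": (4, 6)},
           "bolt": {"area": (30, 60), "width": (2, 3)}}
```

An object falling outside every box is rejected, as described above.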
III. TYPES OF APPLICATIONS

A. Measurement

In the Midwest, where most of this country's manufacture of large machines takes place, computer vision is used most to measure machine parts and assemblies of various sizes to make sure they fall into the required specifications. Using lenses appropriate to the application, tolerances of inches, millimeters, and micrometers can be measured. If the part does not meet the specified dimensions, it is rejected.
B. Robot Guidance
Robots guided by machine vision range from industrial manufacturing robots to Tomahawk missiles. Vision-guided machines are used to adjust the alignment of parts in preparation for various manufacturing processes. Research is also being done on navigating automobiles under the control of computer vision. One can imagine future real-life C-3POs wandering the corridors guided only by their computer eyes.
Manufacturing robots typically have a video camera attached to part of their extension arms. A robot welder or driller is steered to the precise location where it is to perform its action. Often in alignment tasks, the target is marked with a highly visible dot, contrasting strongly with its background to make the computer vision task simpler. Computer vision guides pick-and-place machines in placing components on PCBs* by following the geometric edges of the empty slots.
Researchers at the University of Bristol and elsewhere are experimenting with computer vision for the navigation of vehicles along roads and through obstacle-filled terrain. In the Bristol study, the robotic vehicle followed the edges of a road to traverse a path on the university grounds. Such systems must presently travel at slow speeds to be sure of their course. If the eventual goal is for a vehicle to drive itself along a highway automatically, much more development must be done, as the many visual details used to good effect by human drivers are far beyond the imaging and processing abilities of today's computer vision systems.
A notable example of the use of computer vision in robotic guidance is the Tomahawk missile, featured prominently in the recent Gulf War. It compares the terrain image seen by its nose-mounted camera with transformed satellite map data and steers the missile to provide the best match between the two. Ken Tubbs of Intelledex says he wishes he knew the details of how the military accomplishes that, but is sure they follow the same basic principles that Intelledex does.
C. Inspection

Quality control engineers apply computer vision to sort the good from the bad on an assembly line. Many applications can be imagined; a few are solder joint inspection, seed sorting, and detection of errors in packaging.
Solder joints on printed circuit boards electrically wire components together and secure them in place. Given a master guide showing which solder joints should be present, comparing it with a camera image of the board detects flaws such as a missing solder joint or the wrong quantity of solder. Several images can be taken with the light source coming from various angles. This way, the irregular surfaces of the solder joints reflect the light differently in each image, giving a better estimate of their volumes than a single image could.
In another use, a certain seed variety was harvested at the National Forage Seed Production Research Center. In order to have a very pure seed sample, seeds from other plant varieties that grew among the desired one had to be sorted out. A vision system accomplished this by classifying the different seed types by their size and shape. After classification, they were sorted by mechanical means to one side or the other.
A master's student in the Industrial Engineering Department completed a thesis in which a vision system detected flaws in consumer packaging. The product was inserted in the open end of a plastic pouch and the pouch was then sealed. Sometimes a part of the packaging plastic extended suspiciously up through the seal, causing consumers to regard the package as defective. The vision system he designed aligned the pouch with the seal side facing north in the image and rejected all samples having significant material extending up beyond the seal.
D. Optical Character Recognition
Optical character recognition recognizes alphanumeric and other types of symbols carrying important information on an item. Thus, inventory control and routing of items can be handled by a machine. Applications include mail sorting, semiconductor testing, and controlling pharmaceutical packaging inventories.
The post office uses computer vision to sort your mail. Unlike industrial applications, where three standard character fonts are used, mail addresses, written by hand or by machine, come in an almost infinite variety of fonts. Because of the challenge of deciphering all of these writing styles, the post office wants everyone to use the same format: all capital letters, with the ZIP code at the end of the last address line. Usually the ZIP code can be read because there are only ten possible digits, which are fairly easy to distinguish. Without the use of computer vision, the post office would be far, far slower.
In semiconductor manufacturing, hundreds of tiny computer chips are grown side by side on a typical eight-inch silicon wafer. Each computer chip must be tested to see if it works properly. Formerly, any defective chip was marked by an ink spot that prevented it from being checked further. Now, using optical character recognition, each tiny chip is identified by a serial number engraved in one corner. Information about each chip is then stored by serial number in a computer's memory. When a chip tests defective, its serial number is merely flagged inside the computer. After testing of the entire wafer is completed, one can go back and further test the defective chips to analyze the causes of the defects (postmortem testing).
For safety reasons, the Food and Drug Administration now requires that all packaged medicine be engraved with the date and lot number of its manufacture and that these symbols be machine readable. With production lines putting out 1200 bottles per minute, this is a challenging task for today's computer vision technology. Systems are being implemented to meet this challenge using the latest technology.
IV. LIMITATIONS OF COMPUTER VISION

A. Ease of Implementation
For about $1800 you can acquire the bare-bones components of a computer vision system: camera, $200; PC image capture board, $600; personal computer, $1000. Theoretically, anyone with programming ability and an understanding of how computer vision works should be able to construct a working system. Although some do just that, the task is usually much more difficult and time-consuming than they first imagined. For an additional $1000 you can buy a low-budget software library, but the software functions still require a good deal of understanding of how the vision concepts work together.
Intelledex and other vendors provide integrated vision systems that make implementation easier. Intelledex's software includes some 270 C-language functions for vision processing. Most systems in industrial use are managed by industrial engineers, whose experience with the sometimes mind-warping C programming language is not great. So even with these powerful software tools available, practical implementation is not easy. Industry-wide standards would reduce confusion and allow simplifications to evolve.
Another issue is the general lack of knowledge about vision systems: how they work, and their limits and abilities. It is hoped that papers like this one will make people more aware of computer vision and what it can be used for. In today's marketplace, system integrators serve as intermediaries between vendors and industrial customers, applying their detailed knowledge of computer vision to the practical task at hand. They act as carriers of information.
B. Reliability

Due to poor imaging conditions, faulty programming, or just the nature of the problem, results can be less than adequate for the problem to be solved. Inadequate contrast can leave the information gathered insufficient to discern the nature of the object. In applications with rapid movement or unpredictable conditions, the system cannot cope; in most situations, the human eye is much more adaptive to varying environmental conditions. Also, the computer programming may not always be adequately coordinated with the task at hand. And sometimes the nature of the task is such that even a low percentage of errors cannot be afforded.
C. Speed

Every step in computer vision takes time. The commonly used RS170 standard video cameras, based on the television rate of 30 frames (60 fields) per second, acquire an image in roughly 17 to 33 ms. The processing of the image then takes 30 to 50 ms. The more image data used in calculations, the longer the image processing takes. Processing speed is limited by the speed of the computer hardware and the efficiency of the calculation algorithms*.
The hardware speed depends on the type of processing arrangement and the absolute speed of the microprocessor. A bare-bones PC-based system is limited strongly by the communication speed of the RS232 ports through which the frame grabber is connected. An integrated hardware package like the Intelledex TENSOR system puts the frame grabber in the same physical box as the microprocessor, eliminating the RS232 communications bottleneck. At the same time, the TENSOR system uses custom hardware that performs several of the image analysis steps much faster than they can be done through software on a standard PC. All systems are limited by the speed of the microprocessor. Today's standard 80386 chip runs at a maximum of 33 MHz.
Another large determiner of speed is the set of vision algorithms used: algorithms for preprocessing, feature extraction, and pattern analysis. Usually, for any function, there is the straightforward method and then there is the fast method. The most proprietary information at Intelledex is their very fast and efficient algorithms. Computer programmers with strong math backgrounds devote considerable time and effort to developing the best possible algorithms.
V. TRENDS IN RESEARCH

A. Parallel Processing
All vision systems in commercial use operate around a sequential-instruction microprocessor; that is, the computer executes one instruction at a time, one after the other. Parallel processing uses separate processing blocks, coordinated together, each executing an instruction simultaneously. The result is typically much faster processing of the information. Parallel processing is put to good use in some computers today, like the large systems manufactured by Sequent Computers of Beaverton, Oregon. But there are hurdles to be overcome in applying parallel processing to computer vision.
Image processing lends itself to using many, many parallel processors, with a processor assigned to handle the information of a single pixel or a small group of pixels. Such a structure using so many processors is called a massively parallel architecture. For computer vision, the overall speed increase would be tremendous. One difficulty is in communicating so much information between the different processors, but that is solvable.
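The idea of assigning regions of the image to separate processors can be imitated on a small scale with a thread pool: each worker thresholds its own rows independently. This only illustrates the data-parallel decomposition; it is nothing like a true massively parallel architecture:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_binarize(image, threshold, workers=4):
    """Threshold each row of the image as a separate task, mimicking a
    parallel machine that assigns processors to image regions."""
    def threshold_row(row):
        return [1 if pixel > threshold else 0 for pixel in row]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(threshold_row, image))
```

Because each row's result depends only on that row, the workers need no communication at all, which is exactly what makes this kind of image operation so well suited to parallel hardware.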
The big hurdle in using parallel processing is the lack of software to make it possible. In many applications, not only computer vision, the cost of rewriting software to run on parallel architectures seems to outweigh the potential price/performance gains from the new hardware. At the moment, it seems up to the universities to make the software developments that will make parallel machines truly usable. That could take a long time, because the universities do not have an inherent financial stake in the outcome.
B. Faster Hardware
The semiconductor industry is continuously developing faster and faster microprocessors. Intel's 80486 chip, as it matures, will leap past the older 80386 chip in performance. Each generation of microchip compresses more and more functions into a smaller amount of space. The next generation, expected in a few years, should increase the number of transistors per chip from 1 million to 2 million, and increase processor speeds from 25-30 MHz to 50-60 MHz. All computer applications will benefit from this, including computer vision.
C. Stereo Imaging
Whereas humans see depth with their two eyes, computer vision does not. When two cameras are used in computer vision, the problem is correspondence: how to match the features viewed with one camera with the same features viewed through the other. When an object is looked at from two angles, some mechanism must determine how the images match up. Much research is being done on this, and some interesting theories are being proposed, but practical implementations seem a long way off.
D. Color Imaging
In the related fields of medical image analysis and satellite image analysis, color is used extensively. For practical reasons, color has not been used in computer vision systems. Humans are the analyzers of satellite and medical imagery, and the human eye handles color very well. With computer vision, information about the color of the viewed object, in addition to the light intensity information, would overwhelm the processing power available with present technology. Also, none of the algorithms are designed to incorporate color information, so the changeover would require a great deal of work.
E. Neural Networks
All the rage now with artificial intelligence* buffs are neural networks. Neural networks emulate the functioning of biological brains through simple interconnected nodes analogous to brain synapses. As they are trained for the correct responses to inputs, they learn by adjusting these interconnections. The hope is that they will serve as ideal pattern recognition devices. The jury is still out on their effectiveness, though. While small-scale models seem to perform well, their effectiveness decreases as they are scaled up in size. Problems remain with training and speed.
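The training idea, adjusting interconnection weights toward the correct response, can be seen in the smallest possible "network": a single threshold node (a perceptron). The learning rate and epoch count below are arbitrary illustrative choices:

```python
def train_perceptron(samples, epochs=20, rate=0.1):
    """Train one threshold node: nudge its weights toward the target
    output for every example, the basic neural-network learning step."""
    w1 = w2 = bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
            error = target - output
            w1 += rate * error * x1
            w2 += rate * error * x2
            bias += rate * error
    return w1, w2, bias
```

Trained on the four input/output pairs of logical AND, for example, the node learns weights that reproduce the function; real networks simply repeat this adjustment over many interconnected nodes.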
F. Graphical Programming Environments
Today's computer vision software is not easy to work with. There are numerous details to remember, and it is hard to relate the program code to what one would like to see happen. As in other industries, the trend is toward making the user software more friendly and intuitive.
The most logical approach is a graphical programming environment in which users manipulate shapes and images they are familiar with instead of complex character code. The problem is that designing such a programming environment is a very formidable programming task. It is hard to make a graphical system capable of handling most every situation that would be encountered. But if the problem domain is sufficiently restricted, the task becomes manageable. Intelledex and other companies are working on such graphical programming environments.
VI. CONCLUSION

Computer vision is becoming an important tool in manufacturing for the purposes of measurement, robot guidance, item identification, and optical character recognition. Its use extends to other areas as well, such as vehicle guidance and mail sorting. The potential exists for many more applications that can take advantage of computer vision's abilities.
Computer vision does have its limitations. It is not always fast enough nor reliable enough to handle the desired application. Its implementation is not always easy. Current research in faster processing schemes, advanced imaging techniques, and more intuitive programming environments should contribute eventually to reducing these limitations.
REFERENCES

[1] M. G. Fairhurst, Computer Vision for Robotic Systems. Hertfordshire, England: Prentice Hall, 1988, pp. 80-82.

[2] R. A. Lotufo, B. T. Thomas, and E. L. Dagless, "Road following algorithm using a panned plan-view transformation," in Proc. ECCV '90, 1990, pp. 231-235.

[3] E. Fichter, Dept. of Industrial and Manufacturing Engineering, interviewed on May 15, 1991.

[4] S. Asai, "Streamlining manufacturing," IEEE Spectrum, vol. 28, no. 1, p. 53, Jan. 1991.

[5] G. F. Watson, "Technology 1991: solid state," IEEE Spectrum, vol. 28, no. 1, p. 52, Jan. 1991.

[6] A. L. Yuille, D. Geiger, and H. Buelthoff, "Stereo integration, mean field theory and psychophysics," in Proc. ECCV '90, 1990, pp. 73-82.

[7] R. K. Jurgen, "Technology 1991: the specialties," IEEE Spectrum, vol. 28, no. 1, p. 79, Jan. 1991.
GLOSSARY

algorithm: A prescribed set of well-defined rules or processes for the solution of a problem in a finite number of steps, for example, a full statement of an arithmetic procedure for evaluating sin x to a stated precision.
analog: Pertaining to representation by means of continuously variable physical quantities.
artificial intelligence: Research and study in methods for the development of a machine that can improve its own operations. The development or capability of a machine that can proceed or perform functions that are normally concerned with human intelligence as learning, adapting, reasoning, self-correction, automatic improvement.
gray scale: An optical pattern in discrete steps between light and dark.
microprocessor: The semiconductor central processing unit (CPU) and one of the principal components of the microcomputer.
noise: Unwanted disturbances superimposed upon a useful signal, which tend to obscure its information content.
normalization: The process of adjusting the representation of a quantity so that this representation lies within a prescribed range.
PCB: Printed-circuit board. A board for mounting of components on which most connections are made by printed circuitry.
pattern recognition: The identification of shapes, forms, or configurations by automatic means.
pixel: Picture element, the smallest area of a television picture capable of being delineated by an electric signal passed through the system or part thereof.
pixel matrix: A two-dimensional rectangular array of picture elements.
transformations: Rules or processes which change the form of data but not the basic content.
IEEE Standard Dictionary of Electrical and Electronics Terms, The Institute of Electrical and Electronics Engineers, New York, NY, 1988.

C. J. Sippl, Data Communications Dictionary, Van Nostrand Reinhold Company, New York, NY, 1976.