Shapiro, Linda G. : University of Washington - Seattle
Linda G. Shapiro is Professor of Computer Science and Engineering and Professor of Electrical Engineering at
the University of Washington. She earned a bachelor's degree in mathematics from the University of Illinois in
1970 and master's and Ph.D. degrees in computer science from the University of Iowa in 1972 and 1974, respectively.
She taught at Kansas State University and at Virginia Polytechnic Institute and State University and served as
Director of Intelligent Systems at Machine Vision International before joining the University of Washington in
1986. Professor Shapiro is past editor-in-chief of the journal Image Understanding and is a member of the editorial
boards of Computer Vision and Image Understanding and of Pattern Recognition. She has served on the program committees
of many computer vision workshops and conferences and is co-author of the text Computer and Robot Vision
with Robert M. Haralick. She was elected a Fellow of the IEEE in 1995, and a Fellow of the International Association
for Pattern Recognition in 2000.
Stockman, George C. :
George C. Stockman received the B.S. degree in Mathematics-Education from East Stroudsburg State University
in 1966, the MAT degree from Harvard in 1967, the M.S. degree in computer science from Penn State in 1971, and
the Ph.D. degree in computer science from The University of Maryland in 1977. Currently he is a Professor of Computer
Science and Engineering at Michigan State University, where he joined the faculty in 1982. From 1974 to 1982, he
worked as a Research Scientist for LNK Corporation on problems in image analysis and computer cartography. At MSU
he teaches programming and data structures as well as computer vision and computer graphics. Professor Stockman
has been active in many activities of the IEEE, including workshops on the teaching of computing with images.
Preface
This book is intended as an introduction to computer vision for a broad audience. It provides necessary theory
and examples for students and practitioners who will work in fields where significant information must be extracted
automatically from images. The book should be a useful resource for professionals, a text for both undergraduate
and beginning graduate courses, and a resource for enrichment of college or even high school projects. Our goals
were to provide a basic set of fundamental concepts and algorithms and also to discuss some of the exciting evolving
application areas. This book is unique in that it contains chapters on image databases (Chapter 8) and on virtual
and augmented reality (Chapter 15), two exciting evolving application areas. A final chapter (Chapter 16) gives
a complete view of real-world systems that use computer vision.
Due to recent progress in the computer field, economical and flexible use of computer images is now pervasive.
Computing with images is no longer just for the realm of the sciences, but also for the arts and social sciences
and even for hobbyists. The book should serve an established and growing audience including those interested in
multimedia; art and design; geographic information systems; and image databases, in addition to the traditional
areas of automation, image science, medical imaging, remote sensing, and computer cartography.
A broad purpose at first seems impossible to achieve. However, there are other kinds of texts that already do
this in other areas: calculus, physics, and general computing. We hope we have made at least a good beginning; we
wanted a book that would be useful in the classroom and also to the independent reader. We find the chosen topics
interesting and sometimes exciting, and hope that they are accessible to a large audience. It is assumed that use
of the text in a graduate, or even senior level, computer vision course would be supplemented by papers from the
archival literature. Coverage is not intended to be comprehensive; only a modest set of papers is cited at the
end of each chapter.
The early chapters begin at an intuitive level and progress towards mathematical models with the goal of intuitive
understanding before formal characterization. Sections marked by an asterisk (*) are more mathematical or more
advanced and need not be covered in a less technical course. To strengthen the intuitive approach, we have stayed
with the processing of iconic imagery for the first eleven chapters and have delayed 3D computer vision until the
later chapters, but it should be easy for experienced instructors to resequence them to fit a particular course
or teaching style. There are many viable applications that are entirely 2D, and many concepts and algorithms are
more simply taught in their 2D form. We provide some basics of pattern recognition in Chapter 4, so that students
can consider complete recognition systems before the full coverage of image features and matching. A reader should
have a good idea of 2D image processing applications after Chapter 4; Chapters 5, 6, and 7 add in gray-tone, color,
and texture features. Chapter 8 treats image databases, a popular recent topic. Although some colleagues advised
us to place this material near the end of the book, our goal in positioning it early in the chapter sequence is
to reinforce the concepts of the prior chapters and to provide material that can lead to an excellent half-term
project. Segmentation and matching are treated in their 2D forms in Chapters 10 and 11, so that the basic concepts
are presented in a simple form, without introducing the complexities of 3D transformations.
Characteristics of the 3D world are briefly introduced in Chapter 2 and then are studied in much more detail
in Chapter 12. Chapter 12 surveys qualitatively many aspects of how a 3D world can be perceived from 2D images.
It concludes with quantitative models of stereo and a study of the thin lens equation for depth-from-focus and resolving
power. The transition to 3D computer vision is made in Chapter 13. The authors have found from their own teaching
that the difficulty increases abruptly for students at this point. The use of matrices to model homogeneous transformations
is included within the chapter rather than in appendices; the 3D versions are extensions of the simpler 2D versions
given in Chapter 11. Least-squares fitting, introduced in a simple 2D context in Chapter 11, is also extended in
Chapter 13. Non-linear optimization is introduced in a simple P3P context and then used for camera calibration
including the modeling of radial distortion in a lens. Chapter 14 treats 3D models and the matching of models to
3D sensed data; it is of mixed difficulty. Chapter 15 discusses applications in virtual and augmented (mixed) reality
and the role of computer vision techniques.
Programming Language Issue
The book does not rely on any programming language, but uses a generic algorithmic notation. Committing to a
particular language is unnecessary, and any single choice would be the wrong language for many readers. Students who are programmers
should have little trouble implementing the algorithms, as our own students have shown. Examples will eventually
be provided on the World Wide Web when appropriate and available, primarily so students can quickly experiment,
secondarily so that they can study some sample code.
Several tools and libraries are available to instructors and students; for example, Khoros, NIH-Image, XView,
GIMP, and MATLAB. There are also packages that can be purchased from companies that make machine vision hardware.
The authors have decided not to base the text on any specific software because, first, most readers would be using
something else, and second, it would be counterproductive to bury the essence of the image operations within the
complex framework of data structures and methods needed in an industrial strength system. Having first studied
principles in an environment with few variables, the reader will then be better able to successfully choose and
use an industrial system.
Ways to Use the Text
The book material can be selected, and sometimes sequenced, in different ways according to the goal of the course
and interests of the instructor and students.
Chapter 3, with brief summary of Chapter 2
A minimum usage would be 1-3 lectures in a data structures and algorithms course. Chapter 3, with some background
from Chapter 2, contains motivational applications and programming exercises on 2D arrays, depth-first search, and
the union-find data structure for sets.
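As a rough illustration of the kind of exercise this usage has in mind, the sketch below labels the connected components of a binary image stored as a 2D array, using a union-find structure to merge provisional labels. It is not taken from the book, which uses a generic algorithmic notation; the choice of Python, the function names, the list-of-lists image representation, and the use of 4-connectivity are all assumptions made for this example.

```python
def find(parent, x):
    """Return the root label of x, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the sets containing labels a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def label_components(image):
    """Two-pass labeling of a binary image (assumed 4-connectivity).

    image is a list of rows; nonzero entries are foreground pixels.
    Returns (labels, count), where labels has the same shape as image,
    background pixels are 0, and components are numbered 1..count.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[0] is a placeholder for the background

    # Pass 1: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # new label is its own root
                labels[r][c] = len(parent) - 1
            else:
                labels[r][c] = up or left
                if up and left:
                    union(parent, up, left)

    # Pass 2: replace each provisional label by a compact final label.
    remap = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(parent, labels[r][c])
                labels[r][c] = remap.setdefault(root, len(remap) + 1)
    return labels, len(remap)


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0, 1],
        [0, 1, 0, 1, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 1, 1, 0],
    ]
    labels, count = label_components(image)
    for row in labels:
        print(row)
    print("components:", count)
```

A depth-first search from each unlabeled foreground pixel would give the same components; the union-find version is shown only because it needs just two raster scans of the array.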
Chapters 1, 2, and 3, and optionally some of Chapters 4, 5, and 6
This could serve as an enrichment unit of 1 to 3 weeks for high school or lower-division undergraduate students. The objective
could be as simple as a term paper or as complex as group work on a program to, say, create a 2D parts recognition
system based on connected components and prototype matching of feature vectors.
Much of Chapters 1-11
This would be a survey of 2D material for an elective course for students in geography, natural resources or microbiology,
for example, provided that many of the optional sections are passed over. If most sections of Chapters 1-11 are
covered, this would constitute a semester undergraduate course in image processing and analysis with an introduction
to computer vision.
Most of the text
This would constitute a semester course in computer vision at the senior or first-year graduate level.
There is more material in the book than can be covered well in one semester. Some sections will have to be ignored
or surveyed, and the reader should not be expected to work homework problems in all sections. For the
quarter system, Chapters 1-4, 6-12, and 14 make a good introduction to computer vision for undergraduates. For
a one-quarter graduate course, Chapters 1-4 can be covered minimally, with the emphasis on Chapters 6-14 and a brief
coverage of Chapter 15. For any graduate level course, it is expected that some papers from the current literature
would also be covered.
We are grateful to our many colleagues, teachers, and students with whom we have shared our interests. They
have contributed much to our growing field and shared their work and excitement. Many have generously supported
this book with encouragement and with contributions of ideas, figures, and algorithms. Specific citations are given
throughout the book. With regret we have left out some important contributions; a text can only be so large. The
several reviewers and many colleagues who have given us feedback have significantly improved our work. In particular,
for careful editing, we are indebted to Mohammad Ghavamzadeh, Nick Dutta, Kevin Bowyer, Adam Clark, Yu-Yu Chou,
Habib Abi-Rached, and Valentin Razmov. We take responsibility for any errors remaining in the book and for providing
corrections in the future.
This book was four years in the making. We are indebted to Paul Becker of Addison Wesley-Longman for much guidance
in getting the project going and to Tom Robbins of Prentice Hall for finishing it off. We thank Cathy Davison and
Lorraine Evans for their persistence in helping to resolve the many cases where permissions needed to be tracked
down. We are grateful to Rose Rummel-Eury and Chanda Wakefield of ICC for meticulous editing of our notation and
English, and for pushing the schedule. Creating the book was not light work and it certainly helped to have a team
with both skill and humor.
For upper-level courses in Computer Vision and Image Analysis.
Provides necessary theory and examples for students and practitioners who will work in fields where significant
information must be extracted automatically from images. Appropriate for those interested in multimedia, art and
design, geographic information systems, and image databases, in addition to the traditional areas of automation,
image science, medical imaging, remote sensing and computer cartography. The text provides a basic set of fundamental
concepts and algorithms for analyzing images, and discusses some of the exciting evolving application areas of
computer vision.
Broad range of topics--Topics include image databases and virtual and augmented reality in addition to classical
topics.
Familiarizes students with the traditional topics as well as exciting evolving application areas.
Two significant case studies--(Ch. 16) 1) Veggie Vision: A System for Checking Out Vegetables; 2) Identifying
Humans via the Iris of an Eye.
Gives students a complete view of real-world systems that use computer vision.
Progressive intuitive/mathematical approach--The early chapters begin at an intuitive level and progress towards
mathematical models. (Optional more mathematical/advanced sections are marked with an asterisk.) The processing
of iconic imagery is emphasized in the first eleven chapters, and 3D computer vision is covered in later chapters.
Instructors could easily re-sequence chapters to fit a particular course or teaching style.
Helps students achieve an intuitive understanding before tackling formal characterization.
Language independent--The text does not rely on any programming language, but uses a generic algorithmic notation.
Allows students to do projects using their choice of languages and tools.
Independent of specific software.
Enables students to study and learn principles in an environment with few variables. Gives students a firm foundation
for successfully choosing and using an industrial system.
Course flexibility and short-course option.
Enables instructors to select, and sometimes sequence, content in different ways according to the goal of the
course and their own and students' interests. The first four chapters provide sufficient depth for a complete short
course.