EDITORIAL


O-T-A is O-U-T


It's not often that a government agency can justify its existence and pay its
own way. But sometimes, even that's not enough. After 23 years of advising
Congress on technology-related issues, the Office of Technology Assessment
(OTA) has fallen victim to congressional budget cuts. The nonpartisan OTA
provided Congress with analyses, background, briefings, and testimony on
technical issues so that elected officials could make informed decisions
affecting policy, funding, and laws. OTA projects underway when the lights
were turned off included "Wireless Technologies and the National Information
Infrastructure," "Information Technologies for Control of Money Laundering,"
and "Electronic Surveillance in a Digital Age." 
Perhaps OTA's crowning achievement was a report last year that put the skids
to the Social Security Administration's plan to buy $2 billion worth of
outdated computer systems. That savings alone more than covered the OTA's $22
million annual budget. From how much we pay for computer memory to whom we
export our software, technology-related decisions coming from Capitol Hill
directly affect all of us. Let's hope our elected representatives find a
credible means of filling the information void created by the demise of the
OTA.


The Latest (s-e-n-d m-e) in Computer (m-o-n-e-y) Games


Although subliminal advertising has been outlawed since the 1970s, it's not
illegal to embed subliminal messages in computer software. California-based
Interloc Design Group, in fact, has developed a "subliminal module" that plugs
into screen savers (text or image), flashing at 1/50 of a second. That
subliminal messaging is legal is of little comfort to British parents whose
teenagers are flocking to a Time Warner Interactive computer game called
"Endorfun." Time Warner openly crows that the game has more than 100
subliminal messages, including such Hallmark-inspired gems as "I expect
pleasure and satisfaction," "It's okay for me to have everything I want," "My
heart is filled with joy," "I am in harmony," and "I create miracles." In
fairness, players can turn off a music track containing audio messages and
play only with sound effects. 
The name "Endorfun," by the way, relates to the brain's ability to release
natural chemicals called "endorphins" that relieve pain and produce a natural
high. Time Warner boasts that the game lures players into a "trance-like
state." Independent tests conducted by the London Times bear this out. Said
one 14-year-old after playing Endorfun, "It is like a trance." Out the other
side of its mouth, however, Time Warner insists that "there's no scientific
proof that subliminal affirmations work." Just to hedge its bets, Time Warner
is also promoting the "feel good" game as a politically correct alternative to
the violent, militaristic genre of computer games. 
In the meantime, officials in Hong Kong have called for legislation to ban
electronically influencing the subconscious using software, and British
politicians are clamoring for Time Warner to remove the game from store
shelves. 


The Eyes and Ears of the Computer Industry


Of course, there are those among us who can neither see nor hear subliminal
messages. In such cases, it's comforting to be reminded that computer
technology can make a beneficial difference. 
Simply saying that Harvard Medical School's Joseph Rizzo and MIT's John Wyatt
are developing embedded systems is an understatement. They are developing a
tiny, solar-powered computer--literally the size of the date on a penny--that
can float inside your eye. The top layer of the chip has solar cells, with
logic and circuitry on the bottom layer. Electrodes send signals to the retina
nerves, which in turn transmit signals via the optic nerve to the brain. To
activate the system, a visually impaired person dons special eye-glasses with
optical detectors and a laser. The laser sends visual data to the embedded
computer and provides energy to its solar cells. The narrow field of vision
will be limited, but that's of little consequence to someone who can't see at
all. Rizzo and Wyatt's prototype costs about $500,000, but production versions
are expected to sell for $50 or so. 
Rizzo and Wyatt were inspired by the success of cochlear
implanting--surgically attaching digital devices inside the ears of people
with severe hearing impairments. A small, external microphone transmits
signals to an analog/digital converter, which generates electrical signals
that stimulate the ear's auditory nerves. This information is then sent to the
brain. Thousands of people worldwide have been fitted with the devices,
enabling many previously deaf individuals to have phone conversations.
Although cochlear implants have been available for several years, the Food and
Drug Administration only recently approved them for U.S. adults.
On the nonsurgical front, hearing-aid manufacturers are going digital: The
emerging generation of hearing aids can be programmed according to an
individual's ability to hear sound-specific frequency ranges, filtering out
background noise in the process. The resulting realistic sound is then
amplified by a tiny speaker.
Jonathan Erickson
editor-in-chief


LETTERS


Dateline


Dear DDJ,
Murray Lesser answered my question as to the starting calendar for the Julian
Day, when he says it refers to the Julian "proleptic" calendar ("Letters" DDJ,
September 1995). I assume that means the use of the Julian calendar algorithm
in a range of years where it was not intended. (Incidentally, "Julian Day" and
"Julian Date" are not the same thing, as your banner suggested.) 
Now I'd like to carry this thing one step further. I've been using the
following tried-and-true Basic Gregorian algorithm (unnecessary spaces have
been omitted):

  M=7:Y=1776:N=31:Q=SQR(M-3):FOR J=1TON:D=(Y+Y\4-Y\100+Y\400+2.6*M+1.2)MOD 7+J:LOCATE D\7+6,3*(D MOD 7)+3:?J:NEXT

I'm not as sure about the following Julian algorithm, and invite readers to
check it:

  M=10:Y=1492:N=31:Q=SQR(M-3):FOR J=1TO N:D=(Y+Y\4+2.6*M+6.2)MOD 7+J:LOCATE D\7+6,3*(D MOD 7)+3:?J:NEXT
For January or February you must use M=13 or 14 along with the previous year's
number with both algorithms. They won't let you use M=1 or 2.
The first algorithm is set up to generate July 1776; the second generates
October 1492. Now, 4713 BC = -4712 AD because 1 BC = 0 AD. Neither algorithm
works with negative year numbers, but we know the Julian calendar repeats
every 700 years. So 4713 BC (or -4712 AD) is the same as 188 AD. Thus we use
M=13:Y=187:N=31 with the second algorithm for January 4713 BC Julian proleptic.
The resulting display is:
 1 2 3 4 5 6
 7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
Murray, is this correct? If not, where did I go wrong?
Homer B. Tilton
Tucson, Arizona


Executable Content


Dear DDJ,
In his November 1995 "Editorial," Jonathan Erickson made several negative
comments about Eolas's recent patent-license announcement relating to the
implementation and use of "applets" over the World Wide Web.
Rather than representing a "blow to interactivity on the Internet," the
University of California patent will be used to encourage the acceptance of a
standard API for Web-based interactive applications, preventing the
development of a VHS/Beta-style "API war" between Microsoft, Netscape, Sun,
and the like. We are not asking browser companies to pay royalties for
developing browsers that can run applets. Rather, we are only requiring that
they adhere to a standard "Web-API" that will be defined by a consortium of
Eolas licensees. This will accelerate the rapid pace of interactive
application development on the Web, not hinder it.
Jonathan's comments go on to imply that since I went to graduate school at the
University of Illinois at Urbana-Champaign, and since Mosaic was developed
there, that I must have "lucked" into some special knowledge of Web
technologies through an alleged "tangential association" with NCSA. This is
untrue and misleading. Although I did receive my PhD from UIUC, I had no
connection with NCSA at the time. My attendance on campus was from 1984-1989,
long before the NCSA folks began work on a Web browser. Furthermore, my degree
was from the department of Cell and Structural Biology, for studying the
effects of aging on the microvascular system of the heart.
I was deeply involved at the time in computer programming, hypermedia, and
image analysis, but this was entirely self-taught and had no connection with
NCSA. The only computer-science department courses I ever took included a 1977
course in assembly language on a Cyber mainframe, when I was an undergraduate
student, and a 1992 course in supercomputing applications at UIC, where I was
on the faculty.
After joining the faculty of the University of Illinois at Chicago in 1989, I
began an informal relationship with some of the visualization people at NCSA
as part of an HPCC "Grand Challenge" scientific database effort, called the
"Visible Embryo Project" (http://bubba.afip.mil) that I directed from 1989 to
1993. I saw Mosaic for the first time when Larry Smarr demonstrated it at an
NSF site visit in my lab at UIC in early 1993. I became immediately intrigued
with the potential for Mosaic to act as the front end to the Visible Embryo
Project database.
Immediately after leaving UIC to take the position of Director of the Center
for Knowledge Management at University of California, San Francisco, I began
work with Cheong Ang and David Martin on my staff at CKM to enhance Mosaic in
various ways. We designed and implemented an API for embedded "inline" applets
and demonstrated it to several groups, including many of those who were later
involved in projects to add APIs and applets to Web browsers at places like
NCSA, Netscape, and Sun. Due to the time delay relating to the patent
application, we did not make our announcement as early as we had hoped. We
originally planned to release our WebRouser by January 1, 1995. In any case,
this work led to the patent that UC applied for in 1994, and was the subject
of our press releases in August and September of 1995.
I realize that our first press release arrived very close to your deadline for
publication, and that there was not sufficient time for you to check all of
the facts related to the editorial. I appreciate the opportunity to clarify
some of the history surrounding our announcement.
Michael D. Doyle
Eolas Technologies 
Dear DDJ,
In reading Jonathan Erickson's "Editorial" entitled "Bellying Up to the Public
Trough" (DDJ, November 1995), I was struck by the somewhat defeatist tone of
his closing remark regarding the pending patent for "executable content."
Patents (especially pending ones) are challengeable on several grounds, one of
which is prior art. That is, the prior existence of substantively related
practice or material can nullify a patent claim. The Compton's New Media claim
a few years back (essentially granting Compton's a patent on hypertext) comes
to mind. As I recall, that patent claim ended up being rejected due to the
prior existence of HyperCard and Danny Goodman's original book on the subject.
Not having read the executable-content patent application, and not being a
lawyer, I would not want to hazard an opinion on whether the
executable-content claims are valid or not. However, I will point out what I
believe is substantively similar prior art. If others will do the same, the
pending patent may be nullified.
I presume that executable content means that one computer transfers
information to another, which is then executed in that other computer, causing
actions therein. So why isn't downloading a program from CompuServe executable
content? It seems to me to satisfy all the constraints. But maybe not. So,
what if I use a telecomm program written in Forth to download Forth code?
Maybe I have to download new telecomm Forth source or "macros" for it to
count. Or maybe my Forth interpreter has to interpret or compile the
downloaded Forth source on the fly for it to be executable content. If so,
why? This is clearly all just a range of options on a continuum of behavior.
And if none of this is executable content under the patent claim, then how can
HotJava be an infringement?
In the Forth world, techniques of dividing execution behavior between a host
computer and connected "remote" computers go back quite a ways. Just browsing
through my FORML (Forth Modification Laboratory) proceedings for 1991, 1990,
and 1989, it was easy to find papers on various aspects of this technique for
every year. While the descriptions are given from the point of view of the
host machine, they are in every way equivalent to a Web page transferring
HotJava "code" to a client machine for local execution, and even communicating
back and forth between the newly executing client and the existing host. I
can't imagine that the same thing described from a different point of view,
the client, would be patentable.
Furthermore, I am fairly sure that these techniques go back even farther than
my few FORML proceedings. As a simple example, I recall a Forth-based computer
that used a single RS-232 serial line for both console and mass-storage
transfers. The serial line was connected to a PC also running Forth, which
sent either disk or keyboard data, and received both modified disk data and
displayed data. The clever part was that this entire protocol was completely
transparent to anything compiled on either computer. Nothing but the
lowest-level serial-port drivers knew that there wasn't a real disk connected
to the "remote." What does this have to do with executable content? Well, it
shows that actions initiated by either computer are not necessarily
distinguishable as actually executing on one computer or the other. And the
situation may change during the course of program execution, without any
change at the higher levels of the program. Which is the whole reason for
having executable content in the first place.
Some other executable content areas to explore: the AT&T "Blit" terminal,
which could receive and execute code sent to it from the host; smart terminals
in general, some of which could receive new executable code from a host
computer or could ask for new definitions for actions. And what about diskless
workstations?
In short, the way to prevent this executable-content patent from being granted
is clear: Challenge its claims on prior-art grounds. Since lawyers will
certainly be involved, I doubt this can be done inexpensively, but maybe the
Electronic Frontier Foundation would be interested. As for doing the research,
I'm sure that the formidable resources and denizens of the Internet itself
could be put to use. It will benefit us all.
Greg Guerin
Tempe, Arizona


Of Milestones and Men


Dear DDJ,
I found the article, "Of Milestones and Men," by Ray Valdés (Dr. Dobb's
Developer Update, October 1995) interesting, especially the part regarding
evolutionary development. It reminded me of something Gerald Weinberg wrote in
the December 1972 edition of Datamation that was later paraphrased by Knuth in
his 1974 paper, "Structured Programming with go to Statements," as: "the former
regimen of analysis/code/debugging should be replaced by
analysis/code/debugging/improving."
As an aside, Knuth also anticipated object technology, specifically,
object-based programming languages, over 20 years ago in this quote from the
same paper: 
...it turns out that a given level of abstraction often involves several
related routines and data definitions; for example, when we decide to
represent a table in a certain way, we simultaneously want to specify the
routines for storing and fetching information from that table. The next
generation of languages will probably take into account such related routines.
The four-foot-high stack of project documentation that Ray mentions brought
back similar memories. I was part of a three-man team that designed an
integrated development environment for a major computer manufacturer about
eight years ago. We cranked out a lot of documentation. The project leader
wanted anything and everything written down although we didn't have a
particularly well-defined development process. The result was a one-foot-high
stack of documentation--all prose, no diagrams. He proudly displayed the
stacks sitting in front of each reviewer at the interdepartmental design
review. I shook my head, knowing that this ostensibly impressive volume
rendered the design unreviewable.
Paul Long 
plong@perf.com 

Dear DDJ,
I am in complete agreement with Eric Raymond, quoted in the article by Ray
Valdés ("Of Milestones and Men"): "...good designs arise only from
evolutionary...interaction[s] between ...able designers and an active
[knowledgeable] user population...first try at a big new idea is always
wrong...."
The myth is that some smart, well-organized novices can do it right the first
time with preconceived notions. Strange, but the world isn't built that way.
The world is more like a Rubik's Cube. You can be one move from a solution and
not see it, or think you are one move from a solution and be far from it.
Bill Fay
WFay@hei.com


It's in the Numbers


Dear DDJ,
In the September 1995 "Letters" section of DDJ, Dwight Keeve posed a question
regarding prime numbers. The list of primes up to 100 that he refers to is 2,
3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
79, 83, 89, 97. The number 57, which Dwight included, is divisible by 3 and
therefore cannot be prime. The correct sequence p(p(n)) is: 3, 5, 11, 17, 31,
41, 59, 67, 83,....
Daniel Pfeffer
Netanya, Israel
daniel@ornetix.mhs.compuserve.com


Book Business


Dear DDJ,
I enjoyed the "Programmer's Bookshelf" entitled "Nontraditional Education
Alternatives," by Jonathan Erickson (DDJ, September 1995), but my local
bookstore hasn't been able to get an address or phone number for Professional
Publications, the publisher of High-Technology Degree Alternatives. Can you
help?
Kevin Light
klight@lws.leba.net
DDJ Responds: Sure, Kevin. Professional Publications is located at 1250 Fifth
Ave., Belmont, CA 94002, 800-426-1178, or send e-mail to: profpubl@crl.com.



Priority Queues and the STL


Heap management and container support made easy




Mark R. Nelson


Mark, a programmer for Greenleaf Software, is the author of The C++
Programmer's Guide to the Standard Template Library (IDG Books, 1995) and The
Data Compression Book, Second Edition (M&T Books, 1995). He can be contacted
at 73650.312@compuserve.com.


When I wrote the chapter on Huffman coding for the first edition of my
data-compression book, my sample programs emphasized clarity at the expense of
efficiency. I felt it was much more important for my code to be easy to
understand than to be fast or tight. 
Were I writing that book today, I wouldn't have to wrestle with this choice.
The addition of the Standard Template Library (STL) to the ANSI/ISO C++
standard provides a perfect tool for the creation of Huffman coding trees: the
priority_queue container adapter. The STL lets me have the best of both
worlds: easy-to-read code that runs with optimal efficiency.
In this article, I'll examine just what a priority queue is and how the STL
implements it using heap algorithms. I'll also present a Huffman encoder using
the STL priority queue.


What is a Priority Queue?


Open any book on data structures and you will undoubtedly run into a container
type known as a queue. Another name for the queue, FIFO, is an acronym for the
way it works: First In, First Out. Figure 1 describes how a queue normally
processes data.
The standard interface to the FIFO queue simply consists of a push() function,
which adds new elements, and pop(), which always removes the oldest element. 
This data structure is relatively easy to implement. Often, however, a more
sophisticated form of queue is needed. The one I present here is the priority
queue.
A priority queue, as shown in Figure 2, assigns a priority to every element
that it stores. New elements are added to the queue using push(), just as with
a FIFO queue. This queue also has a pop() function, but it differs from the
FIFO pop() in one key area: When you call pop() for the priority queue, you
don't get the oldest element in the queue. Instead, you get the element with
the highest priority.
The priority queue obviously fits well with certain types of tasks. For
example, the scheduler in an operating system might use a priority queue to
track processes. In this article, I'll use a priority queue to build a Huffman
coding tree.


Implementation with a Heap


There are a few brute-force ways to implement a priority queue. In my
inefficient Huffman encoder, I kept the queue elements in an unsorted list,
then searched the entire list for the highest-priority item when it came time
to pop(). Alternatively, you could perform a sorted insertion when the push()
function is called, so that pop() has an easy job of it. This could be
accomplished using a sorted tree.
In the case of the priority queue, however, there is a better approach. A data
structure known as a "heap" can be used to keep the queue elements in a
partially sorted order. Insertions and extractions can then be done in log(N)
time, with virtually no extra memory overhead.
A heap has the following key characteristics:
The elements of a heap are stored in a contiguous array. If the heap holds N
elements, they will be stored at locations 1 through N of the array. (In this
article, I use a convention of arrays starting at 1 and running through N
mostly for notational convenience.)
The elements in the array are also part of an implicit binary tree. Every
element X (except the root node) has a parent at location X/2.
Each node has a priority greater than or equal to both of its children.
Figure 3 shows a typical heap. Note that a heap is not a completely ordered
tree. Even though parent nodes are always greater than children, there is no
ordering among siblings. For example, node 10 has a higher priority (39) than
node 3, which is two levels higher up in the tree.
Even though a heap is not completely sorted, it has one very useful
characteristic: The node with the highest priority will always be at the top
of the tree. This makes it suitable for a priority queue, since the pop()
operation requires only the highest element. This characteristic, along with
the negligible overhead and fast insertions and removals, makes the heap an
ideal data structure for implementation of a priority queue.


The STL Heap Functions


The STL has four template functions that work with heaps:
make_heap() takes an existing sequence of elements and rearranges it to form a
heap.
push_heap() adds a new element to an existing heap.
pop_heap() removes the highest-priority value from a heap.
sort_heap() turns a heap into a sorted sequence.
All four template functions are parametrized by two different types: class
RandomAccessIterator and class Compare. The iterator class simply defines the
type of iterator used to point to the elements in the heap. The iterator
usually will be of a type defined by the container the heap is stored in. If
the heap is stored in a built-in C/C++ array, the iterator will be a standard
pointer type.
The Compare class defines an STL comparison object used to order the elements
in the heap. The STL gives us maximum flexibility in determining how heap
elements are compared. The comparison class defaults to less<T>, a predefined
STL comparison-object class. less<T> simply compares two objects using the
less-than relational operator, operator<(). Alternatively, the comparison
could be done with the comparison object passed as an optional parameter to
any of the four heap management functions; see Example 1.
The STL uses only three of these heap functions to implement a priority queue.
make_heap() is used to create a heap out of unordered data, while push_heap()
and pop_heap() are used to insert and remove individual items from the heap.
Here, I'll look only at push_heap() and pop_heap(). make_heap() uses
techniques very similar to those of push_heap() to create an initial heap, so
understanding the latter should make the former easy to follow.
In Example 2, the first two arguments to function push_heap() are iterators
that define the start and end of the sequence of elements that make up the
heap. (In canonical STL fashion, first points to the first element in the
sequence, and last points one past the last element.) The third parameter is
the optional comparison object used to order the elements in the heap.
In general, the algorithms supplied with the STL work on sequences of elements
and don't perform any operations that require knowledge of how to add or
remove elements from containers. (This is one reason why so many of the
algorithms work with built-in arrays.) 
Given this restriction, push_heap() operates a little differently than you
might expect. If you have an existing heap with N elements numbered 1 through
N, you must place the new element at location N+1 before calling push_heap().
This creates a new, slightly unbalanced heap with N+1 elements. 
You call push_heap() with the heap in this temporary, invalid configuration.
At that point, push_heap() simply corrects the heap to take into account this
new element. This is done by repeatedly swapping the new node with its parent
node until it reaches a position where the parent is greater than the new
node. The procedure for this can be described with code like Example 3.

Figure 4 illustrates this procedure. The new node, with a value of 55, is
added to the heap at the next available position. push_heap() is called and
begins moving the node up the heap. The initial parent of the new node has a
value of 49, which is less than 55. The two elements are swapped, and the
comparison loop repeats. After the first swap, the new node is in a valid
position in the heap. It is greater than all of its children, and less than
its parent. If the new node had a value greater than 60, a second swap would
have been performed, and the new node would then appear at the root of the
heap.
The pop_heap() function requires that you use a set of idiosyncratic
conventions similar to those for push_heap(). pop_heap() doesn't want to worry
about changing the size of your container. Instead, it moves the current root
node of the heap to position N. pop_heap() then adjusts the heap to be correct
for positions 1 through N-1, and control passes back to the calling function.
After calling pop_heap(), the size of the heap will be N-1 instead of N.
However, there is still a single element in the last position of your array.
(This can be removed from STL containers using the container's pop_back()
function.) Furthermore, if the heap still contains any data, the highest value
will have been moved to the root node of the heap.
The actual mechanics of this operation are similar to those for push_heap().
First, the root node is swapped with the last node in the heap. Then, the new
root node is moved down the tree, swapping with the greater of its two
children, until it is greater than both of its children. Example 4 illustrates
how this function might be implemented. This code is slightly more complicated
than that of push_heap(), but you can clearly see that it is performing the
reverse operation of push_heap(). It moves the last element to the root of the
heap,
then loops while checking to see if the root node is less than one of its
children. If it is, the node is swapped with its largest child, and the
process repeats.
You can see why inserting and removing items from the heap is a log(N)
operation. Regardless of what number is inserted or extracted, the adjustment
process will only have to go from the root of the tree to its lowest level, or
vice versa. Because of the way the tree is designed for the heap, this will
never take more than log(N) swap operations.


The priority_queue Container Adapter


The STL implements priority queues using a "container adapter," a simple
wrapper class that uses the framework of an existing container to implement a
new container type. The STL has three container-adapter types: stack, queue,
and priority_queue. The priority_queue class uses an underlying container that
can be either an STL vector or deque. In most cases, a vector object is
probably the best choice.
When you create a priority_queue, you have to specify the container type, and
optionally a comparison-object type. Typical declarations look like Example 5.

The priority_queue adapter presents a very limited interface. It has a pair of
constructors: One creates an empty queue, and the other loads a sequence of
initial values into the queue. The copy constructor, assignment operator, and
destructor are all implicitly defined by the compiler. The constructors and
destructors initialize two protected data members: Container c and the
comparison object, Compare comp.
The container offers just five member functions. (Well, six, if you count the
const and mutable versions of top() as two separate functions.) Like the
priority_queue class itself, these member functions are wrappers around
container member functions. The Hewlett-Packard STL release defines these
functions as shown in Example 6, and nearly all are self-explanatory. empty()
and size() are used to determine the number of elements in the heap at a given
time. top() returns a reference to the top element in the heap, which should
be the element with the highest priority. push() and pop() are used to insert
and remove elements from the queue.
Note that push() and pop() both deal with the idiosyncratic behavior of the
STL heap functions. push() first adds the new element to the end of the
container that holds the heap. It then calls the STL push_heap() function to
walk the new element up to its proper place. pop() first calls the STL
pop_heap() function to move the top element to the end of the container. A
subsequent call to pop_back() removes that element and shrinks the container.
As a programmer, I feel the most exciting thing about this container adapter
is the relative ease with which it was built around an existing container
class. It would take hundreds of lines of code to create a new container class
from scratch. But this example shows that you can hijack existing container
capabilities to define new types without even breaking a sweat.


An Example: Huffman Coding


Earlier, I mentioned that priority-queue containers would be ideal for
developing Huffman coding trees. A Huffman coding tree is a binary tree made
up of internal nodes and leaf nodes. Coding starts at the root and moves down
the tree, issuing 0s and 1s until a leaf node is reached. Leaf nodes signify
that a character has been completely encoded, while internal nodes are a stop
along the way. Each node has two children, one designated with a 0 bit, and
the other designated with a 1 bit. To encode a particular character, you take
the path down the tree that leads to the target leaf.
Building the Huffman encoding tree begins with creating a list of unattached
leaf nodes, each having an internal weight proportional to the frequency of
the character they are encoding. The encoding process then executes a simple
loop that combines nodes until there is only one root node, which becomes the
starting point for the Huffman coder.
On each pass through the loop you have to identify the two available nodes
with the lowest weights. Those two nodes are then removed from the pool of
available nodes (the priority queue). A new internal node is then created, and
it is assigned those two nodes as children. This internal node is then entered
into the priority queue. Figure 5 shows the tree that results from processing
the input data in Figure 6 (a).
This data is found in INPUT.DAT, the file used to exercise the test program,
PQHUFF.CPP (see Listing One beginning on page 96). As you can see, the nodes
with the lowest frequency counts are found the farthest down the tree, meaning
they will take more bits to encode. High-frequency characters, such as "A",
take fewer bits to encode. The result? Data compression.
PQHUFF.CPP builds the Huffman encoding tree using a priority queue filled with
objects of class node. The node objects are simply C++ representations of the
nodes in Figure 5. Each node has a weight, which determines its place in the
eventual Huffman coding tree.
The utility of the priority queue finally shows up in main() of PQHUFF.CPP.
The loop that builds the encoding tree counts all the characters in the input
file and adds the initial count values to the priority queue; see Example 7.
In the loop, the two lowest-weight nodes are popped from the priority queue. A
new node that has those two nodes for children is then created and inserted
into the queue. The process continues until the queue has just one node left,
which is then the root node for the encoding tree. Figure 6 (b) is the output
from PQHUFF.EXE for the data in Figure 6 (a). (The source code, executable,
and sample data file for PQHUFF are available electronically; see
"Availability," page 3.)


Conclusion


The STL template class priority_queue provides an efficient encapsulation of a
priority queue. The details of both heap management and container support are
provided in a fairly transparent manner. Having this class available as part
of the C++ standard library gives you a potent, versatile new tool for a
variety of problems. I find that it makes my Huffman-table creation much more
efficient, while maintaining readability.
Figure 1: A FIFO queue.
Figure 2: A priority queue.
Figure 3: A heap.
Figure 4: Inserting a new node. (a) A new node is inserted at the end of the
heap; (b) push_heap moves the new node into its proper location.
Figure 5: A Huffman coding tree.
Figure 6: (a) Sample input data; (b) output from the PQHUFF.EXE program for
the data in (a).
(a)
AAAAAAAAAAAAAABBBCDEFGGGHHHH

(b)
Char Count Code
A    14    0
B     3    100
G     3    101
H     4    110
C     1    11100
F     1    11101
D     1    11110
E     1    11111
Example 1: The prototype for the template function pop_heap(). A second
overloaded version of this function omits the comparison parameter.
template<class RandomAccessIterator, class Compare>
void pop_heap( RandomAccessIterator first,
 RandomAccessIterator last,
 Compare compare );
Example 2: The prototype for the template function push_heap(). This
overloaded version takes a comparison object argument in addition to the two
iterators.
template<class RandomAccessIterator, class Compare>
void push_heap( RandomAccessIterator first,
 RandomAccessIterator last,
 Compare compare );
Example 3: A simplified sample of the type of code push_heap() uses to adjust
the heap to accommodate the new element at position N.
template <class T>
void adjust_heap( T* c, int N )
{
 int test_node = N;
 while ( test_node > 1 ) {
 int parent_node = test_node / 2;
 if ( c[ parent_node ] < c[ test_node ] ) {
 swap( c[ parent_node ], c[ test_node ] );
 test_node = parent_node;
 } else
 break;
 }
}
Example 4: A simplified sample of the type of code pop_heap() uses to move the
root node to position N, then adjust the rest of the heap.
template<class T>
void adjust_heap( T* c, int N )
{
 T temp = c[ 1 ];
 c[ 1 ] = c[ N ];
 int test_node = 1;
 for ( ; ; ) {
 int child;
 if ( ( test_node * 2 ) >= N )
 break;
 if ( ( test_node * 2 + 1 ) >= N )
 child = test_node * 2;
 else if ( c[ test_node * 2 ] > c[ test_node * 2 + 1 ] )
 child = test_node * 2;
 else
 child = test_node * 2 + 1;
 if ( c[ test_node ] < c[ child ] ) {
 swap( c[ test_node ], c[ child ] );
 test_node = child;
 } else
 break;
 }
 c[ N ] = temp;
}
Example 5: Typical declarations.
priority_queue< vector<node> > x;
priority_queue< deque<string>,
 case_insensitive_compare > y;
priority_queue< vector<int>, greater<int> > z;
Example 6: HP STL member functions.
template <class Container, class Compare>
class priority_queue {
 protected :
 Container c;
 Compare comp;
 ...
 public :
 bool empty() const { return c.empty(); }
 size_type size() const { return c.size(); }
 value_type& top() { return c.front(); }
 const value_type& top() const { return c.front(); }
 void push(const value_type& x) {
 c.push_back(x);
 push_heap(c.begin(), c.end(), comp);
 }
 void pop() {
 pop_heap(c.begin(), c.end(), comp);
 c.pop_back();
 }
};
Example 7: Loop that builds an encoding tree.
while ( q.size() > 1 ) {
 node *child0 = new node( q.top() );
 q.pop();
 node *child1 = new node( q.top() );
 q.pop();
 q.push( node( child0, child1 ) );
}

Listing One
// PQHUFF.CPP -- This program reads in all the characters from the file 
// input.dat, then builds a Huffman encoding tree using an STL priority queue.
// The resulting table is then printed out. If you have the HP version of the 
// STL installed, you can build this program with Borland C++ 4.5 using a 
// command line like this: bcc -ml -IC:\STL pqhuff.cpp
//
// Borland 4.x workarounds
//
#define __MINMAX_DEFINED
#pragma option -vi-
#include <iostream.h>
#include <iomanip.h>
#include <fstream.h>
#include <vector.h>
#include <stack.h>
#include <cstring.h>
// The node class is used to represent both leaf and internal nodes. Leaf nodes
// have 0s in the child pointers, and their value member corresponds to the
// character they encode. Internal nodes don't have anything meaningful in
// their value member, but their child pointers point to other nodes.
struct node {
 int weight;
 unsigned char value;
 const node *child0;
 const node *child1;
// Construct a new leaf node for character c
 node( unsigned char c = 0, int i = -1 ) {
 value = c;
 weight = i;
 child0 = 0;
 child1 = 0;
 }
// Construct a new internal node that has children c0 and c1.
 node( const node* c0, const node *c1 ) {
 value = 0;
 weight = c0->weight + c1->weight;
 child0 = c0;
 child1 = c1;
 }
// The comparison operators used to order the priority queue. greater<node>
// (used in the queue declaration in main()) requires operator> as well.
 bool operator<( const node &a ) const {
 return weight < a.weight;
 }
 bool operator>( const node &a ) const {
 return weight > a.weight;
 }
 void traverse( string code = "" ) const;
};
// The traverse member function is used to print out the code for a given node.
// It is designed to print the entire tree if called for the root node.

void node::traverse( string code ) const
{
 if ( child0 ) {
 child0->traverse( code + "0" );
 child1->traverse( code + "1" );
 } else {
 cout << " " << value << " ";
 cout << setw( 2 ) << weight;
 cout << " " << code << endl;
 }
}
// This routine does a quick count of all the characters in the input file. 
// I skip the whitespace.
void count_chars( char *name, int *counts )
{
 for ( int i = 0 ; i < 256 ; i++ )
 counts[ i ] = 0;
 ifstream file( name );
 if ( !file ) {
 cerr << "Couldn't open " << name << endl;
 throw "abort";
 }
 cout << "Counting chars in " << name << endl;
 file.setf( ios::skipws );
 for ( ; ; ) {
 unsigned char c;
 file >> c;
 if ( file )
 counts[ c ]++;
 else
 break;
 }
}
int main( int argc, char *argv[] )
{
 int counts[ 256 ];
 if ( argc > 1 )
 count_chars( argv[ 1 ], counts );
 else
 count_chars( "input.dat", counts );
 priority_queue< vector< node >, greater<node> > q;
// First I push all the leaf nodes into the queue
 for ( int i = 0 ; i < 256 ; i++ )
 if ( counts[ i ] )
 q.push( node( i, counts[ i ] ) );
// This loop removes the two smallest nodes from the queue. It creates a new 
// internal node that has those two nodes as children. The new internal node
// is then inserted into the priority queue. When there is only one node in 
// the priority queue, the tree is complete.
 while ( q.size() > 1 ) {
 node *child0 = new node( q.top() );
 q.pop();
 node *child1 = new node( q.top() );
 q.pop();
 q.push( node( child0, child1 ) );
 }
// Now I dump the results
 cout << "Char Count Code" << endl;
 q.top().traverse();

 return 0;
}





























































Dynamic Markov Compression


Better compression for large binary files 




Tong Lai Yu


Tong Lai Yu is an associate professor of computer science at California
State University, San Bernardino. He can be contacted at tongyu@csci.csusb.edu.


Compression packages such as PKZIP, Stacker, and DoubleSpace are all based on
variations of the so-called "LZ77" algorithm, a dictionary-lookup technique
invented by Abraham Lempel and Jacob Ziv in the late 1970s. The LZ77 algorithm
requires a relatively small amount of memory and performs very well in both
speed and compression ratios for small and large files.
In this article, I'll present a statistical technique called "Dynamic Markov
Compression" (DMC) that uses a very different approach. DMC was first
introduced by Gordon Cormack and Nigel Horspool in the late 1980s. Its
overall performance is not as good as that of PKZIP or other similar archiving
packages. However, DMC yields a better compression ratio when applied to large
binary files such as speech and image files. Since it handles one bit at a
time, DMC might also be appropriate for fax machines that compress
black-and-white images. 
Implementing DMC is straightforward, requiring no special data structures. In
fact, its implementation is simpler than any other compression scheme with a
comparable compression ratio. Here, I'll concentrate on DMC's working
principles, rather than its performance. 


Dynamic Markov Modeling


Like other statistical techniques, DMC consists of two parts--modeling and
coding. The model, which is created from the data to be compressed, feeds
information to the coder, which codes the data. An unusual feature of DMC is
that it employs arithmetic coding to encode only two symbols, namely 0 and 1.
(For more information, see "Arithmetic Coding and Statistical Modeling," by
Mark R. Nelson, DDJ, February 1991.) 
The model consists of a finite number of states. Each state gives you
information about the probability of finding a 0 or a 1 at that state. The
probabilities of 0 and 1 at the current state are used to encode the current
incoming bit. 
The central scheme of DMC is its cloning mechanism. Figure 1 (a) shows a
portion of a simple, finite-state model. Through the cloning mechanism, this
evolves into the more complex model in Figure 1 (b). If, in Figure 1 (a), the
instance at state A receives symbol x (which is either 0 or 1), the instance
will transit to state D. There can be many other transitions into state D at
different times. If state D has been visited often and the transition of A-->D
upon receiving x is frequent, you clone state D to form an additional state
D'. In Figure 1 (b), upon receiving x, the instance at state A will transit to
state D' instead of D. The output transition counts from the old state D are
divided between D' and D; you assume that the total number of transitions into
a state is equal to the total number of transitions out of a state. 
Two threshold parameters, min_cnt1 and min_cnt2, are used to determine when a
state should be cloned--a state D is cloned if it satisfies the following
conditions:
The number of counts of a specific transition into D is larger than min_cnt1. 
The number of counts at D not arising from the specific transition is larger
than min_cnt2.
You should start cloning as early as possible. In the implementation presented
here, I set min_cnt1 and min_cnt2 to two. You could start the model with
min_cnt1 and min_cnt2 values equal to one and increase their values slowly, as
cloning adds states to the model. When the number of states reaches a certain
maximum value, you discard all the states and start the model all over again.
Again, you could save some final states in a buffer and start a model based on
those states.


The Code


Listing One includes the main function and housekeeping (parsing files,
computing compression ratios, and the like). Listing Two is a basic
implementation of DMC. (The complete source code and executables are available
electronically; see "Availability," page 3.) I've compiled the program with
both Turbo C and Watcom C. 
To save time in file I/O processing, I define the macros input_bit() and
output_bit() instead of using function calls to handle file input and output;
the macros are in Listing Three. The function compress() calls other functions
to compress an input file. Because I'm dealing with only two symbols (0 and
1), I don't want to add an extra END_OF_STREAM symbol to signify the end of
the encoding stream. To help the decoder recognize the end of the original
message, compress() first calls put_file_length() to save the original file
size. It then calls initialize_model() to start DMC with a simple model. The
function initialize_encoder() is used to initialize the arithmetic coder with
certain initial values. The next step is to call encode() to compress the
input file and output the encoded stream to an output file. With probabilities
of finding 0 and 1 generated by the model, encode() uses a standard
arithmetic-encoding technique to encode the current bit read from the input
file. When the most significant bits (MSBs) of the variables low and high
match, they are shifted out and saved. To avoid underflow, the second MSBs
are thrown away when they differ but the MSBs of low and high do not match.
The model itself is updated (by the function update_count()) as each bit is
encoded. When encoding is done, the remaining bits in the arithmetic
encoder are pushed out and saved by flush_encoder(). An extra 16 0 bits are
then saved in the output file, so that during the expansion phase, you won't
run into the EOF mark before decoding has finished.
The function uncompress() calls other functions to decode a compressed file.
It first calls get_file_length() to obtain the original size of the file
before compression. It then calls initialize_model() to start a simple
compression model. As before, the arithmetic encoder has to be initialized by
the function initialize_decoder(). Decoding is done primarily by the function
decode() in a manner similar to that of encode(). The same model is
reconstructed while bits are decoded. Again, the standard arithmetic technique
uses probabilities from the bit stream generated from the model.
Figure 1: The DMC cloning mechanism. (a) The simple model; (b) the complex
model.

Listing One
/***** main.c : housekeeping for dmc.c ******/
void usage_exit( char *argv )
{
 char *filename, *extension;
 filename = strrchr( argv, '\\' ); /* find '\' */
 if ( filename == NULL )
 filename = strrchr( argv, '/' );
 if ( filename == NULL )
 filename = strrchr( argv, ':' );
 if ( filename != NULL )
 filename++; /* strip off the path */
 else
 filename = argv;
 extension= strrchr( filename, '.' );
 if ( extension!= NULL )
 *extension= '\0';
 printf( "\nUsage: %s [-c/e] in-file out-file \n", filename );

 printf("\n%s is for compression ! ", filename );
 printf("\n%s", info );
 printf("\nThe switch -c means compress ( default ),");
 printf("\n -e means expand.\n");
 exit( 0 );
} /* usage_exit() */
#ifndef SEEK_END
#define SEEK_END 2
#endif
long file_size( char *name )
{
 long eof_fp;
 FILE *file;
 file = fopen( name, "r" );
 if ( file == NULL )
 return( 0L );
 fseek( file, 0L, SEEK_END ); /* points to END OF FILE */
 eof_fp = ftell( file ); /* get current file pointer */
 fclose( file );
 return( eof_fp );
} /* file_size() */
/* This routine prints out the compression ratios after the input
 * and output files have been closed. */
void print_ratios( char *input, char *output )
{
 long input_size;
 long output_size;
 float ratio;
 input_size = file_size( input );
 if ( input_size == 0 )
 input_size = 1;
 output_size = file_size( output );
 ratio = ( float ) output_size * 100 / input_size;
 if ( output_size == 0 )
 output_size = 1;
 printf( "\nCompressed Size in bytes : %12ld", output_size );
 printf( "\nOriginal Size in bytes : %12ld", input_size );
 printf( "\nCompression ratio : %12.1f%%\n", ratio );
}
int main( int argc, char *argv[] )
{
 char c;
 if ( argc < 3 )
 usage_exit( argv[ 0 ] );
 argv++;
 c = *(argv[0]+1);
 if ( *argv[0] != '-' ){
 compress( argv );
 print_ratios ( *argv, *(argv+1) );
 } else {
 if ( argc < 4 )
 usage_exit( argv[0] );
 ++argv;
 if ( c == 'c' || c == 'C' ){
 compress( argv );
 print_ratios ( *argv, *(argv+1) );
 } else if ( c == 'e' || c == 'E' ){
 uncompress( argv );
 print_ratios ( *(argv+1), *argv );

 } else
 usage_exit( argv[0] );
 } /* else */
} /* main */

Listing Two
/* dmc.c -- uses dynamic markov model to compress data. The program has been 
* compiled using Turbo C or Watcom C. If you use Turbo C, choose huge model.
* This program is for demonstration use. Compression improvement can be 
* obtained by adjusting min_cnt1, min_cnt2 and the way of reconstructing model
* when memory is full. 
* Usage -- to compress : dmc input_file output_file
* to expand : dmc -e input_file output_file
*/
#include <stdio.h>
#ifdef __TURBOC__
#include <alloc.h>
#else
#include <malloc.h>
#endif
#include <string.h>
#include <io.h>
#include "iodmc.fun" /* io functions */
/* because we only have two symbols, we do not need higher precision */
#define NBITS 15 /* # of bits used in high, low */
#define MSBIT 0x4000 /* most significant bit */
#define MSMASK 0x7FFF /* consider 15 bits */
#define MASK2 0x1FFF /* for underflow use */
char *info = "Dynamic Markov Compression ( DMC ) ";
ui code, low, high, mp;
int underflow_bits;
int times = 1;
ul switch_delta = 5000;
ul *switch_state;
ul switch_len;
void put_file_length( long l, FILE *output )
{
 char *pc= (char*) &l, i;
 printf(" l = %ld ", l );
 for ( i = 0; i < sizeof( long ); ++i )
 putc( *pc++, output );
}
long get_file_length( FILE *input )
{
 long l;
 char *pc = (char *) &l, i;
 for ( i = 0; i < sizeof( long ); ++i )
 *pc++ = getc( input );
 return( l );
}
void initialize_encoder()
{
 low = 0;
 high = MSMASK;
 underflow_bits = 0;
}
void initialize_decoder( FILE *input )
{
 int i, bit;

 code = 0;
 for ( i = 0 ; i < NBITS; i++ ) {
 code <<= 1;
 input_bit( input, bit );
 code += bit;
 }
 low = 0;
 high = MSMASK;
}
/* next_state[d][s] = state reached from s after transition d
 trans_cnt[d][s] = number of observations of input d when in state s
 state = number of current state
 min_cnt1 = minimum # of transitions from the current state
 to state s before s is eligible for cloning
 min_cnt2 = minimum # of visits to a state s from all predecessors
 of S other than the current state before S is eligible
 for cloning. A simple choice for min_cnt1 and min_cnt2
 values is to set them to 2, 2.
*/
/* int and unsigned int are 32 bits for WATCOM C */
#ifdef __TURBOC__
#define maxstates 32760 /* TURBO C can access low DOS mem only*/
#else
#define maxstates 500000l /* for WATCOM C */
#endif
ui *next_state[2], *trans_cnt[2];
ui state, last_state;
ui nbits, total_states;
int min_cnt1 = 2, min_cnt2 = 2;/* for comparison, thus make it signed*/
/* initialize the model */
initialize_model()
{
 int i, j, k, m, n;
 static initialized = 0;
 min_cnt1 = min_cnt2 = 2;
 if ( !initialized ) {
 next_state[0] = (ui *) malloc( maxstates*sizeof( ui ) );
 check_mem_error( next_state[0] );
 next_state[1] = (ui *) malloc( maxstates*sizeof( ui ) );
 check_mem_error( next_state[1] );
 trans_cnt[0] = (ui *) malloc( maxstates*sizeof( ui ) );
 check_mem_error( trans_cnt[0] );
 trans_cnt[1] = (ui *) malloc( maxstates*sizeof( ui ) );
 check_mem_error( trans_cnt[1] );
 initialized = 1;
 } else {
 for ( i = 0; i < maxstates; ++i )
 trans_cnt[0][i] = trans_cnt[1][i] = 0;
 }
 n = 8;
 printf(" initialize_model %d times ", times++);
 m = 1;
 for ( i = 0; i < n; ++i )
 m = 2 * m;
 for ( i = 0; i < n; ++i )
 for ( j = 0; j < m; ++j ) {
 state = i + n * j;
 k = ( i + 1 ) % n;
 next_state[0][state] = k + (( 2*j ) % m ) * n;

 next_state[1][state] = k + ((2*j+1) % m ) * n;
 trans_cnt[0][state] = 1; /* force this to 1 to avoid overflow*/
 trans_cnt[1][state] = 1;
 }
 last_state = n * m - 1;
}
update_count( int x )
/* x is current bit */
{
 int b;
 unsigned int nxt, nxt_cnt, new;
 if ( trans_cnt[x][state] > 0xfff1 ){
 trans_cnt[0][state] /= 2; /* rescale counts to avoid overflow*/
 trans_cnt[1][state] /= 2;
 }
 ++trans_cnt[x][state];
 nxt = next_state[x][state]; /* next state */
 /* total transitions out of "nxt" on receiving 0, or 1 */
 nxt_cnt = trans_cnt[0][nxt] + trans_cnt[1][nxt];
 if ( (trans_cnt[x][state] > min_cnt1) &&
 ((int)(nxt_cnt - trans_cnt[x][state])>min_cnt2) ){
 ++last_state;
 new = last_state; /* obtain a new state # */
 next_state[x][state] = new;
 for ( b = 0; b <= 1; ++b ){
 next_state[b][new] = next_state[b][nxt];
 trans_cnt[b][new] = (ui) ( (ul) trans_cnt[b][nxt] *
 trans_cnt[x][state] / nxt_cnt );
 trans_cnt[b][nxt] = trans_cnt[b][nxt] - trans_cnt[b][new];
 }
 nxt = new;
 }
 state = nxt;
}
void flush_encoder( FILE *output )
{
 int b, i;
 output_bit( output, low & ( MSBIT >> 1 ) );
 underflow_bits++;
 b = (~low & ( MSBIT >> 1 ) ) ? 1 : 0;
 while ( underflow_bits-- > 0 )
 output_bit( output, b );
 b = 0;
 for ( i = 0; i < 16; ++i )
 output_bit( output, b );
}
ui get_mp ()
/* get mid point of high-low interval */
{
 ui p0, p1, mp;
 ul ps, range;
 p0 = trans_cnt[0][state] + 1;
 p1 = trans_cnt[1][state] + 1;
 ps = ( ul )p0 + ( ul ) p1; /* ps is unsigned long */
 range = ( ul )( high - low ) + 1;
 mp = low + (ui) (( range * p0 ) / ps );
 if ( mp >= high ) mp = high - 1; /* take care of roundoff error*/
 return( mp );

}
void shift_out_encoded_bits( FILE *output )
{
 int b;
 for ( ; ; ) {
 /* Shift out matched MSBs. */
 if ( ( low & MSBIT ) == ( high & MSBIT ) ) {
 b = ( high & MSBIT ) ? 1 : 0;
 output_bit(output, b); /* output one bit */
 b = b ? 0 : 1;
 while ( underflow_bits > 0 ){
 output_bit( output, b );
 underflow_bits--;
 }
 } /* if */
 /* If underflow is threatening, throw away 2nd MSBs */
 else if ((low & ( MSBIT >> 1)) && !( high & (MSBIT >> 1) )) {
 underflow_bits += 1;
 low = low & MASK2;
 high = high | (MSBIT>>1);
 } else
 break;
 low = ( low << 1) & MSMASK; /* shift in 0s */
 high = ( high << 1) & MSMASK;
 high |= 1; /* shift in 1s */
 }
} /* shift_out_encoded_bits() */
encode( FILE *input, FILE *output )
{
 int mark, c;
 int i, j, k, b;
 long range;
 state = 0;
 do {
 mark = c = getc( input );
 for ( k = 0; k < 8; ++k ){
 b = 0x80 & c;
 b = ( b > 0 ) ? 1 : 0;
 mp = get_mp();
 if ( last_state == maxstates )
 initialize_model();
 update_count( b );
 c <<= 1;
 if ( b == 1 )
 low = mp; /* pick upper part of range */
 else
 high = mp - 1; /* pick lower part of range */
 shift_out_encoded_bits( output );
 } /* for k */
 } while ( mark != EOF ); /* do loop */
} /* encode */
void remove_and_get_bits( FILE *input )
{
 int bit;
 for ( ; ; ) {
 /* If the MSBs match, shift out the bits.*/
 if ( ( high & MSBIT ) == ( low & MSBIT ) )
 ;
 /* Else, throw away 2nd MSB to prevent underflow.*/

 else if ((low & (MSBIT>>1)) && ! (high & (MSBIT >> 1) ) ) {
 code ^= (MSBIT>>1);
 low = low & MASK2;
 high |= (MSBIT>>1);
 } else
 /* Otherwise, nothing to shift, so return.*/
 break;
 low = ( low << 1) & MSMASK;
 high = ( high << 1) & MSMASK;
 high |= 1;
 code = ( code << 1 ) & MSMASK;
 input_bit( input, bit );
 code += bit;
 } /* for (;;) */
} /* remove_and_get_bits() */
decode( long flen, FILE *input, FILE *output )
{
 FILE *fp;
 int b,n, i, j, k=0;
 ul len = 0;
 state = 0;
 while ( 1 ) {
 mp = get_mp();
 if ( code >= mp ){ /* determine if symbol is 0 or 1 */
 b = 1;
 low = mp;
 } else{
 b = 0;
 high = mp -1;
 }
 output_bit( output, b); /* output a bit */
 if ( ++k == 8 ){
 ++len;
 k = 0;
 }
 if ( len == flen )
 break;
 if ( last_state == maxstates )
 initialize_model();
 update_count(b); /* update state */
 /* Next, remove matched or underflow bits. */
 remove_and_get_bits( input );
 } /* while ( 1 ) */
} /* decode */
void compress( char *argv[] )
{
 FILE *input, *output;
 long file_length, file_size(); /* in main.c */
 int c;
 if ( ( input = fopen( *argv++, "rb" ) ) == NULL ){
 printf("\n%s doesn't exist ", *(argv-1) );
 exit( 1 );
 }
 output = fopen( *argv, "wb" );
 file_length = filelength ( fileno( input ) );
 put_file_length( file_length, output );
 initialize_model(); /* initialize the model */
 initialize_encoder();
 encode( input, output );

 printf("\nmin_cnt1 = %d ", min_cnt1 );
 flush_encoder( output );
 close_output( output );
 fclose( input );
 fclose( output );
} /* compress */
void uncompress( char *argv[] )
{
 FILE *input, *output;
 long file_length;
 if ( ( input = fopen( *argv++, "rb" ) ) == NULL ){
 printf("\n%s doesn't exist ", *(argv-1) );
 exit( 1 );
 }
 output = fopen( *argv, "wb" );
 file_length = get_file_length( input );
 initialize_model(); /* initialize the model */
 initialize_decoder( input );
 decode( file_length, input, output );
 fclose( input );
 fclose( output );
} /* uncompress */
#include "main.c" /* housekeeping */

Listing Three
/*** iodmc.fun : functions for compression and expansion */
#define check_mem_error( x ) if ( x == NULL ){printf("\nout of memory");\
 exit( 1 );}
#define rotater( x ) { x >>= 1; if ( x == 0 ) x = 0x80; } /* rotate right */
#define input_bit( input, x ) { \
 rotater( in_control.mask ); \
 if ( in_control.mask == 0x80 ){ \
 in_control.code = getc( input ); \
 if ( in_control.code == EOF ){ \
 printf("\nerror! out of data "); \
 exit( 1 ); \
 } \
 if ( !(comforter++ & 0x0fff) ){ \
 putc('.', stdout ); \
 fflush( stdout ); \
 } \
 } \
 x = in_control.mask & in_control.code ? 1 : 0 ; \
}
#define output_bit( output, bit ) { \
 if ( bit ) \
 out_control.code = out_control.mask; \
 rotater( out_control.mask ) \
 if ( out_control.mask == 0x80 ){ \
 if ( putc( out_control.code, output ) != out_control.code ) \
 printf("\nfatal error in output_bit" ); \
 out_control.code = 0; \
 if ( !(comforter++ & 0x0fff) ){ \
 putc('.', stdout ); \
 fflush( stdout ); \
 } \
 } \
}
typedef unsigned int ui;

typedef unsigned short int us;
typedef unsigned char uc;
typedef unsigned long ul;
typedef struct {
 uc mask;
 int code;
} IO_CONTROL;
IO_CONTROL out_control = { 0x80, 0 };
IO_CONTROL in_control = { 0x01, 0 };
ui comforter = 0;
FILE *gopen( char name[] )
{
 FILE *fp;
 if ( ( fp = fopen( name, "rb" ) ) == NULL ){
 printf("\n%s doesn't exist ", name );
 exit( 1 );
 }
 return( fp );
}
void output_bits( FILE *output, ul value, int count )
{
 ul p;
 static ui comforter = 0;
 p = 1L << ( count - 1 );
 while ( p != 0 ) {
 if ( p & value )
 out_control.code = out_control.mask; /* non-zero bit */
 p >>= 1;
 out_control.mask >>= 1;
 if ( out_control.mask == 0 ){
 putc( out_control.code, output );
 out_control.mask = 0x80;
 out_control.code = 0;
 if ( !(comforter++ & 0x0fff) ){
 putc('.', stdout );
 fflush( stdout );
 }
 }
 }
} /* output_bits() */
close_output ( FILE *output )
{
 if ( out_control.mask != 0x80 )
 putc( out_control.code, output );
}



















Faster Fractal Compression


Speed




D.R. McGregor, R.J. Fryer, P. Cockshott, and P. Murray


The authors work in the department of computer science at the University of
Strathclyde, Glasgow, Scotland. They can be contacted at
office@cs.strath.ac.uk.


Fractal image compression has a formidable reputation. While reputed to
achieve extremely high compression ratios, it nonetheless makes extreme
demands on computing power and mathematical understanding. To truly comprehend
it, you must wade through textbooks on Cauchy series, Hausdorff spaces, affine
transforms, and the like. To win its promise, you must be prepared to
sacrifice hours of CPU time.
Once you get the hang of it, however, the ideas underlying fractal compression
are quite simple. You don't need to be an expert in abstract algebra or
topology to understand it, and by applying fairly simple optimization
techniques, the compute cost of fractal compression becomes quite modest. In
this article, we'll explain how and why fractal compression works, then
present the Fast Fractal Transform, a family of algorithms that achieve a
several-hundred-fold speed-up over the simple fractal transform.


How Fractal Compression Works


Fractals are structures that exhibit self-similarity at different scales. The
classic example is a mountain, like that in Figure 1. A photo of an area on
the side of a mountain has a characteristic, "mountainy" appearance. If you
zoom in on a portion of the mountain, select an arbitrary square half the size
of the original, and enlarge it to the original size, the blow-up will exhibit
the same sort of mountainy look. In Figure 1, a picture of Cir Mhor on the
Scottish Island of Arran, the big block B is similar to the small square A.
The small square C is likewise a smaller-scale mirror image of the big block
D. Figures 2(a) and 2(b) are enlarged versions of blocks A and B.
If you grasp the essence of this mountainy look, you can "grow" mountain
pictures to order, as is done in the type of fractal image synthesis used in
cinematographic special effects. Image synthesis "grows" a generic mountain;
fractal compression uses similar principles to regenerate a specific mountain,
flower, bird, or tree.
A half-sized square of the mountainside has the same general appearance as the
whole mountain because (and this is a recursive argument) within the
half-sized square, numerous areas look like some larger areas in the
full-sized image. In other words, small hillocks and rocks look pretty much
like larger hillocks and rocks. No matter which size picture you look at, your
mind's eye takes in this texture--this particular sort of hillocky, bumpy look
that gives a mountain its special character.
Fractal compression takes advantage of the similarity between the part and the
whole. A fractal transform of a picture is in essence a list of pairs of
squares [(B1,S1), (B2,S2),...(Bn,Sn)], where the Bi are selected big-square
blocks and Si are small squares which, taken together, cover the entire image.
Square S1 looks like block B1, square S2 looks like block B2, and so on. The
fractal transform records these similarity relationships. 


Reconstructing the Image


The wizardry of fractal compression is embodied in the regeneration of the
image from pairs of similar squares. For each small square on the picture,
there's a similar big block. You want to use information about the big blocks
to fill in the small-scale detail. If you shrink a copy of a big block and
move it onto the corresponding small square, you'll get an approximation of
what the small square should look like. The quality of this approximation
depends upon the degree of similarity between the small square and the big
block. The more fractal the image, the greater the likenesses between its big
blocks and small squares--and the better the approximation generated by
copying and shrinking.
Using big blocks to fill in small squares gives you leverage on the detail.
This is similar to a mint producing the dies for coins: A large model of the
coin is first made by hand, then traversed by an arrangement of levers which
guide the cutter that makes the small-scale die. In the process, letters that
were reasonably large on the model are reduced to the neat, small print on the
coin.
The compressed file contains a small amount of additional information, giving
the average brightness and color levels for each small square. This gives you
something to start with. 
If you set up a fractal decompression program to output to the screen (showing
you the image as it is decompressed), you initially see a patchwork of small,
uniformly colored squares; eight pixels on an edge, for example. These initial
8x8-pixel squares get their colors from the data in the compressed file. At
the next iteration, each small square is replaced by a reduced image of its
16x16 big block, which overlaps 4-9 of the original small squares, providing
more detail. In the next step, each 16x16 big block again covers 4-9 small
squares, each of which now contains detail derived from 4-9 small squares in
the previous step. After a few iterations, details are a single pixel in size.
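A minimal sketch of this iterative regeneration for a grayscale image, assuming 2x2 small squares mapped to 4x4 big blocks (the sizes are chosen small for illustration; the function names and data layout are mine, not the authors' implementation):

```python
# Sketch of one fractal-decompression iteration on a grayscale image stored
# as a list of lists. The transform maps each 2x2 small square to a 4x4
# big block; positions are (x, y) tuples. Illustrative only.

def shrink(img, bx, by, size):
    """Average 2x2 pixel groups of a size x size block down to half size."""
    half = size // 2
    out = [[0.0] * half for _ in range(half)]
    for y in range(half):
        for x in range(half):
            out[y][x] = (img[by + 2*y][bx + 2*x] + img[by + 2*y][bx + 2*x + 1]
                         + img[by + 2*y + 1][bx + 2*x]
                         + img[by + 2*y + 1][bx + 2*x + 1]) / 4.0
    return out

def decompress_step(img, transform, small=2):
    """One iteration: overwrite each small square with its shrunken big block."""
    new = [row[:] for row in img]
    for (sx, sy), (bx, by) in transform:
        patch = shrink(img, bx, by, 2 * small)
        for y in range(small):
            for x in range(small):
                new[sy + y][sx + x] = patch[y][x]
    return new
```

Iterating `decompress_step` from the patchwork of average-brightness squares fills in successively finer detail, exactly as described above.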


Complexity of Compression


The big obstacle to widespread use of fractal compression has been its huge
computational cost. This stems from its basic operation--finding a
most-similar big block for each small square in the picture. 
Assume you want to compress a picture 256x256 pixels in size using small
squares of size 8x8 matched against big blocks of size 16x16. Clearly, the
image would be covered by a patchwork of 1024 squares, 32 in each direction.
For each small square, you would have to find the corresponding big block that
best fits it.
How many big blocks do you have to search?
Consider the number of points on the picture that could serve as the origins
of big blocks. On the x-axis, any position from 0 to 239 would do. Above 239,
the big blocks would be partly off the right edge of the screen. By symmetry,
there are 240 possible positions in each direction, giving a total of 57,600. 
The number of square-to-block comparisons it would take to search 57,600 big
blocks to find the best match for each of the 1024 small squares is
58,982,400. If you assume the simplest case--that is, you perform the
comparisons by simply decimating the big blocks so that you pick every fourth
pixel in the big block and compare it to the corresponding one in the small
square--then each block-to-block comparison has 64 pixel-to-pixel comparisons.
This yields 3,774,873,600 pixel-by-pixel comparisons. But to do a thorough
job, you must look not only for similar blocks in the same orientation, but
also for rotated or mirror images of the original. Thus if you were
compressing a picture of a building, the left edge might look like a scaled
and reflected right edge. A decompression process that allows not just
translation and scaling of squares but also rotation and reflection makes
better use of the image's self-similarity. Given four possible rotations and
two mirror images, that's more than 30 billion pixel comparisons. Three color
planes in the image will give you 90,596,966,400 byte comparisons. Since the
most-similar block is usually determined by a least-squares method, each byte
comparison involves a subtraction, multiplication, and addition, creating a
significant number of instructions. 
Each pixel in the image must be compared with almost every other pixel, so if
the image is N pixels on an edge, the total complexity is of order N^4. This
is polynomial, but of worryingly high order. This naive algorithm can take
several hours of CPU time, and must therefore be optimized.
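The counting argument above can be reproduced directly:

```python
# Reproducing the comparison counts for a 256x256 image, 8x8 small squares,
# 16x16 big blocks, and the article's 240 origin positions per axis.

positions = 240 * 240                     # candidate big-block origins: 57,600
small_squares = (256 // 8) ** 2           # 1024 small squares, 32 per side
block_cmps = positions * small_squares    # square-to-block comparisons
pixel_cmps = block_cmps * 64              # decimated 16x16 block: 64 pixels each
with_orient = pixel_cmps * 8              # 4 rotations x 2 mirror images
with_color = with_orient * 3              # three color planes
```

Printing `with_color` confirms the 90,596,966,400 byte comparisons quoted in the text.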


Doing It Quickly


At the University of Strathclyde, we've developed the "Fast Fractal
Transform," a new and much faster algorithm. On a SPARC II, it takes 35
seconds to compress a 256x256 24-bit color image. On the same machine, an
8-bit gray-scale version of the same image took 8.5 hours using the
conventional algorithm presented in the article "Fractal Image Compression,"
by L.A. Anson (Byte, October 1993). 
Figure 3(a), an uncompressed 256x256 24-bit image entitled "Rachel,"
illustrates our technique's compression capabilities. Table 1(a) shows how
long it takes to compress the image on a Sun 4/75, as well as the relative
quality of the compression as reflected in the root-mean-square (RMS)
pixel-by-pixel difference between the original and the reconstructed images.
In this instance, we used 8x8 small squares, resulting in a compression ratio
of 38:1. Table 1(b) shows compression using 4x4 small squares, resulting in a
compression ratio of 9.5:1. Figure 3(b) is the Rachel image compressed using
the best of eight matches. Figure 4 is a graph of the mean match error versus
the number of candidate matches extracted from the K-D tree (discussed
shortly) for a typical 256x256 24-bit image. The match error is the sum of the
squares of the individual pixel errors between a small square (4x4, in this
instance) and its corresponding big block for the case of the best match
delivered. This computation assumes the color planes are independent of each
other.
The obvious alternative to an exhaustive search is indexing blocks in terms of
their similarity. This lets you perform the whole process in two passes. The
first pass places each big block into the index and the second looks up the
most-similar partner in the index for each small square. Since indexes can
usually be searched in logarithmic time, it seemed that the entire process
would be much faster for two reasons:
Associative store enables the matching process to select only those big blocks
that are most similar to the small square currently being matched. Each match
can be done in a time logarithmically related to the number of big blocks
existing in the image. For example, because of the associative store's
characteristics, the selection can obtain the best matches with the order of
only log M single comparisons, where M is the number of big blocks. 
A small square can be compared with a big block on the basis of its gross
characteristics only. This is much faster than determining whether or not they
conform on a pixel-by-pixel basis (typically two or three comparisons for an
8x8 block, as opposed to 64).



Suitable Gross Characteristics


A surprisingly large variety of gross characteristics can be used, based on
the following criteria:
They should be independent, each providing different information to
characterize the patch.
They should be relatively immune to noise (hence, characteristics of the
entire set of patch pixels are preferred).
They should be simple to compute.
Example characteristics include the big block's Fourier coefficients and
Discrete Cosine Transformation coefficients. Alternatively, you might use the
result of summing the pixels in the large patch under one or more different
masks, or of determining the principal moment of the image in the big block.
The first step in our compression method computes such gross characteristics
for the potential big blocks in the image. 
The second phase of the algorithm computes the same set of gross
characteristics for the small squares. These computations must be performed
only once for both blocks and squares, respectively.
The computed characteristics of the small squares are then used to index the
big blocks' associative table of characteristics to discover which big blocks
are most similar to the given small square. For each small square, the
associative mechanism can return either a single best match or a relatively
small number of near matches, from which we can select the optimum match
according to other criteria.
Matching a small square to a corresponding big block on the basis of its gross
characteristics obtains the relative x- and y-coordinates required for the
translation transform in the compression process.
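The two-pass scheme (index the big blocks' gross characteristics, then look up each small square) can be sketched minimally. The particular characteristics used here, mean brightness and two difference-mask sums, and all the names are illustrative assumptions; a linear scan stands in for the associative index:

```python
# Sketch of matching by gross characteristics. Pass 1 computes a
# characteristic tuple for each big block; pass 2 finds, for each small
# square, the block whose characteristics are nearest. Illustrative only.

def characteristics(img, x, y, size):
    """Mean brightness plus sums under horizontal and vertical masks."""
    pix = [img[y + j][x + i] for j in range(size) for i in range(size)]
    mean = sum(pix) / len(pix)
    horiz = sum(img[y + j][x + i] * (1 if i < size // 2 else -1)
                for j in range(size) for i in range(size))
    vert = sum(img[y + j][x + i] * (1 if j < size // 2 else -1)
               for j in range(size) for i in range(size))
    return (mean, horiz, vert)

def best_match(square_chars, block_table):
    """Pass 2: nearest block by Euclidean distance in characteristic space."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(block_table, key=lambda entry: dist(entry[1], square_chars))
```

In the real algorithm the `min` scan is replaced by the K-D tree lookup described in the next section, which is what makes each match logarithmic rather than linear in the number of blocks.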


The Associative Mechanism


When presented with a set of attributes, the mechanism must be able to quickly
locate the nearest-matching item in its memory. In a single-processor system,
this means with as few simple comparisons as possible. 
Suitable associative data structures are already well known--the K-D tree, for
instance. The mechanism adopted can select not only a single best match, but a
set of k best matches. In such a tree, similar items are located near each
other. To find a match for a small square, its attribute characteristics are
calculated and the nodes of the tree are searched from the root down until an
initial candidate best-match big block is located. The exact overall
similarity distance from the small square is then calculated, and the search
returns to the parent node, which contains upper- and lower-bound information
pertaining to that level's attribute. 
If the similarity distance between the small square's attribute and the bound
(in that dimension only) exceeds the selection criterion, the alternate branch
can be ignored; otherwise it, too, must be searched to find the best match, or
best-matching set. This process is repeated until the root is reached. Thus,
the store mechanism selects the best match, or a set of k matches, or the set
within some search-specified distance of the given small square. This improves
the range of candidate matches, while only minimally increasing the number
examined.
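The search just described can be sketched as a generic K-D tree with bound pruning (after Friedman, Bentley, and Finkel); this is an illustrative nearest-neighbor implementation, not the Strathclyde code:

```python
# Minimal K-D tree: build from (point, payload) items, then find the
# nearest neighbor, pruning branches whose splitting plane is farther
# away than the current best match.

def build(items, depth=0):
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return {"item": items[mid], "axis": axis,
            "left": build(items[:mid], depth + 1),
            "right": build(items[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    if node is None:
        return best
    point, _ = node["item"]
    d = sum((p - q) ** 2 for p, q in zip(point, query))
    if best is None or d < best[0]:
        best = (d, node["item"])
    diff = query[node["axis"]] - point[node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    # Search the far branch only if the splitting plane lies within the
    # current best distance -- the pruning step described in the text.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best
```

Extending `nearest` to return the k best matches, as the text describes, only requires keeping a small bounded list instead of a single `best` pair.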
The distance function used is a weighted Euclidean distance over the chosen
attribute values corresponding to the blocks and squares being compared. The
weight applied to a particular attribute value should optimally be linked to
the accuracy of the attribute itself. (More attention is given to parameters
that are on average more accurate than to those that are less accurate.) The
weight must also, however, reflect the discriminating power of the attribute
(for example, an attribute constant over all patches is useless, even though
it is consistent and accurate). 
The values of the weights used by the algorithm can be determined by a
standard procedure, as follows:
1. Process a set of images by Barnsley's method, obtaining the exhaustively
searched, correct, best-match transform.
2. Calculate all gross characteristics for all small squares and their
corresponding big blocks. 
3. Based on these values, adjust the weights using optimization. This
decreases the similarity distance between each Barnsley-matched pair and
increases the distance between nonmatched pairs. 


Two Dimensions Are Not the Limit 


Unlike previous methods, the Fast Fractal Transform is practical for 3-D data,
whether derived from spatial dimensions (MRI 3-D scan data), or from video
sources in which the third dimension is time. Very high compression ratios in
reasonable time are possible. Indeed, this method could be applied to even
higher-dimension data sets such as those derived from volumes of the
atmosphere or ocean, involving variation over time and additional attributes
in additional dimensions. 


The Complexity of the New Method


In our example of a 256x256 image, there would typically be 240x240 candidate
big blocks whose values must be stored in the table. The K-D tree implements a
mechanism that can determine the best-matching big block in the order of
log2(M) operations, where M is the number of entries for big blocks in the
tree. 
Thus, the best-matching big block can be determined in the order of
log2(240x240) individual comparisons in the K-D tree (approximately 16)
instead of 240x240 (57,600). Remember that in practice, a small set of
best-matching candidates may be returned. The major components not yet
accounted for are the calculation of the characteristics (once only) for
setting up the tree, and the final pixel-by-pixel comparison of the
nearest-neighbor set of candidates returned by the Associative Memory. 
The cost of building the K-D tree is composed of the cost of computing the
gross attributes and the cost of storing them in the tree. Typically a small
number of attributes are used, say three to five. The cost of computing each
attribute is, to a first approximation, proportional to the size of the big
block, say m^2, where m is the edge of the block. So the complexity is
m^2 x (number of big blocks) x (number of attributes). For the example given,
this requires about 16x16x240x240x3, or approximately 44,236,800, operations.
Building the tree requires about 240x240xlog2(240x240), or approximately
892,800, operations. The rate-limiting factor is thus the computation of the
gross attributes, and Fast Fractal Transform has a complexity about three
orders of magnitude less than the conventional algorithm. The observed
execution times are indeed about three orders of magnitude faster.
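A quick check of these estimates, confirming that attribute computation, not tree building, dominates:

```python
# Cost estimates for the example: 240x240 candidate big blocks, 16x16
# blocks, three gross attributes per block.
import math

blocks = 240 * 240                       # candidate big blocks: 57,600
attr_ops = 16 * 16 * blocks * 3          # m^2 x blocks x 3 attributes
tree_ops = blocks * math.log2(blocks)    # insertions into the K-D tree
```

`attr_ops` comes to 44,236,800, roughly fifty times `tree_ops`, so computing the gross attributes is indeed the rate-limiting step.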


References


Anson, L.A. "Fractal Image Compression." Byte (October 1993).
Barnsley, M.F. and A.D. Sloane. "A Better Way to Compress Images." Byte
(January 1988).
Buerger, M.J. Crystal Structure Analysis. New York, NY: John Wiley & Sons,
1960.
Friedman, J.H., J.L. Bentley, and A.F. Finkel. "An Algorithm for Finding Best
Matches in Logarithmic Expected Time." ACM Transactions on Mathematical
Software, Vol. 3, No. 3, 1977. 
Figure 1: The big block B is similar to the small square A in this photo of
Cir Mhor on the Scottish Island of Arran. The small square C is also a mirror
image at a smaller scale of the big block D. (Photo courtesy of Neil
MacLennan, Strathclyde University Audio Visual Services.)
Figure 2: Enlarged versions of A and B.
Figure 3: (a) An uncompressed 256x256 pixel picture of "Rachel" (b) "Rachel"
compressed using the best of eight matches. 
Figure 4: Graph of the mean match error versus the number of candidate matches
extracted from the K-D tree for a typical 256x256 24-bit image.
Table 1: Compression results for (a) 8x8 small squares and (b) 4x4 small
squares.

         Number of   Time to    RMS
         Matches     Compress   Error
    (a)      1        19.53     15.07
             2        20.55     12.50
             8        28.75     12.18
    (b)      1        15.82     10.52
             2        17.67      8.11
             8        30.42      6.74




































































Differential and Linear Cryptanalysis


Attacking the Data Encryption Standard




Bruce Schneier


Bruce is a DDJ contributing editor and author of Applied Cryptography, Second
Edition (John Wiley & Sons, 1996). He can be contacted at
schneier@counterpane.com.


The Data Encryption Standard (DES) has been the workhorse of cryptography for
almost 20 years. Recently, two powerful new attacks on DES have been invented:
differential and linear cryptanalysis. Both are statistical in that an
attacker collects a large amount of plaintext and ciphertext associated with a
given key, then uses that information to determine the key. In this article,
I'll explain the attacks by showing how they work against DES.


A DES Backgrounder 


Since the mid-1970s, DES has been analyzed and discussed in every book and
magazine (including DDJ) that covers cryptography. DES is a 16-round block
cipher. Its input is a 64-bit data element, x. You first permute the bits of x
in a fixed pattern, then divide x into two 32-bit halves: L and R. Then, for
i=1 to 16:
Li = Ri-1
Ri = Li-1 XOR f(Ri-1, Ki)
After the sixteenth round, you swap L and R. Next, you recombine L and R and
permute the bits again to get the ciphertext; see Figure 1.
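The Feistel iteration above can be sketched generically. In the sketch below, `toy_f` is a stand-in for DES's real round function, which is far more elaborate; the point is the round structure and the fact that decryption is the same loop with the subkeys reversed:

```python
# Generic Feistel structure, as in DES (minus the initial and final
# permutations). toy_f is an arbitrary illustrative round function; the
# structure is invertible no matter what f is.

MASK32 = 0xFFFFFFFF

def toy_f(r, k):
    return ((r * 31 + k) ^ (r >> 3)) & MASK32

def feistel_encrypt(l, r, subkeys):
    for k in subkeys:
        l, r = r, l ^ toy_f(r, k)   # Li = Ri-1; Ri = Li-1 XOR f(Ri-1, Ki)
    return r, l                     # final swap of the two halves

def feistel_decrypt(l, r, subkeys):
    for k in reversed(subkeys):     # same loop, subkeys in reverse order
        l, r = r, l ^ toy_f(r, k)
    return r, l
```

Because each round only XORs `f`'s output into one half, decryption never needs to invert `f` itself; this is why DES's S-boxes can be many-to-one.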
Function f, the round function, is where the security lies. First, the 32
input bits are permuted and expanded to 48 bits. Then, the 48 bits are divided
into eight 6-bit chunks. Each chunk goes through an "S-box," the output of
which is a 4-bit number. These output 32 bits are permuted again.
DES has many constants, including the fixed permutations and the eight
different S-boxes. These are all specified, but the designers of DES give no
reasons why they were chosen instead of others.


Differential Cryptanalysis


In 1990 and 1991, Eli Biham and Adi Shamir introduced differential
cryptanalysis, which looks specifically at pairs of ciphertexts whose
plaintexts have particular differences. Differential cryptanalysis analyzes
the evolution of these differences as the plaintexts propagate through the
rounds of DES when they are encrypted with the same key.
Put simply, the technique chooses pairs of plaintexts with a fixed difference.
The two plaintexts can be chosen at random, as long as they satisfy particular
difference conditions; you don't even have to know their values. Then, using
the differences in the resulting ciphertexts, you assign different
probabilities to different keys. As you analyze more ciphertext pairs, one key
will emerge as the most probable. This is the correct key.
The details are, of course, more complicated. Consider Figure 2, the DES round
function. Imagine a pair of inputs, X and X', that have the difference DX. The
outputs, Y and Y', are known; therefore, so is the difference DY. Both the
expansion permutation and the P-box are known, so DA and DC are known. B and
B' are not known, but their difference DB is known and equal to DA. (When
looking at the difference, the XORing of Ki with A and A' cancels out.) So
far, so good. Here's the trick: For any given DA, not all values of DC are
equally likely. The combination of DA and DC suggests values for bits of A XOR
Ki and A' XOR Ki. Since A and A' are known, this gives us information about
Ki.
Look at the last round of DES. (Differential cryptanalysis ignores the initial
and final permutations since they have no effect on the attack--except to make
it harder to explain.) If we can identify K16, then we have 48 bits of the
key. (Remember, the subkey in each round consists of 48 bits of the 56-bit
key.) The other eight bits we can get by brute force. Differential
cryptanalysis will get us K16.
Certain differences, called "characteristics," in plaintext pairs have a high
probability of causing certain differences in the resulting ciphertext pairs.
Characteristics extend and define a path through several rounds. There is an
input difference, a difference at each round, and an output difference--with a
specific probability.
You can find these characteristics by generating a table where the rows
represent the possible input XORs (the XOR of two different sets of input
bits), the columns represent the possible output XORs, and the entries
represent the number of times a particular output XOR occurs for a given input
XOR. You can generate such a table for each of DES's eight S-boxes.
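Such a difference-distribution table can be built mechanically. The sketch below uses a hypothetical 4-bit S-box rather than one of DES's 6-to-4-bit S-boxes, but the construction is the same:

```python
# Building the XOR difference-distribution table for a toy 4-bit S-box.
# table[dx][dy] counts how often input difference dx produces output
# difference dy. The S-box values here are arbitrary, not from DES.

SBOX = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
        0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]

def difference_table(sbox):
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for x in range(n):
        for dx in range(n):
            dy = sbox[x] ^ sbox[x ^ dx]
            table[dx][dy] += 1
    return table
```

The large entries in such a table are exactly the high-probability input/output difference pairs that characteristics are built from.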
Figure 3 (a), for example, is a one-round characteristic. The input difference
of the left side is L; it could be anything. The input difference of the right
side is 0. (The two inputs have the same right side, so their difference is
0.) Since there is no difference going into the round function, there is no
difference coming out of it. Therefore, the output difference of the left side
is L XOR 0=L, and that of the right side is 0. This is a trivial
characteristic, and is true with probability 1.
Figure 3 (b) is a less obvious characteristic. Again, the input difference to
the left side is arbitrary: L. The input difference to the right side is
0x60000000; the two inputs differ in only the first and third bits. With a
probability of 14/64, the output difference of the round function is
0x00808200, so the output difference of the left side is L XOR 0x00808200 and
that of the right side is 0x60000000.
Different characteristics can be joined and, assuming the rounds are
independent, the probabilities can be multiplied. Figure 4 joins the two
characteristics just described. The input difference to the left side is
0x00808200, and that to the right side is 0x60000000. At the end of the first
round, the input difference and the output of the round function cancel out,
leaving an output difference of 0. This feeds into the second round, where the
final output difference of the left side is 0x60000000 and that of the right
side is 0. This two-round characteristic has a probability of 14/64.
A plaintext pair that satisfies the characteristic is a right pair; a pair
that does not is a wrong pair. A right pair will suggest the correct round
key (for the last round of the characteristic); a wrong pair will suggest a
random round key. To find the correct round key, simply collect enough guesses
so that one subkey is suggested more often than all the others. In effect, the
correct subkey will rise out of all the random alternatives.
So, the basic differential attack on n-round DES will recover the 48-bit
subkey used in round n, and the remaining eight key bits are obtained by
brute-force guessing.
There are still considerable problems. There is a negligible chance of success
until you reach some threshold; that is, until you accumulate sufficient data,
you can't tell the correct subkey from all the noise. And the attack isn't
practical: You have to use counters to assign different probabilities to 2^48
possible subkeys, and too much data is required to make this work.
Consequently, Biham and Shamir tweaked their attack. Instead of using a
15-round characteristic on 16-round DES, they used a 13-round characteristic
and some tricks to get the last few rounds. A shorter characteristic with a
higher probability worked better. And they used some clever mathematics to
obtain 56-bit key candidates which could be tested immediately, eliminating
the need for counters. This attack succeeds as soon as a right pair is found;
this avoids the threshold and gives a linear success probability. If you have
1000 times fewer pairs, you have 1000 times smaller chance of success--but
there is always some chance of immediate success.
The results are most interesting. DES variants with fewer rounds are highly
susceptible to differential cryptanalysis. The best attack against full
16-round DES requires 2^47 chosen plaintexts. This can be converted to a
known-plaintext attack that requires 2^55 known plaintexts. The analysis
requires 2^37 DES operations.
Differential cryptanalysis works against DES and other similar algorithms with
constant S-boxes. The attack is heavily dependent on the structure of the
S-boxes; the ones in DES just happen to be optimized against differential
cryptanalysis. The attack works against DES in any of its operating
modes--ECB, CBC, CFB, and OFB--with the same complexity.
DES's resistance can be improved by increasing the number of rounds.
Chosen-plaintext differential cryptanalysis of DES with 17 or 18 rounds takes
about the same time as a brute-force search. At 19 rounds or more,
differential cryptanalysis becomes impossible because it requires more than
2^64 chosen plaintexts: Remember, DES has a 64-bit block size, so it only has
2^64 possible plaintext blocks. (In general, an algorithm is resistant to
differential cryptanalysis if the amount of plaintext required to mount such
an attack is greater than the amount of plaintext possible.)
Realize that this attack is largely theoretical. The enormous time and data
requirements to mount a differential cryptanalytic attack put it beyond the
reach of almost everyone. To get the requisite data for this attack against a
full DES, you would have to encrypt a 1.5-MB/sec data stream of chosen
plaintext for almost three years. Furthermore, this is primarily a
chosen-plaintext attack. To convert it to a known-plaintext attack, you have
to sift through all of the plaintext-ciphertext pairs looking for the useful
ones. For full 16-round DES, this makes the attack slightly less efficient
than brute force (the differential cryptanalytic attack requires 2^55.1
operations, and brute force requires 2^55). Properly implemented, DES is still
secure against differential cryptanalysis.
Why is DES so resistant to differential cryptanalysis? Why are the S-boxes
optimized to make this attack as difficult as possible? Why are there as many
rounds as required, but no more? Because, as Don Coppersmith of IBM admitted,
the designers knew about it in the mid-1970s (but it was classified government
information).


Linear Cryptanalysis


Linear cryptanalysis, invented by Mitsuru Matsui in 1993, is a type of
cryptanalytic attack that uses linear approximations to describe the action of
DES. This means that if you XOR some of the plaintext bits together, XOR some
ciphertext bits together, and then XOR the results, you will get a single bit
that is the XOR of some of the key bits. This is a linear approximation, and
will hold with some probability p. If p is not equal to 1/2, then this bias
can be exploited. You use collected plaintexts and associated ciphertexts to
guess the values of the key bits. The more data you have, the more reliable
the guess. The greater the bias, the greater the success rate with the same
amount of data.
How do you identify good linear approximations for DES? For starters, you find
good one-round linear approximations and join them together. (Again, ignore
the initial and final permutations; they don't affect the attack.) Look at the
S-boxes. There are six input bits and four output bits. The input bits can be
combined using XOR in 63 useful ways (2^6 - 1), and the output bits can be
combined in 15 useful ways. For each S-box, you can evaluate the probability
that for a randomly chosen input, an input XOR combination equals some output
XOR combination. If a combination has a high-enough bias, then linear
cryptanalysis may work.
If the linear approximations are unbiased, then they will hold for 32 of the
64 possible inputs. I'll spare you the pages of tables, but the most biased
S-box is S-box 5. In fact, the second input bit is equal to the XOR of all
four output bits for only 12 inputs. This translates to a probability of 3/16,
or a bias of 5/16, and is the most extreme bias in all the S-boxes.
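Searching for biased approximations amounts to counting how often a masked input parity agrees with a masked output parity. The sketch below does this for a hypothetical 4-bit S-box (DES's S-boxes map 6 bits to 4, so the real search covers 63 x 15 mask pairs, but the counting is identical):

```python
# Counting agreements between input-bit XORs and output-bit XORs for a
# toy 4-bit S-box. A count of 8 out of 16 is unbiased; the farther from
# 8, the more useful the approximation. S-box values are arbitrary.

SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

def parity(v):
    return bin(v).count("1") & 1

def approximation_count(sbox, in_mask, out_mask):
    """Inputs x where the XOR of masked input bits equals the XOR of
    masked output bits."""
    return sum(parity(x & in_mask) == parity(sbox[x] & out_mask)
               for x in range(len(sbox)))
```

Scanning all nonzero mask pairs and keeping the counts farthest from 8 yields the toy analogue of the S-box 5 result quoted above.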

Figure 5 shows how to turn this into an attack against the DES round function.
The input bit into S-box 5 is b26. (I am numbering the bits from left to
right, and from 1 to 64. Matsui ignores this convention with DES and numbers
his bits from right to left and from 0 to 63. It's enough to drive you mad.)
The four output bits from S-box 5 are c17, c18, c19, and c20. We can trace b26
backwards from the input to the S-box. The bit a26 is XORed with a bit from
the subkey, Ki,26, to obtain b26. Bit X17 goes through the expansion
permutation to become a26. After the S-box, the four output bits go through
the P-box to become four output bits of the round function: Y3, Y8, Y14, and
Y25. This means that with probability 1/2 - 5/16:
X17 XOR Y3 XOR Y8 XOR Y14 XOR Y25 = Ki,26
Linear approximations for different rounds can be joined in a manner similar
to that discussed under differential cryptanalysis. Figure 6 is a three-round
approximation with a probability of 1/2 + 0.0061. The individual approximations
are of varying quality: The first and last are pretty good, and the middle is
bad. Together, the three one-round approximations give a very good three-round
approximation.
The basic attack is to use the best linear approximation for 16-round DES. It
requires 2^47 known-plaintext blocks and will result in one key bit. This is
clearly not very useful. If you interchange the role of plaintext and
ciphertext and use decryption as well as encryption, you can get two key bits.
This still isn't very useful.
There are refinements. Use a 14-round linear approximation for rounds 2
through 15. Guess the six subkey bits relevant to S-box 5 for the first and
last rounds (12 key bits in all). Effectively, you are doing 2^12 linear
cryptanalyses in parallel and picking the correct one based on probabilities.
This recovers the 12 bits plus bit b26, and reversing plaintext and
ciphertext recovers another 13 bits. To get the remaining 30 bits, use
exhaustive search. There are other tricks, but that's basically it.
Against full 16-round DES, this attack can recover the key with an average of
2^43 known plaintexts. A software implementation of this attack recovered a DES
key in 50 days using 12 HP9735 workstations.
Linear cryptanalysis depends heavily on the structure of the S-boxes, and the
S-boxes in DES are not optimized against this attack. According to Don
Coppersmith, resistance to linear cryptanalysis "was not part of the design
criteria of DES." Either they didn't know about linear cryptanalysis, or they
knew about something else even more powerful whose resistance criteria took
precedence.
Linear cryptanalysis is newer than differential cryptanalysis, and there may
be more improvements in the years to come.


Future Directions


Work has been done to extend the concept of differential cryptanalysis to
higher-order differentials. Lars Knudsen uses "partial differentials" to
attack 6-round DES; it requires 32 chosen plaintexts and 20,000 encryptions.
The attack is too new to know if these extensions will make it easier to
attack full 16-round DES.
Another avenue of attack is differential-linear cryptanalysis: combining
differential and linear cryptanalysis. Susan Langford and Martin Hellman have
an attack on eight-round DES that recovers ten key bits with an 80 percent
probability of success with 512 chosen plaintexts and a 95 percent probability
of success with 768 chosen plaintexts. However, it doesn't seem to extend
easily to more rounds.
These attacks are still new, and work continues. There may be a breakthrough
sometime during the next few years. Maybe we will see a practical statistical
attack against DES. Who knows?


References


Biham E. and A. Shamir. Differential Cryptanalysis of the Data Encryption
Standard. New York, NY: Springer-Verlag, 1993.
Matsui, M. "Linear Cryptanalysis Method for DES Cipher," Advances in
Cryptology-EUROCRYPT '93 Proceedings. New York, NY: Springer-Verlag, 1994.
------. "The First Experimental Cryptanalysis of the Data Encryption
Standard." Advances in Cryptology-CRYPTO '94 Proceedings. New York, NY:
Springer-Verlag, 1994. 
Schneier, B. Applied Cryptography, Second Edition. New York, NY: John Wiley &
Sons, 1996.
Figure 1: DES.
Figure 2: DES round function.
Figure 3: DES characteristics. (a) With probability 1; (b) with probability
14/64 (X=0x60000000, Y=0x00808200).
Figure 4: A two-round DES characteristic with probability 14/64 (X=0x60000000,
Y=0x00808200).
Figure 5: A one-round linear approximation for DES.
Figure 6: A three-round linear approximation for DES with probability
1/2 + 6.1x10^-3 (A=[3,8,14,25], B=[8,14,25]).
































Evaluating Data-Compression Algorithms


Finding the right tool for the right job




G. Jason Mathews


Jason is a computer engineer at the National Space Science Data Center
Interoperable Systems Office. He can be contacted at
mathews@nssdc.gsfc.nasa.gov.


The large amount of data collected by NASA and other scientific organizations
makes data compression a necessary part of archival and data-management
systems. For instance, data compression is being evaluated for incorporation
into the Common Data Format (CDF), a data-management package developed and
maintained by the National Space Science Data Center of the NASA/Goddard Space
Flight Center (GSFC) for storing, manipulating, and accessing multidimensional
data sets. CDF is capable of storing any type of data, including scalar data
items, vectors, and multidimensional arrays of data.
Regardless of dimensionality, all data types break down into a sequence of
bytes that can be fed into a compression function. The diverse data types and
high-performance speed requirements necessitate a general-purpose, fast,
simple data-compression algorithm. As a member of the team designing and
implementing CDF, I compared several data-compression approaches, searching
for a suitable, general-purpose algorithm. In this article, I'll present the
results of my examinations. While specifically targeted at CDF, the results
are applicable to a variety of platforms.
An ideal data-compression algorithm would compress all data at the highest
compression rate in the least amount of time. No such algorithm exists,
however, since the measurement criteria are both data and application
dependent. For example, one developer might want a high compression ratio
regardless of time, while another might need fast compression with some
sacrifice of compression ratio. There are trade-offs among the following: time
to compress and decompress the data, compression rates, memory requirements
(hash tables, frequency counts, temporary buffers, and so on), and algorithmic
complexity. 
CDF is a programming interface whose goal is to provide applications with fast
access to data; therefore, the compression algorithm should be fast. There
isn't much need for compression if most of the data cannot be compressed, so
we needed a general-purpose compression algorithm that compresses most data
with a good compression ratio. The compression ratio is defined as the
percentage reduction: (1 - compressed data size/uncompressed data size) x 100.
One hundred percent compression is not possible unless the compressed size is
zero, which occurs only if the uncompressed data size is zero as well.
Therefore, the upper limit is 100 percent, and the ratios will approach this
limit. If the compressed data size is larger than the uncompressed data size,
the data expanded with negative compression. Unless you set a lower limit,
data can expand infinitely. If you set the lower limit to zero, files that
fail to compress will retain a compression ratio of 0 instead of expanding. A
good compression ratio is defined as a significant data reduction for all test
data as a whole for a particular compression algorithm. Furthermore, the CDF
interface is a library and is linked into all CDF applications, so it's also
important that the compression algorithm not dramatically increase the
application size.
Finally, portability is important, since both the CDF software and data sets
are portable to DEC Alpha, DECstation, HP 9000, PC, RS/6000, Macintosh, NeXT,
SGI, Sun, and VAX platforms. The hallmark of the CDF concept is its data-set
independence, so a compression implementation with minimal architecture- and
machine-level dependencies is preferable. In summary, the requirements are:
A fast algorithm that minimizes compression/decompression time.
A good algorithm with a high compression rate.
A simple algorithm that has a small code size.


Data Characteristics


CDF can handle data conceptually organized as scalar and multidimensional with
arrays of up to ten dimensions. It supports the data types available with C
and Fortran compilers on most systems, which include 1-, 2-, and 4-byte signed
and unsigned integers, 4-byte single-precision floating point, 8-byte
double-precision floating point, and 1-byte signed and unsigned character
types. For example, there may be data defined as a single scalar 1-byte
integer and another as a 512x512x16 array of double-precision floating-point
numbers. Compressing a single scalar byte is unproductive, whereas compressing
a large array or stream of bytes gives vastly different results. Likewise, a
two- or three-byte value cannot be usefully reduced, but a 4-byte quantity
could possibly be reduced to three bytes, so the minimum file size was set at
four bytes. Since most data stored in CDFs typically
have fewer than three dimensions (for example, scalar data types,
three-element vectors, and two-dimensional images), the sizes of the sample
data files are arbitrarily chosen to range from 4 bytes to 144 KB. Although
larger files achieve higher compression ratios, smaller data files better
represent the typical size of data stored as components within a CDF.
To evaluate a general-purpose compression algorithm, I needed a representative
collection of sample data. I turned to the graphics, text, executables, and
sound-data samples presented in Dr. Dobb's Journal; see "DDJ Data Compression
Contest" (DDJ, February 1991) and "DDJ Data Compression Contest Results," by
Mark Nelson (DDJ, November 1991). While it is unlikely that MS-DOS executable
programs will be stored in CDFs, these files have their own structure, and
since the goal was to find a general-purpose compressor for all types of data,
I included all of the files used by the DDJ tests. In addition, a subset of
scientific data from the International Solar-Terrestrial Physics (ISTP) Key
Parameter Data CD-ROM was included, since it represents the typical values and
types of data that CDF actually stores (data for ISTP spacecraft orbit,
attitude, magnetic field and particle measurements, and images).
The compression programs I evaluated included implementations for variations
of the LZ77, LZ78, LZW, Huffman, run-length encoding, and arithmetic
compression algorithms. Table 1 lists the programs included for the MS-DOS
compression tests.
CHURN, the compression utility program that accompanies Mark Nelson's Data
Compression Book, Second Edition (M&T Books, 1995), is capable of running
compression and decompression programs on all data files in a specified disk
volume for MS-DOS. CHURN also measures the elapsed compression and
decompression times, and verifies that the contents of the decompressed files
match the original data files. CHURN is called with three arguments: a drive
letter and pathname to recursively search for files to compress, a compression
command, and an output filename. The compression command tells CHURN how to
compress the input file to a file called TEST.CMP. CHURN executes the
compression command by passing the command line to DOS using the system()
function call. The program inserts the filename into the compression command
by calling sprintf() with the filename as an argument. This means that if the
compression command has a %s anywhere in it, the name of the input file will
be substituted. Finally, the third argument on the command line is the command
that CHURN needs to decompress TEST.CMP to TEST.OUT; for example, CHURN
D:\DATA\ "LZSS-C %%s test.cmp" "LZSS-D test.cmp test.out". The double %
symbols defeat variable substitution under some command-line interpreters such
as 4DOS. A more complicated example which tests PKZIP might be CHURN D:\DATA\
"PKZIP.BAT %%s" "PKUNZIP TEST.CMP", where PKZIP.BAT has two lines:
COPY %1 TEST.OUT
PKZIP -M TEST.CMP TEST.OUT
CHURN also creates a file called CHURN.LOG containing a summary of compression
results. This can be used for further analysis by other programs. This file is
reformatted and used to generate the graphic output in Figure 1, where the
numbers correspond to those in Table 1. CHURN and the corresponding
compression programs were tested on a 486/33 PC running MS-DOS 6.20.
The DOS COPY command (item 14 in Table 1) measures the baseline time to copy a
file byte-by-byte with no compression. No compression program should be able
to read the entire input file, analyze the data, and compress the results in
less time than a straightforward copy, but some compression programs actually
approach this time with significant compression. Therefore, the overhead for
compressing data won't impact performance if the faster compression algorithms
are used.
The DCT program appears to have the highest compression rate, but this is a
lossy algorithm included in the test suite only for comparison since lossy
algorithms do not meet the requirement for full reconstruction of the data.
However, the COMPRESS, GZIP, LZRW1, and LZRW3A programs all achieve high
compression rates with fast execution times. 
I also conducted tests on the six algorithms listed in Table 2, using the
Computerized Reduction Using Selected Heuristics (CRUSH) data-compression
program. CRUSH is a portable, multimethod utility that incorporates different
algorithms into a common interface for VMS and UNIX systems. CRUSH takes a
different approach than CHURN: It links different compression and
decompression algorithms into one program, rather than calling external
programs. CRUSH is flexible because you can either compress files using a
compression algorithm or select automatic mode, which tries all algorithms and
selects the best method. Automatic mode ensures that the data have the maximum
compression given the available algorithms, but at a high cost of time. The
algorithms included in CRUSH were developed by Ian H. Witten, Radford M. Neal,
and John G. Cleary (Department of Computer Science, University of Calgary,
Canada); David M. Abrahamson (Trinity College, Dublin, Ireland); Ross N.
Williams (Renaissance Software, Adelaide, Australia); Robert F. Rice (JPL);
and Pen-Shu Yeh and Warner Miller (GSFC). The source code for CRUSH (which is
in the public domain) is available via anonymous ftp from dftnic.gsfc.nasa.gov
(128.183.115.71) in the software/unix/crushv3 directory. 
I ran tests on a Sun 4 machine under SunOS 4.1. This machine is not dedicated,
so the elapsed times may have been slightly affected by other activity on the
system. To minimize this, the tests were run at off-hours. CRUSH compresses
each data file using all six algorithms. For tests of an individual data file,
the best compression algorithm varied; the overall results are in Figure 2.
LZC has the highest compression ratio and one of the lowest times. The
next-best class of algorithms with a high compression/time ratio is LZRW3A and
LZRW1. WNC and ADAP both offer comparable compression ratios, and ADAP is even
slightly better than LZC, but both take much longer. Overall, the RICE
algorithm does not work as well as a general-purpose compressor, but it works
best on several individual files.
CRUSH has a single interface, with each compression algorithm having
equivalent functionality, so the code complexity can be estimated using the
size of the corresponding object-code files. This is a rough approximation,
but, along with the other measurements, it gives some idea of how much is
involved in the algorithm. Since the compression programs tested on the IBM PC
each have different functionality, error handling, and other bells and
whistles, the object and executable sizes are not very reliable measures of
their complexity. For this reason, all complexity estimates include only the
algorithms available through the common CRUSH interface.
Table 3 ranks compression ratio, speed, and complexity from 1 to 4 (1=worst in
its category; 4=most desirable and best in its category). The
compression-ratio and speed (total elapsed time) rankings are extracted from
the results of running CHURN with the test data and scaled appropriately.
The overall score is a weighted sum (10 x compression rank + 10 x time rank +
5 x complexity rank) with a maximum of 100. The complexity weight is lower
than the other two because it is the least reliable measurement. The weights
can be adjusted according to the measurements' priority.
Another study that compares 47 compression programs over various large
collections of data was conducted by Greg Flint of Purdue University. The raw
test results are available as arctst03.zip via anonymous ftp from any SimTel
archive site, such as oak.oakland.edu (141.210.10.117) in the
SimTel/msdos/info directory. Upon preliminary examination these results were
consistent with those presented in this article.


Results


According to my tests, it appears that the best class of programs is based on
the Lempel-Ziv adaptive dictionary-based algorithm. Arithmetic compression has
a high compression ratio but is the slowest. GZIP and LZC (UNIX Compress)
achieve the highest compression ratios and lowest elapsed time. They are the
best external, stand-alone compression programs, and they are portable to
MS-DOS, UNIX, VMS, and Macintosh platforms. The two algorithms that
incorporate small, simple functions into a much larger data-management package
are LZRW1 and LZRW3A.
Unfortunately, the LZC implementation, which is based on the LZW algorithm, is
covered by Unisys patents (U.S. 4,464,650 and 4,558,302). While Unisys has
stated it will license the algorithm to modem manufacturers, the company has
not expressly said how the patent applies to other products. This is one
reason for the CompuServe GIF/Unisys controversy. The LZRW algorithms are also
covered by one or more patents. (The author of these two algorithms believes
that the patents are discouraging implementation of the compression
techniques.) The GZIP algorithm (a variation of LZ77), however, is free
software and is free of patents. GZIP is fast and good but the code is neither
simple nor straightforward for incorporating into a large data-management
package.
CDF is distributed worldwide as public-domain software and as part of several
commercial data-analysis and visualization packages, such as RSI's Interactive
Data Language (IDL) and IBM's Visualization Data Explorer; thus, there may be
legal complications that are beyond the scope of this article. Aside from
these issues, these four LZ-based algorithms (GZIP, LZC, LZRW1, and LZRW3A)
are recommended as good, general-purpose data-compression algorithms.


Future Plans


The future plans for CDF are to incorporate the LZC and possibly another
data-compression algorithm into a test version of the CDF data-management
package, where compression and decompression will be transparent to CDF
applications. Various techniques will be tried to determine what to compress
to allow efficient access and minimize the impact on performance.



References


Goucher, G.W. and G.J. Mathews. A Comprehensive Look at CDF, NSSDC/WDCA-R&S
94-07. August 1994. (http://nssdc.gsfc.nasa.gov/cdf/cdf_home.html).
Nelson, Mark R. "DDJ Data Compression Contest Results." Dr. Dobb's Journal
(November 1991).
------. The Data Compression Book, Second Edition. San Mateo, CA: M&T Books,
1995.
Ross, E., "A Simple Data Compression Technique." C/C++ User's Journal (October
1992).
Teague, M. "ISTP Key Parameter Available on CD-ROM." Solar-Terrestrial Energy
Program (STEP) International, Vol. 4, No. 7. July 1994.
Williams, R.N. "An Extremely Fast Ziv-Lempel Data Compression Algorithm."
Proceedings Data Compression Conference '91.
Ziv, J. and A. Lempel. "A Universal Algorithm for Sequential Data
Compression." IEEE Transactions on Information Theory, Vol. 23, No. 3, May
1977.
Table 1: Data-compression programs tested with CHURN. *Source/executables
available on SimTel anonymous ftp sites (for example, oak.oakland.edu).
**Source available on disks provided with Mark Nelson's Data Compression Book.
 Item Program Description
 1 ahuff** Adaptive Huffman coding
 2 ar** Haruhiki Okumura's archiver
 3 arith** Arithmetic coding
 4 arith1** Order-1 arithmetic coding
 5 arj* Robert Jung's ARJ, v2.41a
 6 arth1e** Order-1 arithmetic coding
 7 cmp** Linear-conversion scheme for sound
 8 compress* UNIX compress uses LZW variant
 9 cmp40-14 Compress w/COMP40 option + 12 bit
 10 comp40 Compress with COMP40 option
 11 comprs12 Compress 12 bit
 12 comprs14 Compress 14 bit
 13 comp_opt Modified/optimized compress
 14 copy MS-DOS COPY command
 15 dct** Discrete cosine transformation
 16 dogzip* gzip-like LZ77 variant
 17 doz* UNIX-like LZW
 18 gzip* Gnu ZIP uses LZ77 variant
 19 huff* Huffman coding
 20 lha* Harayasu Yoshizahi's LHA v2.13
 21 lzari* LZSS with adaptive arithmetic coding
 22 lzhuf* LZSS with adaptive Huffman coding
 23 lzrw1 Ross Williams's LZ variant
 24 lzrw3a Ross Williams's LZ variant
 25 lzss* Storer-Szymanski modified LZ77
 26 lzss** LZ77 with 12-bit sliding window
 27 lzw12** 12-bit LZW compression
 28 lzwcom* Kent Williams's LZW variant
 29 pkarc* PKARC v3.6
 30 pkzip (default)* PKZIP v2.04g LZ77 variant
 31 pkzip -ex* PKZIP using maximal compression
 32 rdc Ross Data Compression uses RLE
 33 snd** Silence compression coding
 34 zoo* Rahul Dhesi's Zoo archiver, v2.1
Table 2: Data-compression algorithms tested with CRUSH.
 Method Algorithm Description
 Name
 ADAP Adaptive dependency WNC method.
 LZC Lempel/Ziv/Welch UNIX compress.
 LZRW1 Fast Lempel-Ziv method.
 LZRW3A Fast Lempel-Ziv method.
 RICE Rice machine.
 WNC Witten/Neal/Cleary arithmetic code.
Table 3: Ranking of six algorithms tested with CRUSH.
 Method  Compression Ratio  Speed  Complexity (Obj. Code Size)  Overall Score (Max=100)
 LZC     4                  3      2                            80
 LZRW1   2                  4      4                            80
 LZRW3A  3                  3      3                            75
 ADAP    4                  1      3                            65
 WNC     2                  1      2                            40
 RICE    1                  2      2                            40
Figure 1: CHURN compression results for 30 programs.
Figure 2: CRUSH compression results.


Color Quantization using Octrees


Mapping 24-bit images to 8-bit palettes




Dean Clark


Dean, a programmer/analyst who develops graphics and imaging applications, can
be contacted at 71160.2426@compuserve.com.


Much of what goes on inside a computer is a discrete approximation of
continuous real-life events. This works because, most of the time,
approximations are close enough. Computer graphics, in particular, are full of
approximations. We approximate curved lines by stringing together many very
short straight lines. We approximate curved surfaces using bunches of tiny
flat surfaces. And we approximate a continuous rainbow of colors using many
individual discrete colors. 
There are a number of ways to approximate continuous colors. One is to throw
in buckets of bits, resulting in "true color" (or 24-bit), where three bytes
(one each for red, green, and blue primary colors) are dedicated to each color
pixel on the screen; see Figure 1. While simple in concept, this approach is
expensive, both in terms of computing power and money. It takes a lot of CPU
horsepower to move 24 bits of data around for every pixel on the screen, and
hardware that's able to do so tends to be expensive.
A more common approximation is "palette-table" (or lookup table) color. This
is less expensive because it requires only a single byte per screen pixel. The
screen pixel value indexes a table of red/green/blue (RGB) values, and the
color value at that table position is displayed on the screen; see Figure 2.
But the table index is only eight bits, so only 256 colors can be displayed on
the screen at any time. Clearly, the challenge is to select the right 256
colors for the image at hand.
There are three common techniques for mapping an image with 24-bit color depth
to an 8-bit palette-table workstation. Past issues of DDJ have covered two of
these; see "Median-Cut Color Quantization," by Anton Kruger (September 1994),
and my article "The Popularity Algorithm" (July 1995). Both techniques were
first developed by Paul Heckbert in the early 1980s. In this article, I'll
present "octree quantization," the third and most-recent color quantization
technique. 


Spatial Data Structures


As its name implies, an octree is an 8-way tree; that is, each node has up to
eight children. The octree (and its cousin, the quadtree) is a hierarchical
data structure that uses the principle of recursive decomposition to represent
spatial data. 
The RGB color space is a cube, each axis of which is one primary color. If you
take the RGB color cube and divide it along each axis, you get eight subcubes;
see Figure 3. Divide each subcube on each axis and each of them becomes eight
subcubes. Do this eight times, and you've got 2^24 subcubes, one for each
possible color in a true-color image.
This subdivision can be represented by a tree structure; see Figure 4. The
root of the tree represents the entire space. The first level represents the
first subdivision. Each region in the subdivision corresponds to a child
pointer. The color space can be subdivided until each individual region
represents a single color.
The number of levels in a color octree corresponds to the number of bits in
the color primaries, so eight bits in each red, green, and blue component
equal an octree with eight levels. VGA provides six bits per primary, so an
octree with six levels is sufficient for VGA. The 3-bit pattern at each bit
position of each byte of a 3-byte RGB determines the decomposition of the
color space; see Figure 5.
Suppose you want to insert the RGB color from Figure 5 into the color octree.
The most-significant bit is the root (bit 7 in each color byte, level 0 in the
octree). The bit pattern is 101==5, so to begin inserting this color, follow
the path through tree-->child[5]. At the next level, the bit pattern is
011==3, so continue down the tree through tree-->child[3]. The bit pattern
from the least-significant bit pinpoints the color exactly. The first step for
a naive octree color-quantization algorithm would be to scan the image and
store each unique color in the octree by traversing the tree as previously
discussed.


Refining the Process


At the end of the first image scan, the color octree could have as many as
rows x columns individual colors in it. Each unique color would be represented
by a leaf node in the tree. To get the number of colors down to 256 (or
whatever the goal is), the tree must be reduced by somehow merging colors.
Colors that are very close together (that is, leaf nodes that share a parent)
would be combined into a single average color. The close colors would be
deleted, and the new average color inserted into the tree. This would repeat
until the tree contained the desired number of colors.
The naive color octree can be improved in several ways. First, instead of
building an entire octree containing all image colors, the tree can be reduced
immediately whenever the number of leaves exceeds the target number. Thus,
there are never more than the target number of leaf nodes in the tree, saving
considerable memory.
Reducing the tree by traversing to the leaf level is time consuming. As a
further improvement, maintain an array of linked lists of reducible tree
nodes; see Figure 6. The tree can grow no deeper than the number of bits in
our color primaries, so an array of this size will do. 


A Level-Linked Octree


A node is reducible if it isn't a leaf. As new nodes are created at each level
except the leaf level, they're added to the reducible list for that level. 
To reduce the tree, all the children of a reducible node are averaged together
into the node. The child subtrees are discarded, and the reducible node
becomes a leaf node. The result is that all those colors are now represented
by the single larger node. Further, since the node is now a leaf, any new
colors whose path through the tree takes them through this node now stop
here--after reduction, the tree never grows downward again.
You can now outline the octree quantization algorithm:
1. Scan the image, and insert colors into an octree structure. 
2. If the number of leaf nodes in the octree exceeds the number of final
colors, reduce the tree. 
3. When the image scan is finished, the leaf nodes of the octree contain the
reduced image colors. Rescan the image to map image colors to their
appropriate octree leaves.


Implementing Octree Quantization


Listings One and Two present an implementation of the refined color-octree
algorithm just described. InsertTree takes an RGB color and inserts it into
the octree at the leaf level, which is initially level 6 for VGA. Each leaf
node maintains a count of the number of image pixels containing that color and
a sum of the RGB instances. InsertTree recurses down the tree, creating new
nodes as necessary. As nodes are created in CreateOctNode, they're added to
the reducible linked list for their level.
ReduceTree does the reduction. The reduction level is always one higher than
the leaf level. ReduceTree takes the first node from the list and sums the
component RGB values and pixel counts for all its children. 
Once the tree is built, MakePaletteTable traverses the tree and builds an
array of leaf RGB values for the output color palette. The palette index value
is also stored in the color-octree leaf node. QuantizeColor uses this value to
map an image RGB value to a palette-table entry. This function takes an image
RGB value and traverses the octree in the same manner as InsertTree. When a
leaf node is reached, the palette index in the leaf node is the appropriate
quantization value.
CONVERT.C (available electronically, see "Availability," page 3) reads an RGB
image file, maps its colors down to 256 palette-table cells, and either
displays the quantized image or writes a PCX file of it. The input file format
is the same as that in my July 1995 article--essentially an ASCII file of RGB
values between 0 and 1, plus a short header. The screen output uses the
MetaWindow graphics library from Metagraphics (Scotts Valley, CA), but it is
easily adapted to most any graphics kernel.



Results and Additional Heuristics


This algorithm is quite efficient in both time and space. Since the color
octree never has more than N+1 leaf nodes, plus internal nodes (at most, N
more) it compares favorably to other methods, which typically require space
proportional to the number of unique colors in the original image. Inserting
colors into the tree and mapping original colors into the palette table are
all bounded by the tree depth (eight or less) times the number of image
pixels, while building the palette table is proportional to the tree depth
times the number of final quantization colors. In practice, the running time
of the program is dominated by disk accesses, managing about 4300 pixels per
second on a 486/33 PC.
The tree is always reduced by merging the colors at the leaf level into their
parent nodes. Any node at the reducible level could be chosen, but the final
output can be affected by the selection criteria, which include: picking a
node arbitrarily; picking the node representing the most image pixels; picking
the node representing the least image pixels; and some hybrid, such as
alternating most/least or favoring center pixels over edge pixels.
What's the difference? Picking colors that represent many pixels and averaging
them results in larger groups of pixels that share the same (slightly wrong)
color. The tendency then is to have more individual colors available for
less-conspicuous regions of the image. Picking the fewest pixels tends to
preserve subtle gradations in large, nearly constant shaded areas, reducing
overall error, but at the expense of small image detail. An alternative
technique would attempt to balance these two effects. A heuristic that favors
central pixels might be useful in an animation scenario, where the user tends
to focus on the middle of the image.
Listing One selects reducible nodes "randomly"--it simply takes the first node
in the reducible list, which is the most recently added one. Implementing any
of the aforementioned heuristics requires that more information be stored at
each node. For the pixel-count heuristics, you have to know how many pixels
are represented in the subtree rooted at each reducible node. This could be
calculated as an extra step as the leaf level changes, or the counts could be
accumulated as colors are inserted into the tree. The list could then be
sorted before reductions are done on the level, or for each reduction the list
could be scanned for the largest/smallest number of pixels.
Finally, note that the octree algorithm can't guarantee exactly N colors in
the final palette table. This is because a reduction trims an entire subtree,
which may be up to eight leaf nodes, while adding only one additional leaf.
This might be important in a situation where an image is mapped to a very
small number of colors, say 64 or less.


References


Clark, Dean. "The Popularity Algorithm." DDJ (July 1995).
Gervautz, M. and W. Purgathofer. "A Simple Method for Color Quantization:
Octree Quantization" in New Trends in Computer Graphics. New York, NY:
Springer-Verlag, 1988.
Heckbert, Paul. "Color Image Quantization for Frame Buffer Display." Computer
Graphics (July 1982).
Kruger, Anton. "Median-Cut Color Quantization." DDJ (September 1994).
Samet, Hanan. Applications of Spatial Data Structures. Reading, MA:
Addison-Wesley, 1990.
Figure 1: 24-bit color display.
Figure 2: Palette-table color scheme.
Figure 3: Recursive subdivision of a 2-D space.
Figure 4: Hierarchical space subdivision--a quadtree.
Figure 5: Determining the color space at each tree level.
Figure 6: A level-linked octree.

Listing One
// oct1.h -- Header file for octree color quantization function
// Dean Clark
//
#ifndef OCT1_H
#define OCT1_H
typedef unsigned char byte;
typedef unsigned int uint;
typedef unsigned long ulong;
typedef int bool;
#ifndef True
#define False 0
#define True 1
#endif
// RGBType is a simple 8-bit color triple
typedef struct {
 byte r,g,b; // The color
} RGBType;
// OctnodeType is a generic octree node
typedef struct _octnode {
 int level; // Level for this node
 bool isleaf; // TRUE if this is a leaf node
 byte index; // Color table index
 ulong npixels; // Total pixels that have this color
 ulong redsum, greensum, bluesum; // Sum of the color components
 RGBType *color; // Color at this (leaf) node
 struct _octnode *child[8]; // Tree pointers
 struct _octnode *nextnode; // Reducible list pointer
} OctreeType;
OctreeType *CreateOctNode(int level);
void MakePaletteTable(OctreeType *tree, RGBType table[], int *index);
ulong TotalLeafNodes(void);
void ReduceTree(void);

void InsertTree(OctreeType **tree, RGBType *color, uint depth);
int QuantizeColor(OctreeType *tree, RGBType *color);
#endif

Listing Two
// oct1.c -- Color octree routines.
// Dean Clark
//
#include <stdio.h>
#include <stdlib.h>
#include "oct1.h"
#define COLORBITS 8
#define TREEDEPTH 6
byte MASK[COLORBITS] = {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80};
#define BIT(b,n) (((b)&MASK[n])>>n)
#define LEVEL(c,d) ((BIT((c)->r,(d))<<2) | (BIT((c)->g,(d))<<1) | BIT((c)->b,(d)))
OctreeType *reducelist[TREEDEPTH]; // List of reducible nodes
static byte glbLeafLevel = TREEDEPTH;
static uint glbTotalLeaves = 0;
static void *getMem(size_t size);
static void MakeReducible(int level, OctreeType *node);
static OctreeType *GetReducible(void);
// InsertTree -- Insert a color into the octree
//
void InsertTree(OctreeType **tree, RGBType *color, uint depth)
{
 if (*tree == (OctreeType *)NULL) {
 *tree = CreateOctNode(depth);
 }
 if ((*tree)->isleaf) {
 (*tree)->npixels++;
 (*tree)->redsum += color->r;
 (*tree)->greensum += color->g;
 (*tree)->bluesum += color->b;
 }
 else {
 InsertTree(&((*tree)->child[LEVEL(color, TREEDEPTH-depth)]),
 color,
 depth+1);
 }
}
// ReduceTree -- Combines all the children of a node into the parent, 
// makes the parent into a leaf
//
void ReduceTree()
{
 OctreeType *node;
 ulong sumred=0, sumgreen=0, sumblue=0;
 byte i, nchild=0;
 node = GetReducible();
 for (i = 0; i < COLORBITS; i++) {
 if (node->child[i]) {
 nchild++;
 sumred += node->child[i]->redsum;
 sumgreen += node->child[i]->greensum;
 sumblue += node->child[i]->bluesum;
 node->npixels += node->child[i]->npixels;
 free(node->child[i]);

 }
 }
 node->isleaf = True;
 node->redsum = sumred;
 node->greensum = sumgreen;
 node->bluesum = sumblue;
 glbTotalLeaves -= (nchild - 1);
}
// CreateOctNode -- Allocates and initializes a new octree node. The level 
// of the node is determined by the caller.
// Arguments: level int Tree level where the node will be inserted.
// Returns: Pointer to newly allocated node. Does not return on failure.
//
OctreeType *CreateOctNode(int level)
{
 static OctreeType *newnode;
 int i;
 newnode = (OctreeType *)getMem(sizeof(OctreeType));
 newnode->level = level;
 newnode->isleaf = level == glbLeafLevel;
 if (newnode->isleaf) { 
 glbTotalLeaves++;
 }
 else {
 MakeReducible(level, newnode);
 }
 newnode->npixels = 0;
 newnode->index = 0;
 newnode->redsum = newnode->greensum = newnode->bluesum = 0L;
 for (i = 0; i < COLORBITS; i++) {
 newnode->child[i] = NULL;
 }
 return newnode;
}
// MakeReducible -- Adds a node to the reducible list for the specified level
//
static void MakeReducible(int level, OctreeType *node)
{
 node->nextnode = reducelist[level];
 reducelist[level] = node;
}
// GetReducible -- Returns next available reducible node at tree's leaf level
//
static OctreeType *GetReducible(void)
{
 OctreeType *node;
 
 while (reducelist[glbLeafLevel-1] == NULL) {
 glbLeafLevel--;
 }
 node = reducelist[glbLeafLevel-1];
 reducelist[glbLeafLevel-1] = reducelist[glbLeafLevel-1]->nextnode;
 return node;
}
// MakePaletteTable -- Given a color octree, traverse tree and: 
// - Add the averaged RGB leaf color to the color palette table;
// - Store the palette table index in the tree;
// When this recursive function finally returns, 'index' will contain

// the total number of colors in the palette table.
//
void MakePaletteTable(OctreeType *tree, RGBType table[], int *index)
{
 int i;
 if (tree->isleaf) {
 table[*index].r = (byte)(tree->redsum / tree->npixels);
 table[*index].g = (byte)(tree->greensum / tree->npixels);
 table[*index].b = (byte)(tree->bluesum / tree->npixels);
 tree->index = *index;
 (*index)++;
 }
 else {
 for (i = 0; i < COLORBITS; i++) {
 if (tree->child[i]) {
 MakePaletteTable(tree->child[i], table, index);
 }
 }
 }
}
// QuantizeColor -- Returns palette table index of an RGB color by traversing 
// the octree to the leaf level
//
int QuantizeColor(OctreeType *tree, RGBType *color)
{
 if (tree->isleaf) {
 return tree->index;
 }
 else {
 return QuantizeColor(tree->child[LEVEL(color,TREEDEPTH-tree->level)],color);
 }
}
// TotalLeafNodes -- Returns the total leaves in the tree (glbTotalLeaves)
//
ulong TotalLeafNodes(void)
{
    return glbTotalLeaves;
}
// getMem -- Memory allocation routine
//
static void *getMem(size_t size)
{
    void *mem;

    mem = malloc(size);
    if (mem == NULL) {
        fprintf(stderr, "Error allocating %lu bytes in getMem\n", (ulong)size);
        exit(-1);
    }
    return mem;
}


Extending MFC


Designing a grid control for MFC




Stefan Hoenig and Scot Wingo


Stefan, who is studying at Technische Universität München in Germany, can be
contacted at 100042.1003@compuserve.com. Scot, a cofounder of Stingray
Software, can be contacted at ScotWi@aol.com.


A wide variety of off-the-shelf VBX/OLE/DLL components are available to
simplify development with C++ and MFC. However, function-based interfaces do
not give you the object-oriented benefits of a C++ interface. Simply wrapping
a C++ class around a DLL is not sufficient, since you can't modify component
behavior through inheritance. 
In this article, we'll describe the design and implementation hurdles we
encountered when extending existing MFC classes to take advantage of the
object-oriented paradigm. The component we developed was a grid control--a
user-interface component that displays data in rows and columns, allowing end
users to manipulate data. In all, the grid component ended up being about
45,000 lines of C++ code. 
Since we'll focus on design principles, we won't provide the source code for
the entire control, although we will present a class or two for illustrative
purposes. (The grid control itself is commercially available.) Our intent is
to document our hard-won experience so that you can use and benefit from the
same C++/MFC principles in your applications. 


Design Goals


Our primary design goal for the grid component was to be 100 percent MFC
compatible. This meant we had to carefully consider how others might derive
classes from ours. It was also important to use standard MFC approaches such
as the document/view architecture whenever possible. More specifically, we
wanted to:
- Use the grid either as a view or as a control in a dialog.
- Support a variety of controls within the cells.
- Create new types of controls for the cells.
- Read data through ODBC (Microsoft's database connectivity library) for
display.
- Have the grid component perform well.
From the user's perspective, a grid can contain many different pages or
worksheets. This allows the user to manipulate three dimensions of data
instead of two. Each worksheet contains rows and columns of individual cells,
each with its own particular color, format, size, and so on.
From the developer's perspective, each worksheet is an instance of a grid
class, which maintains information about the currently active cell and handles
resizing and other window-management functions. Multiple worksheets are
handled by placing several grids within a tabbed window. There are classes for
each different type of cell, including text entry, buttons, and bitmaps. To
allow cells to have different visual attributes without storing separate
attributes for every cell, we decided to have a separate set of classes for
storing cell attributes. These classes implement a style hierarchy with
default styles for each row and column, as well as the entire grid.


The Drawing Classes


To support the use of the grid both as a view and a control in a dialog, we
needed an MFC CWnd-based class (SECGridWnd) and a CView-based class
(SECGridView). (Classes that start with C are MFC classes, and those prefixed
with SEC are grid classes.) However, implementing each one separately would
result in a lot of code duplication.
Since CView already inherits from CWnd, we considered having SECGridWnd
inherit from CWnd, and SECGridView inherit both from CView and SECGridWnd.
This would allow the common code to reside in SECGridWnd. Unfortunately, most
classes in MFC--including CWnd and CView--derive from CObject. CObject is
designed such that it can only appear once in a class hierarchy without
causing serious collisions.
Our solution was to put the common code in the class SECGridCore, which does
not inherit from CObject. Thus, SECGridView can inherit from both SECGridCore
and CView without conflict. Figure 1 illustrates this approach, which we also
used at several other points in the project.
Since SECGridCore is not derived from CWnd, it can't directly handle CWnd
functionality, such as drawing operations and message maps. It must instead
own a pointer to a CWnd object to handle drawing operations. Our classes
inherit both from SECGridCore and a CWnd class, and set the SECGridCore member
m_pGridWnd to this in the constructor. The derived class inherits the message
map from CWnd and is responsible for calling the associated SECGridCore
methods. Figure 2 shows how this works for SECGridView.
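The workaround can be shown without MFC itself. In the sketch below, CObject, CWnd, and CView are minimal stand-in stubs (assumptions for illustration, not the real MFC classes): the shared grid logic lives in SECGridCore, outside the CObject hierarchy, and reaches the window through the m_pGridWnd pointer set by each derivative's constructor.

```cpp
#include <cassert>
#include <string>

// Minimal stand-ins for the MFC classes (stubs; real MFC is not needed
// to illustrate the inheritance pattern described in the article).
struct CObject { virtual ~CObject() {} };
struct CWnd : CObject {
    std::string Draw() { return "drawn"; }   // stands in for CWnd drawing
};
struct CView : CWnd {};

// Common grid code lives outside the CObject hierarchy, so it can be
// mixed into both the control and the view without duplicating CObject.
class SECGridCore {
public:
    void Attach(CWnd* pWnd) { m_pGridWnd = pWnd; }   // set in derivative's ctor
    std::string Redraw() { return m_pGridWnd ? m_pGridWnd->Draw() : ""; }
protected:
    CWnd* m_pGridWnd = nullptr;
};

// The control flavor: CWnd on one branch, grid logic mixed in on the other.
class SECGridWnd : public CWnd, public SECGridCore {
public:
    SECGridWnd() { Attach(this); }
};

// The view flavor reuses the identical core with no code duplication,
// and CObject still appears only once in its hierarchy (via CView).
class SECGridView : public CView, public SECGridCore {
public:
    SECGridView() { Attach(this); }
};
```

Because SECGridCore never touches CObject, both derivatives remain valid MFC citizens on their left-hand branch while sharing all grid logic on the right.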
The SECGridCore class encapsulates all of the drawing logic for the grid and
its components. It is responsible for drawing the grid, including inverting
any selected cells. It handles scrolling and provides support for frozen rows
and columns, which do not move when the rest of the grid scrolls. Finally,
SECGridCore interprets user interactions, including formatting cells,
inserting and moving rows, tracking the current cell, and managing undo and
redo.
Class SECGridWnd provides SECGridCore with an interface so that it can be used
like a control (placed in a dialog and manipulated with the Visual C++ dialog
editor). Like CWnd, SECGridWnd is rarely instantiated. Instead, you derive a
class from SECGridWnd, override the virtual functions to obtain the desired
behavior, and instantiate the derivative. The SECGridView class provides CView
features such as splitter-window support, printing, and print preview. Like
SECGridWnd, it is also usually used only as a base class and not directly
instantiated.


The Style Classes


The SECStyle class contains all the information necessary for displaying a
single cell. This includes the type of the cell, the cell contents, and
attributes such as the text color, borders and control type, as well as font
attributes. All of these styles can be modified by the end user via the
SECStyleSheet dialog. You can extend SECStyle at run time with additional
information about specific cells. For example, you could add an Expression
attribute which could be modified with SECStyleSheet to add simple formula
capabilities to the grid. 
Base styles provide default attributes for a group of cells. The predefined
base styles are row header, column header, and standard. Row-header cells
inherit their attributes from row-header style, and column headers inherit
from column-header style. Standard is the base style for all cells in the
grid. These base styles are maintained by an SECStylesMap object and can be
modified with an SECStylesDialog. These default styles are automatically
applied to all appropriate cells that do not explicitly override them.
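The base-style fallback can be sketched in a few lines. SECStylesMap's real interface is not shown in the article, so the names and the single textColor attribute below are hypothetical; the point is only that named base styles live in a map, with "standard" as the catch-all default for cells that don't override it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy style: one attribute stands in for color, borders, font, etc.
struct Style { std::string textColor; };

// Hypothetical stand-in for SECStylesMap: named base styles with
// "standard" acting as the default for anything not explicitly set.
class StylesMap {
    std::map<std::string, Style> m_styles;
public:
    void SetBaseStyle(const std::string& name, const Style& s) {
        m_styles[name] = s;
    }
    // Look up a named base style; unknown names inherit "standard".
    Style Compose(const std::string& name) const {
        std::map<std::string, Style>::const_iterator it = m_styles.find(name);
        if (it != m_styles.end()) return it->second;
        return m_styles.at("standard");
    }
};
```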


The Control Classes


Because controls embedded in a grid have to interact closely with the grid, we
created our own hierarchy of grid controls, using the same
multiple-inheritance approach used for the basic classes. Figure 3 shows the
hierarchy of the grid-control classes. 
SECControl is an abstract base class that establishes a default
grid-to-control interface that derived grid-control classes must implement. If
deriving from SECControl only, controls must be entirely implemented by the
derived class. However, SECControl can also inherit from existing MFC control
classes to implement a standard MFC control derivative. The resulting MFC
control derivative can be used in grid cells. For example, classes
SECEditControl, SECComboBoxWnd, and SECListBox in Figure 3 are MFC control
derivatives that use this approach.



The ODBC Classes


ODBC is a call-level interface that lets applications access data in any
database for which there is an ODBC driver. This allows your application to be
independent of the database system. 
MFC includes a high-level API for using ODBC. Two classes in this API are of
interest here: CRecordset and CRecordView. A CRecordset object represents the
currently selected records. It can scroll to other records, update records,
sort the selection, and apply filters to qualify the selection. A CRecordView
object provides a form view directly connected to a CRecordset object.
The MFC database classes do not directly support the ability to determine the
structure (or schema) of the database at run time. However, we wanted to make
it possible for the end user to specify and view the results of an SQL query
even if the schema were unknown at compile time, while still allowing the
developer to continue using ClassWizard to create record sets with a known
schema. We handled this by allowing the grid classes to be bound to any
CRecordSet-derived class and creating SECDynamicRecordset (our own descendant
of CRecordset), which has the necessary functionality to determine the schema
information at run time. We altered the RFX mechanism in SECDynamicRecordset
to behave exactly like any other CRecordset class. Consequently, the grid ODBC
classes integrate cleanly into the MFC architecture and allow you to specify
SQL Query statements at run time.
Figure 4 shows the hierarchy of the ODBC grid classes. SECODrid provides the
basic functionality to display the records of a CRecordset. Because SECODrid
is not derived from CObject, it can be used as a right-hand branch in derived
classes. This is the same mechanism we discussed with SECGridCore. The
SECRecordWnd and SECRecordView classes inherit from SECODrid to display query
results in a dialog or view grid, respectively. Additional classes are used to
display the status beam in the scroll bar. 


The Tabbed Window Classes


While MFC provides advanced UI components such as tabbed dialogs, splitter
windows, and floating toolbars, there are no classes that directly support
workbook (or tabbed window) interfaces. Consequently, we developed the
SECTabWnd class to hold multiple instances of SECGridView, one for each tab;
see Figure 5. In other words, each tab holds a different view. This is similar
to the way MFC splitter windows (CSplitterWnd) operate. 
The SECTabWnd class handles the containment and switching between the various
views. Class SECTabBeam draws the tabs, while SECTabInfo stores the tabs'
properties such as name, size, and the like. The end user can change the tab
names by double clicking. The source for the tabbed window classes is
available electronically (see "Availability," page 3). 


Implementation Issues


Styles are the key to the display of the grid and also define most of its
behavior. The SECGridCore member function ComposeStyleRowCol computes the
style for a particular cell. ComposeStyleRowCol first calls GetStyleRowCol to
get any cell-specific style attributes. It then fills in defaults from the
row, the column, and finally from the entire grid. Listing One shows a
simplified version of ComposeStyleRowCol.
GetStyleRowCol handles a single set of attributes. If either the row or column
is 0, the corresponding row or column style is returned. GetStyleRowCol takes
an argument specifying the operation to be performed on the style argument.
The SECCopy operation copies the styles into the argument, while the SECApply
operation alters only those style attributes that have not yet been set.
One function controls all access to styles, allowing the developer to bind
(that is, dynamically tie) the grid to data such as a database or a live-data
feed. To bind the grid to a data source, simply override the virtual
GetStyleRowCol function. 
Listing Two is an example GetStyleRowCol override that ties a grid control to
a database stored in the document as member m_dbfile.
Centralizing all style operations in one virtual function lets you easily
modify the grid at run time in any way imaginable.
What if you want the user to be able to dynamically modify the style? The grid
provides StoreStyleRowCol, a virtual function that is called before data is
stored. By overriding this function, you can intercept the end user's changes
and do whatever you wish with them.
This technique allows you to enhance the grid. You can override many methods;
for example, a virtual OnValidateCell function (called after the user has left
a cell and before another cell is selected) can be overridden to perform
cell-level validation.
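The hook pattern is simple to sketch. The article does not give OnValidateCell's exact signature, so the minimal form below is an assumption: the base class accepts everything, and a derivative rejects an edit by returning FALSE-like false, keeping the user in the cell.

```cpp
#include <cassert>
#include <string>

class GridBase {
public:
    virtual ~GridBase() {}
    // Called after the user leaves a cell and before the next cell is
    // selected (hypothetical signature; the real one is not shown).
    virtual bool OnValidateCell(int row, int col, const std::string& value) {
        (void)row; (void)col; (void)value;
        return true;   // default: accept every edit
    }
};

class NumericGrid : public GridBase {
public:
    // Override to enforce cell-level validation: digits only.
    bool OnValidateCell(int, int, const std::string& value) override {
        return !value.empty() &&
               value.find_first_not_of("0123456789") == std::string::npos;
    }
};
```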


Drawing and Updating the Grid


When the grid draws, it first calculates the range of cells that need to be
drawn. Then it creates a style object for each cell in the range and
initializes those styles with ComposeStyleRowCol. Once all of the style
information has been gathered, the grid draws the backgrounds and borders of
each, then asks each cell control to draw itself.
To minimize the number of drawing operations, our drawing routine looks
globally at all cells to determine the most cost-effective drawing technique.
For example, all neighboring cells with the same background pattern and all
cells with the same borders are grouped, so that these GDI operations need be
called only once for that common group of cells. We also cached GDI objects
such as pens, fonts, and brushes to avoid the overhead of allocating and
freeing these resources for each cell. The cache is created at the start of
the draw process and is flushed when all of the cells have been drawn.
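The caching idea can be shown without GDI. In this sketch an integer handle stands in for a real HPEN (an assumption for illustration): each distinct color is created once per draw pass, reused for every cell that needs it, and the whole cache is flushed when the pass ends.

```cpp
#include <cassert>
#include <map>

class PenCache {
    std::map<unsigned long, int> m_pens;   // color -> pen handle
    int m_created = 0;
public:
    int GetPen(unsigned long color) {
        std::map<unsigned long, int>::iterator it = m_pens.find(color);
        if (it != m_pens.end()) return it->second;   // cache hit: no new GDI object
        int pen = ++m_created;                       // "CreatePen" happens once
        m_pens[color] = pen;
        return pen;
    }
    int Created() const { return m_created; }        // objects actually allocated
    void Flush() { m_pens.clear(); }                 // end of draw pass
};
```

Drawing a thousand cells in two colors thus costs two allocations instead of a thousand, which is the overhead the article's cache avoids.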
In addition to drawing in response to standard Windows paint messages, the
grid also has to keep the document and views synchronized. The MFC
document/view model handles most of the updating for the grid. To provide
maximum flexibility, the grid provides three overridable functions in the
update path. Figure 6 shows how updates are performed. You can intercept
updates at the command stage, the storing stage, and even the update stage.
If you leverage the document/view architecture and ensure that you can
override the functions involved in updating at every critical stage, the uses
of the grid are limitless. For example, you could have two views on the same
live-data feed--one with a read-only version of incoming data, and another
with a delayed snapshot that can be modified by the user. 


Control Implementation


Each composed style object contains an identifier for a control. Like any
other style attribute, this identifier can be changed at run time, allowing
you maximum flexibility. The SECGridCore class maintains a map of registered
control types and identifiers. In the SECGridCore::OnInitialUpdate member
function, you can register any additional control types and identifiers by
calling RegisterControl. 
To reduce resource requirements and maintain high performance, we implemented
a control-sharing scheme in which the grid creates only one control of each
type and uses that single control to draw every cell of that type. When the
grid is not in the process of drawing, it places a "live" control at the
current cell, so that the user can interact directly with that control. When
the user moves to a new cell, the grid resets the previous cell and places a
"live" control at the new current cell.
This technique lets the control draw the current cell as the user moves around
the grid, resulting in higher performance and very low overhead. An example is
a grid full of edit controls for entering a table of values. The user will
probably enter text and tab to the next cell very quickly. In this case, the
grid-control implementation transparently uses the same edit control instead
of creating and destroying a new edit control for each current cell or
maintaining a unique edit control for each cell.


The Results


The compactness of our grid control can be attributed to solid object-oriented
design practices and to our use of existing MFC architectures and classes
whenever possible. Control caching and other optimization techniques make the
grid fast without sacrificing the object-oriented interface or requiring
special care.
Figure 1: Solution to the CWnd/CView derivation problem.
Figure 2: Interaction between SECGridCore and CWnd.
Figure 3: Grid-control class hierarchy.
Figure 4: ODBC grid-class hierarchy.
Figure 5: Tabbed window-class hierarchy.
Figure 6: Implementing grid updates.

Listing One
void SECGridCore::ComposeStyleRowCol(ROWCOL nRow, ROWCOL nCol,
                                     SECStyle *pStyle)
{
    // Copy the cell style first
    GetStyleRowCol(nRow, nCol, *pStyle, SECCopy);
    // Apply the row style next
    GetStyleRowCol(nRow, 0, *pStyle, SECApply);
    // Apply the column style
    GetStyleRowCol(0, nCol, *pStyle, SECApply);
    // Finally inherit any base styles
    pStyle->LoadBaseStyle(GetStylesMap());
}

Listing Two
BOOL CDBaseBrowserView::GetStyleRowCol(int nRow, int nCol, SECStyle &style,
                                       SEC_OPERATION op)
{
    if (nRow == 0) {    // Column style
        style.SetValue(GetField(nCol)->name);   // Get column name from DB
        return TRUE;
    }
    // Advance row in DB (row = record)
    if (GetDocument()->m_dbfile.Seek(nRow - 1)) {
        // Row headings
        if (nCol == 0) {
            char sz[20];
            wsprintf(sz, "%5lu%c", (unsigned long)nRow,
                     GetDocument()->m_dbfile.IsDeleted() ? '*' : ' ');
            style.SetValue(sz);
            return TRUE;
        }
        // If we get here, we're looking up a cell.
        CString s;
        CField *fld = GetField(nCol);
        // Get cell data and store along with max length
        GetDocument()->m_dbfile.GetValue(GetFieldId(nCol), s);
        style.SetValue(s);
        style.SetMaxLength(fld->len);
        // Now set up the cell appearance based on field type
        switch (fld->type) {
        case 'N': style.SetBaseStyle(SEC_NUMERIC);  break;
        case 'C': style.SetBaseStyle(SEC_TEXT);     break;
        case 'D': style.SetBaseStyle(SEC_DATE);     break;
        case 'L': style.SetBaseStyle(SEC_CURRENCY); break;
        // A side note: this is also a good example of using base styles.
        // The user can change the appearance of all numeric, text, date,
        // and currency fields at run time with the SECStyleSheet described
        // above. In OnInitialUpdate, base styles are typically initialized
        // programmatically, for example:
        //   BaseStyle(SEC_NUMERIC)
        //       .SetHorizontalAlignment(DT_LEFT)
        //       .SetFont(SECFont().SetBold(TRUE));
        }
        return TRUE;
    }
    return FALSE;   // At end of DB
}



Randomness and the Netscape Browser


How secure is the World Wide Web? 




Ian Goldberg and David Wagner


Ian and David are PhD students in the computer science department at the
University of California, Berkeley. They can be reached at
iang@cs.berkeley.edu or daw@cs.berkeley.edu.


As the World Wide Web gains broad public appeal, companies are becoming
interested in using the Web not just to advertise, but also to take orders for
their merchandise and services. Since ordering a product online requires the
customer to transmit payment information (such as a credit-card number) from a
client program to the company's server program through the Internet, there's
need for cryptographic protection. By encrypting payment information before
transmitting it, a customer can ensure that no one except the company from
which he is purchasing can decode that sensitive data.
Netscape Communications has been at the forefront of the effort to integrate
cryptographic techniques into Web servers and browsers. Netscape's Web browser
supports the Secure Sockets Layer (SSL), a cryptographic protocol developed by
Netscape to provide secure Internet transactions. Given the popularity of
Netscape's browser and the widespread use of its cryptographic protocol on the
Internet, we decided to study Netscape's SSL implementation in detail.
Our study revealed serious flaws in Netscape's implementation of SSL that make
it relatively easy for an eavesdropper to decode the encrypted communications.
Although Netscape has fixed these problems in a new version of their browser
(as of this writing, Netscape 2.0 beta1 and Netscape Navigator 1.22 Security
Update are available), these weaknesses provide several lessons for people
interested in producing or purchasing secure software.


Back to Basics


At its most basic level, SSL protects communications by encrypting messages
with a secret key--a large, random number known only to the sender and
receiver. Because you can't safely assume that an eavesdropper doesn't have
complete details of the encryption and decryption algorithms, the protocol can
be considered secure only if someone who knows all of the details of these
algorithms is unable to recover a message without trying every possible key.
Ultimately, security rests on the infeasibility of trying all possible
decryption-key values.
The security of SSL, like that of any other cryptographic protocol, depends
crucially on the unpredictability of this secret key. If an attacker can
predict the key's value or even narrow down the number of keys that must be
tried, the protocol can be broken with much less effort than if truly random
keys had been used. Therefore, it is vital that the secret keys be generated
from an unpredictable random-number source.
Randomness is not a black-and-white quality: some streams of numbers are more
random than others. The only truly random number sources are those related to
physical phenomena such as the rate of radioactive decay of an element or the
thermal noise of a semiconductor diode. Barring the use of external devices,
computer programs that need random numbers must generate these numbers
themselves. However, since CPUs are deterministic, it is impossible to
algorithmically generate truly random numbers.
Many common computer applications (games, for instance) use any readily
available source of randomness to provide an initial value, called a "seed,"
to a pseudorandom number generator (PRNG). PRNGs operate by repeatedly
scrambling the seed. Typically, the seed is a short, random number that the
PRNG expands into a longer, random-looking bitstream. A typical game might
seed a PRNG with the time of day; see Figure 1.
For a simple game, the seed only needs to change each time the game is run.
Though the seed will be predictable, this is not a major concern in
applications where security is not an issue. In cryptographic applications,
however, the seed's unpredictability is essential--if the attacker can narrow
down the set of possible seeds, his job is made significantly easier. Since
the function used by the PRNG to turn a seed into a pseudorandom number
sequence is assumed to be known, a smaller set of possible seeds yields a
correspondingly small set of sequences produced by the PRNG.
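The seeding idea in Figure 1 can be sketched with a toy generator. The linear congruential generator below is an illustrative stand-in (it is not the generator any browser uses): the same seed always yields the same stream, which is why a guessable seed is harmless in a game but fatal in cryptography.

```cpp
#include <cstdint>

// A small linear congruential PRNG: the seed is scrambled repeatedly,
// expanding a short seed into a longer pseudorandom-looking stream.
struct Lcg {
    uint32_t state;
    explicit Lcg(uint32_t seed) : state(seed) {}
    uint32_t Next() {
        // Constants from the widely published Numerical Recipes LCG.
        state = state * 1664525u + 1013904223u;
        return state;
    }
};
// A game would do something like Lcg rng((uint32_t)time(NULL));
// an attacker who knows the approximate start time can replay the stream.
```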
A good method to select seed values for the PRNG is an essential part of a
cryptographic system such as SSL. If the seed values for the PRNG can easily
be guessed, the level of security offered by the program is diminished
significantly, since it requires less work for an attacker to decrypt an
intercepted message.


Netscape's Implementation


Because Netscape would not release detailed information about this section of
its program, we resorted to the tedious task of reverse-engineering Netscape's
algorithm by manually decompiling their executable program.
The method Netscape uses to seed its PRNG is shown in pseudocode in Figure 2.
This algorithm was derived from Version 1.1 of the international version of
Netscape's Solaris 2.4 browser. Most other versions of Netscape for UNIX use
the same algorithm; the Microsoft Windows and Macintosh versions have slightly
different details (for example, they use a particular system timer instead of
the process ID), but the techniques employed are fundamentally the same across
all architectures and operating systems.
In Figure 2, it's important to note that mklcpr() and MD5() are fixed, unkeyed
algorithms that will presumably be known by an adversary. The seed generated
depends only on the values of a and b, which in turn depend on just three
quantities: the time of day, the process ID, and the parent process ID. Thus,
an adversary who can predict these three values can apply the well-known MD5
algorithm to compute the exact seed generated.
Figure 3 shows the key-generation algorithm, also reverse-engineered from
Netscape's browser. An attacker who can guess the PRNG seed value can easily
determine the encryption keys used in Netscape's secure transactions.


Attacks on Netscape


Unfortunately for Netscape, U.S. regulations prohibit the export of products
incorporating strong cryptography. In order to distribute an international
version of its browser overseas, Netscape had to weaken the encryption scheme
to use keys of just 40 bits, leaving only a million million possible key
values. That may sound like a lot of numbers to try, but several people (David
Byers, Eric Young, Damien Doligez, Piete Brooks, Andrew Roos, Adam Back, Andy
Brown and many others) have been able to try every possible key and recover
SSL-encrypted data in as few as 30 hours using spare CPU cycles from many
machines. Since nearly all Netscape browsers in use are the free international
version, the success of this attack demonstrates a fundamental vulnerability
in Netscape that cannot be repaired under current export regulations.
We used the information we uncovered about Netscape's internals to formulate a
more intelligent attack against Netscape's encryption scheme. According to
Figure 2 and Figure 3, each possibility for the time of day, process ID, and
parent-process ID produces a unique seed, which in turn produces a unique
encryption key.
When a connection is first established, a challenge value is calculated and
sent unencrypted from the Netscape client to the secure server. This allows an
attacker to learn that value, which will be useful later.
Netscape's UNIX browsers are more difficult to attack than its browsers for
other platforms, since the seeding process used in the UNIX browsers utilizes
more-random quantities than does the process used in the browsers for other
platforms. We will only discuss attacks on the UNIX browsers; it should be
apparent from this discussion how to attack the other versions.
An attacker who has an account on the UNIX machine running the Netscape
browser can easily discover the pid and ppid values used in
RNG_CreateContext() using the ps command (a utility that lists the process IDs
of all processes on the system).
All that remains is to guess the time of day. Most popular Ethernet sniffing
tools (including tcpdump) record the precise time they see each packet. Using
the output from such a program, the attacker can guess the time of day on the
system running the Netscape browser to within a second. It is probably
possible to improve this guess significantly. This recovers the seconds
variable used in the seeding process. (There may be clock skew between the
attacked machine and the machine running the packet sniffer, but this is easy
to detect and compensate for.)
Of the variables used to generate the seed in Figure 2 (seconds, microseconds,
pid, ppid), we know the values of seconds, pid, and ppid; only the value of
the microseconds variable remains unknown. However, there are only one million
possible values for it, resulting in only one million possible choices for the
seed. We can use the algorithm in Figure 3 to generate the challenge and
secret_key variables for each possible seed. Comparing the computed challenge
values to the one we intercepted will reveal the correct value of the secret
key. Testing all one million possibilities takes about 25 seconds on an HP
712/80.
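The structure of this search is easy to reconstruct in miniature. In the sketch below, Mix is a toy mixing function standing in for MD5, and the derivation in Derive is a simplified invention, not Netscape's actual code; what is faithful is the shape of the attack: seconds, pid, and ppid are known, only the microsecond value is enumerated, and a match against the sniffed challenge reveals the key.

```cpp
#include <cstdint>

// Toy stand-in for MD5: any fixed, unkeyed mixer works for the sketch.
static uint32_t Mix(uint32_t a, uint32_t b) {
    uint32_t h = a * 2654435761u ^ b * 40503u;
    return h ^ (h >> 16);
}

// Derive (challenge, key) from a seed, simplified from Figure 3.
static void Derive(uint32_t seed, uint32_t *challenge, uint32_t *key) {
    *challenge = Mix(seed, 1);
    *key       = Mix(seed, 2);
}

// Known: seconds, pid, ppid. Unknown: microseconds in [0, 1000000).
// The unencrypted challenge sniffed off the wire identifies the seed.
static uint32_t CrackKey(uint32_t seconds, uint32_t pid, uint32_t ppid,
                         uint32_t sniffedChallenge) {
    uint32_t known = Mix(seconds, pid + (ppid << 12));
    for (uint32_t usec = 0; usec < 1000000; usec++) {
        uint32_t seed = Mix(known, usec);
        uint32_t challenge, key;
        Derive(seed, &challenge, &key);
        if (challenge == sniffedChallenge)   // match reveals the secret key
            return key;
    }
    return 0;   // not found
}
```

A million cheap hash trials is seconds of work on 1995 hardware, which is exactly why narrowing the seed to one unknown quantity breaks the scheme.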
Our second attack assumes the attacker does not have an account on the
attacked UNIX machine, which means the pid and ppid quantities are no longer
known. Nonetheless, these quantities are rather predictable, and several
tricks can be used to recover them.
The unknown quantities are mixed in a way which can cancel out some of the
randomness. In particular, even though the pid and ppid are 15-bit quantities
on most UNIX machines, the sum pid + (ppid << 12) has only 27 bits, not 30
(see Figure 2). If the value of seconds is known, a has only 20 unknown bits,
and b has only 27 unknown bits. This leaves, at most, 47 bits of randomness in
the secret key--a far cry from the 128-bit security claimed by the domestic
U.S. version.
A little cleverness can reduce the uncertainty even further. First, ppid is
often 1 (for example, when the user starts Netscape from an X Window system
menu); if not, it is usually just a bit smaller than the pid. Furthermore,
process IDs are not considered secret information by most applications, so
some programs will leak information about them. For example, the popular
mail-transport agent sendmail generates Message-IDs for outgoing mail using
its process ID; as a result, sending e-mail to an invalid user on the attacked
machine will cause the message to bounce back to the sender; the Message-ID
contained in the reply message will tell the sender the last process ID used
on that machine. Assuming that the user started Netscape relatively recently,
and that the machine is not heavily loaded, this will closely approximate
Netscape's pid. These observations mean that the amount of unpredictability in
the pid and ppid quantities is quite small.
The most unsophisticated attack on any encryption scheme is to try all
possible key values by brute force. Naturally, this approach can be expected
to take a long time. For the domestic U.S. version of the Netscape browser,
trying every possible 128-bit key is absolutely infeasible. However, the
problems with Netscape's seed-generation process make it possible to speed up
this process by trying only the keys generated by the possible seed values.
Optimizations such as those described earlier should allow even a remote
attacker to break Netscape's encryption in a matter of minutes.


Future Impact



Using these weaknesses, we were able to successfully attack Version 1.1 of the
Netscape browser. All UNIX versions of the browser are vulnerable. Netscape
has confirmed that the Microsoft Windows and Macintosh versions are also
subject to this attack. Both the international version (with 40-bit keys) and
the domestic version (with 128-bit keys) are vulnerable. In fact, the private
keys used by the Netscape server software may be susceptible to this attack as
well.
Very soon after we announced these attacks, Netscape responded with a new
version of the browser, which uses more randomness in producing the encryption
keys. Since only the older versions are vulnerable, the direct, long-term
impact of our attack should be small. Still, we can learn several lessons from
this experience.
The Achilles heel of Netscape's security was the way in which it generated
random numbers. The cryptography community has long known that generating
random numbers requires great care and is easy to do poorly; Netscape learned
this lesson somewhat painfully. If you need to generate random numbers for
cryptographic purposes, be very careful.
In a narrow sense, the security flaw we found in the Netscape browser serves
merely as an anecdote to emphasize the difficulty of generating
cryptographically strong random numbers. But there's a broader moral to the
story. The security community has painfully learned that small bugs in a
security-critical module of a software system can have serious consequences,
and that such errors are easy to commit. The only way to catch these mistakes
is to expose the source code to scrutiny by security experts.
Peer review is essential to the development of any secure software. Netscape
did not encourage outside auditing or peer review of its software--and that
goes against everything the security industry has learned from past mistakes.
By extension, without peer review and intense outside scrutiny of Netscape's
software at the source-code level, there is simply no way consumers can know
where there will be future security problems with Netscape's products.
Interestingly, Jim Bidzos of RSA Data Security reports that he offered to
review Netscape's security before its initial release, but that Netscape
declined. Now the company has changed its tune. "They're asking us to review
it this time," Bidzos said.
Since we announced our attack, Netscape has publicly released the source code
to its patch for independent scrutiny. But the company has not made available
for public review any other security-critical modules of its programs. Until
they learn their lesson and open their security programming to public
evaluation, many security experts will remain justifiably skeptical of the
company's security claims.
The growth of Internet commerce opens wonderful new opportunities for both
businesses and consumers, but these opportunities can be safely exploited only
after all parties are satisfied with the security of their online financial
transactions. We are concerned that companies are hiding information about
their security modules and shunning public review. We hope that the lessons
learned from this incident will encourage software companies to openly embrace
in-depth public scrutiny of their security software as both a natural and
necessary part of the software-development cycle.


Resources and References


A number of resources are available to help programmers in generating
cryptographically strong random numbers. One of the best is Colin Plumb's
article "Truly Random Numbers" (Dr. Dobb's Journal, November 1994), which
provides both source code and a discussion of his technique. 
Another helpful resource is "Randomness Recommendations for Security" (RFC
1750). (RFCs are widely available memos detailing information of broad
relevance to the Internet community.) Also, Bruce Schneier's book Applied
Cryptography (John Wiley & Sons, 1994) offers a helpful introductory
discussion of randomness. 
Finally, we have collected links to many resources for generating
cryptographically strong random numbers on a supplemental World Wide Web page
at http://www.cs.berkeley.edu/~daw/netscape-randomness.html, which includes
links to: 
Ron Rivest's "The MD5 Message-Digest Algorithm" (RFC 1321), which describes
this popular mixing function in detail.
Adam Back, David Byers, and Eric Young's "Another SSL breakage...". Post to
cypherpunks mailing list, August 15, 1995.
Damien Doligez's "SSL challenge--broken!". Post to cypherpunks mailing list,
August 15, 1995.
Marc Van Heyningen's "Re: What's the netscape problem." Post to www-security
mailing list, September 20, 1995.
"International Traffic in Arms Regulations," 22 CFR 120-130. Federal Register,
vol. 58 no. 139. July 22, 1993.
Sameer Parekh's "Community ConneXion Corrects Inaccuracies in Netscape Press
Release."


Acknowledgments


We owe tremendous appreciation to David Oppenheimer for his thorough
commentary on an early draft of this article; his help has greatly improved
the explanation of our attack.
Figure 1: A typical C program that uses a PRNG.
srand(time(0));
 ...
printf("You rolled the die, and got a %d.\n", 1 + (rand()%6));
Figure 2: The Netscape 1.1 seeding process: pseudocode.
global variable seed;
RNG_CreateContext()
 (seconds, microseconds) = time of day; /* Time elapsed since 1970 */
 pid = process ID; ppid = parent process ID;
 a = mklcpr(microseconds);
 b = mklcpr(pid + seconds + (ppid << 12));
 seed = MD5(a, b);
mklcpr(x) /* not cryptographically significant; shown for completeness */
 return ((0xDEECE66D * x + 0x2BBB62DC) >> 1);
MD5() /* a very good standard mixing function, source omitted */
Figure 3: The Netscape v1.1 key-generation process: pseudocode.
RNG_GenerateRandomBytes()
 x = MD5(seed);
 seed = seed + 1;
 return x;
global variable challenge, secret_key;
create_key()
 RNG_CreateContext();
 tmp = RNG_GenerateRandomBytes();
 tmp = RNG_GenerateRandomBytes();
 challenge = RNG_GenerateRandomBytes();
 secret_key = RNG_GenerateRandomBytes();



































































Line-Segment Clipping Revisited


Extending an old favorite to 3-D




Victor J. Duvanenko, W.E. Robbins, and R.S. Gyurcsik


Victor is an engineer at Truevision in Indianapolis, IN. He can be contacted
at victor@truevision.com.


Our previous article, "Improving Line Segment Clipping" (DDJ, July 1990),
presented improvements on the Cohen-Sutherland (CS) line-clipping algorithm.
Since then, we've made more improvements, some with the help of algorithms
such as Nicholl-Lee-Nicholl (NLN). In this article, we'll discuss some of
these improvements, extend the results to three dimensions, and introduce the
Sutherland-Hodgman clipping algorithm.
The CS and NLN algorithms are based on the assumption that the database being
viewed is much larger than the window. Thus, most of the spans (which are
lines with endpoints, or line segments) will not appear in the window at all
and must be eliminated as quickly as possible. Both algorithms have been
optimized to reject spans quickly and are suboptimal for trivial acceptance.
In the case of trivial acceptance, a span is completely inside the clipping
window, which is assumed to be rectangular with axes-aligned (orthogonal)
edges. In some cases, you may need an algorithm optimized for trivial
acceptance of spans (for instance, if the database fits mostly inside the
window).
CS and NLN trivially accept the span after completing eight floating-point
comparisons for 2-D. CS trivially accepts a span after 12 floating-point
comparisons for 3-D. (The NLN algorithm doesn't yet exist for 3-D.) Both
algorithms compare the two span endpoints to each of the clipping
window/volume boundaries (2*4=8 for 2-D, 2*6=12 for 3-D).
The only inputs to a span-clipping algorithm are the coordinates of the span
endpoints and the clipping window boundaries. Since orthogonal window
boundaries are assumed, the problem can be reduced to a series of independent,
one-dimensional problems. Each problem tries to resolve whether the span
endpoints are within the range defined by the window boundaries. If the
endpoints are within the window boundaries of all dimensions, then the span
must be inside the window. CS and NLN perform four comparisons per dimension
for trivial acceptance. However, the minimal number of comparisons necessary
for trivial acceptance is of interest.
A more precise way to state the problem is as follows: "What is the minimum
number of comparisons necessary to ascertain that Xmin <= {X0,X1} <= Xmax,
where X0 and X1 are the span endpoints and it is known that Xmin <= Xmax?"
The solution is a three-comparison method based on the MIN/MAX algorithm, as
described in Baase's Computer Algorithms and our article, "Optimal
Determination of Object Extents" (DDJ, October 1990). The algorithm is as
follows:
1. Compare X0 to X1.
2. Compare the larger to Xmax.
3. Compare the smaller to Xmin.
If the larger point is smaller than Xmax and the smaller point is larger than
Xmin, then both points are inside the range; see Figure 1. The new algorithm
compares the endpoints to each other instead of always comparing the endpoints
to the boundaries (as do the current CS and NLN implementations). This method
allows the algorithm to obtain a greater amount of useful information. In
fact, the most information is obtained by comparing two previously untouched
items (two endpoints/vertices, in this case) to each other.
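A minimal one-dimensional sketch of this three-comparison test (the function name is ours; the full 3-D macro version appears in Listing One) might look like this:

```c
/* Are both x0 and x1 within [min, max]?  At most three floating-point
   comparisons: one endpoint-vs-endpoint, then smaller-vs-min and
   larger-vs-max. */
static int accept_1d(double x0, double x1, double min, double max)
{
    if (x0 < x1)                       /* 1: compare endpoints to each other */
        return x0 >= min && x1 <= max; /* 2, 3: smaller vs min, larger vs max */
    else
        return x1 >= min && x0 <= max;
}
```

Calling this once per dimension gives the six-comparison 2-D and nine-comparison 3-D totals cited below.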
An implementation of this algorithm, applied to span trivial acceptance, is
shown in Listing One. This algorithm accepts a span only for the cases E and F
in Figure 2. MIN/MAX was applicable because of the orthogonal nature of the
window boundaries, as the window boundary occupied a single value (Xmin or
Xmax) in the dimension being checked. In summary, a minimum of six comparisons
for 2-D and nine comparisons for 3-D are necessary to trivially accept a span,
compared with eight comparisons performed by the CS and NLN algorithms for
2-D, and 12 comparisons by CS for 3-D. This constitutes a computational
savings of 25 percent.


Minimal Trivial Rejection


Trivial rejection is a special case of a span being completely outside of the
window. The span must also be in a position such that both endpoints are
located entirely within the outside half-space of one of the clipping-window
boundaries (spans G and H, but not K in Figure 3). The endpoints of span G are
above the top boundary, and the endpoints of span H are to the right of the
right boundary. On the other hand, span K cannot be trivially rejected because
both its endpoints are not located within the outside half-space of any one
boundary. These conditions allow the use of a single comparison to determine
the rejection of a span--and that makes the rejection trivial. The comparison
is a one-dimensional operation.
NLN performs fewer floating-point comparisons than CS in all trivial-rejection
cases. For example, NLN rejects a span after as few as two floating-point
comparisons when a span is strictly to the left of the window, whereas CS
performs as many as six floating-point comparisons. Thus, it is beneficial to
determine the minimal number of comparisons necessary to reject a span.
In a single dimension, two boundaries (Xmin and Xmax) are present when
clipping a span to a window, so the minimal number of comparisons necessary to
reject a span to both boundaries is sought. A more precise way to state the
problem is: "What's the minimum number of comparisons necessary to determine a
True result of the logical expression ((X0<Xmin and X1<Xmin) or (X0>Xmax and
X1>Xmax)), where X0 and X1 are the span endpoints and it is known that Xmin
</= Xmax?"
In the best case, only two comparisons are needed (X0<Xmin and X1<Xmin). In
the worst case, a True result is only possible after three comparisons
(X0>Xmax and X1>Xmax) because the sequential nature of program execution
forces a fixed order of comparisons. (There are several three-comparison
solutions.) 
The trivial-rejection method used by NLN (see Listing Two) achieves the
minimal number of comparisons. However, achieving optimality in the worst case
doesn't imply that a search for a better algorithm is futile. It may be
possible to reduce the number of cases that require maximal work, thus
achieving an algorithm that, on average, performs less work. Listing Three
shows a one-dimensional decision tree used in NLN and an alternative method
that performs fewer comparisons on average.
Table 1 shows a detailed analysis of all possible cases. A total of nine cases
are possible, since each of the endpoints can reside in one of the three
regions shown in Figure 2. Obviously, neither algorithm exceeds the minimal
bound of three comparisons.
The NLN algorithm performs 2.33 comparisons on the average; the DGR algorithm
performs 2.22. Thus, a more efficient decision tree exists, which will lead to
a more efficient implementation of the NLN, and possibly of other algorithms.
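The averages in Table 1 can be checked mechanically. The following harness is our own sketch (function names are illustrative): it instruments both one-dimensional decision trees with comparison counters and sums the counts over all nine region placements of the two endpoints:

```c
/* Comparison counts for the NLN and DGR 1-D rejection trees.  Regions:
   A (< min), E (inside), C (> max), represented by -1, 0.5, 2 for the
   range [0, 1]. */
static int nln_cmps(double x0, double x1, double a, double b)
{
    int n = 1;                   /* comparison: x0 < a */
    if (x0 < a) {
        n++;                     /* comparison: x1 < a (reject if true) */
        return n;
    }
    n++;                         /* comparison: x0 > b */
    if (x0 > b)
        n++;                     /* comparison: x1 > b (reject if true) */
    return n;
}

static int dgr_cmps(double x0, double x1, double a, double b)
{
    int n = 1;                   /* comparison: x0 < a */
    if (x0 < a) {
        n++;                     /* comparison: x1 < a (reject if true) */
        return n;
    }
    n++;                         /* comparison: x1 > b */
    if (x1 > b)
        n++;                     /* comparison: x0 > b (reject if true) */
    return n;
}

/* Sum both counters over all nine (x0, x1) region combinations. */
static void totals(int *nln_total, int *dgr_total)
{
    double r[3] = { -1.0, 0.5, 2.0 };   /* regions A, E, C */
    int i, j;
    *nln_total = *dgr_total = 0;
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++) {
            *nln_total += nln_cmps(r[i], r[j], 0.0, 1.0);
            *dgr_total += dgr_cmps(r[i], r[j], 0.0, 1.0);
        }
}
```

The totals come out to 21 and 20 comparisons over the nine cases (21/9 = 2.33 and 20/9 = 2.22), matching Table 1.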


Implementation Enhancements


In addition to being a well-established algorithm for clipping spans in 2-D,
CS easily extends to 3-D. We enhanced the CS implementation without disturbing
its ability to extend to 3-D. The result is an efficient 2-D and 3-D algorithm
called CS (DGR); see Listing Four. 
Two outcodes procedure calls are replaced by a single call to the
reject_accept_outcodes procedure, which implements optimal trivial rejection.
Processing the two span endpoints together allows a decision of trivial
rejection to be made sooner.
The slope subproducts dx and dy are computed once, since clipping does not
change the slope of a span. The number of floating-point operations per
intersection computation is reduced as redundant subtractions are eliminated.
Note that dx and dy, computed from the original endpoints, are more accurate
than those computed from clipped endpoints.
The While loop is removed. This unfolds the divide-and-conquer strategy, but
allows the removal of redundant computations. In CS, the While loop can go
through as many as four iterations for 2-D. Up to two initial iterations will
be used for clipping the first endpoint, and up to two of the concluding
iterations will be used for the second endpoint. In fact, a span endpoint can
be clipped to, at most, two of the four extended window boundaries.
Spans are clipped in each dimension--not to each boundary--by each If
statement, which reduces the number of If statements used. The computations of
the intersection with the right and the left window edge are similar. The only
difference is the window edge the span is being clipped against (x_right or
x_left). Thus, these computations can be merged into a single If statement;
see Listing Four. Coupled with the While loop removal, this implies that only
four sections are needed to implement the algorithm.
The While loop has been unfolded, so each endpoint must be clipped separately,
making span-endpoint swapping unnecessary.
The call to the outcodes procedure and acceptance/rejection testing after the
endpoint adjustment have been replaced by custom tests that accept or reject a
span as soon as possible. This reduces the number of redundant floating-point
comparisons. In all cases, the CS implementation performs more floating-point
comparisons than any other floating-point operation.
Our July 1990 article showed that clipping to a boundary moves the span
endpoint onto the boundary. Thus, the dimension to which a span endpoint has
been clipped need never be checked again, and some comparisons can be
eliminated. For example, when clipping to the top or bottom boundary, the new
span endpoint must be compared to the left and right boundaries only. This
eliminates one-half of the comparisons performed at each clipping step for
2-D, and one-third for 3-D. It also means that the number of comparisons
should diminish at each iteration. In fact, no comparisons are needed after
the second adjustment of the second endpoint.
Also, once the first endpoint has been adjusted (at most, twice) to its final
position, you know whether the span intersects the clipping window. At this
point, the span either missed the window and is trivially rejected, or the
first intersection point between the span and the window has been found. Thus,
when adjusting the second endpoint no trivial-rejection testing is necessary,
since only trivial acceptance is possible. 
These enhancements are fairly independent of each other, allowing for a family
of implementations varying in efficiency and complexity. For example, addition
of the optimal trivial-rejection procedure and the slope-subproduct
computation to the CS implementation results in a fairly efficient and simple
implementation that extends to 3-D.
We found--but did not use because of the lack of extendibility to 3-D--one
implementation enhancement in Hill's Computer Graphics: The slope can be
computed only once. In the worst case, this improvement removes one
multiplication and one division (averaged across symmetry). However, an
additional floating-point comparison is required in every case not trivially
rejected or accepted, since a vertical span (slope is infinity) is checked and
handled as a separate case. But the IEEE P754 floating-point standard handles
infinity properly: When dividing by 0, the result is infinity; when dividing
by infinity, the result is 0. Thus, when you use a compiler or a processor
that conforms to the IEEE P754 standard, no extra comparison is needed. The
method is advantageous for all cases, since the redundant floating-point
operations in slope computations are removed. Not all computer systems conform
to the IEEE P754 standard however, which may make the implementation less
portable.
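The IEEE 754 behavior the slope trick relies on can be demonstrated directly. This small sketch (our own; C99's Annex F guarantees these semantics on conforming implementations) shows why a vertical span needs no special-case test:

```c
#include <math.h>

/* Under IEEE 754 arithmetic, dividing a finite nonzero value by 0.0
   yields infinity without trapping, and dividing by infinity yields
   0.0 -- so an infinite slope falls out of the arithmetic naturally. */
double safe_div(double num, double den)
{
    return num / den;
}
```

On a non-conforming system, the same division by zero may trap or abort, which is exactly the portability concern noted above.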
Extending the CS (DGR) 2-D implementation to 3-D is not difficult. Parametric
equations are used, but the slope subcomponents (dx, dy, and dz) are computed
only once. If slopes were to be precomputed, three slope computations would be
necessary (dy/dx, dz/dx, and dz/dy). This would require extra division-by-0
checks in a non-IEEE-compliant environment and an extra division for
single-endpoint adjustment (as only two slopes are necessary for the
intersection computation).
Finally, NLN and CS (DGR) rely on the top window boundary being above the
bottom boundary and the left boundary being to the left of the right boundary.
Otherwise, results produced by various algorithms will differ. Infractions of
this rule can be easily detected when the clipping-window dimensions are
changed. All rendering operations can then be disabled whenever the
clipping-window dimensions are "unreasonable." However, differences in the
outputs of span-clipping algorithms warrant cautious use. 



Sutherland-Hodgman Algorithm 


Sutherland and Hodgman (SH) suggest the application of their polygon-clipping
algorithm to span clipping. SH separates the process of clipping to a window
into disjoint processes of clipping to each boundary. This approach resembles
a pipeline: Each boundary is a stage in the pipeline and can be applied to
graphics-system architectures such as the Geometry Engine and the Pixel
Machine. Listing Five shows one possible 2-D implementation of SH, SH (DGR).
This implementation gains efficiency by clipping to both X boundaries, Xmin
and Xmax, in one stage of the pipeline. This suggests a shorter pipeline, with
each stage treating a dimension instead of a single boundary. Other possible
implementations and the 3-D algorithm are available electronically; see
"Availability," page 3.
SH (DGR) works on a simple principle. If both endpoints of a span are in the
"outside" half-space of a window boundary, the span is rejected. If one
endpoint is in the "outside" half-space while the other endpoint is in the
"inside" half-space, the "outside" portion of the span is clipped away. If
both endpoints are in the "inside" half-space, nothing needs to be done. Thus,
the algorithm always removes the invisible portion of a span. SH (DGR) is
compact and conceptually simple. It provides symmetry (or reentrancy), extends
easily to 3-D, performs few comparisons (8 for 2-D, 12 for 3-D), and has
inherent parallelism.
Integer comparison can be eliminated from SH (DGR) by placing clip_D inline.
You can unfold the algorithm completely so that the slope computations from
the x-dimension clipping can be utilized during the y-dimension clipping.
Finally, SH (DGR) can be extended to convex clipping windows, just like the SH
polygon algorithm.
The SH (DGR) method has two inefficiencies, however. First, it performs
redundant slope computations. Second, the trivial rejection of spans is
nonoptimal. In fact, only the first call to clip_D (the first dimension) is
optimal.


Comparison of Algorithms


Nicholl, Lee, and Nicholl have developed metrics and techniques for direct
comparison of clipping algorithms, which we further refined. Grouping cases in
a reasonable manner simplifies presentation of algorithm analysis. For
example, Figure 4 shows the results of all cases when the first span endpoint
(P0) is inside the clipping window. The second span endpoint can be inside the
window, in any of the four corner regions or in any of the four edge regions.
The algorithm-analysis data can be reduced by averaging the four corner cases.
Thus, the data is averaged over symmetry. We used the technique to automate
the generation of all possible cases. Operation counters were added to all
algorithms to obtain a precise count of all considered operations.
We compared CS, Liang-Barsky (LB), NLN, CS (DGR), and the fully expanded
version of the SH (DGR) algorithms for various cases in 2-D. The conclusions
in Table 2 apply to a general scalar, computational model. Each entry in Table
2 indicates the comparison between the algorithm that labels the row and the
one that labels the column, where < means "performs less work than." For
instance, the third row of Table 2 states that, in all cases, NLN performs
less work than CS and LB and less than or equal to the amount of work done by
CS (DGR); furthermore, comparison to SH (DGR) is machine dependent, and thus
inconclusive. 
We also compared CS, LB, CS (DGR), and SH (DGR) for various cases in 3-D. (NLN
is not yet available in 3-D.) Four types of regions are created by the
parallelepiped boundaries in 3-D space: one inside region, six face regions,
twelve edge regions, and eight corner regions. The conclusions are presented
in Table 3. 


Conclusions


The fact that SH (DGR) performs the fewest number of comparisons may be
important for general architectures that can chain an addition and a
multiplication (some present DSPs and floating-point units), since comparisons
become the bottleneck operation. 
CS was designed for a hardware implementation and contains much parallelism.
Sproull and Sutherland describe CS in terms of a hardware implementation, with
all window-boundary comparison performed in parallel. Serial implementations,
such as CS, CS (DGR), and NLN, suffer from a fixed, sequential comparison
order that may not be most efficient for all spans.


Acknowledgments


Thanks to Dr. John F. Hughes for providing many useful suggestions on every
aspect of the first draft of our paper, "Simple and Efficient 2-D and 3-D Span
Clipping Algorithms," published in Computers and Graphics (January 1993).


References


Baase, S. Computer Algorithms. Reading, MA: Addison-Wesley, 1988. 
Duvanenko, V.J. "Simple and Efficient 2-D and 3-D Span Clipping Algorithms."
MS Thesis, North Carolina State University, December 1990.
Hill, F.S., Jr. Computer Graphics. Houston, TX: Macmillan Publishing, 1990.
Nicholl, T.M., D.T. Lee, and R.A. Nicholl. "Efficient New Algorithm for 2-D
Line Clipping: Its Development and Analysis." Computer Graphics (ACM SIGGRAPH)
(July 1987).
Rankin, J.R. Computer Graphics Software Construction. Englewood Cliffs, NJ:
Prentice Hall, 1989.
Sproull, R.F., and I.E. Sutherland. "A Clipping Divider." Fall Joint Computer
Conference, 1968.
Sutherland, I.E., and G.W. Hodgman. "Reentrant Polygon Clipping."
Communications of the ACM (January 1974).
Figure 1: One-dimensional trivial-acceptance flow.
Figure 2: All trivial-acceptance/rejection cases.
Figure 3: G and H can be trivially rejected; K cannot.
Figure 4: Analysis of the algorithms when P0 is in the clipping window.
Table 1: Detailed analysis for all possible trivial-acceptance/rejection
cases.
X0 X1 Comparisons (NLN) Comparisons (DGR)
A A 2 2
A E 2 2
A C 2 2
E A 2 2
E E 2 2
E C 2 3
C A 3 2
C E 3 2
C C 3 3
Table 2: Comparison of 2-D algorithms: MD, machine dependent (and thus
inconclusive); a, in most cases; b, lost only in trivial rejection cases; c,
in all but two cases; d, equal to, in trivial-rejection cases; e, if
(subtraction--addition)<(division--multiplication).
 CS(NS) LB NLN CS(DGR) SH(DGR)
CS (NS) -- MD >e > >a
LB MD -- > >c >
NLN <e < -- d MD
CS(DGR) < <c d -- >a
SH(DGR) <a < MD <a,b --
Table 3: Comparison of 3-D algorithms: MD, machine dependent (and thus
inconclusive); a, in most cases; b, in most cases if floating-point comparison
is faster than division.
 CS (NS) LB CS (DGR) SH (DGR)
CS (NS) -- MD > MD
LB MD -- >b >a
CS (DGR) < <b -- MD
SH (DGR) MD <a MD --

Listing One
/* Optimal 3-D trivial accept procedures. Return as soon as
 trivial acceptance fails. All dimensions must accept for the procedure
 to accept. Acceptance is border inclusive. */
#define ACCEPT 1
#define BOOLEAN int
#define ACCEPT_TEST( p1, p2, min, max ) \
 if ( p1 < p2 ) { \
 if ( p1 < min ) return( !ACCEPT ); \
 if ( p2 > max ) return( !ACCEPT ); \
 } else { \
 if ( p2 < min ) return( !ACCEPT ); \
 if ( p1 > max ) return( !ACCEPT ); \
 }
extern double x_min, x_max, y_min, y_max, z_min, z_max;
BOOLEAN optimal_triv_acc_3D( x1, y1, z1, x2, y2, z2 )
double x1, y1, z1, x2, y2, z2;
{
 ACCEPT_TEST( x1, x2, x_min, x_max)
 ACCEPT_TEST( y1, y2, y_min, y_max)
 ACCEPT_TEST( z1, z2, z_min, z_max)
 return( ACCEPT );
}

Listing Two
/* Nicholl-Lee-Nicholl trivial rejection (optimal). */
#define REJECT 1
#define BOOLEAN int
#define REJECT_TEST( p0, p1, min, max ) \
 if ( p0 < min ) { \
 if ( p1 < min ) return( REJECT ); \
 } else if ( p0 > max ) \
 if ( p1 > max ) return( REJECT );
extern double x_min, x_max, y_min, y_max, z_min, z_max;
BOOLEAN NLN_trivial_reject_3D( x0, y0, z0, x1, y1, z1 )
double x0, y0, z0, x1, y1, z1;
{
 REJECT_TEST( x0, x1, x_min, x_max )
 REJECT_TEST( y0, y1, y_min, y_max )
 REJECT_TEST( z0, z1, z_min, z_max )
 return( !REJECT ); /* couldn't reject */
}

Listing Three
/* NLN 1-D trivial rejection (worst case optimal). */
 if ( x0 < x_a ) {
 if ( x1 < x_a ) return( REJECT );
 } else if ( x0 > x_b )
 if ( x1 > x_b ) return( REJECT );

/* DGR 1-D trivial rejection (worst case optimal, better on average). */
 if ( x0 < x_a ) {
 if ( x1 < x_a ) return( REJECT );
 } else if ( x1 > x_b )
 if ( x0 > x_b ) return( REJECT );

Listing Four
/* 2D CS (DGR) implementation - achieves optimal trivial rejection. */
#define ACCEPT 1
#define REJECT 0
/* Clipping volume boundaries - accessible to all routines */
extern double x_right, x_left, y_bottom, y_top, z_front, z_back;
/* Reject lines as quickly as possible and set 'outcodes'. */
static reject_accept_outcodes( x0, y0, outcode0, x1, y1, outcode1 )
double x0, y0; int *outcode0;
double x1, y1; int *outcode1;
{
 if ( x0 < x_left ) {
 if ( x1 < x_left ) return( REJECT );
 else *outcode0 = 1;
 }
 else if ( x0 > x_right ) {
 if ( x1 > x_right ) return( REJECT );
 else *outcode0 = 2;
 }
 else *outcode0 = 0;
 if ( y0 < y_bottom ) {
 if ( y1 < y_bottom ) return( REJECT );
 else if ( y1 > y_top ) *outcode1 = 8;
 else *outcode1 = 0;
 *outcode0 = 4;
 }
 else if ( y0 > y_top ) {
 if ( y1 > y_top ) return( REJECT );
 else if ( y1 < y_bottom ) *outcode1 = 4;
 else *outcode1 = 0;
 *outcode0 = 8;
 }
 else {
 if ( y1 < y_bottom ) *outcode1 = 4;
 else if ( y1 > y_top ) *outcode1 = 8;
 else *outcode1 = 0;
 }
 if ( x1 < x_left ) *outcode1 = 1;
 else if ( x1 > x_right ) *outcode1 = 2;
 return( !REJECT );
}
/* Body of the CS (DGR) 2D line clipping algorithm implementation. */
clip_2d_dgr( x0, y0, x1, y1 )
double *x0, *y0; /* first end point */
double *x1, *y1; /* second end point */
{
 unsigned outcode0, outcode1;
 double dx, dy, /* change in x and change in y */
 edge; /* clipping edge */
 if ( reject_accept_outcodes(*x0,*y0,&outcode0,*x1,*y1,&outcode1) == REJECT)
 return( REJECT );
 if ( ! ( outcode0 | outcode1 )) /* trivial accept */
 return( ACCEPT );

 dx = *x1 - *x0; /* Calculate the slope subproducts only once. */
 dy = *y1 - *y0;
 if ( outcode0 & 3 ) { /* divide line at right or left of window */
 if ( outcode0 & 1 ) edge = x_left;
 else edge = x_right;
 *y0 += dy * ( edge - *x0 ) / dx;
 *x0 = edge;
 if ( *y0 < y_bottom )
 if ( outcode1 & 4 ) return( REJECT );
 else outcode0 = 4;
 else if ( *y0 > y_top )
 if ( outcode1 & 8 ) return( REJECT );
 else outcode0 = 8;
 else if ( ! outcode1 ) return( ACCEPT );
 else outcode0 = 0;
 }
 if ( outcode0 & 12 ) { /* divide line at top or bottom of window */
 if ( outcode0 & 4 ) edge = y_bottom;
 else edge = y_top;
 *x0 += dx * ( edge - *y0 ) / dy;
 *y0 = edge;
 if (( *x0 < x_left ) || ( *x0 > x_right )) return( REJECT );
 if ( ! outcode1 ) return( ACCEPT );
 }
 /* Clip P1 end. P0 is inside the window. It's only possible to accept. */
 if ( outcode1 & 3 ) { /* divide line at right or left of window */
 if ( outcode1 & 1 ) edge = x_left;
 else edge = x_right;
 *y1 += dy * ( edge - *x1 ) / dx;
 *x1 = edge;
 if ( *y1 < y_bottom ) outcode1 = 4;
 else if ( *y1 > y_top ) outcode1 = 8;
 else return( ACCEPT ); /* (outcode0 | outcode1) == 0 */
 }
 if ( outcode1 & 12 ) { /* divide line at top or bottom of window */
 if ( outcode1 & 4 ) edge = y_bottom;
 else edge = y_top;
 *x1 += dx * ( edge - *y1 ) / dy;
 *y1 = edge;
 return( ACCEPT );
 }
}

Listing Five
/* SH (DGR) 2D algorithm. */
#define ACCEPT 1
#define REJECT 0
/* Clipping rectangle boundaries - accessible to all routines */
extern double y_bottom, y_top, x_right, x_left;
/* Clip X or Y dimension. */
static clip_D( x0, y0, x1, y1, min_boundary, max_boundary )
double *x0, *y0, *x1, *y1, min_boundary, max_boundary;
{
 double m;
 if ( *x0 < min_boundary ) { /* divide line at the min_boundary */
 if ( *x1 < min_boundary ) return( REJECT );
 m = ( *y1 - *y0 ) / ( *x1 - *x0 ); /* x1 >= min_boundary */
 *y0 += m * ( min_boundary - *x0 );
 *x0 = min_boundary;

 if ( *x1 > max_boundary ) {
 *y1 += m * ( max_boundary - *x1 );
 *x1 = max_boundary;
 }
 }
 else if ( *x0 > max_boundary ) { /* divide line at the max_boundary */
 if ( *x1 > max_boundary ) return( REJECT );
 m = ( *y1 - *y0 ) / ( *x1 - *x0 ); /* x1 <= max_boundary */
 *y0 += m * ( max_boundary - *x0 );
 *x0 = max_boundary;
 if ( *x1 < min_boundary ) {
 *y1 += m * ( min_boundary - *x1 );
 *x1 = min_boundary;
 }
 }
 else { /* x0 is inside the window */
 if ( *x1 > max_boundary ) {
 *y1 += ( *y1 - *y0 ) / ( *x1 - *x0 ) * ( max_boundary - *x1 );
 *x1 = max_boundary;
 }
 else if ( *x1 < min_boundary ) {
 *y1 += ( *y1 - *y0 ) / ( *x1 - *x0 ) * ( min_boundary - *x1 );
 *x1 = min_boundary;
 }
 }
 return( ACCEPT );
}
/* The body of the SH (DGR) algorithm. */
clip_2d_sh_dgr( x0, y0, x1, y1 )
double *x0, *y0, *x1, *y1;
{
 if ( clip_D( x0, y0, x1, y1, x_left, x_right ) == REJECT) return( REJECT);
 if ( clip_D( y0, x0, y1, x1, y_bottom, y_top ) == REJECT) return( REJECT);
 return( ACCEPT );
}
DDJ



























Password Files


MD5 for secure passwords




Trevor J. Pope


Trevor is a software engineer specializing in real-time embedded systems. He
can be contacted at 38 Somerset Road, Kensington, 2094, South Africa or at
trevor@infoweb.co.za.


Recently, while developing an access-control method for an embedded system, I
implemented one-way encryption of passwords using the MD5 Message Digest
algorithm, invented by Ron Rivest of RSA Data Security (inventors of the
patented RSA public-key cryptography algorithm). MD5 calculates a practically
unique 128-bit message
digest of a file. It is claimed that the odds of a different file producing
the same digest are quite small--one in 2^64. Thus, the digest protects a file,
because any alteration will show up as a changed digest. The digest can be
used for one-way encryption so that data can be verified without being
disclosed. This is how the passwords in the method described here are
protected.
The approach I implemented uses a password file that contains the user records
in plaintext, together with the user digests. The entire file is protected
using a digest to detect unauthorized alterations to the file. Under UNIX,
password files can be read by users, but are protected from tampering by
file-access restrictions. Since this wasn't possible in my embedded system, I
used a file digest to protect the password file from tampering.
Each user record contains the user's name and other personal details, as well
as the record's access rights and digest. The digest is calculated on the
visible contents of the record as well as the user's password, to produce a
unique, 128-bit key. The length of the key (and the way it is calculated)
makes it unlikely that the password can be guessed. 


The Password File


The password file (an ASCII file that can be read with a text editor) contains
a header and a number of user entries. The header contains the file digest, a
string containing the machine name, and the number of password entries in the
file. The digest is calculated on the whole file so that if the file is
altered in any way, this will be detected. If the digest is incorrect, an
attempt may have been made to alter the file, and the file is rejected.
Each entry provides for one user. The plaintext part contains the user's name,
employee number, and access rights. The digest is constructed using the
plaintext part and the user-entered password. The resulting 128-bit key is
stored as a string of 32 hex characters at the end of the user record. To be
validated, the user supplies a username and password. The username is used to
locate the user record in the password file. The plaintext part of the record,
together with the user-supplied password, is used to calculate the digest. If
the calculated digest is the same as the digest stored in the record, the user
is validated. The user's password remains secret: It is not stored separately
in the file, and it is computationally infeasible to guess it.
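The validation round trip can be sketched in portable C++. Note that `toy_digest` below is only a stand-in for MD5 (the real package calls the RSA MD5 routines), and the field layout is simplified for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for MD5: a 64-bit FNV-1a hash rendered as hex. The article's
// code feeds the same bytes to the RSA MD5 routines instead.
static std::string toy_digest(const std::string& data)
{
    uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    char hex[17];
    std::snprintf(hex, sizeof hex, "%016llx", (unsigned long long)h);
    return hex;
}

// The digest covers the visible record fields plus the secret password.
static std::string record_digest(const std::string& name,
                                 const std::string& number,
                                 const std::string& rights,
                                 const std::string& password)
{
    return toy_digest(name + " " + number + " " + rights + " " + password);
}

// A user validates if the recomputed digest matches the stored one; the
// password itself is never written to the file.
static bool validate(const std::string& name, const std::string& number,
                     const std::string& rights, const std::string& password,
                     const std::string& stored)
{
    return record_digest(name, number, rights, password) == stored;
}
```

Creating a record stores the result of `record_digest()` at the end of the entry; changing any plaintext field, or supplying the wrong password, produces a different digest and the check fails.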
Because the MD5 algorithm is well known, it would be simple to change the file
and recalculate the digest. One way to prevent this is to implement a keyed
hash function. My approach, however, was to add encryption and scrambling to
the digest; this is a key feature of my password file. There are a number of
ways to scramble the digest. Any change in the order of the data will affect
the digest, so even simply reordering the file as the digest is calculated
would complicate things for a cracker. As long as this encryption remains
secret, the password file is strongly resistant to tampering. 
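To make the idea concrete, here is one possible scrambling step; the byte permutation and XOR key below are invented values for illustration, not the article's secret transform:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical secret permutation and key, compiled into the binary.
// Any values serve, as long as they stay out of an attacker's hands.
static const std::array<uint8_t, 16> kPerm = {
    5, 12, 1, 9, 0, 15, 3, 7, 14, 2, 11, 4, 13, 6, 10, 8
};
static const std::array<uint8_t, 16> kKey = {
    0xA3, 0x17, 0x5E, 0xC2, 0x09, 0x88, 0x41, 0xF0,
    0x6B, 0x2D, 0x95, 0x3C, 0xD7, 0x60, 0x1F, 0xBE
};

// Scramble a raw 16-byte digest before writing it to the file.
// Verification applies the same transform to the recomputed digest,
// so a forger who knows only MD5 cannot produce a valid stored value.
static std::array<uint8_t, 16> scramble(const std::array<uint8_t, 16>& d)
{
    std::array<uint8_t, 16> out;
    for (int i = 0; i < 16; ++i)
        out[i] = d[kPerm[i]] ^ kKey[i];
    return out;
}
```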


The Code


The code that implements the technique I used consists of the password module,
a command-line-driven test module, and the MD5 digest-algorithm code. The code
is written for ANSI C and has been compiled and tested using the GNU C
compiler (gcc) on a Sun under Solaris 1.1 (4.3 BSD) and Solaris 2.3 (basically
UNIX System V), and under Linux. I obtained the MD5 digest package via FTP
from info.cert.org in /pub/tools/md5 and used it as is. (Thanks to Ron Rivest
and RSA Data Security for placing this code into the public domain.)
The password module consists of password.c, password.h (the public
interfaces), and passwrdc.h (the public data definitions). These files, along
with the MD5 modules used (as distributed by RSA Data Security), are available
electronically; see "Availability," page 3. Listing One is passtest.c, a
program that tests the operation of the password package. The code can be
compiled and linked with gcc as follows: 
gcc -O -g -Wall -static -I. -I/usr/5include -I/usr/include -o passtest
passtest.c Password.c md5c.c -L/usr/5lib
The -g flag includes debug information; this should be omitted from the final
version. Figure 1 is sample output from the test program.
Running the resulting executable allows the various functions in password.c to
be exercised by entering the first letter out of the list of options provided.
You could start off by making a new password file and answering the questions.
The resulting password file contains 20 entries--the first is a user with
supervisor rights, as per the questions you answered. The remaining 19 entries
are marked as unused. The single user needs supervisor rights to add and
delete other users.
Once you have a valid password file, you test whether or not it is okay by
pressing "I" (initialize), whereupon the file is read into memory and
validated by checking the digest. Once a password file has been validated, you
can perform other actions: validate, add and delete users, or change
passwords. Each time a user is added/deleted, or the user changes his or her
password, the password file is updated and written with the new digest. You
can also display the password file in memory and generate a list of users
suitable for display to a user. 
The code enforces several common-sense rules:
Only a user with a supervisor privilege can add and delete users.
Supervisors cannot delete themselves.
Only a user can change his or her password, and only by entering the previous
password first. If the user forgets the password, a supervisor must log on and
delete and add the user with a new initial password. The user can alter the
initial password to a secure password known only to the user at any later
time.
A new password must pass an obviousness test to be accepted.
There can be more than one supervisor. If the only supervisor forgets his
password, a new password file must be made from scratch and copied over the
old.
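The package's actual obviousness test isn't reproduced here; this is a minimal sketch of the kind of checks involved, with the minimum length, mixed-character rule, and username comparison as assumed criteria:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Reject passwords an attacker would try first. The thresholds and
// rules here are illustrative, not the article's exact checks.
static bool password_acceptable(const std::string& pw,
                                const std::string& user)
{
    if (pw.size() < 6)
        return false;           // too short to resist guessing
    if (pw == user)
        return false;           // identical to the username
    bool alpha = false, other = false;
    for (unsigned char c : pw) {
        if (std::isalpha(c)) alpha = true;
        else                 other = true;
    }
    return alpha && other;      // require mixed character classes
}
```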


Security


To keep the system secure, the facility to create a new password file from
scratch must be guarded. Depending on the actual implementation, there are a
number of ways to attack a system using the code I present here. Among the
strategies an attacker could adopt are:
Cracking individual passwords. Nonobvious passwords that are long enough are
computationally infeasible to crack. The user must choose a password that is
suitably randomized and long enough. 
Adding entries to the password file. If this can be done, then the new entry
could be used to compromise the system. Once the attack is over, the old
password file can be restored to allow the attacker to cover his tracks. This
will only succeed if the password file can successfully be altered with a new,
valid digest. Proper encryption of the file digest will make the attacker's
task extremely difficult.
Intercepting passwords. This will depend very much on the architecture of the
target system. If a serial terminal is used, a serial communications monitor
can be used to reveal the password. If an X terminal is used, network sniffers
can reveal it. On an X terminal, another application could simply grab the
keyboard while the user types in the password. 
Accessing the source code. This will allow the attacker to analyze the
algorithms used. This is unimportant except for the part used to encrypt the
file digest; still, this needs to be guarded carefully.
Using a debugger. Under UNIX, it is possible to attach a debugger to a process
and then set a break point just after the user enters a password. On other
systems, this may not be possible, particularly if the code is running out of
ROM. If possible, control access to the system via telnet, rlogin, and the
like to prevent this sort of attack. To make this type of attack more
difficult, the password field is cleared as soon as the user digest has been
calculated. Don't distribute the executable with debugging information
included!


For More Information



RFC1321: The MD5 Message-Digest Algorithm. Ron Rivest, RSA Data Security Inc.
April 1992. Available from info.cert.org in /pub/tools/md5.
Schneier, Bruce. "One-Way Hash Functions." Dr. Dobb's Journal (September
1991).
Stallings, William. "SHA: The Secure Hash Algorithm." Dr. Dobb's Journal
(April 1994).
Figure 1: A starting password file with one entry with supervisor privileges.
This was created using the New function in the test software. One user is
defined and has supervisor privileges (logon username Trevor, password Pope).
Other user records are marked as unused. 
efa00b25cc4b3fb5aa29d507f9d2ad38 1
MachineName
Trevor 0123456789 s 0c47cdd73a4e8f70c67671f968eae636
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #
Unused Nobody * #

Listing One
/*************************************************************************/ 
/* File: passtest.c */ 
/* Password package - test program */ 
/* Simple test program to check the operation of the password package */ 
/* under unix. */ 
/* Copyright (C) 1994-5, Trevor J. Pope -- All rights reserved. */ 
/* License to copy and use this software is granted provided that this */ 
/* copyright notice is retained in any copies of any part of this */ 
/* software. */ 
/* No representations are made concerning either the merchantability */ 
/* of this software or the suitability of this software for any */ 
/* particular purpose. It is provided "as is" without express or */ 
/* implied warranty of any kind. */ 
/*************************************************************************/ 
 
/* Common library includes */ 
#include <stdio.h> /* for file I/O */ 
#include <time.h> /* for time functions */ 
#include <string.h> /* for strcpy */ 
#include <errno.h> /* for errno etc */ 
 
/* Pull in the standard include for the package */ 
#include "Password.h" 
 
/*************************************************************************/ 
/* main() -- Entry point for the test program */ 
/* Passed : command line parameter list */ 
/*************************************************************************/ 
int main( int argc, char * argv[], char * env[] ) 
{ 

 int result; 
 char option, optionString[80]; 
 char userName[USER_NAME_LENGTH+1]; 
 char userNumber[EMPLOYEE_NUMBER_LENGTH+1]; 
 char accessString[ACCESS_RIGHTS_LENGTH+1]; 
 char password[PASSWORD_LENGTH+1]; 
 char superUserName[USER_NAME_LENGTH+1]; 
 char superPassword[PASSWORD_LENGTH+1]; 
 char newPassword[PASSWORD_LENGTH+1]; 
 char passwordFileName[51]; 
 char machineName[MACHINE_NAME_LENGTH+1]; 
 char userDetails[USER_DETAILS_LENGTH]; 
 
 /* Loop getting options and executing them until a quit is selected */ 
 do { 
 /* Show menu */ 
 printf("\n\ 
Select: (i)nitialise (v)alidate (c)hange (a)dd (d)elete\n\
 (n)ew (u)sers (s)how (h)elp (q)uit : "); 
 scanf( "%s", &optionString[0] ); 
 option = optionString[0]; 
 /* Act on option ... */ 
 switch( (int)option ) 
 { 
 case 'i': 
 { 
 printf("\n initPassword - enter password file name :"); 
 scanf("%s", passwordFileName ); 
 result = initPassword( passwordFileName ); 
 printf("\n initPassword() returned %s \n", 
 &passReturnCodeStrings[result][0] ); 
 break; 
 } 
 case 'v': 
 { 
 printf("\n validateUser - enter userName :"); 
 scanf("%s", userName ); 
 printf("\n validateUser - enter password :"); 
 scanf("%s", password ); 
 result = validateUser( userName, password, accessString ); 
 if( result == PASS_OK ) 
 { 
 printf("validateUser() returned %s (Access level is %s )\n", 
 &passReturnCodeStrings[result][0], accessString ); 
 } else { 
 printf("validateUser() returned %s \n", 
 &passReturnCodeStrings[result][0] ); 
 } 
 break; 
 } 
 case 'c': 
 { 
 printf("\n changePassword - enter userName :"); 
 scanf("%s", userName ); 
 printf("\n changePassword - enter old password :"); 
 scanf("%s", password ); 
 printf("\n changePassword - enter new password :"); 
 scanf("%s", newPassword ); 
 result = changePassword( userName, password, newPassword ); 

 printf("changePassword() returned %s \n", 
 &passReturnCodeStrings[result][0] ); 
 break; 
 } 
 case 'a': 
 { 
 printf("\n addUser - enter supervisor userName :"); 
 scanf("%s", superUserName ); 
 printf("\n addUser - enter supervisor password :"); 
 scanf("%s", superPassword ); 
 printf("\n addUser - enter new userName :"); 
 scanf("%s", userName ); 
 printf("\n addUser - enter new user number:"); 
 scanf("%s", userNumber ); 
 printf("\n addUser - enter new user rights :"); 
 scanf("%s", accessString ); 
 printf("\n addUser - enter new user password :"); 
 scanf("%s", password ); 
 result = addUser( superUserName, superPassword, 
 userName, userNumber, accessString, password ); 
 printf("addUser() returned %s \n", 
 &passReturnCodeStrings[result][0] ); 
 break; 
 } 
 case 'd': 
 { 
 printf("\n deleteUser - enter supervisor userName :"); 
 scanf("%s", superUserName ); 
 printf("\n deleteUser - enter supervisor password :"); 
 scanf("%s", superPassword ); 
 printf("\n deleteUser - enter userName to delete :"); 
 scanf("%s", userName ); 
 result = deleteUser( superUserName, superPassword, userName ); 
 
 printf("deleteUser() returned %s \n", 
 &passReturnCodeStrings[result][0] ); 
 break; 
 } 
 case 'n': 
 { 
 printf("\n newPasswordFile - enter full pathname : "); 
 scanf("%s", passwordFileName ); 
 printf("\n newPasswordFile - enter machine name :"); 
 scanf("%s", machineName ); 
 printf("\n newPasswordFile - enter supervisor name :"); 
 scanf("%s", userName ); 
 printf("\n newPasswordFile - enter supervisor number:"); 
 scanf("%s", userNumber ); 
 printf("\n newPasswordFile - enter supervisor password :"); 
 scanf("%s", password ); 
 result = newPasswordFile( passwordFileName, machineName, 
 userName, userNumber, password ); 
 printf("newPasswordFile() returned %s \n", 
 &passReturnCodeStrings[result][0] ); 
 break; 
 } 
 case 'u': 
 { 
 showUserDetails( userDetails ); 

 printf("showUserDetails() :\n%s", userDetails ); 
 
 /* Check current user details as recorded internally */ 
 result = currentUserDetails( userName, userNumber, accessString ); 
 if( result == PASS_OK ) 
 { 
 printf("currentUserDetails() returned a valid user:\n\ 
userName: %s\n\ 
userNumber: %s\n\ 
accessString: %s\n", userName, userNumber, accessString ); 
 } else { 
 printf("currentUserDetails() returned NO valid user:\n"); 
 } 
 break; 
 } 
 case 'h': 
 { 
 printf("\n\n%s\n\n", passwordHelp() ); 
 break; 
 } 
 case 's': 
 { 
 displayPasswordStruct(); 
 break; 
 } 
 case 'q': 
 { 
 printf("\n quitting ... \n"); 
 break; 
 } 
 default: 
 { 
 printf("\n unrecognised option (%x)", (int)option ); 
 break; 
 } 
 } /* switch */ 
 
 } while( option != 'q' ); 
 return( 0 ); 
} /* main */























TCP/IP and Windows 95


Adding network support to every program




Andrew Wilson and Peter D. Varhol


Peter is chair of the graduate computer science department at Rivier College.
He can be contacted at pvarhol@mighty.riv.edu. Andrew is a graduate student
at Rivier, and can be contacted at awilson@mighty.riv.edu. 


Network programming for Windows and Windows 95 has an intimidating reputation.
The mystique suggests that only advanced application developers are up to the
task, but nothing could be further from the truth. In reality, Windows 95 network
apps are relatively easy to write.
Still, network applications have to be well planned in terms of communicating
and maintaining connections. In this article, we'll focus on developing
networked applications based on TCP/IP User Datagram Protocol. With this
model, applications support either a single task, where a packet is sent and a
reply is always expected, or environments with many users--all communicating
data that need no specialized checking beyond that provided by the hardware.
To demonstrate network communication under Windows 95, we'll present a TCP/IP
chat program. Since the application will be connectionless, we needn't
establish a real communication session with any single user; instead, we can
communicate with many users simultaneously. "Chat" is able to find other
users, pass a message from the user to the group, and display incoming
messages. (The complete source code and executables for the chat program are
available electronically; see "Availability," page 3.)
Sockets, initially developed for UNIX platforms, are the basis for most IP
development in Windows 95. The sockets interface is an API to the underlying
protocols and the service ports they provide. Sockets provide a standard set of
calls, allowing you to write a base set of instructions that can be moved from
one platform to the next without recoding.
To use Sockets, Windows needs drivers that implement the TCP/IP protocols. A
common development API that provides these interfaces is WINSOCK.DLL, which
allows any TCP/IP-based application to use any vendor's TCP/IP protocol stack.
To link to a network, you must initialize the WINSOCK.DLL and determine the
protocol to use. You must then attach to, communicate with, and close the
socket. This is as straightforward as it sounds, and requires only a few lines
of code. Figure 1 illustrates the sequence of tasks.
WINSOCK allows the application to simply wait until a message indicates that
data is available on a given socket. However, the message system must be
initialized, and WINSOCK must know the handle to the window that is supposed
to receive the messages.
The same API call also avoids an error caused by socket calls that can block
and stop the application. Polling a socket to see if it has data has a bad
side effect: if there is no data, the call blocks the application until data
arrives. Instead, the WINSOCK.DLL returns an error if you try to read before
the notification message arrives, causing you to bang your head on the desk
until you realize that the socket can't be read until the message system is
actually in place. This is confusing because you can write to the socket at
any time, but you can't read from it unless a message is there.


Loading a Socket


Listing One loads and checks to see if WINSOCK.DLL is present and enough
sockets are available. This doesn't lead to anything spectacular, but does the
first-layer initialization. When we use a WINSOCK.DLL, we also ensure it is a
current version; if so, we continue forward with the initialization process
and take control of a socket.
We call the socket function with a particular address family, type, and
sometimes a protocol. It returns a socket descriptor, similar to a file
pointer. In this case, we are selecting a socket that will provide simple
datagram support. We then select a protocol and service port by calling
getservbyname.
In the example, we select User Datagram Protocol (UDP) as the protocol and a
service port of "echo." (For more information on UDP, see "A VBX for Network
Applications," by Frank E. Redmond III, DDJ, September 1995.) However, we
override the service port with a custom port. Directly accessing UDP as our
protocol would require us to make the application more dependent on the
operating system. By using getservbyname, we need to know only the protocol's
name; the function will find the correct numeric protocol identification. That
lets us move the function to a new operating system with few code changes.
Listing Two demonstrates this process. 
The binding of the protocol is where we select the protocol and the service
port. For some applications, it is best to generate a custom service port so
other applications do not respond to our application. This is also where we
initialize the messaging system so that we receive the notification that we
can read from the socket. Note also the distinction between host and network
byte ordering. Host byte ordering is whatever order the local machine uses to
store multibyte values; network byte ordering always puts the most significant
byte first, so on little-endian hosts the bytes must be swapped. The htons,
htonl, ntohs, and ntohl functions perform these conversions.
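The conversion works on bytes, not bits. A portable sketch of what the library's conversion functions arrange (in real code you call htons/ntohs rather than writing these yourself):

```cpp
#include <cassert>
#include <cstdint>

// Write a 16-bit value into a buffer in network (big-endian) order, and
// read it back. This is what htons/ntohs arrange for port numbers.
static void put_net16(uint8_t* buf, uint16_t v)
{
    buf[0] = (uint8_t)(v >> 8);     // most significant byte first
    buf[1] = (uint8_t)(v & 0xFF);   // least significant byte second
}

static uint16_t get_net16(const uint8_t* buf)
{
    return (uint16_t)((buf[0] << 8) | buf[1]);
}
```

Because the wire layout is fixed, the same two bytes decode to the same port value no matter which host reads them.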
Finally, we implement a message handler via a call to WSAAsyncSelect to alert
us when data is waiting to be read from the socket or when we can send to the
socket. We pass the socket descriptor, a handle to the window that will
receive the messages, the message, and the events we wish to be notified of.
We can then proceed with the regular send and receive functions. Only one
message is sent. A write notification message is sent only when the socket is
ready to be written to; another will not be sent until we have written
something to the socket. A receive notification message is sent only when
there is data on the socket; a second message will not be sent until the
socket has been read from in response to the previous read notification
message.


Sending a Packet


To send a packet, we first build the data portion of the packet and then send
it out to the network. Listing Three assembles the packet and address.
Network addresses are not used on the wire in human-readable form. For
instance, "198.121.25.22" cannot be sent without first converting it to its
32-bit binary form (via inet_addr) and encapsulating it in a SOCKADDR_IN
structure. Then we must build the
individual fields that make up our packet with whatever data we choose, and
pass the packet off to the API function sendto. sendto has several parameters.
It will receive the socket descriptor, the data we wish to send, the size of
the message, any flags we set, the destination socket of the system to which
we're sending the packet, and the length of the destination address. The
packet is launched, and hopefully, something receives it.
Receiving a packet follows nearly the same process, but with less work.
Instead of inserting all the data, we simply grab the incoming packet. After
we have received the packet, we are free to interpret it as we wish. However,
we never directly call the socket to check if data is available; if it is, a
message will be sent to our window. If we call the function before the
notification message is received, the call will error out, stating that the
socket is currently blocked. recvfrom also returns the address from which the
incoming packet came. We can then view the data inside the packet. As our
application closes, we must also close the socket. If not, the WINSOCK.DLL
remains active and the actual socket remains allocated. Listing Four shows how
to receive a packet, examine its contents, and close the socket.


Building the Application


The chat application can easily show how to communicate across a network and
find other workstations. It can support multiple users in the chat environment
employing the UDP protocol as the transport.
The application opens and requests the user name. When the user enters it, a
broadcast packet is launched. This packet is read by other workstations
running interprocess communications (IPC). Those workstations add the user
name and IP address to the user-list table, then reply to the broadcast. The
reply is taken and the user name and IP address of the replying station are
added to the local user list. 
Broadcasting is simply a way to send a single packet that every IP station
must read. The only stations that respond are those that have an application
using the same service port. In our case, only those running IPC will be able
to respond. To get the broadcast address, you must first know your own
address. Then you can broadcast with no trouble.
We get the address by calling gethostname, which passes a string containing
the system's host name. We pass that into gethostbyname, which returns with a
pointer to a hostent structure containing information about our system. The
h_addr field can then be read, converted into a manageable IP address, and
converted into a broadcast address. This is demonstrated in Listing Five.
Once the initialization is complete and we have a user list, we can receive
and transmit. The user merely types in a message and presses Enter. The
message is encapsulated into the chat packet and forwarded only to the users
in the user list. If no one is in the user list, we do not transmit any
packets. This isolates the traffic to just the users running the application.
It would also be possible to use just IP broadcasts; however, this would
affect many workstations each time a packet was sent, rather than just those
running IPC.
When a user changes his or her name, IPC sends a name-update packet, which
alerts the other workstations in the user list to a name change; they then
update their own user lists to match the name to the IP address.
You can also change the channel that the user is on. Only workstations set to
the same channel can display the packet. All packets are processed, but only
those on the same channel as the workstation are displayed.
When the user exits, the application sends an exit message to all the
workstations, which then remove the user name from the user list so no further
packets are sent to that user. Finally, the application closes the socket
interface and exits.


Porting Chat to Other Operating Systems



TCP/IP applications are fairly portable, which allows applications written in
several operating systems to be virtually identical (with the exception of
message handling and the user interface). The network interface can be taken
from one platform and moved to the other with few changes, making possible
portable TCP/IP applications.
Furthermore, after writing a few lines of code, you are ready to communicate
with the network. At that point, your focus shifts to the protocol choice and
network environment you are using.


Conclusion


It's clear that only a little of this work is network related. After a few
short calls, we provided a stable network environment for many IP
applications. Likewise, the IPC application is straightforward, which leads to
our last point: With the minimal amount of code required to build a TCP/IP
application, there is no reason that any application can't have network
support. It's straightforward to implement, and the code is fairly obvious and
easily maintained. Knowing this, you can add simple network support to nearly
every program without fear of reduced portability.
Figure 1: Sequence of tasks necessary to establish and close a connection.

Listing One
BOOL CChatNet::InitNet(void)
{ 
if(WSAStartup(SOCKET_VERSION, &m_wsaData))
 { 
 // Initializes the WINSOCK.DLL
 TRACE("Couldn't find the winsock.dll\n"); 
 return FALSE; 
 } 
if(m_wsaData.wVersion < SOCKET_VERSION)
 { 
 // Checks to make sure we are at a current revision
 TRACE("Winsock too old a rev\n"); 
 return FALSE; 
 } 
if(m_wsaData.iMaxSockets < SOCKETS)
 {
 // Checks if there are enough sockets
 TRACE("Not enough sockets available\n");
 return FALSE;
 }
return TRUE;
}

Listing Two
BOOL CChatNet::SetProtocol(HWND hWnd)
{
m_Socket=socket(PF_INET,SOCK_DGRAM,0);
if(m_Socket == INVALID_SOCKET)
 {
 TRACE("Cannot snag socket\n");
 return FALSE;
 }
m_station = getservbyname("echo","udp"); // attaches to the protocol
if(m_station == NULL)
 {
 TRACE("Can't get service port address\n");
 return FALSE;
 }
m_station->s_port = htons(IPC_PORT); 
// sets a custom service port in network byte order, after the NULL check
source_addr.sin_family = AF_INET; // defines our address family
source_addr.sin_addr.s_addr = INADDR_ANY; 
// incoming address types we will accept
source_addr.sin_port = m_station->s_port; // sets the service port we wish to use
if(bind(m_Socket, (LPSOCKADDR) &source_addr, sizeof(source_addr)) ==
SOCKET_ERROR)
 {

 // binds protocol to socket
 TRACE("Cannot bind to socket\n");
 return FALSE;
 }
if((WSAAsyncSelect(m_Socket,hWnd,SOCKET_MESSAGE, FD_READ | FD_WRITE |
 FD_CLOSE)) == SOCKET_ERROR)
 {
 // initializes message handler
 TRACE("Unable to set wsaasync\n");
 return FALSE;
 }
return TRUE;
}

Listing Three
UINT CChatNet::SendPacket(LPCSTR
lpstrDestination, LPCSTR lpstrMessage) 
{ 
SOCKADDR_IN dest_addr;
dest_addr.sin_family = AF_INET; 
dest_addr.sin_addr.s_addr = inet_addr(lpstrDestination); 
dest_addr.sin_port = m_srvinfo->s_port;
return sendto(m_Socket, lpstrMessage, 
MessageLen(),0, (PSOCKADDR) &dest_addr,sizeof(dest_addr)); 
} 

Listing Four
UINT 
CChatNet::RecChatPacket(LPSTR *lpstrSourceAddress, 
CHATPACKET *Message)
{ 
int incoming; 
SOCKADDR_IN inaddr; 
int addrlen = sizeof(inaddr); 
incoming = recvfrom(m_Socket,(char *) 
Message, MessageLen(), 0, (PSOCKADDR) &inaddr,&addrlen);
*lpstrSourceAddress = inet_ntoa(inaddr.sin_addr);
if(incoming == -1) 
 { 
 incoming = 0; 
 incoming = WSAGetLastError() + 2000;
 } 
return incoming; 
}
BOOL
CChatNet::CloseNet 
(void)
{ 
closesocket(m_Socket);
return TRUE;
}

Listing Five
BOOL 
CChatNet::SendBroadcast(LPSTR UserName)
{ 
int buflen = 50; 
LPHOSTENT Me; 
char *buf = new char[100]; 
if ( gethostname(buf,buflen) == 0)

 Me = gethostbyname(buf); 
//gets this system's host environment 
else 
 return FALSE;
if (Me==NULL) 
 return FALSE; 
unsigned char octet[4]; 
octet[0] = Me->h_addr[0]; 
octet[1] = Me->h_addr[1]; // defaults cover class C; overridden below for A and B
octet[2] = Me->h_addr[2]; 
//reads the first octet of the ip address and determines the ip
// address calls of the system 
if(octet[0] < 191) 
 { 
 octet[1] = Me->h_addr[1]; 
 octet[2] = '\xff';
 } 
if(octet[0] < 127) 
 octet[1] = '\xff'; 
octet[3] = '\xff'; 
//builds the broadcast ip address
sprintf(buf,"%d.%d.%d.%d",octet[0],octet[1],octet[2],octet[3]);
//converts the address into a string
if((SendChatPacket(buf,UserName, BROADCAST,NULL)) > 12000)
 {
 delete [] buf;
 return FALSE;
 }
// sends the broadcast packet.
delete [] buf;
return TRUE;
}


































Multiple Inheritance for MFC 4.0


Inheriting from CObject




Jean-Louis Leroy


Jean-Louis currently works for a small marketing company. He can be reached at
100611.1330@compuserve.com.


The recently released Microsoft Foundation Class library Version 4.0
incorporates features such as full encapsulation of Windows 95 controls, the
ability to create and use OLE controls, and a database-access model called
"Data Access Objects." One thing MFC 4.0 doesn't support, however, is multiple
inheritance. In fact, Microsoft's MFC Tech Note #16 even attempts to dissuade
you from using it. For instance, in discussing multiple inheritance, Tech Note
#16 points out that CObject can appear only once in the inheritance graph of
your class--and CObject-derived classes can only appear on the left-most edges
of the graph. This splits the app's classes into two categories: those that
inherit from CObject (and receive full service from the framework) and those
that MFC ignores.
While this may be Microsoft's choice, it does not necessarily have to be
yours. After all, many people (including Bjarne Stroustrup) believe that
multiple inheritance can be useful. While I won't attempt to convince you of
the value of multiple inheritance in this article, I will show how you can
lift some of Tech Note #16's restrictions without hacking the MFC source or
losing functionality.


A Peek at MFC 4.0


The last "public" release of MFC was 3.0. Since that time, subscribers to the
Microsoft Developer Network have seen incremental changes in MFC that add
support for Windows Sockets, Simple MAPI, and Windows 95 common controls. MFC
4.0 brings other features to the table, raising the total number of classes to
just over 150.
For example, MFC 4.0 provides support for multithreaded programming through a
new base class, CSyncObject. Under this base class are a collection of derived
classes to handle semaphores, mutexes, synchronization, events, and the like.
Support for OLE is provided through an OLE container that allows you to create
an OLE control. OLE controls are treated as special child windows and
associated with a parent window through CWnd. Thus, you can create an instance
of a control using CWnd::CreateControl. MFC also adds two new OLE common
dialog classes, CPageSetupDialog and COlePropertiesDialog, to encapsulate the
Page Setup dialog box and OLE Properties dialog box, respectively.
A lot of work has gone into some of the less "visual" portions of the
framework. CArchive, for example, is no longer limited to 32 K objects, and
now uses the class-specific new operator if there is one. CStrings use (at
last!) reference counting to greatly reduce the cost of passing or returning a
string by value. Also, redefinition of new as DEBUG_NEW (a debugging technique
promoted by Microsoft) no longer breaks the IMPLEMENT_SERIAL and
IMPLEMENT_DYNCREATE macros.


Supporting Multiple Inheritance


MFC belongs to the category of tree-like frameworks, like the NIH class
library or Borland's OWL 1.0. MFC, however, differs in its support of multiple
inheritance. The NIH class library, for instance, can be compiled with support
for multiple inheritance by making the root Object class a virtual base class.
This approach, however, is dismissed by Tech Note #16. The real problem with
virtual inheritance of CObject is that it breaks some parts of MFC.
It turns out that MFC 4.0 is almost capable of dealing with virtual CObjects.
If you attempt to write and use a class that virtually inherits from CObject,
two problems can arise. The first occurs only if the class is serializable,
and it is caught at compile time. The IMPLEMENT_SERIAL macro will
then trigger an "error: downcast involving a virtual base class" message.
Among other things, IMPLEMENT_SERIAL generates the extraction operator >>,
which reads the next CObject from a CArchive, checks its class, and casts the
pointer to the requested type. Since MFC uses a static cast, IMPLEMENT_SERIAL
will not compile if CObject is virtually inherited. This is easy to fix.
Visual C++ 4.0 supports new cast operators such as dynamic_cast<>. To make
serialization work again you need a VIMPLEMENT_SERIAL that uses dynamic_cast<>
instead of the old cast notation; see Listing One. This is, by the way, a good
opportunity to fill a gap in MFC: VIMPLEMENT_SERIAL and VDECLARE_SERIAL make
it possible to deserialize a pointer to a const object.
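The compiler error and its dynamic_cast<> remedy can be reproduced outside MFC. In this sketch, Root and Leaf are stand-ins for CObject and a serializable user class:

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for CObject and a class that virtually inherits from it,
// as in a multiple-inheritance hierarchy.
struct Root { virtual ~Root() {} };
struct MixinA : virtual Root {};
struct MixinB : virtual Root {};
struct Leaf : MixinA, MixinB {};

static Leaf* downcast(Root* p)
{
    // static_cast<Leaf*>(p) would not compile here ("downcast involving
    // a virtual base class"); dynamic_cast consults run-time type
    // information and succeeds, returning null for non-Leaf objects.
    return dynamic_cast<Leaf*>(p);
}
```

This is the substitution VIMPLEMENT_SERIAL makes: the generated extraction operator downcasts with dynamic_cast<> instead of the old cast notation.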
The second problem with virtually inheriting from CObject is more difficult to
remedy because it occurs only in debug builds, and the compiler won't warn
you. The program will build and run until functions such as
CMemoryState::DumpAllObjectsSince() are called, either explicitly or when the
program terminates and MFC attempts to dump leaked objects.
At this point, a description of the dump process is necessary. Starting with
Visual C++ 4.0, the C run-time library (CRT) also exists in debug version. A
header file crtdbg.h declares various interfaces to heap diagnostics. You can
use them to monitor heap activity, visit each allocation, get statistics on
heap usage, report memory leaks, and the like. All of this should sound
familiar. Indeed, these functions were previously part of MFC. Moving them to
the CRT solves several problems. For example, you can now use the heap
diagnostics even if your application isn't MFC based--even if it's written in
plain old C. The debug versions of heap functions (_malloc_dbg, _realloc_dbg,
and so on) have extra parameters that associate a source filename and line
number with an allocation. They also require a type parameter: normal, client,
ignore, or CRT. Only nodes of the first two categories appear in heap dumps.
ignore blocks are useful when memory needs to be allocated, but the allocation
pattern differs between runs of the same program. An example is the temporary
CWnd objects allocated by MFC in response to various Windows events. Having
them in the way would make it impossible to troubleshoot heap problems. CRT
blocks are allocated by the library for internal use. For example, the
iostream library needs to allocate buffers as part of its initialization.
Previously, these buffers were reported as memory leaks if the program also
used MFC. MFC 4.0 solves this by marking the buffers as CRT blocks.
normal blocks are dumped in terms of address and size, along with a hex dump
of the first few bytes. client blocks are dumped by the CRT the same way as
normal blocks, but you can override this behavior by specifying a dump
function that will be called by the CRT for client blocks. Listing Five is a C
program that illustrates these features.
MFC sets up a client dump function. From MFC's point of view, a client block
contains a CObject. Custom versions of operator new() in CObject allocate the
memory as client. When the block must be dumped, MFC casts the base address of
the block to CObject* and, depending on the dump context's depth, either lists
the object's class obtained by a call to GetRuntimeClass() or calls the
object's Dump() method. This is the source of MFC's allergy to virtually
inherited CObjects. Objects are registered as client blocks, but the CObject
part is floating somewhere in the block, not at the start where MFC expects
it. Thus, calling virtual functions results in a crash.
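The layout problem can be demonstrated with a few lines of standard C++, no MFC required: under virtual inheritance the base subobject typically does not sit at the start of the complete object, so a dump routine that casts the block's base address to CObject* reads the wrong memory. The classes below are illustrative only:

```cpp
#include <cassert>

struct Base { virtual ~Base() {} int b; };

// Plain inheritance: Base sits at the start of the object.
struct Plain : Base { int x; };

// Virtual inheritance: the Base subobject "floats" inside the object,
// usually placed after Virt's own members and virtual-base pointer.
struct Virt : virtual Base { int x; };

int main()
{
    Plain p;
    Virt v;
    // With non-virtual inheritance the base address matches the
    // object's address, which is what MFC's dump code assumes.
    assert(static_cast<void*>(static_cast<Base*>(&p)) ==
           static_cast<void*>(&p));
    // With virtual inheritance the two addresses generally differ
    // (the exact offset is implementation-defined), so treating the
    // block's base address as a Base* is wrong.
    void* whole = &v;
    void* base  = static_cast<Base*>(&v);
    (void)whole; (void)base;
    return 0;
}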
There is an easy solution--override the derived class's operator new() and
make it use the global operator new() instead of CObject's. The memory block
will be registered as nonclient, and MFC won't later attempt to call virtual
functions on it. Of course, the drawback is that the object will be dumped in
hexadecimal. In other words, objects using virtual inheritance will receive
only second-class support from the framework.
Fortunately, there is a better solution. Since MFC wants a CObject at the
beginning of a client block, you can include an extra CObject at the beginning
of the block that will act as a proxy for the actual object. Actually, you
need two proxy classes--one for blocks awaiting construction, and another for
blocks that contain a fully constructed, dumpable object. The sole purpose of
the second proxy class is to delegate functions such as Dump(),
GetRuntimeClass(), and AssertValid() to the actual object.
Listing Two, class CVirtualObject, implements this concept. A class can be
virtually derived from CVirtualObject, or from any class that inherits from
CVirtualObject. In release builds, CVirtualObject does nothing different from
CObject. It could just as well be a typedef. In debug builds, however,
CVirtualObject replaces CObject's memory management. One of the new operators
is used to construct an object at a specific address; see Listing Three. It is
of no interest to us and is simply passed down to CObject. The other two
allocate enough memory for the object and a CBlockProxy by calling the
corresponding CObject::operator new, create the proxy at the beginning of the
block, and add it to a linked list of blocks for yet-to-be-constructed
objects. The proxy remembers the size of the allocated memory block. The new
operators then return the address of the first byte past the proxy. 
The constructor of CVirtualObject calls CBlockProxy::Resolve (Listing Four),
which searches the list of blocks waiting for construction. If the object
under construction is found in the list, the CBlockProxy is replaced with a
CObjectProxy. Of course, the two proxies must be of the same size. The address
of the CObject subobject (passed from the constructor) is stored in the proxy.
Note that the object can be absent from the list. For example, CVirtualObject
member variables will not be found in the list. Also, a list is needed as
opposed to a mere static variable because, in expressions such as new A(new
B), the two calls to new take place before the execution of the constructors.
The CObjectProxy simply delegates diagnostic functions to the actual object.
GetRuntimeClass() is also delegated because, when the dump context's depth is
equal to zero, the framework simply outputs the object's class. Finally, a
delete operator is needed in CVirtualObject because the base address of the
object is no longer the base address of the allocated block.
CVirtualObject won't help you virtually inherit from CDocument, but the same
technique could probably be used to build a CVirtualDocument class. It can
also be applied to problems not related to multiple inheritance; for example,
to make MFC diagnostics work with classes not derived from CObject (and
perhaps coming from another vendor). In such a case, CVirtualObject could be
turned into a template.


Conclusion


MFC 4.0's cleaner design makes it possible to add many extensions to the
framework. However, it seems amazing that it is possible to reintroduce
support for multiple inheritance in a framework designed by people who overtly
contest the value of the feature. The code presented in this article allows
you to do just that.


For More Information


MFC 4.0
Microsoft Corp.
One Microsoft Way
Redmond, WA 98052
206-880-8080


Listing One
# define _VDECLARE_DESERIALIZER(class_name) \
friend CArchive& AFXAPI operator>>(CArchive& ar, class_name* &pOb); \
friend CArchive& AFXAPI operator>>(CArchive& ar, const class_name* &pOb) \
 { return ar >> const_cast<class_name*&>(pOb); }
# define _VIMPLEMENT_DESERIALIZER(class_name) \
CArchive& AFXAPI operator>>(CArchive& ar, class_name*&p) { \
 CObject* pOb = ar.ReadObject(NULL); \
 if (!pOb) p = NULL; \
 else if (!(p = dynamic_cast<class_name*>(pOb)))\
 AfxThrowArchiveException(CArchiveException::badClass); \
 return ar; }
# define VDECLARE_ABSTRACT_SERIAL(class_name) \
DECLARE_DYNAMIC(class_name) \
_VDECLARE_DESERIALIZER(class_name)
# define VIMPLEMENT_ABSTRACT_SERIAL(class_name, base_class_name) \
IMPLEMENT_DYNAMIC(class_name, base_class_name) \
_VIMPLEMENT_DESERIALIZER(class_name)
# define VDECLARE_SERIAL(class_name) \
DECLARE_DYNCREATE(class_name) \
_VDECLARE_DESERIALIZER(class_name)
# define VIMPLEMENT_SERIAL(class_name, base_class_name, wSchema) \
CObject* PASCAL class_name::CreateObject() \
 { return new class_name; } \
_IMPLEMENT_RUNTIMECLASS(class_name, base_class_name, wSchema, \
 class_name::CreateObject) \
_VIMPLEMENT_DESERIALIZER(class_name)

Listing Two
# ifdef _DEBUG
class CVirtualObject : public CObject
 {
 public:
 CVirtualObject();
 void* PASCAL operator new(size_t, void* p);
 void* PASCAL operator new(size_t size);
 void* PASCAL operator new(size_t size, LPCSTR file, int line);
 void PASCAL operator delete(void* p);
 static void AfxDoForAllObjects(void (*pfn)(CObject* pObject, 
 void* pContext), void* pContext);
 };
class CProxy : public CObject
 {
 protected:
 size_t size;
 union
 {
 CBlockProxy* next;
 CVirtualObject* object;
 };
 };
class CBlockProxy : public CProxy
 {
 public:
 CBlockProxy(size_t size);
 virtual void Dump(CDumpContext& dc) const;
 static void Resolve(CVirtualObject* p);
 private:

 static CBlockProxy* head;
 };
class CObjectProxy : public CProxy
 {
 public:
 CObjectProxy(CVirtualObject* p);
 virtual void Dump(CDumpContext& dc) const;
 virtual void AssertValid() const;
 virtual CRuntimeClass* GetRuntimeClass() const;
 };
# else
class CVirtualObject : public CObject { };
# endif

Listing Three
void* PASCAL CVirtualObject::operator new(size_t size, LPCSTR file, int line)
 {
 // allocate requested memory + space for header
 void* p = CObject::operator new(size + sizeof(CProxy), file, line);
 // construct a dumpable object at start of block
 new (p) CBlockProxy(size);
 // return address of first byte past header
 return reinterpret_cast<CBlockProxy*>(p) + 1;
 }

Listing Four
CVirtualObject::CVirtualObject()
 {
 // object is now constructed & dumpable
 CBlockProxy::Resolve(this);
 }
void CBlockProxy::Resolve(CVirtualObject* p)
 {
 for (CBlockProxy **pn = &head, *node; (node = *pn) != NULL; pn = &node->next)
 if (reinterpret_cast<BYTE*>(node + 1) <= reinterpret_cast<BYTE*>(p)
 && reinterpret_cast<BYTE*>(p) < reinterpret_cast<BYTE*>(node + 1) 
 + node->size)
 {
 // object found; remove from unconstructed block list
 *pn = node->next;
 // destroy unconstructed block header
 node->~CBlockProxy();
 // construct a header for constructed block
 new (node) CObjectProxy(p);
 break;
 }
 }

Listing Five
# include <stdlib.h>
# include <crtdbg.h>
void dump_client(void* p, size_t s)
 {
 _RPT1(_CRT_WARN, " Integer: %d\n", *(int*) p); 
 }
int main()
{
 _CrtSetDumpClient(dump_client);
 *(int*) _malloc_dbg(sizeof(int), _IGNORE_BLOCK, __FILE__, __LINE__) = 1;

 *(int*) _malloc_dbg(sizeof(int), _CRT_BLOCK, __FILE__, __LINE__) = 2;
 *(int*) _malloc_dbg(sizeof(int), _NORMAL_BLOCK, __FILE__, __LINE__) = 3;
 *(int*) _malloc_dbg(sizeof(int), _CLIENT_BLOCK, __FILE__, __LINE__) = 4;
 _CrtDumpMemoryLeaks();
 return 0;
}
Output
Detected memory leaks!
Dumping objects ->
K:\test\leak.c(15) : {33} client block at 0x002D07A8, subtype 0, 4 bytes long.
 Integer: 4
K:\test\leak.c(14) : {32} normal block at 0x002D0768, 4 bytes long.
 Data: < > 03 00 00 00 
Object dump complete.

















































A TARGA Viewer in Borland Delphi


Supporting graphics formats not native to the Windows environment




Gunter Born


Gunter is a computer consultant and the author of more than 35 books,
including The File Formats Handbook (International Thomson Publishing, 1995).
He can be reached at 100322.1465@compuserve.com.


With the release of Windows 95 and the announcement of a 32-bit version of
Borland's Delphi, I've begun moving my Turbo Pascal DOS code--including the
TARGA viewer I present here--over to the Windows environment. The TARGA
format, developed by TrueVision to store color-map and true-color images,
supports a number of variants for different graphics modes. Delphi's ability
to quickly create Windows apps--coupled with the fact that my DOS source code
was easily portable--made Delphi a natural choice. In this article, I'll
describe the internals of the TARGA file format and the process I went through
to convert the viewer code to Delphi.


TARGA Files


All data in TARGA files are stored in little-endian format (least-significant
byte first). A TARGA file consists of a header, followed by two optional data
areas (ID field and color map) and the image data; see Figure 1. In TARGA 2.0,
the image data may be followed by an optional appendix. The header is a fixed
area 18 bytes in length. While the header and the image-data area are
mandatory, other components are either vendor specific or depend upon the type
of TARGA file. The 18-byte TARGA header contains all the information necessary
to decode and evaluate the image file; see Table 1. The first byte in the
header defines the length of the optional Image Identification field, which
begins at offset 12H. Some manufacturers store configuration-specific data in
this field, which may be between 0 and 255 bytes long. If the first header
byte is 0, there is no image-identification field and the optional color map
follows at offset 12H; see Figure 2. If a vendor needs more than 255 bytes,
the TARGA 2.0 format allows data to be stored after the image area. 
The offset from the beginning of the file to the first palette entry is
calculated as 12H plus the length of the image-identification field. The
number of entries in
the color map and the entry size are stored in the header at offsets 05H and
07H. If an entry contains 32 bits, the first three bytes define the color
information for blue, green, and red, and the fourth byte is used as an
attribute. For 16-bit palette entries, the following bitwise coding applies
(A=attribute, R=red, G=green, and B=blue): A RRRRR GGGGG BBBBB.
TARGA files may contain images with or without a color map (palette). If the
second byte in the header is set to 1, a color map is stored after the header.
A 0 value indicates that the image data contains all necessary color
information. In addition, the TARGA format can store images of different
types, including monochrome, RGB, compressed, and uncompressed. Offset 03H
contains the TARGA image type, coded as shown in Table 1. Here, I describe the
file types 1, 2, 3, 9, 10, and 11. At offset 03H, there are five bytes
containing information about the optional color map. If a color map exists,
the data are appended after the Image Identification field. A color map may
contain up to 256 entries. 
A bit of trickery is used to define the colors in the file. The integer value
at offset 03H is used as an index into the color map. For example, a value of
5 defines the fifth entry in the color map as the first valid item. If the
value is 0, all entries in the color map should be used. The integer at offset
05H indicates how many entries in the color map are valid; it may hold at
most 100H (256). A color map may contain 16, 24, or 32 bits per entry.
The size of a palette entry in bits is defined in the byte at offset 07H.
The coordinates of the image origin are defined in two words beginning at
offset 08H; see Table 1. The next two words define the image's width and
height in pixels. The number of colors for each pixel is stored in a byte at
offset 10H. TARGA files may use 1, 8, 16, 24, or 32 bits for each pixel. Those
with 32 bits per pixel use the highest byte as an attribute (alpha or
transparency information). The number of attribute bits used is defined in the
image-descriptor flag, which is the last byte in the header and defines how the
image should be interpreted; see Table 2. Bits 4 and 5 in the image-descriptor
flag define the screen origin, which may be any of the four corners of the
screen. Most applications set the origin to the upper- or lower-left corner.
The screen origin must be set to 0 for TrueVision images. Type 1 images should
set the image-descriptor flag to 0. The value for the attribute (bits 0
through 3) should be set to 0 or 1 for 16-bit TARGA files, 0 for 24-bit TARGA
files (since they have no attributes), and 8 for 32-bit TARGA files.


The Image-Data Area


The image-data area follows the color map. Because the Image Identification
field and the color map are optional, the first image data start at offset
12H plus the length of the image-identification field plus the length of the
color map. The image area
contains width*height pixels at a given color depth (monochrome, or 256,
32,768, or 16 million colors). The pixel data may be stored compressed or
uncompressed.
Monochrome images (types 3 and 11) use two colors (black and white), so no
color map is required. All image data are stored contiguously in the data
area. Type 3 uses no compression for the image data, while type 11 defines a
run length encoding (RLE) compression scheme; see Figure 3. A 16x16 image is
defined by 256 pixels. With one bit per pixel, the data area needs only 32
bytes.
With type 1 uncompressed color-map images, n bits per pixel are stored. Each
entry acts as an index into the color map of the graphics card. The color map
contains 256 entries of 16, 24, or 32 bits, defining 256 colors. If not all
colors are used, the entries in the color map are set to 0 (black).
The TARGA format recognizes various compression schemes. The most popular,
RLE, is used for image type 9. The image data (eight bits per pixel) are
packed in the record structure in Figure 3. Each record contains a header,
followed by one or more data bytes. If the most significant bit in the header
is set to 1, an RLE record is used. This record type defines a pattern of n
identical data bytes, where n is defined by the remaining 7 bits of the
header. To calculate n, use the 7 remaining bits and add 1. The result is a
value between 1 and 128. The next byte should be repeated n times in the
output buffer. For example, the hexadecimal sequence 82 02 will be expanded to
02 02 02 (82H results in n=3). 
The data of an expanded RLE record may extend into the following image line.
If the image-data bytes cannot be compressed, they are stored in a raw-data
record that contains a header byte with the most significant bit set to 0; see
Figure 3. The remaining seven bits of the header byte contain a counter; the
number of data bytes that follow in the record is the counter plus 1. These
data bytes
should be copied to the output buffer. For example, the code sequence 85
0F--02 03 00 0F--83 33 should be expanded to 0F 0F 0F 0F 0F 0F--03 00 0F--33
33 33 33. 
The "--" character marks the records in the RLE and expanded sequence. The
same compression scheme is used for monochrome images (type 11). The
pseudocode sequence in Example 1 shows the RLE decoding. The header of a
record is always a byte. The pixel size depends on the image type (1, 2, 3, or
4 bytes).


Implementing a TARGA Viewer in Delphi


Delphi supports the display of bitmap images including BMP, ICO, and WMF files
through the image control. To display an image in this control, you simply
call the LoadFromFile method, which automatically handles the details of
loading and displaying images within the image control. Thus, as Figure 4
illustrates, creating a BMP viewer in Delphi requires just a few lines of
code. Unfortunately, the image control does not support graphics file formats
not native to Windows, such as TIFF, PCX, and TARGA.
The Delphi documentation indicates that you can define your own objects to
extend the LoadFromFile function. Unfortunately, this task is nontrivial, and
no examples demonstrate the process. After some head scratching, I decided to
create a bitmap structure and read the TARGA data into it. I first tried
creating a TBitmap object and adding my code. The TBitmap object allows you to
manipulate images in the canvas area and encapsulates the Windows HBitmap and
HPalette structures necessary to fill in the data from the TARGA file.
Unfortunately, the result was extremely slow code; in some cases, it took more
than a minute to display a 200x200-pixel image.
To get around this problem I used the original Windows bitmap structures,
which resulted in a lot of hand coding without any Delphi support. My solution
was to port my Turbo Pascal DOS code to a unit which reads the TARGA file and
stores the converted data in a temporary BMP file. I was then able to use the
Delphi functions to display the file. The TGAtoBMP unit in Listing One
contains no object-oriented code and also may be used in older versions of
Borland Pascal. The conversion routine may be called as
status:=ConvertTGABMP(Infile, Outfile). The two string parameters are
filenames. Infile defines the name and location of the TARGA graphics file,
and Outfile defines the path and name of the resulting BMP-file. Upon
successful conversion, the function returns status=0 and the BMP-file will be
available in the location defined in Outfile. Listing Two demonstrates how to
use the function and is based on the reduced ImageView example from the Delphi
package, with a few lines added for the TARGA support. The complete code and
resource files for the TARGA viewer are available electronically; see
"Availability," page 3.


References


Born, Gunter. The File Formats Handbook. London, U.K.: International Thomson
Publishing, 1995.
Figure 1: Structure of a TARGA file.
Figure 2: TARGA file as a hex dump.
Figure 3: RLE record format.
Figure 4: Displaying a BMP file under Delphi.
var Image1: TImage;
 ....
Image1.Picture.LoadFromFile (Filename);

ViewForm.Image1.Picture:= Image1.Picture;
Table 1: Structure of a TARGA header.
Offset Bytes Remarks
00H 1 Length of image ID field (0-255 bytes, 0 = no ID field)
01H 1 Color-map type (0 = no color map, 1 = color map in file)
02H 1 TARGA-image type
 0 = No image data in file
 1 = Color-map image, uncompressed
 2 = RGB image (24 bit), uncompressed
 3 = Monochrome image, uncompressed
 9 = Color-map image, RLE compression
 10 = RGB image (24 bit), RLE compression
 11 = Monochrome image, RLE compression
 32 = Color-map image, Huffman, Delta, and RLE compression
 33 = Color-map image, Huffman, Delta, and RLE compression 
 (4-pass quadtree)
03H 2 Color-map origin (Integer)
05H 2 Color-map length (Integer)
07H 1 Color-map entry size (16, 24, 32)
08H 2 x-coordinate origin
0AH 2 y-coordinate origin
0CH 2 Image width in pixels
0EH 2 Image height in pixels
10H 1 Bits per pixel (1, 8, 16, 24, 32)
11H 1 Image-descriptor flag
12H n Image-identification field (optional)
..H n Color map (optional)
Table 2: Coding an image-descriptor flag.
Bit Remarks
0-3 Attribute = bits per pixel
4-5 Screen origin 00 = Lower-left corner 01 = Lower-right corner 10 =
Upper-left corner 11 = Upper-right corner
6-7 Interleave 00 = No interleave 01 = Interleave odd/even 10 = Interleave
quad 11 = Reserved
Example 1: RLE decoding.
get header byte
n := (Byte and 7FH) + 1
IF (Byte and 80H) > 0 THEN
 get next pixel in RLE record
 output pixel n times
ELSE
 FOR k := 1 TO n
 get raw pixel
 output pixel
 END FOR k
END IF

Listing One
unit TGAToBMP;
{
 Warranty and Copyright Disclaimer:
 TGA to BMP conversion routine, implementation for Borland
 Delphi by: (c) Guenter Born - Version 17-July-1995
 The TARGA and BMP specification may be found in:
 The File Formats Handbook - by Gunter Born, 1300 pages,
 published by International Thomson Publishing,
 ISBN 1-85032-117-5
 This code comes unsupported AS IS without any warranty.
 The code was written as an example for my article in the
 Dr. Dobb's Journal of Software Tools.
 Users are given the right to use/modify and distribute this

 source code as long as this disclaimer is included and
 credits are given.
}
interface
function ConvertTGABMP (Infile, Outfile: string): Byte;
{
 Infile : Input TARGA File
 Outfile: Output BMP File
 Return: 0 conversion successful
 1 can't open input file
 2 can't create output file
 3 TARGA type not implemented (Header check)
 4 Data processing error
}
implementation
uses WinTypes, WinProcs, Classes, Graphics, Forms, Controls,
 FileCtrl, StdCtrls, ExtCtrls, Buttons, Spin;
 const
 MAX_WIDTH = 8000; { max. ImageWidth in Byte }
 MAX_BLOCK = 4096; { Input Buffer size }
 type
 block_array = array [0..MAX_BLOCK] of byte;
 line_array = array [0..MAX_WIDTH] of byte;
 name = String [255];
 TGA_header = record
 ID_Size: Byte; { Length ID-field }
 Pal_flag: Byte; { Flag indicates a palette }
 Typ : Byte; { Image Type Mono, Color }
 Pal_index: Word; { Index into Palette }
 Pal_size: Word; { No. of palette entries }
 Pal_entry_size: Byte; { Size of a pal. entry }
 leftx: Word; { Coordinate X0 }
 lefty: Word; { Coordinate Y0 }
 ImageWidth: Word; { Image width }
 ImageHeight: Word; { Image height }
 Bits_per_pixel:Byte; { Bits per Pixel }
 Image_descr:Byte; { Image descriptor byte }
 end;
var
 RawData: block_array; { Input buffer raw data }
 NextByte: integer; { Index in raw buffer }
 Data: byte; { current raw data byte }
 Header: TGA_header; { TGA File Header }
 IMGline: line_array; { buffer Output data }
 Index: integer; { IMGline Index }
 TGABytes_per_Line : word; { Byte per line TGA }
 BMPBytes_per_Line : word; { Byte per line BMP }
 Bits_per_pixel: byte;
Function OPEN (Var handle : file; Filename: name) : boolean;
{--------------------------------------------------------}
{ Open File for Input Function }
{--------------------------------------------------------}
begin
 AssignFile (handle,Filename);
 {$I-} {* Errorcheck off *}
 Reset(handle,1); {* open file *}
 {$I+} {* Errorcheck on *}
 Open := IOResult = 0;
end;

Function OPENNew (Var handle : file; Filename: name) : boolean;
{--------------------------------------------------------}
{ Open File for Output Function }
{--------------------------------------------------------}
begin
 Assign(handle,Filename);
 {$I-} {* Errorcheck off *}
 Rewrite(handle,1); {* open file *}
 {$I+} {* Errorcheck on *}
 OpenNew := IOResult = 0
end;
function ProcessHeader (Var FIn,FOut: File): Boolean;
{--------------------------------------------------------}
{ Read TGA Header data and create a BMP Header }
{--------------------------------------------------------}
 const
 Red = 0; { constants for Palette }
 Green = 1;
 Blue = 2;
 Attr = 3;
 var
 Ofs: LongInt;
 I, count: Integer;
 BmHeader : TBitmapInfoHeader; { BMP-Header }
 BitFile: TBitmapFileHeader; { BMP File Header }
 BMP_pal: TRGBQuad; { Palette BMP }
 TGApal_array : array [Red .. Attr] of Byte; { buffer palette}
begin
 {$I-}
 BlockRead (FIn, Header, SizeOf(Header)); { get TGA Header }
 {$I+}
 IF IOResult <> 0 Then
 begin
 ProcessHeader := false;
 exit;
 end;
 IF NOT (Header.Typ IN [1, 2, 3, 9, 10, 11]) Then
 begin { sorry TARGA types 32, 33 not implemented yet }
 ProcessHeader := false;
 exit;
 end;
 IF NOT(Header.Bits_per_pixel IN [1, 8, 24]) Then
 begin { TARGA types with 16/32 Bit/pixel not implemented yet }
 ProcessHeader := false;
 exit;
 end;
{
 calculate the number of Bytes per line (for n color planes)
}
 CASE Header.Bits_per_pixel OF
 1: TGABytes_per_line := Header.ImageWidth div 8;
 8: TGABytes_per_line := Header.ImageWidth;
 24: TGABytes_per_line := Header.ImageWidth * 3;
 end;
 BMPBytes_per_line := TGABytes_per_line;
 count := BMPBytes_per_line Mod 4; { adjust length to 32 Bit }
 IF count <> 0 THEN
 INC (BMPBytes_per_line,(4-count));
{

 construct the BitmapInfoHeader
}
 BmHeader.biSize := Sizeof(TBitmapInfoHeader);
 BmHeader.biWidth := Header.ImageWidth;
 BmHeader.biHeight := Header.ImageHeight;
 BmHeader.biPlanes := 1; { set always to 1 }
 BmHeader.biBitCount := Header.Bits_per_Pixel;
 BmHeader.biCompression := BI_RGB; { no compression }
 BmHeader.biSizeImage := 0; { valid, no compress. used }
 BmHeader.biXPelsPerMeter :=0; { not used yet }
 BmHeader.biYPelsPerMeter :=0; { ditto }
 BmHeader.biClrUsed := 0; { all colors are used }
 BmHeader.biClrImportant := 0; { all colors important }
{
 construct the BitmapFileHeader
}
 BitFile.bfType := $4D42; { Signature 'BM' }
 BitFile.bfReserved1 := 0; { not used }
 BitFile.bfReserved2 := 0; { dito }
 BitFile.bfOffBits := (256 * SizeOf(TRGBQuad))+ { Offset to Data }
 sizeof(TBitmapFileHeader)+
 sizeof(TBitmapInfoHeader);
 BitFile.bfSize := (256 * SizeOf(TRGBQuad)) + { File size }
 sizeof(TBitmapFileHeader) +
 sizeof(TBitmapInfoHeader) +
 (Header.ImageHeight*BMPBytes_per_Line);
 {$I-}
 BlockWrite (FOut, BitFile, SizeOf(BitFile)); { write BMP-File Header }
 {$I+}
 IF IOResult <> 0 Then
 begin
 ProcessHeader := false;
 exit;
 end;
 {$I-}
 BlockWrite (FOut, BmHeader, SizeOf(BmHeader)); { write BitmapInfoHeader }
 {$I+}
 IF IOResult <> 0 Then
 begin
 ProcessHeader := false;
 exit;
 end;
{ *** process the color palette if available *** }
 IF (Header.Pal_flag <> 0) THEN
 begin
 Ofs := SizeOf(Header) + Header.ID_Size; { Offset to palette }
 SEEK (FIn, Ofs); { set to Palette }
 count := Header.Pal_entry_size div 8; { Byte per Pal entry }
 FOR i := 1 to Header.Pal_size do { process all entries }
 begin
 {$I-}
 BlockRead (FIn, TGAPal_array[0],count); { get one palette entry }
 {$I+}
 IF IOResult <> 0 Then
 begin { something wrong }
 ProcessHeader := false;
 exit;
 end;
 { Write the BGR palette information to this file}

 IF count > 2 Then
 begin
 BMP_pal.rgbBlue := TGAPal_array[Red];
 BMP_pal.rgbGreen := TGAPal_array[Green];
 BMP_pal.rgbRed := TGAPal_array[Blue];
 end
 ELSE { 16 Bit Palette Entry -> create 3 Byte Attribute }
 begin{ Entry is ARRRRRGGGGGBBBBB coded }
 BMP_pal.rgbRed := (TGAPal_array[1] SHR 2) AND $1F; { Red }
 BMP_pal.rgbGreen := (TGAPal_array[0] SHR 5) + { Green }
 ((TGAPal_array[1] AND $03) SHL 3);
 BMP_pal.rgbBlue := TGAPal_array[0] AND $1F; { Blue }
 end;
 { in 32 Bit Palette entry, clear attribute byte always !! }
 BMP_pal.rgbReserved := 0; { 4th byte in Palette }
 {$I-}
 BlockWrite (FOut, BMP_pal,SizeOf(TRGBQuad)); { write Palette }
 {$I+}
 IF IOResult <> 0 Then
 begin
 ProcessHeader := false;
 exit;
 end;
 end; { FOR }
 end; { Process Palette }
{ *** calculate Offset TGA Image Data *** }
 Ofs := SizeOf(Header)+
 Header.ID_Size;
 IF Header.Pal_flag <> 0 THEN
 Ofs := Ofs + Header.Pal_size*count; { add Pal size to Offset }
 SEEK (FIn, Ofs); { set to 1st Image Byte }
 ProcessHeader := true; { success }
end; { ProcessHeader }
procedure ReadByte (Var FIn : File);
{--------------------------------------------------------}
{ get a byte from the buffer, if buffer empty, fill }
{--------------------------------------------------------}
var
 NumBlocksRead: integer;
begin
 if NextByte = MAX_BLOCK then { buffer empty ? }
 begin
 {$I-} { Yes, read next block }
 BlockRead (FIn, RawData, MAX_BLOCK, NumBlocksRead);
 {$I+}
 IF IOResult <> 0 THEN
 begin
 Close (FIn);
 exit;
 end;
 NextByte := 0; { reset buffer pointer }
 end;
 data := RawData [NextByte]; { return next byte }
 inc (NextByte); { pointer to next entry }
end; { ReadByte }
Function put_line (Var FOut: File; count: Integer): Boolean;
{----------------------------------------------------------}
{ write an uncompressed line into the BMP file }
{----------------------------------------------------------}
Var X : Integer;

 begin
 {$I-} { Write the buffer }
 BlockWrite(FOut, IMGLine[0], count);
 {$I+}
 IF IOResult <> 0 THEN
 put_line := false
 ELSE
 put_line := true;
end; { Put_line }
function ProcessData (Var FIn, Fout : File): Boolean;
{--------------------------------------------------------}
{ read the TGA-data step by step, decode the data and }
{ write 1, 8 and 24 Bit per Pixel to the BMP file }
{--------------------------------------------------------}
var
 count: word; { repeat variable }
 i,j,k: integer;
 pixel: array [0..2] of Byte; { 24 Bit Data buffer }
begin
 NextByte := MAX_BLOCK; { clear input buffer!}
{
 Attention: this implementation defines the image lines
 from top to bottom. TGA may have an inverse order for the image lines.
}
 FOR j := 1 TO Header.ImageHeight do { all lines }
 begin
 index := 0; { Reset output ptr }
{------------------------------------------------------}
{ TGA decoding starts here }
{------------------------------------------------------}
 WHILE (index < TGABytes_per_line) Do { all bytes in line }
 begin
{ *** RLE compression for 1 and 8 Bit per Pixel *** }
 IF Header.typ IN [9, 11] Then
 begin
 ReadByte (FIn); { get 1st Recordbyte }
 count := (data AND $7F) + 1; { get count value }
 IF (data AND $80) > 0 THEN
 begin { RLE-Record }
 ReadByte (Fin); { get pixel value }
 FillChar (IMGLine[index],count,data); { expand }
 INC (index, count); { index ++n }
 end
 ELSE { RAW-Record }
 FOR k := 1 TO count do
 begin
 ReadByte (FIn); { get pixel data }
 ImgLine[index] := data; { copy to output buf }
 INC (index, 1); { index ++ }
 end;
 end; { RLE 1,8 Bit...}
{*** RLE compression for 24 Bit/Pixel, 16/32 Bit not supported *** }
 IF Header.typ IN [10] Then
 begin
 ReadByte (FIn); { get 1st Recordbyte }
 count := (data AND $7F) + 1; { get count value }
 IF (data AND $80) > 0 THEN
 begin { RLE-Record -> 24 Bit per Pixel, process 3 Byte }
 For i := 0 To 2 Do { get 3 Byte }

 begin
 ReadByte (Fin); { get value }
 pixel[i] := data; { store value }
 end;
 For i := 1 To count Do { expand n pixel }
 begin
 MOVE (pixel[0],IMGLine[index],3);
 INC (index,3); { index ++3 }
 end;
 end
 ELSE { RAW-Record }
 FOR k := 1 TO count*3 Do { 3 Byte per Pixel }
 begin
 ReadByte (Fin); { get value }
 IMGLine[index] := data; { store value }
 INC (index, 1); { index ++1 }
 end;
 end; { 24 Bit RLE }
 IF Header.typ IN [1, 2, 3] Then { no Compression }
 begin
 ReadByte (FIn); { get TGA-Pixel }
 IMGLine[index] := data; { copy to output buf }
 INC (index,1); { index ++ }
 end;
 end; { While }
 IF BMPBytes_per_line<>TGABytes_per_line THEN
 FillChar (IMGLine[index], { clear last n bytes }
 BMPBytes_per_line-TGABytes_per_line,0);
 IF Not put_line (FOut,BMPBytes_per_Line) Then { Write line }
 begin
 ProcessData := false;
 exit;
 end;
 end; { FOR j .... }
end; { ProcessData }
function ConvertTGABMP (Infile, Outfile: string): Byte;
{-------------------------------------------}
{ This is the root of the TGA BMP converter }
{-------------------------------------------}
Var
 FromF : File;
 ToF : File;
 status : byte;
begin
 status := 0;
 IF NOT Open (FromF, InFile) Then { Open Input file }
 status := 1 { Error during open }
 ELSE
 IF NOT OpenNew (ToF, OutFile) Then { Open Output file }
 status := 2 { Error during open }
 ELSE
 IF NOT ProcessHeader(FromF, ToF) THEN { process TGA Header }
 status := 3 { Error in Header }
 ELSE
 IF NOT ProcessData (FromF, ToF) THEN { process TGA Data }
 status := 4;
 close (FromF);
 close (ToF);
 ConvertTGABMP := status; { success status }

end; { ConvertTGA }
end.
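
The record format Listing One unpacks is the standard TGA run-length scheme: each record begins with a header byte whose low seven bits hold (count - 1) and whose high bit distinguishes a repeated run from a literal packet. For readers who want the logic outside of Pascal, here is a minimal C++ sketch of the 8-bit case; the function name and buffer handling are my own, not part of the listing.

```cpp
// Hedged C++ sketch of the 8-bit TGA RLE expansion in Listing One.
// decodeRLE8 and its buffer layout are illustrative, not from the listing.
#include <cstddef>
#include <cstdint>

// Expand one scan line of RLE-packed 8-bit pixels from 'src' into 'dst'.
// 'lineBytes' is the unpacked line length; returns source bytes consumed.
std::size_t decodeRLE8(const std::uint8_t* src, std::uint8_t* dst,
                       std::size_t lineBytes)
{
    std::size_t in = 0, out = 0;
    while (out < lineBytes) {
        const std::uint8_t header = src[in++];
        const std::size_t count = (header & 0x7F) + 1;  // low 7 bits: count - 1
        if (header & 0x80) {                 // RLE record: one byte, repeated
            const std::uint8_t value = src[in++];
            for (std::size_t k = 0; k < count && out < lineBytes; ++k)
                dst[out++] = value;
        } else {                             // raw record: count literal bytes
            for (std::size_t k = 0; k < count && out < lineBytes; ++k)
                dst[out++] = src[in++];
        }
    }
    return in;
}
```

The 24-bit case in Listing One is the same loop with three-byte pixels in place of single bytes.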

Listing Two
unit ImageWin;
{
  This is the unit to display BMP, ICO, WMF, and TGA graphics.
  The code was taken from the Borland Delphi example IMAGEVIEW and was
  changed by Guenter Born. Note: this unit needs the TGAToBMP conversion unit.
}
interface

uses WinTypes, WinProcs, Classes, Graphics, Forms, Controls,
  FileCtrl, StdCtrls, ExtCtrls, Buttons, Spin, TGAToBmp;

type
  TImageForm = class(TForm)
    DirectoryListBox1: TDirectoryListBox;
    DriveComboBox1: TDriveComboBox;
    PathLabel: TLabel;
    FileEdit: TEdit;
    Panel1: TPanel;
    FileListBox1: TFileListBox;
    Label1: TLabel;
    ViewBtn: TBitBtn;
    Bevel1: TBevel;
    FilterComboBox1: TFilterComboBox;
    StretchCheck: TCheckBox;
    Image1: TImage;
    procedure FileListBox1Click(Sender: TObject);
    procedure ViewBtnClick(Sender: TObject);
    procedure StretchCheckClick(Sender: TObject);
    procedure FileEditKeyPress(Sender: TObject; var Key: Char);
  end;
  name = string [255];

var
  ImageForm: TImageForm;

implementation

uses ViewWin, SysUtils;
{$R *.DFM}

procedure TImageForm.FileListBox1Click(Sender: TObject);
{ called when the FileListBox is clicked by the user }
var
  FileExt: string[4];
  FileName: string[255];
  TMPFileName: string[255];
  status: byte;
begin
  FileExt := UpperCase(ExtractFileExt(FileListBox1.Filename));
  if (FileExt = '.BMP') or (FileExt = '.ICO') or (FileExt = '.WMF') then
  begin
    Image1.Picture.LoadFromFile(FileListBox1.Filename);
    Label1.Caption := ExtractFilename(FileListBox1.Filename);
    if FileExt = '.BMP' then
    begin
      { show image height and width in window }
      Label1.Caption := Label1.Caption +
        Format(' (%d x %d)', [Image1.Picture.Height, Image1.Picture.Width]);
      ViewForm.Image1.Picture := Image1.Picture;  { show picture as icon }
    end;
    if FileExt = '.ICO' then Icon := Image1.Picture.Icon;
    if FileExt = '.WMF' then
      ViewForm.Image1.Picture.Metafile := Image1.Picture.Metafile;
  end
  else
  { Here comes the extension for the TARGA format. Note: to avoid all the
    dirty BitBlt stuff, I read the TARGA file, convert it, and store it as
    a BMP file. This file is easy to load and display with Delphi. }
  if FileExt = '.TGA' then
  begin
    FileName := FileListBox1.Filename;               { get input filename }
    TMPFileName := 'TMP.BMP';                        { set output filename }
    status := ConvertTGABMP(FileName, TMPFileName);  { convert TGA -> BMP }
    if status = 0 then                               { conversion was successful }
    begin
      Image1.Picture.LoadFromFile(TMPFileName);      { load BMP file }
      { show image height and width in window }
      Label1.Caption := ExtractFilename(FileListBox1.Filename);
      Label1.Caption := Label1.Caption +
        Format(' (%d x %d)', [Image1.Picture.Height,
                              Image1.Picture.Width]);
      ViewForm.Image1.Picture := Image1.Picture;
    end
    else
    begin                                            { display conversion error }
      Label1.Caption := ExtractFilename(FileListBox1.Filename);
      Label1.Caption := Label1.Caption +
        Format(' Error: (%d)', [status]);
    end;
  end; { TARGA branch }
end;

procedure TImageForm.ViewBtnClick(Sender: TObject);
begin
  ViewForm.HorzScrollBar.Range := Image1.Picture.Width;
  ViewForm.VertScrollBar.Range := Image1.Picture.Height;
  ViewForm.Caption := Label1.Caption;
  ViewForm.Show;
end;

procedure TImageForm.StretchCheckClick(Sender: TObject);
begin
  Image1.Stretch := StretchCheck.Checked;
end;

procedure TImageForm.FileEditKeyPress(Sender: TObject; var Key: Char);
begin
  if Key = #13 then
  begin
    FileListBox1.ApplyFilePath(FileEdit.Text);
    Key := #0;
  end;
end;

end.




PROGRAMMING PARADIGMS


Goal-Centered User-Interface Design




Michael Swaine


Here's the problem: As programmers, we build programs function by function.
Users, however, want to use software in a way that enables them to achieve
their goals. It's an unfortunate fact that the view of software you get from
looking at it function by function is likely to be orthogonal to the view you
get from looking at it in terms of user goals.
Complicating the problem is the fact that the user's goals are probably not
what we think they are, and probably not even what the users say they are.
Here's the thesis: The user interface should be designed on the basis of the
user's goals, not on the basis of the internal functional structure of the
program. That's probably not too shocking a thesis, but it's rarely followed
with any real rigor. In fact, it's violated regularly, even in shrinkwrapped,
commercial software from established companies. It's easy, though, to state
the programmer's goals, or rather the user-interface designer's goals, that
are implied by this thesis:
Discern the user's goals.
Design the user interface so that the user can directly employ the program to
achieve those goals.
Easy to state, not so easy to achieve. But help has arrived.
The thesis has, over the past few years, motivated much of the user-interface
design work of Alan Cooper, including his design of the user interface for
Visual Basic. Cooper, a well-known user-interface consultant and software
designer, is the recognized "Father of Visual Basic" and the author of About
Face: The Essentials of User Interface Design (IDG Books, 1995, ISBN
1-56884-322-4), a book based on the concept of goal-centered design for user
interfaces. It's a book that should be read by every...
But let's hear who the author thinks should read it.


About Face


Authors' expressed ideas about who is likely to profit from reading their
books ought to be taken with a grain of salt. For one thing, many authors
honestly and fervently believe that everyone would be better off if they would
just commit the author's books to memory, or at least read a few choice lines
before breakfast every day. Not all authors are this self-assured, but even if
they aren't, their publishers are unlikely to let them include introductory
notes characterizing the potential market for their books so cautiously that
the bookstore browser gets the idea that the book really isn't for him unless
he was one of a dozen aerospace engineers working on a particular project at
NASA Ames Research Center in the late '80s.
Cooper's views on who should read his book are something different. And I'm
not just talking about his "Who should read this book" section in the
"Introduction."
Throughout the book, Cooper implicitly defines a new job classification--the
professional user-interface designer. This is not surprising when you consider
that he explicitly defines this new career path every day for his staff at
Cooper Software. These professional user-interface designers are the ideal
readers for the book. Ideal, but rare. Since there are about as many
professional user-interface designers today as there were aerospace engineers
working on a particular project at NASA Ames Research Center in the late '80s,
I'm sure his publisher is glad he didn't explicitly characterize his
readership that way.
Explicitly, he says that the book should be read by those programmers,
documentation writers, trainers, and technical-support people who are aware of
and concerned about the need for better software design. Personally, I think
that everyone should be concerned about it; or, to put it another way, I think
everyone would be better off if they would just commit the entirety of About
Face to memory.


Strategy and Tactics


Magazine art directors and automotive designers know that you can talk about
design at different levels. There's the large-vision level, where you lay out
broad strategies and concepts, and there's the specific-detail level, where
decisions are more tactical. About Face tackles user-interface design at both
levels. Cooper makes a point of presenting both strategic and tactical advice.
"There is no such thing," he says, "as an objectively good dialog box--the
quality depends on...who the user is and what his background and goals are."
What's good tactically is a function of strategic considerations.
Getting the result you want, Cooper says, requires that you maintain "a
strategic sensitivity for how users interact with specific software. This will
enable you to correctly choose the appropriate tactics to apply in a
particular situation."


The Master's Voice


This is a very readable book, primarily because it is so well thought out and
because the voice is so clear. Cooper makes strong assertions that, even if
you disagree with them, really focus attention on important design issues. I
find myself agreeing with most of them. A few examples: 
Of all the misconceptions to emerge from Xerox PARC, the global metaphor myth
is the most debilitating and unfortunate.
The Mac didn't succeed because of [the] metaphors...but because it was the
first computer that defined a tightly restricted vocabulary...for
communicating with users.
I believe that, with proper design, all error messages and confirmation
dialogs can be eliminated. I'll tell you how.
Today's Help menus are poorly designed reflections of poor help systems.
The canonical mode of the book, though, is imperative, not declarative. For
example:
Allow input wherever you output.
Never make the user ask to ask.
Never scroll text horizontally.
Debounce all drags.
And this one:
Don't make the user look stupid. In the course of this book we will examine
numerous ways in which existing software makes the user look stupid and
explore ways to avoid that trap.


A Real Pain to Use



And he does. Cooper uses real, commercial software in his examples. Since he
often seriously criticizes particular aspects of the programs' user-interface
design, this can be--er, interesting. Cooper can be brutal. 
To demonstrate the danger of bending your interface to fit a metaphor, he rips
apart the MagiCap interface from General Magic, which he calls, in only
apparent praise, "the acme of the expression of the metaphoric paradigm."
Nothing in the program is done without a thorough metaphoric
rationalization...All of the interaction has been subordinated to the
maintenance of these metaphors. I'm sure it was a lot of fun to design. I'll
bet it is a real pain to use.
He points out, for example, that when you want to call people, you must go
down the metaphorical street to the metaphorical phone-company building, and
metaphorically enter it. You must do this every time you want to call someone.
The metaphor is a mechanical-age model, and it imposes all the limitations of
the mechanical age on a device that is supposed to be an information-age
communicator. Is this progress, Cooper wonders.


Keep it by Your Keyboard


One of the things I like best about the book is Cooper's willingness to
neologize. He creates new terms whenever he needs them, and I'll let you read
for yourself his justification for doing so. 
I'll also let you read the other topics that the book covers and that are not
touched on in this brief account, including his "Irreverent History of
Rectangles on the Screen," "Overhead and Idiocy," "The Secret Weapon of
Interface Design," "The Meaning of Menus," "Dialog Box Etiquette," and other
entertaining and enlightening subjects.
Although Cooper tries to credit the originators of ideas, I feel compelled to
point out that Ted Nelson has been talking about the myth of metaphors and the
tragedy of files, in almost those words, for years.
Quibbles aside, About Face is a must-read for any software designer or anyone
interested in software design.


Blowin' in the Wind


The rest of this column is the second installment in my survey of programming
paradigms.
This survey began last month by looking at some implementations of languages
characterized by a common feature: They are not C. As it happens, they have a
couple of other things in common: The languages or some of their
implementations are in the public domain or are distributed as freeware or
shareware, and they are available on the Macintosh. If I live long enough and
get my Intel box back in action, I'll eventually move on to languages with
implementations on Windows, although most languages do have implementations on
several platforms. The ones I'm looking at now I've examined on the Mac
platform. This month continues the survey with a language that is definitely
not C and that does have free implementations. It's available on the Mac, but
not as available as had once been expected, and thereon hangs a tale. 
Dylan was to be Apple's effort to change the nature of programming languages.
Although it was never strictly an Apple project, Apple funded a team of
researchers at Cambridge and started promoting Apple Dylan. Then, last fall,
with profits down 40 percent, sales projections disastrously muffed,
PowerBooks bursting into flames, and a president in trouble (no, no, Apple's
president), Apple took decisive action. It canceled all funding for its Dylan
team. A banner headline at the Cambridge Website says it: "Apple: The power to
cancel your very best."


What was it You Wanted?


Just a few months ago, Apple was waxing rhapsodic about Dylan. Whatever you
wanted in a language, Dylan had it, or so Apple was saying. Here's what Apple
said right up to the day of canceling the project: 
Dylan is a new language developed at Apple. It is a bold new effort to create
a powerful, practical tool for writing mainstream commercial applications. We
believe it combines the best qualities of static languages with the best
qualities of dynamic languages.
Apple may still be saying that Dylan is a bold new effort, et cetera. It's
just a little too bold for the Apple of 1996, apparently. Dynamic languages,
of course, have the virtue of letting you program in an interactive style,
leading to rapid development and prototyping. They are typically more robust
against errors because they retain more information at run time. They
typically have automatic memory management. Their interactive style may be
implemented as an interpreter, a byte-code compiler, or an incremental native
compiler. They typically have dynamic type information. All this adds up to
code that's easy to read, write, and maintain.
The virtues of a static language, of course, are small, fast applications and
the fact that you had it hammered into your head in school to the extent that
you now naturally think in C--I mean, in a static language. You may have seen
demos of Dylan; I've been seeing them for some time now, and they are
impressive, at least when they're done by one of those professional Apple
demoers. Actually, even playing around with the toy implementations leaves
one--this one anyway--pretty impressed.
Dylan really is cool.
Dylan is actually an object-oriented, dynamic language--a language that has
automatic memory management, supports dynamic linking and incremental
development, and provides self-identifying objects, according to Joseph Dumas
and Paige Parsons, writing about it in Communications of the ACM, Vol. 38 #6,
June 1995.
Many of the promised benefits of Dylan are features of the proposed
development environment, like storing your project in a database rather than
in files, customizable browsers, and the ability to inspect objects in your
program while it's running. Dumas and Parsons point out that this is common:
The popularity of Visual Basic is probably not attributable to the love we all
harbor for the Basic language. Dylan has undergone a lot of change in response
to feedback in the past four years. By the time you read this, a definitive
reference specification known as the Dylan Reference Manual may be available
online. Or then again, it may not. It's not clear at this writing how sweeping
the effect of the Dylan cancellation will be. But the Dylan language design is
in the public domain.


I Shall be Released


All implementations of Dylan as of late 1995 were toy or experimental
implementations, strictly for exploration and hacking. Chief among these are
Marlais, an experimental Dylan interpreter written in C and available for
UNIX, Macintosh, and Windows; and Mindy, an experimental byte-code compiler
and interpreter written in C at Carnegie Mellon University and available for
UNIX and Macintosh.
I've played with MacMarlais, a port of the UNIX program Marlais to the
Macintosh, created by Patrick Beard. It implements a subset of the Dylan infix
syntax. MacMarlais uses multithreading to make the user interface responsive
even while the interpreter is running. It handles Apple Events, and you can
add code resources to it to extend its functionality. Marlais is available
from ftp://ftp.cis.ufl.edu/pub/Marlais/, and MacMarlais is available from
ftp://ftp.bdt.com/home/beard/. Mindy was developed by the Gwydion Project at
Carnegie Mellon University for internal use as a development tool while they
work on their "real" Dylan implementation.
Mindy is an acronym meaning "Mindy Is Not Dylan Yet", and as the name implies,
Mindy is incomplete. The Gwydion folk claim, though, that it implements most
of what they expect Dylan to include. Gwydion, their high-quality, integrated
Dylan development environment for UNIX, should be out this year. Check it out
at http://legend.gwydion.cs.cmu.edu:8001/gwydion/. Mindy is available by
anonymous ftp at legend.gwydion.cs.cmu.edu in the file
/afs/cs.cmu.edu/project/gwydion/release/mindy.tar.gz.
Also look for an implementation from Harlequin early this year. Harlequin has
been developing DylanWorks, a dynamic development environment and native
compiler for the Dylan language. When released, Harlequin expects it to
produce small, fast, commercial-quality applications. DylanWorks will
initially be released for Windows 95 and Windows NT, and will provide full
interoperability with OLE and Win32 functionality. You can find out about it
at http://www.harlequin.com/full/dylan.html. 
Whether or not Dylan survives Apple's dropping of its project, it seems likely
that we will see the best features of Dylan migrate into traditional language
compilers and development environments.
Dylan information is at this writing available from
ftp://ftp.cambridge.apple.com:/pub/dylan/. The Dylan Web page at Carnegie
Mellon University is http://legend.gwydion.cs.cmu.edu:8001/dylan/. The Dylan
newsgroup is comp.lang.dylan.





C PROGRAMMING


SD '95 East, ANSI C++, Framework Wars




Al Stevens


I spent a week last fall at Software Development '95 East in my hometown of
Washington, DC. The show was a big improvement over the previous year--more
attendees, more exhibitors, more parties. It's taken this long for the
east-coast show to catch on. It was in Boston for a while and then in DC
starting last year. I'm glad that DC worked out, because I like having an
excuse to return home every year.
It's fun to take Judy to these shows. As one of the speakers, I can usually
wangle a pass for her. They make a copy of my badge and substitute her name,
which identifies her as an editor for DDJ. That gets her all kinds of
attention from exhibitors who schmooze for favorable mention in this magazine.
Judy and I go our separate ways among the exhibitors. I pick my route based on
what's new for C/C++ developers: This booth is showing an interesting
programmer's editor, another has a new compiler version, and still others have
class libraries, debuggers, and so on. Judy plans her tour based on who's got
the freebies: A yo-yo over here, fly swatters over there, T-shirts, buttons,
ball-point pens, tote bags, an indoor boomerang. It's lucky we drove up from
Florida in the minivan.
On Thursday evening, we went to the Symantec product-announcement party. The
announcement presentation was a real yawner. The presenter got cute and tried
an unrehearsed demonstration involving a lot of ad hoc code on the overhead
screen. Members of the audience were calling out code corrections while the
presenter fumbled and hemmed and hawed his way through the exercise, which
never worked. Some of us slept through it. Everyone stayed, though, because
the free beer and food weren't available until the presentation was complete.
They sure know how to hold onto an audience of programmers.
At the party, Symantec gave away C++ compiler CD-ROMs, which did not interest
Judy. When they ran out of CD-ROMs, they gave away T-shirts and a promise to
mail a CD-ROM. Judy went for the T-shirt. They wouldn't give her one. Turns
out you needed a coupon to qualify, and she didn't have one. She puffed out
her DDJ editor's badge and tried to look important but to no avail. The
T-shirt official was immutable. She returned to the table empty-handed and
vowed never again to wear last year's Symantec T-shirt, which she treasures
because it was given to her personally by none other than Gene Wang. She said
that if Gene had been there this year, she would have gotten one of those new
T-shirts. It's not fun being around Judy when she's been snubbed this way.
Fortunately, they soon ran out of beer and food, too. "It figures," she said,
and we left along with everyone else. They were boxing up the remaining
T-shirts as we walked out. Judy looked the other way. "I should've worn my new
Borland T-shirt," she whispered.


Borland C++ 5.0


I spoke to one of the Borland C++ compiler developers about impending Version
5.0. It was a bit confusing, not because of the compiler, but because the
developer was the one I used to talk to at Symantec. He switched to Borland at
about the same time that Borland's chief compiler guy went to Microsoft. Gene
Wang used to work for Borland and jumped to Symantec a couple of years ago
amidst a flurry of ado about trade secrets and such. Before that, Borland's
Brad Silverberg moved to Microsoft. As a result of all these shifts, we had to
cancel our annual Benedict Arnold Loyalty award; the playing field got to be
too even.
Anyway, back to Borland C++ 5.0. The Borland developer told me that the new
compiler would have all the new ANSI features described in the April '95 draft
specification, which would once again put Borland C++ ahead of Microsoft
Visual C++ 4.0, which is missing most of the new stuff. I asked about support
for Microsoft Foundation Classes in the Borland compiler. He said that the
compiler would support but not distribute MFC, which means that if you can get
hold of the MFC source code (distributed with Visual C++), Borland C++ will
compile it. I expect that Borland's visual-development environment will work
with MFC, too, but don't quote me. I asked why Borland would not distribute
MFC as Watcom and Symantec do. The developer said that Microsoft refuses to
license the MFC technology to Borland. The two companies have been arch rivals
for years. Gates has been quoted as saying that he hates Philippe Kahn. Hates?
If I had umpty-ump billion dollars, I'd love everybody--even Philippe. Wake
up, Bill. Philippe went to another company. Borland has a significant slice of
the compiler market. It would be nothing but good for Microsoft if all those
programmers were using MFC.


SoftBoard


My favorite SD '95 East exhibit was not software at all, but a hardware
device, one of those obvious things that, as soon as you see it, you wonder
why you didn't think of it. SoftBoard from Microfield Graphics is a whiteboard
interfaced to a PC. At first you don't believe it. A conventional-looking
whiteboard sits on an easel next to the PC. Selecting a color dry-erase marker
from the tray, you draw a picture or write something on the whiteboard. As you
do, a Windows application in the PC, which has a white window the same shape
as the whiteboard, displays what you mark. You can save the picture as a
bitmap file.
How did they do it? First, the whiteboard is not conventional at all. It casts
laser scanners in two axes across the face of the whiteboard. Second, the
markers are not conventional either. Each marker has a band with a bar code
just above the tip. The scanners read the bar code and send the coordinates
and code to the PC through the serial port. The code identifies the color, and
the application does the rest.
This is not only cool, it has lots of potential. In my C and C++ training
classes, I often use the whiteboard to amplify a concept or answer a question.
The students hastily copy the charts into their notes because I'll be erasing
the picture to make room for the next explanation. How convenient, timesaving,
and accurate it would be if students could push a button and save those
illustrations electronically with all the information intact. This technology
also solves one of my objections about remote training across a network, where
the students have an audio/video window of the instructor but no way to
clearly see or capture the whiteboard presentations. SoftBoard is not cheap.
The smallest one is about $2800, but it solves several long-standing problems
associated with training and presentations. If they had been handing out
samples at the show, I'd have sent Judy to their booth right off.


JMcC's Keynote


The highlight of the show was the keynote address by Jim McCarthy, Microsoft's
product manager for Visual C++. He addressed the problems of schedules,
slippage, and shipping versions in large software projects. His observations
hit the mark every time. One of them (my favorite) describes his answer to
managers who find it unacceptable if a programmer does not know when something
is going to be finished. Jim says that it is just too bad. They can't hire
someone who knows; they can only hire someone who lies. He told the story of a
Visual C++ feature, originally estimated to take four months, then estimated
to take ten, causing it to miss the ship date for the next release. Jim wanted
that feature and sent his programmers away to think about it. Two days later
they came back. How long did they think it would take after giving it more
thought? Forget it, they said, it's completed. In the course of reconsidering
their estimate, they had just gone ahead and implemented the feature. His
point? Those are the best programmers in the business, and they don't know the
difference between a ten-month feature and a two-day feature. The practice of
estimating software development is a long way from being understood, much less
accurate.


ANSI C++ Innovations


The ANSI X3J16 committee published a draft specification in April 1995 for
public review. Presumably, it was meant for public comment, too, but they
failed to allow sufficient time for anyone to analyze the new language in any
detail. Furthermore, if my experience is typical, comments from the public are
stonewalled as soon as they are submitted. The committee wants to get on with
its work and apparently does not want to be bothered by public reaction. I
heard that the committee's position is that the public had sufficient time to
react prior to the publication of this document. True enough if you are on the
committee and privy to their deliberations. Those of us who are not had little
opportunity between the document's availability and the cutoff date for
comments for a comprehensive review.
It could be argued that anyone that interested should be on the committee, a
participant rather than a critic. It can be counterargued that commentators,
particularly in the press, must preserve the objectivity of the outsider
looking in. An activist forms emotional commitments to issues based on
not-always-equal quantities of ego and merit. Besides, I hate meetings.
The committee has made substantial changes in the area of templates and the
standard library. We are correct to raise concerns about how well those
changes have been tested and validated in so little time with no mature
compilers available to implement them. The public deserves a better shot at a
specification that is bound to change the way we work for the next several
years.
There are many small changes to the language that have minor consequences. The
mutable and typename keywords. The bool type. You can use them or ignore them.
There are many substantial changes, too, such as the namespace feature and the
specification of a standard exception hierarchy. These should not be ignored
because they solve serious deficiencies in traditional C and C++.
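To make those two substantial additions concrete, here is a small C++ sketch; the names and values in it are invented for illustration, and only namespace, bool, and the standard exception class come from the draft.

```cpp
// Hedged sketch: 'geometry', 'isAcute', and 'circumference' are invented;
// only 'namespace', 'bool', and std::invalid_argument are draft features.
#include <stdexcept>

namespace geometry {                 // namespace: walls off library names
    const double pi = 3.14159265358979;

    bool isAcute(double degrees)     // bool: now a built-in type
    {
        return degrees < 90.0;
    }

    double circumference(double radius)
    {
        if (radius < 0.0)            // the standard exception hierarchy
            throw std::invalid_argument("negative radius");  // gives errors a common base
        return 2.0 * pi * radius;
    }
}
```

A caller reaches the names through the qualifier, geometry::isAcute, instead of colliding with every other library's isAcute.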
But the committee wears two masks. If you suggest a change that they prefer
not to consider, they fall back on the time-worn argument that the change
breaks too much existing code, whether or not it does. But when they want to
make a change, regardless of its consequences to existing code, they just up
and make it. Well, maybe not in such a cavalier fashion, but they do it
nonetheless. This tendency is not rampant, but it occurs enough to make me
uncomfortable.


ifstream::read and ofstream::write


Case in point. I suggested that the ifstream::read and ofstream::write
functions be changed from using char* and int parameters to using void* and
size_t parameters. This would eliminate casts when the buffer argument was not
a char array or pointer and would permit a program to read and write buffers
greater in length than the limits of a signed integer (32K on a 16-bit int
implementation). I discussed this situation with Bjarne Stroustrup, who agreed
that the specification should change. He called it a bug. (Actually, we
discussed only the int versus size_t part of the issue.) I submitted the
recommendation to the committee. The answer came back that this change would
break too much existing code. Not understanding how, I asked for an example.
I'm still waiting for an answer. I suppose that an implementation that uses an
int wider than its size_t could have problems, but do such implementations
exist? I don't know. If I can allocate a record buffer with a size_t size
argument, why can't I read a record into that block with the same argument?
A look at the specification reveals why the committee might not want to
consider such a change. The code that would be broken is in the document
itself, not in programs out in the real world. Because virtually everything in
the standard library is now specified as a template implementation, the types
for those parameters are not what they used to be in traditional C++. They are
now specified with typedefs rather than intrinsic types. The buffer argument
is a pointer to char_type, which is almost impossible to interpret because
they left it out of the document's index. I found its typedef in a header file
in the document. It's a charT, which is also missing from the index, but which
turns out to be a template argument, so it could wind up being anything. The
read and write functions' size arguments are of type streamsize. The
streamsize type is typedefed as an INT_T, which is either int or wint_t
depending on what CHAR_T is. If anybody suggested to me that I change
something inside that mess as a volunteer and for no pay, I'd tell them
whatever it took to make them go away.


A New for Statement



Here's a change the committee made. The new specification for declarations in
the first expression of a for statement reveals an insidious problem. Example
1(a) is taken from the specification.
The specification states that "...the scope of the name(s) declared extends to
the end of the for-statement." This is quite different from how the C++
language works now.
A lot of existing code will be broken by this change. In fact, the committee's
example does not even compile with contemporary compilers because the
declaration, under current rules, is in the same scope as the for statement.
Example 1(b) shows the effective code and scope if you substitute an
independent declaration and a while loop.
Clearly, you could not declare another int i after the while loop because the
first i is still in scope. And here's where the ANSI specification
springs a leak. Earlier on the same page, the specification uses just such an
example to demonstrate the operation of the for statement. C'mon guys, which
way is it?
Example 1(c) shows a common idiom that uses the current rules to advantage.
Change the rule, and a lot of similar code no longer compiles. If there
happens to be another int i in an outer scope in Example 1(c), the program
compiles okay under the new rule, but it stops working properly.
Bjarne Stroustrup prefers the new rule and wishes he had specified it that way
originally. If he had, this issue would not exist today. He did not, however,
and now he has successfully campaigned to have the committee correct what he
perceives to be his earlier mistake. Over the years Bjarne's influence has led
the committee down many paths, mostly good. His championing of STL and the
template changes that STL needs, for example, was a major step forward for the
language. But this change is, in my opinion, a mistake. That's because: 1. I
really like the earlier usage; and 2. I don't think the change's benefits
outweigh its consequences.
Bjarne disagrees. The decision to effect this change was not made lightly.
According to him, the decision was debated and postponed for years because of
concerns for existing code, teaching materials, compilers, and so on. His
winning argument to the committee in favor of the change was:
You will have programmers complaining whatever you decide, but personally I
will feel better defending the cleaner and--based on experience--more
intuitive rule than apologizing for the old rule and explaining what code the
change would break.
So it is what it is, and for a time we will be in transition until all
compilers comply and all old programs are converted. During the interim I
would encourage compiler vendors to include a compile-time option, perhaps a
pragma, that retreats to the older rule. Better yet, a warning that alerts
programmers when the scope of a for declaration changes under the new rule.
Programmers should code to the new rule, but if an older program fails to work
properly with a new compiler, this new behavior might be the problem.
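Until the transition is complete, one way to sidestep the issue entirely is to declare the control variable before the for statement. The sketch below is mine, not the committee's; the function name is invented for illustration. It compiles and behaves identically under either scope rule.

```cpp
#include <cassert>

// Defensive idiom: declare the loop control variable outside the for
// statement, so the code means the same thing under the old rule (the
// declaration's scope extends past the loop) and the new ANSI rule
// (the scope ends with the for statement).
int find_first_negative(const int* a, int n)
{
    int i;                    // declared outside the for statement
    for (i = 0; i < n; i++)
        if (a[i] < 0)
            break;            // i remains valid after the loop
    return i;                 // == n if no negative entry was found
}
```

The same trick keeps the common post-loop test of Example 1(c) working under both rules.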


enum as a Type


I don't know if this next change is intentional or merely the result of one
compiler vendor's interpretation of an ambiguous specification. It turned up
when I ported the code in Example 2(a) from Visual C++ 2.0 to the beta Visual
C++ 4.0. Example 2(b) is the message the compiler returned to me.
According to Microsoft's interpretation of the ANSI specification, you have to
overload the arithmetic operators for enum types. The relational operators
still work the way they always did. Example 2(c) shows the overloaded operator
function for the postfix ++ operator. After that, you need the prefix ++
operator, the two decrement operators, the binary addition and subtraction
operators, and the op-equal addition and subtraction operators. Finally, all
those operators must be overloaded for any other enum types you define.
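For illustration, here is what the matching prefix ++ might look like for the same Color enum; under Microsoft's reading of the draft, it is one more operator you must write for every enum type you define.

```cpp
#include <cassert>

enum Color { red, green, blue };

// Hypothetical prefix ++ companion to the postfix operator shown in
// Example 2(c). Like the built-in behavior it replaces, it performs no
// range check past the last enumerator.
Color& operator++(Color& c)
{
    c = static_cast<Color>(static_cast<int>(c) + 1);
    return c;
}
```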
I pored through the ANSI draft specification, but I could not determine from
the document with any degree of certainty whether Microsoft's so-called new
behavior is proper or improper. Unless I missed something, which is possible
given the complexity of the document, this behavior is totally implementation
dependent, subject to the compiler builder's interpretation of the
specification. If that assumption is correct, the portability of the language
has been seriously compromised. If incorrect, and Microsoft is right, this
change to the way that enum variables work might make for a more type-safe
language, but it will certainly break a lot of existing code. If Microsoft is
wrong, the committee needs to tighten up the specification.


MFC versus OWL


In a recent column, I expressed my opinion that the Microsoft Foundation Class
(MFC) library is the best of the Windows program-framework class libraries.
That opinion generated more heated response than anything I've said since I
suggested that OS/2's installation procedure makes huge vacuum-cleaner noises.
I seem to know the way to the heart of a programming religious issue. Mostly I
heard from programmers who prefer Borland's Object Windows Library (OWL) over
MFC. OWL is, they say, more object oriented than MFC. Maybe. Is that
necessarily true and, if so, is it better? It's a matter of opinion. Does OWL
encapsulate more of the Windows API than MFC? Perhaps. It seems to. OWL also
seems to hide more of the API, making it more difficult to get at things
either not supported or encapsulated in a way that the programmer wishes to
override. These discussions, mostly online, went on for a time, devoting only
the first few messages to the relative merits of either class library. After
that, they concentrated on what constitutes a de facto standard, whether
Microsoft is force-feeding us something, and whether the trade press (me) is
being taken in by hype. More message bandwidth was given to Microsoft bashing
and defending than to the technical issues, which might point to the true
heart of the religious wars.
The two sides squared off and fired their volleys back and forth. It was a
friendly exchange, and I won't go further into the details of the battle.
Mostly I stood back to see which way it went.
It soon became evident that if you start with MFC, you like MFC; if you start
with OWL, you like OWL. Most of the commentators had not spent a lot of time
with both products, which is to be expected. You do not, in the course of
making a living, dedicate substantial time to two competing, mutually
exclusive tools. It is natural, then, to prefer the one you use. The
respondents seemed to believe that C programmers who already knew the Windows
C API preferred MFC, while C++ programmers with little or no exposure to
Windows programming preferred OWL. That was not my experience. I am in the
latter group and prefer MFC. But, having developed DOS application-framework
function and class libraries (D-Flat and D-Flat++), my experience and bias
might not be typical.
A recurring theme was that influential columnists are irresponsible when they
state their opinions so unequivocally. That position is taken only by those
who disagree with the opinions, of course. First, you overestimate my
influence. I could write until my mug was cerulean about my favorite
programmer's editor, and not one of you would switch. Second, what difference
does it make if a few people decide to use what I use? If it works for me, it
should work for them. Are programmers somehow diminished when their choice is
the different (or same) drummer? I don't think so.
I expected objections from the vendors of the competing class libraries, but
they didn't make a peep. Maybe they dismiss my opinions as the wild ranting of
an uncontrollable and out-of-control programmer. Maybe they're right. On the
other hand, maybe they secretly agree with me. I doubt it.
Those of you who cared enough to object got one thing right. My experience
with OWL is probably not current enough to simply say that MFC is better. I
like MFC better, so I use it, which accounts for my more current knowledge of
its capabilities. I did not intend to give the impression that other class
libraries are not worth a look.
Example 1: (a) New C++ for statement; (b) the effective code and scope if you
substitute an independent declaration and a while loop; (c) common idiom that
uses the current rules to advantage.
(a)
int i = 42;
int a[10];
for (int i = 0; i < 10; i++)
    a[i] = 1;
int j = i;   // j = 42

(b)
int i = 0;
while (i < 10) {
    a[i] = i;
    i++;
}

(c)
for (int i = 0; i < maxentries; i++) {
    // ...
    if ( ... )    // some loop-terminating condition
        break;
}
if (i < maxentries)
    // the loop was broken before reaching the end
Example 2: (a) Code being ported from Visual C++ 2.0 to the beta Visual C++
4.0; (b) message returned from the compiler; (c) the overloaded operator
function for the postfix ++ operator.
(a)
enum Color { red, green, blue };

int main()
{
    Color clr = red;
    clr++;
    return 0;
}

(b)
error C2676: binary '++' : 'enum Color' does not define this operator or a
conversion to a type acceptable to the predefined operator (new behavior;
please see help)

(c)
Color operator++(Color& c, int)
{
    Color temp = c;
    int cc = c;
    c = static_cast<Color>(++cc);
    return temp;
}
















ALGORITHM ALLEY


Multiple Encryption: Weighing Security and Performance




Burton S. Kaliski, Jr. and M.J.B. Robshaw


Burt and Matt are scientists at RSA Laboratories. They can be contacted at
burt@rsa.com and matt@rsa.com, respectively.


As time passes, encryption algorithms age and become vulnerable to new or
previously infeasible attacks. In particular, interest is often paid to the
size of the key used for encryption, since, in the absence of any other
weaknesses, this is one way to compare the strength of ciphers. Key lengths
once considered adequate are now vulnerable to adversaries with only modest
resources.
While old, well-trusted ciphers approach the end of their useful life, new
ciphers are continually being proposed. But new ciphers should be greeted with
extreme caution and deep skepticism. They should be subjected to prolonged
cryptanalysis (by people other than the designers) before they are considered
safe for use by the community at large. Several ciphers are currently earning
their credentials as part of this process; IDEA is perhaps the foremost among
them (see "The IDEA Encryption Algorithm," by Bruce Schneier, DDJ, December
1993).
To span this period of uncertainty, you should try to use trusted ciphers in
more secure ways. As a first step, it is natural to suppose that encrypting
twice is better than encrypting once; that the more times you encrypt data
which has already been encrypted (perhaps with a different cipher), the safer
you must be. 
For the most part, this might be true. However, you don't get something for
nothing. More encryption means more work, and you need to know exactly how
much extra security is gained from the additional work.
In this article, we will examine these issues, paying particular attention to
repeated encryption with the same cipher. We will then consider some issues
involved in the mixed use of different block ciphers.


Double Encryption


Triple encryption (encrypting three times) is mentioned frequently. But
whatever happened to double encryption? Destined to remain the perfect
tutorial example, double encryption increases security so little that the
extra effort cannot be justified. 
With double encryption, the same block cipher is used twice with two different
keys. If we denote block-cipher encryption of a message m under a key k1 by
E(m,k1), then double encryption might be written as E(E(m,k1),k2). One
argument for double encryption is that if each key is k bits long and k1 and
k2 are independent, then an exhaustive search for the keys requires 2^(2k)
operations.
But what if the opponent is willing to invest in memory? Very often, the time
required to mount an attack is reduced by memory; hence the "time-memory"
trade-off. In the case of double encryption, a brute-force attack requires 2^k
table entries, where a table entry consists of a key value and an intermediate
data value, and 2^(k+1) operations, where an operation is encryption under the
particular cipher. 
To mount this "meet-in-the-middle" attack, you encrypt some known plaintext m
under every possible key k1 and store a table with 2^k entries, where each
resulting intermediate block is indexed by the choice of first key. Then you
use every possible key k2 to decrypt the ciphertext; if there is a match in
the table, then you have a guess (k1*,k2*) for the pair of keys that were
actually used. (Figure 1 shows this attack.) The cryptanalyst might need
additional known-plaintext/ciphertext pairs to check that the guess is correct
(depending on block and key size, there may well be false alarms).
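To make the attack concrete, here is a sketch using a made-up 8-bit toy cipher; the cipher and all function names are invented purely for illustration, and a real target such as DES has a 2^56 key space rather than 2^8.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Invented 8-bit toy block cipher with an 8-bit key (illustration only).
uint8_t enc(uint8_t m, uint8_t k)
{
    uint8_t t = m ^ k;
    return static_cast<uint8_t>((t << 3) | (t >> 5)) ^ k;  // rotate left 3
}
uint8_t dec(uint8_t c, uint8_t k)
{
    uint8_t t = c ^ k;
    return static_cast<uint8_t>((t >> 3) | (t << 5)) ^ k;  // rotate right 3
}

// Meet-in-the-middle attack on double encryption E(E(m,k1),k2):
// table the forward half under every k1, then decrypt the ciphertext
// under every k2 and look for a meeting point in the middle.
std::pair<uint8_t, uint8_t>
meet_in_the_middle(uint8_t m1, uint8_t c1, uint8_t m2, uint8_t c2)
{
    std::map<uint8_t, std::vector<uint8_t>> table;  // E(m1,k1) -> k1 values
    for (int k1 = 0; k1 < 256; k1++)
        table[enc(m1, static_cast<uint8_t>(k1))]
            .push_back(static_cast<uint8_t>(k1));

    for (int k2 = 0; k2 < 256; k2++) {
        auto it = table.find(dec(c1, static_cast<uint8_t>(k2)));
        if (it == table.end())
            continue;
        for (uint8_t k1 : it->second)
            // A second known pair weeds out false alarms.
            if (enc(enc(m2, k1), static_cast<uint8_t>(k2)) == c2)
                return std::make_pair(k1, static_cast<uint8_t>(k2));
    }
    return std::make_pair(0, 0);  // no consistent key pair found
}
```

The recovered pair may differ from the keys actually used if the toy cipher has equivalent keys, but it encrypts both known plaintexts to the observed ciphertexts, which is all the cryptanalyst needs.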
While such an attack still seems to have large requirements--a table of size
2^k, for instance--if the attacker can choose the plaintext, only one table
must be compiled for all keys. (If the plaintext is only known rather than
chosen, then the table needs to be prepared for that particular plaintext.)
Thus, while encryption of each plaintext block requires twice as much work,
the cryptanalyst with resources to invest in memory is only faced with twice
the computational effort required for single encryption. This is certainly not
the 2^(2k) operations we might have expected.


Triple Encryption


Unlike double encryption, triple encryption can provide the kind of security
we want at a price we are willing to pay, and in several different ways.
In two-key triple encryption we can write the ciphertext equivalent of some
plaintext m as E(D(E(m,k1),k2),k1), where D(,k2) denotes decryption under key
k2. When k1=k2, this mode of triple encryption reduces to the single
encryption E(m,k1). This useful property is referred to as "backward
compatibility." For shorthand, let's call this version of triple encryption
with two distinct keys "EDE2"; obvious changes will be made to this notation
to accommodate other variants.
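The backward-compatibility property is easy to verify with a toy cipher; the 8-bit cipher below is invented purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Invented 8-bit toy cipher (illustration only).
uint8_t enc(uint8_t m, uint8_t k)
{
    uint8_t t = m ^ k;
    return static_cast<uint8_t>((t << 3) | (t >> 5)) ^ k;  // rotate left 3
}
uint8_t dec(uint8_t c, uint8_t k)
{
    uint8_t t = c ^ k;
    return static_cast<uint8_t>((t >> 3) | (t << 5)) ^ k;  // rotate right 3
}

// Two-key triple encryption, EDE2: E(D(E(m,k1),k2),k1).
// When k1 == k2 the inner D undoes the inner E, so EDE2 collapses to
// single encryption under k1 -- the backward-compatibility property.
uint8_t ede2(uint8_t m, uint8_t k1, uint8_t k2)
{
    return enc(dec(enc(m, k1), k2), k1);
}
```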
Two-key triple encryption is vulnerable to some forms of attack. While the
attacks we will describe require large amounts of time and memory, they show
that two-key triple encryption does not provide as much security as we might
have hoped. Nevertheless, two-key triple encryption is used quite widely in
banking; we are not suggesting any practical problem with two-key triple
encryption, but there are benefits in using three keys.
In their paper "On the Security of Multiple Encryption" (Communications of the
ACM, July 1981), R.C. Merkle and M.E. Hellman showed how two-key triple
encryption is vulnerable to a chosen-plaintext attack that requires the
encryption of 2^k chosen plaintexts, 2^k words of memory, and about 2^(k+1)
operations. This is essentially the same work effort required to tackle double
encryption, though the plaintext requirements have been vastly increased.
Merkle and Hellman were the first to point out the essential impracticality of
their own attack, but they referred to the attack as a "certificational
weakness" in two-key triple encryption.
This somewhat prophetic view was later upheld. In the paper "A Known-plaintext
Attack on Two-key Triple Encryption" (Advances in Cryptology: Eurocrypt '90,
Springer-Verlag, 1991), P.C. van Oorschot and M.J. Wiener described a
known-plaintext attack that requires 2^(120)/n operations when given n known
plaintext blocks, and memory requirements that grow linearly as n increases.
While this attack may also be considered impractical, anyone using two-key
triple encryption must consider it seriously.
Instead, it is recommended that three independent keys be used. Thus, we can
write the ciphertext equivalent to some plaintext encrypted using three-key
triple encryption as E(D(E(m,k1),k2),k3), where backward compatibility is
provided by putting k3=k2 or k1=k2. 
Three-key triple encryption is still vulnerable to a meet-in-the-middle attack
requiring 2^(2k) words of memory and about 2^(k+1) operations. However, for any
cipher with a reasonably sized key, this brute-force attack will not be a
substantial threat. 
All these brute-force attacks are applicable, at least to some degree, to any
cipher. However, there are some technical issues to resolve in adapting the
attack. These attacks were devised with DES in mind, and they make assumptions
about key size compared to block size. The sizes can vary considerably between
different ciphers.
Finally, one potential property of a block cipher that would jeopardize the
security of triple and even higher multiples of encryption is the possibility
that the underlying block cipher generates a "group." If so, then for each set
of keys k1, k2, and k3, there would be a fourth key k4, such that
E(D(E(m,k1),k2),k3)=E(m,k4). Clearly, this would be disastrous. Unfortunately,
for most ciphers it's still unknown whether or not the cipher forms a group.
DES, however, is not a group, and triple encryption is not vulnerable to
attack in this way (see the accompanying text box, "The Data Encryption
Standard").


Modes of Triple Encryption


So far, we have only considered the triple-encryption cipher as an "electronic
code book," where each plaintext block is encrypted independently of the
others. Previously, triple encryption has been used to encrypt valuable
information such as other encryption keys (see American National Standard
X9.17: Financial Institution Key Management, 1985), and this information can
often be accommodated within a single block.
For longer messages, however, it is customary to "chain" the series of
encryptions in a mode known as "cipher block chaining" (CBC). With
single-encryption CBC, the previous ciphertext is XORed with the plaintext
prior to block-cipher encryption.
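Single-encryption CBC can be sketched as follows, again with an invented 8-bit toy cipher standing in for the real block cipher; the function names are mine.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Invented 8-bit toy cipher (illustration only).
uint8_t enc(uint8_t m, uint8_t k)
{
    uint8_t t = m ^ k;
    return static_cast<uint8_t>((t << 3) | (t >> 5)) ^ k;
}
uint8_t dec(uint8_t c, uint8_t k)
{
    uint8_t t = c ^ k;
    return static_cast<uint8_t>((t >> 3) | (t << 5)) ^ k;
}

// CBC mode: each plaintext block is XORed with the previous ciphertext
// block (an initialization vector for the first) before encryption.
std::vector<uint8_t> cbc_encrypt(const std::vector<uint8_t>& pt,
                                 uint8_t k, uint8_t iv)
{
    std::vector<uint8_t> ct;
    uint8_t prev = iv;
    for (uint8_t m : pt) {
        prev = enc(m ^ prev, k);
        ct.push_back(prev);
    }
    return ct;
}

std::vector<uint8_t> cbc_decrypt(const std::vector<uint8_t>& ct,
                                 uint8_t k, uint8_t iv)
{
    std::vector<uint8_t> pt;
    uint8_t prev = iv;
    for (uint8_t c : ct) {
        pt.push_back(dec(c, k) ^ prev);
        prev = c;
    }
    return pt;
}
```

A side benefit of chaining is that repeated plaintext blocks no longer produce repeated ciphertext blocks, as they do in electronic-code-book mode.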
With triple encryption, we can use one of two CBC modes (see Figure 2):
Inner-CBC, where each individual component is used in CBC mode. 
Outer-CBC, where the three stages of encryption are considered a single unit
and the ciphertext is fed back to the plaintext.
Before considering the security issues, let's examine the performance of these
CBC modes.
All triple-encryption modes obviously require more resources than single
encryption: more hardware or more time. But it would be nice if, given
sufficient hardware, the throughput were the same as for single encryption.
(For software implementations, the different triple-encryption modes all take
roughly the same amount of time.)
Throughput is the main performance distinction between the inner- and
outer-CBC modes. In the inner-CBC modes, feedback is around one operation.
Three chips, each implementing a single cipher operation, can be kept busy
continually, each feeding back to itself, so the throughput is the same as for
single encryption. (Latency, the amount of time it takes to process a
particular block, is nevertheless three times longer than in single
encryption.)
In outer-CBC, feedback is around all three operations. This means that even
with three chips, the throughput is only one third that of single encryption;
the first chip must wait until the other two are done to move on to the next
block. Of course, the chip can process blocks from other messages during its
idle time. By interleaving the processing of three messages, the chips can be
kept busy all the time. This changes the encryption protocol at a higher
level, but may be appropriate for some applications.

With three encryption chips then, inner-CBC is intrinsically more efficient
than outer-CBC, unless three messages are interleaved for the latter.
It appears that inner-CBC offers more security than outer-CBC, since the
values of the internal feedbacks are never revealed. In an EDE configuration
the usual feedback around the middle decryption unit actually becomes a
feedforward. This hinders meet-in-the-middle attacks, and inner-CBC appears to
offer considerable security against brute-force attacks. 
The use of three keys is obviously beneficial, and any additional operational
overhead it incurs is insignificant. So, three-key inner-CBC EDE seems to be
the mode of choice against brute-force attacks.
But in "Cryptanalysis of Multiple Modes of Operation," Advances in Cryptology:
Asiacrypt '94 (Springer-Verlag, 1995), E. Biham showed that inner-CBC modes
are vulnerable to a class of sophisticated attacks. The trouble is that the
use of feedback around each unit in triple encryption allows controlled
changes in the ciphertext to be propagated as changes to the internal and
unseen data. 
By controlling these changes in the ciphertext, powerful differential and
linear cryptanalysis techniques can be used on individual components (see
Differential Cryptanalysis of the Data Encryption Standard, by E. Biham and A.
Shamir, Springer-Verlag, 1993). Depending on the exact form of triple
encryption--two- or three-key, EDE, or EEE--different attacks can be mounted. 
In essence, Biham has demonstrated that the use of feedback in the internals
of an encipherment procedure is dangerous. It is better to use one substantial
encryption transformation and to employ any feedback around its entirety,
rather than to consider the encryption as a succession of small, and
essentially less secure, transformations, each with its own feedback.
When using triple encryption with chaining, we recommend three-key, outer-CBC
triple encryption with an EDE configuration for backward compatibility.


Mixing Ciphers


One frequent suggestion is to mix the use of different ciphers. Perhaps we
should encrypt the plaintext with one cipher, the result with a second,
incompatible cipher, and so on. This might well be the basis of a strong
design, although the difficulty and cost of implementation must be considered,
and potentially intricate solutions to backward compatibility must be
provided. 
Of course, this more-general question could have been the theme of this entire
article; after all, multiple encryption with a single cipher is merely one
restricted case where all the ciphers are the same. But more-practical issues
have influenced our emphasis. Multiple encryption with the same cipher is
standardized and widely used, whereas the use of different ciphers remains, at
best, merely a proposal. In "Cascade Ciphers: The Importance of Being First,"
Proceedings of the IEEE Symposium on Information Theory, 1990, U.M. Maurer and
J.L. Massey demonstrated that a cascade of ciphers, where the key used at each
stage is chosen independently of the others, is at least as strong as its
first component. This property is often casually referred to as "the
importance of being first." When the chain of ciphers can be equivalently
rewritten with any of the ciphers first (technically, the ciphers are said to
"commute"), then the cascade is as strong as its strongest component.
Others have addressed related questions, and it seems reasonable to suppose
that the successive use of different and incompatible ciphers can only provide
additional protection against more-sophisticated methods of cryptanalysis. The
basic brute-force and meet-in-the-middle attacks can always be mounted,
whatever the cipher, in one way or another. (There are some technical issues
to resolve when the key size either differs between the ciphers or becomes
larger than the block size.) But there appears to be some reluctance in
establishing a precedent; as yet, no one has offered a good choice of ciphers
or basic results on the security of a particular mix.


Less-Computation-Intensive Alternatives


As we add encryption operations to the chain of multiple encryptions, we
increase the work required to encrypt a block of data by another factor. Is
there some less-computation-intensive way to increase the security of our
basic block cipher without resorting to multiple encryptions?
With regard to DES, there have been a few proposals that might easily be used
on other block ciphers. In one variant, which we call "DES-SES" (see "Foiling
an Exhaustive Key-search Attack," by F. Rubin, Cryptologia, April 1987), a
secret substitution operation S is used on the plaintext before encryption
using single DES. We'll focus on DES-XEX, another variant where an XOR is used
both before and after single encryption with DES; hence the -XEX. Thus, the
plaintext m would be encrypted as E(m XOR s1, k1) XOR s2, where k1 is the usual
secret DES key and s1 and s2 are secret 64-bit quantities which may be the
same (DES-XEX2) or different (DES-XEX3).
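With the same kind of invented 8-bit toy cipher standing in for single DES, the XEX construction amounts to just two extra XORs per block.

```cpp
#include <cassert>
#include <cstdint>

// Invented 8-bit toy cipher standing in for single DES (illustration only).
uint8_t enc(uint8_t m, uint8_t k)
{
    uint8_t t = m ^ k;
    return static_cast<uint8_t>((t << 3) | (t >> 5)) ^ k;
}
uint8_t dec(uint8_t c, uint8_t k)
{
    uint8_t t = c ^ k;
    return static_cast<uint8_t>((t >> 3) | (t << 5)) ^ k;
}

// XEX: XOR a secret value in before encryption, a second one out after:
// E(m XOR s1, k) XOR s2. Decryption peels the layers in reverse order.
uint8_t xex_encrypt(uint8_t m, uint8_t k, uint8_t s1, uint8_t s2)
{
    return enc(m ^ s1, k) ^ s2;
}
uint8_t xex_decrypt(uint8_t c, uint8_t k, uint8_t s1, uint8_t s2)
{
    return dec(c ^ s2, k) ^ s1;
}
```

Note that with s1 = s2 = 0 the construction reduces to single encryption, which gives backward compatibility for free.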
Against brute-force attacks, both DES-XEX2 and DES-XEX3 offer a marked
improvement over single-encryption DES. The effort required to attack DES-XEX2
is 2^(120)/n operations with n known plaintexts; to attack DES-XEX3, 2^121 DES
operations are required when three plaintext blocks are known. 
The additional protection offered against differential and linear
cryptanalysis is not as dramatic but still significant; DES-XEX2 or DES-XEX3
offer the same protection as DES with so-called independent subkeys. Such
attacks, while impractical (they require more than 2^61 chosen plaintexts for
differential and 2^60 known plaintexts for linear cryptanalysis; see Biham's
"Cryptanalysis of Multiple Modes of Operation"), offer only a small
improvement over the requirements for single-DES.
But DES-XEX is cheap--only an additional two XORs are required to encrypt each
plaintext block. DES-XEX or similar variants may offer a very realistic
balance between increased encryption effort and increased security.
As an exercise, we pose the following question: Is DES-EXE--that is,
E(E(m,k1)XORs1,k2), where k1 and k2 are independent DES keys and s1 is a
secret value--stronger or weaker than DES-XEX?


Final Thoughts


While there may be significant advantages in using several types of ciphers,
the approach is essentially untried and has significant operational
implications. Instead, it is most commonly proposed to use multiple iterations
of the same cipher, often DES. In particular, three-key triple-DES in EDE mode
is just starting to be widely accepted for the encryption of data streams. 
The use of triple encryption raises an interesting question: Surely we need
something stronger than triple encryption to transmit the keys used for triple
encryption?
When considering triple-DES in a system, perhaps you should examine higher
multiples of DES encryption. In quintuple-DES, if a three-key variant is used,
then E(D(E(D(E(m,k1),k2),k3),k2),k1) would be among the strongest key
sequences against meet-in-the-middle attacks. Backward compatibility with
triple-DES is provided by putting k2=k3, and backward compatibility with
single-DES is provided by putting k1=k2=k3. Of course, a five-key variant
would have obvious additional advantages.
Whatever the future with DES, the issue of multiple encryption will have to be
considered for the block cipher of the day. We now have a sufficient
understanding of current cryptanalytic techniques to provide a fairly accurate
estimate for the additional security provided in return for the increase in
work effort and key size.
The Data Encryption Standard
Designed in the early 1970s, the Data Encryption Standard (DES) is, in the
1990s, still arguably the world's most trusted block cipher, and with good
reason. Its resistance to all published cryptanalytic attacks bears continuing
testament to the exceptional quality of its design. In the absence of any new
major cryptanalytic breakthrough, the declining security offered by DES is
essentially a symptom of technological aging--as a basic, primitive building
block, it can still be cryptographically useful.
DES is a block cipher that transforms plaintext blocks of 64 bits to 64 bits
of ciphertext. The transformation is controlled by a 56-bit key. DES is an
iterated cipher consisting of 16 rounds, each using a 48-bit subkey derived
from the 56-bit key chosen by the user. Sometimes a variant of DES is
described as using independent subkeys. In this case, the subkeys used in each
of the 16 rounds are unrelated, rather than being derived from the same
user-provided 56-bit quantity.
Apart from a potential halving of the search space gained by using the
so-called complementation property (this results from the way the subkey is
used in each round), no known shortcuts allow a reduction in the complexity of
a brute-force attack. Compared to some more-recent variants, DES is remarkably
resilient against sophisticated attacks such as differential and linear
cryptanalysis. The best differential attack requires 2^47 chosen plaintexts,
and the best linear cryptanalytic attack requires 2^43 known plaintexts. In
fact, in an experiment lasting 50 days, 12 workstations were used to recover
the key used for the DES encryption of 2^43 known plaintexts (see "The First
Experimental Cryptanalysis of the Data Encryption Standard," by M. Matsui,
Advances in
Cryptology: Crypto '94, Springer-Verlag, 1994). This attack is impractical,
however; 40 of the 50 days were spent just encrypting the data.
More significantly, in a Crypto '93 Rump Session entitled "Efficient DES Key
Search," M.J. Wiener estimated that for $1 million, a machine can be built
which would search exhaustively for the key used in any DES encryption. It is
estimated that this machine could find the correct key in an average time of
3.5 hours. 
While DES is still practically secure to those without $1 million to invest,
it is time to move to stronger methods of encryption, be they new ciphers or
multiple iterations of an old favorite.
--B.S.K. & M.J.B.R.
Figure 1: Double encryption using two independent keys and a
meet-in-the-middle attack.
Figure 2: Two different ways of implementing the CBC mode of triple
encryption.



















PROGRAMMER'S BOOKSHELF


Writing Localized Software




Charles Pfefferkorn


Charlie is an independent consultant and the chair of the Software Forum's
International Software group. He can be contacted at charlie@crystal-media.com
or 73234.2154@compuserve.com.


International markets for software are big and getting bigger. Companies like
Microsoft, in fact, earn over half of their revenues outside the United
States. At one time, international markets were happy receiving the previous
version of newly released U.S. software. Today, however, these markets want
the latest version now. The participants in these markets read the most recent
editions of U.S. computer magazines and actively search the Internet for the
most up-to-date information. As a result, U.S. companies are forced to reduce
the time between U.S. and localized release dates, and many are releasing them
simultaneously. To participate in the international market, you need to
understand both its business and technical aspects.
The three books I'll examine here will help you design and develop
international software. While there is some overlap, they focus on different
aspects of the process. Software Internationalization and Localization: An
Introduction provides an overview, covering several platforms and providing
information about International Standards and various business issues.
Understanding Japanese Information Processing focuses on processing Japanese
text. It includes C code for converting between various Japanese character-set
encoding methods and special functions for repairing Japanese text damaged by
e-mail programs and newsgroup readers. It also provides access to online
information about Japanese. The third book is Developing International
Software for Windows 95 and Windows NT. It includes code samples, tables,
figures, checklists, and troubleshooting guides. All three books provide
glossaries, references to additional documents, and numerous appendices.


Software Localization


Software Internationalization and Localization, by Emmanuel Uren, Robert
Howard, and Tiziana Perinotti, discusses the creation of products for
international markets. The book lists over 40 separate engineering issues,
including different languages, character sets, writing systems, currencies,
currency formats, measurement systems, number formats, calendars, date
formats, standards, legal systems, and cultures. 
Western European languages, for instance, use diacritical characters and
additional non-English letters. Eastern European languages include Cyrillic
script and Greek letters. Asian languages use thousands of ideographic
characters derived from traditional Chinese characters. Arabic and Hebrew use
a bidirectional writing system when English words are included. Different
languages have different capitalization, hyphenation, spelling, and grammar
rules, and imply different typography. Even number formats vary: In the U.S.,
the decimal separator is a period and the thousand separator is a comma. In
most Western European nations, it is the opposite. A less obvious difference
is numeric rounding rules. Even colors, symbols, and sounds (like those of
emergency vehicles and telephones) vary from culture to culture.
The book also describes issues associated with translation. Translated text is
often longer than the original. Words that are different in one language may
translate into the same word in another.
Software Internationalization focuses on using IBM PCs with either DOS or
Windows 3.x, but the book also provides UNIX and Macintosh information. The
technical information is descriptive and references are included; source code,
however, is not, and some of the technical information is becoming dated.
The book also includes a chapter on non-Western European languages and a
chapter on International Standards and International Standards Organizations.
Software Internationalization concludes with a discussion of international
business issues: development models, business relationships, distribution
channels, legal issues, logistics, government regulations, customs duties and
taxes, repatriating funds, and the cost of doing international business. There
is also a chapter on developing products in Europe and marketing them in the
United States.


Japanese Information Processing


Japanese text is written using four types of characters: romaji (Roman
characters), hiragana, katakana, and kanji. Romaji includes the standard
English alphabet and numerals. Hiragana and katakana are syllabaries for
Japanese sounds. Hiragana is used for grammatical words, inflectional endings
for verbs and adjectives, and some nouns. Katakana is used for words of
foreign origin and for emphasis. Kanji includes the characters borrowed from
the Chinese over 1500 years ago.
In Understanding Japanese Information Processing, Ken Lunde carefully
describes the evolution of Japanese character-set standards and their
relationship to ISO character-set standards. The primary Japanese character
set standards are JIS X 0208-1990 and JIS X 0212-1990. JIS X 0208-1990 contains
6879 characters, of which 6355 are kanji, divided into two groups: 2965 in
Level 1 and 3390 in Level 2. JIS X 0212-1990 contains 6067 characters, of
which 5801 are supplemental kanji.
Lunde also describes other Asian language standards and international
character sets including ISO 10646 and its subset, Unicode. In Unicode,
121,403 characters of Chinese origin (Chinese, Japanese, and Korean) are
mapped into 20,902 unique characters using Han Unification rules.
Separate from, but related to, the character sets are the encoding methods. The
three major Japanese encoding methods are JIS, Shift-JIS, and EUC. JIS is a
modal system for encoding various character sets, including JIS X 0208-1990
and JIS X 0212-1990. It is used primarily for passing information between
computing systems. Shift-JIS is a nonmodal modification developed by Microsoft
and used by many other platforms, including Japanese PCs and KanjiTalk (the
Japanese Macintosh OS). Shift-JIS supports faster internal processing, but
cannot encode the JIS X 0212 supplemental kanji. EUC (Extended UNIX Code) is
the internal coding system used by most UNIX workstations and is defined by
ISO 2022-1993. The appendices of Lunde's book also include information about
Japanese corporate character sets and encoding methods.
Since the major Asian character sets are extremely large, entering characters
is difficult. While kanji tablets with thousands of keys exist, other input
methods for Asian languages have been developed that use combinations of
software and hardware. Lunde examines these options and describes typography
issues.
For some developers, the most important part of the book will be the
algorithms (presented in C) for converting between different encodings,
handling text streams, automatically detecting the Japanese encoding used for
a text file, and repairing JIS-encoded files. These algorithms are included in
a set of tools that the author provides via the Internet.
Lunde devotes an entire chapter to Japanese text-processing tools, including
operating systems, text editors, word processors, page-layout software, online
dictionaries, machine-translation software, and terminal software. The chapter
on using Japanese e-mail and newsgroups includes advice on how to repair files
damaged by network mail programs and newsgroup readers. In the appendices, he
lists professional organizations, mailing lists, and FTP sites for additional
software and documents.


Developing International Software 


Developing International Software for Windows 95 and Windows NT, by Nadine
Kano, focuses on developing international software on Windows 95 and Windows
NT. The early chapters discuss general issues associated with
internationalizing and localizing software. Kano stresses the importance of
planning and having written specifications that define localization
requirements. She also describes Microsoft's experience in developing
international software using a single team for both the domestic and
international versions. Finally, Kano discusses the trade-offs Microsoft made
in developing Windows 95.
Other issues covered include designing an international user interface,
researching legal issues, setting up a development environment, testing,
assisting translators, and coding practices.
Chapter 3 covers encoding character sets. Windows 95 uses a code-page model.
For Japanese, Windows 95 uses code page 932, a Shift-JIS encoding; Windows NT
uses Unicode. To produce a single code base for both Windows 95 and Windows
NT, you must use generic prototypes and compiler switches. Win32 API functions
that take string parameters have two entry points: one for traditional (ANSI)
strings and one for Unicode strings.
To localize the user interface, use resource files to define pictures,
strings, messages, menus, dialog boxes, and version information. Chapter 4
describes how to organize these resources and link them to your source code.
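As a hypothetical illustration (the identifiers below are invented, not taken from Kano's book), a string table in a resource script keeps every user-visible string out of the C source; the translator edits only the .rc file, and the code refers to strings solely by identifier:

```
// Hypothetical excerpt from an application's .rc file. A localized
// build substitutes a translated copy of this file; the C source
// loads the strings by their IDS_* identifiers via LoadString().
STRINGTABLE
BEGIN
    IDS_APP_TITLE      "My Application"
    IDS_FILE_NOT_FOUND "The file could not be found."
END
```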
Chapter 5 describes how to use Microsoft Win32 NLSAPI to support linguistic
and cultural conventions such as date, time, calendar, number, and currency
formats. This API also provides sorting and character-type information. Like
the rest of the Win32 API, NLSAPI exists in two forms (-A APIs and -W APIs). On
Windows NT you can use either form, but on Windows 95 you can only use the -A
forms.
Chapter 6 covers multilingual input, fonts, and multilingual text layout.
Chapter 7 covers processing of Far Eastern writing systems (Chinese, Japanese,
and Korean), including the use of Input Method Editors (IMEs) supported by
Windows NT and Windows 95. On Windows NT 3.5, the interface to the IMEs
depends on the target language. A unified API is provided by Windows NT 3.51
and Windows 95.
While many coding examples are included, you will still need to use other
Microsoft reference materials, including reference manuals for Windows NT and
Windows 95 and the appropriate SDKs.


Conclusion


Both Software Internationalization and Localization and Developing
International Software for Windows 95 and Windows NT cover the basics of
developing international products. Windows 95 and Windows NT developers will
prefer the latter. Developers using other platforms will probably prefer the
former, as will those interested in an introduction to the business aspects of
developing international software. Both books are excellent.
Ken Lunde's Understanding Japanese Information Processing is an essential
reference book for developers processing Japanese text. It will also appeal to
individuals interested in the Japanese language.

Software Internationalization and Localization: An Introduction
Emmanuel Uren, Robert Howard, and Tiziana Perinotti
Van Nostrand Reinhold, 1992; 300 pp., $39.95
ISBN 0-442-01498-8
Understanding Japanese Information Processing
Ken Lunde
O'Reilly & Associates, 1993; 470 pp., $29.95
ISBN 1-56592-043-0
Developing International Software for Windows 95 and Windows NT
Nadine Kano
Microsoft Press, 1995; 800 pp., $35.00
ISBN 1-55615-840-8



















































SWAINE'S FLAMES


Yellow Dog's New Trick


I hadn't seen my cousin Corbett in months, so when he dropped by one afternoon
last October, I wondered what he had been up to.
"You are looking," he told me as he poured himself a glass of lemonade, "at
the president and chief executive officer of Yellow Dog Enterprises."
I tried to be polite. "Well, the 'Enterprises' part sounds impressive."
"It's accurate, too," he said. He'd managed to find the cornbread I'd hidden
when I saw him drive up, and was pouring honey on a large slab of it. "Yellow
Dog Enterprises is a conglomerate of a growing number of companies. I'm the
president of each one of them."
"Yellow dog, yellow dog. Doesn't that have something to do with labor unions?"
"Got any cheese? Oh, here it is, behind the spice rack. Yeah, yellow-dog
contracts were employment contracts in which the prospective employee agreed
not to join a union. They're illegal now."
As we walked out onto the deck I asked him why on earth he'd chosen that name.
He sat down next to the grinning butternut squash I'd carved into a
jack-o'-lantern and gave me a toothy grin. "In a one-person shop," he said,
"you can work your employees till they drop, pay them peanuts, and give them
zero benefits, since they are all actually you. You can, in effect, enforce a
yellow-dog contract on your work force. Do you see what an enormous advantage
this gives the one-person shop? It's golden. And Yellow Dog Enterprises is a
one-person shop. I'm it."
A passing breeze shook an ecru flurry of leaves down on us and reminded me how
I'd tried to get Corbett to help me rake leaves last fall. "But Corbett," I
said, perhaps somewhat tactlessly, "you hate to work. How can you possibly be
the entire work force for several companies?"
He dodged. "Oh, I've got a trick or two." He was willing, though, to talk
about the enterprises that made up Yellow Dog.
"My company Stochastic Astrologer runs a web site that generates random
horoscopes--a cross between astrology and the I Ching. The word 'stochastic'
is hot right now. And PAMCO produces PAM, a personal-activity manager. It's
designed to organize the most common daily activities, based on my research on
what people really do during the course of a week. You'd be surprised."
"So, surprise me."
"Better yet, why don't you write down your activities for a week and we'll
check how well PAM matches them. Oh, then there's the Yellow Dog Pages web
site. It's a bunch of e-mail addresses I lifted from discussion groups. The
'dog' part is so we don't get in trouble about Yellow Pages, but you can also
'dog-ear' any page you want to come back to. Cute, huh?"
"You're going to get into a heap of trouble posting people's e-mail addresses.
Anything else?"
"Just the Yellow Dog Holding Company."
"Let me guess: a kennel?"
"No, it's a dog-walking service. I've found it an excellent business strategy
to offer superfluous services that people could obviously provide for
themselves. That way you know that your customers are all people with way too
much disposable income."
"I suppose so, but how much of it are they going to spend on dog walking,
Corbett?"
"No, no. You don't get it. The dog-walking service is just to flush the
pigeons. Once I've walked their Weimaraners, I start pitching my PAM. If these
poor Yups and Dinks don't even have the time to walk their own dogs, their
ears are bound to perk up when I tell them about my magic program to organize
their lives."
I flicked a banana slug off the deck railing and mused.
I didn't muse enough, though, because it didn't cross my mind until later that
his PAM program wasn't going to have any list of daily activities until I
finished my list. He had bamboozled me into doing his research for him. I was
beginning to see how you could run a company with no paid staff.
You can usually find Mike hiding from Corbett at Stately Swaine Manor
(mswaine@cruzio.com) or at his home page, Swaine's World, at
http://www2.cruzio.com/personal/mswaine.html.
Michael Swaine, editor-at-large

































OF INTEREST
WinBatch 95 from Wilson WindowWare is a batch language for Windows 95 and
Windows NT developers and system managers. The batch language helps automate
procedures: launching batch scripts and apps from the taskbar, keystrokes,
and menus; calling DLLs; constructing custom dialogs and menus; manipulating
data and transferring it between apps; and more. A single-user version
of WinBatch 95 sells for $99.95. For LAN environments, WinBatch+Compiler 95
(which includes a compiler that creates royalty-free executables) sells for
$495.00. 
Wilson WindowWare
2701 California Avenue SW, Suite 212
Seattle, WA 98116
206-938-2734
morriew@windowware.com
Greenleaf Software has released Version 2.0 of its ArchiveLib multiplatform
data-compression library for C, C++, and Visual Basic developers. Among other
features, this version supports PKZIP 2.0 and OS/2. ArchiveLib is designed to
handle compression and archival of a file or directories of files. You can
also use it to compress and archive buffers of data within applications
without having to store them as a file. Retrieved data can be written to
either a disk file or a memory buffer. ArchiveLib 2.0 sells for $279.00.
Greenleaf Software
16479 Dallas Parkway, Suite 570
Dallas, TX 75248
214-248-2561
http://www.gleaf.com/~gleaf
VisualRPC from CrossLogic is a tool for VisualAge programmers that
automatically embeds remote procedure calls that request, monitor, and
distribute services in a heterogeneous client/server network. VisualRPC, which
operates from a drag-and-drop interface, generates RPC Interface Definition
Language (IDL) code for both OSF/DCE and Netwise. VisualRPC also generates
Smalltalk code
to interface with a DLL.
For distributed computing environments, VisualRPC helps build client/server
apps that interoperate with the support of DCE security and directory
services. VisualRPC also builds OS/2 servers and OS/2 and Windows clients. The
code produced can be used with other tools that call DLLs. VisualRPC sells for
$1350.00 per license.
CrossLogic Corp.
P.O. Box 3195
Asheville, NC 28802
704-254-1702
Cyber-Safe has announced Shades, a program for encrypting electronic
communications and data storage. Shades operates in two modes: DES and
Triple-DES. The program conforms to the FIPS 140-1 standard.
The software is available both as an off-the-shelf package and as an OEM
library toolkit.
Cyber-Safe
3200 Wilcrest, Suite 370
Houston, TX 77042-3366
713-784-2374
Quantum World has introduced ComScire QNG, a hardware device that generates
true random numbers capable of passing stringent statistical tests. The QNG
plugs into the parallel port of a PC and allows access to a variety of
random-number modes through a programmatic interface. The vendor claims that
no statistically significant defect in randomness has been detected, even with
sophisticated tests using sample sizes of up to tens of billions of bits.
Among the possible applications for the QNG are encryption, modeling,
sampling, numerical analysis, and parapsychological testing. The device sells
for $295.00.
Quantum World
P.O. Box 1930
Boulder, CO 80306-1930
http://rainbox.rmii.com/~comscire 
Spectrum Digital has introduced an integrated Motor Development System built
around the Texas Instruments TMS320C52 DSP chip that runs at 40 MIPS. The
system provides all of the electronics required to interface to AC and DC
motors. The onboard I/O includes eight A/D channels, two serial ports, four
D/A channels, three PWM channels, dead-band timers, tachometer inputs, and the
like.
At the heart of the kit is the Spectrum Digital SBC320C52 Single Board
Computer, a PC plug-in card. The kit includes the TMS320C5x High-Level C
Source Level Debugger that's compatible with the TI C compiler and assembler.
Also included is a Windows-based VBX that allows you to monitor and control
motor parameters in real time via a PC bidirectional parallel port. The
complete system sells for $7995.00. 
Spectrum Digital
P.O. Box 1559
Sugar Land, TX 77487
713-561-6952
Romlok 3 from Vault is a plug-and-play software security developer's kit
(SSDK) for Visual Basic developers. The SSDK allows access at a native-code
level to an external security key attached to the PC's parallel port. Access
is through a DLL, allowing developers who use Visual C++, Delphi, and other
programming environments to use the system. Romlok 3 sells for $349.00.
Vault Corp.
517 Calle San Pablo
Camarillo, CA 93012
805-484-3475
The PCLTool Form Conversion SDK Version 4.0 has been released by PageTech.
PCLTool is a DLL that converts HP PCL5 print files into bitmap (PCX, TIF, and
so on) or vector (WMF) format files. PCLTool uses its own font engine to
rasterize fonts on-the-fly to create bitmaps at various resolutions, or to
create vector files with TrueType fonts to match those in the resident HP
LaserJet Series II printer. The SDK sells for $495.00.
PageTech
10671 Roselle Street, Suite 100
San Diego, CA 92121-1525
619-658-0191
Sheridan Software has announced ClassAssist, a tool that adds capabilities to
Visual Basic to make the environment more OOP-like. ClassAssist consists of an
IDE, five visual-base classes for creating custom controls using VB, and
WinAPI Oblets--programmable objects that simplify access to the Windows API. 
The tool lets you create reusable classes via a point-and-click interface. It
also features a "Class Explorer" which lets you see the class relationships.
ClassAssist sells for $249.00.
Sheridan Software
35 Pinelawn Road, Suite 206E
Melville, NY 11747 
516-753-0985
Efficient Networks has introduced ENcontrol, an API that sets up connections
and handles data as it passes through the various layers of an ATM network.
Part of an SDK, ENcontrol resides across the ATM architecture where the
user-network interface layer, the call-management layer, and the application
layer are located. ENcontrol facilitates transmission and reception of data,
call setup, and call parameters, providing faster and better access to the
network. This improves performance because it removes operating-system
overhead incurred by existing protocol stacks. 
Efficient Networks Inc.
4201 Spring Valley Road, Suite 1200
Dallas, TX 75244
214-991-3884
info@efficient.com

Socket Communications has announced its PageCard Partners Program, a program
intended to help developers page-enable Windows applications. Participants in
the program will receive hardware and software tools, technical support,
software updates, information on future wireless software and hardware, and
admission to Socket's technical seminars and workshops. ISVs will receive
marketing support, including trade-show activity, referrals, joint promotions,
market information, targeted publicity, and collaborative selling.
Qualifying candidates can purchase the PageCard Partners kit, consisting of
the PageCard SDK, a PageCard PC Card pager, and one month of the basic local
paging service through Socket Wireless Messaging Service. The PageCard SDK
provides tools for sending wireless messages and data directly from Windows
e-mail, spreadsheets, schedulers, or contact-management apps. The PageCard SDK
includes API functions, a run-time DLL, and C/C++ and Visual Basic header
files for developing and testing page-enabled applications under Windows 3.x
or Windows 95. The SDK also includes sample C++ and Visual Basic programs to
illustrate the use of the SDK. 
Socket Communications
6500 Kaiser Drive
Fremont, CA 94555
510-744-2740
Centigram Communications has announced that its TruVoice 5.0 supports the
Win32 Speech API. TruVoice, in fact, will be distributed in Microsoft's beta
Speech SDKs to developers who want to create speech-enabled apps under both
Windows 95 and Windows NT. 
TruVoice features advanced intelligence that includes context-sensitive rules
for correctly reading data from spreadsheets, address book applications,
e-mail messages, and faxes. The rules eliminate time-consuming editing before
the text can be accurately read in these applications. The preprocessor also
strips out unwanted header information in e-mail messages and appropriately
expands multifunction punctuation, multi-expansion abbreviations, nonstandard
abbreviations, and multiple date and time formats. Other enhancements include
improvement of the voice quality in American English, French, German, Italian,
and Spanish, as well as new voices (eight male, two female) across all
available languages. 
Centigram Communications
91 East Tasman Drive
San Jose, CA 95134
408-944-0250
http://www.centigram.com
Motorola Paging Products Group's Personal Communicator Systems and Software
has announced the Two-Way SDK Pro for providing access to two-way, high-speed
messaging and data technology. Motorola's Tango two-way pager is the first
device to operate on the ReFLEX two-way messaging protocol. 
The SDK allows developers to create a wide array of innovative, two-way
software applications for mobile professionals using a two-way pager as a
wireless modem for a Windows-based portable PC. The Two-Way SDK Pro, priced
at $495.00, includes an API specification, Windows 3.1 DLL, a two-way pager
emulator to simulate message entry and acknowledgment, an emulator user's
guide, a programmer reference manual, and a sample application. 
Motorola Inc.
1500 Gateway Boulevard
Boynton Beach, FL 33426
NewCode Technology has released a C++ interpreter called "NCi" that's designed
to be embedded within an application. Part of NCi is a library of functions
your app can use to call the interpreter. In addition, a stand-alone C++
parser is provided to process include files and create type information used
by the interpreter at run time. 
The NCi interpreter allows interpreted code to call compiled code with full
type checking. It also allows new variables, functions, and types (including
subclasses) to be created at run time. NCi is X3J16 compliant. However, it
will not interpret namespaces, type ids, catch/throw, or dynamic arguments to
member functions that exist in the compiler scope of the class in which they
are declared. NCi supports HP and Sun workstations, and Windows NT and Windows
95.
NewCode Technology
650 Suffolk Street
Lowell, MA 01854
508-454-7255
info@newcode.com






































EDITORIAL


Not Without Controversy


You'd think that someone who's devoted his professional life to writing about
"serious" literature would be immune to controversy. But then Sven Birkerts,
noted critic, author of the acclaimed book The Gutenberg Elegies: The Fate of
Reading in an Electronic Age, and self-appointed latter-day Luddite, has an
excuse--it's his editor's fault. Isn't it always.
Birkerts's The Gutenberg Elegies is a wonderfully written, thought-provoking
examination of the impact of digital technology on literature and reading.
Presumably, it was the book--in particular the final chapter entitled "Coda:
The Faustian Pact"--that led to Birkerts's participation in an August 1995
Harper's Magazine forum dubbed "What Are We Doing On-line?". Along with the
Electronic Frontier Foundation's John Perry Barlow, Wired magazine's Kevin
Kelly, and like-minded author Mark Slouka, Birkerts debated in print the pros
and cons of the "message of this new medium," the Internet. Birkerts was the
resident skeptic.
Still, nothing Birkerts said in Harper's matched his "Coda" chapter, which ends
with the exhortation "Refuse it." I'll let you guess (or at least read on your
own) what he wants us to shy away from. Birkerts may be on to something when he
says, "my core fear is that we are...becoming shallower" and "our whole
economic and technological obsession with getting on-line is leading us
away...from the premise that individualism and circuited interconnection are,
at a primary level, inimical notions." In any event, it is entirely proper to
question basic premises as our online experience evolves. 
Letters to the editor in subsequent issues of Harper's were hot and heavy, with
Birkerts even having Rilke poems thrown back at him. It's doubtful that analyzing
imagery in the works of Virginia Woolf ever created such a firestorm.
All of this led me to go hear Birkerts when he showed up at a local bookstore.
He agreed, for instance, that the "Coda" chapter was the most
controversial--and newsworthy--part of the book. More interestingly, "Coda"
wasn't part of the original manuscript--it was added when his editor sensed
something was lacking. What the book needed, said the editor, was a more
powerful ending. Birkerts returned to his Smith-Corona (what else?) and wrote
"Coda," launching him onto the literati equivalent of daytime talk shows. 
While Birkerts's editor is possibly an exception, sometimes we editors do end
up eating crow. Not that I'd ever fess up to feasting on such foul fowl, but
even editors can change their minds once in a while.
For instance, regular readers of this space (both of you) may recall that in
the November 1995 issue, I made a smart-aleck remark about a University of
California software patent covering embedded executable content ("applets")
and the World Wide Web. Shortly thereafter, I heard from Michael Doyle,
chairman of Eolas Technologies and co-inventor of the patented technique.
Michael set me straight on the licensing terms of the patent, pointing out that
Eolas, which holds exclusive rights to the patent, is not asking programmers
to pay royalties for developing software that runs applets. Instead, Eolas is
only requiring that developers adhere to a standard API for Web development.
Michael gladly put his comments in writing, which we published in the January
1996 "Letters" column. For this issue, Michael--along with Eolas cofounders
Cheong Ang and David Martin--wrote our lead article, which delves into the
history and technical underpinnings of their work. 
By using the patent as a carrot instead of a stick, Eolas has taken a step in
helping programmers get on with the job of developing next-generation,
interactive Web applications. Clearly, supporting a single standard API is
better than tinkering with a dozen or so competing ones. (This problem of
dealing with competing APIs is partly behind the current campaign to eradicate
those annoying "enhanced for Netscape" tags.) The bottom line is that both
developers and users want a standard. 
This isn't to say that I've changed my mind about software patents. I've yet
to see how they've helped the software industry move forward. On the surface,
however, the Eolas proposal may be an exception. Of course, there's nothing to
say that Eolas will be successful. In all likelihood, browser vendors will
continue to plod along their proprietary paths, much like operating-system
vendors of a decade ago (remember TRS-DOS?). 
You don't have to agree or disagree with the concept of software patents to
appreciate the spirit of Eolas's proposal. If nothing else, Doyle and crew
should be commended for coming up with a creative approach to a thorny
problem. Doyle's article--and the proposal it makes--is certainly one of the
more controversial pieces we've recently published. I look forward to hearing
from you about both the article and the proposal.
Jonathan Erickson, editor-in-chief











































LETTERS


Ada 95 Classes


Dear DDJ,
I read with interest the article "Object-Oriented Facilities in Ada 95," by
David L. Moore (DDJ, October 1995). It is good to see the discussion of a fine
programming language.
However, Moore's statement that "Line'Class and Object'Class represent the
same type; there is one class-wide type for any tree of derivations, rather
than one for every subtree" is contrary to fact. There is indeed a class-wide
type for every subtree. If Object and Line are defined as in Example 1, then
you can access the Offset field of a value of type Line'Class. Doing so with a
value of type Object'Class would cause a compilation error, for Object types
do not have an Offset field.
Ray Blaak
blaak@mda.ca


Prime Time Again


Dear DDJ,
First, may I congratulate DDJ on the 20th anniversary of what must be the best
programming magazine on the market.
I particularly enjoyed the September 1995 issue, not only for the interesting
articles (including the one on animation, a subject I will explore in the
not-too-distant future), but also for the gems that appear in the mailbag. The
letter I'm thinking of in particular is that from Dwight Keeve about prime
numbers. He is right to point out that Michael Swaine got his numbering wrong,
but, as Daniel Pfeffer pointed out in his January 1996 letter, Keeve's
alternative is wrong too--57 is not a prime number because it is divisible by
three. 
As a nonmathematician, I don't claim any special expertise in this respect.
However, I have found that the "ABC" rule serves me well in
programming--"Accept nothing, Believe nothing, Check everything."
Consequently, I consulted The Mathematical Experience, by Philip J. Davis and
Reuben Hersh (Houghton Mifflin, 1982). On page 211, the authors set out the
first 2500 prime numbers in a handy table. The extract in Table 1 shows the
sequence for the first 20.
In The Hitchhiker's Guide to the Galaxy, the answer to the great question of
life, the universe, and everything else is said to be 42. This information
probably has about as much usefulness, but I thought it was worth setting the
record straight.
Maurice Arnold
Victoria, Australia


Patented Passwords


Dear DDJ,
I'd like to make two points regarding the "Algorithm Alley" column "Password
Generation by Bloom Filters," by William Stallings (DDJ, August 1994). First,
the title is incorrect. The article describes how to use Bloom filters to
constrain the user's choice of passwords, but not to generate the password.
Second, readers should be aware of U.S. Patent 5,204,966, issued in 1993 to
Digital Equipment Corp., with myself and Jerry Leichter (leichter@lrw.com) as
inventors. DEC also applied for a patent in several other countries.
That patent, "a method for constraining the selection of passwords," covers
systems such as the one described in Mr. Stallings' article. 
David Wittenberg
dkw@cs.brandeis.edu


A Popular Algorithm


Dear DDJ,
The popularity algorithm for color quantization discussed in "The Popularity
Algorithm," by Dean Clark ("Algorithm Alley," DDJ, July 1995), is of historical
interest, but other algorithms, such as the median-cut algorithm (Heckbert,
SIGGRAPH '82) are just as fast, only slightly harder to implement, and produce
better pictures. As I wrote in my 1982 paper, the popularity algorithm does a
good job on many pictures, but performs poorly on pictures with a wide range
of colors or when quantizing to a small number of colors (< 50, say). The
visible flaws might be insignificant on many synthetic images of the type
displayed in Clark's article, but they can be very objectionable on real
images. Also, it is well known that dithering can improve the appearance of
quantized pictures dramatically--a fact Clark didn't mention.
The implementations described by myself and others use hashing techniques,
lookup tables, and other data structures that speed up histogram collection
and nearest-neighbor finding by an order of magnitude (see
http://www.cs.cmu.edu/~ph). My 1982 implementation of the median-cut algorithm
could quantize a 640x480 picture in 60 seconds on a 1-MIPS VAX 780; current
workstations can run the same algorithm in just a few seconds.
Paul Heckbert
Carnegie Mellon University
Pittsburgh, Pennsylvania 
ph@cs.cmu.edu


Majority Rule


Dear DDJ,
What I have to say has nothing to do with algorithms or paradigms, or even
computers. When John Travis, whom I don't know at all, says (as quoted in
Jonathan Erickson's November 1995 "Editorial") "I can't believe that we are
going to let a majority of the people decide what's best for this state," he
is actually a bit right. The majority of a given population rarely has a
chance to objectively focus on a political subject. Nor is it in the layman's
interest. Politicians have an obligation to provide the public with
information allowing it to form an opinion without having to know the deepest
truth. The problem is then that politicians have learned to use this to seduce
the public and form sentimental opinions in their own favor. That is why the
majority can actually become dangerous. A singer once sang: "Are the many the
brightest or are they merely more than the few?" (The translation doesn't do
him justice, I'm afraid.)
Thomas Honore Nielsen 
Esbjerg Handelsskole, Denmark
<thn.cls@pip.dknet.dk>



Online Educational Alternatives


Dear DDJ,
Thanks for mentioning the University of Missouri's Center for Independent
Study in the "Programmer's Bookshelf" (DDJ, September 1994) on nontraditional
education alternatives. 
In case someone asks about our BBS and examinations, some additional
information might be useful. The BBS at the end of our 800 number for
computer-assisted learning (CALS) is designed to respond only to students
sending in a CALS lesson. The program will accept their answers to the
questions given at the end of a lesson, and send back the professor's
responses to each incorrect answer. 
Students wishing to send in lessons for courses which are not CALS (for
example, writing courses) have other alternatives, including fax and e-mail.
E-mail over the Internet will handle courses which require only the basic
ASCII character set. Many courses can be handled this way, but students should
inquire about electronic options available for particular courses. The e-mail
address for those questions is independ@ext.missouri.edu. The URL is
http://indepstudy.ext.missouri.edu/. 
Exams can be requested by e-mail, but must be sent via U.S. Postal Service to
an approved examination site where they can be proctored and returned to the
Center for grading.
Incidentally, the student in Germany finished his course in technical writing
in good time: He used ftp to send fully formatted papers to his professor via
the Center, and comments were returned to him the same way or by e-mail. He
was very pleased at the speed it permitted and the savings in postage and
mailing expense. We have also had several professors who were writing courses
while they were in Europe sending their manuscripts back and forth via the
Internet. 
Dale Huffington
University of Missouri 
Columbia, Missouri 
http://www.missouri.edu/~dldcwww/


Generating Sequential Keys 


Dear DDJ,
The December 1995 "Algorithm Alley" column by Gene Callahan entitled
"Generating Sequential Keys in an Arbitrary Radix" reminded me of a problem
posed in an introductory Pascal course several years ago: The state needs to
issue license plates; provide a function which, given a license number "AAZ
999" (for a complex example), returns the next in sequence, "ABA 000." A
student in the course worked for a client of mine; I provided a solution (with
the explicit design goal that the program could in no way be regarded as the
output of a first-year student) that used the (VAX) Pascal schema construct to
make a numeric string. The arguments to the schema definition were a set of
legal characters and a string length.
The increment() operation on this schema returned not only the incremented
string, but a carry flag to indicate overflow (as when "999" was wrapped to
"000"). Thus, the LicensePlate type contained two NumericStrings of length 3,
one based on the set ('A'..'Z'), the other on the set ('0'..'9'). The
nextPlate() operation would increment the numeric string, and then--on a carry
out of this operation--increment the alphabetic string.
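The increment-with-carry scheme described above translates readily out of VAX Pascal. Here is a minimal C sketch of the same idea (function and variable names are mine, not from the original student program): each character advances within its ordered set, and a wrap propagates a carry leftward.

```c
#include <stdbool.h>
#include <string.h>

/* Increment s in place over an ordered character set, right to left,
 * propagating a carry. Returns true on carry out (the string wrapped). */
static bool increment(char *s, const char *set)
{
    size_t setlen = strlen(set);
    for (int i = (int)strlen(s) - 1; i >= 0; i--) {
        size_t pos = (size_t)(strchr(set, s[i]) - set);
        if (pos + 1 < setlen) {      /* no carry: bump this digit and stop */
            s[i] = set[pos + 1];
            return false;
        }
        s[i] = set[0];               /* wrap this digit, carry leftward */
    }
    return true;                     /* carried out of the whole string */
}

/* nextPlate: increment the digits; on carry out, increment the letters. */
static void next_plate(char *alpha, char *digits)
{
    if (increment(digits, "0123456789"))
        increment(alpha, "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
}
```

With this, "AAZ 999" steps to "ABA 000," matching the example in the letter.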
I'm sure the program met its design goal: The student didn't dare turn it in
as her own work. I wonder now how I would solve the problem, had it been
required for a C or C++ class. The capabilities of the language shape the
design, certainly: I'd probably have several hundred lines of C++ code
implementing OrderedSet<char>, NumericString<OrderedSet<char>>, and so forth.
Ron Lusk 
Blue Bell, Pennsylvania 
ron.lusk@phh.mts.dec.com 


Never Mind the Bullocks...


Dear DDJ,
Regrettably, bassists may die, but their estates live on. By simply naming his
music program "Mingus" or "Pastorius," Al Stevens is getting no guarantee of
safety. (See "C Programming," DDJ, September 1995.) Fortunately, there is one
name meeting your criteria which should be free of any such concerns, with the
added bonus of suitably reflecting the likely quality of the output during the
inevitable debugging stages--Vicious.
Mike Ryan
Virtuoso Software
http://world.std.com/~deeryan/virtuoso.html


Mac Can Do


Dear DDJ,
I enjoyed the article "Networking Objects with CORBA," by Mark Betz (DDJ,
November 1995). However, Mark's statement about major operating-system
vendors' support of TCP/IP was incomplete. He did not mention that the
Macintosh operating system has had a solid TCP/IP stack for years and that it
has been included with OS releases since last year (System 7.5).
Justin Souter
Justin_Souter@artlogic.com 
Mark replies: Thanks Justin, that's a very good point. I admit that I simply
failed to consider the Macintosh when I made the statement. We do have Macs
where I work, and they are on our TCP/IP domain along with everything else.


Go Figure


Dear DDJ,
Thanks for publishing the article "Data Models, CASE Tools, and Client/Server
Development,'' by Tim Wittenburg (DDJ, November 1995). However, it appears
that the illustrations for Figure 1 and Figure 2 on page 92 were reversed. 
Jim Rommens
tktfa31%ezmail@pyibm1.cc.bellcore.com






































Proposing a Standard Web API


Short-circuiting the API wars




Michael Doyle, Cheong Ang, and David Martin


The authors are cofounders of Eolas Technologies Inc. and can be contacted at
http://www.eolas.com.


The World Wide Web has matured from a relatively limited system for passive
viewing of hypermedia-based documents into a robust framework for interactive
application development and delivery. Much of this progress is due to the
development of embedded executable content, also known as "inline Web
applets," which allow Web pages to become full-blown, compound-document-based
application environments. The first Web-based applets resulted from research
begun in the late 1980s to find a low-cost way to provide widespread access
for scientists and educators to remote, supercomputer-based visualization
systems.


The Visible Human Project


In the late 1980s, the National Library of Medicine began a project to create
a "standard" database of human anatomy. This "Visible Human Project" was to
comprise over 30 GB of volume data on both male and female adult human
anatomical structures. It was one of the original Grand Challenge projects in
the federal High-Performance Computing and Communications initiative, the
brainchild of then Senator Al Gore. As a member of the scientific advisory
board for this project, one of us (Michael Doyle) became interested in the
software issues involved in working with such a large database of the most
detailed image information on human anatomical structure yet available. His
group in the Biomedical Visualization Lab (BVL) at the University of Illinois
at Chicago realized at the time that much research would have to be done to
make such a vast resource both functional and accessible to scientists all
around the world.
Until that time, medical visualization systems were designed to work on 3-D
datasets in the 15-30 MB range, as produced by the typical CT or MRI scanner.
High-end graphics workstations had adequate memory capacity and processor
power to allow good interactive visualization and analysis of these routine
datasets. The Visible Human data, however, presented an entirely different set
of problems. To allow widespread access to an interactive visualization system
based upon such a large body of data would require the combined computational
power of several supercomputers, something not normally found in the typical
biomedical scientist's lab budget.
Doyle's BVL group immediately began to work on solving the information-science
problems related to both allowing interactive control of such data and
distributing access to the system to scientists anywhere on the Internet. Our
goal was to provide ubiquitous access to the system, allowing any user
connected to the Internet to effectively use the system from inexpensive
machines, regardless of platform or operating system.


The Promise of the Web


We saw Mosaic for the first time when Larry Smarr, director of the National
Center for Supercomputing Applications, demonstrated it at an NSF site visit
at BVL in early 1993. We became immediately intrigued with the potential for
Mosaic to act as the front end to the online visualization resource we had
been designing. Immediately after Michael Doyle left the University of
Illinois to take the position of director of the academic computing center at
the University of California, San Francisco, we began enhancing Mosaic to
integrate it with our system. We designed and implemented an API for embedded
inline applets that allowed a Web page to act as a "container" document for a
fully interactive remote-visualization application, allowing real-time volume
rendering and analysis of huge collections of 3-D biomedical volume data,
where most of the computation was performed by powerful remote visualization
engines. Using our enhanced version of Mosaic, later dubbed "WebRouser," a
scientist using a low-end workstation could exploit computational power far
beyond anything that could be found in one location. 
This work was shown to several groups in 1993, including many that were later
involved in projects to add APIs and applets to Web browsers at places such as
NCSA, Netscape, and Sun. Realizing our group's work enabled the transformation
of the Web into a robust platform for the development and deployment of any
type of interactive application, in 1994 the University of California filed a
U.S. patent application covering embedded program objects in distributed
hypermedia documents. Eolas Technologies was then founded by the inventors to
promote widespread commercialization and further development of the
technology.


Enhancing the Web


Once the concept of the Web as an environment for interactive applications was
initiated, the question was how to further develop it. Toward the end of 1993,
we discussed the relative merits of building an interpreted language, such as
Basic or Tcl, directly into the browser versus enhancing browsers through a
"plug-in" API. We chose the API approach, believing that the best way to add
language support would be by adding interpreters as external inline plug-in
applets, which we called "Weblet applications." This would enable us to add a
new language or other feature merely by developing a new Weblet application,
without having to reengineer the browser itself. 
Figure 1 is a Weblet-based version of the public-domain RASMOL visualization
program that lets users view, analyze, and visualize a 3-D protein structure
from within the Web page. A single programmer converted the original RASMOL
source code into Weblet form in only ten hours.
Some time later, both limited API support, such as NCSA's CCI, and
embedded-language support, such as Java, began to appear in various Web
browsers; the <EMBED...> tag (which we first implemented in mid-1993) appeared
in beta versions of Netscape's product by summer of 1995. Still, it wasn't
until October of 1995 that the Netscape implementation began to approach the
functionality of <EMBED...> used in WebRouser. The enormous effect of these
developments in accelerating the commercialization of the Internet industry
prompted us to release the first (free-for-noncommercial-use) distribution
version of WebRouser for UNIX platforms in September 1995
(http://www.eolas.com/eolas/webrouse/).


The WebRouser Approach


Our general philosophy with WebRouser was to allow enhancement of the
browser's functionality through object-oriented, modular application
components that conform to a standard API, rather than turning the browser
into a monolithic application with an ever-increasing code base. This
encourages Web developers to take a document-centric approach to application
development. The Web page itself becomes the mechanism for doing work, through
collections of small, efficient Weblet building blocks, rather than the
menagerie of top-heavy applications found on the common desktop PC.
The first release of WebRouser also included other enhancements aimed
primarily at improving the interactivity of Web pages. These included
client-side image maps, a document-driven button bar, and document-driven
modification of the browser's menu tree.
Client-side image maps are supported through the Polymap format. Polymap files
are essentially GIF files with polygon and URL information stored in the
image's comment fields. To prevent complex polygon information from bloating
the file, all of the comment fields are compressed and decompressed using the
GIF LZW algorithm. Polymap files require no special treatment of the HTML
code. WebRouser autodetects the presence of Polymap data when it reads inline
GIFs. If a server-side (ISMAP) image map points to a Polymap GIF, then
WebRouser will ignore the ISMAP data and give the Polymap data priority.
Hotspots are decoded in real time and highlighted as the mouse moves over the
image, and the associated URL is displayed at the bottom of the screen,
providing users the same style of interactivity that hotwords have in HTML
text. The Polymap format specification is open and freely available for use.
You can find the spec at http://www.eolas.com/papers/Papers/Polymap/.
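The real-time hotspot decoding described above ultimately reduces to a point-in-polygon test of the mouse position against each hotspot's decoded vertices. A minimal sketch of such a test (the standard even-odd ray-crossing method, not WebRouser's actual source) might look like this:

```c
/* Even-odd ray-crossing test: returns 1 if (px, py) falls inside the
 * polygon with vertices v[0..n-1], 0 otherwise. A client-side image map
 * runs a test like this against each hotspot as the mouse moves. */
typedef struct { double x, y; } Pt;

static int point_in_poly(const Pt *v, int n, double px, double py)
{
    int inside = 0;
    for (int i = 0, j = n - 1; i < n; j = i++) {
        /* Count edges that a horizontal ray from (px, py) crosses. */
        if (((v[i].y > py) != (v[j].y > py)) &&
            (px < (v[j].x - v[i].x) * (py - v[i].y)
                      / (v[j].y - v[i].y) + v[i].x))
            inside = !inside;
    }
    return inside;
}
```

An odd number of crossings means the point is inside the hotspot, at which point the browser can highlight the region and display its URL.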
The <LINK...> and <GROUP...> tags allow Web pages to dynamically customize
elements of the browser's GUI. The LINK tag allows the creation of a
document-driven button bar implemented by placing tags in the document header,
with the syntax <LINK ROLE="button label" HREF="http://...">. Several of these
tags in sequence result in buttons below the URL window, similar to
Navigator's What's New or What's Cool buttons, but they are dynamically
defined by the page currently being viewed. Similarly, the GROUP tag allows
the Web page to modify the browser's GUI; however, this tag differs by
defining a hierarchical menu that reflects an entire tree of Web pages. In
Example 1, a typical GROUP menu trigger, the text string "Click here to view
the WebRouser slide show" appears as a conventional anchor on the Web page,
but selecting it brings up the "slide_1.html" and activates the GROUPS menu
option on WebRouser's menu bar. Slide Show is the first menu option, with a
submenu whose options are Slide 1, Slide 2, and Slide 3. This allows the user
to easily navigate through, for example, the "year, issue, article" hierarchy
of online magazines. 


The Web API


Of course, the key feature of WebRouser is the implementation of the
<EMBED...> tag, through which inline plug-in Weblet applications are supported
in Web pages. X Window applications that conform to the Eolas distributed
hypermedia object embedding (DHOE) protocol can run--inline and fully
interactive--within Web pages in the WebRouser window. WebRouser also supports
the NCSA common client interface (CCI), which allows the Weblet to "drive" the
browser application. DHOE and CCI collectively make up the Eolas Web API
(WAPI) as supported in WebRouser.
WAPI is minimalist, combining the functionality of DHOE and CCI to exploit
both the efficiency of X-events for communication of interaction events and
graphic data and the flexibility of socket-based messaging for browser remote
control and HTML rendering of Weblet-generated data. We are currently working
on a cross-platform API, in the form of an OpenGL-style common-function
library. The current minimalist WAPI specification will allow us greater
flexibility in creating a cross-platform API, while maintaining compatibility
with Weblets developed under the UNIX WAPI specification.
Eolas' primary objective with respect to the pending Web-applet patent is to
facilitate the adoption of a standard API for interactive, Web-based
application development, and then to develop innovative Weblet-based
applications for the growing Internet software market. For an example of such
a Weblet application, see the accompanying text box entitled "WebWish: Our
Wish is Your Command." We intend to short-circuit the API wars brewing between
the major Web-browser competitors. In addition to creating a universal
standard API, we are also instituting a mechanism for ensuring continued
evolution of the WAPI spec on a regular timetable. Royalty-free licenses for
browser-side implementation of Web applets under the pending patent have been
offered to the major browser companies, and are in various degrees of
negotiation. The primary condition of these licenses is that each licensee
must conform to the WAPI protocol, and no other applet-integration protocol. A
consortium of Eolas licensees is being formed to set the continuing WAPI
specification and update it at regular intervals. The widespread acceptance of
the developing WAPI standard will allow application developers to concentrate
on the functionality of their applets without worrying which Web browser their
customers will use.



Creating a WebRouser Weblet


WebRouser communicates with Weblet applications through a set of messages
called the DHOE protocol. DHOE messages are relatively short character
strings, which allow convenient, efficient use of existing
interprocess-communications mechanisms on various platforms. We have
implemented DHOE systems on several X Window platforms, including IRIX, SunOS,
Solaris, OSF/1, Sequent, and Linux. Implementations for both Microsoft
Windows and Macintosh are planned for release by the end of the first quarter
of 1996.
Listing One is a skeleton program for Weblet-based applications that can work
with WebRouser. The current DHOE protocol defines a set of messages that
synchronize the states on the DHOE clients and DHOE servers. The first four
messages are used by the server to set up the DHOE system at startup,
refresh/resize the client, and terminate the system on exit. The rest of the
messages are sent by the browser client to the data server. They include
messages about the client drawing-area visibility, and mouse and keyboard
events.
Programming with DHOE involves initializing DHOE by installing a
message-handling function, registering the DHOE client with the DHOE server,
and registering various callbacks with their corresponding messages. The DHOE
client and server may, at any time after client/server registration, send
messages to each other. The messages (see Table 1) are character strings, and
may be followed by different types of data. DHOE also supports buffer sharing
(that is, bitmaps and pixmaps) between DHOE clients and servers.
Adding the DHOE mechanism into an existing data handler creates a DHOE server.
The DHOE library kit consists of protocol_lib.h (the declaration file) and
protocol_lib.c (the implementation file). To follow the Xt programming
conventions, the DHOE strings are #defined with their Xt equivalents
(DHOEkeyUp is mapped to XtNkeyUp, and so on). Messages from the DHOE server to
the DHOE client (for example, external app-->hypermedia browser) are:
XtNrefreshNotify, server updating.
XtNpanelStartNotify, server ready.
XtNpanelExitNotify, server exiting.
Messages from the DHOE client to the DHOE server (for example, hypermedia
browser-->external app) are:
XtNmapNotify, DHOE area shown.
XtNunmapNotify, DHOE area hidden.
XtNexitNotify, DHOE area destroyed.
XtNbuttonDown, DHOE area button down. 
XtNbuttonUp, DHOE area button up.
XtNbuttonMove, DHOE area button move. 
XtNkeyDown, DHOE area key down.
XtNkeyUp, DHOE area key up. 
You can name these messages differently as long as the names are merely
aliases of the original DHOE strings. These messages are defined in
protocol_lib.h, which must be included in your program.
The following DHOE fundamental functions are provided in protocol_lib.c:
void handle_client_msg(Widget w, caddr_t client_data, XEvent *event), a
function called back by XtAddEventHandler when it sees a message from the DHOE
client (the hypermedia browser). To register this function with Xt, your
program (DHOE server) should call XtAddEventHandler(Widget app_shell,
NoEventMask, True, handle_client_msg, 102);. Here, handle_client_msg will be
called with parameters w=app_shell, client_data=102, and event pointing to an
X-event structure generated by Xt when it sees the message. The app_shell
variable is usually the application shell returned by XtInitialize,
XtAppInitialize, or XtVaAppInitialize.
void register_client(Widget w, Display *remote_display);, which registers your
program with the DHOE client.
void register_client_msg_callback(char *msg, void (*function_ptr)());, which
registers a function to be called back when Xt sees a string that matches msg.
This function may appear anywhere in your program. You do not need to handle
the XtNmapNotify/XtNunmapNotify pair because DHOE servers deiconify/iconify
when they receive these messages. You must specify a "quit" function to shut
down your application gracefully on XtNexitNotify. Button- and key-message
handling are optional. To obtain mouse coordinates, call get_mouse(int *x, int
*y) for button-handling functions and get_keysym(KeySym *keysym) for
key-handling functions. Keysym is defined by X11 (in keysymdef.h) for
cross-platform compatibility.
void send_client_msg(char *msg, Display *remote_display, Window
remote_window);, which sends a message with the value msg to the DHOE client
at a display=remote_display and has an X window ID of remote_window. The
remote_display and remote_window must be provided. This function may appear
anywhere in the program after register_client.
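The string-keyed callback registration these functions provide can be illustrated with a small, Xt-free sketch (the names and table size here are hypothetical, not part of protocol_lib): message strings map to handler functions, and an incoming message fires the matching callback.

```c
#include <string.h>

/* A message string maps to a handler; dispatch() scans the table and
 * fires the matching callback, mirroring what a function in the role of
 * register_client_msg_callback sets up for the DHOE message handler. */
#define MAX_CALLBACKS 16

typedef void (*dhoe_handler)(void);

static struct { const char *msg; dhoe_handler fn; } table[MAX_CALLBACKS];
static int ncallbacks;

static void register_msg_callback(const char *msg, dhoe_handler fn)
{
    if (ncallbacks < MAX_CALLBACKS) {
        table[ncallbacks].msg = msg;
        table[ncallbacks].fn = fn;
        ncallbacks++;
    }
}

static int dispatch(const char *msg)   /* returns 1 if a handler ran */
{
    for (int i = 0; i < ncallbacks; i++)
        if (strcmp(table[i].msg, msg) == 0) {
            table[i].fn();
            return 1;
        }
    return 0;
}

/* Example handler, standing in for the required "quit" callback. */
static int quit_called;
static void my_quit(void) { quit_called = 1; }
```

Registering my_quit under "XtNexitNotify" and then dispatching that string invokes the handler; unregistered messages fall through unhandled, as button and key messages may when their optional handlers are omitted.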


A Weblet CAD-File Viewer


WT is an applet that allows interactive rotation and zooming of a 3-D CAD file
stored in NASA's neutral file format (NFF). The source code for the sample
Weblet application is available electronically (see "Availability," page 3)
and at http://www.eolas.com/eolas/webrouse/wtsrc.tar.Z. What follows is a
brief walk-through of the Weblet-enhancing sections of the code (illustrated
in the code listing just mentioned as a "simplified sample program outline").
1. The outline starts with a typedef and some global declarations. The new
type, ApplicationData, defines a structure common to all Xt Weblets. Together
with the myResources and myOptions static variables, myAppData (which is of
type ApplicationData) is used with XtGetApplicationResources in main() to
extract the command-line arguments flagged with win, pixmap, pixmap_width,
pixmap_height, and datafile. This is how Xt extracts command-line arguments
and is unnecessary if the program has alternatives to decode command-line
arguments. The aforementioned global variables and XtGetApplicationResources
nicely store the information in a line such as wt -win 1234 -pixmap 5678
-pixmap_width 400 -pixmap_height 300 -datafile fname into myAppData.
2. In main(), app_shell is first initialized the Xt way by using XtInitialize,
which opens a connection to the X server and creates a top-level widget.
XtGetApplicationResources gets the application resources as in step 1. The
next section conveniently uses the myAppData.win variable to find out if the
Weblet should run as a DHOE server or a stand-alone program. For a DHOE
server, the program adds the handle_client_msg function from the DHOE
implementation, protocol_lib.c, as the handler of the X client message event.
The subsequent lines call three more DHOE functions: register_client, to
initiate a handshake with the DHOE client; register_client_msg_callback, to
register myQuit() as the callback function of the message XtNexitNotify; and
send_client_msg, to send a XtNpanelStartNotify message, telling the DHOE
client that the server is ready. The program then enters the conventional
XtMainLoop(). 
3. Two more functions must be modified. The drawing routine (myDraw) needs to
copy the drawn picture (myPixmap in this case) onto myAppData.pixmap, the
client's pixmap. The function then should send an XtNrefreshNotify message to
the client, informing it of the change. The myQuit() function registered in
main() needs to send an XtNpanelExitNotify message to the client, telling the
client that the server is terminated.
This Weblet can be tested by putting it in your path and pointing your copy of
WebRouser to http://www.eolas.com/eolas/webrouse/office.htm.


The Eolas Web OS


In addition to the WebWish applet described in the text box, a Java
interpreter Weblet application is planned for release by the end of March
1996. Java is a compiled language that produces binaries for a "virtual
machine." The binaries are downloaded to the client and run on virtual-machine
emulators that run on Macintosh, Windows, and UNIX platforms. Java
applications tend to be smaller and more efficient than WebWish interpreted
code, but they are far more difficult to develop. Eolas is developing a
virtual operating system, the Web OS (planned for release late in 1996) that
will allow far more robust, compact, and efficient compiled applets to be
developed than is possible with Java. The Web OS is key to Eolas' long-term
goal to transform the Web into a robust, document-centric,
distributed-component application environment. It is a real-time, preemptive
multitasking, multithreaded, object-oriented operating system that will run
efficiently on low-end platforms, even on 80286-based systems and handheld
PDAs.
The Web OS can run within Windows, Macintosh, and UNIX environments, or in
stand-alone mode on machines with no pre-installed operating system. It
supports dynamic memory management and linked libraries, and is both graphical
and object oriented at the OS level. The OS kernel includes fully defined
object classes, inheritance, and direct messaging. The OS includes several
building-block objects that allow sophisticated applications--WYSIWYG word
processors, spreadsheets, databases, e-mail systems, and the like--to be
developed with a minimum of code. These applications are created primarily by
subclassing and combining various Web OS component objects. Since new
applications are created by defining differences and additions to the
constituent objects, this results in tiny, robust, efficient binaries that
optimize both bandwidth usage and server storage requirements. This platform
is so efficient that a complete WYSIWYG word processor can be created in less
than 5K of compiled code. Applications developed for the Web OS are likely to
be smaller than most of the inline GIF images found on average Web pages
today.
The operating system employs a single imaging model for screen, printer, fax,
and other output devices; an installable file system, for both local and
remote file access; direct TCP/IP and socket support; distributed objects; and
security through public-key encryption and "ticket-based" authentication.
As the Internet pervades more of our work environments, the Web OS will allow
the Web to become the preferred environment for new and innovative
productivity, communications, and entertainment applications for all hardware
platforms. The concept of a machine-specific operating system will become
irrelevant, since any application will be available to the average user,
regardless of hardware platform. Much of the computational load for
applications will be pushed off to remotely networked computational engines,
allowing low-cost Web terminals to act as ubiquitous doorways to potentially
unlimited computational resources. The Web will be your operating system and
the Internet will be your computer.
WebWish: Our Wish is Your Command
Sun's announcement of the adaptation of the Java language to the Web in 1995
was received enthusiastically by the entire Internet community as a welcome
means for increasing the interactivity of Web-based content. Despite much of
the publicity surrounding Java, which described it as an "interpreted"
language, Java code must be compiled to a "virtual machine," which is then
emulated on various platforms. A Web browser that supports the Java emulator
is not enough to develop Java-based applications--the applet developer must
purchase a compiler from Sun or its licensees at considerable cost.
Fully interpreted languages like Tcl/Tk or Basic are extremely useful, partly
because they don't require a compiler for application development, just the
language interpreter and any ASCII text editor. In choosing a programming
language to adapt to the Web API, we decided early on that a fully interpreted
programming language would be vital to quick, widespread Weblet
implementation. We chose Tcl/Tk because of its robust capabilities and
widespread use.
By the time you read this article, Eolas' WebWish Tcl/Tk interpreter should be
available for both WebRouser under UNIX and Netscape Navigator 2.0 on Windows
and Macintosh (see http://www.eolas.com/eolas/webrouse/tcl.htm). It supports
Tcl 7.5 and Tk 4.1, as well as the Tcl-DP and EXPECT extensions. A new
security feature has been added that exploits PGP-style digital signatures in
order to authenticate scripts from trusted sources and to prevent unwanted
execution of scripts from untrusted sources. This Weblet application turns
WebRouser and Navigator into complete application-development environments,
without the need for expensive compilers. All that is needed to develop a
WebWish-based application is WebWish, a WAPI-compliant Web browser, and a good
text editor. Developers can draw upon the vast existing resources of freely
downloadable Tcl/Tk program source code, and the expertise of thousands of
experienced programmers.
WebWish provides an easy-to-use rapid prototyping environment, with built-in
support for socket-based communications, remote procedure calls (RPCs), and
the ability to "remote control" existing text-based server systems without
reengineering the server. WebWish can run either as a Weblet in a Web page, or
in stand-alone mode on either the client or a server machine. WebWish running
in a Web page can communicate directly with other copies of WebWish running on
remote servers, either through sockets or RPCs. This allows WebWish to act as
"middleware" for the Web, allowing Web-based interfaces to create state-aware
graphical front ends to existing text-based legacy systems, without changing
the operation of the legacy-server application. 
Last November, Chicago's Rush-Presbyterian-St. Luke's Medical Center surgical
department created both client and server WebWish applets for just such a
purpose. The applets allowed physicians using WebRouser to interactively query
and browse Rush's (Informix) SQL-based Surgical Information System, consisting
of medical records on over 1.5 million patients. The entire project took one
programmer exactly 12 hours from start to finish. Try that with Java!
--M.D., C.A., and D.M.
Figure 1: Typical Weblet application.
Example 1: Typical GROUP menu.
<GROUP ROLE="Slide Show">
 <LINK ROLE="Slide 1" HREF="slide_1.html">
 <LINK ROLE="Slide 2" HREF="slide_2.html">
 <LINK ROLE="Slide 3" HREF="slide_3.html">
 Click here to view the WebRouser slide show </GROUP>
Table 1: DHOE messages.

Message Description
DHOEserverUpdate Tells a client to update data.
DHOEserverReady Tells a client the server is ready.
DHOEserverExit Tells a client the server is exiting.
DHOEserverConfigureWin Tells a client to resize/reposition
 the DHOE window.
DHOEclientAreaShown Tells the server the DHOE area is exposed.
DHOEclientAreaHidden Tells the server the DHOE area is being hidden.
DHOEclientAreaDestroy Tells the server the DHOE area is being destroyed.
DHOEbuttonDown Sends mouse-pointer coordinates to the
 server on button down.
DHOEbuttonUp Sends mouse-pointer coordinates to the
 server on button up.
DHOEbuttonMove Sends mouse-pointer coordinates to the
 server on button move.
DHOEkeyDown Sends the corresponding keysym to the
 server on key down.
DHOEkeyUp Sends the corresponding keysym to the
 server on key up.

Listing One
#include "protocol_lib.h"
 ...
/* The X way to define resources and parse command-line args */
/* WebRouser 2.6-b2 passes the embedded window information through these args */
typedef struct{
 int win;
 int pixmap;
 int pixmap_width;
 int pixmap_height;
 char *datafile;
} ApplicationData, *ApplicationDataPtr;
static XtResource myResources[] = {
 {"win", "Win", XtRInt, sizeof(int),
 XtOffset(ApplicationDataPtr, win), XtRImmediate, 0},
 {"pixmap", "Pixmap", XtRInt, sizeof(int),
 XtOffset(ApplicationDataPtr, pixmap), XtRImmediate, 0},
 {"pixmap_width", "Pixmap_width", XtRInt, sizeof(int),
 XtOffset(ApplicationDataPtr, pixmap_width), XtRImmediate, 400},
 {"pixmap_height", "Pixmap_height", XtRInt, sizeof(int),
 XtOffset(ApplicationDataPtr, pixmap_height), XtRImmediate, 400},
 {"datafile", "Datafile", XtRString, sizeof(char*),
 XtOffset(ApplicationDataPtr, datafile), XtRImmediate, NULL},
};
static XrmOptionDescRec myOptions[] = {
 {"-win", "*win", XrmoptionSepArg, 0},
 {"-pixmap", "*pixmap", XrmoptionSepArg, 0},
 {"-pixmap_width", "*pixmap_width", XrmoptionSepArg, 0},
 {"-pixmap_height", "*pixmap_height", XrmoptionSepArg, 0},
 {"-datafile", "*datafile", XrmoptionSepArg, NULL},
};
ApplicationData myAppData;
void myDraw()
{
 /* do your drawing... */
 ...
 /* if you draw into your own drawables (myPixmap in this case) */
 if (myAppData.win) {
 /* copy from myPixmap to the "shared" pixmap */

 XCopyArea(display, myPixmap, myAppData.pixmap, myGC, 0, 0, WIN_WIDTH,
 WIN_HEIGHT, 0, 0);
 /* tell WebRouser to update the drawing window */
 send_client_msg(XtNrefreshNotify, display, myAppData.win);
 }
}
void myQuit()
{
 /* tell WebRouser you are exiting... */
 if (myAppData.win)
 send_client_msg(XtNpanelExitNotify, display, myAppData.win);
 /* Xt way of exiting; pass any live widget, such as the top-level shell */
 XtCloseDisplay(XtDisplay(app_shell));
 exit(1);
}
 ...
main()
{
 Widget app_shell;
 ...
 /* XtInitialize does XOpenDisplay, as well as creates a toplevel widget */
 app_shell = XtInitialize("wt", "Wt", myOptions, XtNumber(myOptions), 
 &argc, argv);
 ...
 /* This func fills myAppData with user-specified or default values */
 /* We get the embedded window's info this way */
 XtGetApplicationResources(app_shell, &myAppData, myResources,
 XtNumber(myResources), NULL, 0);
 ...
 /* if we have an external window to display the image... */
 if (myAppData.win) {
 XtAddEventHandler(app_shell,NoEventMask,True,handle_client_msg,NULL);
 register_client(app_shell, display);
 /* register the func to be called when WebRouser exits */
 register_client_msg_callback(XtNexitNotify, myQuit);
 /* tell WebRouser you have started fine */
 send_client_msg(XtNpanelStartNotify, display, myAppData.win);
 }
 ...
 XtMainLoop(); /* Xt's event loop */
}
/* End of program listing */





















Improving Kermit Performance


A windowing strategy makes all the difference




Tim Kientzle


Tim, a technical editor with DDJ, is the author of The Working Programmer's
Guide to Serial Protocols (Coriolis, 1995) and Internet File Formats
(Coriolis, 1995). He can be contacted at kientzle@ddj.com.


Many people associate Kermit with ponderously slow file transfers.
Conservative encoding, small packets, and "stop-and-go" half-duplex transfers
make speed a common complaint about the original 1981 protocol. (For a
discussion of the original Kermit protocol, see "Kermit Meets Modula-2," by
Brian Anderson, DDJ, May 1989.)
A lot can change in 15 years, however. Developers at The Source Telecomputing
began to experiment with "windowing Kermit," also known as "SuperKermit," in
1985. Windowing allows the Kermit sender to send multiple packets before
expecting a reply. The combination of windowing, longer packets, and relaxed
encoding constraints makes today's Kermit protocol one of the most flexible
and best-performing file-transfer protocols available.
Kermit's windowing approach is clearly faster than protocols such as XModem
and YModem, which must wait after each packet. What many people don't realize
is that under less-than-ideal conditions, Kermit's windowing approach is
significantly faster than ZModem, a protocol with a well-deserved reputation
for fast transfers over good-quality connections. The difference lies in how
these two protocols respond to errors.
In this article, I'll compare the error-handling strategies of a variety of
popular protocols. I'll then give a brief overview of Kermit and describe some
simple heuristics I've developed to improve the performance of Kermit's
windowing strategy.


Error-Handling Strategies


Most serial and networking protocols use a general error-correction strategy
called "retransmission." The sender computes an error-detecting signature
value, such as a checksum or cyclic redundancy check (CRC), for each block of
data and sends that signature value with the data. The receiver recomputes the
signature and compares it with the value received from the sender. If the two
values don't match, the receiver asks the sender to retransmit the block.
XModem and YModem use this strategy in a simple way. The sender sends a single
packet at a time and waits for a response. This works well if there is no
delay; that is, if each packet arrives immediately at its destination. In that
case, the sender sees an instantaneous response to each packet, allowing it to
maintain an almost continuous flow of data, even when there are errors.
In practice, most connections have delays. Each sent packet must navigate
through system I/O buffers, modems, networks, and other obstacles before being
received. XModem can be slow when there are delays; the sender must wait after
each packet for a complete round-trip from the sender to the receiver and
back. When there are both delays and errors, there will be several round-trip
delays for some packets.
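The cost of those round trips is easy to estimate with a back-of-the-envelope model (my own illustration; the function and the numbers below are hypothetical, not from any protocol specification):

```c
/* Back-of-the-envelope stop-and-wait throughput model (illustrative).
 * Each packet must be fully transmitted, and then the sender idles for
 * one complete round trip before the next packet can start. */
static double stop_and_wait_cps(double packet_bytes, double line_cps,
                                double round_trip_sec)
{
    double transmit_sec = packet_bytes / line_cps;
    return packet_bytes / (transmit_sec + round_trip_sec);
}
```

At 2400 characters per second with 128-byte XModem packets, a negligible delay yields the full 2400 cps, but a half-second round trip drops the effective rate to roughly 230 cps, about a tenth of the line speed.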
Because of these delays, many protocols send several packets before expecting
a response. This allows the sender to continue sending while awaiting the
receiver's response, with corresponding speed gains. However, the price of
these performance gains is more-complex error handling.
Protocols such as ZModem and UUCP's G protocol allow multiple outstanding
packets, but attempt to reduce the complexity by only transferring packets in
order. At any point in time, the receiver only accepts the next consecutive
packet. If the receiver sees the wrong packet (for example, if the packet the
receiver expected was lost or damaged and it's now receiving a later packet),
it will ignore it and send an error notification to the sender. This
simplifies both the receiver and sender. However, when an error occurs, the
two sides must wait for a complete round-trip, as the receiver's error message
travels back to the sender and the sender resends the correct packet.
Intervening packets are simply discarded. As a result, much of the speed
advantage of these protocols is lost when errors occur.
The most complex approach is "packet reassembly," used by TCP (Transmission
Control Protocol) and Kermit. In this approach, the receiver stores each
correct packet, even if it arrives out of order. The name "packet reassembly"
comes from the process of reassembling the packets into the correct order
before passing the received data to its destination. If a packet is damaged,
only that single packet need be resent; under ZModem, by contrast, a number
of subsequent packets may also have to be repeated. When properly
implemented, a protocol
using packet reassembly need never wait for a full round-trip, even when there
are errors.
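A minimal sketch of a reassembling receiver makes the idea concrete. This is my own illustration, not code from any of the protocols discussed; the cache size, structure, and function names are assumptions:

```c
#include <string.h>

#define REASM_WINDOW 8   /* cache size; an assumption for this sketch */

/* Receiver-side packet reassembly (illustrative). Correct packets are
 * cached by sequence number even when they arrive out of order; data is
 * handed to its destination strictly in sequence, so one damaged packet
 * delays -- but never discards -- the good packets that follow it. */
struct reassembler {
    int  have[REASM_WINDOW];        /* is slot seq % REASM_WINDOW filled? */
    char data[REASM_WINDOW][128];   /* cached packet payloads */
    int  next;                      /* next sequence number to deliver */
};

/* Cache a correctly received packet, whether in order or not. */
static void reasm_store(struct reassembler *r, int seq, const char *payload)
{
    int slot = seq % REASM_WINDOW;
    r->have[slot] = 1;
    strncpy(r->data[slot], payload, sizeof r->data[slot] - 1);
    r->data[slot][sizeof r->data[slot] - 1] = '\0';
}

/* Hand over the next in-order packet; 0 means it hasn't arrived yet. */
static int reasm_deliver(struct reassembler *r, char *out)
{
    int slot = r->next % REASM_WINDOW;
    if (!r->have[slot])
        return 0;          /* still waiting on a retransmission */
    strcpy(out, r->data[slot]);
    r->have[slot] = 0;
    r->next = r->next + 1;
    return 1;
}
```

The structure should be zero-initialized, with next set to the first expected sequence number; a real implementation would also refuse to store packets outside the current window.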


Kermit Basics


Kermit was developed at Columbia University in 1981 by Frank da Cruz and Bill
Catchings to provide a way of transferring data between a variety of different
computers. Since then, Kermit has been extended to support packet reassembly
("windowing"), a variety of file attributes, crash recovery, character-set
translation, and other features. Modern Kermit is efficient and flexible,
providing fast transfers under less-than-ideal conditions.
In a Kermit transfer, the sender and receiver exchange packets of data. Figure
1 is the original Kermit packet format. An important design feature of Kermit
is that all of the packet overhead--including the length, sequence number, and
error check--is in ASCII form. This allows Kermit, with the optional
eighth-bit encoding, to function in situations where only the printable ASCII
characters can be safely transferred. Two additional formats--"long" and
"extra-long"--support longer packets and a separate error check for the
header; see Figure 2.
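To see what keeping all overhead in ASCII form looks like in practice, here is a sketch of the standard Kermit transformations: the tochar() bias that makes small integers printable, and the single-character type-1 block check defined by the base protocol.

```c
/* Kermit keeps every control field printable by biasing small integers
 * into ASCII: tochar() maps 0..94 onto the printable range starting at
 * the space character, and unchar() reverses it. */
static int tochar(int x) { return x + 32; }
static int unchar(int c) { return c - 32; }

/* The type-1 block check from the base protocol: an arithmetic sum of
 * the packet characters, folded into six bits, then made printable. */
static int kermit_check1(const char *p, int len)
{
    int s = 0, i;
    for (i = 0; i < len; i++)
        s += (unsigned char)p[i];
    return tochar((s + ((s >> 6) & 3)) & 63);
}
```

Because both the data and the check characters stay within printable ASCII, a packet survives links that mangle control characters or strip the eighth bit.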
Generally, the Kermit sender sends a packet, possibly containing some data,
and the receiver responds with an Acknowledge packet whose data field contains
the response. For example, when a file transfer begins, the sender sends a
Send-Init packet with an encoded string specifying the sender's capabilities;
the receiver's acknowledgment specifies the receiver's capabilities. A packet
and its response have the same sequence number. You can think of a Negative
Acknowledge packet as a request by the receiver for a particular packet; an
idle receiver waiting to see a Send-Init will periodically emit Negative
Acknowledgments with sequence number zero. I'll use the term "exchange" to
refer to a single packet sent by the sender and the receiver's corresponding
Acknowledge packet.
A complete transaction starts with a Send-Init exchange to establish the
common capabilities of the sender and receiver. Each file transferred requires
a File exchange to establish the filename, the transfer of the actual file
data (which I'll explain later) and an End-of-File exchange. After all files
are transferred, a Break exchange terminates the transaction.
Kermit's original strategy for transferring file data used this exchange
mechanism in a simple fashion. The sender would send a Data packet containing
some file data and wait for the corresponding Acknowledge packet. The receiver
would respond to the correct packet with an Acknowledge; a timeout or an
out-of-sequence packet would prompt a Negative Acknowledge requesting the
correct packet.
If you are familiar with XModem, Kermit's original transfer strategy is
similar, but with one important difference. Every response from the receiver
specifies the particular packet being acknowledged or requested. This feature,
lacking in XModem, allowed Kermit to easily add support for full-duplex packet
reassembly.


Kermit's Packet-Reassembly Strategy 


Modern Kermit retains the original strategy for everything except the actual
data transfer. The initial handshake, start-of-file, and end-of-file
negotiations would be more complex if the sender were allowed to send multiple
packets. While transferring file data, however, Kermit allows the sender to
send multiple packets before receiving an acknowledgment. The range of packets
that a Kermit sender or receiver is still actively managing is called the
"window."
Like most protocols, Kermit limits the size of the window. The window size is
negotiated by the sender and receiver at the beginning of the transfer. If the
negotiated window size is 1, the sender will wait after each packet for a
response. This is completely compatible with the original protocol. If the
negotiated window size is 20, then the sender can send a string of 20 packets
before requiring an acknowledgment for the first one. 
The window size is not a limit on the number of packets that have not been
acknowledged. If the window size is 20, the sender cannot send packet number
207 until it receives an acknowledgment for packet number 187, even if every
intervening packet has been acknowledged. In addition, the window size cannot
exceed 32 packets. Both of these restrictions are necessary to avoid
ambiguity. When the receiver sees a packet from the sender, it must determine
the full packet sequence number from the six bits stored in the packet header.
The only additional information the receiver has for determining the full
packet-sequence number is the window size; the full packet number of the
packet just received cannot differ from the packet expected by the receiver by
more than the window size.
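In code, the disambiguation might look like this (a sketch of my own; the function name and the modular arithmetic are illustrative, not taken from my implementation):

```c
/* Recovering a full packet-sequence number from the six bits carried in
 * the packet header (illustrative sketch). Because the packet received
 * can differ from the packet expected by at most the window size, and
 * the window never exceeds 32 packets, exactly one candidate with the
 * right low six bits falls in the legal range. */
static long full_sequence(long expected, int six_bits, int window)
{
    long base = expected - window;   /* earliest legal full number */
    long offset = ((six_bits - base) % 64 + 64) % 64;
    return base + offset;
}
```

For example, with a window of 20 and packet 200 expected, the six-bit value 62 can only mean packet 190; any other candidate with those low bits would lie outside the window.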


Window Management


In my Kermit implementation, the details of packet reassembly are handled by
the "reliability" layer. Lower layers handle the transmission and reception of
single packets. Higher layers handle the transfer of entire files. The
reliability layer ensures that each packet arrives at its destination; this is
where errors are handled.
For simpler protocols such as XModem and YModem, the receiver's reliability
functions wait for the correct packet, sending negative acknowledgments until
the expected packet is successfully received. The sender's reliability layer
repeatedly sends a packet until it is acknowledged.
For Kermit, this simple picture isn't sufficient. To the higher layers, the
reliability layer looks the same as the simple case previously described. The
reliability layer functions are still called once to send or receive each
packet. Internally, however, there are significant differences. The
reliability layer must cache data so it can juggle out-of-order packets.
Under good conditions, the sender's reliability function can always return
immediately. Every time the function is called, it has already received an
acknowledgment for the first packet in the window, so the function can return
as soon as it sends the new packet, stores the packet in the cache, and
processes any pending packets.

Under less ideal conditions, the sender must exercise some care to keep the
transfer moving smoothly. For example, consider the situation in Figure 3.
Each D is a Data packet sent; the Ys are the corresponding Acknowledge
packets. As you can see, packet 32 was just sent; the most recent Acknowledge
was for packet 25. However, no Acknowledge has been received for packet 13. If
the window size is 20, then no further packets can be sent until the
acknowledgment for packet 13 is received; the window is said to be "blocked."
In my implementation, the sender's reliability function will not return until
the window becomes unblocked. 
The challenge for the sender is to keep the window from blocking. It can do
this by correctly determining when to resend a packet. In Figure 3, if the
sender had resent packet 13 earlier, it might have received an acknowledgment
by now and avoided having the window block.
Just keeping the window from blocking isn't enough, however. One naive
strategy is to resend a packet whenever a later packet is acknowledged. With
that strategy, the sender would have sent a total of 13 copies of packet 13 by
the time the situation in Figure 3 arose, and would likely have received a
response from one of those repeated packets in time to avoid having the window
block. But that approach would slow the rest of the transfer; time spent
needlessly repeating packet 13 could have been spent sending additional data.
The trick is to repeat packets soon enough that a response can be obtained
before the window blocks, but infrequently enough to avoid needless
duplication.


Keeping the Books


As you might expect, the way to break this apparent impasse is to track the
correct data about each packet. There are two basic sources of information:
sequence and time. 
Unlike some networking protocols, Kermit operates over serial connections in
which packet order is preserved. If you get an acknowledgment for packet 14
without seeing one for packet 13, then you can reasonably assume that packet
13 should be repeated; see Figure 4.
Using packet sequence to detect lost packets does not mean using the
packet-sequence numbers. In Figure 4, packets 21 and 22 were just sent; the
next packets sent will be 13 and 23. If at some point you receive an
acknowledgment for packet 13, you can use that to check if packets 21 or 22
should be repeated. You can track the actual packet sequence by storing, for
each packet sent, the number of the previously sent packet. That way, each
time you receive an acknowledgment, you can chain back to check the previously
sent packets.
This chaining must be handled carefully to avoid degenerate situations. For
example, if a packet is sent twice in a row, you don't want it to indicate
itself as the previously sent packet. I maintain this list as a doubly linked
list and remove each sent packet from the list and reinsert it at the end.
Example 1 is an outline of the code that's executed every time a packet is
sent.
Of course, even with this sequencing heuristic, the window will sometimes
block. For example, at the end of the file-data transfer, the window size is
reduced to one, which will almost always result in a blocked window as the
final packets of file data are handled.
As before, a blocked window is no reason to needlessly repeat the first packet
in the window. Instead, repeat a packet only when you're reasonably certain
that enough time has elapsed for a response to have been received. Most
protocol implementations rely on fixed or user-specified timeouts. My Kermit
implementation instead measures the round-trip delay and adjusts the timeout
based on the observed behavior. This only requires timestamping each sent
packet and measuring the delay when an acknowledgment is received. To avoid
ambiguity, I only use round-trip measurements for packets that were sent once.
Knowing the average round-trip delay isn't quite sufficient. Some connections
have a wide variation in the delay. One packet might require only four seconds
for a round-trip; the next might require ten. This kind of variation is common
on some packet-switched networks and when transferring files with heavily
loaded hosts. Again, rather than hard-wiring assumptions about the connection,
I measure the standard deviation of the round-trip delay. Example 2 is the
method I use for tracking the average round-trip time and the standard
deviation. To avoid overflow, I use the definition of standard deviation
(root-mean-square of the differences from the mean) rather than one of the
many standard formulas. My time routines return time values in milliseconds
and the standard formulas require keeping the sum of the squares of the
delays. These squares will be quite large, and overflowing a 32-bit integer is
a real possibility. By keeping a decaying average of the squares of the
differences, I can more easily bound the intermediate values in my
standard-deviation computation. To compute the square root, I apply one
iteration of Newton's method on each update. From these statistics, I use a
timeout interval of the round-trip time plus three times the standard
deviation plus two seconds.
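For readers who want to experiment, here is a compilable rendering of the bookkeeping shown in Example 2 together with the timeout formula. The structure name and the nonzero seed for the Newton iteration are my additions, not part of the original fragment:

```c
/* Round-trip bookkeeping (times in milliseconds). A true average is kept
 * for the first 30 samples; after that it decays into a moving average. */
struct rtt_stats {
    long samples;    /* capped at 30 */
    long delay;      /* estimated mean round-trip time */
    long variance;   /* decaying mean of squared differences */
    long sd;         /* running Newton's-method estimate of sqrt(variance) */
};

static void rtt_update(struct rtt_stats *s, long this_delay)
{
    long diff;
    if (s->samples++ == 0) {        /* handle first sample differently */
        s->delay = this_delay;
        s->variance = 0;
        s->sd = 1;                  /* seed so Newton never divides by zero */
        return;
    }
    if (s->samples > 30)
        s->samples = 30;
    diff = this_delay - s->delay;   /* difference from the old mean */
    s->delay += diff / s->samples;
    s->variance += (diff * diff - s->variance) / s->samples;
    s->sd = (s->sd + s->variance / s->sd) / 2;  /* one Newton step */
    if (s->sd < 1)
        s->sd = 1;
}

/* The sender's timeout: average + 3 standard deviations + 2 seconds. */
static long rtt_timeout_ms(const struct rtt_stats *s)
{
    return s->delay + 3 * s->sd + 2000;
}
```

Keeping the intermediate values as differences from the mean, rather than raw sums of squares, is what bounds them within a 32-bit long.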


Implementation


Listing One (beginning on page 91) presents KSendPacketReliable, the primary
reliability function from my implementation of Kermit. This function is called
with a packet of data to be sent. It tracks information about the currently
outstanding packets and processes responses from the receiver. The
KERMIT_PRIVATE structure maintains information about the transfer; a pointer
to it is passed to each function. In particular, the KERMIT_PRIVATE structure
maintains a cache of 64 EXCHANGE structures, one for each possible
packet-sequence number. Each EXCHANGE structure maintains information about a
particular packet exchange, including the type of the packets sent and
received, the time the packet was sent, the number of times the packet was
sent, and pointers to the actual packet data.
The full implementation for my program, which provides basic Kermit
file-transfer support, is available electronically (see "Availability," page
3). The UNIX version has been compiled using GCC 2.6 on FreeBSD 2.0. The
serial routines in ftserial.c may require tweaking for other UNIX-like
platforms. 


Conclusion


My earlier Kermit implementations used a variety of different heuristics for
deciding when to repeat packets. One simply resent the first packet when the
window blocked; another resent it when certain later acknowledgments were
received. Replacing those approaches with the heuristics described in this
article substantially improved performance and answered questions that had
arisen from the earlier attempts. (For example, it was clear early on that
retransmission should be considered when the window is blocked and when an
Acknowledge is received. The two different heuristics described in this
article neatly answered the requirements of those two different situations.)
I've timed file transfers with two Kermit implementations and two ZModem
implementations over a local modem connection that lacks proper flow control.
In this situation, my Kermit implementation achieved more than twice the
transfer speed of any of the three alternatives, and was nearly five times as
fast as one of the ZModem implementations. Although inconclusive, these
results suggest that these heuristics can allow even imperfect connections to
achieve good throughput.


Acknowledgment


Tom Lippincott originally suggested the sequencing heuristic described in this
article.


References


Frank da Cruz's Kermit: A File Transfer Protocol (Digital Press, 1987) is the
definitive reference on the Kermit protocol. Also see
http://www.columbia.edu/kermit.
The source code described here is more fully explained in my book The Working
Programmer's Guide to Serial Protocols (Coriolis Books, 1995), which also
gives in-depth explanations of ZModem and Kermit. The sequencing and timeout
heuristics presented here are new, however.
The complete source code from the book is available from ftp.coriolis.com in
the /pub/bookdisk/serial directory. It's periodically updated with bug fixes
and other improvements.
Figure 1: Kermit short packet format.
Figure 2: Kermit long and extra-long packet format.
Figure 3: Window blocked (window size of 20).
 1 1 2 2 3 3 4 4 5 5 6
 0....5....0....5....0....5....0....5....0....5....0....5....0...
Tx: DDDDDDDDDDDDDDDDDDDD :
Rx: .YYYYYYYYYYYY....... :
Figure 4: Acknowledgement for Packet 13 lost.
 1 1 2 2 3 3 4 4 5 5 6
 0....5....0....5....0....5....0....5....0....5....0....5....0...
Tx: DDDDDDDDDDDDDDDDDDDD :
Rx: YYYYYYYYYY.Y........ :
Example 1: Maintaining the list of sent packets.
/* Step 1: Unlink this packet from doubly-linked list */
 slot = &lt;sequence number of this packet&gt;;
 prev = exchange[slot].previousPacket;

 next = exchange[slot].nextPacket;
 if (slot == lastPacket) lastPacket=prev;
 if (prev >= 0) exchange[prev].nextPacket = next;
 if (next >= 0) exchange[next].previousPacket = prev;
/* Step 2: Link at end of list */
 exchange[slot].previousPacket = lastPacket;
 exchange[slot].nextPacket = -1;
 exchange[lastPacket].nextPacket = slot;
Example 2: Estimating the average and standard deviation.
thisDelay = now - <time packet was sent>;
if (roundTripSamples++ == 0) {
 /* Handle first sample differently */
 roundTripDelay = thisDelay;
 roundTripDelayVariance = 0;
}else{
 /* Use real average for first 30 samples, then decaying average */
 if (roundTripSamples > 30)
 roundTripSamples=30;
 /* Compute the square of the difference */
 differenceSquared = (thisDelay-roundTripDelay)*(thisDelay-roundTripDelay);
 /* Update the average */
 roundTripDelay += (thisDelay-roundTripDelay)/roundTripSamples;
 /* Update the mean square */
 roundTripDelayVariance += (differenceSquared-roundTripDelayVariance)
/roundTripSamples;
 /* Update root-mean-square using Newton's method */
 roundTripDelaySD = (roundTripDelaySD +
 roundTripDelayVariance/roundTripDelaySD) / 2;
}

Listing One
STATIC int KSendPacketFromCache(KERMIT_PRIVATE *pK,long sequence,int
addToList)
{
 int slot = sequence & 63;
 long prev, next;
 if ((pK->exchange[slot].myPacket.type == 0) ||
 (pK->exchange[slot].myPacket.data == NULL))
 return kOK;
 prev = pK->exchange[slot].previousPacket; /* Unlink from list */
 next = pK->exchange[slot].nextPacket;
 if ((pK->lastPacket & 63) == slot) pK->lastPacket = prev;
 if (prev >= 0) pK->exchange[prev & 63].nextPacket = next;
 if (next >= 0) pK->exchange[next & 63].previousPacket = prev;
 pK->exchange[slot].nextPacket = -1;
 pK->exchange[slot].previousPacket = addToList ? pK->lastPacket : -1;
 if (addToList) {
 if (pK->lastPacket >= 0) /* Add to end of list */
 pK->exchange[pK->lastPacket & 63].nextPacket = 
 pK->exchange[slot].sequence;
 pK->lastPacket = pK->exchange[slot].sequence;
 }
 pK->exchange[slot].tries++; /* Count number of sends */
 pK->exchange[slot].sendTime = SerialTime (pK->initTime); 
 /* Stamp time of send */
 return StsWarn (KSendPacket (pK, slot, pK->exchange[slot].myPacket.type, 
 /* Send it */
 pK->exchange[slot].myPacket.data,
 pK->exchange[slot].myPacket.length));
}

STATIC int KSendPacketReliable(KERMIT_PRIVATE *pK, BYTE type,
 const BYTE *pSendData, unsigned long sendDataLength, 
 unsigned long rawDataLength)
{
 int blocked = FALSE;
 int err;
 int slot = pK->sequence & 63;
 int timeout = pK->my.timeout;
 { /* Put packet into cache */
 EXCHANGE *pThisExchange = &(pK->exchange[slot]);
 if (pThisExchange->myPacket.data == NULL) {
 if (pK->minCache < pK->minUsed) {
 SwapSlots (pK->minCache, slot); 
 /* Move free exchange to end of window */
 pK->minCache++;
 pThisExchange->yourPacket.type = 0;
 } else return StsWarn (kFail); /* Internal consistency failure */
 }
 if (pSendData == pK->spareExchange.myPacket.data) { 
 /* In the reserved slot ? */
 BYTE *pTmp = pThisExchange->myPacket.data; /* Just swap it in */
 pThisExchange->myPacket.data = pK->spareExchange.myPacket.data;
 pK->spareExchange.myPacket.data = pTmp;
 } else /* copy it */
 memcpy (pThisExchange->myPacket.data, pSendData, sendDataLength);
 if (pK->sequence > pK->maxUsed) pK->maxUsed = pK->sequence; 
 /* Update end of window */
 pThisExchange->sequence = pK->sequence; 
 /* Finish initializing this exchange */
 pThisExchange->myPacket.length = sendDataLength;
 pThisExchange->myPacket.type = type;
 pThisExchange->rawLength = rawDataLength;
 pThisExchange->tries = 0;
 pK->txPacket.data = pK->spareExchange.myPacket.data;
 pK->txPacket.length = 0;
 }
 StsRet (KSendPacketFromCache (pK, pK->sequence, TRUE)); /* Send packet */
 if (pK->minUsed <= pK->minCache) blocked = 1; /* Are we blocked? */
 if (pK->maxUsed - pK->minUsed + 1 >= pK->currentWindowSize)
 /* How blocked are we? */
 blocked = (pK->maxUsed - pK->minUsed + 1) - pK->currentWindowSize + 1;
 err = KReceivePacketCache (pK, 0); /* Get a packet if one's ready */
 do { /* Until we're not blocked and there are no more packets pending */
 switch (err) {
 case kBadPacket: /* Didn't get a packet */
 case kTimeout:
 break;
 default: /* Unrecognized error, pass up to caller */
 return StsWarn (err);
 case kOK: /* Got one! */
 {
 EXCHANGE *pThisExchange = &(pK->exchange[pK->rxPacketSequence & 63]);
 switch (pK->rxPacket.type) {
 case 'N': /* Got a NAK */
 if (pThisExchange->myPacket.type != 0) /* Resend packet */
 StsRet (KSendPacketFromCache (pK, pK->rxPacketSequence, FALSE));
 if ((pK->currentWindowSize > 1) || (pK->maxUsed > pK->minUsed))
 break; /* Don't generate implicit ACKs for large windows */
 pThisExchange = &(pK->exchange[(pK->rxPacketSequence - 1) & 63]);

 pThisExchange->yourPacket.type = 'Y';
 case 'Y': /* Got an ACK */
 if (pThisExchange->rawLength > 0) { /* ACKed before?*/
 if (pThisExchange->tries == 1) { /* Update round-trip stats */
 long now = SerialTime (pK->initTime);
 long thisDelay = now - pThisExchange->sendTime;
 if (pK->roundTripSamples++ == 0) { /* First sample? */
 pK->roundTripDelay = thisDelay;
 pK->roundTripDelayVariance = 0;
 } else {
 long oldAverage = pK->roundTripDelay;
 long diffSquared;
 if (pK->roundTripSamples > 30) /* Average first 30 */
 pK->roundTripSamples = 30;
 /* Then decaying average */
 pK->roundTripDelay += (thisDelay - pK->roundTripDelay) 
 / pK->roundTripSamples;
 diffSquared = (thisDelay - oldAverage) * 
 (thisDelay - oldAverage);
 pK->roundTripDelayVariance += (diffSquared - 
 pK->roundTripDelayVariance) / 
 pK->roundTripSamples;
 pK->roundTripDelaySD = (pK->roundTripDelaySD +
 pK->roundTripDelayVariance / 
 pK->roundTripDelaySD) / 2;
 }
 }
 if (pK->sending) /* Update file progress on receipt of ACK */
 pK->filePosition += pThisExchange->rawLength;
 pThisExchange->rawLength = 0; /* Don't count packet again */
 }
 {
 long j, i = pThisExchange->previousPacket;
 while (i >= 0) { /* Resend packets based on send order*/
 j = pK->exchange[i & 63].previousPacket;
 if (pK->exchange[i & 63].yourPacket.type != 'Y')
 StsRet (KSendPacketFromCache (pK, i, TRUE));
 i = j;
 }
 }
 while ((pK->exchange[pK->minUsed & 63].yourPacket.type == 'Y')
 && (pK->exchange[pK->minUsed & 63].sequence == pK->minUsed)) {
 { /* Free up slots that have been acknowledged */
 long prev, next;
 prev = pK->exchange[pK->minUsed & 63].previousPacket; 
 /* Unlink */
 next = pK->exchange[pK->minUsed & 63].nextPacket;
 if ((pK->lastPacket & 63) == (pK->minUsed & 63)) 
 pK->lastPacket = prev;
 if (prev >= 0) pK->exchange[prev & 63].nextPacket = next;
 if (next >= 0) pK->exchange[next & 63].previousPacket = prev;
 pK->exchange[pK->minUsed & 63].previousPacket = -1;
 pK->exchange[pK->minUsed & 63].nextPacket = -1;
 }
 pK->minUsed++; /* Mark this exchange as free */
 if (blocked > 0) blocked--; /* Reduce count to unblock window*/
 }
 break;
 case 'E': /* Received error packet, terminate transfer */

 return StsWarn (kFail);
 default: /* Received unrecognized packet type, terminate */
 return StsWarn (kFail);
 }
 }
 break;
 }
 { /* Resend timed-out packets */
 long i;
 unsigned long now = SerialTime (pK->initTime);
 unsigned long packetTimeout = pK->roundTripDelay + 
 3 * pK->roundTripDelaySD + 2 * SERIAL_TIME_SCALE; 
 /* Avg + 3 standard deviations + 2 seconds */
 long oldest = -1;
 unsigned long oldestTime = ULONG_MAX;
 long firstOld = -1;
 for (i = pK->minUsed; i <= pK->maxUsed; i++) {
 if (pK->exchange[i & 63].yourPacket.type != 'Y') {
 if ((firstOld == -1) && (pK->exchange[i & 63].sendTime + 
 packetTimeout < now))
 firstOld = i; /* Find first timed-out packet */
 else if (pK->exchange[i & 63].sendTime < oldestTime) { 
 /* Find oldest packet */
 oldestTime = pK->exchange[i & 63].sendTime;
 oldest = i;
 }
 }
 }
 if (firstOld != -1) /* Resend first timed-out packet */
 StsRet (KSendPacketFromCache (pK, firstOld, FALSE));
 if (oldestTime == ULONG_MAX) timeout = pK->my.timeout;
 else if (packetTimeout + oldestTime < now) /* Next oldest? */
 timeout = 0;
 else /* Compute interval until next timeout */
 timeout = (packetTimeout - (now - oldestTime)) / SERIAL_TIME_SCALE;
 if (timeout < 1) timeout = 1; /* Minimum is one second */
 if (timeout > pK->my.timeout) timeout = pK->my.timeout; 
 /* Max is negotiated value */
 }
 err = KReceivePacketCache (pK, blocked ? timeout : 0); 
 /* Try to receive a packet */
 } while (blocked || (err == kOK));
 if (pK->exchange[pK->sequence & 63].yourPacket.type &&
 (pK->exchange[pK->sequence & 63].sequence == pK->sequence))
 pK->rxPacket = pK->exchange[pK->sequence & 63].yourPacket;
 pK->sequence++;
 if (err == kTimeout) return kOK;
 if (err == kBadPacket) return kOK;
 return StsWarn (err);
}









































































CGI and the World Wide Web


Debugging CGI gateways




G. Dinesh Dutt


Dinesh is an engineer with Hinditron-Tektronix Instruments Limited, Bombay,
India. He can be contacted at brat@htilbom.ernet.in.


Much of the usefulness of the World Wide Web stems from the ability of Web
servers to interact with external programs. The technology that currently
makes this possible is the Common Gateway Interface (CGI). A common
application of CGI, for instance, might involve a user querying a database via
a form. Once the form is filled out, a CGI script (or program) passes the
request from the Web server to the external database, gets the database
output, and sends it back to the user.
More specifically, CGI identifies how the Web server should supply input to
the external programs, along with the format of output to be returned. The
server, in turn, gets inputs from a client such as a Web browser. Since inputs
are normally made available via the clients, I'll assume for this article that
the interaction occurs between the client and the gateway (external programs),
instead of between the server and gateway. Currently, HTTP servers are the
only Web servers that support CGI. This means that CGI is supported on all
familiar platforms--UNIX, Macintosh, and Windows. 
The basic tools you need to use CGI are a language that produces executables
(shell scripts, Tcl, Perl, or C, for instance) and access to a CGI-enabled
HTTP server. This article is based on UNIX, Perl 4.036, the NCSA HTTPD 1.4 and
CERN HTTPD 3pre6 servers, and CGI 1.1, but it applies to other servers,
languages, and platforms. For the sake of example, I'll build a simple
form-based application (for information on forms, see "Coding with HTML
Forms," by Andrew Davison, Dr. Dobb's Journal, June 1995) that uses the form
in Figure 1. Listing One presents the HTML code that generates this form. As
you can see, this form consists of a text box and two radio buttons, only one
of which can be selected. This provides a gateway with Author and Title
inputs. Input is passed to the gateway when the user clicks the Submit button.



Data Input


There are several ways in which input is passed to gateways, including
forms-based methods and ISINDEX. While each method supplies inputs
differently, they all use environment variables to pass information. The
environment variables in Perl are available via the %ENV associative array.
REQUEST_METHOD is one such variable that indicates the method used to submit
the input. (See Table 1 for a description of additional environment
variables CGI uses.)
In Example 1(a), the form's METHOD is set to GET, so the input is
available to the gateway in the environment variable QUERY_STRING. With forms
where the METHOD is set to POST, input is available via stdin (standard
input). The CGI specification states that the server need not supply an
end-of-file (EOF) for the input available via stdin. Instead, the HTTP server
provides the size of the input in the environment variable CONTENT_LENGTH. The
first gotcha comes here, when gateways try to read this input. Since the input
is not terminated with an EOF marker, the gateway must never read more than
the CONTENT_LENGTH or it will wait for further input that will never come.
This hangs the gateway and the client awaiting the results. Example 1(b) reads
stdin to secure the input.
When input is supplied via the ISINDEX interface, the input is made available
via the command-line arguments array ARGV. You can use standard
command-line-argument parsing code to extract the inputs. The REQUEST_METHOD
variable is set to GET in this case, too. Information supplied via ISINDEX is
sent as part of the link, separated by a "?"; for example,
http://amadeus.org/play?Jupiter, where Jupiter is made available via
command-line arguments ($ARGV[0], in this case). But remember, the arguments
cannot have blank spaces between them--even if you "quote" the spaces. To pass
arguments with spaces in them, replace each space with its equivalent
hexadecimal ASCII code prefixed by %; for example, http://
amadeus.org/play?41st%20Symphony will make the gateway receive "41st Symphony"
as its first argument.
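A minimal ISINDEX gateway sketch in Perl (the Reply routine and its messages are invented for illustration) would therefore simply look at @ARGV:

```perl
#!/usr/local/bin/perl
# Hypothetical ISINDEX gateway: the server passes the (already decoded)
# query to the script as command-line arguments.
sub Reply {
    local ($play) = @_;
    return "Please supply the name of a work.\n" if ($play eq "");
    return "Looking up \"$play\"...\n";
}
print "Content-type: text/plain\n\n";
print &Reply ($ARGV[0]);    # e.g. "41st Symphony"
```

Invoked via http://amadeus.org/play?41st%20Symphony, this script would see "41st Symphony" in $ARGV[0].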


Making Sense of the Input


If you were to simply print out the input, the result would be gibberish. One
way to see what the gateway receives is to use the test-cgi program that's
supplied with the NCSA HTTPD. For example, to examine what QUERY_STRING (used
by the GET submission method) looks like to the gateway, set the ACTION of the
example form to "test-cgi" (prefixed by the proper
http://your-server:port/cgi-bin/). Submitting the form then produces input
such as name=The+Fountainhead&keyword=title.
The inputs are presented to the server as a list of name=value pairs, with
each pair separated by a "&" character. This format also converts blank spaces
to "+" and converts all special characters to their hexadecimal ASCII code.
There are standard libraries available in many languages to decode the input
and present it in an understandable format. Listing Two presents CGIGetInput,
one such decoder written in Perl.
CGIGetInput understands both GET and POST methods and returns the inputs in an
associative array (name supplied by the caller), with each input-field name
being the key and the value of the field being the value of the array element.
You can use the decoded output for processing. For instance, the debugcgi.cgi
script of Listing Three uses this routine. 
In the case of ISINDEX, however, the data format differs from the
aforementioned cases. The name part is entirely absent, and the data is as
submitted by the user with all hexadecimal symbols converted to their ASCII
equivalents. For instance, in the aforementioned example, the gateway gets
"41st Symphony" even though we said "41st%20Symphony."


Talking Back 


When the gateway needs to communicate with the server after processing, it can
return the type of the forthcoming output via the Content-type header. For an
HTML document, this would be "Content-type: text/html." If you do not specify
the type of the data returned, the server returns a "500 Server Error"
message to the user. The error logs of the server give the reason as
"malformed header from script." This happens because HTTP first sends some
metainformation about the object that it's about to return--type, size, title,
expiration date, and so on. If this information isn't forthcoming, the server
cannot parse the script's output and returns the aforementioned error message. The
valid types that can be used in place of text/html are those that are
supported by the browser/HTTP server. This is given by the HTTP_ACCEPT
environment variable. (Remember, not all browsers support all environment
variables.) For plain ASCII text, text/html can be replaced by text/plain.
Another effect of the processing could be a request to fetch another document.
To do this, return the URL of the document in the format print
"Location: http://amadeus.org/Mozarts_Life.html\n\n"; which tells the server
that it must retrieve the supplied URL and return that to the client. You
could also use the PrintHeader routine supplied in cgi-parse; see Listing Two.
Example 2 provides typical calling sequences.


Yet Another Input Method 


Another method for obtaining input information makes use of the PATH_INFO
environment variable. To illustrate, assume that you have a document that is
available in both French and English. Depending on the user's choice of
language, the correct document must be served. If you have a CGI script called
"document-disher," for instance, a link could be specified as:
http://mymachine.org/cgi-bin/document-disher/French/Mon_Document
http://mymachine.org/cgi-bin/document-disher/English/My_Document
In this case, the CGI script could make use of the extra path information
available at the end of the pathname to serve the document in the correct
language.

The server also provides an environment variable PATH_TRANSLATED, which
contains a complete, legal filename based on PATH_INFO. Consequently, this
"multilingual" document gateway could simply print the contents of the file
specified in PATH_TRANSLATED if the paths are configured properly.
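Assuming the paths are set up as described, a sketch of such a "document-disher" in Perl (the ServeFile helper and the fallback message are illustrative) could be:

```perl
#!/usr/local/bin/perl
# Hypothetical "document-disher" gateway: PATH_TRANSLATED is assumed to
# point at the requested language version of the document.
sub ServeFile {
    local ($file) = @_;
    open (DOC, "< $file") || return 0;   # fail quietly; caller decides what to say
    print "Content-type: text/html\n\n";
    print while (<DOC>);                 # copy the document to the client
    close (DOC);
    return 1;
}
&ServeFile ($ENV{'PATH_TRANSLATED'}) ||
    print "Content-type: text/plain\n\nSorry, that document is unavailable.\n";
```

A real script should also validate the path, since PATH_INFO comes from the client.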


Which Input Method? 


The input method you use depends on your application. GET can be used when
there's little information to be supplied; for instance, a form like the one
in Figure 1, which supplies only a keyword and its type. If your form
involves more data, the contents of the environment variables may be
truncated. Consequently, you should use the POST method for large inputs. (The
Mosaic forms tutorial recommends use of the POST method only.) The ISINDEX
approach to input, on the other hand, lends itself to querying and works well
when you don't know if forms are being used or when you have to support
browsers that don't support forms.

Also, keep in mind that you can mix different input methods to some extent:
http://mymachine.org/cgi-bin/document-disher/French/Mon_Document?Speak
http://mymachine.org/cgi-bin/getfc?791+793
When forms are used as the method to submit input, people might want to pass
information not modifiable by the user (the form name, for example). You can
do this by adding a ? followed by the form name to the URL of the action link
(or via PATH_INFO). While it's okay to do so, your script must "know" that the
input would be available in two different ways and read both of them; for
example, by manually changing the REQUEST_METHOD variable from within the
script. However, the correct way to pass information that's not modifiable by
the user (again, the name of the form) when using forms is via hidden fields,
specified via the TYPE="hidden" attribute in the form field. 
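For example, the hidden-field version of such a form might look like this (the URL and field names are invented for illustration):

```html
<FORM ACTION="http://yourmachine:yourport/cgi-bin/search.cgi" METHOD="POST">
<!-- Carried along with the submission, but neither shown to nor editable by the user -->
<INPUT TYPE="hidden" NAME="formname" VALUE="book-search">
<INPUT TYPE="text" NAME="name" VALUE="">
<INPUT TYPE="submit" VALUE="Submit Form">
</FORM>
```

The hidden field arrives in the same name=value stream as the visible fields, so no extra decoding logic is needed.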


Debugging Gateways 


A gateway is like any other program in that you will need to be able to debug
it. One of the basic problems with CGI is that the scripts seem to work when
used normally, but fail when called from within a Web browser. The lack of
error messages makes this doubly confusing.
One way to debug gateways is to simulate the behavior of the HTTP server by
setting all the relevant environment variables (QUERY_STRING with the METHOD
set to GET, for instance) and executing the script to see if the decoding of
information is correct. However, this does not test the changed environment
under which the gateway works once it's invoked by the WWW server.
Consequently, I've written a program that reports errors in your gateway's
execution, including those caused by wrong assumptions about the environment.
Using this program (which has its own forms-based interface; see Figure 2), it
should be fairly easy for you to debug your gateways, and the gateway need not
even be written in Perl.
The test script/form as shown in Listing Four works as follows: 
1. Replace the action URL in the debug form (see Listing Five) with the
correct path to debugcgi.cgi in your machine. Ensure that the cgi-parse.pl
file is put in the @INC of the debugcgi.cgi script and is made world readable.
The debugcgi.cgi script needs to be world readable and executable. 
2. Change the paths to the actual location for Perl (replace
/usr/local/bin/perl). 
3. If you use forms to supply input, strip off the <FORM> and </FORM> lines
from your form and attach the resulting body of the form to the debug form
provided. For ISINDEX interfaces, supply the arguments in the ISINDEX area of
the form. 
4. Bring up a browser on this form. 
5. Enter the full pathname of the script invoked by the form to test and also
supply the METHOD used to submit information. 
6. Supply the input to your form. 
7. Click on Submit. 
If everything is okay with your form, you are notified accordingly, and the
output from the form is displayed. (The program currently lacks support to
handle pure image outputs.) If not, an error message is displayed and the
cause of the error reported, including any parsing errors for a script. All
errors resulting from the change in environment (from your user id to the server's) are
trapped and reported. Let's take a look at some of the common errors
encountered in writing gateways.
One of the more common errors is to not provide all of the required
environment to the script. When testing, the script is running with its user
id set to your id, so it has access to your entire environment, files, and
databases. However, when running under the server's control, it runs with the
user id set to that of the server, usually "nobody," so it doesn't inherit
your environment. Thus, executables accessible during testing might not be
found in actual use. Similarly, files readable during testing might suddenly
become unreadable. The necessary files should be world readable and world
executable, and if they need to be written to, world writable.
Another common error occurs when you do not send the Content-type line as the
first line of the output returned by the gateway. Make sure the first two lines
of the output are the Content-type line followed by a blank line; otherwise, the
"malformed header" error appears. The Content-type field needs to be set to
the type of the object being returned.
Also, by printing a Content-type line at the very beginning or before printing
an error message, you can redirect the errors to the user; otherwise, error
messages end up in the daemon's error log.
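Sketched in Perl (the database path, the Complain helper, and the wording are illustrative), the pattern looks like this:

```perl
#!/usr/local/bin/perl
sub Complain {
    # Because the header has already been printed, this message reaches
    # the browser instead of vanishing into the daemon's error log.
    local ($reason) = @_;
    print "Gateway error: $reason\n";
    return 0;
}
print "Content-type: text/plain\n\n";       # header FIRST, then any errors
open (DB, "< /usr/local/lib/books.db")      # illustrative path
    || &Complain ("cannot open book database: $!");
```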
You should also ensure that your HTTP server supports CGI Version 1.1 and that
the server is running with the ability to recognize and execute CGI scripts.
Many sites turn off CGI since it can be a security hole if not properly
configured. In the case of NCSA HTTPD, the directive ScriptAlias gives the
paths that can contain scripts. Also, if the AddType directive is defined as
AddType application/x-httpd-cgi .cgi, then a script ending in .cgi can be
recognized anywhere the server has access. Both directives are to be present
in the srm.conf file. For the CERN server, scripts are configured via the Exec
directive.
The script must be placed in the proper directory, and the server must be
capable of recognizing a file with a specific suffix, such as ".cgi," as a
script to be executed. 
A simple problem with the CERN server is caused by its parsing policy. If the
path to your script is passed or rejected before the server encounters the
desired Exec rule, the script will not be executed: on a pass, its contents
are returned as a document; on a reject, an error message is returned instead.
To prevent this, ensure that the Exec rules come first for those
directories containing scripts. 
When troubleshooting, view the error log file for the HTTPD server you're
using. The location of this file is difficult to predict; it varies from a
standard /var/httpd/logs/error_log to /usr/local/dolphin/httpd/logs/error_log.
Figure 3 shows the contents of my server's error_log when problems occurred.
One problem immediately apparent from this log is that some of the errors do
not indicate which script was the cause of the error message. The "Can't open
1057" error is one such example. These errors are normally system related,
such as a call to an external program via system(). To properly trap and
report system errors, include the name of the script. It might make sense to
print the contents of the environment variable HTTP_REFERER (if available),
which contains the URL specified to get to this script.
Furthermore, when using forms with the POST method, you must not expect an
EOF; instead you must access the CONTENT_LENGTH environment variable to get
the number of bytes to read and then read only that much. Otherwise, your
script will hang. One nice feature of CERN HTTPD Version 3 is the ability to
specify a timeout period. If the script doesn't terminate within that time
frame, it's killed. This is specified via the ScriptTimeOut directive, and has
a default value of five minutes. 
If you're mixing output from your script with output from external programs
called from within the script, you should unbuffer the output or else the
output could appear in an unpredictable order. Also, unbuffering STDOUT seems
to improve performance, as the server gets output from the gateway immediately
instead of waiting until the buffer is full. 
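In Perl, this amounts to setting the $| special variable on STDOUT before any mixed output is produced; the command in this sketch is illustrative:

```perl
#!/usr/local/bin/perl
# Unbuffer STDOUT before mixing your own output with that of a child
# process; otherwise the two streams can interleave unpredictably.
select (STDOUT);
$| = 1;                          # flush after every print
print "Content-type: text/plain\n\n";
print "--- directory listing follows ---\n";
system ("ls /tmp");              # child output now appears after the line above
```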


Sending Output to the Browser 


Normally, the gateway emits its data without having to bother with HTTP
reply-header format and conventions. The server looks at this output and adds
headers conforming with the HTTP protocol before sending it to the client. If
you wish to save the overhead of your server parsing the output, you may do so
by prepending the appropriate HTTP response headers.
To prevent the server from parsing the output of such scripts, the scripts
should have names that begin with "nph-". For example, NCSA HTTPD comes with a
script called "test-cgi." The same script that talks directly to the browser
is named "nph-test-cgi." 
The main difference between such scripts and ordinary scripts is the extra
two lines prepended to the output. The first is the status-code line, and the
second is the Server: line, which specifies the server name and version.
You need to look at the draft on HTTP to know all the valid status codes. For
example, the nph-test-cgi script returns the headers in Figure 4.
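A sketch of an nph- style script in Perl (the status and Server: strings are examples; a real script should report its own server's name and version and use status codes from the HTTP draft):

```perl
#!/usr/local/bin/perl
# Sketch of an unparsed-header (nph-) script: since the server does no
# parsing, the script itself must emit the status and Server: lines first.
sub NphHeader {
    local ($status, $type) = @_;
    print "HTTP/1.0 $status\r\n";
    print "Server: NCSA/1.3\r\n";          # match your server's name/version
    print "Content-type: $type\r\n\r\n";
}
&NphHeader ("200 OK", "text/plain");
print "Hello from an unparsed-header script.\n";
```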


Customized Responses to Problems 


The NCSA HTTPD 1.4 server lets you customize the error message returned. For
example, you could customize the returned error message for a "500 Server
Error" by calling a script that presents a more informative message. To
do this, edit the srm.conf file, which contains these lines at the end:
ErrorDocument 302 /cgi-bin/redirect.cgi
ErrorDocument 403 /errors/forbidden.html
This means that the redirect.cgi script will be invoked when a redirect error
occurs. 
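A redirect.cgi-style handler can be very small; this Perl sketch (the ErrorPage routine and the HTML it prints are invented for illustration) assumes the server passes the failing URL in the REDIRECT_URL environment variable, which not every server provides:

```perl
#!/usr/local/bin/perl
# Hypothetical ErrorDocument handler: present a friendlier message than
# the server's built-in error page.
sub ErrorPage {
    local ($failed_url) = @_;
    print "Content-type: text/html\n\n";
    print "<HTML><BODY><H2>Something went wrong</H2>\n";
    print "<P>The server could not process <EM>$failed_url</EM>.</BODY></HTML>\n";
    return 1;
}
&ErrorPage ($ENV{'REDIRECT_URL'});
```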


Security


Security is a crucial issue when writing CGI scripts because you are in effect
allowing other users to execute programs on your machine based on their
inputs. Many of the problems encountered in writing CGI scripts in this
respect are similar to those encountered when writing UNIX setuid scripts.
Consequently, you should always follow the simple rule: Do not trust the
client input at all. For example, do not blindly use the client input to
construct commands for the system to execute or supply as input to eval. Do
not even print the value input to your script (except during testing, of
course) as hackers can use clever sequences to break into the system.
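One defensive pattern is to accept only a known-safe character set before handing anything to the shell; in this Perl sketch, the SafeWord routine, the file path, and the grep command are all illustrative:

```perl
#!/usr/local/bin/perl
# Reject client input containing anything outside a known-safe character
# set before the shell ever sees it.
sub SafeWord {
    local ($word) = @_;
    return ($word =~ /^[A-Za-z0-9_.-]+$/) ? 1 : 0;
}
$title = $cgi_input{'name'};             # from &CGIGetInput, for example
if (&SafeWord ($title)) {
    system ("grep -i $title /usr/local/lib/books.txt");   # illustrative
} else {
    print "Content-type: text/plain\n\nInvalid characters in input.\n";
}
```

Note that the whitelist deliberately excludes spaces, semicolons, and backquotes--the characters most useful to an attacker constructing shell commands.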
Further security is possible via the authentication mechanisms provided by
most servers, which require the user to key in a username and password. Only
if this validates is the user allowed to execute the script. Details on how to
configure the server to do this are beyond the scope of this article. Refer to
your server manuals for details.


Conclusion


The net is a rich source of CGI information and numerous, freely available
programs to ease your job of writing and debugging gateway applications. Refer
to the list of web sites in Table 2 for more information on CGI.

Figure 1: Typical Mosaic form.
Figure 2: Forms-based interface of the debugcgi program.
Figure 3: Error messages from the server's error_log.
panic: realloc at /usr/local/bin/rfc2html line 93, <RFC> line 1003.
Can't open 1057: No such file or directory
/usr/local/bin/rfc2html did not return a true value
at /usr/local/etc/httpd/cgi-bin/rfc2html line 52, <> line 14.
[Mon May 22 09:59:15 1995] httpd: malformed header from script
Figure 4: Headers returned by the nph-test-cgi script.
HTTP/1.0 200 OK
Content-type: text/plain
Server: NCSA/1.3
Example 1: (a) Accessing the contents of QUERY_STRING; (b) reading input via
stdin. 
(a)
if ($ENV{'REQUEST_METHOD'} eq "GET") {
 $input = $ENV{'QUERY_STRING'}; 
}

(b)
if ($ENV{'REQUEST_METHOD'} eq "POST") {
 if (!defined ($ENV{'CONTENT_LENGTH'})) { 
 print "Error: CONTENT_LENGTH not set\n";
 exit;
 }
 read (STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
}
Example 2: Typical calling sequences.
# Using cgi-parse.pl:
&PrintHeader; # Print just the default text/html.
&PrintHeader ("text/plain"); # Set the type to text/plain.
# Redirect request to get new document
&PrintHeader ("http://amadeus.org/Mozarts_Life.html",1);
Table 1: Environment variables used by CGI. 
Variable                  Description

HTTP_REFERER              Contains the exact URL by which the script was
                          invoked; for example,
                          http://www.halcyon.com/htbin/browser-survey. In
                          some older browsers, this is called REFERER_URL.
HTTP_USER_AGENT           Gives the name of the browser through which the
                          script was invoked. Using this, one could serve
                          different browsers different documents, with or
                          without netscapisms.
REMOTE_USER               If authentication is enabled, this returns the
                          name for which the authentication succeeded (the
                          server must support it).
REMOTE_ADDR/REMOTE_HOST   Remote machine making the request. If the
                          hostname is unavailable, only the address is set.
SERVER_PROTOCOL           States the protocol and the version of the
                          protocol being used: currently HTTP/1.0, or
                          HTTP/0.9 for older servers.
GATEWAY_INTERFACE         Name of the gateway interface being used and the
                          version number (currently CGI/1.1).
AUTH_TYPE                 Protocol-specific authentication supported by the
                          server. Currently, the only valid value is
                          "Basic."
SERVER_SOFTWARE           Name and version of the server; NCSA/1.4, for
                          example.
Table 2: Web sites for information on CGI.
NCSA's documentation on CGI
http://hoohoo.ncsa.uiuc.edu/cgi/intro.html
CGI specification
http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
The HTTP draft
http://info.w3.org/hypertext/WWW/Protocols/HTTP/HTTP2.html
CGI FAQ 
http://www.halcyon.com/hedlund/cgi-faq/
Yahoo index for CGI material
http://akebono.stanford.edu/yahoo/Computers/World_Wide_Web/CGI___Common_Gateway_Interface/
Virtual library material on CGI
http://www.charm.net/~web/Vlib/Providers/CGI.html
CGI-related newsgroups
news://comp.infosystems.www.authoring.cgi
Currently available gateways
http://www.w3.org/hypertext/WWW/Tools/Filters.html
http://www.nr.no/demo/gateways.html
http://www.halcyon.com/hedlund/cgi-faq/gateways.html
http://www.cis.ohio-state.edu:80/hypertext/about_this_cobweb.html
Language libraries for decoding forms input and other useful things
http://wsk.eit.com/wsk/dist/doc/libcgi/libcgi.html - C
http://www.bio.cam.ac.uk/web/form.html - Perl
http://www.lbl.gov/~clarsen/projects/htcl/http-proc-args.html - Tcl
Survey of which browsers support which variables
http://www.halcyon.com/htbin/browser-survey

Listing One
<FORM ACTION="http://yourmachine:yourport/cgi-bin/testcgi.cgi" METHOD="POST"> 
<H1>Illustration Form</H1> 
 <P ALIGN=JUSTIFY> 
This form is used as an illustration for the article on CGI. 
<HR> 
<INPUT TYPE="text" NAME="name" VALUE=""> 
Title<BR> 
<OL> 
<LI> <INPUT TYPE="radio" NAME="keyword" VALUE="author"> 
Author 
<LI> <INPUT TYPE="radio" NAME="keyword" VALUE="title" CHECKED> 
Title. 
</OL> 
<HR> 
<INPUT TYPE="submit" VALUE="Submit Form"> 
Submit Button<BR> 
<INPUT TYPE="reset" VALUE="Clear Values"> 
Reset Button. 
 <P> 
</FORM> 

Listing Two 
###############################################################################
## CGI-PARSE.PL ##

## A library to read and parse the input available from forms as per the ##
## CGI 1.1 specification. ##
## This code is in the public domain for people to do whatever they wish to ##
## with it. But, maintain this copyright notice and don't say you wrote it. ##
## This work is distributed in the hope that it's useful, but the author is ##
## not liable for any incurred damages, directly or indirectly due to ##
## the use or inability to use this software. ##
###############################################################################
###############################################################################
## CGIGetInput ##
## This is a small function which decodes the forms input. It looks at the ##
## REQUEST_METHOD environment variable to decide where to get the input from.##
## The user can invoke this subroutine thus : ##
## &CGIGetInput (*cgi_in); ##
## and the input is returned in an associative array called cgi_in, with the##
## key being the name of field and its value being the value of the field ##
## as supplied by user. If the field does not have any input, the entry in ##
## the associative array will be undefined. ##
###############################################################################
sub CGIGetInput {
 local (*input) = @_;
 local ($buffer,@nv_pairs);
 if ($ENV{'REQUEST_METHOD'} eq "GET") {
 $buffer = $ENV{'QUERY_STRING'};
 }
 elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
 read (STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
 }
 else {
 return -1;
 }
 @nv_pairs = split (/\&/,$buffer);
 foreach $nvp (0..$#nv_pairs) {
 $nv_pairs[$nvp] =~ tr/+/ /;
 ($key, $keyword) = split (/=/, $nv_pairs[$nvp], 2);
 $key =~ s#%(..)#pack("c",hex($1))#ge;
 $keyword =~ s#%(..)#pack("c",hex($1))#ge;
 $input{$key} .= "\0" if (defined ($input{$key}));
 $input{$key} .= $keyword;
 }
 return 1;
}
###############################################################################
## &PrintHeader (type/URL, is_it_a_URL) ##
## This function prints the default header. If a type is specified, that is ##
## printed, else the default text/html is printed. If the second parameter is##
## 1, then the Location header is printed instead of the text/html header. ##
## ##
## Example invocations : ##
## &PrintHeader ("text/plain", 0) ##
## &PrintHeader ("http://www.halcyon.com/hedlund/cgi-faq/",1) ##
## &PrintHeader ("",0) ##
###############################################################################
sub PrintHeader {
 local ($toprint, $url_p) = @_;
 if ($toprint eq "") {
 print "Content-type: text/html\n\n";
 }
 elsif ($url_p) {

 print "Location: $toprint\n\n";
 }
 else {
 print "Content-type: $toprint\n\n";
 }
}
1;

Listing Three
###############################################################################
## DEBUGCGI.PL ##
## This is a simple script which sets up a test environment for CGI script ##
## to be executed and then traps the common errors. The PATH is set to the ##
## minimal set by most systems, for example. All error messages are trapped ##
## and made available to the user. ##
## ##
## This code is in the public domain for people to do whatever they wish to ##
## with it. But, maintain this copyright notice and don't say you wrote it. ##
## This work is distributed in the hope that it's useful, but the author is ##
## not liable for any incurred damages, directly or indirectly due to ##
## the use or inability to use this software. ##
###############################################################################
$tmpdir = "/tmp/"; # The directory under which the error file will
 # be created.
require "cgi-parse.pl";
%cgi_input = ();
&CGIGetInput(*cgi_input);
$script = $cgi_input{'DebugCgi-ScriptName'};
$method = $cgi_input{'DebugCgi-Method'};
$cmdargs = $cgi_input {'DebugCgi-CmdArgs'};
delete ($cgi_input {'DebugCgi-ScriptName'});
delete ($cgi_input {'DebugCgi-Method'});
delete ($cgi_input {'DebugCgi-CmdArgs'});
$inp = "";
foreach $elem (keys %cgi_input) {
 $cgi_input{$elem} = $cgi_input{$elem};
 $cgi_input{$elem} =~ s# #+#g;
 $cgi_input{$elem} =~ s#([^+A-Za-z0-9])#sprintf("%%%02x",ord($1))#ge;
 $cgi_input{$elem} =~ s#%3d#=#g;
 $inp .= "$elem=$cgi_input{$elem}&";
}
# Encode the input in the form used by HTTP.
#Turn off the include path. The script must use its own @INC and environment.
if (! -e $script) {
 &PrintErrHeader;
 print "<B>Script <EM>$script</EM> does not exist</B><BR>";
 &PrintErrTrailer;
 exit (2);
}
if (! -r $script && ! -x $script) {
 &PrintErrHeader;
 print "<B>Script <EM>$script</EM> is not readable/executable by 
 server</B><BR>";
 &PrintErrTrailer;
 exit (2);
}
#Set the request method.
$error_file = $tmpdir.$^T;
$ENV{'REQUEST_METHOD'} = $method;

if ($method eq "GET") {
 $ENV{'QUERY_STRING'} = $inp;
 open (OUTPUT, "$script $cmdargs 2>$error_file |") || 
 &cry ("unable to pipe script $! \n");
}
elsif ($method eq "POST") {
 $ENV{'CONTENT_LENGTH'} = length($inp);
 open (OUTPUT, "echo \"$inp\" | $script $cmdargs 2>$error_file |") || 
 &cry ("unable to pipe script $! \n");
}
else {
 &PrintHeader;
 print "Unknown method: $method\n";
 exit (3);
}
$_ = <OUTPUT>;
if (!/^Content-type: / && !/^Location: /) {
 if (-s $error_file) {
 open (ERRF, "< $error_file") || &cry ("testcgi.cgi - 
 Unable to open error file $!\n");
 &PrintHeader;
 print "<HTML><BODY>\n";
 @errors = <ERRF>;
 &PrintErrHeader;
 print "<B>Script <EM>$script</EM> has an execution 
 error !!!</B><BR><BR>";
 print "@errors \n";
 &PrintErrTrailer;
 unlink ($error_file);
 exit (4);
 }
 &PrintErrHeader;
 print "The script <EM>$script</EM> has an error :<BR><BR>";
 print "It does not output the Content-type/Location header.<BR>";
 print "Here's what it printed as the first line.\n";
 print "<PRE>\n";
 print;
 print "</PRE>\n";
 &PrintErrTrailer;
 exit (3);
}
$format = m#^Content-type:[ \t]*text/html#;
$_ = <OUTPUT>;
if (!/^$/) {
 &PrintErrHeader;
 print "The script <EM>$script</EM> has an error :<BR><BR>";
 print "The second line it outputs must be a blank, instead I got <PRE>\n";
 print;
 print "</PRE>";
 &PrintErrTrailer;
 exit (3);
}
&PrintHeader;
print "<HTML><BODY><H3>Script <I>$script</I> seems OK !</H3> \n";
print "<P ALIGN=Justify> Here is its output:<BR>\n";
print "<PRE>\n" if (!$format) ;
print $ENV{'PATH_INFO'},"\n";
while (<OUTPUT>) {
 print;

}
print "</PRE>" if (!$format);
print "</BODY></HTML>";
exit (0);
sub cry {
 local ($message) = @_;
 &PrintHeader;
 print "<HTML><BODY><H2>Debugcgi Error !!</H2>";
 print "DebugCGI encountered an error during execution. 
 The error is: ", $message;
 print "\n<BODY><HTML>";
 exit;
}
sub PrintErrHeader {
 &PrintHeader;
 print "<HTML><BODY><H3>Script Error !!</H3>";
}
sub PrintErrTrailer {
 print "</BODY></HTML>\n";
}

Listing Four
 
###############################################################################
## TESTCGI.PL ##
## This is a script which sets up a test environment for the CGI script ##
## to be executed and then traps the common errors. The PATH is set to the ##
## minimal set by most systems, for example. All error messages are trapped ##
## and made available to the user, who thus does not have to wonder what ##
## went wrong in error cases. ##
## This code is in the public domain for people to do whatever they wish to ##
## with it. But, maintain this copyright notice and don't say you wrote it. ##
## This work is distributed in the hope that it's useful, but the author is ##
## not liable for any incurred damages, directly or indirectly due to use ##
## or inability to use this software. ##
###############################################################################
 
$tmpdir = "/tmp/"; # Directory under which the error file will be created.
require "cgi-parse.pl"; 
sub Usage { 
 print "Usage: testcgi [-f filename containing input] -m METHOD scriptname\n";
 print " where METHOD is GET/POST\n"; 
 exit (0); 
} 
%cgi_input = (); 
&CGIGetInput(*cgi_input); 
&PrintHeader; 
 
$script = $cgi_input{'TestCgi-ScriptName'}; 
$method = $cgi_input{'TestCgi-Method'}; 
delete ($cgi_input {'TestCgi-ScriptName'}); 
delete ($cgi_input {'TestCgi-Method'}); 
 
$inp = ""; 
foreach $elem (keys %cgi_input) { 
 $cgi_input{$elem} = $cgi_input{$elem}; 
 $cgi_input{$elem} =~ s# #+#g; 
 $cgi_input{$elem} =~ s#([^+A-Za-z0-9])#sprintf("%%%02x",ord($1))#ge; 
 $cgi_input{$elem} =~ s#%3d#=#g; 

 $inp .= "$elem=$cgi_input{$elem}&"; 
} 
# Encode the input in the form used by HTTP. 
 
#Turn off the include path. The script must use its own @INC and environment. 
@INC=(); 
$ENV{'PATH'} = "/bin:/usr/bin/:/etc:"; 
 
#Set the request method. 
$error_file = $tmpdir.$^T; 
$ENV{'REQUEST_METHOD'} = $method; 
if ($method eq "GET") { 
 $ENV{'QUERY_STRING'} = $inp; 
 open (OUTPUT, "$script 2>$error_file |") || die "unable to pipe script $! \n";
} 
elsif ($method eq "POST") { 
 $ENV{'CONTENT_LENGTH'} = length($inp); 
 open (OUTPUT, "echo \"$inp\" | $script 2>$error_file |") || die 
 "unable to pipe script $! \n";
} 
else { 
 print "Unknown method: $method\n"; 
 exit (3); 
} 
print "<HTML><BODY>\n"; 
$_ = <OUTPUT>; 
if (!/^Content-type: / && !/^Location: /) { 
 if (-s $error_file) { 
 open (ERRF, "< $error_file") || die 
 "testcgi.cgi - Unable to open error file $!\n";
 @errors = <ERRF>; 
 print "<H3>Script $script has an execution error !!!</H3>\n"; 
 print "@errors \n"; 
 unlink ($error_file); 
 exit (4); 
 } 
 print "<H3>Script $script has an error !!!</H3>\n"; 
 print "It does not output the Content-type/Location header.\n"; 
 exit (3); 
} 
$format = m#^Content-type:[ \t]*text/html#; 
$_ = <OUTPUT>; 
if (!/^$/) { 
 print "The script's second output line must be blank\n"; 
 exit (3); 
} 
print "<H3>Script $script Seems OK</H3> \n"; 
print "<P ALIGN=Justify> Here is its output \n"; 
print "<PRE>\n" if (!$format) ; 
while (<OUTPUT>) { 
 print; 
} 
print "</PRE>" if (!$format); 
print "</BODY></HTML>"; 
exit (0); 

Listing Five
 

<!-- <FORM ACTION="mailto:brat" METHOD="POST"> --> 
<FORM ACTION="http://yourmachine:urport/cgi-bin/testcgi.cgi" METHOD="POST"> 
<H1>Test CGI Form</H1> 
 <P ALIGN=JUSTIFY> 
This form is used as a front-end to testcgi. 
<HR> 
<INPUT TYPE="text" NAME="TestCgi-ScriptName" VALUE=""> 
Script Name<BR> 
<INPUT TYPE="text" NAME="TestCgi-Method" VALUE="POST"> 
Method<BR> 
<!-- Insert the form to be tested minus the FORM header and trailer --> 
<!-- and the Submit and clear buttons --> 
<HR> 
<INPUT TYPE="submit" VALUE="Submit Form"> 
Submit Button<BR> 
<INPUT TYPE="reset" VALUE="Clear Values"> 
Reset Button. 
 <P> 
</FORM>












































Using Server-Side Includes


A simple but powerful technique




Matt Kruse


Matt is a computer-science student and a WWW consultant. He can be reached at
mkruse@saunix.sau.edu.


If you've done any browsing at all on the World Wide Web, you've probably seen
pages that contain information or images generated on the fly that change each
time you load the page. Whether it's a counter to tell you that you're visitor
#1000, a greeting that says "Welcome, visitor from yoursite.com," or an
up-to-the-minute schedule for a tourist attraction, it can be created using
server-side includes.
Server-side includes (SSIs) are simply commands embedded inside regular HTML
documents that make your page do something different each time it is loaded.
By using SSIs, you can create dynamic pages that will make your site stand out
from the WWW pack. In this article, I'll describe the format of these
commands, show how they work, and discuss how you can write programs to work
with your Web pages.


The WWW Server


Most WWW servers have a fairly simple job: When a request for a page comes in,
they load the correct file from disk and send it out to the right place. If
you put SSI commands inside your HTML document, however, things must happen a
little differently. First, you need to tell the server to parse the document
for embedded SSI commands, which are to be executed before the document is
sent. By default, most servers will not parse documents at all, so any SSIs
will simply be ignored. If you haven't put any commands inside your documents,
there's no reason for the server to search every single file before sending it
to the client. Likewise, if you use SSIs in only a few pages, the server
needn't parse those pages that don't have them. So, most servers have a way to
distinguish which pages should be parsed and which should not.
Usually, the server decides what to parse based on the filename. If the file
ends in .html, it is delivered unparsed; if it ends in .shtml, it is searched
for SSI commands. Of course, you can change this convention. For those servers
that can handle the extra load of parsing all documents, you may wish to
enable SSI parsing for all .html files.
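With the NCSA server, for instance, parsing is typically enabled with
configuration directives along these lines (a sketch only; check your
server's documentation for the exact file names and syntax):

# In srm.conf: treat .shtml files as server-parsed documents
AddType text/x-server-parsed-html .shtml

# In access.conf: allow includes (IncludesNOEXEC permits includes
# but forbids the #exec command)
Options Includes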
Some servers (the CERN server, for instance) do not yet support SSIs. Also,
the commands and formats are not a standard and may vary from one server to
another. The examples I present here work correctly with both the NCSA httpd
and Netscape servers, and should also work with other servers, or at least be
close enough to modify easily.


SSI Format


What do SSI commands look like, and how does the server handle a command once
it is found? All SSI commands are conveniently placed inside standard HTML
comments so that the commands will not be displayed if a page ever gets to
users without being parsed. In HTML, any text between "<!--" and "-->" is
considered a comment. Consequently, the general format for an SSI command
inside an HTML document is as in Example 1(a). Several different commands are
available, each taking its own arguments using either one or two tags. 
The #exec command executes an external program and includes its output in your
document. It takes one of two options: cmd, which executes a single command
using /bin/sh; or cgi, which gives a virtual path to a program to be executed.
For example, to call a program in the same directory as the HTML page that the
command is located in, you would use the command in Example 1(b).


Writing the Program


Once you understand how commands are placed into HTML documents, the next step
is, of course, to write a program to be executed. Your program can be written
in any language that is executable on the server. This means C, Pascal, awk,
Fortran, and so on, although the most common choice is Perl. Most scripts are
written in Perl because it is easy to write, has great text-manipulation
capabilities, and doesn't require lengthy compiles. Also, many Perl scripts
and libraries are available on the Internet for downloading and use. All the
examples in this article are in Perl.
The interaction between the program and the HTML page is simple. Whatever
information is sent to STDOUT from your script is inserted into the page and
treated just as if it were there to begin with. On some servers, anything sent
to STDERR will get appended to the error log, which is helpful in debugging.
Every program used in an SSI must return two things: a MIME type identifier
and any output to be included in the document. As you can see from Example
1(c), the first line output by the program is the MIME type; this tells the
browser how to handle the data you'll be sending. In the example, this is
"Content-type: text/html", which says that the information that follows is of
type text and subtype html. The two newlines at the end of the MIME type
header are always required. (One of the most common mistakes that new
programmers make is not returning the MIME header in the correct format,
including the empty line, which results in an error.)
After you've told the browser how to treat your data, any output from your
program will be inserted into the document. The server doesn't check whether
you've returned anything that it isn't ready to handle--for example, an image
or other binary data--so be careful about what you send. The third line in
Example 1(c) simply places the text "Hello, World" into your document wherever
the SSI call is located. Example 2(a) is the original HTML document. Once the
server parses the document and executes "script.cgi", the actual HTML code is
sent to the user's browser; see Example 2(b).
If users view the source code for the just-received HTML page, they will see
the processed HTML, not the source that contains the SSI command. Users don't
even know that a command was executed. This example of a script that inserts
the same text each time the SSI command is invoked has no practical value.
Instead, SSIs are usually used to return output that changes with each load of
the page. A better example, one that represents perhaps the most common use
for SSIs, is a "counter" script. You've probably encountered this many times:
a Web page with a counter that keeps track of how many times the page has been
loaded and displays the current count to each visitor.
Example 3(a) presents a simple counter script that uses a file to store the
current number of accesses to the page. Each time the script is run inside an
SSI, it reads the current count from the file, increments the count, writes
the new value back to the file, and prints the value to STDOUT. The script
also uses file locking, which is vital in a script such as this. Without file
locking, two people accessing your page at the same time would both cause the
program to read and write to the file at the same time, possibly causing the
count to be lost.
After printing out the MIME header in line 2 of the script, the file
containing the counter value is opened with both read and write access. The
program then gets an exclusive lock on the file using flock(), reads the first
line of the file into the $count variable, resets the file pointer to the
beginning of the file, and increments the variable. The program then prints
the variable twice--once to STDOUT to insert it into the HTML document, and
once to the file to store the new value. The exclusive lock is then given up,
the file is closed, and the program is finished.
Example 3(b) includes a counter in an HTML file. When the count is returned
from the script, the count is placed between the <B> and </B> tags, making the
count appear bold in the browser. Using this script inside a WWW document
will let you keep an ongoing page-access counter.


Improving the Counter


Although the previous counter script will do the job just fine, some
improvements can make it more flexible. Rather than simply keeping track of
the count for a single page, it would be nice to modify the script to keep
track of counts for any page that calls it. You could then use a counter on
any page on a WWW site by inserting the same command and have it automatically
keep separate counts for each page.
You can accomplish this with only a few modifications to the original program.
One environment variable available to scripts running under the NCSA (and some
other) servers is DOCUMENT_URI, which contains the virtual path and filename
of the HTML page that the script is being called from. You can then use this
to detect which page is making the counter request and load the appropriate
file.
Example 4 is a modified version of the counter script that incorporates these
changes. First, it replaces all "/" characters from the calling URL with
underscores to create a valid filename. If the resulting filename exists, it
is used as the counter file, just as before. If it doesn't exist, the file is
created and 0 is inserted as the current count. This type of counter script
would be especially helpful for a commercial Web-presence provider who wants
an easy way for customers to add counters.


Other SSI Commands



While the #exec command may be the most useful and flexible, other SSI
commands are useful in HTML documents. Table 1 presents some additional SSI
commands found in the NCSA server. 
#include inserts a file specified by the virtual tag into the HTML document.
This is a good way to provide, say, a standard footer at the bottom of each of
your pages. You can then just insert the include command in each page, and
have only one file. If you want to update your footer with more information or
change it in any way, you only have to change a single file. You can also use
the "file" tag if you are referencing a file in the same directory as the HTML
document.
#echo inserts the value of any one of the environment variables currently
available from the server. You can use several variables, including the remote
user's IP address, information about the server, the user's name if it is
available, and so on.
#fsize prints the size of the file specified with the tag. The "file" tag is
also valid for the command, as with #include.
#flastmod displays the last modification date of the file given in the tag.
The "file" tag is also available.
For details on more commands, refer to the documentation for the NCSA server
or the server you are using. Another good place to look is
http://hoohoo.ncsa.uiuc.edu/docs/tutorials/includes.html.
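As a sketch of how these commands combine, a standard footer pulled in with
#include might itself use #echo variables (DOCUMENT_NAME, LAST_MODIFIED, and
REMOTE_ADDR are standard NCSA SSI variables; the file name footer.shtml is
hypothetical):

<!-- In each page: -->
<!--#include virtual="/footer.shtml"-->

<!-- footer.shtml: -->
<HR>
Page <!--#echo var="DOCUMENT_NAME"-->, last modified
<!--#echo var="LAST_MODIFIED"-->.
You are visiting from <!--#echo var="REMOTE_ADDR"-->.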


Conclusion


This access-counter example is a useful script, but it is only one of many
possible uses for SSIs. For example, SSIs can display a random image, present
a welcome message to a user, update a private log of access statistics, output
different text depending on the user's browser, or automatically redirect the
user to a different page. SSIs can make your Web pages more dynamic and
interesting, and those who visit your site will have another reason to come
back.
Example 1: (a) General format for an SSI command; (b) an instance of the #exec
command; (c) script.cgi.
(a)
<!--#command tag1="value1"
 tag2="value2"-->

(b)
<!--#exec cgi="script.cgi"-->

(c)
#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "Hello, World";
Example 2: (a) HTML document that invokes a script via an SSI; (b) resulting
HTML stream that is sent to the client.
(a)
<html>
<h1>Sample output</h1>
<p>Here is the script output:
<!--#exec cgi="script.cgi"-->
</p>
</html>

(b)
<html>
<h1>Sample output</h1>
<p>Here is the script output:
Hello, World</p>
</html>
Example 3: (a) A simple counter script; (b) HTML page that invokes the
counter.
(a)
#!/usr/bin/perl
print "Content-type: text/html\n\n";
open(COUNTER,"+< counter.txt");
flock(COUNTER,2);
$count=<COUNTER>;
seek(COUNTER,0,0);
$count++;
print "$count";
print COUNTER "$count";
flock(COUNTER,8);
close(COUNTER);

(b)
<html>
<h1>Welcome</h1>
You are visitor #<b><!--#exec cgi="counter.cgi"--></b>.
<p>
</html>
Example 4: Improved version of the counter script.
#!/usr/bin/perl
print "Content-type: text/html\n\n";
($PAGE = $ENV{'DOCUMENT_URI'}) =~
 s#/#_#g;
unless (-e $PAGE) {
 open(NEW,"> $PAGE");
 print NEW "0";
 print "0";
 close(NEW);
 exit(0);
 }
open(COUNTER,"+< $PAGE");
flock(COUNTER,2);
$count=<COUNTER>;
seek(COUNTER,0,0);
$count++; 
print "$count"; 

print COUNTER "$count";
flock(COUNTER,8);
close(COUNTER);
Table 1: Additional SSI commands.
Command Format
include file <!--#include virtual="path/to/file"-->
echo <!--#echo var="Environment Variable"-->
file size <!--#fsize virtual="path/to/file"-->
last modified <!--#flastmod virtual="path/to/file"-->






















































Java Command-Line Arguments


A Java package for parsing command-line parameters




Greg White


Greg, a programmer at I/NET, is currently working on the Web Server/400
product, a commercially available Web server for AS/400 computers. He can be
contacted at gwhite@inetmi.com.


Users of the first release of the Web server my company developed made one
request more often than any other: "Please include printed documentation." In
the first release, documentation consisted of more than 160 HTML files,
viewable from any browser supporting HTML 2.0. While users could load
documents into a browser and print them, all links ended up getting lost--not
to mention that the printing sequence wasn't clear. For our part, we also
needed a formatted text version of some of the documentation for reference as
comments within configuration files.
While trying to solve this problem, I ran across RtfToHtml, a program that
converts rich text format (RTF) documents to HTML (see
ftp://ftp.cray.com/src/WWWstuff/RTF/rtftohtml_overview.html). Unfortunately,
our source documentation was already in HTML, so we would have had to convert
it to RTF and manually insert the styles that RtfToHtml can handle. Instead, I
wrote a program that reads in HTML and writes out RTF. By the time I was done,
I had written a system that has, at its core, a Java application that converts
an HTML file to an RTF file (with links preserved as bookmarks) or text file.
The name of the application (and its package of classes) is HtmlXlate.
In this article, I'll introduce CmdLnArg, a package of Java classes which
parses the command-line parameters for the application. In future articles,
I'll detail the process of converting HTML to RTF and formatting text. This
article and the accompanying code are based on the 1.0 beta of the Java
development kit (JDK).


The Java Programming Language


Developed by Sun Microsystems, Java is an object-oriented language similar to
C++. The language has garnered attention because its portability, security,
and support for distributed programming make it useful for Internet
programming. To find out more about Java, refer to the articles "Java and
Internet Programming," by Arthur van Hoff (DDJ, August 1995), "Net Gets a Java
Buzz," by Ray Valdés (Dr. Dobb's Developer Update, August 1995), and
"Programming HotJava Applets," by John Rodley (Dr. Dobb's Web Sourcebook,
November/December 1995). You can also find Java-related information at
http://java.sun.com. 
There are differences between Java and C++ in both the language and the
environment. Java does not include familiar C++ features such as pointers,
preprocessing, multiple inheritance, goto, and automatic coercion. While I
might have liked to keep some of these features, Sun makes a case for their
elimination, and the net effect is good. Sun has also added features, such as
automatic garbage collection.
Most of the interest in Java currently centers around "applets." Applets are
like programs, but are loaded and run by a Java-enabled browser such as
HotJava or Netscape Navigator 2.0 beta. A normal HTML document can contain a
reference to an applet that causes the applet to be loaded and run (similar to
the way images are referenced). Applets are given an area of the screen to
work with and tend (but aren't required) to be oriented toward graphics and GUIs.
Java applications, like applets, are written in the Java language, but are
loaded and run directly by the Java interpreter rather than by a browser.
These are usually command-line programs, but don't have to be. (In the Alpha3
release of the JDK, the HotJava browser was implemented as a Java application.
At this writing, there is no HotJava browser available for the Beta 1.0
release of the JDK.) I chose to make HtmlXlate an application instead of an
applet because I didn't need to display graphics. It is also useful to be able
to call the application from within other command-line programs (a make
program, for example).
When developing in Java, you must compile the source, but you don't have to
explicitly link it. It will be dynamically linked and loaded at run time by
the Java interpreter. The interpreter also checks the code it is loading for
errors and virus-like behavior. The code can then either be interpreted or
converted to the machine code for the current platform. This delayed
(optional) conversion to machine code allows any machine and operating system
with a Java interpreter to run the application without recompiling.
A Java "package" provides a way to collect a related group of classes. Within
the package, each class can be public or not. Only classes declared public can
be used outside of a package. When I originally wrote HtmlXlate, I put the
classes that process command-line arguments into the same package as the rest
of the code. Afterwards, I put those classes into a separate package
(CmdLnArg) for easier maintenance and reuse. 
Sun defines conventions for command-line arguments in the document
http://java.sun.com/progGuide/java/cmdLineArgs/conventions.html. The
document describes "word" arguments, "arguments that require arguments," and
"flags." 
Word arguments are full words that turn an option on or off; for example, a
typical word argument is -verbose. 
Arguments that require arguments consist of a keyword followed by additional
information (such as what format to output). In HtmlXlate, for example, the
keyword is -to and the argument is RtfWriter, TextWriter, or CommentWriter.
Flags are like word arguments except that they have only one letter and can be
combined (for example, -vg would be two separate flags--v and g). I didn't use
flags in HtmlXlate.
HtmlXlate requires support for word arguments and for arguments that require
arguments. All arguments should have a leading hyphen. If more than one flag
argument is specified on the command line, they can be combined behind one
hyphen. 


The Test Program


At this point, I'm going to assume that you have the beta version of the JDK
installed. If not, you can get it at http://java.sun.com/. Although I've
tested this code only with the Windows 95 JDK, it should work with other
implementations as well.
To start, pick a directory that will contain your Java-related files. You will
need to modify your CLASSPATH environment variable to include this drive and
directory. Below the directory you just created, create the subdirectory
CmdLnArg (the same name as the package), which will contain the Java source
files presented in Listings One through Four. 
The directory in which you place the source file must have the same name as
the package, or the Java compiler won't know where to look for the classes
when you use them from outside the package. (At first, I gave them different
names, which resulted in strange compiler errors.)
To compile the source, enter the commands in Example 1. (Note that the same
case must be used for the filenames entered on the command line.) 
Of the classes being compiled, all but the last (Go) are part of the CmdLnArg
package. The Go class is a test application that uses the package and prints
the results to STDOUT. To test the program, type java Go -in file.in -out
file.out. This starts the Java interpreter (java.exe) and tells it to run the
Go application (see Listing Five). The rest of the parameters are passed into
the Go class for interpretation. To see all the parameters Go supports, run
the application without parameters: java Go. (Go is not a reserved name, just
the name of the class I picked to include a static main method.)


Using the Package


To use the package in your applications, follow the model provided in the
processArgs method of the Go application. All applications must follow the
same basic steps:
1. Create a new instance of ArgProcessor and pass it the arguments the
application received from the interpreter; see Example 2(a).
2. Create and add ArgReqArgType and WordArgType instances for each argument
you want to process; see Examples 2(b) and 2(c). Example 2(c) uses a shortcut:
ArgReqArgType takes a StringBuffer as input during construction. Because a
StringBuffer is a modifiable object, the original StringBuffer (inputName) can
be modified in step 3. This means that your code does not need to maintain a
reference to the ArgReqArgType object, but can just check the StringBuffer
object for the results later.
3. Call the ArgProcessor.process method to update the ArgReqArgType and
WordArgType objects added in step 2; see Example 2(d).
4. Interpret the results. For both WordArgType and ArgReqArgType, the result
is stored in a public member named argDest. For WordArgType, argDest is a
Boolean; for ArgReqArgType, it is a StringBuffer. Note that for
ArgReqArgTypes, you can also reference the StringBuffer used to create the
ArgReqArgType (it is the same object that argDest references); see Examples
2(e) and 2(f).


How the Package Works



The package can be divided into the ArgProcessor class (Listing One) and the
argument type classes that it processes (Listings Two through Four).
The ArgProcessor class maintains a Vector of argument types (vArgs) passed in
through the add method. Vector is a class in the package java.util and is
described as a "growable array." When the process method is called, each
argument passed into the application is compared to the argument types stored
in vArgs (via a call to getArg). If an argument type is found, it will be set
through a call to its set method.
The getArg method makes use of the Enumeration interface, also in the
java.util package. Java interfaces are meant to provide the benefits of
multiple inheritance without the complications. An interface is a type that
specifies methods and constants, but does not specify how the methods will be
implemented. The Enumeration interface, for example, specifies the
hasMoreElements and nextElement methods. It is up to the class implementing
the interface to implement those methods (in this case, VectorEnumerator,
found in Vector.java, which is shipped with the JDK).
The use of the Enumeration interface and Vector class illustrates these
features of the Java API, but I probably should have used the Dictionary class
instead. This would have made my code simpler and probably faster; I know more
now about the Java API than I did when I started.
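The Dictionary-based lookup alluded to here might look like the following
sketch. Hashtable is the JDK's concrete Dictionary subclass; the class and
method names are illustrative only and are not part of the CmdLnArg package,
and the code is written against a modern JDK for brevity.

```java
import java.util.Hashtable;

public class ArgTable {
    // Build a keyed table of argument IDs. A Hashtable lookup replaces
    // the linear Vector scan in getArg with a single get() call.
    static Hashtable<String, String> buildTable() {
        Hashtable<String, String> types = new Hashtable<String, String>();
        types.put("-verbose", "WordArgType");
        types.put("-in", "ArgReqArgType");
        return types;
    }
    public static void main(String[] args) {
        Hashtable<String, String> types = buildTable();
        // One keyed lookup instead of enumerating every argument type.
        System.out.println(types.get("-in") != null
            ? "matched -in" : "no match");
    }
}
```

One difference to note: Hashtable keys are case-sensitive, while getArg
matches with equalsIgnoreCase, so argument IDs would need normalizing (for
example, lowercasing) before insertion and lookup.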
GenericArgType (Listing Two) is an abstract class that provides little
implementation, but defines a method that ArgReqArgType and WordArgType must
implement. The abstract keyword in the class declaration means that it is not
possible to instantiate an object of this type, only a nonabstract subclass.
(Even though a GenericArgType object cannot be instantiated, variables can be
declared to be GenericArgType if they reference objects that subclass
GenericArgType.)
The constructor in WordArgType (Listing Three) shows how a superclass
constructor can be called. The super(argID) line is a call to GenericArgType's
constructor. The set method will toggle the Boolean argDest the first time it
is called and do nothing if called after that. This stops argDest from
flipping back and forth if the user inadvertently specifies the argument more
than once on the command line.
The constructor for ArgReqArgType (Listing Four) is not much different than
the constructor for WordArgType. The set method is more complicated because it
needs to use the command-line argument after the one containing the leading
hyphen. If the user has specified an argument requiring an argument and the
required argument is missing, an exception will be thrown. Otherwise, argDest
will be set, and the index will be incremented by one.


Conclusion


One enhancement to the package would be the addition of support for flag
arguments. This would not be too difficult and would probably require another
Vector class in ArgProcessor to keep track of the flag arguments separately.
This would allow a check first for word arguments and arguments that require
arguments. If that check failed, then a check for flag arguments would be
made.
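That flag support might start with a helper along these lines. The helper is
hypothetical (the CmdLnArg package in the listings does not include it), and
the sketch uses a modern JDK for brevity.

```java
public class FlagSplit {
    // Split a combined flag argument such as "-vg" into its
    // single-letter flags ('v', 'g').
    static char[] splitFlags(String arg) {
        // Drop the leading hyphen; each remaining character is one flag.
        return arg.substring(1).toCharArray();
    }
    public static void main(String[] args) {
        for (char c : splitFlags("-vg")) {
            System.out.println("flag: " + c);
        }
    }
}
```

Each resulting letter would then be checked against the separate Vector of
flag-argument types proposed above, after the word-argument check fails.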
Another logical enhancement for this package might be to warn the user when
the arguments are not valid. This could be done in the process method of the
ArgProcessor class by adding an else clause at the bottom of the while loop.
I encourage you to download the JDK and work through a few small example
programs on your own. Java is relatively easy to pick up, especially if you
are familiar with C++. Java also comes with an API that includes enough
functionality to let you concentrate on the interesting parts of your program.
You may also want to watch my page at http://www.inetmi.com/~gwhi since I plan
on putting up interesting Java code and information as time permits. 
Example 1: Enter these commands to compile Java source.
javac GenericArgType.java
javac WordArgType.java
javac ArgReqArgType.java
javac ArgProcessor.java
javac Go.java
Example 2: Using the CmdLnArg package.
(a)
ArgProcessor processor = new ArgProcessor(args);

(b)
WordArgType verboseArg = new WordArgType("-verbose", false);
processor.add(verboseArg);

(c)
StringBuffer inputName = new StringBuffer("");
processor.add(new ArgReqArgType("-in", inputName));

(d)
processor.process();

(e)
boolean bVerbose = verboseArg.argDest;

(f)
String s = inputName.toString();

Listing One
// -- Declare this source to be a part of the CmdLnArg package. 
package CmdLnArg;
// -- Make use of shipped package for Vector, String, StringBuffer classes
import java.util.*;
public class ArgProcessor
 {
 // -- Args as passed into application
 String args[] = null;
 // -- All arg types we will process. Must be derived from GenericArgType.
 Vector vArgs = new Vector();
 // -- Constructor. Hang on to the arguments for when we need to
 // -- process them later.
 public ArgProcessor
 (
 String args[]
 )
 {

 this.args = args;
 } // -- end constructor
 // -- Adds one argument type for when we process
 public void add
 (
 GenericArgType arg
 )
 {
 vArgs.addElement(arg);
 } // -- end add
 // -- Process the args that were passed into the application and store
 // -- the results back in the argument types passed into the add method
 public void process
 (
 )
 throws
 Exception
 {
 int iArgIndex = 0;
 // -- Loop through all of the args passed into the application
 while (iArgIndex < args.length)
 {
 String thisArg = args[iArgIndex++];
 GenericArgType arg = getArg(thisArg);
 // -- Set the argument and (possibly) increase the arg index.
 if (arg != null)
 iArgIndex = arg.set(args, iArgIndex);
 } // -- end while
 } // -- end process
 // -- Look for an argument type that matches the passed-in target
 protected GenericArgType getArg
 (
 String argTarget
 )
 {
 Enumeration allArgs = vArgs.elements();
 GenericArgType foundArg = null;
 // -- Loop through each arg type until we find a match or there are
 // -- no more arg types.
 while (foundArg==null && allArgs.hasMoreElements())
 {
 GenericArgType testArg = (GenericArgType)allArgs.nextElement();
 if (testArg.argID.equalsIgnoreCase(argTarget))
 foundArg = testArg;
 } // -- end while
 return foundArg;
 } // -- end getArg
 } // -- end class ArgProcessor

Listing Two
// -- Declare this source to be a part of the CmdLnArg package.
package CmdLnArg;
// -- Make use of shipped packages for Vector, String, StringBuffer classes
import java.util.*;
public abstract class GenericArgType
 {
 public String argID = null;
 protected boolean wasSet;
 public GenericArgType

 (
 String argID
 )
 {
 this.argID = argID;
 wasSet = false;
 } // -- end constructor
 public boolean wasSet
 (
 )
 {
 return wasSet;
 } // -- end wasSet
 public abstract int set
 (
 String args[],
 int iThisIndex
 )
 throws
 Exception
 ;
 } // -- end GenericArgType

Listing Three
// -- Declare this source to be a part of the CmdLnArg package.
package CmdLnArg;
// -- Make use of shipped packages for Vector, String, StringBuffer classes
import java.util.*;
public class WordArgType extends GenericArgType
 {
 public boolean argDest;
 public WordArgType
 (
 String argID,
 boolean argDest
 )
 {
 super(argID);
 this.argDest = argDest;
 } // -- end constructor
 public int set
 (
 String args[],
 int iThisIndex
 )
 {
 if (!wasSet)
 {
 argDest = !argDest;
 wasSet = true;
 } // -- end if
 return iThisIndex;
 } // -- end set
 } // -- end class WordArgType

Listing Four
// -- Declare this source to be a part of the CmdLnArg package.
package CmdLnArg;
// -- Make use of shipped packages for Vector, String, StringBuffer classes

import java.util.*;
public class ArgReqArgType extends GenericArgType
 {
 public StringBuffer argDest = null;
 public ArgReqArgType
 (
 String argID,
 StringBuffer argDest
 )
 {
 super(argID);
 this.argDest = argDest;
 } // -- end constructor
 public int set
 (
 String args[],
 int iThisIndex
 )
 throws
 Exception
 {
 if (args.length <= iThisIndex)
 throw new Exception("Command line param "
 + argID
 + " must be followed by a string");
 // -- We can't directly change the string, so truncate it and add
 // -- what we want.
 argDest.setLength(0);
 argDest.append(args[iThisIndex]);
 wasSet = true;
 return iThisIndex + 1;
 } // -- end set
 } // -- end class ArgReqArgType

Listing Five
// -- Declare this source to be a part of the CmdLnArg package.
import CmdLnArg.*;
// -- Make use of shipped packages for Vector, String, StringBuffer,
// -- System classes
import java.util.*;
import java.io.*;
class Go
 {
 // -- Set by cmd line arguments
 static boolean bVerbose = false;
 static StringBuffer inputName = new StringBuffer("");
 static StringBuffer outputName = new StringBuffer("");
 static StringBuffer to = new StringBuffer("TextWriter");
 static StringBuffer sWidth = new StringBuffer("");
 static StringBuffer sTabSize = new StringBuffer("");
 static Vector vArgs = new Vector();
 public static void main
 (
 String args[]
 )
 {
 int iReturn = 0;
 try {
 processArgs(args);

 } // -- end try
 catch (Exception e)
 {
 e.printStackTrace();
 iReturn = 1;
 } // -- end catch
 System.exit(iReturn);
 } // -- end main
 static void processArgs
 (
 String args[]
 )
 throws
 Exception
 {
 ArgProcessor processor = new ArgProcessor(args);
 add(processor, new ArgReqArgType("-in", inputName));
 add(processor, new ArgReqArgType("-out", outputName));
 add(processor, new ArgReqArgType("-to", to));
 add(processor, new ArgReqArgType("-width", sWidth));
 add(processor, new ArgReqArgType("-tabsize", sTabSize));
 WordArgType verboseArg = new WordArgType("-verbose", false);
 add(processor, verboseArg);
 WordArgType includeRefs = new WordArgType("-NoRefs", true);
 add(processor, includeRefs);
 WordArgType refSections = new WordArgType("-NoSections", true);
 add(processor, refSections);
 WordArgType refPage = new WordArgType("-NoPages", true);
 add(processor, refPage);
 WordArgType horzRule = new WordArgType("-HR", true);
 add(processor, horzRule);
 processor.process();
 checkSomethingSet();
 // -- Print the results back to the user.
 System.out.println("Input name:" + inputName);
 System.out.println("Output name:" + outputName);
 System.out.println("to:" + to);
 System.out.println("width:" + sWidth);
 System.out.println("tabsize:" + sTabSize);
 if (verboseArg.wasSet())
 System.out.print("Verbose was set to ");
 else
 System.out.print("Verbose left at the default of ");
 System.out.println(String.valueOf(verboseArg.argDest));
 if (includeRefs.wasSet())
 System.out.print("IncludeRefs was set to ");
 else
 System.out.print("IncludeRefs left at the default of ");
 System.out.println(String.valueOf(includeRefs.argDest));
 if (refSections.wasSet())
 System.out.print("RefSections was set to ");
 else
 System.out.print("RefSections left at the default of ");
 System.out.println(String.valueOf(refSections.argDest));
 if (refPage.wasSet())
 System.out.print("RefPages was set to ");
 else
 System.out.print("RefPages left at the default of ");
 System.out.println(String.valueOf(refPage.argDest));

 if (horzRule.wasSet())
 System.out.print("HR was set to ");
 else
 System.out.print("HR left at the default of ");
 System.out.println(String.valueOf(horzRule.argDest));
 } // -- end processArgs
 // -- Adds an argument type to the processor and to our vector so we can
 // -- later make sure something was set.
 protected static void add
 (
 ArgProcessor processor,
 GenericArgType argType
 )
 {
 processor.add(argType);
 vArgs.addElement(argType);
 } // -- end add
 // -- Prints usage info and throws an exception if nothing is set
 protected static void checkSomethingSet
 (
 )
 throws
 Exception
 {
 Enumeration allArgs = vArgs.elements();
 boolean bAnythingSet = false;
 while (bAnythingSet==false && allArgs.hasMoreElements())
 {
 GenericArgType thisArg = (GenericArgType)allArgs.nextElement();
 if (thisArg.wasSet())
 bAnythingSet = true;
 } // -- end while
 if (!bAnythingSet)
 {
 System.out.println("Valid arguments requiring arguments:");
 System.out.println(" -in");
 System.out.println(" -out");
 System.out.println(" -to");
 System.out.println(" -width");
 System.out.println(" -tabsize");
 System.out.println();
 System.out.println("Valid word arguments:");
 System.out.println(" -verbose");
 System.out.println(" -NoRefs");
 System.out.println(" -NoSections");
 System.out.println(" -NoPages");
 System.out.println(" -HR");
 throw new Exception("No valid command-line arguments found");
 } // -- end if
 } // -- end checkSomethingSet
 } // -- end class Go








































































Implementing Multilevel Undo/Redo


Never having to say you're sorry




Jim Beveridge


Jim, a software developer at Turning Point Software, can be contacted at
jimb@turningpoint.com.


A must-have feature in document-based applications is undo and redo. Most
Windows-based word processors, for instance, will faithfully undo every
command you have entered for the past 20 minutes, then perfectly redo all of
them. The benefit to users is enormous. No matter what users do, there is a
safety net to fall back on. Beginners can forge ahead without fear, while
advanced users can play "what-if" and test the result of various changes.
In this article, I'll present a generalized undo/redo mechanism with a history
length limited only by available memory. This undo/redo mechanism will be
built on top of Spiral, a sample application written under Microsoft Visual
C++ and the Microsoft Foundation Class library (MFC). Note that the code
listings cannot be compiled by themselves; they need additional files, which
are provided electronically; see "Availability," page 3. I've tested the
Spiral code with Visual C++ 1.52 and Visual C++ 2.2, in both 16-bit and 32-bit
builds.
The executable version of Spiral (also available electronically) is a 16-bit
application.
To use Spiral, click with the left mouse button anywhere in the window, then
drag out to define a circle. Let up, and the circle will be filled with a
spiral determined by the span angle in the menu option Spiral:Angle. Changing
the span angle will give radically different results. You can also click on
the line of an existing spiral to select it. You can then move the spiral or
delete it by selecting Clear from the Edit menu. After any of these
operations, you can select Undo or Redo from the Edit menu.


To Undo or Not to Undo 


One of the first questions you must ask before implementing undo is, "What
kinds of things can be undone?" It doesn't make sense, for instance, to undo
commands such as Open or Print. It is necessary to identify every action that
the user can make in an application before answering this question. Commands
in Spiral can be grouped into a variety of categories:
File operations (File New, Open, Save, Save As, Close, Print, Print Setup).
View and window operations (View Toolbar, Status Bar; Window New, Cascade,
Tile).
Application state (preferences, options). 
Document actions (Edit, Clear, Clear All, draw spiral, click to select, move
the selection).
File and Print operations are clearly not undoable. Although undoing changes
to the view or application state might be desirable, most applications only
implement undo for actions that change the document. To keep the UI
consistent, many applications restore the application state as a side-effect
of undoing document changes. For example, changing the span angle of the
pattern in Spiral affects the internal state, but it does not make a visible
change to the document. Therefore, a decision must be made to ignore the span
angle during undo, or to restore span angle changes as part of an undo to the
document. I take the former approach in this article.
Document actions are clearly eligible for undo processing, but even here a
careful check is needed. Making a selection does not change the document, so
Undo Selection should never appear on the user's menu. Moving a selection does
change the document, so it must be undoable. To decide what to do, ask
yourself, "What would I, as a user, want to happen?" In the case of the move,
it would probably be best to restore the selection's location, then select it
again.
The decision must also be made about undo's scope. Undo should never extend
across documents. Each document must have its own undo history, otherwise it
would be necessary to undo any window-focus commands so that the proper window
would have the focus for each undo task. For the same reason, commands such as
Cascade Windows should also not be part of the normal undo mechanism.
Other decisions are not so obvious. If the user undoes a Cut, should the
clipboard be restored to its previous state, if possible? Similarly, if the
user saves a file using an existing name, should undo restore the previous
file? No application I've seen implements either of these, but from the user's
perspective it may be desirable to implement this behavior.
Avoid saving the undo history, particularly in your master save file.
Persistent undo histories tend to be confusing to users. The undo history can
also turn into a maintenance headache when the application is revised.


Assigning Tasks


The approach I took to creating a set of C++ classes to implement undo/redo is
similar to building construction. A construction crew has specialized workers,
each of whom performs one particular task. Each worker is told what to do by a
foreman. The foreman may not know how to do any of these tasks, but he does
know how to manage the construction effort.
For undo/redo, the job is to put together a document instead of a building.
Workers will be needed to move selections, create spirals, clear spirals, and
so on. Each task will be represented by a class derived from a base class
CTask. The CTask objects will be worker objects that perform certain tasks.
Each worker class will also know how to undo its task.
The class CTask (see Listing One), is defined as an abstract base class: It
represents what every task should look like but is not itself a real task.
Each class derived from CTask has a constructor to describe it and a
destructor to clean up any private data. Every time the user does something
that could be undone, a new instance of a class derived from CTask will be
created. For example, suppose that the user draws a new spiral. The
CCreateSpiralTask class (see Listing Two) knows how to add a new spiral to the
document. When the user releases the button and the spiral is finished, a new
task is created. The only thing interesting about a new spiral is the spiral's
data, so this information is passed to the task in the constructor:
CCreateSpiralTask* pCreateSpiralTask = new CCreateSpiralTask(pSpiral);
The newly created task stores this information in private member data. Each
instance of the class will have its own data. Note that CCreateSpiralTask,
like all tasks, has no default constructor. It is not possible to create a
task without knowing all the details of the task. 
Who should start this task? Only the foreman can do that. The foreman manages
all tasks. Listing Three presents the CForeman class. The foreman is
responsible for executing tasks and keeping track of the task history. The
foreman is the heart of the undo/redo mechanism. To request that a task be
executed, SubmitTask() in CForeman is called:
m_Foreman.SubmitTask(pCreateSpiralTask);
To execute the submitted task, CForeman calls the task's Do() member function.
CForeman should be the only object that calls any task's member functions. The
foreman has sole responsibility for keeping the undo/redo history consistent
with the document. If any changes are made to the document without the foreman
having a corresponding task, then the tasks in the history can become confused
and crash the application. In SubmitTask(), CForeman immediately executes the
task by calling task->Do(m_pSpiralDoc). If the task succeeds,
CForeman adds the task to the end of its history list. The foreman implements
the command history simply by keeping track of which tasks were called. 


Managing the Command History


The command history is maintained as a linked list of tasks that is managed
somewhat like a stack. The MFC collection class CObList with the template
CTypedPtrList quickly provides the needed implementation. (I've "faked" the
template CTypedPtrList in the Spiral source for Visual C++ 1.5x.) CForeman
also keeps a pointer to the last task executed. Undo moves the pointer
backwards in the list, and redo moves the pointer forward; see Listing Four. 
Let's take an example in which the user draws two spirals, makes a selection,
then selects Clear from the Edit menu. At this point the task list looks like
this (remember, making a selection is not undoable by itself):
1. CreateSpiralTask #1.
2. CreateSpiralTask #2.
3. ClearSpiralTask.
The last task executed is task 3. Now the user selects Undo. CForeman handles
the undo by calling the Undo() member function of the last executed task with
a pointer to the current document: pTask->Undo(m_pSpiralDoc); // pTask points
at task #3.
Now that task 3 has been undone, the last task executed is task 2. The user
selects Undo again. CForeman calls the Undo() member function of task 2. The
last-task-executed pointer is moved back to task 1:
pTask->Undo(m_pSpiralDoc); // pTask points at task #2.

Next the user hits Redo. CForeman calls the Do() member function of task 2
again and advances the pointer for last task executed back to task 2:
pTask->Do(m_pSpiralDoc); // pTask points at task #2.
Finally, the user creates another spiral. CForeman handles the new task by
throwing away every task that is waiting to be redone. In this case, only task
3 is thrown away. The task is removed from the linked list of tasks and
discarded with the C++ delete operator. Because CTask's destructor is
virtual, the discarded CClearSpiralTask object gets the chance to clean up
properly.
The undo/redo mechanism is now basically functioning. What's left is to write
a lot more tasks and to properly update the user interface.


Inside a Task


The class for each task must keep track of enough data to be able to complete
the task without querying the current view and, preferably, without
querying the application state. The current view may not be queried because
the user could select Redo from within any view. By not querying the
application state, Redo can be executed without producing a different result
if the application state changes.
All of the information required to complete a task should be part of the
constructor for the task. For example, CCreateSpiralTask takes a completed
spiral as its argument. The completed spiral contains information about the
center, radius, and span angle. If CCreateSpiralTask's constructor did not
include the span angle, then the spiral would change if the user chose the
Spiral:Angle menu option between the Undo and the Redo.
The class for each task must also keep track of enough data to be able to undo
the task. In the case of CCreateSpiralTask, Undo() can retrieve the last
spiral from the spiral list because the last spiral created is always at the
end of the spiral list. CClearSpiralTask is the opposite of CCreateSpiralTask.
After Do(), the task remembers the deleted spiral and its position in the
class's data area.
A task can be deleted at any time, so it is important to keep track of whether
a task actually owns any objects it points at. If CCreateSpiralTask tries to
delete its spiral after inserting the spiral into the document, the
application will crash. Spiral solves this problem by making each task modal.
The modes are "able to Do()" and "able to Undo()". After CForeman calls Do(),
it must call Undo() before calling Do() again. In CCreateSpiralTask, after
placing the spiral object back in the document in Do(), the task's pointer to
the spiral is set to NULL. If the pointer is not set to NULL in the task, then
CCreateSpiralTask's destructor and the document's destructor will both try to
free the spiral object. Listing One shows, for each member variable, whether
the member variable is needed for Do() or for Undo().


Updating the Menus


I was always impressed by applications that were able to give descriptions of
the actions on the undo and redo history. With the implementation of undo/redo
described in this article, this functionality is easy to implement.
All of the code necessary in the view to update Undo and Redo in the Edit menu
is in Listing Five. The corresponding code in CForeman is shown in Listing
Four in the member functions GetUndoDescription() and GetRedoDescription().
CForeman uses its last-task-executed pointer to get a pointer to the next task
to be undone or redone. Each task has a virtual function called
GetDescription() that returns a short description of the task. CForeman
returns this result from GetDescription(), or the empty string if no task is
waiting.


Managing Multiple Documents and Views


In an application that supports multiple documents and/or multiple views, one
problem is how to maintain separate undo histories. CForeman uses no global
variables, so multiple instances of CForeman can be created. To implement
multiple histories, CForeman is placed as a member of the CSpiralDoc document
class. This way there will be a separate undo history for each document, and
multiple views of the same document will share a single undo history.
Functions that need to call CForeman will be able to access the class through
a pointer to the document. Inasmuch as I've already defined undoable
operations to be those that affect the document, I can guarantee that a
pointer to the current document's foreman will always be available when I need
it.
Spiral supports multiple views of the same data, so any time the document is
changed, all views must be updated. MFC provides a function called
UpdateAllViews() as part of the document class. Hints are used in Spiral to
describe specific spirals or rectangles to be updated. These hints optimize
the redraw handling and prevent excess redraws across the views.
Tasks are called by CForeman with a pointer to the document because undo/redo
tasks work on the document, not the view. Views are only convenient
representations of that document. With one exception, tasks must completely
ignore any view information because the current view could be closed or
modified the next time the task executes a Do() or an Undo(). The exception is
that the current view should automatically scroll to make whatever was undone
or redone visible to the user.


Updating the Modified Flag


Very few Windows apps recognize when a document has been returned to the
original state. With the undo history presented here, it is possible to
determine that the user has undone all changes and to reset the modified flag
to False. This prevents the user from typing a key accidentally, pressing
Undo, and still having the application ask if it should save the changes.
This feature can be implemented safely by throwing away the undo history when
the user saves. This is the way that Visual C++ 2.x and most other
applications are implemented. Word for Windows, on the other hand, keeps the
undo history after the document is saved but cannot detect that the user has
undone all changes since the last save.


Adding Undo/Redo to Existing Apps


Many existing MFC applications can easily be modified to support this
mechanism. User actions are typically implemented as part of the view. The
view ends up having numerous member functions, each of which takes care of one
user-interface action. To implement an undo/redo history, each of these
member functions is pulled out of the view and made into a task class. The key
to success is making sure that anything the user can do that modifies the
document is made into an undo/redo task.

Listing One
// task.h - Definitions of Undo/Redo Tasks
// NOTES: 1. Strings should be in resources, but leaving them here makes the 
// examples clearer.
// 2. The virtual functions would not normally be inline, but they are so 
// short that writing them inline makes the listing easier to read.
/////////////////////////////////////////////////////////////////////////////
class CTask : public CObject {
public:
 CTask() { /* empty */ }
 virtual ~CTask() { /* empty */ }
 virtual void Do(CSpiralDoc*) = 0;
 virtual void Undo(CSpiralDoc*) = 0;
 virtual LPCSTR GetDescription() = 0;
private:
 CTask(const CTask&); // Disable copy constructor
 CTask& operator=(const CTask&); // Disable assignment operator
};

class CCreateSpiralTask : public CTask {
public:
 CCreateSpiralTask(CSpiral* pSpiral) { m_pSpiral = pSpiral; }
 virtual ~CCreateSpiralTask() { delete m_pSpiral; }
 virtual void Do(CSpiralDoc*);
 virtual void Undo(CSpiralDoc*);
 virtual LPCSTR GetDescription() { return "Draw"; }
private:
 CSpiral* m_pSpiral; // Needed for Do()
};
class CMoveSpiralTask : public CTask {
public:
 CMoveSpiralTask(CSpiral* pSpiral, const CSize& size)
 : m_Size(size) { m_pSpiral = pSpiral; }
 virtual ~CMoveSpiralTask() { }
 virtual void Do(CSpiralDoc*);
 virtual void Undo(CSpiralDoc*);
 virtual LPCSTR GetDescription() { return "Move"; }
private:
 CSpiral* m_pSpiral; // Needed for Do() and Undo()
 CSize m_Size; // Needed for Do() and Undo()
};
class CClearSpiralTask : public CTask {
public:
 CClearSpiralTask(POSITION pos) { m_Pos = pos; m_pSpiral=NULL; }
 virtual ~CClearSpiralTask() { delete m_pSpiral; }
 virtual void Do(CSpiralDoc*);
 virtual void Undo(CSpiralDoc*);
 virtual LPCSTR GetDescription() { return "Clear"; }
private:
 POSITION m_Pos; // Needed for Do()
 CSpiral* m_pSpiral; // Needed for Undo()
};
class CClearAllTask : public CTask {
public:
 CClearAllTask() { }
 virtual ~CClearAllTask();
 virtual void Do(CSpiralDoc*);
 virtual void Undo(CSpiralDoc*);
 virtual LPCSTR GetDescription() { return "Clear All"; }
private:
 CSpiralList m_SpiralList; // Needed for Undo()
};

Listing Two
/* task.cpp - Implementation of undo/redo tasks */
#include "stdafx.h"
#include "Spiral.h"
#include "Foreman.h"
#include "SpiraDoc.h"
#include "SpiraVw.h"
#include "Task.h"
#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif
/* Undo/Redo Actions */
#define INVALIDATE_SPIRAL() pSpiralDoc->UpdateAllViews(NULL, 0, m_pSpiral); \
 pSpiralDoc->m_updateRect = m_pSpiral->m_rectBounding;

/////////////////////////////////////
void CCreateSpiralTask::Do(CSpiralDoc* pSpiralDoc)
{
 // Before Do(), we own the spiral. After Do(), the doc owns the spiral.
 pSpiralDoc->m_SpiralList.AddTail(m_pSpiral);
 INVALIDATE_SPIRAL();
 m_pSpiral = NULL;
}
void CCreateSpiralTask::Undo(CSpiralDoc* pSpiralDoc)
{
 m_pSpiral = pSpiralDoc->m_SpiralList.RemoveTail();
 INVALIDATE_SPIRAL();
}
/////////////////////////////////////
void CMoveSpiralTask::Do(CSpiralDoc* pSpiralDoc)
{
 INVALIDATE_SPIRAL();
 m_pSpiral->Move(m_Size);
 INVALIDATE_SPIRAL();
}
void CMoveSpiralTask::Undo(CSpiralDoc* pSpiralDoc)
{
 INVALIDATE_SPIRAL();
 CSize size(-m_Size.cx, -m_Size.cy);
 m_pSpiral->Move(size);
 INVALIDATE_SPIRAL();
}
/////////////////////////////////////
void CClearSpiralTask::Do(CSpiralDoc* pSpiralDoc)
{
 // Need to remember the spiral preceding the one being deleted so if 
 // the user does an undo, we know where to put it back.
 CSpiralList& SpiralList = pSpiralDoc->m_SpiralList;
 POSITION prevPos = m_Pos;
 SpiralList.GetPrev(prevPos);
 m_pSpiral = SpiralList.GetAt(m_Pos);
 SpiralList.RemoveAt(m_Pos);
 INVALIDATE_SPIRAL();
 m_Pos = prevPos;
}
void CClearSpiralTask::Undo(CSpiralDoc* pSpiralDoc)
{
 CSpiralList& SpiralList = pSpiralDoc->m_SpiralList;
 m_Pos = SpiralList.InsertAfter(m_Pos, m_pSpiral);
 INVALIDATE_SPIRAL();
 m_pSpiral = NULL;
}
/////////////////////////////////////
CClearAllTask::~CClearAllTask()
{
 while (!m_SpiralList.IsEmpty())
 delete m_SpiralList.RemoveHead();
}
void CClearAllTask::Do(CSpiralDoc* pSpiralDoc)
{
 CSpiralList& documentSpiralList = pSpiralDoc->m_SpiralList;
 while (!documentSpiralList.IsEmpty())
 m_SpiralList.AddTail(documentSpiralList.RemoveHead());
 pSpiralDoc->UpdateAllViews(NULL);

}
void CClearAllTask::Undo(CSpiralDoc* pSpiralDoc)
{
 CSpiralList& documentSpiralList = pSpiralDoc->m_SpiralList;
 while (!m_SpiralList.IsEmpty())
 documentSpiralList.AddTail(m_SpiralList.RemoveHead());
 pSpiralDoc->UpdateAllViews(NULL);
}

Listing Three
/* foreman.h - Class definition of undo history manager */
// Forward Declarations. By having these, we do not have to load 
// other include files.
class CSpiralDoc;
class CTask;
class CSpiral;
// Type definitions
#ifdef WIN32
typedef CTypedPtrList<CObList,CSpiral*> CSpiralList;
typedef CTypedPtrList<CObList,CTask*> CTaskList;
typedef CArray<CPoint,CPoint> CPointArray;
#else
#include "template.h"
#endif // WIN32
// CForeman Class Definition
class CForeman
{
public:
 CForeman();
 ~CForeman();
 void SubmitTask(CTask*);
 void Undo();
 void Redo();
 // The "Can" functions are used by OnCommandUpdate
 // to figure out when to grey out the menu option.
 BOOL CanRedo();
 BOOL CanUndo();
 // The "GetDescription" functions are used by OnCommand
 // to describe the actions waiting to be redone and undone.
 LPCSTR GetRedoDescription();
 LPCSTR GetUndoDescription();
 // The foreman is an aggregate member of the document. SetDocument is 
 // called by the document's ctor to provide a pointer back to the doc.
 void SetDocument(CSpiralDoc* pSpiralDoc);
 // After a do or undo, make sure the user can see it.
 void MakeLastTaskVisible(CSpiralDoc* pSpiralDoc);
private:
 // Linked list of tasks.
 CTaskList m_taskList;
 // The position and the index both indicate the next
 // task to undo, but in slightly different forms.
 POSITION m_DoUndoPosition;
 int m_iDoUndoIndex;
 // Remember which document this foreman is related to.
 CSpiralDoc* m_pSpiralDoc;
};
inline BOOL CForeman::CanRedo()
 { return m_iDoUndoIndex < m_taskList.GetCount(); }
inline BOOL CForeman::CanUndo(){return m_iDoUndoIndex > 0; }
inline void CForeman::SetDocument(CSpiralDoc* pSpiralDoc)
 { m_pSpiralDoc = pSpiralDoc; }

Listing Four
/* foreman.cpp - Implementation of undo history manager */
#include "stdafx.h"
#include "Spiral.h"
#include "Foreman.h"
#include "Spiradoc.h"
#include "SpiraVw.h"
#include "Task.h"
#ifdef _DEBUG
#undef THIS_FILE
static char BASED_CODE THIS_FILE[] = __FILE__;
#endif
/* Foreman */
CForeman::CForeman()
{
 m_iDoUndoIndex = 0;
}
CForeman::~CForeman()
{
 while (m_taskList.GetCount() > 0)
 delete m_taskList.RemoveTail();
}
void CForeman::SubmitTask(CTask* task)
{
 while (m_iDoUndoIndex < m_taskList.GetCount())
 delete m_taskList.RemoveTail();
 m_DoUndoPosition = m_taskList.AddTail(task);
 m_iDoUndoIndex++;
 task->Do(m_pSpiralDoc);
 m_pSpiralDoc->SetModifiedFlag();
}
void CForeman::Undo()
{
 m_pSpiralDoc->m_updateRect.SetRectEmpty();
 if (CanUndo()) {
 // Undo the current entry, then backup in the list.
 // GetPrev returns the object at the *current* position
 // indicated by m_DoUndoPosition, then sets m_DoUndoPosition
 // back one link. A better name is "GetCurrentAndMoveBackPosition"
 CTask* pTask = m_taskList.GetPrev(m_DoUndoPosition);
 --m_iDoUndoIndex;
 pTask->Undo(m_pSpiralDoc);
 }
 // If the user undid everything, then the document is unchanged
 // relative to how it started. Clear the "modified" flag to show this.
 // Small bug: we do not account for File:Save properly in this example.
 if (m_taskList.GetCount() == 0)
 m_pSpiralDoc->SetModifiedFlag(FALSE);
 if (!m_pSpiralDoc->m_updateRect.IsRectEmpty())
 MakeLastTaskVisible(m_pSpiralDoc);
}
void CForeman::Redo()
{
 m_pSpiralDoc->m_updateRect.SetRectEmpty();
 if (CanRedo()) {
 // Advance to the next entry and execute it.
 m_iDoUndoIndex++;

 if (!m_DoUndoPosition)
 m_DoUndoPosition = m_taskList.GetHeadPosition();
 else
 m_taskList.GetNext(m_DoUndoPosition);
 CTask* pTask = m_taskList.GetAt(m_DoUndoPosition);
 pTask->Do(m_pSpiralDoc);
 m_pSpiralDoc->SetModifiedFlag();
 }
 if (!m_pSpiralDoc->m_updateRect.IsRectEmpty())
 MakeLastTaskVisible(m_pSpiralDoc);
}
LPCSTR CForeman::GetRedoDescription()
{
 if (CanRedo()) {
 POSITION nextPos = m_DoUndoPosition;
 if (!nextPos)
 nextPos = m_taskList.GetHeadPosition();
 else
 m_taskList.GetNext(nextPos);
 return m_taskList.GetAt(nextPos)->GetDescription();
 }
 return "";
}
LPCSTR CForeman::GetUndoDescription()
{
 if (CanUndo()) {
 return m_taskList.GetAt(m_DoUndoPosition)->GetDescription();
 }
 return "";
}
// Find the currently active view and scroll it so that the
// last rect invalidated by a task is visible.
void CForeman::MakeLastTaskVisible(CSpiralDoc* pSpiralDoc)
{
 // Get the active view so it can be updated.
 CMDIChildWnd * pChild =
 ((CMDIFrameWnd*)(AfxGetApp()->m_pMainWnd))->MDIGetActive();
 if ( !pChild )
 return;
 CView * pView = pChild->GetActiveView();
 if ( !pView || !pView->IsKindOf( RUNTIME_CLASS(CSpiralView) ))
 return;
 // Compare the window rect to the update rect and act appropriately.
 CSpiralView* pSpiralView = (CSpiralView*) pView;
 CClientDC dc(pView);
 pView->OnPrepareDC(&dc);
 CRect clientRect;
 pView->GetClientRect(&clientRect);
 if (!dc.PtVisible(pSpiralDoc->m_updateRect.TopLeft()) ||
 !dc.PtVisible(pSpiralDoc->m_updateRect.BottomRight()) ) {
 // Try to center the spiral of interest in the current view.
 CPoint ptCenter = pSpiralDoc->m_updateRect.TopLeft();
 int xDesired = ptCenter.x - clientRect.Width() / 2;
 int yDesired = ptCenter.y - clientRect.Height() / 2;
 pSpiralView->ScrollToPosition(CPoint(xDesired, yDesired));
 // This next line is required for proper updates with splitters
 pSpiralView->GetDocument()->UpdateAllViews(NULL, 0, NULL);
 }
}


Listing Five
// Excerpts from SpiraVw.CPP
void CSpiralView::OnUpdateEditUndo(CCmdUI* pCmdUI) 
{
 CString str("Undo ");
 str = str + GetDocument()->m_Foreman.GetUndoDescription() + "\tCtrl+Z";
 pCmdUI->SetText(str);
 pCmdUI->Enable(GetDocument()->m_Foreman.CanUndo());
}
void CSpiralView::OnEditUndo() 
{
 GetDocument()->m_Foreman.Undo();
}
void CSpiralView::OnUpdateEditRedo(CCmdUI* pCmdUI) 
{
 CString str("Redo ");
 str = str + GetDocument()->m_Foreman.GetRedoDescription() + "\tCtrl+A";
 pCmdUI->SetText(str);
 pCmdUI->Enable(GetDocument()->m_Foreman.CanRedo());
}
void CSpiralView::OnEditRedo() 
{
 GetDocument()->m_Foreman.Redo();
}






































Networking Intelligent Devices


NEST brings client/server to embedded systems




Gil Gameiro


Gil is an engineer with Novell's NEST group. He can be contacted at
gil_gameiro@novell.com or http://www.vii.com/~bird/.


As any coffee aficionado will tell you, there's more to a good cup of
cappuccino than freeze-dried flakes and hot water. Today's coffee makers, in
fact, do just about everything but drink the java for you. The espresso
machine in Figure 1, for instance, grinds beans, compacts a wafer, and serves
a good strong cup of espresso--then discards the grounds in a garbage drawer. 
However, what really makes this espresso maker unique is that I've connected
it to a Novell network and control it with a Windows application I call
"Expresso Maker." Figure 2 shows how the application lets you remotely select
a coffee maker, heat the water, specify the serving size and strength, make
the coffee, and gather real-time information about the process and statistics
about usage--all across the network. The program provides system
administration feedback for the resident "javameister"; see Figure 3.


Just Another Embedded System


In many ways, the Expresso Maker system is nothing more than a typical,
networked embedded system. The system designer has to provide a secure
connection, make it possible for the supplier to perform diagnostics and
maintenance tasks, and enable a broker service so that end users can locate
any given item/service, check for availability, and find the closest supplier.
At the heart of the Expresso Maker application is Novell's Embedded Systems
Technology (NEST), which is designed to let you incorporate network protocols
and client services into embedded systems. Even though my application runs
under Windows, I could have written it for OS/2 PM because the library for
NEST remote clients is given in source format. The development of the NEST
target is hosted on whatever environment an operating-system vendor supports.
For WindRiver's VxWorks on an AMD29K processor, for instance, I use a SunSPARC
workstation. For FlexOS, I use DOS and Watcom C. Byte-order dependencies and
alignment differences are therefore reconciled, and all system-level functions
for memory and process management and event, timer, interrupt, and I/O
services are specified in an API that's available for implementation on any
kernel or operating system that supports preemptive multitasking. 
The NEST architecture consists of the following: 
Application layer, which hides the differences between OSs by providing an
easy-to-port set of common OS services. 
NetWare services layer, which provides access to NetWare services (queue,
bindery, directory, connection, file, message, auditing, and authentication
services).
Connection layer, which contains the protocol stacks and the underlying
"plumbing" that interconnects every network node. It provides the connection
between the Services layer and the physical network.
Developing NEST-based applications (like Expresso Maker) requires the NEST
SDK. This toolkit is DOS based, although the SPX and IPX test tools included
with it are Windows hosted. Also included is a Print test tool for testing
embedded NEST printers, which requires a server with NetWare 3.x or 4.x, a
Windows workstation, and the embedded printer to test.
The NEST requester supports--but does not require--connection to NetWare 3.x
and 4.x servers. It includes support for packet signatures, packet burst, RSA
login authentication, and directory services. NEST also supports the NetWare
protocol stack. C source for SPX/IPX, LSL, and MLID is based on the ODI model
standard with open interfaces to the transport services for applications and
the MLID to accommodate different media types. The LSL supports Ethernet
802.2, Ethernet 802.3, Ethernet II, and Ethernet SNAP frame types. The next
version of NEST will also come with source code for IP, TCP, UDP, SNMP, NP,
IPX, SPX, NCP, and NDS protocols. 
To connect the off-the-shelf coffee maker to the network, I had to design,
build, and install a custom interface card. The original espresso maker (a
"Super Automatica" model from SAECO, an Italian company) was designed around a
mechanical sequencer. I had to redesign the coffee maker from the ground up,
since it did not provide the easy hooks needed for microprocessor-based
embedded devices. At the heart of the hardware is a 286-based board (in the
PC/104 form factor with 1 MB of RAM; I could have used less, but this amount of memory
was on board) and a 128-KB 27C010 EPROM.
For the network adapter, I used an AMD79C960 (PCNet/ISA) chip, mainly because
it is a one-chip adapter and NEST provides a sample HSM that can be modified
to support the PCNet/ISA. The network chip does not require additional RAM and
needs only a little decoding logic, a 256-byte EPROM (mostly for the MAC
address), and the line transformer.
Finally, I designed an I/O interface based on an 8255 Programmable I/O chip,
an 8253 three-channel counter/timer (to meter the flow of water being served), a
2x16 LCD display, some push-buttons, and six opto-triacs with zero-crossing
detection. These opto-triacs are boosted by TO-220AB case triacs to control
the power elements in the maker--the grinder, drop-dose solenoid, a motor that
compacts the grounds in a wafer, a pump and gate for the water, and a water
heater that keeps the machine ready at all times to serve a cup of brew in
seconds. 
I used FlexOS Lite, a multitasking, real-time operating system from ISI
(FlexOS was originally developed by Digital Research). Since ISI's
implementation does not provide ROMing tools, I had to write a utility that
links the compiled operating system with the NESTed application, compresses
it, and generates an Intel Hex file along with a loader header. The loader
code presents a valid signature to the ROM POST that initializes it. It then
handles the boot process by loading the operating system along with its NEST
driver application without returning.
Even with all this, the most difficult part of the job was cutting a square
hole in the thick metal back of the coffee maker to insert the RJ45 10BaseT
network connection.


Building the Expresso Maker Program


Central to any NEST app is the NEST Protocol (NP), a "light" protocol
developed especially for embedded devices to allow secure remote configuration
and operation. NP's design puts the connection burden on the
remote client, minimizing the code on the NEST side, while maintaining a
secure connection based on a login with a randomly generated access key and
encrypted password. Moreover, each packet contains a running public-key
signature. NP works on top of the IPX layer to minimize the required protocol
stack. It requires less than 10 KB of code on the embedded device. This
protocol does not provide real-time management; it is aimed at browsing and
maintaining the settings of predefined configurable items (provided the
logged-in user has the relevant access rights).
When building Expresso Maker, I wanted NP to transfer information other than
remote-configuration data. Fortunately, I found an undocumented function,
NPRegisterClassHandler(), used by the remote configuration component to
register its services with the NP daemon. The call requires a pointer to a
function (void (*)(NPConnection*, ECB*, ECB*)) and an integer that is a
function number of NP to manage.
NP is composed of an NPHeader and IPX; see Listing One. The Function and
SubFunction fields give the type of information carried by the data following
the header. When the NP daemon receives a request from a remote client, it
matches the Function number to known services and normally discards the packet
if the service is unknown--unless it matches a user-defined registered handler
(also confusingly referred to as "Class Number" in the source comments).
Listing Two handles my extended class of services. 
The ClassHandler for incoming data is simple, but its expected behavior is
tricky: It takes a pointer to a ConnectionHandle, a structure
maintained by the NP for every logged-in and active connection. (I strongly
recommend not modifying any of the contents of this structure, as it is
apparently present for the ClassHandler in charge of the login and
authentication process.)
The two remaining parameters are pointers to event control block (ECB)
structures that send and receive network packets. The first ECB* is the
received ECB for the request, and the second is an ECB preallocated and
formatted for a reply to the request. You can assume that the
ECB_Fragment[1].FragmentAddress will always contain the full NP data,
including the header, and that any remaining lower-level protocol (IPX) would
be taken in the ECB_Fragment[0]. Supposedly, future releases of NP will run on
top of UDP as well as IPX. For now, however, you can ignore which protocol is
in use, since the NP daemon has already preformatted it.
The preallocated buffer for the NP data is big enough to contain the largest
amount of data (504 bytes) supported by NP. Although IPX/Ethernet can transmit
up to 1514 data bytes, Novell limited the protocol size to a total of 576
bytes because NP was designed to remain as small as possible; as such, it
cannot disassemble and reassemble large packets. In fact, 576 is the largest
number of bytes that can be transmitted between two IPX/Ethernet nodes in a
heterogeneous network (ARCNet imposes this limit, for example). Subtract the
SNAP framing (a larger frame type than the common 802.2), the IPX header, and
the NPHeader from those 576 bytes, and close to 504 bytes remain for data.
The class-handler routine extracts the pointer in the ECB's Fragment[1] by
casting it to a pointer to an NPRequestBuffer. The SubFunction number in the
NP header allows routing and processing of the request made by the remote
client.


A Few Caveats 


Since NP was designed to be small and secure, it combines system-maintenance
packets with user data. In other words, when the remote client sends a request
to be handled by the class handler, the system acknowledgment is also the
packet that will contain the user-reply data should the request generate data
on the reply (unlike SPX).
Moreover, the way the class handler retrieves the number of OEM data bytes in
the request and specifies the amount of data in the reply is still awkward:
That's probably why NPRegisterClassHandler is undocumented.
Example 1 (from my class handler) shows how the amount of data in the request
is retrieved. The field length of the NPHeader is converted to the CPU native
order using the NSwapHiLo16() macro. The result is then increased by 4, and
SIZEOF_NPHeader is subtracted from it. The reason for the addition is that the
Length field describes the total length of the NP packet excluding the first
two fields of the NPHeader (that is, CheckSum and Length). Subtracting
SIZEOF_NPHeader (a define equal to the actual size of the NPHeader) gives the
amount of data bytes placed in the packet for OEM usage.
Another caveat concerns declaring the amount of data attached to the reply.
The class handler must not set the Length field of the NPHeader in the reply
ECB; it must instead set the actual size of data to be transmitted plus
SIZEOF_NPHeader in the ECB_Fragment[1].FragmentSize. The NPDaemon will modify
the NPHeader's Length field to reflect the proper size.
The FragmentCount must be set to 2; otherwise no response is generated. This
step is mandatory for proper use of the undocumented ClassHandlers: Failure to
do so will make the remote client retry the same request repeatedly, causing
your handler to be called until a timeout happens.
A last rule of thumb is to process the packet as quickly as possible: Ideally,
this should take no more than 200 ms or so (if no error code is to be returned
because the action takes longer, a success should be sent right back with no
data, and some other mechanism should be used for reporting the actual error).



Inside the Expresso Maker Code


The class-handler code extracts the SubFunction number in the NP header and
routes the request to the appropriate code, which returns an error code. The
NSwapHiLo16() macro is defined to convert a 16-bit value from the CPU native
format to HiLo or vice versa: It swaps or does nothing, depending on whether
LITTLE_ENDIAN is defined. You can use your data the way you want, but NPHeader
16-bit data is carried in the HiLo format for portability.
The error code returned by my function is placed in the Status field of the
NPHeader in the reply ECB. There, too, the NSwapHiLo16() macro is used for
portability. The error code must match the philosophy of the NEST protocol: 0
is success, 70xx is an error from NP, 71xx is an error from the remote
configuration. For errors in my module, I used 76xx and kept the low byte
consistent with the predefined NP error codes.
When SubFunction is beyond what you expect, it is reasonable to use the error
in the NP range (NPER_UNKNOWN_REQUEST), as this is what NP would have
otherwise returned.
The next challenge was determining how to notify the remote client of an
asynchronous event, given that NP is based on a client-request (ack+reply)
scheme.
Novell tipped me off that the Sequence number is also used when the NEST side
takes the initiative to notify a remote client, along with a 0x1000 flag set
in the NP Flags field. This tells the client's library that the packet is not
an ack/reply and itself requires a reply from the client. The client's reply
sets that same bit plus one more, marking it as a reply: That is how the
sequence number is matched to either the previous request or the previous
notification.
Novell plans to ship this code with NEST's next release; unfortunately, when I
originally designed this project, I had to write it myself. The modification
of the client library was rather easy, requiring an existing internal function
(makeAckPacket) to reply (without forgetting to turn on the extra bit
NP_FLAG_IS_ACK with the appropriate swapping macro on the client side) and the
use of the Windows PostMessage() API. The modification of the NEST side was a
little harder, as I had to create Timeout and Retry for every asynchronous
request. We'll dispense with the low-level details of creating asynchronous
retries on the NEST side without a specific thread because Novell plans to
provide this functionality in future versions of NEST.
The client deals with sending OEM extended requests by using a DLL call
(undocumented at the time): NPSendExtendedNP(ConnHandle*, EDB*) that allows
for sending extended NP packets on an existing connection. It takes a
ConnHandle* (obtained by the NPLoginToNestDevice() API call) and an EDB* (a
new structure). The EDB will probably be used for two tasks in future releases
of the NEST protocol library: describing an extended packet for the call I am
about to explain, and notifying events asynchronous to the application that's
using the DLL (like the code I wrote on the NEST side). Toward this end, the
program must post a few EDBs waiting for asynchronous events using the
NPListenForEDB() API call. (When I looked at the DLL code, NPListenForEDB()
was in an #if 0 condition block and I had to enable it.)
For now, we need consider only the following fields in the EDB: edbFunction
and edbSubFunction must be filled with the functions we made our NEST device
aware of, while edbDataBlock[] and edbDataSize hold any user data to attach
to the NP system header. No special care is required with the byte
ordering of edbFunction and edbSubFunction: The DLL swaps those two to network
HiLo order. However, the data block is up to the OEM. If your NEST device is
based on, say, an AMD29K CPU and the client is Windows, you have to swap
entities other than bytes. If you stick to the network order (HiLo format),
you'll always know what you are looking at, even when you use a LanAlyzer to
look at network traffic.
According to Novell, the current release of the NPDLL library will complete
all the calls synchronously, so the EDB used for an extended OEM NP message
will come back modified with the answer from the NEST device (if the latter
generates user data attached to the NP system ack). The next version of the
DLL will complete an NPSendExtendedNP() by posting an EDB message back to the
application instead of modifying the one used for the request. This change
means Expresso Maker will be incompatible with future versions of the NPDLL,
which will allow you to work synchronously or asynchronously by requesting the
operating mode after logging into a NEST device (with the NPSetConnFlags()
API, which is also in an #if 0 block in the preliminary version of the DLL).


Conclusion


The main purpose of environments such as NEST is to provide more services so
that a network device actually becomes a "server." Such a device does not
necessarily need to authenticate to a tree, but rather to exchange data and
services with a remote client (which could even be another embedded device).
NP is a small, reliable, secure protocol that can be leveraged to provide more
than just remote configuration without paying the price of large code.


Acknowledgment


I'd like to thank computer journalist Philippe Guichon for his invaluable
assistance in writing this article. 
Figure 1: Off-the-shelf espresso maker that's modified for network control.
Figure 2: Expresso Maker user menu.
Figure 3: Expresso Maker sysadm screen.
Example 1: How data is retrieved. 
int msgSize = NSwapHiLo16(reqECB->Header.Length);
msgSize -= (SIZEOF_NPHeader - 4);

Listing One 
typedef struct {
 nuint16 CheckSum;
 nuint16 Length;
 nuint16 ConnectionID;
 nuint16 SequenceNumber;
 nuint16 PacketVersionID;
 nuint16 Reserved;
 nuint16 Function;
 nuint16 SubFunction;
 nuint16 Flags;
 nuint16 Status;
} NPHeader;
/* NPHeader structure is part of every NP packet going on the wire. */
void doNPEspressoMaker(NPConnection *hConn, ECB *reqECB, ECB *replyECB)
{
 /* Get a pointer to the request */
 NPRequestBuffer *stdReq = reqECB->ECB_Fragment[1].FragmentAddress;
 /* Get a pointer to the reply packet */
 NPReplyBuffer *stdReply = replyECB->ECB_Fragment[1].FragmentAddress;
 /* Extract the SubFunction number from the NP Header */
 int replySize = 0;
 int SubFunction = NSwapHiLo16(stdReq->Header.SubFunction);
 switch (SubFunction) {
 /* Remotely fake a local control panel key press */
 case EX_MAKER_FAKE_KEY: {

 /* the key ID to fake is in the first byte */
 nuint8 key = stdReq->DataBuffer[0];
 espressoAddKeyToQueue(key);
 break;
 } 
 /* Remote client is requesting the current status */
 case EX_MAKER_REQUEST_STATUS: {
 memcpy(stdReply->DataBuffer, espressoGetStatus(),
 replySize = SIZEOF_EspressoStatus);
 break;
 }
 /* Display a message on the Maker LCD Display */
 case EX_MAKER_POST_MESSAGE: {
 /* Get NP Length field, convert HiLo to Native */
 int msgSize = NSwapHiLo16(reqECB->Header.Length);
 msgSize -= (SIZEOF_NPHeader - 4);
 espressoPostLCD(stdReq->DataBuffer, msgSize);
 break;
 }
 /* don't forget to report an error otherwise */
 default: stdReply->Header.Status = NSwapHiLo16(NPER_UNKNOWN_REQUEST); 
 };
 /* Always reply */
 replyECB->ECB_FragmentCount = 2;
 replyECB->ECB_Fragment[1].FragmentSize = replySize + SIZEOF_NPHeader;
}

Listing Two
typedef struct _eventDataBlock {
 struct _eventDataBlock *edbNext;
 void *edbConnection;
 int edbFunction;
 int edbSubFunction;
 int edbStatus;
 int edbRequestID;
 long edbDataSize;
 nuint8 edbDataBlock[420];
} EDB;
/* The EDB Structure as defined in the NPCAPI.H file; the DLL API. */
int TMainWindow::sendNP(int subFunction, unsigned char *req, int reqSize, char
*reply)
{
 EDB edb;
 memset(&edb, 0, sizeof(edb));
 edb.edbFunction = EX_ESPRESSO_CLASS;
 edb.edbSubFunction = subFunction;
 if ((edb.edbDataSize = reqSize) != 0) // assign, then test
 memcpy(edb.edbDataBlock, req, reqSize);
 int error = ::NPSendExtendedNP(hConn, &edb);
 // If that request generated a reply, transfer it to the user
 if (!error && edb.edbDataSize && reply)
 // that guy's buffer better be big enough!
 memcpy(reply, edb.edbDataBlock, edb.edbDataSize);
 return error;
}
int TMainWindow::fakeKeyPress(int keyID)
{
 unsigned char key = (unsigned)keyID;
 return sendNP(EX_MAKER_FAKE_KEY, &key, 1, NULL);
}

































































Fast Networking with WinSock 2.0


Multithreading can speed up your network applications




Derek Brown and Martin Hall


Derek has done extensive development work on Windows-based communications
software. He can be reached at db@stardust.com. Martin is cofounder and CTO of
Stardust Technologies, and can be reached at martinh@stardust.com.


The Windows Sockets specification has made Windows a viable network-software
platform. WinSock 1.1, for instance, provided Windows with a standard TCP/IP
networking API. This led to an explosion of network-enabled software for the
PC, which coincided with the popularity of the Internet and World Wide Web.
WinSock 2.0 builds on the earlier specification by providing true
multiprotocol support, implementing socket sharing, and offering significant
performance enhancements. Still, the upcoming WinSock 2.0 retains binary and
source compatibility with WinSock 1.1.
To better support wireless, ATM, and ISDN networks, for example, WinSock 2.0
has added quality-of-service features to give you greater flexibility and
control over a multitude of network media. These features make it possible for
the application to discover bandwidth and latency for ATM, ISDN, and other
types of networks. This information is particularly useful to multimedia
applications.
The WinSock 2.0 spec also includes explicit multipoint/multicast support.
Although some vendors provide this with WinSock 1.1, the spec doesn't require
it. In addition to the window-based, asynchronous notification of network
events in WinSock 1.1, WinSock 2.0 has event-driven asynchronous notification
and supports overlapped I/O.
One networking problem addressed by WinSock 2.0 involves a variety of
different name-resolution services. WinSock 2.0 allows a program to ask which
name-resolution services are available (for example, DNS or X.500) and then
ask a particular name service to resolve a particular name. This allows
WinSock 2.0 apps to work comfortably in heterogeneous network environments.
WinSock 2.0 features such as overlapped I/O and asynchronous-event objects
rely on operating-system functionality not found in 16-bit Windows. For that
reason, WinSock 2.0 will be initially supported only in Windows 95 and Windows
NT. Implementing WinSock 2.0 on 16-bit Windows and Win32s will require
emulating some services of the newer Windows platforms.
In this article, we'll describe how to get maximum performance from WinSock
2.0 applications. In particular, we'll focus on extremely fast network-file
transfers, where the transfer rate is limited only by the speed of the network
connection, the speed of the underlying media, and the overhead of the
operating system itself. In the world of WinSock, this is as fast as it gets.
To achieve this performance, we'll take advantage of two features new to
WinSock 2.0: event objects and overlapped I/O. 


Event Objects and Overlapped I/O


Event objects are used for thread synchronization. Suppose you have two
threads, one of which relies on some component of the other thread to
complete. By calling one of the Win32 event-object routines, the first thread
can be suspended until the second signals an event. The event is a Windows
handle with two states, signaled or nonsignaled. When the first thread
requires processing in the second, it blocks and waits for the event. When the
second thread completes the necessary processing, it signals the event,
allowing the first thread to continue.
In the following discussion, our file I/O thread will be blocked until a
particular read or write completes. An event object is associated with an
asynchronous file I/O call, then WaitForSingleObject() or
WaitForMultipleObjects() is called. The thread is truly blocked until
something--in this case the completion of the I/O--signals the event. Once the
event is signaled, the blocked thread resumes.
One common use of event objects is synchronizing a file I/O thread when using
overlapped I/O. Normally, file I/O operations require multiple buffer copies.
When an application calls write(), the buffer passed into write() is copied
internally, and the write() call completes. The internally copied buffer is
(eventually) written to the file system. Overlapped I/O saves a buffer copy by
using the buffer passed to WriteFile() directly, rather than making an
internal copy. Since the operating system is using the application's buffer
rather than an internal one, the operating system must notify the application
when the I/O is complete. Until that time, the application may not and should
not change the buffer. In Win32, that notification is done by signaling an
event.


Fast Network I/O 


The program we'll present here provides the meat of what could be a generic
FTP client and server. On the server side, the program needs to open a file,
read it, send it, then close it. On the client side, the program needs to open
the file, read from the network, write the file, then close it. 
To perform this operation, we first need a connection from client to server.
On both ends of the connection, the socket needs to be created with
WSASocket() to enable the overlapped I/O. We are using a stream-based protocol
and, for simplicity, assume that we want TCP. On the server side, we then
bind() to a known port number for the client to connect to. The client can
bind() to any available address. 
Our client resolves the host name using the generic TCP/IP addressing scheme.
(This sample code doesn't delve into the various address families supported by
WinSock 2.0.) After successfully binding to a local address, the client calls
(a blocking) connect(). Once the connection is established, it's time to
actually send the file.
With WinSock 1.1, the client would just enter a tight loop, reading from the
network and writing to the file system as fast as it could. However, each
successive call to recv() would not be initiated until the previous block
could be written to disk. Each successive write call could not complete at
least until the buffers were copied internally; see Listing One. With WinSock
2.0 and the Win32 overlapped I/O mechanism, we can dramatically increase the
performance. 
The WinSock 2.0 client code begins in the function get_file_from_server() in
Listing Three. For simplicity, the connection is established using the
old-style blocking routines. A buffer pool is created into which the file is
read and from which the file writes will be sent. An event object is created
for each buffer in the pool. By associating each buffer with an event, we
ensure that each thread will be able to proceed in the proper order. Once the
buffer pool is created and initialized, we begin the process by calling
WSARecv() with each buffer/event object pair.
At this point, the operating system begins filling the buffers. Next, we
create the NetworkThread thread for handling the completion of network I/O. That
thread is blocked, waiting for the event associated with the first buffer to
be signaled. When the first network receive is complete, we create FileThread,
a second thread for completion of file I/O. We wait to create the second
thread to ensure the threads are properly synchronized. When the first buffer
is filled, the associated event will be signaled and the
WSAWaitForMultipleEvents() will return. Since we want to read the file
sequentially, we just wait for the event object associated with each buffer in
turn.
As each buffer is filled, WSAWaitForMultipleEvents() returns (we're only
looking for one event at a time, but you could use the same call to wait for
multiple events). The buffer we originally sent to WSARecv() is now full, and
we want to write it to disk. By using WriteFile(), the buffer will be sent as
is (without a copy into a buffer internal to the file system), and we'll wait
for completion in our file I/O thread. As each file operation completes, the
event associated with that buffer is signaled. We can then send that buffer
back into the pool of buffers waiting to be filled from the network, again
calling WSARecv(). Using this rotating buffer pool, we can implement a
write-behind scheme that allows the operating system to read and write at its
earliest convenience.
The server code also starts by opening a socket, bound to a local address. We
listen() for incoming connections, then enter a blocking accept(). When the
client calls connect(), the server accept()s the incoming connection and we're
ready to send the file.
As with the client, the WinSock 1.1 server in Listing Two reads and sends
blocks of data one at a time. The write to the network layer can't begin until
the data is read from disk. Again, time is wasted in internal buffer copies,
as data is first written into the kernel-disk buffers, then into the user
buffer, and finally into the protocol stack's buffers waiting to be sent. 
To speed things up, the server code in Listing Four uses the same--but
reversed--procedure as the client code. The setup procedure
send_file_to_client() first establishes the connection with the client. We
create and initialize the buffer pool and call ReadFile() with each
buffer/event object pair. After the first event is signaled (from the first
call to ReadFile()), we start the thread to handle the network I/O
notifications. Again, to make sure the threads start out synchronized, we
don't start the NetworkThread until the first file-system read is complete. As
file system reads complete, we'll initiate WSASend() calls to send those
buffers straight to the network layer. 
The most notable improvement would be to take advantage of the multiprotocol
protocol support WinSock 2.0 offers. In this code, we established the
connection based on the familiar IP constructs and resolved the host names
using DNS. The new WinSock 2.0 name/services resolution routines let you
perform transport-independent calls that use IPX/SPX, AppleTalk, DECNet, or
any other protocol for which a WinSock 2.0 namespace service provider exists.


For More Information


One way to follow (or contribute to) the WinSock Group is to join a mailing
list. Send e-mail to majordomo@mailbag.intel.com with "subscribe list" in the
body of your e-mail, where list is either winsock-2 or winsock-hackers. The
winsock-2 list is the focal point of net-based discussion of WinSock 2.0. The
winsock-hackers list is a hang-out for WinSock 1.1 programmers, although a lot
more discussion happens in the WinSock newsgroups.
Among the newsgroups that provide WinSock 2.0 information are alt.winsock
(general WinSock discussion) and alt.winsock.programming (programmer-specific
WinSock discussions). WinSock information is also available at
ftp://ftp.stardust.com/pub/winsock/ (WinSock 1.1 and 2.0 documents,
freeware/shareware programs, and so on), http://www.intel.com/IAL/winsock2/
(WinSock 2.0 information), and ftp://ftp.cica.indiana.edu/pub/pc/win3/winsock/
(freeware and shareware programs). 
Finally, the World Wide Web sites providing WinSock information include
http://www.stardust.com, http://www.intel.com/IAL/winsock2/index.html,
http://sunsite.unc.edu/winsock/, and
http://www.microsoft.com/pages/developer/winsock/.

Listing One
/* o_client.c -- This is the "Old way" of doing a file transfer. Set up the 
 * connection, then go into a loop, alternating between recv()s and write()s.
 */
int get_file_from_server(char * servername, char * filename)
{
 struct hostent FAR * he;
 SOCKET sock;
 int nRc;
 struct sockaddr_in addr;
 char buf[1024];
 int hFile;
 int recvlen;
 hFile = open(filename, _O_CREAT | _O_WRONLY);
 
 he = gethostbyname(servername);
 if(!he)
 {
 return ERR_NO_HOSTNAME;
 }
 sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
 if(sock == INVALID_SOCKET)
 {
 return ERR_NO_SOCKET;
 }
 /* We want to bind to any available address we have. */
 addr.sin_family = PF_INET;
 addr.sin_port = 0;
 addr.sin_addr.s_addr = INADDR_ANY;
 nRc=bind(sock, (struct sockaddr FAR *)&addr, sizeof(struct sockaddr));
 if(nRc != 0)
 {
 closesocket(sock);
 return ERR_BIND_FAILED;
 }
 /* Now connect to the server. We'll just use a normal blocking
 * connect, because it's easiest for this example. */
 addr.sin_port = htons(KNOWN_PORT);
 addr.sin_addr.s_addr = *(unsigned long *) he->h_addr;
 nRc = connect(sock, (struct sockaddr FAR *)&addr, sizeof(struct sockaddr));
 if(nRc != 0)
 {
 closesocket(sock);
 return ERR_CONNECT_FAILED;
 }
 /* Here we enter our loop. The calls are done sequentially, and each 
 * must wait for its respective kernel subsystem to complete the
 * operation before continuing. */
 while((recvlen = recv(sock, buf, 1024, 0)) > 0)
 {
 write(hFile, buf, recvlen);
 }
 closesocket(sock);
 return 0;
}

Listing Two
/* o_server.c -- This is the "Old way" of doing a file transfer. Set up the 
 * connection, then go into a loop, alternating between read()s and send()s.
 */
int give_file_to_client(char * filename)
{

 SOCKET sock, SocketID;
 int nRc;
 struct sockaddr_in addr;
 int fileptr = 0;
 int hFile = open(filename, _O_RDONLY);
 int bytesread;
 char buf[1024];
 sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
 
 if(sock == INVALID_SOCKET)
 {
 return ERR_NO_SOCKET;
 }
 /* Bind to a known port, so the client knows where to find us. */
 addr.sin_family = PF_INET;
 addr.sin_port = htons(KNOWN_PORT);
 addr.sin_addr.s_addr = INADDR_ANY;
 nRc = bind(sock, (struct sockaddr FAR *)&addr,sizeof(struct sockaddr));
 if(nRc != 0)
 {
 closesocket(sock);
 return ERR_BIND_FAILED;
 }
 /* now listen(), and wait for the client to try to connect */
 nRc = listen(sock, 2);
 if(nRc != 0)
 {
 closesocket(sock);
 return LISTEN_FAILED;
 }
 int addrlen = sizeof(struct sockaddr_in);
 SocketID = accept(sock, (struct sockaddr FAR *)&addr, &addrlen);
 if(SocketID == INVALID_SOCKET)
 {
 closesocket(sock);
 return ACCEPT_FAILED;
 }
 /* Here we enter our loop. The calls are done sequentially, and each 
 * must wait for its respective kernel subsystem to complete the
 * operation before continuing. */
 while((bytesread = read(hFile, buf, 1024)) > 0)
 {
 send(SocketID, buf, bytesread, 0);
 }
 closesocket(SocketID);
 closesocket(sock);
 return 0;
}

Listing Three
/* client.c -- The new and faster way of doing a file transfer. The buffers 
 * will be filled and emptied as soon as the network and file kernel 
 * subsystems, respectively, are able. First, establish the connection with 
 * the server. Create the socket using the overlapped I/O flag. Recall that 
 * we are assuming TCP. Note that in many places, pseudocode is used. At
 * the time it was written, the WinSock 2 SDK was not yet available. 
*/
#define DATA_BUF_SIZE 1024
#define BUF_POOL_SIZE 10
HANDLE file_thread;
DWORD file_thread_id;

HANDLE network_thread;
DWORD network_thread_id;
typedef struct _buf_pool_entry
{
 OVERLAPPED overlapped;
 WSABUF data;
 int last_buffer;
 
} BUF_POOL_ENTRY;
BUF_POOL_ENTRY buffer_pool[BUF_POOL_SIZE];
HANDLE FileHandle = 0;
SOCKET SocketID = 0;
DWORD NetworkThread(LPVOID threadparams);
DWORD FileThread(LPVOID threadparams);
int get_file_from_server(char * servername, char * filename)
{
 struct hostent FAR * he;
 SOCKET sock;
 int nRc, i;
 DWORD bytesread, flags = 0;
 struct sockaddr_in addr;
 
 he = gethostbyname(servername);
 if(!he)
 {
 return ERR_NO_HOSTNAME;
 }
 sock = WSASocket(PF_INET, SOCK_STREAM, IPPROTO_TCP,
 NULL, // we're just using TCP, skip protocol
 // independence stuff for now
 NULL, // we'll not worry about socket groups,
 // either
 WSA_FLAG_OVERLAPPED);
 if(sock == INVALID_SOCKET)
 {
 return ERR_NO_SOCKET;
 }
 /* We want to bind to any available address we have. */
 addr.sin_family = PF_INET;
 addr.sin_port = 0;
 addr.sin_addr.s_addr = INADDR_ANY;
 nRc = bind(sock, (struct sockaddr FAR *)&addr, sizeof(struct sockaddr));
 if(nRc != 0)
 {
 closesocket(sock);
 return ERR_BIND_FAILED;
 }
 /* Now connect to the server. We'll just use a normal blocking
 * connect, because it's easiest for this example. */
 addr.sin_port = htons(KNOWN_PORT);
 addr.sin_addr.s_addr = *(unsigned long *) he->h_addr;
 nRc = connect(sock, (struct sockaddr FAR *)&addr,
 sizeof(struct sockaddr));
 if(nRc != 0)
 {
 closesocket(sock);
 return ERR_CONNECT_FAILED;
 }
 SocketID = sock;
 /* initialize the buffer pool */
 for(i = 0; i < BUF_POOL_SIZE; i++)
 {
 buffer_pool[i].overlapped.hEvent = CreateEvent(NULL,FALSE,FALSE,NULL);
 if(buffer_pool[i].overlapped.hEvent == NULL)
 {
 // clean up all the objects and exit. We failed initialization
 }
 buffer_pool[i].data.buf = malloc(DATA_BUF_SIZE);
 if(buffer_pool[i].data.buf == NULL)
 {
 // clean up all the objects and exit. We failed initialization
 }
 buffer_pool[i].data.len = DATA_BUF_SIZE;
 buffer_pool[i].last_buffer = FALSE;
 }
 /* Send all of the buffers in the pool to WSARecv, so they're ready for 
 * incoming data. */
 for(i = 0; i < BUF_POOL_SIZE; i++)
 {
 if(WSARecv(SocketID, &(buffer_pool[i].data), 1, &bytesread, &flags,
 &(buffer_pool[i].overlapped), NULL) == 0)
 {
 // Function succeeded immediately. Set the event object ourselves
 SetEvent(buffer_pool[i].overlapped.hEvent);
 }
 else
 {
 if(WSAGetLastError() != WSA_IO_PENDING)
 {
 /* we've failed, clean up and return */
 }
 }
 }
 /* Create the network thread. The network thread waits for network 
 * operations to complete, and then takes those (full) buffers and sends 
 * them to the filesystem via WriteFile. */
 network_thread = CreateThread(NULL, 0, NetworkThread, NULL, 
 0, // run immediately
 &network_thread_id);
 /* Wait until network thread is complete */
 return SUCCESS;
}
DWORD NetworkThread(LPVOID threadparams)
{
 int nRc;
 int pool_ptr = 0;
 BOOL firstpass = 1;
 DWORD bytesread, bytes_written, flags;
 BOOL success;
 int file_ptr = 0;
 do
 {
 /* No need to check the return code. Since we specified WSA_INFINITE, 
 * the only time it will return is when our event is signalled. */
 WSAWaitForMultipleEvents(1, &(buffer_pool[pool_ptr].overlapped.hEvent),
 FALSE, WSA_INFINITE, FALSE);
 /* WaitForMultipleEvents() blocks until the next recv() buffer
 * is filled. We check (below) to see how many bytes were
 * actually read, and then send off to be written to disk. */
 success = WSAGetOverlappedResult(SocketID,
 &(buffer_pool[pool_ptr].overlapped),
 &bytesread, FALSE, &flags);
 /* If we've received the event notification, but the number
 * of bytes is zero, then the connection's been closed.
 */
 if(bytesread == 0)
 {
 buffer_pool[pool_ptr].last_buffer = TRUE;
 // set the event, so that the file thread will receive this "EOF"
 SetEvent(buffer_pool[pool_ptr].overlapped.hEvent);
 // we're done receiving
 break;
 }
 buffer_pool[pool_ptr].overlapped.Offset = file_ptr;
 file_ptr += bytesread;
 if(WriteFile(FileHandle, buffer_pool[pool_ptr].data.buf, bytesread, 
 &bytes_written, &(buffer_pool[pool_ptr].overlapped)) == TRUE)
 {
 // function succeeded immediately. Set the event object ourselves
 SetEvent(buffer_pool[pool_ptr].overlapped.hEvent);
 }
 if(firstpass)
 {
 // This is the first buffer to be filled. Create the
 // thread that handles completion of file I/O
 file_thread = CreateThread(NULL, 0, FileThread, NULL,
 0, // run immediately
 &file_thread_id);
 firstpass = FALSE;
 }
 if(++pool_ptr >= BUF_POOL_SIZE)
 pool_ptr = 0;
 }
 /* when bytesread is zero, we're done */
 while(bytesread > 0);
 /* Wait for file thread to complete before leaving */
 ExitThread(0L);
}
/* Every time a write to the filesystem completes, take the buffer and call 
 * WSARecv() so that the network subsystem can fill it back up. */
DWORD FileThread(LPVOID threadparams)
{
 int nRc;
 int pool_ptr = 0;
 DWORD bytesread, flags = 0;
 BOOL success;
 int file_ptr = 0;
 do
 {
 /* We don't need to check return code. Since we specified INFINITE,
 * the only time it will return is when our event is signalled. */
 WaitForSingleObject(buffer_pool[pool_ptr].overlapped.hEvent,
 INFINITE);
 success = GetOverlappedResult(FileHandle,
 &(buffer_pool[pool_ptr].overlapped),
 &bytesread, FALSE);
 /* If we've received the event notification and the last_buffer
 * flag is set, the transfer is complete. */
 if(buffer_pool[pool_ptr].last_buffer == TRUE)
 {
 // we're done
 break;
 }
 buffer_pool[pool_ptr].overlapped.Offset = file_ptr;
 file_ptr += bytesread;
 /* Buffer is ready to be returned to the protocol stack for reading */
 if(WSARecv(SocketID, &(buffer_pool[pool_ptr].data), 1, &bytesread,
 &flags, &(buffer_pool[pool_ptr].overlapped), NULL) == 0)
 {
 // function succeeded immediately. Set the event object ourselves
 SetEvent(buffer_pool[pool_ptr].overlapped.hEvent);
 }
 if(++pool_ptr >= BUF_POOL_SIZE)
 pool_ptr = 0; 
 }
 /* when bytesread is zero, we're done */
 while(bytesread > 0);
 ExitThread(0L);
}

Listing Four
/* server.c -- The new and faster way of doing a file transfer. The buffers 
 * will be filled and emptied as soon as the network and file kernel 
 * subsystems, respectively, are able. First, accept the connection from 
 * the client. Create the socket using the overlapped I/O flag. Recall that we
 * are assuming TCP. Note that in many places, pseudo-code is used. At
 * the time it was written, the WinSock 2 SDK was not yet available.
 */
#define DATA_BUF_SIZE 1024
#define BUF_POOL_SIZE 10
HANDLE file_thread;
DWORD file_thread_id;
HANDLE network_thread;
DWORD network_thread_id;
typedef struct _buf_pool_entry
{
 OVERLAPPED overlapped;
 WSABUF data;
 int last_buffer;
 
} BUF_POOL_ENTRY;
BUF_POOL_ENTRY buffer_pool[BUF_POOL_SIZE];
HANDLE FileHandle = 0;
SOCKET SocketID = 0;
DWORD NetworkThread(LPVOID threadparams);
DWORD FileThread(LPVOID threadparams);
int give_file_to_client(char * filename)
{
 SOCKET sock;
 int nRc, i;
 struct sockaddr_in addr;
 int fileptr = 0;
 DWORD bytesread;
 sock = WSASocket(PF_INET, SOCK_STREAM, IPPROTO_TCP,
 NULL, // we're just using TCP, skip protocol
 // independence stuff for now
 NULL, // we'll not worry about socket groups, either
 WSA_FLAG_OVERLAPPED);
 if(sock == INVALID_SOCKET)
 {
 return ERR_NO_SOCKET;
 }
 /* Bind to a known port, so the client knows where to find us. */
 addr.sin_family = PF_INET;
 addr.sin_port = htons(KNOWN_PORT);
 addr.sin_addr.s_addr = INADDR_ANY;
 nRc = bind(sock, (struct sockaddr FAR *)&addr, sizeof(struct sockaddr));
 if(nRc != 0)
 {
 closesocket(sock);
 return ERR_BIND_FAILED;
 }
 /* Now listen(), and wait for the client to try to connect */
 nRc = listen(sock, 2);
 if(nRc != 0)
 {
 closesocket(sock);
 return LISTEN_FAILED;
 }
 SocketID = accept(sock, (struct sockaddr FAR *)&addr,
 sizeof(struct sockaddr_in));
 if(SocketID == INVALID_SOCKET)
 {
 closesocket(sock);
 return ACCEPT_FAILED;
 }
 /* Initialize the buffer pool */
 for(i = 0; i < BUF_POOL_SIZE; i++)
 {
 buffer_pool[i].overlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
 
 if(buffer_pool[i].overlapped.hEvent == NULL)
 {
 // clean up all the objects and exit. We failed initialization
 }
 buffer_pool[i].data.buf = malloc(DATA_BUF_SIZE);
 if(buffer_pool[i].data.buf == NULL)
 {
 // clean up all the objects and exit. We failed initialization
 }
 buffer_pool[i].data.len = DATA_BUF_SIZE;
 buffer_pool[i].last_buffer = FALSE;
 }
 /* Hand all of the buffers in the pool to ReadFile, so they start 
 * filling with file data. */
 for(i = 0; i < BUF_POOL_SIZE; i++)
 {
 buffer_pool[i].overlapped.Offset = fileptr;
 if(ReadFile(FileHandle,
 buffer_pool[i].data.buf, DATA_BUF_SIZE, &bytesread,
 &(buffer_pool[i].overlapped)) != 0)
 {
 // function completed immediately. Set the event object ourselves
 SetEvent(buffer_pool[i].overlapped.hEvent);
 }
 else
 {
 if(GetLastError() != ERROR_IO_PENDING)
 {
 /* we've failed, clean up and return */
 }
 }
 fileptr += DATA_BUF_SIZE;
 }
 /* Create the file I/O thread. The file I/O thread waits until the buffers
 * have been filled, and then takes those full buffers and sends them out 
 * to the network (via WSASend()). */
 file_thread = CreateThread(NULL, 0, FileThread, (LPVOID) (LPINT) &fileptr,
 0, // run immediately
 &file_thread_id);
 /* Wait until network thread is complete */
 return SUCCESS;
}
DWORD NetworkThread(LPVOID threadparams)
{
 int nRc;
 int pool_ptr = 0;
 BOOL firstpass = 1;
 DWORD bytesread, bytes_written, flags;
 BOOL success;
 int file_ptr = *(LPINT) threadparams;
 do
 {
 /* No need to check the return code. Since we specified WSA_INFINITE, 
 * the only time it will return is when event is signalled. */
 WSAWaitForMultipleEvents(1, &(buffer_pool[pool_ptr].overlapped.hEvent),
 FALSE, WSA_INFINITE, FALSE);
 /* WaitForMultipleEvents() blocks until the next send() buffer
 * is empty. We then send it back to the filesystem to be refilled.
 * If this is the last buffer, we just complete. */
 success = WSAGetOverlappedResult(SocketID,
 &(buffer_pool[pool_ptr].overlapped),
 &bytesread, FALSE, &flags);
 if(buffer_pool[pool_ptr].last_buffer == TRUE)
 {
 // we're done
 break;
 }
 buffer_pool[pool_ptr].overlapped.Offset = file_ptr;
 file_ptr += DATA_BUF_SIZE;
 if(ReadFile(FileHandle, buffer_pool[pool_ptr].data.buf,
 DATA_BUF_SIZE, &bytes_written,
 &(buffer_pool[pool_ptr].overlapped)) == TRUE)
 {
 // function succeeded immediately. Set the event object ourselves
 SetEvent(buffer_pool[pool_ptr].overlapped.hEvent);
 }
 if(++pool_ptr >= BUF_POOL_SIZE)
 pool_ptr = 0;
 }
 /* when bytesread is zero, we're done */
 while(bytesread > 0);
 ExitThread(0L);
}
/* Every time a read from the filesystem completes, take the buffer and call 
 * WSASend() so that the network subsystem can send it out. */

DWORD FileThread(LPVOID threadparams)
{
 int nRc;
 int pool_ptr = 0;
 BOOL firstpass = 1;
 DWORD bytesread, flags = 0;
 BOOL success;
 int file_ptr = 0;
 do
 {
 /* We don't need to check the return code. Since we specified INFINITE,
 * the only time it will return is when our event is signalled. */
 WaitForSingleObject(buffer_pool[pool_ptr].overlapped.hEvent,
 INFINITE);
 success = GetOverlappedResult(FileHandle,
 &(buffer_pool[pool_ptr].overlapped),
 &bytesread, FALSE);
 /* If we've received the event notification, but the number
 * of bytes is zero, then we've reached EOF. */
 if(bytesread == 0)
 {
 buffer_pool[pool_ptr].last_buffer = TRUE;
 // set the event
 SetEvent(buffer_pool[pool_ptr].overlapped.hEvent);
 // we're done reading
 break;
 }
 buffer_pool[pool_ptr].overlapped.Offset = file_ptr;
 file_ptr += bytesread;
 /* This buffer is ready to be returned to protocol stack for sending */
 if(WSASend(SocketID, &(buffer_pool[pool_ptr].data), 1, &bytesread,
 flags, &(buffer_pool[pool_ptr].overlapped), NULL) == 0)
 {
 // function succeeded immediately. Set the event object ourselves
 SetEvent(buffer_pool[pool_ptr].overlapped.hEvent);
 }
 if(firstpass)
 {
 // This is the first buffer to be sent. Create the
 // thread that handles completion of network I/O.
 network_thread = CreateThread(NULL, 0, NetworkThread, threadparams,
 0, // run immediately 
 &network_thread_id);
 firstpass = FALSE;
 }
 if(++pool_ptr >= BUF_POOL_SIZE)
 pool_ptr = 0; 
 }
 /* when bytesread is zero, we're done */
 while(bytesread > 0);
 /* Wait for the network thread to complete */
 ExitThread(0L);
}
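The double-buffered hand-off that Listings Three and Four implement with Win32 event objects can be sketched in a platform-neutral way. The miniature below is hypothetical illustration, not the article's code: a reader thread fills a fixed ring of buffers while a writer thread drains them, a condition variable stands in for the per-buffer event handles, and a zero-byte read signals EOF just as in the listings.

```cpp
// Hypothetical, platform-neutral miniature of the Listing Three/Four
// pattern: a fixed ring of buffers shuttles data between two threads,
// with a condition variable standing in for Win32 event objects.
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>

struct BufPool {
    static const std::size_t kBufs = 4;     // BUF_POOL_SIZE analog
    static const std::size_t kBufSize = 8;  // DATA_BUF_SIZE analog
    std::string bufs[4];
    bool full[4] = {};   // "event signalled" stand-in
    bool last[4] = {};   // mirrors last_buffer in the listings
    std::mutex m;
    std::condition_variable cv;
};

// Copies src into dst through the ring, like the file->network hand-off.
inline void transfer(const std::string& src, std::string& dst) {
    BufPool p;
    std::thread reader([&] {
        std::size_t off = 0, i = 0;
        for (;;) {
            std::unique_lock<std::mutex> lk(p.m);
            p.cv.wait(lk, [&] { return !p.full[i]; });  // wait for an empty buffer
            std::size_t n = src.size() - off;
            if (n > BufPool::kBufSize) n = BufPool::kBufSize;
            p.bufs[i] = src.substr(off, n);
            p.last[i] = (n == 0);                       // zero bytes read == EOF
            p.full[i] = true;
            p.cv.notify_all();
            if (n == 0) return;
            off += n;
            i = (i + 1) % BufPool::kBufs;               // circular, as in the listings
        }
    });
    std::thread writer([&] {
        std::size_t i = 0;
        for (;;) {
            std::unique_lock<std::mutex> lk(p.m);
            p.cv.wait(lk, [&] { return p.full[i]; });   // wait for a filled buffer
            if (p.last[i]) return;
            dst += p.bufs[i];
            p.full[i] = false;                          // hand the buffer back
            p.cv.notify_all();
            i = (i + 1) % BufPool::kBufs;
        }
    });
    reader.join();
    writer.join();
}
```

Because the ring has several buffers, the reader can run ahead of the writer, which is exactly the overlap the WinSock 2 version buys over the blocking loop.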


Examining RogueWave's Tools.h++


A class library for C++ programmers




P.W. Scherer


Perry is a senior systems analyst for Arco Alaska and can be contacted at
laspws@aai.arco.com.


A while back I wrote a fairly complex system for storing and retrieving
petroleum-reservoir simulation X/Y plot output. Using only a compiler, it took
me over a year to develop the application, which consisted of about 30,000
lines of C++ code.
When the time came to port the program to another platform, a fellow developer
introduced me to RogueWave's Tools.h++ foundation-class library. Surprisingly,
it only took me three months to write the new version, which was only 6000
lines long. Furthermore, the run-time speed of the app increased by more than
an order of magnitude. 
This project made me a believer in good foundation-class libraries. Luckily,
C++ programmers have a variety of options, including STL, MFC, Booch, and
RogueWave. In this article, I'll examine RogueWave's Tools.h++, the
cornerstone for all of my C++ work since that first successful project.


RogueWave Tools.h++ Overview


RogueWave's Tools.h++ is a C++ class library consisting of more than 100 C++
classes, including those for time, dates, strings, linked lists, and other
fundamental structures. Other classes support virtual streams, string and
character manipulation, file management, regular expressions, tokenizers,
virtual-page and buffered-page heaps, buffered-disk page heaps, timers,
benchmarks, templates, bit vectors, cache managers, error handling, iterators,
and more. The classes can be run as a DLL, allowing smaller executables and
code sharing. 
Tools.h++ also includes a complete set of Smalltalk-like collection classes.
Class Set, for instance, can be used to collect a group of screen windows that
will need to be refreshed, eliminating redundant refreshes. All classes
support persistence, allowing objects to be saved to disk and restored later.
Multiple pointers to the same object can also be restored. All classes are
extensible so that you can create your own custom classes. 
The class library is compatible with most compilers and platforms, including
Windows 3.x, Windows 95, Windows NT, MS-DOS, OS/2, Macintosh, Sun/SunOS,
Solaris, IBM RS/6000/AIX, Silicon Graphics Iris/IRIX, HP/HP-UX, DG/DG/UX, UNIX
System V, and even Linux. 


String Support


String manipulation is an area where class libraries can shine, and
RogueWave's RWCString class family is no exception. In addition to the
expected behaviors for strings (dynamic resizing, concatenation operators, and
so on), RWCString is multithread safe and supports multibyte character sets.
It uses "copy on write" methods during copy construction for higher
efficiency. "Copy on write" maintains reference counts to a single instance
until copying is forced by a change to one of the referenced instances.
RWCString is typically one of the first classes new Tools.h++ users become
familiar with. In Listing One, for example, operator() has been overloaded to
accept a regular-expression argument. The operator returns a reference to a
substring (RWCSubString, cousin of RWCString) whose extent is the segment
"34.5". The assignment operator "=" is used to replace segment "34.5" with
segment "184.9". This syntax is elegant and cleverly designed. 
Note that in Listing Two, RWCTokenizer does not alter the string being
tokenized. This behavior is entirely different from the standard strtok, which
deposits null characters between the tokens in the tokenized string. Note also that
RWCStrings are aware of streambuf and virtual stream I/O. RogueWave gives all
of its foundation classes this ability. RWWString, the multibyte analog of
RWCString, contains much of the functionality of RWCString, but understands
wchar_t (wide char*) character units.
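The copy-on-write idea is easy to see in a toy class. The sketch below is an invented illustration of the technique (CowString and its methods are made up for this example, not RogueWave code): copies share one representation until a write forces a private clone.

```cpp
// Toy copy-on-write string -- an invented illustration of the idea
// behind RWCString, not RogueWave's implementation. Copies share one
// buffer until a write forces the writer to clone it.
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

class CowString {
    std::shared_ptr<std::string> rep_;
public:
    explicit CowString(const char* s) : rep_(std::make_shared<std::string>(s)) {}
    // Copy construction (compiler-generated) just shares rep_ -- O(1).
    long shares() const { return rep_.use_count(); }
    char get(std::size_t i) const { return (*rep_)[i]; }   // reads never copy
    void set(std::size_t i, char c) {                      // writes copy first
        if (rep_.use_count() > 1)                          // others reference the buffer,
            rep_ = std::make_shared<std::string>(*rep_);   // so clone it ("copy on write")
        (*rep_)[i] = c;
    }
    const std::string& str() const { return *rep_; }
};
```

Copying is constant time no matter how long the string is; the price is one reference-count check on each mutating operation.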


Date/Time Handling 


If you have ever written (or maintained) time/date functions based upon
time.h, you will appreciate Tools.h++'s time and date functionality. Classes
such as RWDate and RWTime let you program at the level of a Visual Basic
programmer for such standard concepts as date and time. Listing Three
illustrates how you can use these two classes. As Listing Four shows,
RogueWave has also gone a long way toward supporting internationalization. 
The only thing I've struggled with concerning RWTime is "4294967295," the
largest unsigned long. One of RWTime's private members is the number of
seconds since 1/1/1901 UTC. Since "02/05/2037 21:18:15" is exactly
4,294,967,295 seconds after 1/1/1901, there is an upper time bound you should
be aware of if you need dates past 2037. There are many obvious workarounds,
but I nonetheless forgot about it--and users discovered it for me.
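The bound is easy to verify with a little arithmetic: dividing the largest unsigned 32-bit second count by the average length of a Gregorian year puts the ceiling about 136 years past 1901.

```cpp
// Back-of-the-envelope check of the RWTime ceiling: the largest
// unsigned 32-bit count of seconds, measured from 1/1/1901, runs out
// in the year 2037. (365.2425 days is the average Gregorian year.)
#include <cassert>
#include <cstdint>

inline int rwtime_ceiling_year() {
    const std::uint64_t max_secs = 4294967295ULL;          // largest 32-bit unsigned long
    const double secs_per_year = 365.2425 * 24 * 60 * 60;  // about 31,556,952
    return 1901 + static_cast<int>(max_secs / secs_per_year);  // about 136 years later
}
```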


Template Classes


Templates are parametrized collection classes, where the parameter is the type
of the object being collected. Tools.h++ templates come in three flavors:
intrusive lists, value-based collections, and pointer-based collections. Most
of my recent development work has been with the pointer and value template
classes. The stricter class typing inherent in template-based programming
offers significant advantages over the older polymorphic-inheritance
approaches. Templates greatly simplify complex data-structure manipulation. 
In general, value-based collections offer easier syntax (less pointer
manipulation and manual free-store destruction) at the expense of decreased
performance during insertions and deletions. The performance issue is related
to the fact that value-based collections make copies of inserted items rather
than simply referencing the pointer. 
I typically use value-based templates on small data structures of fairly
constant size, "template-ready" classes from RogueWave (RWCString or RWDate,
for instance), or built-in types like floats or ints. When the data-structure
complexity warrants, or when the collection is frequently updated through
resizing or insertion, I generally pick a pointer-based template.
Consider the data structure in Listing Five, which could represent a point on
an X/Y curve and a textual description of the point. In order to collect this
XYPoint structure in a Tools.h++ template, you must extend the data structure
so that the semantics of the structure can be supported by the template.
Clearly, these XYPoints should be stored in a sorted collection. The sort
order should be determined by the xDate member. We must also know when two
items are "equal." In addition, copy construction, default construction, and
assignment semantics should be defined; see Listing Six. The class XYPoint is
now ready to be used with nearly any of the Tools.h++ pointer or value
templates. In short, the templates features in Table 1 are automatically
available when you use Tools.h++ templates. Listing Seven illustrates both
pointer- and value-sorted vector template usage.
All expected collection behavior, with the exception of object persistence,
is well covered by Tools.h++ templates. Templates in both the pointer and
value variety include hash dictionaries, sorted vectors, ordered vectors,
singly and doubly linked lists, hash sets (uniqueness enforced), stacks, and
queues. For those few compilers that do not yet support templates or support
them incompletely, Tools.h++ provides "generic" collection classes, similar to
templates but less efficient. These use some of the macro-based parametrized
types described by Stroustrup. Few users will need these classes, but they may
come in handy for the occasional compiler incompatibility problems that may
come up during RogueWave installation. GNU's g++ has had some historical
compatibility problems with Tools.h++, so some releases of g++ may require
the generic pseudotemplates.


Adding Persistence 


Persistence is not built into the Tools.h++ templates, but RogueWave makes it
easy to add object persistence by providing the RWvistream and RWvostream
abstract base classes. These classes capitalize on the considerable strengths
of the iostream model, but extend the functionality for arbitrary binary data
and abstraction of byte sources and sinks. Tools.h++ provides its own
instantiable ASCII and binary veneer over the streambuf class (RWp[io]stream
and RWb[io]stream, respectively). 
The beauty of virtual streams is the power that the generalization of data
sources and sinks implies. When you send a class to a RWvostream, you are
effectively sending it without concern for its storage format within the
stream pipeline. This opens up all sorts of possibilities, such as using
object persistence over network protocols like Berkeley sockets. You could
write your object to a socket-based child of RWvostream, whereupon it could be
received and read across the network by another socket-based child of
RWvistream. The implementation of the streaming operators to accomplish this
behavior is invariant across the choice of storage media. In other words, the
same streaming functions can be used to store a class onto a network or an
ASCII file. I've implemented such a scheme using Tools.h++'s RWbistream and
RWbostream by adding virtual stream capabilities to a class of my own design.
Then, I constructed bistreams and bostreams on either end of an interprocess
pipeline. I could then simply send the binary data as a class through the
pipeline, whereupon it could be caught at the other end of the pipeline in
exactly the state we had sent it. The result is elegant: Instead of sending a
stream of bytes that must be reparsed on the other end, we send an entire
object capable of reconstructing itself on the other side. The reconstructed
object retains all of its original methods and properties.
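The design can be sketched with invented names (VSink, StringSink, and save() below are stand-ins for illustration, not the Tools.h++ API): persistence code is written once, against an abstract byte sink, and works unchanged whether the bytes land in a file, a pipe, or a socket.

```cpp
// Sketch of the virtual-stream idea with invented names (VSink,
// StringSink, save are illustration only, not the Tools.h++ API):
// one save routine serves every kind of byte sink.
#include <cassert>
#include <string>

struct VSink {                        // stands in for RWvostream
    virtual void put(const std::string& bytes) = 0;
    virtual ~VSink() {}
};
struct StringSink : VSink {           // could as easily wrap a file or socket
    std::string bytes;
    void put(const std::string& b) override { bytes += b; }
};

struct Point { int x, y; };

// Written once against the abstraction: swap in a socket-backed VSink
// and the object travels across the network with no code changes.
inline void save(VSink& s, const Point& p) {
    s.put(std::to_string(p.x) + " " + std::to_string(p.y) + "\n");
}
```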

In another recent engineering project, we discovered that an application of
ours had become I/O bound. This program performed a series of material balance
calculations on 600 multiwell patterns for the Kuparuk River Unit, the second
largest oil field in the United States. It was too time consuming for the
engineers to request the data from a relational database because the data
required for the material-balance calculations was extremely hierarchical. We
decided, therefore, to implement a persistent RogueWave class capable of being
written to and read from a RogueWave virtual stream. This class encapsulated
the entire time-valued behavior of a material-balance pattern, including the
cumulative fluid amounts, reservoir-volume conversion factors, and initial
properties of the pattern. The Oracle access speed per pattern was about 30
seconds. With Tools.h++ virtual streams, we were able to cut access time to a
fraction of a second. The Oracle data was processed at night in a huge batch
process into the RogueWave object database. In the morning, the engineers were
able to rapidly flip through the patterns and make decisions at a much faster
rate. 
Listing Eight extends the XYPoint class to be "virtual-stream aware." Since
RWDate and RWCString are already virtual-stream aware, this task becomes
trivial. 
These functions store the individual class you are collecting, but what about
the collection itself? How do you implement persistence for a Tools.h++
template collection? For simple vectors, you must store only the integer
number of elements followed by the individual members of the collection. This
is not a preferred method due to the loss of object morphology. However, if you
purchase the source code from RogueWave, you can build children of the
template classes which contain some of the persistence methods of the
Smalltalk collectable classes. I've implemented a child of RWTPtrSortedVector
with some basic persistence capabilities without much trouble. 
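The count-then-members scheme looks like this in miniature; std::vector<int> and plain iostreams stand in here for a Tools.h++ template and RW's virtual streams.

```cpp
// The count-then-members persistence scheme in miniature, with
// std::vector<int> and plain iostreams standing in for a Tools.h++
// template and RWvostream/RWvistream.
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

inline void save_vec(std::ostream& os, const std::vector<int>& v) {
    os << v.size();                          // element count goes first...
    for (std::size_t i = 0; i < v.size(); ++i)
        os << ' ' << v[i];                   // ...then each member in order
}

inline std::vector<int> restore_vec(std::istream& is) {
    std::size_t n = 0;
    is >> n;                                 // read the count back...
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i)
        is >> v[i];                          // ...then each member
    return v;
}
```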


Smalltalk-like Collection Classes


If you use polymorphic inheritance trees in your class design, you will not be
disappointed with RogueWave's implementation of its Smalltalk-80-like
Collection Classes. Any object of this variety must ultimately be derived from
the RWCollectable abstract base class, the root of the inheritance tree.
Although the trend seems to be moving away from this variety of class design
and toward templates, this approach has distinct advantages. The programming
interface is a bit cleaner. There is an aesthetic "goodness" in the fact that
inheritance trees reuse object code rather than source code (as templates do).
Tools.h++ also provides elegant, more-complete persistence methods for all
children of RWCollectable. The persistence methods alone make RWCollectable
worthwhile for projects that store and retrieve objects from files other than
traditional databases. 
The methods for making a class RWCollectable are similar to those for making
it compatible with Tools.h++ templates, but more extensive. The usual methods
for equality, comparison, construction, and destruction must be defined.
RWCollectable classes must additionally redefine virtual persistence functions
restoreGuts and saveGuts. These functions have been overloaded for both RWFile
(a file I/O veneer class) and RWv[io]stream classes. An isA() function must be
defined to return the type of an RWCollectable child. This allows you to
identify object types during ambiguous moments such as construction of
collectibles whose type is unknown until run time. The weak typing of such
class design can lead to the usual casting problems of all inheritance trees,
but being able to store several types of RWCollectable classes within common
containers such as RWSortedVector and RWBTreeDictionary is compensation
enough.


Object-Database Management 


The strong suit of the Smalltalk Tools.h++ classes is their advanced
persistence mechanisms. However, a database must know how to organize multiple
objects on a disk all at once and be able to retrieve them rapidly.
RWFileManager, child of RWFile, performs such a function by maintaining a
linked list of free-space blocks within a disk file. Requests can be made to
RWFileManager for just enough space to store a given class instance.
RWCollectable classes can return the space required to store themselves with
member function binaryStoreSize(). RWBTreeOnDisk maintains a B-tree on the
disk file. The disk-based B-tree is used in conjunction with RWFileManager to
associate a name (const char*) with each object stored with RWFileManager.
The name can be used later to retrieve the object from the disk file using the
high-speed disk B-Tree. I've used this technique when storing large amounts
(up to 50-MB files) of X/Y curve objects. These file-management classes are
robust and extremely fast and provide an alternative storage mechanism for
data that does not fit well within the relational paradigm. (For details on an
application that implements this feature, see my article "Simplifying C++ GUI
Development," DDJ, September 1995.) 
In Listing Nine, which illustrates how to use the Tools.h++ object-database
management facilities, my address is stored on the disk file with the key
"Perry." As Listing Ten shows, retrieving the address is straightforward.


Conclusion


The parts of the Tools.h++ library I've examined in this article barely
scratch the surface of what RogueWave offers. The library also provides
database wrappers, graphics libraries, linear-algebra support, a Motif
wrapper, and even its own version of the Booch Components. In short, RogueWave
has a tool for just about any standard programming task.


For More Information


Tools.h++
RogueWave Software
260 SW Madison Avenue
Corvallis, OR 97333
503-754-3010
http://www.roguewave.com
Table 1: Tools.h++ template features.

Feature           Benefit
Dynamic resizing  No extra coding or checks required.
Array indexing    arr[i] indexing for all vector templates.
Searching         find and index operators locate members.
Insertion         Collection resizes automatically on insert.
Removal           Any item in the collection can be removed.
Entry count       entries() method returns the number of items.

Listing One
RWCString wonderTool("Rogue");
wonderTool += "Wave tools.h++";
cout << wonderTool << endl; // Prints "RogueWave tools.h++" 
 // ( '+=' concatenates)
RWCString aString = "I am 34.5 years young";
RWCRegexp re1("[0-9]+[\\.]*[0-9]*"); // Construct regexp object.
 // Recognizes decimal #s.
aString(re1) = "184.9";
cout << aString << endl; // Prints "I am 184.9 years young"

Listing Two
 

size_t ind = aString.index ( "young" );
aString(ind,5) = "old"; // Replace the 5 chars of "young".
cout << aString << endl; // Prints "I am 184.9 years old"
RWCTokenizer tok ( aString ); // Class RWCTokenizer performs 
 // strtok-like functions.
cout << tok() << endl; // Prints "I"
cout << tok() << endl; // Prints "am"
cout << aString << endl; // Prints "I am 184.9 years old"

Listing Three
RWDate today;
cout << today << endl; // Prints the current date (default constructor)
RWDate weekAgo = today-7; // Subtracts 7 days from today.
RWDate myBirthDay ( 24, 2, 61); // d,m,y constructor.
cout << myBirthDay.weekDayName() << endl; // Prints "Friday"! 
cout << today - myBirthDay << endl; // Prints number of days I've been alive.

Listing Four
// This example alters the date string for French-speaking users.
RWLocale& french = *new RWLocaleSnapshot("fr");
cout << myBirthDay.asString ( 'x', french ) << endl;

Listing Five
struct XYPoint {
 RWDate xDate; // Independent (X) value.
 float yValue; // Dependent (Y) value.
 RWCString ptDescription; // Text associated w/point.
};

Listing Six
class XYPoint 
{
private:
 
 RWDate xDate; // Independent (X) value.
 float yValue; // Dependent (Y) value.
 RWCString ptDescription;// Text associated w/point
public:
 // Main constructor.
 XYPoint ( unsigned mo, unsigned day, 
 unsigned year, float val, const char* descrip ) :
 xDate ( day, mo, year ), yValue(val),
 ptDescription ( descrip ) { }
 // Key-only constructor.
 XYPoint ( unsigned mo, unsigned day, unsigned year ) :
 xDate ( day, mo, year), yValue(0.0), ptDescription()
 { }
 // Default constructor.
 XYPoint() : 
 xDate(), yValue(-9999.99), ptDescription() { }
 // Copy constructor.
 XYPoint ( const XYPoint& xyp ) :
 xDate(xyp.xDate), 
 yValue(xyp.yValue),
 ptDescription(xyp.ptDescription ) { }
 // Assignment operator = 
 XYPoint& operator=(const XYPoint& xyp )
 {
 xDate = xyp.xDate;
 yValue = xyp.yValue;
 ptDescription = xyp.ptDescription;
 return *this;
 }
 // Equality == operator
 RWBoolean operator==( const XYPoint& xyp ) const
 {
 if ( xDate == xyp.xDate )
 return TRUE;
 else
 return FALSE;
 }
 // Less than < operator.
 RWBoolean operator<( const XYPoint& xyp ) const
 {
 if ( xDate < xyp.xDate )
 return TRUE;
 else
 return FALSE;
 }
};

Listing Seven
RWTValSortedVector<XYPoint> myValCurve; // Create a new curve of XYPoints.
// Insert 1/1/95, 2/1/95, 3/1/95, 4/1/95 data points into curve.
myValCurve.insert ( XYPoint ( 1, 1, 95, 1000.0, "Point 1" ) ); 
myValCurve.insert ( XYPoint ( 1, 2, 95, 1050.0, "Point 2" ) );
myValCurve.insert ( XYPoint ( 1, 4, 95, 2000.0, "Point 4" ) );
myValCurve.insert ( XYPoint ( 1, 3, 95, 1500.0, "Point 3" ) );
// Value-based collection now has 4 points in correct order.
XYPoint locator ( 1, 1, 95 ); // Create a temporary point to be 
 // used for searching.
myValCurve.remove ( locator ); // Remove the first point only.
cout << myValCurve.entries() << endl; // Prints '3', since 3 pts are left.
myValCurve.clear(); // Removes all elements.
RWTPtrSortedVector<XYPoint> myPtrCurve; // Pointer-based analog of above.
myPtrCurve.insert ( new XYPoint ( 1, 1, 95, 1000.0, "Point 1" ) ); 
myPtrCurve.insert ( new XYPoint ( 1, 2, 95, 1050.0, "Point 2" ) );
 ...
 ...
// Pointer-based collection must explicitly free the dynamically
// allocated points! This can be the source of huge memory leak! BEWARE! ;-)
myPtrCurve.clearAndDestroy(); // Deletes each XYPoint and 
 // removes from collection.

Listing Eight
// These are free functions (declared friends of XYPoint, since its
// data members are private).
RWvostream& operator<< ( RWvostream& s, const XYPoint& p )
{
 s << p.xDate << p.yValue << p.ptDescription;
 return s;
}
RWvistream& operator>> ( RWvistream& s, XYPoint& p )
{
 s >> p.xDate >> p.yValue >> p.ptDescription;
 return s;
}

Listing Nine
RWFileManager fm("mydb.dat"); // Create file manager class on a disk file.
RWBTreeOnDisk bt(fm); // Construct B-Tree for the file manager.

RWCString myAddress ( "11915 Merry Lane" );
RWOffset loc = fm.allocate ( myAddress.binaryStoreSize() );
fm.SeekTo ( loc );
fm << myAddress;
bt.insertKeyAndValue ( "Perry", loc );

Listing Ten
RWOffset foundLoc = bt.findValue ( "Perry" );
fm.SeekTo ( foundLoc );
RWCString foundAddress;
fm >> foundAddress;
cout << "My address is: " << foundAddress << endl;



















































Lex and Yacc 


Compiler-construction techniques for the everyday programmer




Ian E. Gorman


Ian is a software developer and systems analyst. He can be reached at
ActiveSystems Inc., 11 Holland Ave., Suite 700, Ottawa, ON, Canada K1Y 4S1.


Forty years ago, it took a group of experts 17 staff years to write the first
Fortran compiler. Today, a single programmer with less knowledge can do
essentially the same thing in a matter of weeks. What makes this possible are
advances in both computer science and software-development tools. In
particular, tools such as lex and yacc become powerful allies in the hands of
programmers who know how to use them. For instance, I've used lex and yacc for
everything from converting queries to SQL syntax, to verifying that examples
of server command files were consistent with command syntax. 
In this article, I'll first examine lex and yacc, focusing on the MKS Lex &
Yacc Toolkit for DOS, OS/2, and Windows NT. MKS Lex builds a C/C++ or Turbo
Pascal lexical analyzer that takes a stream of input and breaks it into tokens
according to specific rules. MKS Yacc, on the other hand, builds a C/C++ or
Turbo Pascal parser that takes a stream of tokens, matching them against a
specified grammar. (If you have a UNIX system, lex and yacc may already be
installed. Free versions, such as flex and bison, will compile on almost any
system.) To illustrate the power of these tools, I'll then describe how I used
them to build a keyword-query compiler for a CD-ROM database.


Lex and Yacc Backgrounder


Yacc generates a parser, yyparse(), that checks whether or not different parts
of the input (like numbers, variable names, and operators) are in the correct
order. It then processes those parts. Usually the order of processing is
different from the order of input. 
Lex generates a lexer, yylex(), that splits the input into substrings, which
it then classifies for the parser. The lexer can also do some preliminary
processing. yylex() always returns a type code (called a "token"). For some
substrings, yylex() may also put a value (called an "attribute") in the global
variable yylval.
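This division of labor is easy to sketch by hand. The following is an illustrative, hand-rolled stand-in for a generated yylex(), recognizing the same single-letter variables and integers as the calculator discussed below; the token codes and the yyinput pointer are assumptions made for this sketch, not MKS Lex output.

```cpp
#include <cctype>
#include <cstdlib>

// Hypothetical token codes; a real yacc run generates these in ytab.h.
enum { END = 0, INTEGER = 257, VARIABLE = 258 };

int yylval;                 // attribute of the most recent token
const char *yyinput;        // input to scan (set this before calling yylex)

// Hand-rolled stand-in for a generated yylex(): classify the next
// substring of the input and return its token code.
int yylex()
{
    while (*yyinput == ' ' || *yyinput == '\t')
        ++yyinput;                               // skip whitespace
    if (*yyinput == '\0')
        return END;                              // no more tokens
    if (isdigit((unsigned char)*yyinput)) {      // unbroken run of digits
        char *end;
        yylval = (int)strtol(yyinput, &end, 10); // attribute: the value
        yyinput = end;
        return INTEGER;
    }
    if (isalpha((unsigned char)*yyinput)) {      // single letter A-Z or a-z
        yylval = tolower((unsigned char)*yyinput) - 'a'; // attribute: 0 to 25
        ++yyinput;
        return VARIABLE;
    }
    return (unsigned char)*yyinput++;  // operators: the character is the token
}
```

A parser calls yylex() repeatedly; for the input a = 42 it would see VARIABLE (with yylval 0), '=', and INTEGER (with yylval 42).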
Listings One and Two, extracted from the MKS Lex and Yacc documentation, build
an interpreter with lex and yacc. The interpreter is a simple, four-function
calculator, with up to 26 variables (A to Z) for temporary storage. Writing
such a calculator in C is a fairly big job; with lex and yacc, you need only
the code in the listings.
The lex code (Listing One) is just a list of regular expressions, with their
associated code segments. Each code segment executes when a substring of the
input matches the corresponding regular expression. Lex puts the matching
substring in the global variable yytext. Thus lines 8 to 16 classify the
single letters A to Z (and a to z) as VARIABLE, and produce a value from 0 to
25, which the parser will use to identify the particular variable. Similarly,
lines 18 to 21 classify an unbroken string of digits as INTEGER, and convert
the string to an integer for use by the parser. In this way, input strings are
classified and given some initial processing before any of the input goes to
the parser.
The yacc code (Listing Two) is a list of constructions (or recipes) for
building new objects out of old ones. For example, lines 26 to 34 show how the
calculator program handles arithmetic expressions (like sum and difference).
Any INTEGER coming in from the lexer is an expression (line 27). A VARIABLE
identifies an array element in which to store the value of an expression (line
28). Expressions can be constructed from previously constructed expressions
(lines 29 to 32).
Objects from the lexer are normally identified in yacc code by character
constants or uppercase names; objects constructed by the parser are usually
identified by lowercase names. This makes it easier to read and understand the
code. You attach code to each construction by placing the code in braces after
the objects that you use to build new objects. The value of the new object is
represented by $$ signs. The values of the old objects are represented by $1,
$2, and so on, from left to right. If the parser cannot immediately use the
result, it goes on a stack (managed by code that yacc produces) until it is
needed. The equation a=3+4*c, for example, must be processed in reverse order
of input, but you only have to write code for simple constructions: c (line
28), 4*c (line 31), 3+expression (line 29), and a=expression (line 23). In the
equation a=(3+4)*c, addition must be done first. Line 33 handles this case.
After an opening parenthesis, the parser can accept only a complete expression
followed by a closing parenthesis. The parser stores five objects (leading
parenthesis, first expression, plus, second expression, and trailing
parenthesis) on a stack, then recognizes them as an expression (line 33),
pulls them off the stack, and replaces them on the stack by one new
expression.
The two equations illustrate the labor-saving power of yacc. You do not have
to figure out when and how to save intermediate results for an infinite
variety of possible calculations. You do not even need to know how such things
are done. You only need to find a simple way to build complex structures out
of simple structures, and write your own code to process the simple
structures.


A Keyword-Query Compiler for a CD-ROM Database


The CD-ROM databases I help build allow users to retrieve data by keyword
searches on one or more fields. To search for a record with either the word
"harry" or the two words "tom" and "dick" in a single field, the user would
enter a keyword list tom dick, harry. Words in a list separated by spaces must
all appear in the field. Lists separated by commas are alternatives; a record
will be selected if a field satisfies any one of the lists. Thus spaces are
equivalent to a logical AND, and commas are equivalent to a logical OR.
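Those two rules are simple enough to state directly in code. The sketch below is my own illustration of the expansion, not the article's yacc-based parser; the field name and the quoting convention are made up for the example.

```cpp
#include <sstream>
#include <string>

// Expand a keyword list ("tom dick, harry") into a boolean expression,
// treating spaces as AND and commas as OR. Illustrative only: the field
// name and quoting convention are assumptions, not the engine's syntax.
std::string ExpandKeywords(const std::string &input, const std::string &field)
{
    std::string result;
    std::istringstream lists(input);
    std::string alt;
    bool firstAlt = true;
    while (std::getline(lists, alt, ',')) {   // comma-separated alternatives
        std::istringstream words(alt);
        std::string word, clause;
        bool firstWord = true;
        while (words >> word) {               // space-separated words: AND
            if (!firstWord) clause += " AND ";
            clause += field + " = '" + word + "'";
            firstWord = false;
        }
        if (clause.empty()) continue;
        if (!firstAlt) result += " OR ";      // alternatives: OR
        result += "(" + clause + ")";
        firstAlt = false;
    }
    return result;
}
```

ExpandKeywords("tom dick, harry", "FIELD1") produces the same shape of query as Example 1, with explicit parentheses added.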
I used an off-the-shelf search engine that requires the query to be like
Example 1. While this looks like SQL, the operators <, <=, =, >=, and > do not
mean what they would in a relational database. They are containment operators
instead of comparison operators. For example, FIELD1 = 'joe' indicates that
FIELD1 contains 'joe' (possibly with other words), not that FIELD1 has the
value 'joe'. Similarly, FIELD1 < 'joe' indicates that FIELD1 contains some
word less than 'joe'. A field with the value 'xerxes jack' would satisfy this
condition.
I didn't want to expose nontechnical users to the engine query language,
particularly when it looked like a language (SQL for relational databases)
with different operators. Consequently, I wrote a parser to convert user
expressions to expressions that the query engine would accept; see Example 2.


Other Design Issues


The retrieval engine was designed to work with one database, and it used
global variables. I moved the engine C code into a C++ database class,
changing the global variables to private variables in that class. This allowed
me to have several open databases instead of one. I could use much of our
existing C interface code by changing the function declarations, making the
functions public members in the database class.
The compiler allocates temporary storage on the heap, and any yacc-based
compiler will discard some of the corresponding pointers when it detects an
error. I therefore entered each of these pointers in a list-manager class when
I allocated data in the compiler. Deleting the list manager after each
compiler run guarantees that all memory is freed.
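The idea can be sketched as a small tracking class. The names here are illustrative; the article's actual list manager differs.

```cpp
#include <vector>
#include <cstddef>

// Illustrative sketch of the cleanup scheme described above: every
// pointer the parser allocates is registered with a tracker, and
// destroying the tracker after a compiler run frees whatever an
// aborted parse left behind.
template <class T>
class AllocTracker {
    std::vector<T*> items;
public:
    T* Track(T* p) { items.push_back(p); return p; }   // register allocation
    void Discard(T* p) {                               // consumed normally
        for (size_t i = 0; i < items.size(); ++i)
            if (items[i] == p) { items.erase(items.begin() + i); break; }
        delete p;
    }
    size_t Size() const { return items.size(); }       // still outstanding
    ~AllocTracker() {                                  // end of a run:
        for (size_t i = 0; i < items.size(); ++i)      // free any leftovers
            delete items[i];
    }
};
```

Parser actions would call Track(new Item) on allocation and Discard() when an item is merged into a larger one; if yyparse() bails out on a syntax error, the destructor still frees every orphaned item.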
Since the rest of the package was written in C++, I preferred to write the
parser in C++. This is easy to do with MKS Yacc.
When searching on multiple fields, the field expressions are connected by AND
or OR, but in left-to-right precedence (or from top of screen to bottom)
instead of the usual precedence of AND over OR. Consequently, I used yacc only
for the individual field expressions, and used a simple C function to put the
expressions together.
I did not use lex because some of our applications defined keywords
differently in each database. For example, one database might allow hyphenated
words, but another in the same application might not. Lex requires you to
specify the characters that can make up a keyword when you generate the
compiler, not when you run it. This project had only a few different tokens
(see Listing Three, lines 17 to 22), so it was not hard to write the lexer in
C++ instead of lex.
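A hand-written lexer can take its keyword-character set as a run-time parameter, which a generated scanner cannot. Here is a minimal sketch; the token names and class shape are my own, not the project's code.

```cpp
#include <string>

// Token codes for the sketch (made up, not the project's token set).
enum Token { TOK_END, TOK_WORD, TOK_COMMA, TOK_OTHER };

// A lexer whose definition of a "keyword character" is supplied when the
// lexer is constructed, so each database can define keywords differently.
class RuntimeLexer {
    std::string wordChars;   // per-database keyword-character set
    const char *p;           // current scan position
public:
    std::string text;        // last matched substring (like lex's yytext)
    RuntimeLexer(const std::string &chars, const char *input)
        : wordChars(chars), p(input) {}
    Token Next() {
        while (*p == ' ') ++p;                       // skip spaces
        if (*p == '\0') return TOK_END;
        if (*p == ',') { ++p; return TOK_COMMA; }
        if (wordChars.find(*p) != std::string::npos) {
            text.clear();                            // collect a keyword
            while (*p && wordChars.find(*p) != std::string::npos)
                text += *p++;
            return TOK_WORD;
        }
        ++p;                                         // anything else
        return TOK_OTHER;
    }
};
```

One database can pass a character set that includes the hyphen, another can omit it, and nothing has to be regenerated.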


Writing the Compiler


I wrote the compiler in stages. I simultaneously built a lexer and yacc
grammar with no C++ code. The grammar and the lexer are interdependent. The
lexer will not compile without a definition file that yacc produces from the
grammar. The parser will not run without input from the lexer. However, you
can do much of the testing before putting C++ code in the parser, because the
C++ code is only used to produce the parser output.
Listing Three is the code for a working parser that reads and parses input,
and produces error messages for invalid input. In short, it does everything
but produce the output. If you have designed a new language, like my keyword
lists, you can change the grammar until you are satisfied with the language.
One of the great advantages of yacc is that you can do this kind of testing up
front, before putting a lot of work into C/C++ code.
The parser gets input (lines 17 to 22 of Listing Three) from the lexer.
Starting from the bottom of Listing Three, the parser makes ever larger units
out of input tokens, until all input has been combined into one unit, which is
the complete search expression for one field. For example, you can follow a
character string (CHARSTRING) up through lines 98-99, 66-67, 57-59, and 42-43
until it becomes part of the complete field expression in lines 31-32.
After you have a working parser, you can add code. Listing Four is some of the
code that illustrates how the parser works. The variable scan is a pointer to
an object of type ParseItem_t, created by the lexer. The object holds a string
scan->Value(), and you can append another string to that string with the
member function scan->ConCat().
When a CHARSTRING comes in (lines 55-70), and the field is a character field,
the corresponding string value is quoted; if the field is numeric, it is not.
Then the string is saved (by saving the pointer $$) as the value for the new
token value. This particular section illustrates how the yacc grammar is
influenced by the code you intend to write--if I had not wanted to process
strings upon arrival from the lexer, I could have left this section out and
replaced all occurrences of value with CHARSTRING.
If the CHARSTRING lacks a preceding operator code, it will be concatenated
with the fieldname and the '=' operator (lines 29-37). If the CHARSTRING
follows an operator (<, <=, >=, >), then it will be concatenated with a
fieldname and that operator (lines 29 and 40-50). Both cases produce a
simplecondition.

A compoundcondition (line 4) can be a simplecondition (lines 7-9). A
compoundcondition followed directly by a simplecondition will be ANDed with
that simplecondition (lines 11-17). A compoundcondition followed by a comma
and then a simplecondition will be ORed with that simplecondition (lines
19-26).
After the user keyword lists are converted to search expressions for each
field, they are put together in an SQL command for the search engine. Since
the expressions are concatenated very simply, this was done with a C++ program
(Listing Seven) that called a setup program (Listing Five) for each field.


Conclusion


The secret to using lex and yacc in a production environment is to pick
problems commensurate with your ability to use the tools, and to increase your
ability with practice. While the CD-ROM project I've described here may be
trivial to a computer-science student, yacc saved us at least a week in
development time.
I found yacc handy when we changed the structure of our keyword language.
Testing a change was a matter of a few minutes when we could change our yacc
source, instead of several days if we had been writing in C or C++. 
Lex and yacc are generally useful even when you are not in the compiler
business. I have used lex and yacc to verify the syntax of languages that I
was describing in developer's guides for a UNIX text-management server and a
Windows text-search client. This alerted me immediately to changes that
occurred in the language definitions as the design of the server and client
evolved. I've also used lex and yacc to produce SGML output from proprietary
text markup. We eventually switched to other tools, but we used a modified
version of the lex program as a preprocessor. In another project, I used lex
and awk to extract English literals from Visual Basic programs and replace
them with French literals. Finally, I've used lex and MKS Toolkit (UNIX
utilities for DOS) to automate corrections to 18,000 pages of scanned
documents, preventing a delay of several weeks in delivery. 


For More Information


MKS Lex & Yacc
Mortice Kern Systems Inc.
185 Columbia Street West
Waterloo, ON
Canada N2L 5Z5
http://www.mks.com
Example 1: Typical query by off-the-shelf search engines.
select * from table where
 FIELD1 = 'tom' and FIELD1 = 'dick' or FIELD1 = 'harry'
Example 2: Parser conversion of user expressions to expressions that the query
engine accepts.
User expression     Engine query expression
joe                 AC1 = 'joe'
<2                  AC1 < 2
<10 >3, 20          AC1 < 10 AND AC1 > 3 OR AC1 = 20
10..20              AC1 BETWEEN 10 AND 20

Listing One
 1 %{
 2 #include "ytab.h"
 3 extern int yylval;
 4 %}
 5 
 6 %%
 7 
 8 [A-Z] {
 9 yylval = *yytext - 'A';
10 return VARIABLE;
11 }
12 
13 [a-z] {
14 yylval = *yytext - 'a';
15 return VARIABLE;
16 }
17 
18 [0-9]+ {
19 yylval = strtol(yytext, (char **)NULL, 0);
20 return INTEGER;
21 }
22 
23 0x[0-9a-fA-F]+ {
24 yylval = strtol(yytext, (char **)NULL, 16);
25 return INTEGER;
26 }
27 

28 [-()=+/*\n] return *yytext;
29 
30 [ \t]+ ;
31 
32 . yyerror("Unknown character");

Listing Two
 1 %{
 2 #include <stdio.h>
 3 %}
 4 
 5 %token INTEGER VARIABLE
 6 %left '+' '-'
 7 %left '*' '/'
 8 
 9 %{
10 static int variables[26];
11 %}
12 
13 %%
14 
15 program:
16 program statement '\n'
17 program error '\n' = { yyerrok; }
18 /* NULL */
19 ;
20 
21 statement:
22 expression = { printf("%d\n", $1); }
23 VARIABLE '=' expression = { variables[$1] = $3; }
24 ;
25 
26 expression:
27 INTEGER
28 VARIABLE = { $$ = variables[$1]; }
29 expression '+' expression = { $$ = $1 + $3; }
30 expression '-' expression = { $$ = $1 - $3; }
31 expression '*' expression = { $$ = $1 * $3; }
32 expression '/' expression = { $$ = $1 / $3; }
33 '(' expression ')' = { $$ = $2; }
34 ;

Listing Three
 1 /* $Header$ */
 2 /* Field parser for use with CD-ROM and Visual Basic DLL */
 3 /* Ian E. Gorman, ActiveSystems Inc., Ottawa, Ontario, Canada */
 4 
 5 /* Code removed to show only the yacc grammar */
 6 
 7 /* DEFINITIONS -------------------------------------------------------- */
 8 %{
 9 
10 #include <ctype.h>
11 #include <stdarg.h>
12 #include "dbinfo.hpp" /* includes "parsetab.hpp", "token.hpp" */
13 #include "logfile.hpp"
14 
15 %}
16 

17 %token CHARSTRING GLOBSTRING
18 %token ','
19 %token AND
20 %token OPCODE RANGEOP
21 %token ALL BLANKS
22 %token '"'
23 
24 /* OPCODE and CHARSTRING have strings associated with them.
25 The other terminals are not associated with strings. */
26 
27 %%
28 
29 /* RULES -------------------------------------------------------------- */
30 
31 program:
32 condition
33 error
34 program error
35 ;
36 
37 /* This production just collects various productions into one production,
38 to reduce the number of different actions that must be coded in the
39 next higher production.
40 */
41 
42 condition:
43 compoundcondition
44 specialcondition
45 quotecondition
46 ;
47 
48 /* Special conditions that exclude any other condition */
49 
50 specialcondition:
51 BLANKS /* fieldname = NULL symbol */
52 ALL /* fieldname >= minimum field value */
53 ;
54 
55 /* One or more ordinary conditions (simple comparisons) */
56 
57 compoundcondition:
58 
59 simplecondition /* Begin the parse */
60 compoundcondition simplecondition %prec AND /* AND simplecondition */
61 compoundcondition comma simplecondition /* OR simplecondition */
62 ;
63 
64 /* ordinary conditions -- single comparisons and ranges */
65 
66 simplecondition:
67 value /* fieldname = value */
68 globvalue /* fieldname like value */
69 opcode value /* fieldname >=,>,<,<= value */
70 value rangeop value /* fieldname >= value1 AND fieldname <= value2 */
71 ;
72 
73 /* Quoted string comparison -- convert a quoted string to a sequence of
74 ANDed equalities. This is how we search for a string with embedded
75 spaces when Bookware will only search for individual words.

76 */
77 
78 quotecondition:
79 quote stringlist quote /* completed a quoted string */
80 ;
81 
82 quote:
83 '"' /* tell lex to discard nonchar symbols */
84 ;
85 
86 stringlist:
87 value /* fieldname = value */
88 stringlist value /* AND fieldname = value */
89 ;
90 /* '(' ')', ',', OPCODE are ignored inside a stringlist */
91 /* BLANKS and ALL are treated as ordinary strings inside a string list */
92 
93 
94 /* Values for alpha fields must be quoted in Bookware searches, values for
95 numeric fields must not be quoted. This is where we decide what to do.
96 */
97 
98 value:
99 CHARSTRING
100 ;
101 
102 /* Wild cards -- xxxx% will become -- fieldname like 'xxxx%'
 103 Values for alpha fields must be quoted in Bookware searches, values for
 104 numeric fields must not be quoted. This is where we decide what to do.
105 */
106 
107 globvalue:
108 GLOBSTRING
109 ;
110 
111 rangeop:
112 RANGEOP
113 ;
114 
115 opcode:
116 OPCODE
117 ;
118 
119 comma:
120 ','
121 ;

Listing Four
 1 /* Compound logical condition */
 2 /* Ian E. Gorman, ActiveSystems Inc., Ottawa, Ontario, Canada */
 3 
 4 compoundcondition:
 5 
 6 simplecondition /* Begin the parse */
 7 {
 8 $$ = $1;
 9 }
10 
11 compoundcondition simplecondition %prec AND /* AND simplecondition */

12 {
13 $$ = $1;
14 $$->ConCat(" AND ");
15 $$->ConCat($2->Value());
16 scan->ParseList->Discard($2);
17 }
18 
19 compoundcondition comma simplecondition /* OR simplecondition */
20 {
21 $$ = $1;
22 $$->ConCat(" OR ");
23 $$->ConCat($3->Value());
24 scan->ParseList->Discard($2);
25 scan->ParseList->Discard($3);
26 }
27 ;
28 
29 simplecondition:
30 value /* fieldname = value */
31 {
32 $$ = new(scan->FieldName) ParseItem_t;
33 scan->ParseList->AddAtTail($$);
34 $$->ConCat(" = ");
35 $$->ConCat($1->Value());
36 scan->ParseList->Discard($1);
37 }
38 globvalue /* fieldname like value */
39 { /* more code here */ }
40 opcode value /* fieldname >=,>,<,<= value */
41 {
42 $$ = new(scan->FieldName) ParseItem_t;
43 scan->ParseList->AddAtTail($$);
44 $$->ConCat(" ");
45 $$->ConCat($1->Value()); /* opcode */
46 $$->ConCat(" ");
47 $$->ConCat($2->Value()); /* value */
48 scan->ParseList->Discard($1);
49 scan->ParseList->Discard($2);
50 }
51 value rangeop value /* fieldname >= value1 AND fieldname <= value2 */
52 { /* more code here */ }
53 ;
54 
55 value:
56 CHARSTRING
57 { /* Encloses a string value in single quotes, when the field is not
58 numeric. */
59 if ( scan->FieldNumeric ) {
60 /* Can put a check here for non-numeric value in numeric field.
61 If not numeric, YYERROR */
62 $$ = $1;
63 } else {
64 $$ = new("'") ParseItem_t;
65 $$->ConCat($1->Value());
66 $$->ConCat("'");
67 scan->ParseList->Discard($1);
68 }
69 }
70 ;


Listing Five
 1 /* Produce SQL query segment from data for one field */
 2 /* Ian E. Gorman, ActiveSystems Inc., Ottawa, Ontario, Canada */
 3 
 4 /* This function is a wrapper for the yacc parser, yyparse() */
 5 
 6 char * cParse::FieldParse(
 7 FldItem_t * Field /* list of field data: name, opcode, expression */
 8 )
 9 {
10 char * Result = NULL;
11 int i;
12 yy_parse yaccparse = yy_parse(YSTACKSIZE, ystates, yvals);
13 yy_scan * yscan;
14 
15 if ( ! Field )
16 return NULL;
17 
18 yscan = new( /* set up the token separator */
19 Field->Expr(),
20 Field->Name(), 
21 Field->IsNumber(), /* 1 for numeric fields, 0 otherwise */
22 Field->NullValue(),
23 Field->FirstValue()
24 ) yy_scan;
25 
26 if ( ! yscan )
27 return Result;
28 
29 if ( ! yaccparse.yyparse(yscan) ) {
30 /* parse the expression into SQL query */
31 Result = new char[strlen(yscan->output->Value())+1]; /* success */
32 strcpy(Result, yscan->output->Value());
33 }
34 
35 delete yscan;
36 
37 return Result;
38 }

Listing Six
/* Abbreviated version of class yy_parse from MKS lex and yacc */
class yy_parse {
public:
 yy_scan* scan; // pointer to scanner
 int yydebug; // if set, tracing if compiled with YYDEBUG=1
 yy_parse(int = 150); // constructor for this grammar
 yy_parse(int, short *, YYSTYPE *); // another constructor
 ~yy_parse(); // destructor
 int yyparse(yy_scan * ps); // parse with given scanner
 void yyreset() { reset = 1; } // restore state for next yyparse()
 void setdebug(int y) { yydebug = y; }
// The following are useful in user actions:
 void yyerrok() { yyerrflag = 0; } // clear error
 void yyclearin() { yychar = -1; } // clear input
 int YYRECOVERING() { return yyerrflag != 0; }
};


Listing Seven
char * cParse::MakeSQL(
 cSearchList * ListManager /* Field list manager for current database */
 )
{
 char * StrSQL; /* new string -- SQL query */
 char * Temp;
 FldItem_t * Element;
 enum _FieldOP LeadingOpCode, FollowingOpCode;
 ParseItem_t * QueryString;
 if ( ! ListManager )
 return NULL;
 /* Start with SQL command verb and data base name */
 QueryString = new("SELECT * FROM dbn WHERE (") ParseItem_t;
 /* Assume no opcode before first expression */
 LeadingOpCode = eNOP;
 for ( Element = ListManager->Head()
 ; Element != NULL
 ; Element = ListManager->Next(Element)
 ) {
 FollowingOpCode = ( ListManager->Next(Element)
 ? ListManager->Next(Element)->OpCode()
 : eNOP );
 /* open parenthesis at beginning of several ORed field conditions */
 // if ( FollowingOpCode == eOR && LeadingOpCode != eOR )
 // QueryString->ConCat("(");
 /* parse each field condition, enclose result in parentheses */
 QueryString->ConCat("(");
 if ( ! (Temp = FieldParse(Element)) ) {
 LogErrorDLL(__FILE__, __LINE__,
 "NULL expression while parsing query, Parsed OK to: %s\n",
 QueryString->Value()
 );
 delete QueryString;
 return NULL; /* incomplete string would be no good */
 }
 QueryString->ConCat(Temp);
 delete Temp;
 QueryString->ConCat(")");
 /* close parenthesis at end of several ORed field conditions */
 // if ( LeadingOpCode == eOR && FollowingOpCode != eOR )
 // QueryString->ConCat(")");
 /* if not the last field condition, append the connector to next */
 if ( ListManager->Next(Element) != NULL )
 QueryString->ConCat(SQLtoken(FollowingOpCode));
 LeadingOpCode = FollowingOpCode;
 }
 QueryString->ConCat(");");
 StrSQL = new char[strlen(QueryString->Value())+1];
 if ( StrSQL ) {
 strcpy(StrSQL, QueryString->Value());
 }
 delete QueryString;
 return StrSQL;
}
DDJ



































































PROGRAMMING PARADIGMS


Looking for an HTML Book




Michael Swaine


This month I thought I'd share my experience in looking for a good HTML book.
In addition, there's the third installment in my survey of alternative
programming paradigms, or "Languages That Are Not C."
I got roped into a project to develop a one-day, intensive course in HTML for
raw beginners. My task was to find a text for the course, a book on which we
could base the lectures. I suspected right from the start that this was either
impossible or unwise. Any book that could be covered in a day would have to be
pretty skimpy. A book that we could give the students for further study, sure,
that might make sense, but a text for the course? I was skeptical.
Over a dozen books later, my suspicion had turned into a conviction. No book
was going to work. I made the obvious proposal: that we write the class
materials from scratch and not try to cut corners. Having done the research,
though, I found that I was in a good position to recommend HTML books to the
students. Or maybe to others. Because it occurred to me that, while some of
the books I read were written for the raw beginner, some assume a level of
sophistication that you would only find--well, among readers of this magazine.
So. If you sometimes get questions from people hoping to write HTML, if you
occasionally write or expect to write HTML, if you have responsibility for
maintaining a Web site or for maintaining in-over-their-heads Web site
maintainers, or if you have noticed that everybody in the world seems to want
to put up a Web page and you figure you might as well put together a course or
seminar and make a buck off this madness, here's my take on the HTML books I
examined.
Although I read more, I've whittled the list down to five books. There are
probably good books I've left off the list, but every book on the list is
worth buying. Some of these books were first published in 1994, which makes
them unusably ancient. They're all good enough that their publishers should be
updating them regularly, but I haven't tried to project which ones will have
new 1996 editions out by the time you read this.


Quick Study


HTML Manual of Style (Ziff-Davis Press, 1994, ISBN 1-56276-300-8) is Larry
Aronson's first book, and he's to be commended. The book is short (132 pages),
simple, and clearly organized. By the end of the first 30 pages, he's
introduced and exemplified most of the vocabulary of HTML 2.0.
Aronson gets purity points for recommending that links be incorporated into
the flow of a paragraph rather than laid out in lists or detached from context
("Click HERE for my resume."). In this way he honors the hypertext intent of
the Web. Incorporating links into the flow of the text works great for budding
hyperfiction writers and researchers trying to smooth the footnote bumps out
of their reports. I imagine, though, that we're going to see more and more
violations of this design ideal as more and more people bend the Web to uses
for which it was not originally intended. Aronson has good advice on the
quirks of particular browsers, especially how they handle partial URLs.
HTML Manual of Style lacks some reference material that I'd like to see.
There's no complete reference on URLs, nothing on CGI or multimedia or server
issues. And the HTML tag reference could be more complete; it lists, but
doesn't explain, the values for tag attributes.
Aronson uses real Web pages as examples and does real critiques on them. There
are two schools of thought on examples: Some people think that only made-up
examples can get across their pedagogical points, while others think that
real-world examples are the best way to point out real problems. I'm with the
latter group. Maybe it's because I get a perverse pleasure out of watching
real people's work being picked apart. Aronson isn't cruel, but he is honest.
He cites errors and evaluates their seriousness. Each of the pages he's chosen
exemplifies some virtue, so most of what he has to say is positive (and
useful). One of the best pages he presents is by John December. The first 30
pages of this book are the closest I got in my research to what I was looking
for: a truly short course in HTML. The rest of this book is style advice and
reference. A clear, well-organized HTML manual.


Massive Tome


John December and Neil Randall have produced a huge book entitled The World
Wide Web Unleashed (Sams Publishing, 1994, ISBN 0-672-30617-4). It attempts to
cover everything about the Web: how to connect to the Web as a user, reviews
of browsers, a tour of Web sites, HTML, and the future of electronic commerce.
The book is 1058 pages long; nearly 100 pages of that is appendices, and these
are nearly all links to useful sites. These sites are the most important part
of the book. John December, well known on the Web for his lists of resources
and informed commentary, wrote the HTML and Web-development sections. Randall
and several others joined December in writing the less technical material.
I guess I'll give the authors purity points for wondering why Web browsers
make it possible to print out pages. Larry Aronson knows one answer that
should be obvious to December and Randall: so that you can produce decent
examples of Web pages in a book on Web-page design.
HTML is covered in the broader context of designing Web pages. It's not
concise, but it's solid. There is nothing substantial on CGI or multimedia,
and there is no comprehensive URL reference--surprising in a book this fat.
The book is way fatter than it needs to be: The authors are not concise, the
very useful links in the appendices belong on a disk or a disc or a Web site,
and some of the chapters are merely informed or interesting speculation.
Nevertheless, there is a lot of good information here, and if you are your
company's Web guru, you probably will appreciate having it on your shelf.


No Nonsense


Mary E.S. Morris takes a no-nonsense approach to her subject. Her HTML for Fun
and Profit (Prentice Hall, 1995, ISBN 0-13-359290-1) jumps right into the
details of setting up a server. By the time she gets through with that, she
has probably weeded out the people who just want to mark up documents and has
narrowed down her readership to people capable of and willing to take on the
whole process of creating and publishing Web pages.
This 264-page book has more information on the specific topic of relative URLs
than any book I've looked at. I'd picked this topic deliberately as one
benchmark of completeness. The coverage Morris provides would be very useful
if you were setting up a site that you might later need to move to another
machine, or that you might want to mirror.
Morris's coverage of server includes and CGI scripting is strong. She gives a
good introduction for UNIX, Mac, and NT systems and provides a number of
useful Perl scripts. As for HTML specifically, the coverage is good. The
reference table on HTML tags doesn't indicate permissible nesting, which some
of the other books are clear about. Her discussion of forms is clear and seems
exhaustive. This is a good book for people setting up and running Web servers.


Charting a Course


Teach Yourself Web Publishing with HTML in a Week (Sams Publishing, 1995, ISBN
0-672-30667-0) is the first of two books on HTML by Laura LeMay. The second,
which I haven't seen, is Teach Yourself More Web Publishing with HTML in a
Week. Now, here is a book (two, actually) obviously designed for a course.
And, equally obviously, not for a one-day course. LeMay's 403-page book
implicitly makes the case that the one-day intensive course is a bad idea. If
it takes her a week (or two) and she's bragging about it, well.... I found
several of the things that I was looking for specifically. Her HTML reference
indicates the permissible nesting of tags, and hers is the only book that
contained what I considered a sufficiently thoughtful discussion of the pros
and cons of lumping apparently separate pages into one file, linked using
named anchors.
LeMay gives a lot of design advice, including a discussion of storyboarding.
Her URL reference includes the specification for the URL for nonanonymous ftp,
which is rare in HTML books. She has good advice on URLs, such as when not to
use File URLs. She gives an overview of CGI scripting and imagemaps. Possibly
she gets into these subjects more deeply in week two.
LeMay is of the made-up-examples school. It works well for her. This is a good
course book and a good reference, although I don't think I'd call it "the most
complete HTML reference I have seen" as the cover blurb does.


Duke of URLs


It's probably foolish to talk about "the most complete HTML reference I have
seen," since I'm sure I'll see three or four more by the time this column sees
print. Nonetheless, if I were to nominate a most complete, it would probably
be Ian Graham's HTML Sourcebook (John Wiley & Sons, 1995, ISBN 0-471-11849-4).
This 416-page book is an introduction to, and reference for, HTML, URLs, HTTP,
and CGI scripting.
The HTML coverage is clear and readable. I found most of the things I was
looking for. The coverage of URLs is the best I've seen. Like LeMay, Graham
gives the URL for nonanonymous ftp; his discussion of Gopher URLs suggests, as
the other books don't, that you might actually want to support the Gopher
protocol. He also discusses the rlogin URL, personal directories using the
tilde character, and fragment reference using the # character. His is the only
book in which I could find out what to do with a filename that includes a
forward slash.

The discussion of CGI scripting is mostly an overview with links to tools.
There is a good chapter on the HTTP protocol, and an appendix on MIME.


Potion for the C-Sick


The rest of this column is the third installment in my ongoing look at
alternative programming paradigms.
These tend to be embodied in little languages, sometimes in the Jon Bentley
sense and sometimes in the sense of a pared-down, single-author implementation
of a paradigm whose mainstream implementations tromp a much bigger footprint on
the disk. Tiny Ada,
as it were. As it is, in fact, although not this month. What I'm looking for
when I look at these little languages is what makes the paradigm distinctive,
and the state of its health. My interest is not in the merits of the paradigms
or of the implementation so much as in their distinctiveness and their chances
for survival in the Darwinian struggle. I confess that this comes out of a
blind, a priori faith in the value of diversity. Save the paradigmatic rain
forest, that's my motto.


Rewriting ReWrite


Roy Ward (rward@random.otago.ac.nz) has written an interesting little language
called "ReWrite." It runs only on the Macintosh, requiring at least a 68020
and System 7.0, and will run in emulation on a PowerMac. It is a compiled
language, and the ReWrite compiler is written in ReWrite. ReWrite is
interesting specifically as a testbed for exploring the rewrite-rule
programming paradigm.
Programming using rewrite-rule syntax is usually encountered in functional
languages like Haskell, ML, Miranda, Clean, or in the functional mode of
Mathematica. When you use ReWrite, you feel as though you're using one of
these functional languages. You define functions with rewrite rules. Example
1(a) is the definition of the factorial function in ReWrite. The basic syntax
is a list of rules specifying a transformation; see Example 1(b). You can add
conditions (or guards) to these rules; Example 1(c) is that factorial function
in a more-robust form. The modification inside the brackets specifies a
condition for the match to take place, and the one outside the brackets
specifies an additional condition (beyond the match) that must be satisfied
for the rule to apply. More precisely, these two forms of rule syntax are as
in Example 1(d), where name is a token, patterns is zero or more patterns
separated by commas, condition is an expression, and results is zero or more
expressions separated by commas. A pattern can be a constant (optionally
coerced to a type), a token (which matches any single value, optionally
conditioned to a type), or a list or portion of a list of values. The list
representation is somewhat Lisp-like, and allows ReWrite to define core Lisp
functions, as in Example 1(e).
As is the case with proper functional languages, these functions don't have
side effects (other than obvious ones like screen output).


No Garbage


But underneath the rewrite syntax, ReWrite is working in an applicative way.
That is, the code is fully compiled and there is no eval mechanism, no garbage
collection, and no other complicated memory management. Functions clean up
after themselves, including all the list-processing functions. And code
compiles to "moderately efficient" 68020/68030 machine code. Ward presents an
example program in ReWrite and Pascal to find the nth prime. In one test,
nprime [2000] takes 32 ticks in Pascal and 134 ticks in ReWrite. This isn't
bad, surely, for a language implementation designed for exploratory purposes
and only in an early rev, but it does require a lot of explicit typing of
variables. A naive ReWrite version of this program is a lot slower.
On the other hand, ReWrite is a lot faster than Mathematica, which has a
similar syntax. The reason is that Mathematica is interpreted, and ReWrite is
strictly compiled. And Mathematica costs money, while ReWrite is freeware.
On the third through fifth hands, Mathematica is a robust, professional,
supported product that runs on many platforms, and a major new version of
Mathematica is due out imminently. I hope to write about it soon. 
Example 1: Using ReWrite.
(a)
factorial [0] -> 1;
factorial [n] -> n * factorial [n-1];

(b)
rule [pattern] -> result;

(c)
factorial [0] -> 1;
factorial [n : int] :: n>0 -> n * factorial [n-1];

(d)
name [patterns] -> results;
name [patterns] :: condition -> results;

(e)
car [ {x, . rest} ] -> x;
cdr [ {x, . rest} ] -> rest;
cons [ x, rest ] -> {x, . rest};
























C PROGRAMMING


Quincy 96: If You Only GNU




Al Stevens


Ann Lindsey Williams was the prettiest girl in the sixth grade, which made her
the prettiest girl in the world, at least as far as my world extended.
Consequently, I was overjoyed when the principal teamed us to sell milk in the
Lorton, Virginia Elementary School cafeteria. The Fairfax County School Board
had allowed that two students could be released from class a few minutes early
to set up the milk line and sell the milk. In exchange for our time and
service, which also meant that we ate alone after the shift was over, we each
earned a carton of milk, a significant contribution to our families' budgets
in those tough times. It was an assignment of some
consequence, a matter of duty, not to be taken lightly. Besides performing an
official function and providing economic relief for my own personal daily
operating expenses, I got to have lunch every day with Ann Lindsey Williams,
the prettiest girl in the sixth grade.
Billy Alvey came through the line every day, tossed some coins on the table,
paying a nickel or a dime more than his milk cost, and said with a superior
smirk, "Keep the change." Billy liked to impress Ann Lindsey, although I doubt
that he noticed me. Patiently each day, I followed Billy to where he sat down,
returned his change to him, and explained that we were not allowed to keep the
change. No suggestion of impropriety, graft, bribery, kickback, or any such
indiscretion, imaginary or otherwise, would be allowed to jeopardize my
position.
After several days, and with this scenario repeating itself each day, Ann
Lindsey asked me why it was that we couldn't keep the change. I looked around
the cafeteria, which doubled as an auditorium for student assemblies and a
meeting and dance hall for the PTA. I regarded the Virginia State flag in one
corner, Old Glory in the other, the patriotic mural on one wall, portraits of
George Washington and Harry Truman on the other, and said after a moment's
thought, "I think it's because we work for the government."
Those sure were simpler times.


GNU Projects


Recently I undertook a project to develop a CD-ROM, in conjunction with DDJ. I
had to choose between that project and writing the definitive Minesweeper for
Dummies for Trudy Neuhaus at IDG books. The CD-ROM looked like it would be
easier.
Plans for the project include multimedia, interactive, razzle-dazzle displays
under Windows 95. The tutorial content comes from three books I've written,
one on programming in general, one on learning C, and another on C++. Users
will switch between the tutorial sessions and the programming exercises
seamlessly, or so the plan goes. A definite must for the CD-ROM is a small,
common software-development environment so that users can compile and run
exercise programs and do some programming on their own. The books included
that kind of support with QBasic, a mainstay of DOS; Quincy, a home-grown C
interpreter; and GNU C++. Similar solutions but three completely different
development environments.
I plan to continue to use QBasic for the first part of the tutorial, but
neither Quincy nor GNU C++ fits into my vision of an integrated, online
tutorial environment. That requirement has spawned "Quincy 96," a new "C
Programming" column project kicking off this month. Quincy 96 is a Windows 95
version of Quincy that supports standard C and C++.
To review, the Quincy C interpreter runs in DOS under the D-Flat
user-interface C library. I developed the program from an earlier version of a
K&R interpreter. I included Quincy on the companion diskette of the C tutorial
book and described it in this column last year. The new Quincy is not an
interpreter. Instead, it is a Windows-hosted integrated development
environment (IDE) that launches C and C++ compilers to compile and link
projects and execute the compiled programs.
Let's consider the reason for a new Quincy by addressing some of the drawbacks
of the old one. First, the old Quincy (Version 4) is a source-code
interpreter. Consequently, it sets no speed records when executing C programs.
Second, Quincy 4 has a few bugs. It works well with the tutorial programs in
the book, but students have managed to uncover small quirks in Quincy's
interpretation of C. Third, the old Quincy interprets programs consisting of
one translation unit only--one source-code file and its included headers. The
standard C library functions are implemented, but Quincy 4 has no linker to
link multiple object modules in a project. Finally, Quincy 4 is restricted to
interpreting programs written in the C language. As such, Quincy 4 offers
little to the C programmer who is learning C++.
C++ code has been interpreted, I am told, but writing a C++ interpreter is a
bigger chaw than I care to bite off. The efficiency of an interpreter is a
function of the size of the language, the complexity of the programs being
interpreted, and the effectiveness of the interpreter itself. Needless to say,
C++ is big, and C++ programs are complex. Even if I could get an interpreter
running in the time I have, it probably wouldn't run very well.
Having no C++ version of Quincy, I used the MS-DOS port of GNU C++ on the
companion diskette of my C++ tutorial book. I discussed that experience last
August, giving a lot of space and favorable comment to GNU C++, a fact
completely ignored by Richard Stallman of the Free Software Foundation when he
wrote to complain about a different review of GNU C++ in the September issue.
See the December 1995 issue for his letter to the editor.
GNU C and C++ assume a UNIX-like, command-line, text-mode development
environment, which is good enough for a free compiler to be given away with a
book but not quite good enough for what I have in mind. Quincy C and GNU C and
C++ in their current incarnations do not fit well into the interactive
tutorial environment planned for the CD-ROM project.
Obviously, something more appropriate is called for.


What's GNU for Windows 95?


Cygnus Support (http://www.cygnus.com) has ported GNU C and C++ 2.7.1 to the
Win32 environment. The port is still in its beta configuration, but Cygnus has
provided what will eventually be a free Windows-development compiler system.
You can use GNU C and C++ to write Windows 95 and Windows NT protected-mode,
32-bit, GUI programs as well as 32-bit DOS command-line programs that run in a
Windows 95 DOS box. 
I have a compelling reason to use the Cygnus port in addition to the incentive
that it can be freely distributed. GNU C++ 2.7.1 implements many of the new
ANSI constructs--RTTI, new-style casts, template improvements, STL,
namespaces, mutable, bool, typename, explicit. This compiler is the first I've
seen for the PC platform that implements all these constructs, and it is free.
You'll need a decompression/extraction utility that recognizes long filenames
and works with the tar archive and zip compression formats. (I use WinZip 6.0,
a shareware program from Nico Mak Computing. You can download and register the
shareware version of WinZip from CompuServe.)


The Good GNUs and the Bad GNUs


The GNU Win32 package includes both compilers, a make utility, the GNU
debugger, many of the GNU utilities, and a complete set of standard C and C++
libraries. For my purposes, the compilers are almost perfect. First, the C++
compiler provides an early experience in the new ANSI C++ constructs. Second,
my tutorial exercises use standard input and output to demonstrate language
features, and the GNU compilers fully implement that generic interface. But
the compilers themselves work in that same austere environment, and I want
something that integrates with the GUI for the CD-ROM project.


Quincy 96, or What's GNU, Pussycat?


Quincy to the rescue. Quincy 96 is an IDE built as a multiple document
interface (MDI) Windows 95 program. I used Visual C++ 4.0 and the Microsoft
Foundation Class library to build Quincy 96 and learned a lot of lessons about
that environment--which I will share with you as the project progresses.
Quincy 96 inherits its name from the earlier Quincy, which was named after my
daughter Wendy's cat. The cat was named after Oscar Madison on "The Odd
Couple." How's that, you say? Let me explain. Wendy was a fan of Oscar Madison
but unsure of her new kitten's sex. Oscar is a male name. "The Odd Couple" was
in syndicated reruns, and Jack Klugman, the actor who played Oscar, had moved
on to playing the medical examiner "Quincy," a name that Wendy figured would
fit either way. Quincy the cat later cleared up the gender mystery by falling
in love with a traveling salescat and adding significantly to the cat
population at our house.
Quincy 96 has a simple mission. It manages two kinds of documents: project
documents and text source-code documents. A project document is the same as it
is with any IDE--a list of source-code files that, when compiled and linked,
constitute the program. You build the source-code files and add their names to
the project list. You can then build the executable program file with a
command. Quincy checks the date/time stamp of the files to see what should be
recompiled. Quincy scans each .c or .cpp file, parsing out #include
preprocessor directives to determine all the appropriate dependency
conditions. When a .c or .cpp source-code file is to be compiled, Quincy
launches the proper GNU compiler to run as a threaded process in the
background. Quincy intercepts warning and error messages from the compiler and
presents them to the user, who can select from them to go immediately to the
errant line of code. Quincy launches the linker when the compiles are
completed and, optionally, launches the compiled/linked executable program.
There is no debugger yet, but I am working on it. This process, end-to-end, is
similar to what Turbo C 1.0 did ten years ago but in the Windows 95 operating
environment and with the addition of support for contemporary C++.
Quincy 96 has an additional mode of operation. If no project document is open,
the build command assumes that the currently in-focus .c or .cpp document is a
stand-alone program to be compiled and linked. This mode permits me to run
short exercise programs like "hello world" without building a project document
for each of them.
Figure 1 is Quincy 96's application window with some source files loaded.


Design Patterns?



Remember design patterns? They were a hot buzzword a few conferences ago.
They kind of fizzled out of the popular press when software-marketing types
found out there was nothing to sell. A small cadre of developers is still
quietly working on the process, and some works have been published. Among
other things, design patterns identify common programming problems and their
solutions. The discussions that follow are design patterns. Well, sort of.
They fit the definition of the problem that design patterns address, but I
haven't applied any would-be formal pattern methodology in forming their
descriptions.
These problems represent routine program operations that should be well
understood by any MFC programmer and, therefore, well documented by Microsoft.
However, I had to painstakingly ferret out their solutions by poring over the
documentation and online help and by experimenting. This is a combination of
my relative inexperience with MFC and the poverty of indexes into the
mountains of MFC reference material. Sometimes there was no description of the
solution, much less the problem. Sometimes the descriptions were difficult to
find because I didn't know where to look or what to look for. Sometimes the
solutions formed from fragments of seemingly unrelated information scattered
about the documentation. The Windows API and MFC include hundreds--maybe
thousands--of undocumented or poorly documented things that programmers need
to know about. Programming by folklore.
Rather than describing each Quincy 96 source-code file in detail, I'll address
individual programming issues and publish the source code that accompanies
each such discussion. I won't dwell too much on the details of how I figured
all these things out; I'll mainly show you the solution and hope that it saves
you some time in your next VC++ MFC project.


Text Editors


Quincy is, first of all, a text editor. Before you can build any executable
programs in an IDE, you have to build source-code files, and so Quincy is an
MDI text editor. This month I'll discuss some of the design patterns that fell
out of the text editing aspects of the design. Next month I'll talk about
other parts of the project and the design patterns that resulted.
An MDI application can consist of multiple instances of multiple document
types. Each open document is represented by one or more document views. Quincy
has several document types with one view per document type. Only one instance
of the project document type can be open at a time, but many instances of the
source-code document types--C files, C++ files, and header files--can be open
for editing.
The Visual C++ Developer Studio allows you to build an MDI application with
documents that have application-specific views derived from the CEditView
class, which gives the documents the properties of a text editor. You can open
multiple editor documents, type into them, cut, copy, paste, and even pretend
you are opening existing documents and saving ones you have changed. All the
code to do that is generated for you by the Developer Studio. Easy. The hard
part is doing more than that.


Editor Font and Tab Stops


The default font for CEditView objects is the system font, which, being
proportionally spaced, is not particularly acceptable for typing source code.
A fixed-spaced font is necessary so that code indentations line up properly.
You can add a CFontDialog dialog box to your user interface and let the user
select a font, but I prefer to have the code font of my choice, 10-point
Courier, as the default for text-editor documents.
The character font is associated with the document's view. To associate a
specific font with a particular view, you first generate the font as a data
variable by instantiating an object of type CFont. Declare this object as a
data member of the derived CEditView class (see Listing One) and create the
font itself in the class constructor by calling the CFont class's CreateFont
member function; see Listing Two. Finally, apply the SetFont member function
in the class's overriding Create function after calling the base class's
overridden Create function (Listing Three). This action applies the created
font to each open document of that type in the application.
The arguments to CFont::CreateFont are not intuitive. I deduced their values
by writing a program that uses CFontDialog to create fonts and then looking at
the Courier font's corresponding values. This pattern, then, explains only how
to associate a 10-point Courier font with a derived CEditView class object.
You would have to figure out the arguments to use the pattern for other fonts.
Observe that the program also calls the SetTabStops function from the Create
function in Listing Three. I don't like the default eight-character tabs;
they're too wide. The function's argument is expressed in dialog units, which
are 1/4 of a character width, so 16 dialog units add up to one tab stop every
four characters.
This pattern is not particularly tricky. All the functions are well
documented, except for the esoteric CreateFont arguments. The real problem was
that nothing at the highest level pointed me to the functions or told me where
to apply them and in what sequence. You must already know about them in order
to go directly to their documentation.


Line and Column Numbers


Most text editors can optionally display the current line and column number.
Word for Windows displays them in the status bar along with the time of day,
page number, and other stuff. The line number is essential information in a
programmer's editor because compilers report warnings and errors by line
number. The CEditView class does not automatically display the line and column
numbers.
First, you have to extract the current line and column number from the current
document. That sounds simple enough, but try to find out how from the
documentation when you don't already know. Searches through the indexes and
the online help for references to "line," "column," "cursor," and the like
yield nothing. It takes a while to figure out that what we always called a
"cursor" is now called a "caret."
The MFC online-help system is great. You can cruise through class descriptions
and find out all kinds of good stuff. After a thorough reading of the
class-member descriptions for the CEditView class, its embedded CEdit class,
and all its base classes through CView up to CWnd, I eventually found a way to
get the current line and column number.
The line number is easy. The CEdit::LineIndex function returns the character
position of the first character of the current line when its argument is -1.
That value can be converted into the current line number by passing it as an
argument to the CEdit::LineFromChar function.
The column number is more difficult. The trick is to use the CEdit::GetSel
function to get the character positions of the beginning and ending of the
CEdit object's current selection, which is the currently marked block of text.
As it turns out, when there is no currently marked block, the current
selection is defined as beginning and ending at the current caret character
position. When there is a marked block, the ending position corresponds to the
current caret position. The documentation does not explain this behavior, but
it works nonetheless.
You would think that the difference between the current caret character
position and the character position of the first character of the line would
produce the current column number. It does not, however, when tab characters
are in the line, which is likely to happen in source code. The program has to
scan the text of the current line and count the space characters that the view
adds when it displays tab-character offsets. You'll see that after 200
characters of text in a line, I don't bother correcting for tabs. A C++
source-code line is too long if it needs 200 characters.
Listing Four is the function that determines the current line and column
number from a CEditView class and sends those values to the derived CWinApp
class to display in the status bar.
There might be an easier way. There should be. I didn't find it, however, and
I'm sure if one exists, I'll get lots of mail from veteran MFC programmers
telling me about it and about how stupid I am for not already knowing about
it. Where was all that help when I was trying to get this thing to work? One
thing is certain. The CEdit class should offer up these data values more
willingly. Deriving from CEdit to add the behavior wouldn't work because then
you'd have to coerce CEditView into using your derived class instead of CEdit.
Having figured out the line and column values, the application has to display
the data on the status bar. The standard VC++ status bar includes three little
recessed boxes that tell you whether the Num Lock, Scroll Lock, and Caps Lock
keys are on or off. This is a software concession to the knowledge that the
lights on the keyboard that provide the same information get reversed
sometimes. I wanted to add a box to the left of the standard ones to display
the line and column numbers. At the same time I wanted to get rid of the
Scroll Lock box. (Does anyone know what the Scroll Lock key is for? Windows 95
sure doesn't use it to lock out any scrolling. As far as I know its only
purpose is to change the pilot's viewing angle in Flight Simulator.) The
status bar in Figure 1 shows that I got it working okay. Here's how.
Listing Five shows an array, indicators, that VC++ builds into the CMainFrame
source-code file. indicators contains string identifiers. Listing Six shows
the string-identifier declarations in the .rc resource text file. These
strings identify the default text values for the panes in the status bar. The
indicator elements tell the frame window how many panes there are and how wide
they should be. The first indicator, ID_SEPARATOR, identifies the leftmost,
unrecessed pane that the system uses to display menu and tool-button help
messages.
Listing Seven shows the indicators array as modified to eliminate the Scroll
Lock pane and insert a pane to use for the line and column numbers. I tried to
use a different string identifier created specifically for this purpose but
couldn't get it to work. This might be due to the beta copy of VC++ 4.0 that I
have.
Listing Eight shows code added to the CMainFrame::OnCreate function to change
the characteristics of the new pane so that it is wider (120 pixels wide) and
has no initial text value.
Finally, Listings Nine and Ten show functions added to the main frame and
application classes to display data in the new pane while the program is
running.


Text-Data Serialization


MFC includes a data-serialization feature for reading and writing documents.
You provide a Serialize function in the derived document class, and the system
calls it to read and write document data with the appropriate file opened and
a CArchive object that knows how to store and retrieve objects of many
different types and classes.
Normally the document class manages the document's data, and the view class
manages the display of the data. The text-editor view has to keep a complete
copy of the text, so it might as well manage the data all by itself.
The CEditView class embeds a CEdit object to contain the data. The CEdit class
represents the data as an object of type CString, VC++'s string class. If you
use the CArchive object to write and read the text, an object-description
field is inserted at the beginning of the object. This works fine for text
editing, but when you want to pass that file to a compiler or any other
program that expects raw text, the object-description field gets in the way.
The CEditView class includes a SerializeRaw function that reads and writes
text data without the object-description field. Listing Eleven shows how I
used that function from within the derived document class's overriding
Serialize function. The text document has only one view. The
GetFirstViewPosition function returns a position index to that one view that
the single call to GetNextView uses to return a pointer to the CEditView
object associated with the document. A call to SerializeRaw through that
pointer passing a reference to the CArchive object takes care of both reading
and writing. The call to CArchive::Flush for output operations ensures that
the text is written to the file before Quincy 96 tries to pass the file
specification to the compiler.


Text Search and Replace


The MFC documentation is vague on how to implement text search-and-replace
operations in a CEditView derived class. As you read the class-member
descriptions and meander through the hypertext links in the online help, you
get the impression that MFC provides the primitives, and you have to integrate
them somehow. Actually, the process is much simpler. All you have to do is add
menu commands with the ID_EDIT_FIND and ID_EDIT_REPLACE identifiers, and the
framework does the rest.


Suppressing the GNU Document



When you execute an MFC application built by the Developer Studio's AppWizard,
the application's first action is to create a new, empty document. If your
application supports multiple document types, you must select one of them from
a dialog box to tell the framework what kind of new document to create. That
behavior, which reflects the behavior of many Microsoft applications (Word,
for example), is not always what you want. In Quincy's case, the application
should reopen the project and text documents that were open when the
application last ran, similar to how the Developer Studio works. Getting an
application to load previous documents is no trick. Suppressing the creation
of that initial new document is not as easy.
Listing Twelve shows the code generated in the derived CWinApp application
class's InitInstance member function. The CWinApp::ProcessShellCommand
function opens documents specified on the command line or dragged to the
application's icon by the user. If neither condition exists, the function
creates a new document. That's not what I want. If neither condition exists, I
want the application to use values saved in its .ini file to load the
documents from the previous run. By stepping into the ProcessShellCommand
function with the debugger, I was able to develop the test in Listing
Thirteen, which calls ProcessShellCommand only when document specifications
are provided as command parameters from the command line or a drag operation.


No GNUs is Good GNUs


Enough of the GNU puns already. As you can probably tell, I am a fan of the
GNU compiler suite. First, GNU is free; it offers schools and students a way
to use and learn C and C++ without straining their budgets for individual
compiler purchases and site licenses. Second, GNU's C++ language
implementation is contemporary; it implements many of the new ANSI features
that teachers should be teaching and students should be learning.
So why, then, am I not using GNU compilers as the development platform for my
"C Programming" column projects?
I would have used GNU C++ rather than Visual C++ to build Quincy but for these
reasons: First, the beta of the Cygnus port is not ready to compile C++
Windows programs. A small C demo works, but the C++ compiler chokes on some of
the declarations in the Windows header files. Second, no one has licensed or
ported MFC to GNU C++. That could happen only with Microsoft's blessing.
Third, the Visual C++ Developer Studio, my development environment of choice,
launches the Microsoft compiler and does not work with GNU C++. If the first
two problems are ever solved, I might try trussing up Quincy 96 (97, 98, 99,
ought-ought?) so that it takes on some of the visual aspects of AppWizard and
ClassWizard.
Finally, the terms of the GNU Library General Public License for programs
linked with GNU libraries would wrap the program in a shroud of source-code
availability responsibilities that I do not have time for. The source code to
my projects is always available to everyone, but the GNU license binds you in
perpetuity (or for three years after your last distribution, whichever comes
first) to make available the source code to the GNU libraries, too, with
everything offered on "a medium customarily used for software interchange," as
interpreted by the foundation but not defined in the license. Referring the
user to a third-party source, even to the FSF themselves, is not an approved
medium, although I don't understand why. Neither is making the source code
available for download from an online service or an Internet ftp site. I was
told that the latter suggestion could not be approved because it would exclude
anyone who does not have online access.
I will be including all the Quincy source on the CD-ROM as well as the
complete compiler package, but I don't want to tie that particular product to
programming projects in this column, and DDJ might not want to be required to
sell the product for what the license calls "...a charge no more than the cost
of performing this distribution," upon which the GNU license does not
elaborate. And so for now, I'll plod along with Borland and Visual C++.


Source Code


The source-code files for the Quincy 96 project are free. You can download
them from the DDJ Forum on CompuServe and on the Internet by anonymous ftp;
see "Availability," page 3. To run Quincy, you'll need the GNU Win32
executables from the Cygnus port. They can be found on ftp.cygnus/pub/sac.
If you cannot get to one of the online sources, send a 3.5-inch high-density
diskette and an addressed, stamped mailer to me at Dr. Dobb's Journal, 411
Borel Avenue, San Mateo, CA 94402, and I'll send you the Quincy source code
(not the GNU stuff, however--it's too big). Make sure that you include a note
that says which project you want. The code is free, but if you care to support
my Careware charity, include a dollar for the Brevard County Food Bank. 
Figure 1: Quincy 96's application window. 

Listing One
class CTextView : public CEditView
{
 CFont newfont;
 // ...
};

Listing Two
CTextView::CTextView()
{
 newfont.CreateFont(-13,0,0,0,400,0,0,0,0,1,2,1,49,"Courier");
 // ...
}

Listing Three
BOOL CTextView::Create(LPCTSTR lpszClassName,LPCTSTR lpszWindowName,
 DWORD dwStyle, const RECT& rect,
 CWnd* pParentWnd, UINT nID, CCreateContext* pContext) 
{
 BOOL rtn = CWnd::Create(lpszClassName, lpszWindowName, dwStyle,
 rect, pParentWnd, nID, pContext);
 SetTabStops(16);
 SetFont(&newfont, TRUE);
 return rtn;
}

Listing Four
void CTextView::ShowLineColumn()
{
 CEdit& rEdit = GetEditCtrl();
 // --- character index of 1st char, current line
 int nLineIndex = rEdit.LineIndex(-1);
 // --- current line number
 int nLineno = rEdit.LineFromChar(nLineIndex);
 // --- character index of current position
 int nStartChar, nEndChar;
 rEdit.GetSel(nStartChar, nEndChar );
 // --- read the current line

 char buf[200];
 rEdit.GetLine(nLineno, buf, 200);
 // --- compute tab character adjustment
 int col = 0;
 int tabct = 0;
 for (int x = 0; x < nEndChar - nLineIndex; x++) {
     if (x == 200)
         break;
     if (buf[x] == '\t')
         while (++col % 4)
             tabct++;
     else
         col++;
 }
 // --- current column number
 int nColumn = (nEndChar - nLineIndex) + tabct;
 theApp.ShowLineColumn(nLineno+1, nColumn+1);
}

Listing Five
 
static UINT indicators[] =
{
 ID_SEPARATOR,
 ID_INDICATOR_CAPS,
 ID_INDICATOR_NUM,
 ID_INDICATOR_SCRL,
};

Listing Six
STRINGTABLE DISCARDABLE 
BEGIN
 ID_INDICATOR_EXT "EXT"
 ID_INDICATOR_CAPS "CAP"
 ID_INDICATOR_NUM "NUM"
 ID_INDICATOR_SCRL "SCRL"
 ID_INDICATOR_OVR "OVR"
 ID_INDICATOR_REC "REC"
END

Listing Seven
static UINT indicators[] =
{
 ID_SEPARATOR,
 ID_INDICATOR_EXT,
 ID_INDICATOR_CAPS,
 ID_INDICATOR_NUM,
};

Listing Eight
int CMainFrame::OnCreate(LPCREATESTRUCT lpCreateStruct)
{
 // ...
 m_wndStatusBar.SetPaneInfo(1, ID_INDICATOR_EXT, 0, 120);
 m_wndStatusBar.SetPaneText(1, "");
}

Listing Nine
void CMainFrame::ShowStatus(CString& strNewStatus, CString* pstrStatus)
{
 if (pstrStatus != 0)
 m_wndStatusBar.GetPaneText(1, *pstrStatus);
 m_wndStatusBar.SetPaneText(1, strNewStatus);
 m_wndStatusBar.SendMessage(WM_PAINT,0,0);
}

Listing Ten
void CQuincyApp::ShowStatusText(CString& strText, CString* pstrOldText)
{
 CMainFrame* pMainFrame = static_cast<CMainFrame*>(m_pMainWnd);
 pMainFrame->ShowStatus(strText, pstrOldText); 
}
void CQuincyApp::ShowLineColumn(int nLine, int nColumn)
{
 CString strExt;
 if (nLine != 0)
 strExt.Format("Ln %d, Col %d", nLine, nColumn);
 ShowStatusText(strExt);
}

Listing Eleven
void CTextDocument::Serialize(CArchive& ar)
{
 POSITION pos = GetFirstViewPosition();
 ASSERT(pos != NULL);
 CEditView* pView = static_cast<CEditView*>(GetNextView(pos));
 ASSERT(pView != NULL);
 pView->SerializeRaw(ar);
 if (ar.IsStoring())
 ar.Flush();
}

Listing Twelve
 // Dispatch commands specified on the command line
 if (!ProcessShellCommand(cmdInfo))
 return FALSE;

Listing Thirteen
 // --- suppress initial FileNew command on startup
 if (cmdInfo.m_nShellCommand != CCommandLineInfo::FileNew) {
 // Dispatch commands specified on the command line
 if (!ProcessShellCommand(cmdInfo))
 return FALSE;
 }
 else {
 // ----- reload documents from the previous session
 // ...
 }
DDJ









































































ALGORITHM ALLEY


Binary Search




Micha Hofri


Micha, a member of the computer science department at Rice University, is the
author of Analysis of Algorithms: Mathematical Methods, Computational Tools
(Oxford University Press, 1995). He can be contacted at hofri@cs.rice.edu.


Introduction 
by Bruce Schneier
Binary searches are among those algorithmic staples that have uses everywhere.
If you want to locate an entry in a sorted array, a binary search is the most
efficient way to do so. I can't think of a college course in basic algorithms
that doesn't cover binary searches.
The technique isn't perfect, however. Records must all be the same length and
must be stored in a static array. If the entries are of variable length or in
some kind of dynamic data structure, other less-efficient search
techniques--search trees, interpolation searches, hashing, and the like--take
over. But since the entry can be a key pointing to a more-complex record,
binary-search techniques can be used in many situations.
This month, Micha Hofri sees how efficient he can make a basic binary-search
algorithm. Why are we bothering with something so basic? Micha's analysis is
interesting not so much in what it reveals about binary search, but in how the
algorithm works. Good programmers do this kind of analysis with any algorithm
that has significant effects on the performance of the system. (At least, they
do when they have the time and budget to program right, not when the deadline
is in five hours and management doesn't care what it looks like as long as it
works.) Micha's trade-offs--iteration versus recursion, more simple steps
versus fewer complex steps--are the type that can be made almost everywhere.
Even in a world of ever-increasing processor power and clock speeds, a finely
crafted algorithm is still a thing of beauty.
By the way, I am still interested in hearing your column ideas, whether you
want to write a column yourself or you'd like to see a certain topic explored.
You can contact me at schneier@winternet.com.
Binary search is the method of choice for searching a list for a given key
value. In fact, it is the optimal comparison-based search algorithm (hashing
methods can do better--with some trade-offs). To use binary search, the lists
to be searched must be sorted (we assume in increasing order), of known
length, and indexable in an array. This implies that the keys must be all of
the same size, and that binary search cannot be used to search a linked list.
These conditions mean that the ith smallest key can be directly accessed, as
Ai (or A[i], using C-like notation). Binary search can be used to find a given
key or to check whether or not the list contains one. The sorting implies, for
example, that if the key value 6 (assuming all the keys are integers) is
followed by the key value 10, then 7, 8, and 9 are not in the list. 
I'll present here a basic form of binary search, which I refer to as "BS";
I'll also discuss "BS1" and "BS2." Much of the following discussion is based
on Gilles Brassard and Paul Bratley's book Algorithmics: Theory and Practice
(Prentice-Hall, 1988). Since BS is a "divide-and-conquer" algorithm, its form
is naturally recursive; see Example 1(a). Note in line 3 the comment
⌊(i+j+1)/2⌋. (The term ⌊a⌋ is the "floor" of a; it is the largest integer that
does not exceed a.) In particular, on entry, when i=0 and j=n-1, k=⌊n/2⌋. To
adapt binary search to check for missing values, you simply change line 2 to
Example 1(b). 
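Example 1's pseudocode transcribes almost directly into C. The sketch below is my transcription, with an explicit array argument standing in for the A[i..j] notation; it includes both the basic form, which assumes x is present, and the missing-value variant of Example 1(b):

```cpp
#include <cassert>

// BS of Example 1(a): recursive binary search over the sorted
// subarray A[i..j]. Assumes x occurs in the array; returns its index.
int BS(const int A[], int i, int j, int x)
{
    if (i == j)
        return i;                      /* This is it! */
    int k = (i + j + 1) / 2;           /* integer division: floor((i+j+1)/2) */
    if (x < A[k])
        return BS(A, i, k - 1, x);     /* x is below the midpoint, */
    return BS(A, k, j, x);             /* or at/above it */
}

// Example 1(b): line 2 replaced by an equality test, so the routine
// returns -1 when x is not in the array.
int BS_check(const int A[], int i, int j, int x)
{
    if (i == j)
        return (x == A[i]) ? i : -1;
    int k = (i + j + 1) / 2;
    if (x < A[k])
        return BS_check(A, i, k - 1, x);
    return BS_check(A, k, j, x);
}
```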


Analyzing Binary Search


When implementing BS, a few design options are possible. All considerations
are based on computing the run time. The purpose here is not just to learn
about BS, but to show the power--and limitations--of this type of analysis.
The first question to resolve is the cost unit. Each basic operation in a
source language (such as C) is taken as one unit, as is each comparison in BS.
However, line 3 in Example 1(a), for instance, needs three operations
(although some compilers can do better than that). Q denotes the cost of a
function call and return; therefore, if BS terminates in line 2, its cost is
Q+1; otherwise it is Q+1+3+1+another call.
To be more precise, suppose A has n entries and x is equally likely to match
each of them. If T(n) denotes the total cost of such a call, then T(1)=Q+1 and
for n>1, T(n)=Q+5+T(r), where r depends on the result of the comparison in
line 4 and is either k=⌊n/2⌋ or n-k=⌈n/2⌉ (the "ceiling" of n/2, or the
smallest integer ≥ n/2).
Although T(n) is a random variable, we deal here only with its expected value,
t(n). (For a deeper analysis, see my book Analysis of Algorithms: Mathematical
Methods, Computational Tools, Oxford University Press, 1995.) Since x is
equally likely to be each of the entries, it lies with probability ⌊n/2⌋/n
in the lower part and ⌈n/2⌉/n in the upper part. Figure 1 (a) presents an
equation--here, a recurrence--for the mean running time. This kind of equation
is typical for divide-and-conquer algorithms. The successful approach to
solving this type of equation is: 
1. Simplify. 
2. Look for a sequence of argument values where the equation can be solved by
standard analytic means.
3. Use this as a guide to guess a solution from a table of values produced by
the recurrence. Then use substitution to prove that the guess satisfies the
equation.
Simplifying means replacing Q+1 by a, Q+5 by b, multiplying by n, and defining
u(n)=nt(n). This leaves us with the equation in Figure 1 (b). We then choose
values of n for which the troublesome floor and ceiling functions relent:
powers of 2, or n=2^m. The result is in Figure 1 (c).
The relation n/2=2^(m-1) suggests a change of notation, so we define
v(m)=u(2^m) as in Figure 1 (d). This is finally a standard, first-order
recurrence, which can be solved by the simple iteration in Figure 1 (e).
So much for these special arguments; writing m=lg n (binary logarithm), we
have u(n)=n(b lg n+a). Clearly, n(b lg n+a) cannot be right for all values of
n, since the recurrence generates integer coefficients and lg n is not
integral unless n is a power of 2. We must return to the original recurrence
for u(n), in Figure 1 (b), and use it--floors, ceilings, and all--to generate
several values, compare them with the special solution, and look for a
pattern. Here, after computing about ten values, everything falls into place;
when n=2^m+r, 0 ≤ r < 2^m, where m=⌊lg n⌋, the solution in Figure 1 (f)
fits all the generated values. When r=0, we get back to the special solution.
Testing by substitution into Figure 1 (b) is successful, so t(n)=u(n)/n=
a+b(⌊lg n⌋+2r/n). Since r/n < 1, this result supports the standard claim that "BS
runs in logarithmic time."
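The closed form can also be checked mechanically. The following sketch (my own harness, not part of the original analysis) evaluates the recurrence t(1)=a, t(n)=b+(⌊n/2⌋/n)t(⌊n/2⌋)+(⌈n/2⌉/n)t(⌈n/2⌉) directly and compares every value with a+b(⌊lg n⌋+2r/n):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Verify that t(n) = a + b*(floor(lg n) + 2r/n), with n = 2^m + r,
// satisfies the mean-running-time recurrence for binary search:
//   t(1) = a;  t(n) = b + (floor(n/2)/n) t(floor(n/2))
//                       + (ceil(n/2)/n)  t(ceil(n/2))   for n > 1.
bool solution_matches(int nmax, double a, double b)
{
    std::vector<double> t(nmax + 1);
    t[1] = a;
    for (int n = 2; n <= nmax; n++)
        t[n] = b + (double)(n / 2) / n * t[n / 2]          // lower part
                 + (double)(n - n / 2) / n * t[n - n / 2]; // upper part
    for (int n = 1; n <= nmax; n++) {
        int m = 0;
        while ((2 << m) <= n)
            m++;                      // m = floor(lg n)
        int r = n - (1 << m);         // write n = 2^m + r, 0 <= r < 2^m
        double closed = a + b * (m + 2.0 * r / n);
        if (std::fabs(t[n] - closed) > 1e-9)
            return false;
    }
    return true;
}
```

Running it for a few hundred values of n, with arbitrary a and b, confirms the guessed solution.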
So much for the expected run time. Determining its variability requires a
different kind of computation, one that addresses the probabilistic structure
directly. I show this in my book; a partial result is that the number of
iterations has only two possible values, ⌊lg n⌋ and ⌈lg n⌉ (if n=2^m, these
are both m).


Recursion Elimination


Having computed the expected cost of BS, let's try to reduce it. One obvious
source of cost is the recursive call at each level of the procedure. This is
"tail recursion," where the last instruction performed at each level is a
single recursive call. The call is easy to replace using a single loop; see
Example 2. The logic is identical to that of BS, except that a new call is not
initiated at each level. Instead, either the left or right end of the interval
is adjusted.
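In C, the transformation looks like this (my transcription of Example 2, with the array and its length passed explicitly):

```cpp
#include <cassert>

// BS1: the tail recursion of BS replaced by a loop (Example 2).
// Same logic as BS; the search interval [i..j] shrinks in place,
// so no new stack frame is created per halving step.
int BS1(const int A[], int n, int x)
{
    int i = 0, j = n - 1;
    while (i < j) {
        int k = (i + j + 1) / 2;
        if (x < A[k])
            j = k - 1;   /* x is below the midpoint */
        else
            i = k;       /* x is at or above it */
    }
    return i;
}
```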
The logical structure of such a program parallels that of a recursive program,
but the analysis is entirely different. In BS, every instruction was done (at
most) once at each level; not so in BS1. That's why the left-most column is
added in Example 2: The number of times each line is repeated is given as a
function of the searched interval size. The symbol a(n) denotes the number of
times that the main instructions (lines 4 and 5) are executed. We define p1 as
the probability that the condition in line 5 is satisfied and q1=1-p1. We make
the crucial assumption that p1 depends in the same way on the interval size
each time line 5 is reached. This is the precise analog of the assumption we
made when analyzing BS--and for the same reason. Regardless of which
subinterval still needs to be searched, the assumption that x is equally
likely to be in any position translates without change: The previous
comparisons delimit the subinterval to be searched, but provide no other
information on the location of x. Hence p1 always has the same functional form
as at the first iteration, ⌊n/2⌋/n, and q1=⌈n/2⌉/n.
If a' is the cost of the lines performed once--1, 2, 3, and 8--then a'=Q+2. b'
counts the cost that recurs a(n) times--lines 3, 4, 5, 6, or 7--so b'=6. The
equation to determine a(n) follows from a description of the search: Starting
with an interval of size n, one iteration leaves us with an interval of the
size _n/2_ (in probability _n/2_/n) or n/2 (with the complementary
probability n/2/n); hence the equation in Figure 2 (a). Up to a factor of b,
this is the same as Figure 1 (a), so the solution is familiar: n=2_lg n_+r
yields the result in Figure 2 (b). The amount saved by this modification is
t(n)-t1(n), and writing it in full yields the equation in Figure 2 (c).
The last relation allows us to estimate Q (without reading arcane compiled
code). Table 1 represents the average time per search (in ms) on a standard
workstation running under a UNIX-like operating system. Consider for the
moment only the left four columns; compute for each n the value ⌊lg n⌋+2r/n
and perform a linear regression between these values and the column labeled
BS-BS1. This yields the intercept -0.248 and slope 0.383. Since the intercept
(in admittedly flexible units) is given in Figure 2 (c) as -1, the value of Q
in these units is approximately 2.54--about two and a half basic operations.
The cost of recursion is not trivial for a procedure with such a short body.
Still, it is hardly onerous. However, this evaluation disregards two important
facts:
Q depends on the number of variables pushed onto (or popped from) the stack;
here 3 (i, j, x). This is actually two addresses and x. It would differ for calls
with other argument vectors.
Q's value is materially influenced by details in the implementation of stack
operations and cache and system-storage management. On an older, slower
minicomputer with essentially the same operating system, Q approximately equals
1.9. This approach to estimating Q failed on a different UNIX-like system,
when BS suffered penalties for recursions of depth beyond 5 that raised the
running time by an unexpected factor of 5 to 8! (Yet another experiment
suggested that Q without stack-management overhead is somewhat less than 2.)
This was the highest price I saw paid for a flexible, dynamic
memory-management policy.


Three-Way Comparison


The next modification was prompted by the observation that T(n) is essentially
constant, while BS and BS1 sometimes "locate" the desired key early in the
search (BS and BS1 are unaware of this, since equality is never tested, only
inequality). The value x could be in position ⌊n/2⌋ (located at the first
stab) but the routines would just go chugging along, only to return there lg n
steps later. Example 3 adds just such a test. While BS1 always makes two
comparisons per iteration, BS2 needs two in about half of its iterations, and
three in the other half--but presumably fewer iterations. Is this a good idea?
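Transcribed into C (my rendering of Example 3), BS2 is:

```cpp
#include <cassert>

// BS2: BS1 with an equality test added, so a lucky probe exits early
// (Example 3). About half the iterations now make two comparisons,
// the rest three -- but a hit on the midpoint ends the loop at once.
int BS2(const int A[], int n, int x)
{
    int i = 0, j = n - 1;
    while (i < j) {
        int k = (i + j + 1) / 2;
        if (x < A[k])
            j = k - 1;
        else if (x > A[k])
            i = k + 1;
        else
            i = j = k;   /* found it: force the loop to terminate */
    }
    return i;
}
```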
Analysis should provide the answer: We find in BS2 the same probability p1
used before, Pr(x<A[k])=⌊n/2⌋/n. In addition, we have p2, defined as the
probability that x=A[k], given that x is not in the first k positions; hence
p2=1/(n-k). The dependence of the exact number of operations on the results of
the comparisons is somewhat more complex here. It is evaluated by considering
the average number of iterations BS2 will perform, again assuming that x is in
the array and is equally likely to be any element. Reading the evolution of
s(n) from the code in Example 3 yields the equation in Figure 3 (a). This
equation is more complex than those presented so far; for example, two initial
values are required to drive the recursion. They too can be deduced by reading
BS2: s(1) is obviously 0, and s(2) is always 1. Since k=⌊n/2⌋, then
n-k-1=⌈n/2⌉-1.

Define the function b(n)=ns(n) to get the equation in Figure 3 (b), where the
initial values are b(1)=0 and b(2)=2. Picking n as a power of 2 does not help
as before; instead, we pick n=2^m-1; then ⌈n/2⌉-1=⌊n/2⌋=2^(m-1)-1. We find a
first-order recurrence, similar to the equation in Figure 1 (c). This is shown
in Figure 3 (c). Using the initial value for m=1, which is b(2^1-1)=1·s(1)=0, a
simple unreeling reveals the solution in Figure 3 (d). Trying to guess the
general solution is more difficult (I needed more than 30 terms to see the
pattern) because the increments change as n reaches (3/2)n* (where n*
represents 2^⌊lg n⌋). Figure 4 (a) shows the guessed solution; g(n) is given in
Figure 4 (b). Substitution confirms this as a solution to the equation in
Figure 3 (b).
Finally we compare the costs t1(n) and t2(n). First, how many iterations, on
average, does BS2 save over BS1? The difference is a(n)-s(n), and is given in
Figure 5 (a) for the range n* ≤ n < (3/2)n*.
As n increases from n* to (3/2)n*, this difference varies from 1.5 to 0.66
(approximately). In the rest of the range, the equation in Figure 5 (b) varies
correspondingly, approximately from 1.33 to 1.5. The savings are modest and
are essentially independent of the size of n, but dependent on the ratio n/n*.
Hence the difference between the running times can be roughly set to Figure 5
(c).
Allow the same cost C for any operation (comparison, arithmetic, or
assignment) so that the two cost factors in Figure 5 (c) are nearly 6C and C
(for two operations, at about half the iterations), respectively. Using only
lg n*, the leading term from the solution for s(n), shows that BS1 is better
when the equation in Figure 5 (d) applies.
If we take 4/3 as a representative value for Dn, we can expect the extra check
in BS2 to pay off unless the array is more than about 200 entries long, with a
relatively important dependence on the ratio n/n*. Table 1 bears this out.
Since the logarithm of n* appears in this relation, the critical length is
sensitive to minor variations in the implementation. Higher precision requires
consideration of the compiler and the machine code it generates, as well as
details of the machine architecture, such as availability of
increment/decrement instructions, separate data and instruction caches, and
the like. 
Note that in some machines, condition codes (with computed branches) allow for
a three-way comparison at no added cost. If a compiler is designed to take
advantage of this and can use a single comparison for lines 5 and 7 of BS2,
then it will outperform BS1 on the average for any value of n. Another
possibility is to code it directly in assembly language. Since binary search
is used so frequently, this may be a reasonable approach.
Example 1: BS, recursive form of binary search.
(a)
1. BS(A[i..j],x)
2. if (i==j) return i;                  /* This is it! */
3. k=(i+j+1)/2;                         /* Computes ⌊(i+j+1)/2⌋ */
4. if (x<A[k]) return BS(A[i..k-1],x);  /* When x is below, */
5. else return BS(A[k..j],x);           /* or above the midpoint */

(b)
2. if (i==j) {if (x==A[i]) return i; else return -1;}
Example 2: BS1, first iterative form of BS.
(1) 1. BS1(A[n],x)
(1) 2. i=0; j=n-1;
(a(n)+1) 3. while i<j
(a(n)) 4. {k=(i+j+1)/2;
(a(n)) 5. if x<A[k]
(p1 a(n)) 6. j=k-1;
(q1 a(n)) 7. else i=k;
(1) 8. } return i;
Example 3: BS2, second iterative form of BS.
(1) 1. BS2(A[n],x)
(1) 2. i=0; j=n-1;
(s(n)+1) 3. while i<j
(s(n)) 4. {k=(i+j+1)/2;
(s(n)) 5. if x<A[k]
(p1s(n)) 6. j=k-1;
(q1s(n)) 7. else if (x>A[k])
(q1q2s(n)) 8. i=k+1;
(q1p2) 9. else i=j=k;
(1) 10. } return i;
Figure 1: Analyzing BS.
Figure 2: Recursion elimination.
Figure 3: Three-way comparison.
Figure 4: Form of the guessed solution.
Figure 5: Determining what BS2 saves over BS1.
Table 1: Average time per search (in ms) 
n      BS     BS1    BS-BS1  BS2    BS1-BS2
10      4.22   2.99   1.23    2.76   0.23
100     7.13   4.93   2.20    4.71   0.22
1000   10.18   6.88   3.30    6.99  -0.11
10000  14.96   9.87   5.09   10.45  -4.51




















PROGRAMMER'S BOOKSHELF


Algorithms for C and C++ Programmers




Dean Gahlon


Dean is a senior software engineer at a network communications company in
Minneapolis. He can be reached at dean@network.com.


I find computer books that deal with algorithms more interesting than, say,
entry-level C or C++ books. Algorithm books provide tools I can use directly
to build programs, and generally are lasting reference works. Practical
Algorithms for Programmers, by Andrew Binstock and John Rex, and Practical
Algorithms in C++, by Bryan Flamig, both fit this description.
Although these two books cover much of the same ground, they approach the
topic from different angles. The focus of Binstock/Rex is on the algorithms
(expressed in C), while that of Flamig is more on expressing algorithms in
C++. In other words, Binstock/Rex is an algorithm book; Flamig is a C++ book
covering algorithms.
In general, Binstock/Rex is more accessible than Flamig, both in writing and
coding. The authors go into detail about the trade-offs among the variants of
various algorithms. They do a good job of giving the information you need to
make an informed decision as to the best version for a particular application.
Flamig, on the other hand, is briefer and tends to say less about the possible
options. As to the coding, Binstock/Rex is clearer, with comments explaining
each section of the program. Comparatively, the code in Flamig is somewhat
terse. (I'll admit that part of my preference for Binstock/Rex is that I like
their indentation style more than Flamig's. Also, Binstock and Rex use more
white space, which makes the code more readable.)


Reading for Reference


For the most part, Practical Algorithms for Programmers is better organized
than the Flamig book. For example, Binstock/Rex thoroughly cover string
searching in one chapter (including some interesting approximate-string
algorithms), then move on to cover other topics. Flamig, however, discusses
string searching in part of one chapter, then returns to the topic again in a
later chapter on finite-state machines. Although it makes some sense to put
that particular algorithm (the Aho-Corasick string-matching algorithm) in the
chapter on FSMs, a potential reader is likelier to say "I want an algorithm
for string searching" than "I want an algorithm using finite-state machines."
For reasons such as this, Binstock/Rex is the better of the two as a reference
work. 
Although it's probably a small thing, I also found the index in Binstock/Rex
to be more helpful and complete than the index in Flamig. I tested this by
looking up "string searching" in the index to each book; there was an entry
for it in Binstock/Rex, but none in Flamig.
I also prefer the references in the Binstock/Rex book. They use notes at the
end of each chapter, with informative, explanatory sentences. Flamig has just
a bibliography at the end of the book. While the latter is more scholarly, I
find going to the back of the book to find further references breaks up the
flow of text. I prefer all the information on one topic to be together. I also
like Binstock/Rex's references because they mention the drawbacks of the
article they're referring to. For instance, in listing the ElfHash routine,
they warn you that some published versions of the routine leave out a crucial
character. The erroneous version would cause the function to return zero every
time it was called, which isn't exactly desirable.
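The routine in question is presumably the ELF object-file hash specified in the System V ABI, and the crucial character is, as far as I can tell, the tilde in the last statement of the loop:

```cpp
#include <cassert>

// The ELF hash, as specified in the System V ABI. The '~' in
// "h &= ~g" is the crucial character: erroneous published versions
// compute "h &= g" instead, which zeroes h on every iteration in
// which the top nibble g is empty -- so the function returns 0.
unsigned long ElfHash(const char *name)
{
    unsigned long h = 0, g;
    while (*name) {
        h = (h << 4) + (unsigned char)*name++;
        g = h & 0xF0000000UL;
        if (g)
            h ^= g >> 24;  // fold the top nibble back in
        h &= ~g;           // then clear it
    }
    return h;
}
```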


Beyond C and C++


If what you're looking for is a C++ book that goes beyond teaching the
language, Flamig is fine. Hidden within its code listings are some interesting
practical uses of some of the subtleties of C++. One example is combining the
placement new operator with a class's copy constructor to construct an
instance of that class in place.
One interesting idea in Flamig (which forms the basis of a good portion of the
book) is that of generators. A generator is something like an iterator (as
widely represented in C++ literature and class libraries). However, rather
than stepping through an existing set of objects, as an iterator does, each
call to the generator generates the next object in the set. I think the idea
of generators could lead in several potentially useful directions. However,
the author's introduction of them results in a lengthy discussion of unwinding
recursion into iteration. This ends with some semireadable code involving GOTO
statements. (I'm afraid I'm not as well-disposed toward GOTOs as Flamig is.)
Although I find the idea of generators intriguing, I'm not quite sure that
it's worth the cost of the readability of the resulting code.
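The distinction is easy to sketch. An iterator walks a container that already holds the objects; a generator of the kind Flamig describes computes each object on demand, so the full set never exists in memory. Here is a minimal example of my own (not Flamig's code), generating the Fibonacci sequence one element per call:

```cpp
#include <cassert>

// A generator in Flamig's sense: each call to next() produces the
// next object in the (conceptually infinite) set, rather than
// stepping through a stored collection as an iterator would.
class FibGenerator {
    unsigned long a, b;   // the next two values of the sequence
public:
    FibGenerator() : a(0), b(1) {}
    unsigned long next()
    {
        unsigned long r = a;  // the value to hand out
        unsigned long t = a + b;
        a = b;
        b = t;
        return r;
    }
};
```

Successive calls yield 0, 1, 1, 2, 3, 5, and so on; the caller cannot tell whether the values are being fetched or computed, which is what makes the idiom interesting.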
In some sense, the main advantage to Flamig isn't so much the algorithms as
the practical examples of class construction and other routines using C++. It
is not, however, a guidebook for object-oriented design. Except for the idea
of generators, the entire question of object orientation is relatively
incidental to the book's presentation. But since there are many other books on
object-oriented design out there, I don't consider this a flaw in the book.


Beyond the Ordinary


Quite often, Flamig's Practical Algorithms in C++ deals with some slightly
more advanced topics--permutations, graphs, finite-state machines, and the
like. Binstock/Rex's Practical Algorithms for Programmers, however, takes on
some out-of-the-ordinary topics: date and time calculations,
arbitrary-precision arithmetic, and data validation and integrity. The
material on heaps is much more complete in Flamig than it is in Binstock/Rex.
Binstock/Rex, though, covers more basic data structures: linked lists, trees,
and the like. To be fair, Flamig apparently covered much of this in his
companion book Practical Data Structures in C++ (John Wiley & Sons, 1993, ISBN
0-471-55863-X).
Hashing is a good example of the two books' relative coverage of a specific
topic. (Hashing is a method of storing data so that it is retrievable via what
is essentially a table lookup. A hash function converts the key for the item
being stored into an index into the hash table.) Both books cover the topic
adequately--in fact, both present the same optimal hash function, taken from
the same source. However, Binstock/Rex goes into greater detail in explaining
the trade-offs involved in the various types of hashing (or, more accurately,
collision handling). Practical Algorithms for Programmers also details some
ideas on how to get good performance out of hashing. Flamig covers hashing
more broadly, discussing, among other things, a class that implements
file-based hashing (included on disk) and rebuilding the hash table when it
grows too large.
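As a concrete illustration of the parenthetical definition above (my own sketch, not code from either book), here is a tiny hash table using separate chaining, the simplest of the collision-handling schemes the books compare:

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Separate chaining: each slot holds a list of key/value pairs,
// so keys whose hash values collide simply share a slot.
class ChainedTable {
    std::vector< std::list< std::pair<std::string, int> > > slots;
    unsigned hash(const std::string &key) const
    {
        unsigned h = 0;   // a simple multiplicative string hash
        for (std::string::size_type i = 0; i < key.size(); i++)
            h = h * 31 + (unsigned char)key[i];
        return h % slots.size();
    }
public:
    ChainedTable(unsigned nslots) : slots(nslots) {}
    void insert(const std::string &key, int value)
    {
        slots[hash(key)].push_back(std::make_pair(key, value));
    }
    bool find(const std::string &key, int &value) const
    {
        typedef std::list< std::pair<std::string, int> > Chain;
        const Chain &chain = slots[hash(key)];
        for (Chain::const_iterator it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) { value = it->second; return true; }
        return false;
    }
};
```

Even with only four slots, forcing collisions, every key remains retrievable; the trade-offs the books discuss concern how long the chains (or probe sequences) grow as the table fills.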


Conclusion


Both Practical Algorithms for Programmers and Practical Algorithms in C++
present a wide range of algorithms in source-code form and go into detail when
describing specific algorithms. Neither book leaves algorithms as exercises
for the reader, as do many algorithm books that are designed as textbooks.
I prefer Binstock/Rex's Practical Algorithms for Programmers because it
covers more ground. However, it's a close race. Flamig's focus on C++ issues
is a strong pull. Although Flamig's book is more theoretical than that of
Binstock/Rex, it still keeps close to the "practical" in its title. (The
practicality of the algorithms in Binstock/Rex speaks for itself.)
Binstock/Rex has a minor disadvantage in that getting the source code on disk
requires a separate payment to the publisher, because it's not included with
the book. To me, however, that's a small problem.
Practical Algorithms for Programmers
Andrew Binstock and John Rex
Addison-Wesley, 1995 512 pp., $29.95
ISBN 0-201-63208-X
Practical Algorithms in C++
Bryan Flamig
John Wiley & Sons, 1995 450 pp., $43.95
ISBN 0-471-00955-5


































































SWAINE'S FLAMES


Veronica, Billionaires, and Spelling


Veronica, n. A technique in which the matador stands immobile while passing
the cape slowly before the charging bull.
Veronica, n. A technique in which the system stands immobile while passing
menu items slowly before the dozing user.
The similarity sorta makes you wonder if the traditional Spanish protocol for
interacting with bulls influenced the traditional Minnesota protocol for
interacting with Gophers.
Speaking of rodents, apparently Steve Jobs is now a billionaire. (Oh, maybe
that was a little obscure. Here's what I was thinking: Steve's company Pixar
is working closely with Disney on some all-digital-animation feature films,
and Disney is closely associated with a mouse. Hence the rodent reference, you
see.)
Anyway, with the completion of Pixar's first film, Toy Story, the paper value
of Steve's holdings adds up to more than he ever had while at Apple. There are
enough ironies here to satisfy even a journalist's appetite, not the least of
them being that it was Pixar, acquired from George Lucas in 1986, rather than
NeXT, the company Steve built from scratch after leaving Apple, that
ultimately paid off. If Steve is a role model, the message for Jobs wannabes
is apparently, "Hitch your wagon to someone else's cast-off dream."
"I don't think I'm personally a role model," Bill Gates told Newsweek
interviewers in a special "Kissing Up to the Billionaire" issue of Newsweek
that contained an excerpt from Bill's book, The Road Ahead. Pass the word.
America's youth needs to know this.
I hope you appreciate the sacrifices I make for you. For example, I read
everything written about Billionaire Bill so that you don't have to. What you
get here is just the cream. You won't read, for example, "The King of Comdex,
if not the computer industry, if not the future itself. The richest man on the
planet, and maybe the smartest: William Henry Gates III." You'd read that in
the KUTTB issue of Newsweek, under Steven Levy's byline. 
<Murphy Brown voice> 
"Jeez, Steven, did you really write that tripe, or just hold your nose while
an editor lathered it on?" 
</Murphy Brown voice>
Everybody is writing about Bill. Here, though, is the bottom line. Ready?
It's not about Microsoft and how it's missing the Internet window or not
missing the window, or how Netscape is going to be what Microsoft might have
become, or Microsoft is really going to become what Netscape is expected to
become, or about Bill's vision of the future, which is a mosaic (you should
pardon the expression) of other people's dreams, whether cast-off or not.
It's not about his divinity, another topic explored in Newsweek, or his house,
or his attitude toward Apple or IBM or the Justice Department.
It's about his attention span.
Many of the pioneers of the information revolution were in it for the thrill
of the new, and when it started to get familiar, they lost interest. Some left
and did something else, like Mitch Kapor, who became a lobbyist. Some stayed
and went through the motions, like, well, you know a few, I'll bet. And some
were one-trick ponies who had nothing to offer after the first trick.
Here's the bottom line: Bill won't leave. He won't lose interest, he won't get
bored, and he is not a one-trick pony. He is, though, a one-track mind, and
that track is software. As long as there's software, expect him to be involved
with its making and distributing.
Now isn't that a thought to wake you in the middle of the night in a cold
sweat? 
Snit of the month. The growing use of e-mail, not to mention Web-page
publishing, threatens to reverse the trend toward illiteracy among the
supposedly educated without at the same time improving their spelling. The
following flame is intended only for those who need it: The word "independent"
contains no "a"s. Zero. None. Nary an "a". It is not spelled "independant."
Don't make me tell you again. The word "separate," on the other hand, has two
"a"s. It is not spelled "seperate." There is no common English word that
begins "sepe-". Flush that sequence from your cache.
I wouldn't have to spend time on this stuff if you would all just shape up.
Michael Swaine
editor-at-large
mswaine@cruzio.com


































OF INTEREST
Walnut Creek CD-ROM has announced The Official POV-Ray CD-ROM, an officially
sanctioned compilation of resources for the freeware POV-Ray raytracer by a
POV-Team member. It is both PC and Mac compatible, with preformatted
Mac-folder layouts. More than a simple collection of files, the Official
POV-Ray CD is an indexed, cross-referenced guide to POV-Ray and 3-D computer
graphics. The global index has over 10,500 lines of descriptive text for the
3113 files on the CD-ROM.
Included on the CD is Version 2.2 of POV-Ray, with precompiled binaries for
MS-DOS, Macintosh, Amiga, and Linux. It also contains unofficial compiles for
AXP (NT/OSF), BSDI, FreeBSD, HPUX, Inmos T800, NeXT (68k,i386,hppa), NT
(i386), OS/2, Power Macintosh, MIPS R4000 (NT), RS/6000, SGI-IRIX, and Sun.
The full source code for POV-Ray 2.2 is provided for those wanting to compile
their own binary. The CD sells for $39.95.
Walnut Creek CD-ROM
1547 Palos Verdes Mall, Suite 260
Walnut Creek, CA 94596
510-674-0783 
http://www.cdrom.com
Microsoft has announced two new SDKs for Internet-related development. The
online Internet SDK is for content providers and Webmasters who create both
Web content and Web sites, while the Internet Business Development Kit is
targeted for resellers and system integrators. 
The Internet SDK will provide--in an online format--information and tools
traditionally delivered by disk-based SDKs. You will get access to authoring
tools, such as "Blackbird"; server software, including "Gibraltar"; browser
software such as the Internet Explorer; technologies such as Internet Server
API (ISAPI) and OLE; and libraries with current Internet product betas. 
The Internet Business Development Kit will include a beta version of the
Microsoft Windows NT Internet server (code-named "Gibraltar"); the Internet
Assistant for Word; Word Viewer, which allows people who don't use Word to
view, print, and follow hyperlinks in Word documents posted on the Internet;
the Internet Explorer 2.0, an Internet browser designed for Windows 95; and a
beta version of "Blackbird," which includes components for design, authoring,
and distribution of multimedia applications. Also included in the kit will be
reviewers' guides, white papers, demo scripts, and presentations. 
Microsoft Corp.
1 Microsoft Way
Redmond, WA 98052
206-882-8080
http://www.microsoft.com
ibizkit@microsoft.com
Great Circle, from Geodesic Systems, is an automatic memory-management system
for C/C++ programmers. The tool is designed to automatically eliminate bugs in
C/C++ programs. Unlike manual memory management, which relies heavily on
debugging, Great Circle automatically fixes memory bugs without programmer
intervention. The tool supports all C/C++ constructs, including unions,
multiple inheritance, polymorphism, interior pointers, arrays, composition,
and exceptions. The tool, which supports most popular operating systems and
C/C++ compilers, sells (in object-code format) for $300.00-$500.00 for PCs and
for $700.00-$1200.00 for workstations. C++ source code is also available.
Geodesic Systems
4745 N. Ravenswood Avenue, Suite 111
Chicago, IL 60640
312-728-7196
info@geodesic.com
ParaSoft has announced Insure++ 3.0 (formerly Insight), an automatic, run-time
error-detection environment. Insure++ lets you quickly pinpoint bugs and
provides the information necessary to fix them. In particular, the tool
detects the "most wanted" errors: memory corruption; operations on
uninitialized, NULL, or "wild" pointers; memory leaks; errors in allocating
and freeing dynamic memory; operations on unrelated pointers; and more.
Version 3.0 enhancements include
support for Sequent, Tandem, VMS, Lynx, and Linux platforms, in addition to
the previously supported Sun, HP, IBM, DEC, SGI, and x86. Version 3.0 supports
threads and precompiled headers, and also detects leaks by simply linking
(dynamic or static). Insure++ 3.0 is priced from $1995.00 for a single-machine
license. 
ParaSoft Corp.
2031 S. Myrtle Avenue
Monrovia, CA 91016
818-305-0041 
http://www.parasoft.com
Atemi has announced the release of its NetShade 2.0 network encryption
software. NetShade 2.0 can be used in conjunction with Web browsers, e-mail
systems, and other applications to secure information sent over the Internet.
NetShade v2.0 is designed to work across different platforms: For example, a
Windows user will be able to communicate securely with a Macintosh user.
NetShade combines the RSA public-key cryptosystem with a collection of
encryption ciphers, including DES and Triple DES. Users determine which
algorithm will be used to encrypt a session, depending on how much security is
appropriate. Other NetShade 2.0 features include PGP-compliant authentication
and on-the-fly connection management. The initial release of NetShade 2.0 will
support Windows 3.1 and the Macintosh. Subsequent releases will support
Windows 95, Windows NT, UNIX, and other platforms. 
Atemi Corp. 
202 West Hill Street, Suite 3 West
Champaign, IL 61820
217-352-3688
http://www.atemi.com
America Online has announced its AOL Developers Studio, a program that opens
the AOL online platform to third-party developers, enabling them to integrate
their software with AOL. The program consists of software tools for adding
online functionality; technical support and product management to assist
partner development; and marketing initiatives to showcase partner products to
the rapidly expanding AOL audience. 
The AOL Developers Studio SDK provides the following for both Windows and
Macintosh: one-button access, allowing limitless links for members to
instantly launch to sites on AOL or the Internet, and also lets developers
create added value to an existing software product; custom access, which lets
developers control the UI while enabling members to download files and
send/receive e-mail, leveraging AOL's up-to-date content, including news,
stock quotes, and mail; interapplication communications, so that you can write
add-on applications for the AOL service enabling users to access the power and
interactivity of AOL, including e-mail, chat, instant messaging, file
transfer, and other functions; and a games API, for developing interactive,
multiplayer online games for players in and out of the AOL community. 
America Online
8619 Westwood Center Drive
Vienna, VA 22182
703-918-2681
SDKPartner@aol.com
Starfish Software, the new company headed by Philippe Kahn, has released its
Dashboard SDK, an API with C/C++ and Delphi 32 source code. The Dashboard SDK
will let you create Dashboard Loadable Modules (DLMs) that can be executed
from Dashboard 95, a Windows 95 application that gives users better
organization of their work area, instant access to frequently used
applications, and a common interface across Windows 3.x, Windows NT, and
Windows 95. One benefit of DLMs is that they provide single-click access to a
custom or vertical-market application. 
Starfish Software
1700 Green Hills Road
Scotts Valley, CA 95066
800-765-7839
http://www.starfishsoftware.com
IST's OpenExchange DLL 1.0, a programming library for importing and exporting
data, lets you read and write spreadsheet files in all major formats without
the overhead of OLE or DDE. You can use the DLL, for instance, to read data
into memory variables, then use that data to update your database with your
own existing drivers. Likewise, you can output data to a new file from memory,
or transfer data directly from one file into another. The DLL is compatible
with Visual Basic, C++, Delphi, PowerBuilder, Visual FoxPro, Clarion, and
other development environments, and supports all major spreadsheet files
(Excel, Lotus, Quattro Pro, dBase, and others). The DLL sells for $295.00.
IST
P.O. Box 3774
Joplin, MO 64803
417-781-3282
74777.2651@compuserve.com
Ventana Communications has announced a pair of book/CD-ROMs for Windows
developers, one covering Delphi and the other Visual Basic. Written by Harold
Davis, the Delphi for Windows Power Toolkit covers dynamic-memory allocation
and management, creating DLLs, applying VBX controls, using DDE and OLE, and
the like. The Visual Basic 4.0 Power Toolkit, authored by Richard Mansfield
and Evangelos Petroutsos, focuses on working with databases (SQL, data
control, and the like), OLE Automation, workgroup networking (MAPI, POPMAIL,
TelNotes), and various API issues (bitmapped graphics, managing .INI files,
and more). The CD-ROMs include companion code, custom controls, shareware,
sample files, and utilities. The book/CD-ROM packages sell for $49.95 each. 
Ventana Communications
P.O. Box 13964
Research Triangle Park, NC 27709-7955
800-743-5369
http://www.vmedia.com/index.html 
Microhelp has announced its OLETools, a collection of OLE controls for
programmers using development environments such as Visual Basic, Visual C++,
and similar tools. The OLETools package includes more than 100 16- and 32-bit
OLE controls. Among the categories of controls are those for interface, date
and time, multimedia, and miscellaneous (networks, subclassing, gauges, dials,
and the like). OLETools sells for $189.00.
The company has also announced Code Complete, a suite of tools including
Splash Wizard (to add splash screens and built-in version checking), Code
Analyst (for code dissecting and cross referencing), and AutoCoder (for
standardizing module and function headers in team-development environments).
Code Complete sells for $249.00.
Microhelp Inc.
4211 J.V.L. Industrial Park Drive NE
Marietta, GA 30066
770-516-0889
76325.3207@compuserve.com
Gryphon Microproducts has released W3MAGIC for World Wide Web development.
W3MAGIC is a programming language that adds over 100 new high-level tags and
functions to HTML, from animation and dynamic buttons to counters and user
logs. It allows you to protect your data and code by encrypting it. Encrypted
code can be executed just like normal HTML, but appears unintelligible when
copied. The language works with all browsers.
In addition to providing special effects such as animation, wipes, fades, and
pushes, W3MAGIC lets you gather statistical information about visitors to your
page, including counters and user identification by country, originating link,
browser type, and pages accessed. It can display random text, random GIFs, and
dynamically generated graphs, and can perform table lookups. 
W3MAGIC, which sells for $399.00, is available for Solaris, Sun, PC Linux,
BSDI UNIX, AIX, FreeBSD, and Novell UNIX. An account with cgi-bin access
is required to use this tool.
Gryphon Microproducts 
12808 Ruxton Road
Silver Spring, MD 20904
301-384-6868
http://www.fox.net/~w3magic/ 
SuccessWare has announced an SDK for its SuccessWare Database Engine 2.0
(SDE2) (formerly known as ROCK-E-T), a replaceable database engine for Visual
Basic 4.0, Visual C++, and any DLL/OCX development environment. The
SuccessWare Database Engine technology allows for a common xbase-style
(record-based) data navigation and syntax during program development with
minimal concern for the ultimate format of the database. This leads to
movement between database environments with little or no code modification.
Also supported is database-compatible concurrent-access locking on networks. 
SDE2 supports table, index, and memo files for CA-Clipper (NTX), FoxPro 2.x
(IDX/CDX), and HiPer-SIx (NSX) systems. It also supports freeform-text
searching; fixed-length fields that automatically expand to meet the input
requirements; image/BLOBs storage and retrieval without any intermediate
files; conditional indexes; instant index filters for absolute control over
the database views; and record-level data encryption. 
The SuccessWare Database Engine 2.0 SDK is priced at $299.00 per developer
license for multiuser-application creation. It is royalty-free.
SuccessWare International 
27349 Jefferson Street, Suite 110
Temecula, CA 92590
909-699-9657
74774.2240@compuserve.com
NobleNet has announced its OneDriver ODBC SDK, which allows you to customize
external functions on either the client side or server side (or both) of a
client/server database application. This permits deployment of applications in
two-sided (thin or fat client) and two-model (two or three tier)
architectures, including Oracle, Sybase SQL Server, Informix, and others.
The SDK conforms to Microsoft ODBC 2.0 and is backward compatible with 1.0.
The OneDriver ODBC SDK provides a universal, client-based ODBC-2.0 driver that
automatically receives and interprets database calls and responses and an
SQL-based source-code module that lets you customize access to external
functions. Depending on the application functionality desired, processing is
directly or indirectly remote to a server-based ODBC driver library. The
library is used to access the selected database or databases via C source code
that mirrors the SQL code on the client. The OneDriver ODBC client-side driver
is WinSock compliant, runs on both TCP and IPX, and has been certified on over
20 WinSock-compliant products. 
NobleNet Inc.
337 Turnpike Road
Southboro, MA 01772
508-460-8222
sales@noblenet.com




























EDITORIAL


Absence of Malice


From time to time, all of us end up doing the wrong thing for the right
reason. Of course, sometimes we do the right thing for the wrong reason, or
even the wrong thing for the wrong reason--oh, you get the idea. In any case,
you can make up your mind regarding the cause and effect of the pickle that
Randal Schwartz is in once you've heard his story.
Schwartz is a respected member of the programming community. In addition to
writing the popular Learning Perl and coauthoring (with Larry Wall)
Programming Perl (both published by O'Reilly & Associates), Schwartz has been
a contributor to the comp.lang.perl newsgroup, moderator of the newer
comp.lang.perl.announce newsgroup, and Perl columnist for both Unix Review and
Web Techniques magazines. Schwartz has also made a name for himself as a
professional trainer and contractor, focusing on sysadmin and security issues.
It was two years ago this month, however, that Schwartz was indicted on three
felony charges--one count of altering computer systems without authorization,
and two of accessing a computer with intent to commit theft. The victim was
Intel's supercomputing division in Hillsboro, Oregon, where Schwartz had been
working for several years as a consultant. In July, a jury convicted him on
three felony violations of Oregon's computer crime law. Then in September, the
judge reduced the first count (which essentially charged that Schwartz had
installed two different methods of accessing his Intel e-mail through the
Internet) to a misdemeanor, then sentenced Schwartz to five years probation
and a 90-day jail sentence that will begin in 1998. For the other two counts
combined, Schwartz received 18 months probation, 480 hours of community
service, and is required to tell prospective employers about his felony
convictions. Furthermore, Intel is asking for restitution, somewhere in the
neighborhood of $70,000, even though an Intel attorney acknowledges that the
company found no evidence that Schwartz planned to use the "stolen"
information. 
In his defense, Schwartz said that he was only trying to show Intel how
inadequate its security system was. At the time, Schwartz was working under
two Intel contracts: one to deploy DNS servers for the entire corporation, and
another as a system administrator for some network-support machines. Since
both contracts were running out, he'd hoped to generate a new contract to
improve Intel's security. To that end, Schwartz ill-advisedly ran Crack, a
freely available password-breaking program that uses brute force to
discover vulnerable passwords. His plan was simply to put together a
proposal--based on real data--for improving Intel security. The sort of
information he intended to present in the proposal included nearly 50
network passwords he'd discovered (including that of one ambitious vice
president whose password was "pre$ident").
Before Schwartz could put his proposal together, however, an Intel employee
noticed an unauthorized program was hogging computer time. Upon discovering
Schwartz's Crack run, he notified security, and in the flip of a bit, Schwartz
went from being an "independent consultant" to an "industrial spy." Even
though management recommended that Schwartz simply be confronted because there
was clearly no criminal intent at work (Schwartz ran Crack under his own login
and didn't try to conceal his efforts), Intel's jackbooted security team
(maybe needing to justify their jobs) opted to call in the sheriff's
department. 
Schwartz admits that he made a number of "bone-headed" mistakes--not
clarifying the rules about Internet access, not reporting the first cracked
password, not immediately reporting the results of the run--for which he
probably deserved termination. However, he also says that his actions "were
motivated by my desire to give Intel the best possible value for the money
they were paying me," adding that none of his acts were based on malicious
intent. In summary, Schwartz said: "I am sorry that I caused Intel any grief
or hardship, and that in hindsight, I should have been clearer about my
intention and actions." 
The upshot of all this is that Schwartz is in a financial bind. There's little
chance he will ever work at Intel again, even though he has given the company
five years of good service. Nor is he likely to work at any company that
agrees with Intel's beliefs about him. With dim employment prospects, Schwartz
has so far spent about $135,000 on his defense. When it's all said and done,
he will probably end up paying $160,000 before even considering appeals.
A legal defense fund has been set up for him, and fellow programmers have
"paid" Schwartz for "services rendered" to the tune of about $15,000. If you
wish to contribute, make a check out to "Stonehenge" and send it to Stonehenge
Consulting Services, Attn: Legal Defense Fund, 4470 SW Hall Suite 107,
Beaverton, OR 97005-2122. Any money you contribute will be disclosed as income
by Schwartz and thus is not tax deductible for you, unless you're a business
and want to file a 1099 form on him. I've sent in my check, and hope you'll
send in one, too.
Jonathan Erickson
editor-in-chief














































LETTERS


C++ Standards


Dear DDJ,
I was disappointed by Al Stevens' "C Programming" column in the January 1996
issue, at least the part that dealt with C++. 
Al starts out lamenting the short comment period that the C++ Standard
committee allowed. I agree with his sentiment, but unfortunately, if the rest
of his comments on C++ are any indication of what the committee would have had
to put up with, then I am afraid I side with the committee.
First there was ifstream::read and ofstream::write. Maybe the length parameter
should be type size_t instead of int, but ifstream/ofstream are supposed to
read/write "characters" from files, therefore, having the other parameter be a
char* (instead of a void*) seems appropriate to me. In C++, the correct way to
read/write objects to/from an fstream is to define operator>> and operator<<
for the class, not just arbitrarily cast things to either void* or char*.
Then Al brings up "A New for Statement." I stood up and cheered when I read
that the committee had changed the scope of a loop index declared in the first
expression of the for statement. The fact that the committee had the guts to
fix something that was clearly wrong--in spite of the whining about breaking
code that they were undoubtedly going to get from hackers who have been
exploiting this hole in the language--gave me confidence that the
standardization effort was going to give us something worthwhile. For years,
my own coding style guide has had a sentence that reads "If you need access to
the loop index outside of the loop, then declare it OUTSIDE OF THE LOOP." The
suggestion that this change breaks a lot of code does not move me. Locating
all the for statements that need to be changed, and moving the declaration, is
a straightforward mechanical exercise.
Finally, there is the "enum as a Type" problem. I will admit that my reading
of the April 1995 draft has not always been correct, but this case seems
straightforward enough. An enum is a type; it is not an arithmetic type; and
the increment operator can only be applied to arithmetic types (or pointers
other than void*). So I have to say that it seems that Visual C++ 4.0 has got
it right. More importantly, this is logically what you would expect. In the
general case, enumeration constants do not have to be monotonically
increasing, uniformly spaced, or even discrete. Therefore, there is no logical
meaning for increment when applied to a variable of enumeration type. Again, I
am sure this breaks a lot of code since the committee went to the trouble of
specifying increment for type bool (I can not understand the appeal of writing
f++; instead of f = true;). Unfortunately, the code was broken anyway; all the
committee has done is make the compiler vendors flag it as an error.
Maybe it was being forced to write Cobol for two years, but I just do not
understand the apparent mentality of programmers who think that minimizing the
number of characters used in the source somehow produces better programs. The
real problem with the C++ standard is not going to be fixing all the code it
breaks, but convincing all the latent C programmers that there still are
enough language loopholes left in C++ that they can use to write programs. 
Jack W. Reeves
76217.2354@compuserve.com
Al responds: First: So, fstream objects are meant for character input and
output only? Why do you suppose the specification includes the ios::binary
open mode? And, overloaded insertion/extraction operators are the only correct
way to read and write data? Makes you wonder why they included the read and
write member functions. Sorry, we'll have to disagree on that one. Second: I'm
glad that you like the new for statement declaration rules. I'm sorry that you
think I'm a whining hacker because I like the old way. We disagree on both
halves of that issue. The old way worked and was okay, and, to paraphrase a
former president, "I am not a whining hacker." Third: I agree that enums
could be a type, and do not mind that the rules might have changed, but I
disagree that the specification is clear and unambiguous about it. So, you get
two thumbs down and a horizontal thumb. But this exchange is about
language-feature preferences--yours and mine. Programmers are expected to
engage in civil disagreement about those kinds of issues. The point of the
original piece, however, was to use those arguments, not merely to express my
opinions about them, but also to show how the committee applies the broken
code position when it suits their purposes and ignores it when it does not.
Heed these reflections from a proud "latent C programmer" and former Cobol
practitioner of many more than two years. When you take a stand and "side with
the committee" against anyone who calls into question their policies and
decisions, remember this: There are no sides to be taken. When the air clears
and the specification is set in stone, there will not be winners and losers.
We will all be in this together.


Dateline Again


Dear DDJ, 
I have not programmed in Basic since I installed OS/2 almost three years ago
and discovered REXX, so I am not about to comment on the algorithms Homer
Tilton presented in his January 1996 letter to the editor. In addition, I have
not programmed calendar conversions since my article "Calendar Conversions"
was published in the late (lamented) Programmer's Journal (November/December
1990). The Basic program presented therein was for compiled Basic only, as it
required long integers (not available in the interpreters then available) and
used a separately compiled module of conversion subroutines. It did not
require splitting the year on April Fool's Day. The program would calculate
the day-of-week and Julian Day number for any date following January 1, 4713
b.c. for the Julian calendar; or following October 15, 1582 for the Gregorian
calendar (the first day that calendar system was in use, anywhere), and the
equivalent date in the other calendar system (if valid), starting from
whichever calendar system was specified as an input parameter.
Just for the fun of it, I dug up my archived DOS-source code for that program
(written for Basic PDS 7) and recompiled it to an OS/2 executable file. When I
gave it the input parameter of 1/1/-4713, the program told me that Julian date
January 1, 4713 b.c. was a Monday, that it was Julian Day 0, and that the
equivalent Gregorian calendar date was invalid. So, Homer's calendar for
January, 4713 b.c., as shown in his letter, is correct.
Murray Lesser
Murray.Lesser@f347.n109.z1.fidonet.org


Crypto Attacks


Dear DDJ,
Bruce Schneier's article "Differential and Linear Cryptanalysis" (DDJ, January
1996) is the best I've read on that topic. He really has a gift for writing.
Dorothy Denning
denning@cs.cosc.georgetown.edu 


Netscape Animation


Dear DDJ,
In his article "Animation Using the Netscape Browser" (Dr. Dobb's Sourcebook
on Internet and Web Development, November/December 1995), Andrew Davison
forgot a minor point: Netscape must be set to verify documents every time, or
this will not work--Netscape will pull the documents from the cache rather
than from the server.
Ed Carp 
Dallas, Texas 
ecarp@netcom.com


Executable Content


Dear DDJ, 
I was interested to see the dates listed by Michael Doyle in his January 1996
DDJ letter to the editor. I too saw many demonstrations of executable content.
The most memorable and vivid was an e-mail message I received from Keith Ohlfs
in 1991 (he's the guy who made all those cool icons at NeXT). When I opened
his e-mail message, a set of Display PostScript commands executed that: 
Had an animation of Keith jump out of my e-mail viewer,
Had him rollerblade across my screen, beeping as he bounced off the windows I
had open,
Before he left my screen he shot a ball out of his mouth that expanded to the
size of my e-mail window,
When the ball had stopped spinning, it displayed his new e-mail address. 

Keith always did have a flair for coolness. This example seems to meet all of
the prior art criteria: sending a message across the Internet, the content
executing on my PC. Display PostScript is a stack-based language that has
support for threads, libraries, and the like. Maybe Adobe should be awarded a
patent on executable content! 
Thor Heinrichs-Wolpert 
Victoria, B.C., Canada 
twolpert@ca.oracle.com 


More on Color Quantization


Dear DDJ,
My congratulations to Dean Clark on his well-written article, "Color
Quantization Using Octrees" (DDJ, January 1996). As the author noted, the
octree algorithm and Paul Heckbert's popularity and median-cut algorithms are
those most commonly used for color quantization.
However, there are many other color quantization algorithms that have been
presented in the computer science, image display, and photographic-sciences
literature. Some offer better time or space performance, and others are
optimized for quantizing color-video sequences. Interested readers can
download a comprehensive bibliography of some 70 color quantization algorithm
references as CQUANT95.ZIP from http://www.ledalite.com/library/cgis.html, or
via anonymous ftp as /pub/doc/cquant95.Z from hobbes.lbl.gov.
Ian Ashdown
iashdown@ledalite.com 

















































Dr. Dobb's Journal Excellence in Programming Awards


Jonathan Erickson


Please join us in congratulating Larry Wall and James Gosling, recipients of
the 1996 Dr. Dobb's Journal Excellence in Programming awards. Selected by a
special Dr. Dobb's Journal editorial committee, Larry and James are being
honored for the significant contributions they've made to the advancement of
software development. 
Befitting this month's focus on computer languages, both individuals are being
recognized for their efforts in developing languages used by millions of
programmers worldwide. Larry is the author of Perl (short for "Practical
Extraction and Report Language"), and James is the chief architect of Java. 
Perl, the general-purpose scripting language Larry created nearly a decade
ago, has been described as the "Swiss-Army chain saw" of UNIX tools. Over
time, it has become the language of choice among system administrators for
quickly cobbling utility programs together. However, it is the dramatic
popularity of the Internet and World Wide Web that has thrust Perl into the
forefront of professional programming. Much of the back-end processing on Web
servers is done with CGI scripts written in Perl. Still, Perl's use in
mission-critical applications ranging from financial transactions to
aeronautical design is a far cry from Larry's original intent of using Perl
simply as a tool for manipulating text, navigating files, invoking external
commands to obtain dynamic data, and printing out easily formatted reports.
Perl currently runs on DOS, Macintosh, Amiga, VMS, OS/2, Windows, and many
flavors of UNIX. 
Interestingly, Larry didn't start out as a programmer. He's a linguist by
training, having attended both the University of California at Berkeley and
UCLA. Larry has written many free programs, including the rn news reader and
the patch program. He's also known for metaconfig, a program that writes
Configure scripts. With Randal Schwartz, Larry authored the 1991 book
Programming Perl. 
Larry's initial design allowed Perl to function as a data-reduction language,
but Perl also became a convenient file-manipulation language, with facilities
for file renaming, deleting, moving, and attribute-changing. As the language
developed, Larry added features to make Perl a useful process-manipulation
language: On the appropriate operating system, you can easily create and
destroy processes and control the flow of data between them. Perl also became
a powerful networking language allowing access to resources on local networks
and on wide-area networks, including the Internet. A number of Web servers are
written in Perl. The smallest Web server, TinyHttpd, consists of 190 lines of
Perl. A full-featured server, Plexus, offers multithreading and authentication
in 800 lines of Perl.
Few computer languages have arrived on the software-development scene with
such widespread fanfare as Java, a programming language unleashed by Sun
Microsystems last year. Even nonprogrammers who wouldn't know C++ from Basic
are talking about Java. Within the last few months alone, Java (and its
offspring JavaScript) has been endorsed by virtually every major software
vendor. What Java delivers, and what has developers excited, is the capability
to compile programs into a binary format that can be executed on many
platforms without recompilation--embedded executable content, in other words. 
In addition to his duties as a Sun Fellow and corporate vice president, James
Gosling was the lead engineer on the Java project. As Arthur van Hoff, author
of the Java compiler, recounted in one of the first published articles on the
language ("Java and Internet Programming," DDJ, August 1995), James and a
small team of Sun engineers began work on what became Java in 1990.
Originally, James was developing software for the consumer electronics market,
but he quickly recognized that the new language also addressed many of the
issues related to software distribution over the Internet.
Still, Java isn't James' only claim to fame. He is a legendary figure among
UNIX programmers, having written the first C implementation of Emacs and the
PostScript-based dynamic windowing environment for SunOS known as "NeWS."
As Ray Valdés observed in a Dr. Dobb's Developer Update (August 1995) article
on Java, "the maturity and experience of Gosling's vision is apparent to those
... who've programmed extensively in Java. From the start, the heft and
balance of the language feels right, and continues to wear well over time."
James first became involved in distributed computing upon his arrival at Sun
in 1984. Before joining Sun, he built a multiprocessor version of UNIX, the
original Andrew window system and toolkit, and several compilers and mail
systems. James received a BS in computer science from the University of
Calgary, Canada, and a PhD from Carnegie-Mellon University.
Java is a simple, object-oriented, multithreaded, garbage-collected, secure,
robust, architecture-neutral, portable, high-performance, dynamic language
that's similar to--yet simpler than--C and C++. In designing the language,
James has come up with some clever tricks to increase performance while
preserving platform independence; for example, a technique for the run-time
binding of symbolic references to numeric offsets by overwriting the bytecode
stream with equivalent _quick instructions. 
Since its introduction last summer, Java has grown from being a language
specification to a complete development environment that includes a compiler,
interpreter, debugger, applet viewer, language run time, class libraries, and
more. Although the Sun Java Development Kit only runs on Windows NT/95 and
Solaris 2.3 (or higher) at this writing, the language itself has been ported
to Linux, DEC Alpha, Amiga, NeXT, Windows, SunOS, and the like. Sun also has
committed to porting Java to the Macintosh. 
It is significant to the Dr. Dobb's editorial committee that both Perl and
Java are, for the most part, based on the principles of openness and
cooperation that embody the guiding spirit of Dr. Dobb's Journal. Perl source
code is freely available, under the terms of Larry's "Artistic License." Java
binaries, on the other hand, can be redistributed free of charge in both
commercial and noncommercial applications. Furthermore, the Java source is
available at no charge for educational, research, evaluation, and
noncommercial porting purposes.
In addition to being acknowledged at the Software Development '96 Conference
in San Francisco, Dr. Dobb's Journal is granting $1000 scholarships--in Larry
and James' names--to university programs of their choice. At Larry's behest,
the grant will be given to the computer-science department at Seattle
Pacific University. James has requested that his award be granted to the
University of Calgary Alumni Fund.
Please join us in congratulating both Larry and James. Through their work,
they've reminded us that a mix of technology, innovation, vision, and a
cooperative spirit continues to be fundamental to software development.
Figure 1: Larry Wall, author of the general-purpose scripting language, Perl.
Figure 2: James Gosling, the chief architect of Java.







































Dylan's Creole Interface


Interfacing a little language to a big database




Edward Cessna


Ed, a senior software engineer with the Harlequin Group, can be contacted at
eec@harlequin.com.


Dylan is an object-oriented dynamic language originally developed by Apple
Computer, with input from Harlequin and Carnegie Mellon University. From the
outset, the Dylan language design has been in the public domain. But because
the term "Dylan" is trademarked, anyone implementing the language must obtain
Apple's permission to use the name. Several organizations have announced
commercial-quality Dylan releases, including Harlequin's forthcoming
DylanWorks, a dynamic development environment and native compiler that
produces fast, compact executables. DylanWorks will initially be released for
Windows 95 and Windows NT, and provides full interoperability with OLE and
Win32 functionality.
Likewise, Carnegie Mellon University is developing an integrated Dylan
development environment for UNIX called "Gwydion." This is on the heels of
"Mindy," a byte-code compiler for UNIX, Mac, OS/2, and Windows. Other
experimental Dylan implementations include Marlais, an experimental Dylan
interpreter written in C for UNIX, Macintosh, and Windows; and Thomas, a Dylan
interpreter written in Scheme. The current version of Apple's implementation,
Apple Dylan, is an integrated development environment for the Macintosh
available as a "technology release." The environment generates stand-alone
applications and libraries for 68K Macs, native PowerPC, and fat binaries. The
environment is not PowerPC native, and runs emulated on PowerPCs.
The Dylan programming environment includes Creole, a foreign-function
interface that allows Dylan programs to call routines written in other
languages--and routines in another language to call Dylan routines. The Creole
specification (designed by David Moon), which is also in the public domain, is
the basis for the foreign-function interface used in Apple Dylan, DylanWorks,
and Gwydion. Creole provides access to the Macintosh toolbox and third-party
libraries needed to develop an application or application component for the
Macintosh. In this article, I'll describe how to interface the prealpha
version of Dylan with Version 4.02 of the Sybase RDBMS library. There are no
technical reasons for my choosing the Sybase RDBMS; I simply have access to
it. Mileage and integration steps may vary depending on the version of Sybase
or implementation of Dylan or Creole.
Dylan has just two kinds of functions--generic functions and methods--whereas
C++ has many: operators, virtual functions, overloaded functions, member
functions, and the like. A generic function is polymorphic and consists of a
set of zero or more methods that define its behavior. When a generic function
is called, it picks one of its methods (according to argument type) and
applies this method to the arguments passed to the generic function. This is
more efficient than it sounds because Dylan optimizes function calls. Methods
are specialized functions that work on just one set of argument types. This
triad is a simple class hierarchy: <function> is the base class for both
<generic-function> and <method>.


Creole Overview


Creole is neither a separate application nor a mode of Apple Dylan. It can be
viewed as an extension to the Dylan language that adds a statement, define
interface, which describes the interface to C functions, structures, unions,
global variables, and macros. Creole also adds classes, functions, and macros
to support the interaction between Dylan and C code. To accomplish this,
Creole introduces five primary concepts at the language level: interface
importation, access paths, cross-language calls, name mapping, and type
mapping. 
Interface importation involves importing a C header file. Creole imports
declarations for functions, variables, types, constants, structures, and
unions. To Creole, the fields of a structure or union are variables, and a C
macro is a function if it accepts parameters; otherwise it is a constant.
Creole maps C variables and structures and union fields into Dylan getter and
setter function pairs. These pairs act as an interface for a variable or
"slot" (similar to a C++ class member variable).
A define interface statement, which imports a header file, has a number of
options that control Creole's default importation behavior. These options
allow selective importing of declarations within a C header file, explicit
type-mapping control, and explicit name mapping to avoid name conflicts.
For each imported item, Creole creates a corresponding object or function.
Each imported structure or union becomes a class and the fields of the
structure or union become slots with corresponding getter and setter
functions. Each imported C function becomes a Dylan method with a parameter
list that corresponds to the C function's parameter list. The parameter types
for the Dylan method can be controlled by Creole's type-mapping support.
If one header file includes another, Creole does not import the declarations
within the second header. A separate #include clause or define interface
statement can be used to import this second header file.
An access path is a mechanism that loads C functions into memory, making the
functions' addresses known to the Dylan program. Creole supports four
high-level access paths: inline machine code, external module, and two shared
libraries--Apple's Shared Library Manager (ASLM) and Code Fragment Manager
(CFM). In addition, there are two low-level access paths called "direct
pointer" and "PowerPC transition vectors." The inline machine-code access path
is for accessing the toolbox and operating-system traps. The Sybase library is
an MPW C object library. The only access path available to C object libraries
(if the source code is not available) is the external module path. MPW is
required to use this access path.
A Dylan routine can call C functions (or vice versa). Creole records the
calling sequence, argument, and result type during importation of a function
declaration. Creole creates a corresponding Dylan function that, when invoked,
calls the C function, translates arguments and results, and manages the
differences between the two languages' run-time environments. Type mapping
controls the translation of the arguments and results between languages.
Name conflicts can occur because of differences between C and Dylan.
Structures and unions within C have local scope; Dylan has no corresponding
feature. C is case sensitive; Dylan is not. Creole has a set of
name-translation rules that maps a specific set of C naming conventions into
Dylan naming conventions. 
Type mapping translates a type in the imported interface to a Dylan class.
Type mapping applies to cross-language call arguments, cross-language call
results, imported variables, and fields of imported structure and union types.
Every C type is mapped to a Dylan type.
To understand how Creole does type mapping, you must understand the type
categories that Creole uses. These categories are statically typed pointers,
untyped pointers, and by-value types. Statically typed pointers are pointers
to objects with a type that cannot be determined at run time. Examples of this
are pointers to C structs. The type of such a pointer is declared statically
in the interface. The run-time object does not contain any type information.
Untyped pointers carry no information about the object they address. An
example in C is void* (and
char* in old C code). All untyped pointers are mapped to an instance of
<machine-pointer>. The C type information is not retained.
Examples of by-value types are number, character, string, and Boolean.


The Application Nub


A unique aspect of Apple Dylan is the strong separation between the
development environment and the application. The development tools are all
part of the environment, whereas the code under development is executed within
the Application Nub--essentially an empty Dylan application. The environment
downloads code and data into the Application Nub via Apple Events. The use of
Apple Events allows you to run the environment and Application Nub on separate
machines.
With other dynamic programming languages (Lisp and Smalltalk, for instance),
it is nearly impossible to separate the application from the development
environment. That's why applications developed in these languages have large
disk and RAM footprints. Apps developed in Dylan have footprints comparable to
those in C++.


Dylan Meets Sybase


The basic procedure for integrating the Sybase library (or any other C
library) into a Dylan program is to import the header files and statically
link the library into the Application Nub. Supposedly, that's all there is to
it. Nothing is this easy, however. In C, you can write ambiguous declarations
of variables and functions--and that's a problem for Dylan. More specifically,
Sybase's header files, like some UNIX header files, are full of ambiguous
declarations. 
If you are accessing a shared library (ASLM or CFM), you do not have to
rebuild the Application Nub. You simply install the shared library into the
system and start calling the desired routines from the listener or your Dylan
code.
Creole requires that imported header files be mutually independent: One header
file should not require another to be included beforehand in order to work
properly. All supporting declarations must be defined either
within the imported header file or within another header included in the
imported header file.
Unfortunately, Sybase commits a first-order violation of this rule; most of
Sybase's header files must be preceded by sybfront.h. The simple way around
this (until Creole has direct support for dependent headers) is to modify the
Sybase files to explicitly include sybfront.h. Fortunately, Macintosh header
files and standard C library header files are mutually independent. For
example, the Macintosh header file Quickdraw.h depends upon the files types.h
and QuickdrawText.h, and if you look in Quickdraw.h, you'll see that these
files are explicitly included.
Before creating your first define interface statement, copy Sybase's header
files (sybfront.h, syblogin.h, sybdb.h, and sybtoken.h) to the project
folder. Then create a new module and name it "sybase-dbms".
To import the entire contents of sybdb.h, create a source record in the module
sybase-dbms with the define interface statement in Example 1(a). When you
compile this statement, there will be warnings like Example 1(b) since this
define interface statement is a little simplistic. 
Sybase uses the pattern of Example 2(a) for most structure definitions. Since
Dylan is case insensitive, both dbprocess and DBPROCESS are mapped to the same
Dylan object, <dbprocess>. As any C programmer knows, typedef DBPROCESS is
essentially a name for struct dbprocess; hence, both dbprocess and DBPROCESS
represent the same object. Therefore, you need to import only one of them.
Example 2(b) shows what happens when you explicitly import dbprocess. Issuing
the compile command generates the error in Example 2(c).
The define statement in Example 2(b) is explicitly telling Creole just to
import the declaration of dbprocess. However, Creole does not know the data
types--except built-in data types--of the fields that comprise dbprocess. For
example, dbfile, the first field of dbprocess, is typed struct servbuf* and
Creole does not know what a servbuf is. The structure definition for dbprocess
has a number of fields, most of which are immaterial. Using Creole's struct
clause, you can explicitly import only those fields that you have an interest
in. For now, you will not import any fields; see Example 2(d). Another compile
yields no error or warning message. Creole has created a class by the name
<dbprocess>. If you type <dbprocess> in the listener, Dylan will respond with
#<the class sybase-dbms<dbprocess>>. This object (classes are objects) can
then be inspected by selecting the Inspect Listener Result menu item from the
Debug menu.


Sybase from the Listener



C functions linked into the Application Nub can be invoked directly from the
listener--a portal into the run time--which displays their results.
time. Dylan expressions can be entered into the listener: They will be
compiled, downloaded, and executed within the run time. Values returned by the
execution of the entered expression will be printed in the listener.
To illustrate, I'll import types, functions, and macro definitions necessary
to invoke the basic Sybase functions from the listener. Once everything is
imported, I will initialize Sybase's db-library, create and initialize a login
record, create and open a connection to a database process on the server,
close a database connection, and exit (cleanup) db-library from the listener.
The first function to import is dbinit, which initializes the Sybase library
and has the prototype RETCODE dbinit(void), where RETCODE is typedefed as an
int. Example 3(a) is the corresponding define interface statement. Compiling
this generates the warning message in Example 3(b). Creole is saying that the
declaration for the dbinit function has been imported, but it has no idea
where the object code of the function is. For now, you will specify the
location of the function using the external-module clause, which specifies the
access path; see Example 3(c).
The next function to import is dblogin, which creates a login record for use
in dbopen and has the prototype LOGINREC* dblogin(void). Since LOGINREC has
not been imported, it goes onto the import list. The macros DBSETLHOST,
DBSETLUSER, DBSETLPWD, and DBSETLAPP set a LOGINREC's host name, user name,
user password, and application name fields, respectively; LOGINREC itself is
defined in syblogin.h.
Examining the LOGINREC macros reveals the fields that must be imported:
lhostname, lhostnlen, lusername, lusernlen, lpw, lpwnlen, lhostproc, lhplen,
lappname, lappnlen, lservname, lservnlen, lprogname, and lprognlen.
The declarations of the various character fields (for example, char
lhostname[MAXNAME]) depend upon the size specifier MAXNAME, where MAXNAME is a
#define constant defined in sybdb.h. Note that MAXNAME is defined in sybdb.h
but is needed in syblogin.h, where LOGINREC is defined--syblogin.h does not
include sybdb.h. This creates another dependency between header files:
syblogin.h requires sybdb.h to be previously included. To handle this
dependency, you can hard-code the constant either in syblogin.h itself or
through the define clause for the #include statement (for example, #include
"syblogin.h", define: {"MAXNAME" => 30}).
You now need to import both the macros that set the fields of a LOGINREC and
the function dbsetlname, which these macros call. Creole creates inline
functions for these macros. dbsetlname has the prototype RETCODE
dbsetlname(LOGINREC* lptr, char* name, int type). The second parameter, name,
could cause a run-time problem if you try to pass a string as the second
argument. name has a C type of char*, but Creole treats it as an untyped
pointer. C programmers know that name is probably a null-terminated string,
but it could also be a pointer to a single character or to an array of
characters. There is no way for Creole (or even the C compiler) to determine
the complete data type of this argument. The behavior of the program
determines the semantics of name.
You can avoid these problems if header files are structured properly. Pointer
types (that is, char*) should be typedefed and these typedefs should be used
in place of the pointer types. If you defined a typedef for SybString (typedef
char* SybString;) in the Sybase header files and replaced all references to
char* with SybString, you would only have to tell Creole once that a SybString
maps to a <c-string>. A good example of this is StringPtr and StringHandle
defined in types.h.
When Creole reads the declaration for dbsetlname, it creates the function
dbsetlname(lptr :: <LOGINREC>, name :: <machine-pointer>, type :: <integer>)
=> return-code :: <integer>. If you passed a string for the name argument
within a dbsetlname call, you would get a type mismatch: A string, typed
<c-string> is not a <machine-pointer>. You need to explicitly define the type
for name via the function clause of the define interface statement.
Listing One presents two define interface statements: one for sybdb.h and one
for syblogin.h. Next, the dbopen function is imported, which creates a
database process on the server and opens a connection to the process. The
prototype for this function is DBPROCESS* dbopen(LOGINREC* password, char*
servername). There is nothing new to import, other than the function name.
dbopen has the same problem as dbsetlname: the argument servername is an
untyped pointer and a program needs to pass it a string. Using the function
clause solves this.
The last functions to import are dbclose and dbexit. dbclose has the prototype
void dbclose(DBPROCESS* dbproc), and dbexit has the prototype void
dbexit(void). With both functions, the only symbols you need to import are the
function names. Listing Two thus becomes the define interface statement for
sybdb.h.


Building the Application Nub


In rebuilding the Application Nub using the external access path, you create a
source record that calls the write-external-modules function. When compiled,
this function generates an assembly file that contains the definition of the
external modules. The name of the generated file is the name passed to this
function; external-modules.a would suffice.
The first step to building the new Application Nub is to bring up the project
from within Dylan and compile--without connecting to the Application Nub--each
of the define interface statements, then compile the write-external-modules
function call. If you try to compile the define interface statements while
connected to the standard Application Nub, Apple Dylan will complain that it
cannot find your external modules, since the standard Application Nub does not
have the Sybase code linked in. After you make a custom Application Nub that
includes both the Sybase code and the external module that makes it accessible
to Dylan, this problem will disappear.
Once the external modules file has been generated, you need to modify the
makefile (supplied with Apple Dylan) for the Application Nub to include the
Sybase library (libsybdb.o). Next, launch MPW (Version 3.3 or later), build
the Application Nub, and move the custom Application Nub to your project
folder. 
Before connecting to the new Application Nub, copy (or make an alias of) the
Apple Dylan file Kernel.dl to your Extensions folder or to your project
folder. Now you can go back into Apple Dylan and connect to the Application
Nub. In the prealpha version of Apple Dylan, the Application Nub must be named
"Application Nub." Apple Dylan looks for this file by name.
After the needed functions and types have been imported and the Application
Nub has been rebuilt with the Sybase library, Sybase calls can be made by hand
(from the listener). Before typing into the listener, set the module to
sybase-dbms; see Example 4.


Sybase Error and Message Handlers


Before importing functions to do queries, you can optionally install error and
message handlers for Sybase to invoke. If handlers are not installed, Sybase
will write error messages to the console (generally considered a bad practice
for a commercial app) and exit.
Installing the error and message handler is straightforward using dberrhandle
and dbmsghandle. Importing these functions is another story, starting with how
dberrhandle and dbmsghandle are declared. In the Sybase file sybdb.h,
dberrhandle is declared as int (*dberrhandle(int (*handler)()))();.
dbmsghandle has a similar declaration problem.
Creole can't handle this declaration because the type information for the
parameters of the function the handler points to is insufficient. You must
therefore create a header file with an enhanced declaration of dberrhandle and
dbmsghandle, plus a separate define interface statement that imports their
definitions. You could modify the header files directly, but it is probably
better not to modify a third-party header file. Listing Three shows the
sybfix.h include file. In Listing Four, the corresponding define interface
statement adds callback, a new clause that defines a Dylan macro. When
invoked, the macro creates a C-callable function, or "alien method," which
obeys C-calling sequences and can be passed to a C program. The callback
clause's argument-type allows explicit typing of an ambiguous argument. The
fifth, sixth, and seventh arguments of the MsgHandler callback macro are
declared as char*. Dylan does not know that these arguments are strings and,
without the argument typing, sees them as untyped pointers.
To install the callbacks, you can create the handlers as normal Dylan methods
and assign the results of the callback macros to a variable that is then
passed to dberrhandle or dbmsghandle.
The message handler takes arguments whose data types match the declaration in
the sybfix.h file. There is nothing elegant about this message handler--it
prints the value of the various parameters to the listener. The release of
Dylan I'm using doesn't support printing to the listener, so I use the
warning-signaling mechanism. The signal method is a Dylan version of printf
that takes an arbitrary number of arguments, where the first argument is the
format string that controls the printing of the remaining arguments. Since
signal is part of the signaling mechanism, it prints the word "warning" before
the formatted string; see Listing Five. sybfront.h must be imported before the
message handler can be compiled without warnings. This method uses the
constant $int_continue defined in this header file. This and other constants
can be imported using the define interface statement in Listing Six.
You can create the callback routine once the message handler is defined. The
use of the callback macro seems strange because the Dylan macro system is
incomplete; macro use may change once the system is finalized. Listing Seven
sets up a C-callable alien method that calls the message handler. The
error-handler function and ErrHandler callback are performed similarly; see
Listing Eight.


Querying Sybase


The next step is to query the database. Listing Nine lists the function
declarations that must be imported to issue a query to Sybase. dbcmd needs a
function clause: The second argument has a C type of char*. dbbind does not
need a function clause to augment the type information for destvar; you will
be passing an untyped-pointer argument.
To import the constants NO_MORE_RESULTS, NO_MORE_ROWS, NTBSTRINGBIND, DBNOERR,
and INTBIND, take Listing Ten as the final define interface statement. I'll
query against a sample table of viruses and mortality rates. The mortality
rates in Table 1 were distilled from The Cambridge World History of Human
Disease. I took liberties in summarizing the mortality rates for each virus.
In some cases, the rates were presented in terms of a range; in others, the
rates were presented for hospitalization or no care. If there were more than
one number, I took the lower one. This gives everything needed to write a
program (or a method) that queries the database and returns the results. The
oh-oh method
connects to the fictitious morbidity database on server "Oahu" as user
"virologist" with password "Aiea." Once the connection has been established,
oh-oh issues a query to the database asking for a list of viruses where the
mortality rate is greater than 50 percent. Listing Eleven shows the results of
this query, which are displayed in the listener using the signal facility. To
run oh-oh, invoke it from the listener. Example 5 shows how this method runs.


Conclusion


For more information on Dylan and Creole, refer to the Dylan Interim
Reference Manual, available at cambridge.apple.com via ftp or on the Dylan
Web page at http://www.cambridge.apple.com.


For More Information


Apple Computer
1 Infinite Loop 
Cupertino, CA 95014 
http://www.apple.com

Harlequin Inc.
1 Cambridge Center

Cambridge, MA 02142
http://www.harlequin.com

Carnegie-Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
http://cmu.edu
Example 1: (a) Using the define interface statement to import the contents of
sybdb.h; (b) compiling the statement in (a) generates this message.
(a)
define interface
 #include "sybdb.h"
end interface;

(b)
Warning in a Creole interface: "dbprocess" and "DBPROCESS" both map to Dylan
variable <dbprocess>; one will be lost.
Example 2: (a) Typical Sybase structure definition; (b) explicitly importing
dbprocess; (c) error message generated when compiling (b); (d) using Creole's
struct clause.
(a)
struct dbprocess {
 <wonderful field declarations removed>
};
typedef struct dbprocess DBPROCESS;

(b)
define interface
 #include "sybdb.h", import: {"dbprocess"};
end interface;

(c)
Fatal Error in a Creole interface: No type mapping known for "struct
servbuf*". Cannot import variable "dbprocess::dbfile" because there is no type
mapping from "struct servbuf*" to a Dylan class. You should use define
interface to import the file or use type: {"struct servbuf*" => your-class} to
add an explicit type mapping.

(d)
define interface
 #include "sybdb.h", import: {"dbprocess"};
 struct "dbprocess", import: {};
end interface;
Example 3: (a) The define interface statement that initializes the Sybase
library; (b) compiling generates this message; (c) specifying the location of
the function using the external-module clause.
(a)
define interface
 #include "sybdb.h", import: {"dbinit", "dbprocess"};
 struct "dbprocess", import: {};
end interface;

(b)
Warning in a Creole interface: No access path to "dbinit" exists. On the 68k
we can only call things via inline machine code or one of the access paths to
external object code, which are External Modules, the Code Fragment Manager,
and the Apple Shared Library Manager. This means that there is no way to call
"dbinit". An attempted call will signal an error at runtime.

(c)
define interface
 #include "sybdb.h", import: {"dbinit", "dbprocess"},
 external-module: syb-externals;
 struct "dbprocess", import: {};
end interface;
Example 4: Setting the module to sybase-dbms before using the listener
(boldface denotes user input).
Welcome to Apple Dylan!
Dylan> dbinit()
1
Dylan> define variable login = dblogin()
define login
Dylan> dbsetlhost(login, "Oahu")
1
Dylan> dbsetluser(login, "tourist")
1
Dylan> dbsetlpwd(login, "Aiea")
1
Dylan> dbsetlapp(login, "example1")
1
Dylan> define variable dbproc = dbopen(login, "db")
define dbproc
Dylan> dbproc
#<Dylan-User%<dbprocess> at: #x019FE7C0>
Dylan> dbclose(dbproc)
Dylan> dbexit()
Dylan>
Example 5: Running the sample method (boldface denotes user input).
Welcome to Apple Dylan!
Dylan> oh-oh()
Warning: Ebola Zaire 90

Warning: Ebola Sudan 52
Warning: Bubonic Plague 51
#f
Table 1: Sample mortality rates from The Cambridge World History of Human
Disease. 
Virus Mortality
Yellow Fever 20
Typhoid Fever 10
Smallpox Variola Major 25
Smallpox Variola Minor 1
Rocky Mtn. Spotted Fever 20
Relapsing Fever 5
Marburg Fever NULL
Leptospirosis 5
Legionnaires' Disease 15
Lassa Fever 16
Influenza 1
Ebola Sudan 52
Ebola Zaire 90
Bubonic Plague 51

Listing One
define interface
 #include "sybdb.h",
 import:
 {"dbinit", "dbprocess", "dbsetlname",
 // Parameterized Macros
 "DBSETLHOST", "DBSETLUSER", "DBSETLPWD", "DBSETLAPP",
 // Constant Macros
 "MAXNAME"},
 external-module: syb-externals;
 struct "dbprocess", import: {};
 function "dbsetlname",
 argument-type: {name => <c-string>};
end interface;
define interface
 #include "syblogin.h",
 define: {"MAXNAME" => 30},
 import: {"loginrec"};
 struct "loginrec",
 import:
 {"lhostname", "lhostnlen",
 "lusername", "lusernlen",
 "lpw", "lpwnlen",
 "lhostproc", "lhplen",
 "lappname", "lappnlen",
 "lservname", "lservnlen",
 "lprogname", "lprognlen"};
end interface;

Listing Two
define interface
 #include "sybdb.h",
 import:
 {"dbinit",
 "dbprocess", "dblogin", "dbopen",
 "dbclose", "dbexit", "dbsetlname",
 // Parameterized Macros
 "DBSETLHOST", "DBSETLUSER", "DBSETLPWD", "DBSETLAPP",
 // Constant Macros
 "MAXNAME"},
 external-module: syb-externals;
 struct "dbprocess", import: {};
 function "dbopen",
 argument-type: {servername => <c-string>};
 function "dbsetlname",
 argument-type: {name => <c-string>};
end interface;

Listing Three
#include "sybfront.h"
#include "sybdb.h"
typedef int (*ErrHandler)(DBPROCESS* dbproc, int severity, int dberr, 
 int oserr, char* dberrstr, char* oserrstr);
ErrHandler dberrhandle(ErrHandler handler);
typedef int (*MsgHandler)(DBPROCESS* dbproc, int msgno, int msgstate, 
 int severity, char* msgtext, char* srvname, 
 char* procname, unsigned short line);
MsgHandler dbmsghandle(MsgHandler handler);

Listing Four
define interface
 #include "sybfix.h",
 import:
 {"dberrhandle", "dbmsghandle", "ErrHandler",
 "MsgHandler"},
 external-module: syb-fix-externals;
 callback "ErrHandler",
 argument-type: {5 => <c-string>},
 argument-type: {6 => <c-string>};
 callback "MsgHandler",
 argument-type: {5 => <c-string>},
 argument-type: {6 => <c-string>},
 argument-type: {7 => <c-string>};
end interface;

Listing Five
define method message-handler(dbproc :: <dbprocess>,
 msgno :: <integer>,
 msgstate :: <integer>,
 severity :: <integer>,
 msgtext :: <c-string>,
 srvname :: <c-string>,
 procname :: <c-string>,
 line :: <integer>)
=> action :: <integer>;
 ignore(dbproc);
 signal("~&Msg ~d, Level ~d, State ~d", msgno,
 severity, msgstate);
 if (size(srvname) > 0)
 signal("Server '~a', ", srvname);
 end if;
 if (size(procname) > 0)
 signal("Procedure '~a', ", procname);
 end if;
 if (line > 0)
 signal("Line ~d", line);
 end if;
 signal("~&~a", msgtext);

 $int_continue;
end method;

Listing Six
define interface
 #include "sybfront.h",
 import:
 {"INT_EXIT", "INT_CONTINUE", "INT_CANCEL", "SUCCEED", "FAIL"};
end interface;

Listing Seven
define variable message-handler-proc =
MsgHandler(dbproc(msgno, msgstate, severity, msgtext, srvname, 
 procname, line), message-handler(dbproc, msgno, msgstate, 
 severity, msgtext, srvname, procname, line));

Listing Eight
define method error-handler(dbproc :: <dbprocess>, severity :: <integer>,
 dberr :: <integer>, oserr :: <integer>,
 dberrstr :: <c-string>, oserrstr :: <c-string>)
=> action :: <integer>;
 ignore(dbproc);
 ignore(oserr);
 ignore(dberr);
 ignore(severity);
 if (dbproc = $null-machine-pointer | dbdead(dbproc))
 $INT_EXIT;
 else
 signal("DB-Library error:~&~a", dberrstr);
 if (oserr ~= $DBNOERR)
 signal("Operating-system error:~&~a", oserrstr);
 end if;
 $INT_CANCEL;
 end if;
end method;
define variable error-handler-proc = ErrHandler(dbproc(severity, dberr, oserr,
 dberrstr, oserrstr),error-handler(dbproc, severity,
 dberr, oserr, dberrstr, oserrstr));

Listing Nine
RETCODE dbcmd(DBPROCESS* dbproc, char* cmdstring);
RETCODE dbsqlexec(DBPROCESS* dbproc);
RETCODE dbresults(DBPROCESS* dbproc);
RETCODE dbbind(DBPROCESS* dbproc, int column, int vartype, DBINT varlen, 
 BYTE* destvar);
STATUS dbnextrow(DBPROCESS* dbproc);

Listing Ten
define interface
 #include "sybdb.h",
 import:
 {"dbinit", "dbsetlname", "dbprocess", "dblogin", "dbopen",
 "dbclose", "dbexit", "dbcmd", "dbsqlexec", "dbdead", "dbbind",
 "dberrhandle", "dbmsghandle", "dbnextrow", "dbresults",
 // Parameterized Macros
 "DBSETLHOST", "DBSETLUSER", "DBSETLPWD", "DBSETLAPP",
 // Message/Error Handler Types
 "MHANDLEFUNC", "EHANDLEFUNC",
 // Constants
 "MAXNAME", "NO_MORE_RESULTS", "DBNOERR", "NO_MORE_ROWS",
 // Binding Constants
 "INTBIND", "NTBSTRINGBIND"},
 external-module: syb-externals;
 struct "dbprocess", import: {};
 function "dbopen", argument-type: {servername => <c-string>};
 function "dbsetlname", argument-type: {name => <c-string>};
 function "dbcmd", argument-type: {cmdstring => <c-string>};
end interface;

Listing Eleven
oh-oh
define method oh-oh()
 if (dbinit() = #f)
 error("Unable to initialize Sybase.");
 end if;
 dberrhandle(error-handler-proc);
 dbmsghandle(message-handler-proc);
 let login = dblogin();
 dbsetlhost(login, "Oahu");
 dbsetluser(login, "virologist");
 dbsetlpwd(login, "Aiea");
 dbsetlapp(login, "macabre");
 let dbproc = dbopen(login, "morbidity");
 dbcmd(dbproc, "select name, mortality ");
 dbcmd(dbproc, "from virus ");
 dbcmd(dbproc, "where mortality > 50");
 dbsqlexec(dbproc);
 for (result-code :: <integer> = dbresults(dbproc)
 then dbresults(dbproc), until result-code = $NO_MORE_RESULTS)
 if (result-code = $SUCCEED)
 // <c-string> Virus Name
 with-stack-block(name(<machine-pointer>, 31),
 // <integer> Mortality
 with-stack-block(
 mortality(<machine-pointer>, 4),
 begin
 dbbind(dbproc, 1, $NTBSTRINGBIND, 0, name);
 dbbind(dbproc, 2, $INTBIND, 0, mortality);
 while(dbnextrow(dbproc) ~= $NO_MORE_ROWS)
 signal("~A ~A", c-string-at(name),
 signed-long-at(mortality));
 end while;
 end))));
 end if;
 end for;
 dbexit();
end method;
















Building Parsers with Leopurd


A portable parser for the rest of us




Thor Mirchandani


Thor is a consultant in Winston-Salem, NC. He specializes in object technology
and development of distributed systems using C/C++ and PowerBuilder.


Whether it's from a user, script file, or serial line, input often must be
broken into logical parts before it can be processed. Many programmers resort
to nested conditions involving string comparisons, but the resulting code can
be hard to understand and maintain. Sometimes a brute-force approach is not
sufficient, and trusty old yacc is used to generate a "real" parser. The
advantage of this approach is that only the grammar must be maintained; the
disadvantage is that yacc generates table-driven parsers that are difficult to
read and modify manually. In this article, I'll present leopurd, a program
that generates a maintainable, table-free parser from an input grammar.


Top-Down and Bottom-Up Parsers


The two types of parsers in widespread use are top-down and bottom-up. All
parsers read language tokens from an input stream and examine them against a
set of production rules. Each rule has a left side--the target--and a right
side describing a valid way to reach the target. More than one production rule
can be associated with each target. 
A typical bottom-up parser determines which production rule to apply for a
given input sequence by implementing rules as lookup tables. When the parser
receives the first token, it tags all rules that can begin with that token. As
it reads more tokens, it eliminates candidate rules until only one remains.
This process involves guesswork and backtracking. The code implementing
table-driven, bottom-up parsers is illegible and difficult to modify, and
parser speed often is slower than that of top-down parsers. However, plenty of
tools generate bottom-up parsers automatically, and such parsers can always be
generated for an unambiguous grammar that requires no more than one token of
lookahead. Bottom-up parsers include those generated by yacc, bison, and similar
programs.
Most handcoded parsers are of the top-down variety, and start with the
top-level rule in the grammar. They examine input to see if they can complete
this rule. If not, other functions are called to parse any candidate rules. As
each rule is completed, the corresponding function executes action code
associated with the rule and returns control to the caller. Top-down parsers
are easier to modify and optimize than table-driven parsers--the parser can
often be made both smaller and faster. However, no standardized tools exist
for generating top-down parsers, and some grammars cannot be implemented
correctly as top-down parsers. (Fortunately, most popular computer languages
can be easily parsed this way.)
The leopurd parser combines the best features of both top-down and bottom-up
parsers. It builds legible, maintainable, optimizable top-down parsers from
standard grammars similar to yacc grammars. Leopurd generates two C source
files from an input grammar: yyleo.h, containing definitions and function
prototypes; and yyleo.c, containing the parser source code and any
user-supplied code.


The Leopurd Grammar


Leopurd's input grammar describes the syntax of the parser input. The syntax
of a language tells you only if a sentence is valid; the sentence's meaning is
the realm of semantics. Leopurd supports a subset of the yacc definition
language, but it lacks a literal block, associativity, %type and %union
statements in the definition section, and C action statements in the rules
section. Future versions of leopurd should be fully compatible with yacc.
A grammar file has three sections, separated by double percent signs (%%). The
first section is the definition section, where you define the unique parts of
a language--the language's terminal tokens. A definition begins with %token,
followed by one or more terminal-token identifiers, which can be names or
single characters within single quotes. Example 1 defines four unique tokens
for a simple calculator.
The rules section defines the syntax of the language, showing the order in
which tokens may appear in a sentence. The left side of a production rule is
an abstract (nonterminal) symbol. The right side is a combination of
nonterminal and terminal symbols, arranged the way they appear in a sentence.
More than one right side can be assigned to each left side if separated by a
pipe symbol (|). A semicolon terminates a group of rules for a single left
side. Every nonterminal symbol referenced in a right side must be defined in a
left side somewhere in the grammar.
Example 2 defines the abstract symbol "expression" to be a sum or difference
followed by an equal sign. The second rule illustrates how recursion is used.
It defines a sum to be a NUMBER followed by '+' and another NUMBER, or a
NUMBER followed by '+' and another sum. The abstract symbol difference is
defined similarly. Both the rules and definition sections may contain any
amount of whitespace.
A right side can be empty. Unlike yacc, leopurd requires that you define the
special token EMPTY if your grammar uses empty production rules, as shown in
the productions for sum. A nonterminal symbol that matches EMPTY by some rule
is said to be "nullable." 
The final part of the grammar, the user section, is copied verbatim to the
output file yyleo.c and can be any legal C code. Here you can put code used by
the parser, such as a scanner, error handler, symbol table, global variables,
and a main() function for testing the parser; see Example 3.


Code Generation


Leopurd generates C code to implement a top-down recursive descent parser from
an input grammar. The entry point to the parser is the function yyparse(). The
parsing target for yyparse() is the top-most nonterminal symbol in the
grammar. Since there can be only one yyparse(), there can be only one
top-level target.
Because error handling varies widely between applications, leopurd generates
only the prototype of the error handler, yyerror().
Leopurd also generates prototypes for the functions next() and match(). A call
to next() causes the next token to be read from the input stream. A subsequent
call to match() evaluates whether the argument matches the token read from
input. These functions are highly dependent on the nature of the input and are
left up to you to implement.
Two functions are generated for each nonterminal symbol: <name>_parse() and
<name>_action(). The <name>_parse functions are generated completely by
leopurd and perform the actual parsing of the nonterminal symbol <name>.
The <name>_action() functions are placeholders for the language's semantics.
Leopurd generates a function skeleton, to which you add actions to be taken
upon successful parsing of the token. Separating syntax and semantics in this
way increases portability and maintainability, and simplifies testing and
debugging.


Recursive-Descent Parsing 


An abstract, top-down parser parses a sentence by building a parse tree, in
which each node represents a token. After a tree is built, the parser
traverses the tree in postorder, executing code associated with each node. If
the parser cannot generate a tree, it flags an error and returns. 
Recursive-descent parsers generate parse trees implicitly by calling parse
functions recursively. Upon a successful parse, the leaf function returns. As
each parse function returns, its action code executes. 
Parsing starts with the yyparse() function, which gets the first token from
the input by calling next(). yyparse() then determines which production rule
to use by calling match(). When a match is found, the parser does one of two
things: If it expects a terminal token, it gets the next token and checks it
by calling match(); if it expects a nonterminal symbol, it calls the symbol's
<name>_parse() function. This process continues until an error is encountered,
in which case yyerror() is called or the production rule is parsed
successfully. Upon success, the last <name>_parse() function calls
<name>_action(), which executes code associated with <name>. After
<name>_action() returns, the process continues until the whole parse tree is
unwound. Finally, any actions associated with yyparse() are executed, and
yyparse() returns.
Since functions return in the reverse of the order in which they are called,
the <name>_action() code of leaf nodes executes before code closer to the
root. Thus, a production far down in the grammar has precedence over one
further up. This is how the grammar's structure determines operator
precedence. (Right-recursive rules result in right-association for
operations.)
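The effect of grammar depth on precedence can be seen in a conventional two-level arithmetic grammar. This fragment is illustrative, not one of leopurd's bundled examples: because product sits below sum, a product always finishes parsing, and so executes its action, before the enclosing sum does.

```
expression: sum '='
 ;
sum: product '+' sum
 | product
 ;
product: NUMBER '*' product
 | NUMBER
 ;
```

Parsing 2 + 3 * 4 = descends through sum into product for 3 * 4; product_action() fires before sum_action(), so '*' effectively binds tighter than '+'. Both rules are right recursive, and therefore both operators are right-associative.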
How does the parser determine which production rule to select? How do you
determine which tokens to match against? For terminal tokens, the parser just
matches against the token itself. For nonterminals, it must know which
terminal tokens can legally begin a particular nonterminal. That set of tokens
for a nonterminal n is called "first of n" and is written FIRST(n). 
For a nullable nonterminal, you must determine which terminal tokens can
legally come after the empty nonterminal. That set of tokens is called "follow
of n" and written FOLLOW(n). Finding the FIRST() and FOLLOW() sets is an
iterative process. Once the sets are defined, leopurd knows which tokens to
match in each rule.
The resulting parser will operate properly only if the FIRST() and FOLLOW()
sets unambiguously determine which production rule to choose at each step. The
grammar in Example 1 and Example 2 is riddled with such problems.

Leopurd produces a trace of terminal and nonterminal tokens, production rules,
and FIRST() and FOLLOW() sets. The trace is directed to stdout, and can be
captured in a file for debugging or grammar verification.


A Parser Example


Listing One shows a grammar for a calculator which can evaluate addition and
subtraction expressions. When an equal sign (=) is parsed, the expression is
evaluated. 
The user section defines the functions yyerror(), next(), and match(), which
are declared by leopurd, and a main() function that can read input from a file
or standard input. 
The function yylex() is the scanner/lexical analyzer, which recognizes tokens
in the input stream and returns their values. The global variable yytext
points to the first character of the current token. The token's length is
stored in yyleng.
When run with the grammar in Listing One, leopurd produces output on stdout
similar to Listing Two. Listings Three and Four are the generated files,
yyleo.h and yyleo.c. 
The <name>_parse() functions implement a ready-to-roll parser. The top-level
parse function, statement, is renamed to yyparse(). If the top-level
nonterminal uses recursion (like the production rule in Example 4), a
redefinition of statement_parse() in yyleo.h maps it to the yyparse()
function. Also, although no explicit rule says so, the FOLLOW() of the
top-level nonterminal always includes "end of input."
By contrast, the <name>_action() functions are stub functions, sufficient for
yyleo.c to compile cleanly. They should be fleshed out manually to provide the
semantics of the language.
The "raw" parser generated by leopurd sacrifices efficiency for clarity, so
you may need to hand-optimize the resulting code. One such inefficiency is
duplicated code, such as the duplicated handling of matches on '-' and '+' in
predicate_parse() in Listing Four. This problem could be solved in the grammar
by defining a new nonterminal "operator" to match either '+' or '-'. However,
a new nonterminal costs a function call at run time, so manual optimization
might be preferable. Some grammars result in multiple recursive calls to the
same function ("tail recursion"). Such calls can be replaced with a loop,
resulting in better performance and stack-space savings. The first if
statement in each parse function, which tests for matches on the FIRST() and
FOLLOW() sets, often can be eliminated completely.


Source Code


The complete C source code for the current version of leopurd is available
electronically (see "Availability," page 3). LITE.H contains definitions
common to all source files. It defines the symbol LIST, resulting in visible
run-time trace on stdout. For leopurd to run quietly, you must undefine LIST.
The scanner is found in LITE1.C. LITE2.C contains functions to parse the
definition and user sections. The rules section is parsed by functions in
LITE3.C. LITE4.C contains functions that calculate FIRST() and FOLLOW() sets
and build the parser.
Leopurd itself is constructed around several top-down parsers. The source code
also shows ways to implement the next(), match(), and yyerror() functions, as
well as a simple scanner, yylex().
The code generator in LITE5.C produces ANSI C output. It is relatively easy to
modify to produce code for other languages. 
The source code is ANSI C compliant, with the exception of calls to strdup(),
a non-ANSI function supported by many compilers. They can be replaced with
calls to malloc(), strlen(), and strncpy(). Leopurd compiles cleanly using
Borland C++ and Turbo C under MS-DOS, and on many UNIX platforms.
Example 1: Definition section for a simple calculator.
%token NUMBER
%token '+' MINUS
%token '='
%%
Example 2: Rules section for a simple calculator.
expression: sum '='
 | difference '='
 ;
sum: NUMBER '+' NUMBER
 | NUMBER '+' sum
 | EMPTY
 ;
difference:
 NUMBER MINUS NUMBER
 | NUMBER MINUS difference
 ;
%%
Example 3: Code section for a simple calculator.
#include <stdio.h>
#include <stdlib.h>
static int line_number=0;
/* print an error message */
void yyerror(char *msg_str){
 printf("Error! line %d: %s\n",
line_number,msg_str);
}
int main(){
 int rc;
 while(!(rc=yyparse()));
 exit(rc);
}
Example 4: A recursive top-level production rule.
 statement:
 while '(' expression ')' statement ;

Listing One

%token NO_TOKEN
%token EOI
%token NUMBER 
%token EMPTY
%%
statement: expression '='
 ;
expression: factor predicate
 | EMPTY
 ;
predicate: '+' factor predicate
 | '-' factor predicate
 | EMPTY
 ;
factor: NUMBER
 | '(' expression ')'
 ;
%%
/***** user section *****/
#include <stdio.h>
#include <ctype.h>
#include <string.h>
int yylex(void);
#define INLEN 256
int look=NO_TOKEN;
static FILE *fp;
char *yytext="";
int yyleng;
/**** error function ****/
void yyerror(char *str){
 printf("Error! %s\n",str);
}
/**** look ahead to next token ****/
void next(void){
 look=yylex();
}
/**** match token ****/
int match(int token){
 if(look==NO_TOKEN)
 next();
 /**** uncomment to trace matches 
 printf("matching %d %d\n",look,token); /* */
 return token==look;
}
/**** keep parsing input until an error occurs ****/
int main(int argc,char *argv[]){
 int rc;
 if(argc==1) fp=stdin;
 else if(NULL==(fp=fopen(argv[1],"rb")))
 return printf("File Error!\n");
 while(!(rc=yyparse()));
 printf("%s\n",(rc==2)?"Success!":"Error");
 return fclose(fp);
}
/**** this is the scanner function ****/
int yylex(void){
 static char inbuf[INLEN]={'\0'};
 static char *p=inbuf;
 do{

 /* need another line */
 if(!*p){
 if(!fgets(inbuf,INLEN,fp))
 return EOI;
 p=inbuf;
 }
 /* strip leading whitespace */
 while(isspace(*p)) p++;
 }while(!*p);
 yytext=p;
 yyleng=1;
 switch(*p++){
 case '+': return (int)'+'; /* single char operator ret. by value*/
 case '-': return (int)'-';
 case '*': return (int)'*';
 case '/': return (int)'/';
 case '(': return (int)'(';
 case ')': return (int)')';
 case '=': return (int) '=';
 default:
 if(isdigit(*yytext)){ /* a number */
 while(isdigit(*p)){
 p++;
 yyleng++;
 }
 return NUMBER;
 }
 else printf("Unknown token: %c!\n",*yytext);
 }
 return NO_TOKEN;
}
/*** end of user section ****/

Listing Two
EMPTY 1003
NUMBER 1002
EOI 1001
NO_TOKEN 1000
predicate 1007
factor 1006
expression 1005
statement 1004
factor:
 '(' expression ')'
 NUMBER
predicate:
 EMPTY
 '-' factor predicate
 '+' factor predicate
expression:
 EMPTY
 factor predicate
statement:
 expression '='
predicate: (nullable) 
FIRST(): '-' '+' 
FOLLOW(): ')' '=' 
factor: 
FIRST(): '(' NUMBER 

FOLLOW(): '-' '+' ')' '=' 
expression: (nullable) 
FIRST(): '(' NUMBER 
FOLLOW(): ')' '=' 
statement: 
FIRST(): '(' NUMBER '=' 
FOLLOW(): 
Top level non-terminal : statement

Listing Three
#define YYSTYPE int
int match(int);
void next(void);
void yyerror(char *);
#define NO_TOKEN 1000
#define EOI 1001
#define NUMBER 1002
#define EMPTY 1003
#define statement 1004
#define expression 1005
#define factor 1006
#define predicate 1007
int predicate_parse(void);
int predicate_action(void);
int factor_parse(void);
int factor_action(void);
int expression_parse(void);
int expression_action(void);
#define statement_parse yyparse
int yyparse(void);
int yyparse_action(void);

Listing Four
#include "yyleo.h"
#include <stdlib.h>
/***** user section *****/
#include <stdio.h>
#include <ctype.h>
#include <string.h>
int yylex(void);
#define INLEN 256
int look=NO_TOKEN;
static FILE *fp;
char *yytext="";
int yyleng;
/**** error function ****/
void yyerror(char *str){
 printf("Error! %s\n",str);
}
/**** look ahead to next token ****/
void next(void){
 look=yylex();
}
/**** match token ****/
int match(int token){
 if(look==NO_TOKEN)
 next();
 /**** uncomment to trace matches 
 printf("matching %d %d\n",look,token); /* */

 return token==look;
}
/**** keep parsing input until an error occurs ****/
int main(int argc,char *argv[]){
 int rc;
 if(argc==1) fp=stdin;
 else if(NULL==(fp=fopen(argv[1],"rb")))
 return printf("File Error!\n");
 while(!(rc=yyparse()));
 printf("%s\n",(rc==2)?"Success!":"Error");
 return fclose(fp);
}
/**** this is the scanner function ****/
int yylex(void){
 static char inbuf[INLEN]={'\0'};
 static char *p=inbuf;
 do{
 /* need another line */
 if(!*p){
 if(!fgets(inbuf,INLEN,fp))
 return EOI;
 p=inbuf;
 }
 /* strip leading whitespace */
 while(isspace(*p)) p++;
 }while(!*p);
 yytext=p;
 yyleng=1;
 switch(*p++){
 case '+': return (int)'+'; /* single char operator ret. by value*/
 case '-': return (int)'-';
 case '*': return (int)'*';
 case '/': return (int)'/';
 case '(': return (int)'(';
 case ')': return (int)')';
 case '=': return (int) '=';
 default:
 if(isdigit(*yytext)){ /* a number */
 while(isdigit(*p)){
 p++;
 yyleng++;
 }
 return NUMBER;
 }
 else printf("Unknown token: %c!\n",*yytext);
 }
 return NO_TOKEN;
}
/*** end of user section ****/
int predicate_parse(void){
 if((!match('-'))&&(!match('+'))&&(!match(')'))&&(!match('='))){
 yyerror("predicate");
 exit(1);
 }
 else if(match('-')){
 next();
 if(factor_parse()){
 yyerror("factor");
 return 1;

 }
 if(predicate_parse()){
 yyerror("predicate");
 return 1;
 }
 }
 else if(match('+')){
 next();
 if(factor_parse()){
 yyerror("factor");
 return 1;
 }
 if(predicate_parse()){
 yyerror("predicate");
 return 1;
 }
 }
 else if(match(')')||match('=')){
 return 0;
 }
 else return 1;
 predicate_action();
 return 0;
}
int predicate_action(void){
 return 0;
}
int factor_parse(void){
 if((!match('('))&&(!match(NUMBER))){
 yyerror("factor");
 exit(1);
 }
 else if(match('(')){
 next();
 if(expression_parse()){
 yyerror("expression");
 return 1;
 }
 if(match(')'))
 next();
 else{
 yyerror("')'");
 return 1;
 }
 }
 else if(match(NUMBER)){
 next();
 }
 else return 1;
 factor_action();
 return 0;
}
int factor_action(void){
 return 0;
}
int expression_parse(void){
 if((!match('('))&&(!match(NUMBER))&&(!match(')'))&&(!match('='))){
 yyerror("expression");
 exit(1);

 }
 else if(match('(')||match(NUMBER)){
 if(factor_parse()){
 yyerror("factor");
 return 1;
 }
 if(predicate_parse()){
 yyerror("predicate");
 return 1;
 }
 }
 else if(match(')')||match('=')){
 return 0;
 }
 else return 1;
 expression_action();
 return 0;
}
int expression_action(void){
 return 0;
}
int yyparse(void){
 if((!match('('))&&(!match(NUMBER))&&(!match('='))&&(!match(EOI))){
 yyerror("statement");
 exit(1);
 }
 else if(match(EOI)){
 return 2;
 }
 else if(match('(')||match(NUMBER)||match(')')||match('=')){
 if(expression_parse()){
 yyerror("expression");
 return 1;
 }
 if(match('='))
 next();
 else{
 yyerror("'='");
 return 1;
 }
 }
 else return 1;
 yyparse_action();
 return 0;
}
int yyparse_action(void){
 return 0;
}











































































A Conversation with Michael Cowlishaw


The creator of Rexx takes time out for a chat




Jack Woehr


Jack programs client/server applications, some of them in Rexx, in Golden,
Colorado. He can be contacted at jax@well.com.


Since joining IBM over 20 years ago, Michael F. Cowlishaw has worked in
virtually all of IBM's major research centers--from the T.J. Watson Research
Center in Yorktown Heights, New York, to the Laboratories Systems Technology
Group in Great Britain. Over the years, Cowlishaw has received several
internal awards. Most significantly, he was named an IBM Fellow in 1990, which
allows him to work on projects of his own choosing. His current interests
include user interfaces, electronic publishing, and Internet protocols. Still,
Cowlishaw is best known for developing the Rexx programming language. Frequent
DDJ contributor Jack Woehr recently chatted with Cowlishaw about Rexx and
other topics. 
DDJ: Programming languages make kind of an abstract, aesthetic impression. Did
you know Forth before you wrote Rexx?
MFC: I made it my business to know something about every language around at
that time. I always considered myself a professional language designer,
really.
DDJ: Rexx seems to have a little bit of a lot of languages in it. It kind of
looks like Basic, but there's a factoring that's reminiscent of Forth, and a
striving toward simplicity and convenience for the end user. Rexx is just an
easy language to use at the command line.
MFC: That was the intent.
DDJ: The value of Rexx is that most shell languages are lousy as programming
languages, and most programming languages are lousy for performing simple
system functions. Rexx is halfway between.
MFC: The general principle is that very few people have to implement
interpreters or compilers for a language, whereas millions of people have to
use and live with the language. One should therefore optimize for the
millions, rather than the few. Compiler writers didn't love me for that,
because Rexx got to be a hard language to interpret or compile, but I think it
has paid off for people in general, certainly programmers in general.
DDJ: Rexx's parser is weird, at least if you think that the way C does things
makes sense.
MFC: Right. Earlier languages all had very formal grammars, which we needed to
parse and process, but they always had these syntax quirks which languages
such as C and Pascal come across. I very deliberately didn't have that kind of
formal, or simplified language grammar, in order to make it easier to work
with. It was tuned, rather, for what people wanted to do and the way they
wanted to write programs, not for what made it easy to write
compilers.
DDJ: So the difference between Rexx and other languages really is an aesthetic
judgment that you and your associates made.
MFC: I think that's fair, yes. It's, "Try to apply good taste," or something
like that, but "goodness," that's quite, of course, hard to define.
DDJ: Do you think that "classic" programming languages, where users type
something into an editor telling the computer what to do, have any future?
MFC: I think they're going to be around for a long time. There are many
instances where one does require precision of expression which a formal
language such as a programming language makes possible.
Though there still may be many things that you will be able to do, say, by
verbal instruction and/or drag-and-drop programming and these kinds of things,
there will still be--for some applications at least--a need for precise
algorithms. And programming is really just a way of expressing an algorithm.
One possible example is banking, where you want your bank account to be worked
out very precisely.
DDJ: Isn't that the kind of program that is increasingly being done by
drag-and-drop?
MFC: Right. But underlying that program is something that does arithmetic.
It's a program, whether it's represented by a macro on a piece of hardware or
whether it's a program written by typing text.
DDJ: Someone is still going to have to hew the wood and draw the water.
MFC: I think you have probably made a fair point that banking is one area
where the underlying building blocks already exist, and now you can put the
building blocks together. But there are still huge areas of commerce and human
endeavor where the basic building blocks you're going to put together do not
yet exist, and still have to be programmed. 
DDJ: How long until they all exist?
MFC: Well, if they did all exist, then you'd never be inventing anything new
anymore. I hope that doesn't ever happen.
DDJ: In A Discipline of Programming (Prentice Hall, 1976), Edsger Dijkstra
lavishes much care on a greatest-common-divisor algorithm. Is that sort
of thing disappearing from computer science?
MFC: I don't think it has. People still refer to Don Knuth's books, which are
very similar in their detail. Indeed, I'm writing an arithmetic package today,
and that's the first place I went to decide which algorithm I'm going to use.
When I write something, I try and make sure it's [as] good as possible. From
long experience, I know that programs can often last very much longer than
you'd expect. I had a piece of electronic mail just the other day that
suggested a new feature in a program, which I didn't recognize the name of.
Then I looked back at my files. The last time I touched that program was 1979.
He's still running it, and it works just fine. I had forgotten about it, and
he suddenly had this idea for a new feature! I'm actually quite surprised that
an assembly-language program written in 1979 is still running without being
broken by changes in the operating system since then. This was under VM
("Virtual Machine Operating System" for mainframes), which was my primary
operating system from 1976 until 1987, though I was using other operating
systems as well: UNIX, DOS, of course, and now mostly I use OS/2.
All through the '80s, I was writing for PCs as well as mainframes, but it
wasn't really a satisfying environment for me because of the 16-bit [limit]
and other limits of the operating system. It wasn't until OS/2 came out--OS/2
2.0 in particular, which is 32-bit--that I really switched over totally.
Since the late '80s just about everything I write, except where it involves
user interface, the GUI, is intended to be cross-platform portable. I try to
keep out things that are specific to other operating systems. I have a
character-based program we ported to seven different operating systems in a
week.
DDJ: I look at Rexx with some awe in that it's a language which spread very
quickly through the mainframe world, and beyond into Amiga, DOS, OS/2, NT, and
UNIX.
MFC: That was largely due to IBM's internal network, VNET.
DDJ: Can you briefly discuss the Virtual Machine (VM) operating system? 
MFC: VM's main attraction...certainly back in the '70s and some would say it
still is...[is that it is] one of the best development environments around.
Every user sharing that machine effectively had their own machine. It really
was like a series of PCs connected by a LAN, except they all sat in one box, and
everyone had a complete, simulated virtual machine. It was a similar kind of
development environment to what people have today. Each person had their own
personal single-user operating system, with security since the boundary of
each virtual machine was very well defined. It was a delightful environment to
use and to program. 
DDJ: Were you a VM enthusiast in its heyday?
MFC: I would say so, yes.
DDJ: Back in 1988, IBM was referring to a 386 running OS/2 as a "programmable
terminal." The mainframes were still so powerful that a little box, even with
a nifty GUI, didn't impress many IBM technicians.
MFC: It's such a big company that you have different people working in
different places which are geographically and culturally widely separated.
People often work in isolation and don't have the opportunity to spend time
getting to know what's going on in other corners of the corporation.
The AS/400 division was already very successful before many people in IBM were
aware of it or what it did or what its computers looked like or how they
worked. The same was true of the PC within IBM. People in research were well aware
of the potential, but people working on mainframes, because they were working
on mainframes, had no need for PCs and therefore knew very little about them.
DDJ: I've heard IBMers say, "VM has outlived many of the executives who tried
to kill it." People don't always realize how much VM influenced what we have
today.
MFC: There is a parallel situation on personal-computer operating systems,
where the operating system is setting up a virtual machine under which you run
a copy of either the same or a different operating system. You can set up a
DOS box under OS/2, and it's such a complete simulation of a PC you can
actually boot DOS from a diskette into this virtual machine. That's
essentially what VM did...provide you with a large number of 360/370 virtual
machines, all running under the same operating system. This is a pleasant
environment, since every user had their own "machine." It was also a good way
of testing operating systems, because you could run them under VM and test
without bringing down an entire machine, making it unavailable to users, until
you had done that testing.
The idea of virtual machines didn't originate with IBM, but [VM's antecedent]
CP67 was one of the earliest environments to use virtual machines, and VM
first brought them to their potential.
DDJ: The engineers who developed the Intel 80386 and its V86 mode must have
seen VM.
MFC: That's certainly true. VM took the concept of virtual machines to
considerably greater lengths than the people who originally thought of it had
in mind. VM did not stop at virtualizing the processor, but went on to
simulate [mainframe I/O] channels and channel adapters. Effectively, users
had a complete system simulated, done in a way which, thanks to various
hardware innovations, exhibited great efficiency. When you've got your time
slice, you run as though you're the native machine. It's not all simulation.
These [are] concepts we now see in OS/2 and so on.
DDJ: Are the mainframes still going to be there in ten years?
MFC: I think so. Firstly, for some applications, they're particularly
well-suited, commercial applications with centralized databases. They're
optimized for getting data on and off disks much faster than workstations
usually are. They will evolve, and I suspect in ten years you may not be able
to easily draw a distinction between what's a mainframe and what's not. I've
been wishing for some years that someone would take the mechanics of a PC,
which belch out heat and noise, and put them in a room miles away from where
people are sitting, so that all you'd have on your desk would be the input and
output devices you need. 
That's essentially what the mainframes gave you, in that they consolidated the
disk drives and the power supplies and the central processing units in one
place and people had something very simple on their desks.

I'm not the first to point it out, but the World Wide Web browsers of today
are effectively dumb terminals.
DDJ: Do you program in C++?
MFC: I program at the moment in either Rexx or C, including ObjectRexx.
DDJ: I find ObjectRexx the best new language idea of the 1990s.
MFC: IBM has stated its intention to make a version of ObjectRexx available
for Linux. This is a free, use-as-you-will source version released for a
vendor-neutral platform.
DDJ: What do you think of PC DOS Rexx?
MFC: It's a direct port of OS/2 Rexx. They took very little out of it, only
the double-byte character support; there's a different version of DOS in
Japan anyway.
DDJ: Where are the hotbeds of Rexx usage today?
MFC: In the United States, obviously. Germany is very strong. I have a Rexx
bookshelf which is about four feet long: German, French, Japanese, Swedish...
DDJ: Yet many people don't know Rexx is there because it's not common under
Windows.
MFC: People often only know one operating system. Perhaps they've used another
in the past, but now they only use one. Or if they've only used one computer
language, the same thing applies, they often have the view there couldn't be
anything better. I went through a phase very early on where I had to use some
pretty awful languages. Then I came across Fortran. I was using Fortran for a
while, then someone tried to persuade me to use PL/I. I thought, "Nothing
could be better than Fortran." Then I was required to learn PL/I for my job,
and I found out it's actually much better than Fortran. This opened my eyes
[to the fact] that there's very much more to the computer business than what
one happens to be using today.
A professional computer person does need the experience of a wide variety of
operating systems and languages to be able to have a broad view and to be able
to contribute to the field. 
DDJ: What about the syntax of C?
MFC: I use C every day, so it's hard to be objective. If one's trying to write
a concise language, then it doesn't do too bad a job. There are clearly things
about it...I have more trouble with the semantics of argument passing and
pointers than I do with the syntax. I think nowadays one could take C and
apply the Rexx kind of syntax to it and still keep it efficient.
To some extent, needs have changed over the years. It was pretty important in
the '60s and '70s to use notations to save typing because the main output
devices in those days were teletypes and 2741s, which were very slow. APL was
extremely popular in those days because on the slow output devices you could
have big programs typed out in a very short time.
Later, conciseness began to be a burden rather than an advantage. It became
more important to have readable programs than concise ones.
Now the need is to build things out of smaller components which some expert
has previously written. ObjectRexx and SOM and OpenDoc are good ways of
building such components, so I believe they will become very important in the
future.
DDJ: Have you programmed these systems much?
MFC: Actually, most of what I do now is pretty low level. I'm writing a Web
server now.
I was doing some research on neural networks, in particular a
neural-network-based text-retrieval system. I wanted to be able to test that.
Unlike most programs, where you can wrap something around them and go away, I
had no idea what kind of queries users were putting in and how well the
retrieval system was responding to them. I had to run it on my own machine to
study how the algorithms were working, yet allow other people access to it.
I wrote a gopher server to do that a few years ago, and what happened is that
because when I write these things, I attempt to generalize them, the gopher
server had a Rexx interface. It was completely programmable, so someone
figured out that you could program the gopher server to be a rudimentary Web
server. But that wasn't ideal for various reasons; there were various assists
that would have been really useful to have in the server.
So the gopher server evolved into a Web server. In doing that, I spent a lot
of time.... I was unhappy with existing Web servers because they seemed to be
very slow responding even when doing trivial things. I went back and with the
advantage of hindsight, knowing what Web servers were being used for, I could
try and build something optimal for that use. I concentrated on reduced
response time and programmability. I used the fact that Rexx can run from
memory without being fired up as a separate process to make a very fast,
scriptable Web server.
For example, on a 486/50, it will respond to an incoming request in 20
milliseconds, including running a Rexx script. If you do some caching so it
doesn't use the Rexx script, then it's about ten milliseconds. Logs are
written straight through to disk, but they don't affect response time. A
number of interesting interactions between the subsystems of the server made
it a fascinating project.
DDJ: Those numbers imply careful testing.
MFC: Careful measurements, not so much serious testing. I've found that when
everyone is concerned about performance, the only clearly sensible way to go
about it is do measurements. I've often...seen programmers realize that their
program is too slow and spend days and weeks optimizing parts and [then] find
it doesn't make any difference because they didn't do measurements to find out
what's really taking the time.
I've seen this on a very large scale. I won't say what company it was in, not
to embarrass anybody. But there's a very large project that was written
entirely in Rexx, tens of thousands of lines. It was too slow as an
application. The project team decided it was obviously because it was written
in Rexx, so they split up and got four programmers and four subcontractors and
they all went away and rewrote the components. Then they put them together for
system test at the end and found that the response was only 2 percent better,
which made absolutely no difference at all to the end user. Of course, the
time wasn't being spent in Rexx at all; it was being spent in database
lookups and communications over networks.
I've seen examples of that over and over again in my career. Everyone knows
intuitively what is wrong so nobody takes any measurements.
DDJ: How does one produce good software engineers?
MFC: I don't think there's one simple answer. It's self-motivation of the
programmer, much more dependent on the individual personality than the
training or whatever they have. It helps if they come across a lot of good
people and work with a variety of different projects and operating systems.
Many of the best programmers I know were hardware engineers originally. A few
years ago, if you were designing hardware, you'd make sure your design was
right before you ever put it together. After you'd laid out the cards, and
gathered the components, if you hadn't done your design right, you just had a
piece of junk.
When I program, I don't find a lot of bugs. If it compiles--that is, after I
correct my typos--it usually runs the first time. One question I often ask a
programmer is, "How good are you at using the debugger?" If they're an expert
at using the debugger, then I know they're not a very good programmer.
There are things where you do have to be expert at using the debugger, such as
writing operating systems or device drivers, and those are different cases.
But in general, if you are writing application-level code, then you shouldn't
have to use a debugger. It implies you weren't being very thorough in writing
the program in the first place.
DDJ: Do you feel you have a lot of foresight about where computing is going?
MFC: On the questions I ask myself, my expectations are usually confirmed. One always misses
some of the things that happen. One I didn't really see coming was the World
Wide Web, like many people, because that year I had my head down and I wasn't
on the Internet very much, and all of a sudden it happened.
Sometimes a very unexpected direction can have a great effect. I suppose in
some sense Rexx was a bit like that. It was something I decided to do in the
middle of one night. Maybe if I'd had something else for dinner the night
before and felt ill it would have never happened, but now there are millions
of people programming in it.
DDJ: How would you compare Rexx and Perl?
MFC: Larry Wall [creator of Perl] came to our Rexx Symposium a couple of years
ago. We had an interesting joint session discussing languages. Perl's designed
for a different audience. It's for a C-like programmer and it fits that
audience. Rexx probably has a more general appeal.
There's a lot of overlap. Both languages are good, for example, for writing
scripts for the Web. They are really quite different philosophies on how to
design a language, both valid. Larry's philosophy is to put anything into the
language that anybody asks for. Then it's there for everybody. Everybody has
their own favorite features, and he makes people happy that way.
My approach is the other way 'round, that is, don't put anything in unless
it's really, really necessary, because then you end up with a really small
language with few notations, and you make people happy for a different set of
reasons.
DDJ: What is your connection with ObjectRexx?
MFC: ObjectRexx was originally designed by Simon Nash, a long-time colleague
of mine. It's very much his design. My contribution was more as a consultant,
somebody to bounce ideas off, mostly about keeping it in line with the
existing philosophy of Rexx.
We used to sit around the table in a pub in the village of Hursley and discuss
things, rather than me saying, "You can't do it that way." It was done by
consensus.
DDJ: What's Hursley like?
MFC: It's a small village halfway between Winchester and Romsey. It has two
pubs next to it in Hursley Park, an old manor house which now forms
headquarters of the IBM UK Laboratories, along with the other new buildings
that have been built around it. It was previously owned by Vickers; it's where
the Spitfire was designed in the Second World War. IBM took over more than 30
years ago and has restored the house pretty much to the way it was. Paneling,
and the Wedgwood room still has its china, and these sorts of things.
DDJ: In addition to Rexx, what related work are you proud of?
MFC: One of the things I'm most proud of was some work I did on color
perception, answering the question, "How many bits per color do you need in a
pixel on the screen not to be able to tell the difference from the human point
of view?" It works out that for a standard screen at a standard reading
distance, you need about four bits in green, three bits in red, and two bits
in blue. If you sample your image for that particular spread, it's
indistinguishable from eight bits in each color. That work formed the basis of
the standard color palette in OS/2, which gives you more colors related to
green than to blue or red; so if you have just eight bits for the display, you
can do better than if the bits were equally divided between the colors.
DDJ: You've spent your time on interesting things!
MFC: I'm very interested in...how humans interact with displays, and
languages...in some sense they are related. I've always been a fan of using
color in displays [and the] use of color for indicating things. At the Oxford
University Press project, the intent was that I should write an editor for a
black-and-white screen because they felt a monochrome screen had better
definition and [was] easier to read. I insisted on making it suitable for
color as well. They then ran a formal test at the end of the project to decide
whether they should use color screens, which were visibly fuzzier, or
monochrome screens. They came out with a conclusive answer that it should be
color. People dealing with markup on monochrome screens were kind of snapping
at people, and losing their temper, and giving up halfway through the task,
while the people using the color screens sailed through this interesting test.
DDJ: Can you tell us about the Oxford University Press project? 
MFC: Around 1985, the Oxford University Press needed an editor that could
handle highly structured data: the content of the Oxford English Dictionary,
which is about a 20-volume, 1000-page-per-volume dictionary. They had chosen
to mark it up with SGML, but they didn't have a good editor. In fact, nobody
had a good editor at that time for dealing with that complexity of data. So I
wrote an editor for them called "LEXX" which ran on IBM mainframes.
The Oxford English Dictionary was originally only in hard copy. It all had to
be retyped to be put into electronic form, and then the structure of that data
was marked up: the headwords, the content, the quotations...that markup is all
SGML.
LPEX is a reimplementation of LEXX. It's the same design under the covers, as
far as the data and the way that the parsing is done, and so on, but it's
reimplemented for OS/2 and AIX platforms. It's now mostly used for program
editing, because of its ability to parse data and color keywords, and other
features.
DDJ: In other words, your programming career hasn't just brought technical
satisfaction. You've also performed a chore of great social import, in that
you have aided the computerization of the Oxford English Dictionary.
MFC: That's correct.





































































Windows 95 Journaling and Playback


Using keyboard and mouse macros anywhere and anytime




Mark Russinovich and Bryce Cogswell


The authors are researchers in the computer science department at the
University of Oregon. Mark can be reached at mer@cs.uoregon.edu, and Bryce, at
cogswell@cs.uoregon.edu.


One of the basic applications that comes with Windows 3.1 is the Recorder
program in the Main program group. This utility uses built-in Windows
journaling and playback hooks to record mouse and keyboard inputs into a file
that can be used for later playback, appearing to the system as if a user were
entering the input again. If you used Recorder, you probably did so only to
see what it does--not for anything meaningful. However, journaling and
playback are useful for tasks such as automated testing, online software
demoing, input filtering, and macro development. In fact, several successful
commercial products are built on this journaling mechanism. 
In this article, we'll describe an implementation of a Windows 95 journaling
and playback facility that overcomes major shortcomings in the Windows 3.1
implementation. To demonstrate the power and flexibility of the new facility,
we'll present a macro recorder program that supports keyboard and mouse macros
that can be used anywhere and anytime. (The complete source for Recorder is
provided electronically; see "Availability," page 3.) Other Windows 95
programming techniques demonstrated by the application include the use of
multithreading and having virtual devices notify a Win32 program of an
asynchronous event.


Application-Level Journaling


Windows 3.1's application-level journaling hooks work at the
application-message level and have been carried forward into Windows 95
generally unchanged (although Microsoft no longer provides a recorder
program). While these hooks are sufficient for many applications, they have
significant shortcomings. For example, they do not journal keyboard input
going into DOS windows or input going into a DOS full-screen session, so DOS
programs and sessions can't be recorded or played back. If you never run DOS,
you won't care; but for most of us, the DOS prompt will be part of our lives
for some time to come.
Another problem is that the old hooks force all input coming from the mouse
and keyboard to be funneled into a journaling program for it to save, before
being passed on to the target application(s). The inverse occurs on playback,
with Windows asking the journaler for input to simulate. A major breakthrough
with Windows 95 is its decentralized input scheme. In the decentralized
approach, higher reliability and faster performance are achieved by removing
the bottleneck of having input pass through a common point before being
distributed. By having Windows 95 feed input directly to the input queues of
applications, no one application can hang the system by locking the input
queue indefinitely, as was possible in Windows 3.1. (For 16-bit programs
running on Windows 95, the single input queue is still in effect.) It is
therefore undesirable to use the application-level mechanism because it uses
the older, less-robust, and less-efficient approach.
Finally, the traditional journaling hooks require a Windows program to
actively participate in the journaling process. This also can degrade
performance because the program must wake up at every mouse or keyboard input,
using up cycles that other programs might need.


Windows 95 System-Level Journaling


The designers of Windows 95 recognized the drawbacks of the old journaling
hooks and introduced new, more-efficient mechanisms for journaling and
playback. Unfortunately, the hooks have no Windows application-level API,
making them inaccessible to most developers. The new hooks have been placed at
the lowest level possible--the virtual device (VxD) level--giving users the
ultimate in input visibility and control. 
Separate hooks for the mouse and keyboard are necessary, since separate VxDs deal with
each type of input. Both VxDs have a VxD-level API allowing other VxDs to
request to see inputs as they are received by the system, as well as to
generate simulated input from a device. 
To demonstrate the new facilities, we've developed Recorder, a macro-recorder
application. Recorder consists of a Windows 95 32-bit GUI that serves as the
user interface and a VxD that serves as the recording and playback controller.
First, let's see how the Recorder VxD simulates input from the mouse and
keyboard.
Recorder sees system input via "service hooking," an obscure feature of the
Windows VxD architecture that lets you see just about everything going on
inside the guts of Windows and gives you the control to completely change its
behavior. Once a service is hooked, any VxD or application calling the service
gets redirected to the hooker VxD first. The hooker can choose to pass the
request on to the next VxD on the chain, change the parameters to the request,
or service the request itself.
To make hooking VxD services possible, each VxD has a memory location assigned
for each of its services that points to the top of a hook chain. When a new
hooker is registered for a service, the appropriate address is modified to
point to the new hook routine. The hook routine itself must be declared as a
Hook_Proc in its assembly-language declaration like this: BeginProc
Hook_Routine, Hook_Proc Chain_Save. In this case, Chain_Save is a double-word
variable that you assign and to which Hook_Device_Service stores the previous
top of the hook chain. The Hook_Proc tag causes the code in Example 1 to
appear at the start of the procedure.
When a hook is removed from the chain, the link to the next service in the
chain is obtained from the Chain_Save location, making hooking and unhooking
transparent to the VxD programmer. The Chain_Save variable can also be used by
the hooking routine to chain to the previous hooker if it wants to pass the
request on.
To view keyboard input, Recorder must hook the keyboard VxD (VKD) service
VKD_Filter_Keyboard_Input. This service is called by VKD itself upon keyboard
input. The service was written with the sole intention that some other VxD
would hook it to record, and possibly alter, keyboard input. Listing One shows
the VKD's call to the service and the service itself.
To alter input, a hooking VxD simply changes the value of the CL register to a
new scancode; to kill input, the hooking routine sets the carry flag before it
returns. The Recorder VxD hook procedure saves the value of the CL register
and the current time into a buffer. Listing Two hooks the service, while
Listing Three is a skeleton keyboard-recording routine.
A second service is used to record mouse input. Whenever there is mouse input,
the VMD_Post_Pointer_Message service of the VMOUSE VxD is invoked with
parameters indicating mouse location and the state of the mouse buttons.
Recorder must hook this service to see this input.
Unlike the VKD filter function, where the returned carry flag determines
whether the event is passed on to applications, VMD_Post_Pointer_Message
actually notifies applications itself. To kill mouse input, the hook function
should not chain the request (instead of setting the carry bit, as is done for
the VKD filter). Listing Four is the Recorder VxD code that hooks the service,
and Listing Five is a skeleton of the mouse recording-hook procedure.
The DDK documentation for VMD_Post_Pointer_Message includes another parameter,
the mouse-instance structure pointer, that is supposed to be passed in the EDX
register. Stepping through VMD_Post_Pointer_Message with a debugger reveals
that the EDX register is not used, so the documentation is not accurate and
this parameter can be ignored.
Once a Recorder-type application has saved mouse and keyboard input, it must
be able to play it back. To play back or simulate keyboard input, you use the
VKD service VKD_Force_Keys. This service takes an array of virtual scan codes
and sends them into the system as if a user had typed them on the keyboard.
Scan codes are normally generated by the keyboard for both key presses and key
releases with the key-release scan code being identical to the key-press scan
code except that the high bit of the scan-code byte is set. Listing Six is the
Recorder code that simulates one keystroke.
Mouse input is simulated just as easily, but VMOUSE has no separate service
for input simulation. Instead, the Recorder must use the
VMD_Post_Pointer_Message service to generate input because VMD provides this
service for mouse minidrivers. Mouse minidrivers are for mouse devices that
Windows does not know how to deal with. Rather than have mouse-driver
developers implement the entire functionality of a mouse driver, Windows 95
lets them focus on the hardware-specific details of the device while using the
system interface services provided by VMOUSE (VMD_Post_Pointer_Message, for
instance). A program simulating mouse input acts like a mouse minidriver
calling this input routine. Listing Seven is a code fragment from Recorder
that replays mouse input.
One final issue to be addressed is the speed at which applications simulate
input. Recorder plays input at the same speed it was recorded. This is
necessary in cases where playing back the input too quickly will not produce
the desired effect, such as starting an application and then pulling down a
menu. If the menu doesn't exist when the pull-down action is played, the macro
won't do what it was supposed to. Timing of input is obtained by using the
Virtual Machine Manager (VMM) service Get_System_Time_Address, which returns a
pointer to a memory location where Windows keeps track of the number of
milliseconds since Windows booted. Accessing this location directly avoids the
overhead of calling a service like Get_System_Time. Once timestamps are
recorded, the playback mechanism ensures the same relative timing between
inputs using the Set_Global_Timeout service.


The Recorder Application


The GUI Win32 portion of the Recorder application presents a dialog-box main
window (see Figure 1) with buttons for recording and deleting macros, as well
as a button that allows one to save recorded macros to disk. Up to four macros
can be defined and assigned function keys F1-F4. To record a macro, the user
clicks on the Record button in the dialog box. Recorder then pops up a dialog
box indicating that macro recording will start when one of the function keys
F1-F4 is pressed and will finish when the same function key is pressed a
second time. So that mouse macros always begin with the mouse at the same
location, the mouse cursor moves to the center of the screen whenever macro
recording or playback starts. 
Once a macro has been recorded, it can be assigned a name by editing the
listbox associated with its assigned function key. A set of macros can be
saved to disk with the Save button and will be automatically loaded the next
time Recorder is run in the directory where the saved macro files have been
stored. Individual macros can be deleted by selecting the desired listbox and
then clicking on the Delete button.
One useful Win32 feature demonstrated by the Recorder program is
multithreading. When a macro playback has started, Recorder launches a
separate thread to wait for the VxD to indicate that the macro has finished
playback. The new thread blocks on a Win32 event waiting for the VxD to toggle
the event and let it continue. After it continues, the thread plays a beep to
inform the user that the macro is done and then exits. This structure allows
the Win32 program to continue to update its display while waiting for the
macro replay to finish. General asynchronous communication from a VxD to a
Win32 application can be constructed on this framework.
Figure 1: Dialog box main window of the Recorder.
Example 1: When the Hook_Proc tag is used, this code appears at the top of the
procedure.
 jmp Hook_Routine ; skip over the book-keeping info
 jmp Chain_Save ; bogus code - it's just here to point Windows at
 ; the variable storing the previous service address
 dd 0 ; not used

Hook_Routine: ; hooking routine is here

Listing One
 ...
 ; VKD calls its own service with input
 mov cl, scancode ; scancode is the keyboard input
 VxDCall VKD_Filter_Keyboard_Input ; call the service
 jc noinput ; if carry set, kill the input
 ...
; the default filter service
BeginProc VKD_Filter_Keyboard_Input
 clc ; clear flag to let input through
 ret
EndProc VKD_Filter_Keyboard_Input

Listing Two
 
 ; hook the keyboard input
 GetVxDServiceOrdinal eax, VKD_Filter_Keyboard_Input 
 ; get the id of the filter service
 mov esi, offset32 Record_Keyboard ; address of our hook routine
 VMMCall Hook_Device_Service ; hook
 mov Keyboard_Proc, esi ; save previous hook

Listing Three
Public Record_Keyboard
BeginProc Record_Keyboard, Hook_Proc Keyboard_Proc
 ; save CL and timestamp to recording buffer here
 ; call the previous hooker
 call Keyboard_Proc ; let other VxDs massage input
 clc ; clear carry to let the key through
 ret
EndProc Record_Keyboard

Listing Four
 ; hook the mouse input routine
 GetVxDServiceOrdinal eax, VMD_Post_Pointer_Message
 ; get the id of the service
 mov esi, offset32 Record_Mouse ; pass our hook routine address
 VMMCall Hook_Device_Service ; hook it
 mov Mouse_Proc, esi ; save service address

Listing Five
Public Record_Mouse
BeginProc Record_Mouse, Hook_Proc Mouse_Proc
 ; record the mouse input here. The movement parameters passed in are:
 ; esi - mouse delta x
 ; edi - mouse delta y
 ; al - mouse button status
 ; call the previous service
 call Mouse_Proc ; pass event through to system
 clc
 pop esi
 ret
EndProc Record_Mouse

Listing Six
 mov ecx, 1 ; number of keys
 lea esi, [ebx].scancode ; address of scancode array

 VxDCall VKD_Force_Keys

Listing Seven
 mov esi, [ebx].deltax
 mov edi, [ebx].deltay
 mov al, [ebx].button
 VxDCall VMD_Post_Pointer_Message
























































Moving from C++ to Java


Understanding object-oriented concepts is the key




Gary Aitken


Gary has been the technical lead and chief architect for a large, commercial,
UNIX-based C++ toolkit for the past seven years. He now is an engineer at
Integrated Computer Solutions, where he is working on Java-based technology.
He can be reached at garya@ics.com.


If you haven't been banished to a desert island, then you've heard about Java
and its potential impact on both developers and users. In this article, I'll
highlight some of the differences between Java and C++. My purpose is not to
teach you how to program in Java, but rather to make you aware of potential
problems and opportunities when moving from C++ to Java. I'll provide brief
explanations of those concepts with which you may not be familiar, although I
won't provide in-depth coverage of how they work or how to use them. Keep in
mind that these are the major differences as I perceive them and are the
result of my personal experiences with Java.


Java Executes on a Virtual Machine


Java source is not compiled into normal machine code. It is translated into
code for a virtual machine specifically designed to support Java's features. A
Java interpreter then executes the translated code. No link step is required;
the interpreter dynamically links in additional classes on demand; see Figure
1.
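As a sketch of that pipeline (the class name and its output are my own invention, not from the article), the source below is translated by javac into a .class file of virtual-machine bytecodes, which the interpreter loads, dynamically links, and executes:

```java
// HelloVM.java -- javac translates this into HelloVM.class (virtual-machine
// bytecodes); the java interpreter then loads and executes it directly.
// No separate link step is needed.
public class HelloVM {
    public static String greeting() {
        return "Hello from the Java virtual machine";
    }
    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```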


Java is Totally Object Oriented


Java is a totally object-oriented language. This means everything you do must
be done via a method invocation (member function call) on a Java object. To
start with, there is no such thing as a stand-alone main function. Instead,
you must begin to view your whole application as an object of a particular
class. But what class? Most Java applications simply make a new
class derived from the primitive Java Object class and implement everything
they need, but you can save much time and improve consistency between
applications by creating a base application class to handle features common to
all applications.
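A minimal sketch of that suggestion, with hypothetical class names (BaseApp and MyApp are not part of any Java library): a base application class carries the setup common to all applications, and each program overrides only what differs.

```java
// Hypothetical base class shared by all of a shop's applications.
abstract class BaseApp {
    abstract int run(String[] args);     // each application overrides this

    final int start(String[] args) {
        // common setup (argument parsing, logging, ...) would go here
        return run(args);
    }
}

// A concrete application: an object of a particular class, as the text says.
public class MyApp extends BaseApp {
    int run(String[] args) { return args.length; }

    public static void main(String[] args) {
        System.exit(new MyApp().start(args));
    }
}
```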
The strict, object-oriented nature of the Java environment means existing C
and C++ code can't be used directly; the same goes for system calls. In C++,
you can get to existing C procedures, including most system calls, simply by
declaring the C procedure as outside of the normal C++ namespace using the
extern "C" syntax.
In Java, there's a similar escape hatch, but it isn't nearly as simple to use.
You must define a native method, whose purpose is to interface to the C
function, then provide the glue to connect to it. The Java environment
provides tools to help with this task, but the whole process certainly isn't
as trivial as the C++ extern escape. Interfacing to C++ classes is even more
complex, involving the interface to C classes and the normal problems of
invoking C++ functions and member functions from C. Fortunately, many of the
more-common system-utility functions are already provided via methods in the
System class, but these obviously won't include any of the useful procedures
or classes you may have built up over the years. Save delving into this until
you really need to.
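The Java side of that escape hatch looks like the sketch below; the class, the method, and the glue-library name "hostinfo" are all illustrative assumptions, and the C body and glue code (which the Java tools help generate) are not shown.

```java
// Sketch of a native-method declaration: the Java side declares the
// signature; the implementation lives in a C library loaded at run time.
public class HostInfo {
    public native int getpid();   // body supplied by separate C glue code

    public static void main(String[] args) {
        System.loadLibrary("hostinfo");   // hypothetical glue-library name
        System.out.println(new HostInfo().getpid());
    }
}
```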


Separate Header Files don't Exist in Java


In Java, everything about a class is contained in a single file. The signature
of a method appears in only one place, and the implementation of a method must
appear together with its declaration. The advantage of this is that it
is more difficult to mistakenly program using files that are out of
synchronization with the implementation, or to get a library that is missing
the implementation of some member function. Class declarations (function
signatures and public variables) are available to the Java translator even
from the binary output of a compilation, so no additional header files are
needed, just the compiled object file.
The disadvantages of this are primarily related to how we program. Many C++
programmers use header files more or less as documentation. To see what the
interface to a particular member function is, you bring up the header file and
find the function. You can usually look at most header files on a single page
and get a good idea of how a particular class should be used. In Java, there
is no such concise summary available. Since the code to implement a method
must appear with the method definition, and the code for a single function
frequently occupies a page or more, it is difficult to look at any Java source
and get an impression of how the class should be used. You must have adequate
documentation for any classes you intend to use, which should go without
saying, but often documentation is sorely lacking when you are dealing with
in-house-designed classes or classes that are not part of a fully supported
commercial product.
Two tools supplied in current Java environments that help here are javap, a
disassembler that prints class signatures, and javadoc, which produces HTML
documentation from comments embedded in source files.


Packages Partition the Namespace in Java


One problem large C++ projects encounter is namespace pollution--how can you
ensure that some other developer working on a different aspect of a project
won't create a class with the same name as a totally different class in
another part of the project? Worse yet, a vendor may deliver a library with a
class that uses a name that you have already used. There are various ways to
minimize these problems in C++, but a project may be well underway before the
problem rears its ugly head, and correcting it then is painful and costly.
Java addresses this problem using a concept called "Packages," which
effectively partition the namespace for classes by collecting classes into
named packages. Two classes with the same name, in different packages, are
still unique. The key, then, is to be sure related classes are collected into
their own package.
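As a small illustration of partitioned names (our Vector class here is invented for the example), a class can share its name with a library class as long as the two live in different packages; the package name qualifies the class name and keeps them distinct.

```java
// Our own class named Vector, distinct from java.util.Vector because
// each belongs to a different package.
public class Vector {
    public String whoAmI() { return "not java.util.Vector"; }

    public static int demo() {
        Vector ours = new Vector();                       // this class
        java.util.Vector theirs = new java.util.Vector(); // the library's
        theirs.addElement(ours.whoAmI());                 // 1.0-era API
        return theirs.size();
    }
}
```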
Remember, however, that Java does not solve the general problem of name
collision. Extending a base class and thereby causing a collision with a
derived class will remain a problem. For example, if your favorite vendor has
supplied a set of classes that you use as base classes and your derived class
has a method named foo, you will have problems if the next version of the
vendor's class contains a new method also named foo.


Exceptions are First-Class Characteristics in Java


In C++, exceptions and exception handling are rather esoteric; many C++
programmers may never deal with them and may have no idea what they are.
Exceptions are error conditions that are not expected to occur in normal
processing. Consequently, they are not returned from a method as either
arguments or the return value; nonetheless, they cannot be ignored. An example
would be a method to compute the square root of a number. The normal interface
expects a nonnegative real number as an argument and returns a nonnegative
real number as a result. Since a program might incorrectly pass a negative
number
as an argument, the method can check for this and throw an exception when it
occurs. In most systems, programmers are not required to deal with exceptions,
and the occurrence of an unexpected exception causes abnormal program
termination. 
In Java, exceptions are a full-fledged part of the language. The signature of
member functions includes exception information, and the language processor
enforces a programming style whereby if you call a method that can throw an
exception, you must check to see if any of the possible exceptions occurred
and handle them. Almost every Java programmer will encounter exceptions, since
some of the more-useful classes from the supplied libraries throw them.
Dealing with exceptions is not difficult but is something you will need to be
aware of. The documentation for a method must indicate the exceptions it
throws. If you forget about them, don't worry; the compiler will remind you. 
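The square-root example from above might be sketched as follows (the class and exception names are my own); the throws clause is part of the method's signature, and the compiler forces every caller either to catch the exception or to declare it in turn.

```java
// A checked exception: callers cannot silently ignore it.
class NegativeArgumentException extends Exception {
    NegativeArgumentException(String msg) { super(msg); }
}

public class SafeMath {
    // The exception appears in the signature.
    public static double sqrt(double x) throws NegativeArgumentException {
        if (x < 0.0)
            throw new NegativeArgumentException("sqrt of negative number");
        return Math.sqrt(x);
    }

    // A caller must handle it; omitting the catch is a compile-time error.
    public static double sqrtOrZero(double x) {
        try {
            return sqrt(x);
        } catch (NegativeArgumentException e) {
            return 0.0;
        }
    }
}
```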


Strings are Different from Character Arrays in Java



Java includes a String class, whose objects are constants. A String is not the
same as an array of characters, although it is easy to build a String object
given an array of characters. You should use Strings instead of arrays of
characters wherever possible, as their values cannot be overwritten
unintentionally.
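A quick sketch of the difference (class name invented for the example): building a String from a character array copies the characters, so later changes to the array cannot disturb the String.

```java
public class StringDemo {
    public static String fromChars() {
        char[] buf = { 'J', 'a', 'v', 'a' };
        String s = new String(buf);   // easy to build a String from chars
        buf[0] = 'L';                 // mutating the array...
        return s;                     // ...does not change the String
    }
}
```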


Java has Limited Support for Constant Objects and Methods


In C++, you may declare a formal parameter to a function or a return value as
const, effectively preventing the body of the function from modifying the
argument and the caller from modifying the return value. In addition, you may
declare a member function as constant, indicating it cannot change any aspect
of the object upon which it operates.
Java supports the notion of constant, read-only values, using the final
modifier. However, it does not support the notions of constraining writeable
objects to be read only when passed as an argument, constraining return values
to be read only, or constraining a method to not modify the object upon which
it operates.
This omission is less of a problem in Java than it is in C++, mostly because
of the differentiation between the String class and an array of characters,
but it does leave a hole for errors. In particular, there is no way of
ensuring that a method that should not modify an object does not inadvertently
do so.
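The contrast can be sketched like this (names are illustrative): final gives you constant values, but there is no way to mark a parameter's contents read-only, so nothing but discipline stops a method from modifying an object a caller passes in.

```java
public class Limits {
    static final int MAX_USERS = 64;   // a constant, read-only value

    // No equivalent of a C++ 'const' parameter: nothing prevents this
    // method from writing into the caller's array.
    static int first(int[] values) {
        // values[0] = -1;   // would compile; Java cannot forbid it
        return values[0];
    }
}
```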


Java has no Pointers


Understanding the concept of pointers is one of the most difficult aspects of
C and C++ programming. Pointers are also one of the biggest sources of errors.
Java has no pointers; instead objects are passed as arguments directly, rather
than passing pointers to objects. In addition, you must manipulate arrays
using indices. Neither of these is a big deal in most cases; however, the
lack of pointers is a major obstacle if you are writing systems where pointers
to functions or pointers to member functions are involved. These situations
arise
frequently in systems involving callbacks with known signatures to objects of
a base type, where numerous different methods having the same signature may
all be used for a particular callback and are dynamically assigned. There are
ways around the problem, but they are not particularly intuitive or
convenient. 
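One such workaround can be sketched as follows (Callback, Logger, and Button are invented names): where C++ would store a pointer to a function of known signature, Java code declares an interface with that signature and stores an object that implements it.

```java
// The known callback signature, expressed as an interface.
interface Callback {
    void invoke(String event);
}

// Any unrelated class can supply the callback by implementing it.
class Logger implements Callback {
    String last;
    public void invoke(String event) { last = event; }
}

// The class that would have held a function pointer holds an object instead.
public class Button {
    private Callback action;
    public void setAction(Callback c) { action = c; }
    public void press() {
        if (action != null) action.invoke("pressed");
    }
}
```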


Java has no Parameterized Types


A parameterized type provides a means of writing one piece of code to describe
an implementation for several similar types of arguments. An example would be
a square-root method that operates on either integers or floating-point
numbers. In C++, this capability is provided by templates. 
Java has no equivalent of C++ templates. If you have been using templates
merely as a convenience, such as to build several similar functions using
different types of arguments (as in the previous example), this isn't too big
a disaster. It means more cutting and pasting to write each of the similar
classes or methods by hand, but does not present a serious roadblock in terms
of whether or not you can write an equivalent program. However, if you have
been using templates to automatically generate classes, it is a problem, and
there is no easy way around it.
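The cut-and-paste case looks like this in practice (class invented for the example): where one C++ template would cover both types, Java needs one hand-written method per type, with overloading at least keeping the call sites uniform.

```java
public class MinOf {
    // One copy per argument type -- what a single C++ template would generate.
    public static int min(int a, int b)          { return a < b ? a : b; }
    public static double min(double a, double b) { return a < b ? a : b; }
}
```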


Java is Garbage Collected


In a garbage-collected language, the underlying run-time environment keeps
track of which pieces of memory are in use and which are not. When a piece of
memory is no longer needed, the system automatically reclaims the memory. For
example, an object created inside a method, but not passed back to the caller
or stored as part of a global object, can no longer be of any use after the
method is exited. Call it magic if you like, but the system really does know
which objects you are using, and which ones you can't possibly touch again
because there are no references remaining that address them (this is one of
the benefits of not having pointers in the language). As a result, you no
longer need to worry about destroying objects when they are no longer in use
or freeing memory returned by some function call. An incredible amount of time
and debugging effort in C++ is focused on bugs caused by deleting objects
which are still in use, or memory leaks caused by not deleting objects no
longer in use. Java's use of garbage collection greatly reduces these errors,
although it does not eliminate them--bad program logic can still leave
no-longer-needed objects linked onto an active data structure, in which case
they won't be garbage collected. Many classes in C++ contain destructors
primarily to release auxiliary storage used by an object of the class. The
fact that Java is garbage collected means you don't have to write destructors
for these classes. It does not, however, mean you can forget about writing
destructors for all classes. For example, an object that opens a network
connection still needs to clean up gracefully by closing the connection when
it is destroyed. In Java, a destructor is known as a "finalization method."
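A sketch of that last point (NetLink and its members are invented for the example): no finalization method is needed just to free memory, but a class holding an external resource still defines one so the resource is released before the object is reclaimed.

```java
// A class that "opens a connection"; garbage collection reclaims its
// memory, but only the finalization method can close the connection.
public class NetLink {
    private boolean open = true;

    public void close() { open = false; }   // release the resource
    public boolean isOpen() { return open; }

    // Java's analog of a destructor, run before the object is reclaimed.
    protected void finalize() {
        if (open) close();
    }
}
```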


Java Does not Implement Multiple Inheritance


In any complex, object-oriented system, implementing a class that needs to
inherit the functionality of more than one other class is a frequent problem.
A Manager class, for example, might need to serve as the head of a linked list
(of the employees the manager manages), but a Manager also needs to be an
Employee. There are several ways of dealing with this problem. One of these is
multiple inheritance--allowing a class to be derived from more than one base
class. In this example, Manager could be derived from both Linked List and
Employee. 
Java does not implement multiple inheritance. Instead, you may declare
interfaces, which describe the programming interface used to achieve some
functionality. A class may then implement one or more interfaces, as well as
its own unique functionality. Different, unrelated classes may implement the
same interface. Formal parameters to methods may be declared as either classes
or interfaces. If they are interfaces, objects of different classes that
implement the interface may be passed as arguments to the method. 
The concept of interfaces is considerably easier to master than multiple
inheritance, but has its limitations. In particular, you must write code to
reimplement the desired functionality in each class implementing an interface.
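The Manager example above might be sketched like this (the Linkable interface and the member names are my own): Manager extends the one class it can inherit from, Employee, and picks up the list behavior by implementing an interface, writing the link-handling code itself.

```java
// The list behavior, expressed as an interface rather than a base class.
interface Linkable {
    Linkable next();
}

class Employee {
    String name;
    Employee(String name) { this.name = name; }
}

// Manager is an Employee, and also heads a list of reports; the linking
// code must be (re)implemented here, not inherited.
public class Manager extends Employee implements Linkable {
    private Manager nextReport;
    Manager(String name) { super(name); }
    void setNext(Manager m) { nextReport = m; }
    public Linkable next() { return nextReport; }
}
```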



Java Supports Multithreading


Multithreading lets you write a program that potentially does two or more
operations at the same time. For example, you could finish reading a large
file while still allowing the user to edit the part already read in. To do
this, you break your program into different threads of execution. To work
correctly, the program needs to be careful about how the different threads
manipulate any data or make decisions based on data common to more than one
thread.
Java was designed to support multithreaded applications from the start. The
classes and interfaces provided make breaking an application into different
threads simple. Language primitives handle automatic synchronization and
locking of critical data structures and methods.
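Those primitives can be sketched as follows (Counter is an invented name): the Runnable interface lets an object serve as a thread of execution, and the synchronized keyword serializes access to the shared data the text warns about.

```java
// A counter safe to share between threads: synchronized methods lock
// the object, so concurrent bumps cannot interleave.
public class Counter implements Runnable {
    private int count;

    public synchronized void bump()  { count++; }
    public synchronized int value()  { return count; }

    // Each thread created with this object runs this method.
    public void run() {
        for (int i = 0; i < 1000; i++) bump();
    }
}
```

A program would start two threads on one Counter with new Thread(c).start() and, after joining them, find the full total, since the locking prevents lost updates.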


Java Comes with a Diverse Set of Predefined Classes


The default Java environment currently consists of several different Java
packages implementing a diverse set of fundamental classes. These give you a
real jump start in terms of your ability to quickly write a meaningful
application. They include the following:
java.awt. Most applications developed today rely heavily on GUIs. Java
provides an abstract window toolkit (awt), which allows you to deal with GUI
objects in a generic manner without regard to the system on which your program
is running. The big advantage is that your program will automatically run on
all supported Java platforms. As of this writing, that includes Windows 95/NT
and Sun UNIX platforms, but by the time you read this it will most likely
include others, such as the Mac and most other UNIX flavors. The current awt
is a least-common-denominator GUI toolkit, a problem not inherent in the Java
design, but rather, a result of the rapid explosion of the technology, and
time-to-market and resource constraints on the Java development team. Expect
the awt to evolve into a more fully featured set of classes.
java.applet. An applet, in the context of Java, is a graphic piece of a larger
program, focused primarily on providing some form of browser-related content.
Applet itself is a subclass of an awt component and provides extended
capabilities to support rendering dynamic images, such as animations and
audio.
java.io. The java.io package supplies classes to support reading and writing
streams, files, and pipes. 
java.lang. These classes support the basic Java objects and native types:
Class, Object, Boolean, Float, Double, Integer, String, and so on, plus those
dealing with the extended capabilities of the language and connection with the
rest of the system environment.
java.net. The java.net package supplies classes to support network
programming. These include dealing with sockets, Internet addresses, network
datagrams, uniform resource locators (URLs), and content handlers for data
from a URL.
java.util. These are general-purpose utility classes for data structures, such
as dictionaries, hashtables, dates, stacks, bit sets, and strings. The package
does not have the breadth of similar, commercial C++ libraries, but does
provide convenient and time-saving implementations for some commonly needed
classes.
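As a taste of those utility classes (the PhoneBook wrapper is invented for the example), java.util.Hashtable serves as a ready-made dictionary, using only 1.0-era API:

```java
import java.util.Hashtable;

public class PhoneBook {
    public static String lookup() {
        Hashtable book = new Hashtable();   // a supplied dictionary class
        book.put("info", "555-1212");
        return (String) book.get("info");   // values come back as Object
    }
}
```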


Summary



Much of the popular press is claiming Java is much easier to learn than C++,
heralding it as a breakthrough in that regard. Certain aspects of Java are
easier to learn than C++, but the difficulties most people have in learning to
program in both C++ and Java have little to do with the language itself.
Instead, they have to do with fundamental, object-oriented concepts. If you
understand those, picking up the Java syntax will be a breeze. If you don't,
you will probably find Java just about as confusing as C++.
Figure 1: The Java development environment.




























































Using OODCE


Take DCE and C++ and mix them up




Jonathan Roberts and Dan Zigmond


Jonathan is a senior software engineer with Compuware in Alameda, California,
and can be reached at Jonathan_Roberts@compuware.com. Dan is a software
engineer with Siren Software in Palo Alto, California, and can be contacted at
djz@siren.com.


Distributed computing has been possible as long as computers have been on
networks, but only recently have sophisticated tools become available to
facilitate writing distributed apps in heterogeneous environments. The Open
Software Foundation's Distributed Computing Environment (OSF/DCE) is one such
environment, providing multithreading, remote procedure call (RPC), time
synchronization, security, and uniform naming (for more information, see
"Distributed Computing and the OSF/DCE," by John Bloomer, DDJ, February 1995).
DCE applications are typically complex. The DCE libraries contain over 500
functions and data types. Even seemingly trivial applications require
nontrivial amounts of code just to cover the basics. In fact, the difficulty
of writing real-world DCE applications by hand has long been seen as an
obstacle to DCE's widespread acceptance. OODCE, a C++ library originally
developed by Hewlett-Packard, greatly simplifies the development of
distributed applications.
By encapsulating DCE functionality in C++ objects, OODCE accomplishes several
things at once. First, it allows programmers to write DCE applications in C++
style. Secondly, it makes obvious the underlying DCE object model. In the DCE
model, a server is responsible for exporting objects to the RPC run time,
making them accessible to clients. Each exported object can be seen as an
instance of a class defined by an IDL interface specification. For example, a
server may export several Coke-machine objects, each of which acts
independently. Although the machines are exported by the same server, one
machine may report being empty while another has plenty of Cokes to sell.
This aspect of the interface is obscured in the standard DCE interface by the
myriad of C APIs required to get anything done. OODCE clarifies this aspect of
the DCE object model by expressing it in a set of C++ class hierarchies.
Finally, OODCE simplifies developing DCE applications by hiding details of key
components with default behavior, allowing you to specialize the behavior when
necessary. For instance, registering a DCE server with the RPC run time
occupies two full pages in the OSF/DCE Application Development Guide. Using
OODCE, we can get away with only two member-function calls to theServer
because the DCEServer class provides reasonable default behavior regarding
network protocols, authentication services, and the like. If we don't like
that default behavior, we can override it in a subclass of DCEServer. Thus, we
can put together a server in record time with the option of fine tuning it
later. This aspect captures the spirit of OODCE.
Rather than explain in full detail every part of OODCE, we'll walk through the
development of a simple application. This should demonstrate how little effort
is required to get a prototype distributed system off the ground. We'll also
discuss what else you can (and can't) do with OODCE 1.0.


Defining the Interface


One key feature of sophisticated distributed computing is interoperability.
Sharing data on a homogeneous network has been relatively simple for many
architectures and operating systems for about 15 years on microcomputers and
about 25 years on larger systems. Tying these machines together in a way that
transcends both computer architecture and operating system is one of DCE's
strengths.
The same principle holds true for DCE development systems. An OODCE server
that could talk only to OODCE clients would be of limited utility. Similarly,
an OODCE client that could not work with existing C-based servers would be
shut out of many applications where legacy servers are in place (and working
too well to be rewritten), but where new clients are needed.
For this reason, an OODCE application starts out with exactly the same sort of
interface specification as a C-based DCE application, written in the DCE
Interface Description Language (IDL). An IDL file describes a specific
interface to a server class, including all its potential RPCs and all its data
types.
In the C world, an IDL compiler ("idl") translates this file into three output
files: two files of C code implementing a set of "stubs" for marshaling and
unmarshaling parameters and return codes on both the client and server sides,
and a header file specifying the function signatures for which the server
programmer must provide function implementations.
OODCE takes this a step further with its own IDL compiler, "idl++," which
produces the three files of the standard idl compiler, plus four more files to
implement the C++ interface. We will describe these files by way of example.
Our example application is a Coke-machine server that allows some clients to
purchase Cokes from the machine, and allows others to add Cokes to the machine
to keep it from running out of Coke. Listing One, an IDL file for this app,
includes three basic RPCs: CokeCount, returning the number of Cokes available;
BuyBeverages, allowing a customer to purchase some Cokes; and AddBeverages,
allowing a maintainer to restock the machine. Listing Two, CokeMachine.h, is
produced by both idl and idl++. Both programs also produce the basic "stub"
files CokeMachine_cstub.c and CokeMachine_sstub.c, which actually implement
the underlying RPCs. These files are too long (and tedious) to reproduce here.
Together they comprise about 1200 lines of code that you'll never need to see.
The following files are produced only by idl++. Listing Three, CokeMachineC.H
(and CokeMachineC.C, which we haven't shown), defines the client class
CokeMachine_1_0. As you'll see when we write some client applications, this is
all you need to use CokeMachine_1_0. For specialized semantics, such as always
finding the Coke machine closest to your office, you can subclass from
CokeMachine_1_0 and write your own constructor. 
CokeMachineS.H (Listing Four) describes two server-side classes,
CokeMachine_1_0_ABS and CokeMachine_1_0_MGR. These two classes are identical except that
CokeMachine_1_0_ABS is an abstract class that declares our RPC functions pure
virtual, while CokeMachine_1_0_MGR is a concrete class that declares them
virtual and requires you to provide the actual code to define the server
behavior. Using the MGR class is a good way to get your application off the
ground quickly, but if your object needs state or non-RPC member functions,
you will want to subclass from the abstract class and ignore the MGR class
altogether. Listing Five (CokeMachineS.C) shows how we might have implemented
the member functions of the CokeMachine_1_0_MGR class to get our server up and
running in a hurry.
CokeMachineE.C implements a wrapper in C for each RPC we specify in the idl
file. Each wrapper is responsible for dispatching the incoming RPC to the
corresponding member function of the correct server-side object.


Writing the Server


Because our server is driven entirely by incoming RPCs, the actual server code
is trivial; see Listing Six. It simply instantiates a CokeMachine_1_0_Mgr
object, registers it with theServer, exports it, and listens for incoming
requests. theServer, whose member functions are used in each of these steps,
is a global object available to any program featuring the line #include
<oodce/Server.H>. Only server-side code should use this class.
theServer is responsible for exporting OODCE objects to the RPC run time. You
can export more than one object, and these objects can even be of different
classes. For example, a single server can export three Coke machines, two
bubble-gum machines, and a beer machine. Because many states have a law
prohibiting the sale of beer to individuals under 21 years of age, we must
assign proper access control to our beer machine, but we can ignore such
restrictions for our other objects. 
theServer can also export itself to CDS as an RPC entry, as a group element,
or as a profile element. Additionally, it is responsible for selecting which
network protocols to support, initiating and halting the listening to RPCs,
and maintaining the program's login context. Keeping these functions within
the DCEServer allows us to forget about them elsewhere in our code. 


Writing the Client


Implementing the client side is even easier. There are no classes to define;
we simply make member function calls on a CokeMachine_1_0 object. The default
implementation of the client-side classes (here, CokeMachine_1_0) handles
binding to existing server-side objects upon construction. It also handles all
the RPCs implicitly, so that calls to the member function CokeMachine_1_0 on
the client side transparently result in calls to the object
CokeMachine_1_0_Mgr on the server side.
Listings Seven and Eight are two simple clients, customer and stockboy. The
only difference between the two is that customer calls BuyBeverage while
stockboy calls AddBeverage.


OODCE Benefits and Limitations 


OODCE has many virtues beyond rapid prototyping. Not only is naming a server
easy, but with a single constructor, cleaning up the CDS namespace can be made
automatic upon server shutdown. The default method for locating objects
exported by named servers is very simple: Just pass the name to your client
object's constructor. A DCETracer class has a trace level that is very useful
for tracking the activities of a server. The trace level, of course, can be
changed as the server runs. We recommend that no server be passed off to
testing or production without DCETracer.
Lastly, security has been made simple. This is one of OODCE's most-significant
contributions, but the security classes deserve an article unto themselves.
While OODCE is clearly a step toward creating a reasonable distributed
programming environment, there is still room for improvement. For instance,
there is no default behavior for pipes (a convenient way to move large amounts
of data between clients and servers in DCE), which would help speed
prototyping with them. The distribution of objects is asymmetric, unlike OSF's
Request For Comments (RFC) 48.2. In this RFC, references to client-side
objects can be passed to servers via RPC, making the client a client/server
without requiring extra code. Do not allow RPC exceptions (which are C
exceptions) to propagate beyond the scope of any automatic C++ variables,
because the C exception mechanism will not destroy these objects, producing
memory leaks. Using exceptions with multiple RPC threads is also problematic.
This is significant because a server with only one RPC thread isn't much of a
server. The OODCE notes indicate that this problem will be remedied in a
future release.
Other limitations stem from the fact that OODCE is built using C++ rather than
a more-sophisticated object-oriented language. Consider our technique for
locking mutexes in C++. Example 1(a) is a naive way to do this. The problem
with this sort of code is that if DoSomething() throws an exception, TheMutex
will never be unlocked. OODCE provides a way around this using DCEPthreadLock;
see Example 1(b). Here, both the locking and the unlocking are implicit and
are handled by the constructors and destructors of DCEPthreadLock. When
TheLock is created, TheMutex is locked; when TheLock is destroyed, TheMutex is
unlocked. Because C++ exceptions are defined to call all destructors when they
leave a given scope, our code is now exception safe.

But what if we want to lock TheMutex for only part of our function? Then we
need to explicitly create a new scope, as in Example 2. This works, but it
starts to look ugly. As our programming needs become more complicated, it's
hard not to feel that we're coding around the language, inventing new
variables and levels of scope in order to twist a few C++ rules in our favor.
C++ has tended to encourage this sort of coding along with the use of macros
to clean things up. Example 3, for instance, is not generally considered good
C++ style, although it does encapsulate the messy details of locking a mutex
by introducing a new scope. Furthermore, because of C++'s relatively
unsophisticated macro system, LOCK will break if we use it to lock multiple
mutexes, and the name __TheLock may not be used elsewhere in the scope of our
LOCK macro (variable capture). LOCK will also behave unexpectedly if your
compiler doesn't support the new scoping rule for for loops. Our macro assumes
that the scope of variables declared in the for statement does not extend
beyond the body of the loop. Many compilers still make these variables visible
in the surrounding scope after the loop; in such compilers, we can use LOCK
only once within a given scope without causing compile-time errors.
We could change the LOCK macro to allow us to supply the name of the dummy
variable to solve both problems, but this defeats the purpose of trying to
hide such details from the application programmer. We could also write macros
to lock other numbers of mutexes (LOCK2, LOCK3, and so on) to avoid the
nesting problem, but this only works if we always lock all the mutexes for
exactly the same scope. For example, we still can't use the macros to lock one
mutex for a while, then lock another, then unlock both, because such macros
still won't nest correctly. 
Other object-oriented languages, such as Common Lisp, have embraced rather
than shunned macros, with the result that we can easily customize the language
to better suit our needs. In Common Lisp, we can easily define macros that
completely hide the details of mutex locking without introducing any of the
problems caused by the limitations of the C preprocessor. In Example 4, the
macro with-locks takes a list of mutexes and locks them in order, then
executes the statements that follow. In this example, the function
protected-task-one is only called after both mutex-one and mutex-two have been
locked; protected-task-two is called after mutex-three has been locked as
well. Whether all the statements are executed or the normal flow of control is
interrupted through an error condition or a nonlocal exit, the mutexes are
unlocked. In Common Lisp, it is impossible to tell whether with-locks is a
macro added to the language, or one that is built in (like with-open-file).
Listing Nine provides the complete code for the with-locks macro.


Is C++ Worth the Trouble?


So why use C++ at all for DCE applications? C++ has always been a sort of
compromise between the high-level expressiveness of languages like Common Lisp
and Smalltalk, and the efficiency of languages close to the iron, like C and
assembly. It's clearly easier to use a language with automatic garbage
collection (like Common Lisp and Smalltalk) than to code the memory management
by hand (as in C and C++). But programmers are willing to manage their own
memory to facilitate run-time efficiency.
In distributed computing, local run-time efficiency gains are often dwarfed by
network communications, which typically consume far more time than memory
management. Furthermore, in a DCE-based system, the RPC run time itself has
considerable overhead, including a garbage-collection system. Given all this,
why stick with C++?
In his book The C++ Programming Language, Bjarne Stroustrup gives four reasons
why C++ remained close to the original C model, even when rejecting this
heritage might have improved the language. The reasons are essentially
cultural: the millions of lines of C code and C-based libraries (he counts
this as two reasons), the hundreds of thousands of C programmers, and the need
for C and C++ to be used side-by-side. 
All four reasons apply to DCE applications at times, but other times it may
make sense to break free of the C/C++ model entirely and use a more abstract
and more expressive language. OODCE provides an excellent set of tools for
making distributed computing about as easy as it could be in C++.


OODCE and DCE 1.2


The next major DCE release is billed as having built-in C++ support. It should
be available in 1996 (although it took some vendors quite a while to move from
DCE 1.0.2 to 1.1, so don't hold your breath). Although 1.2's C++ support
shares the goal of facilitating DCE application development in C++, it is not
compatible with OODCE.
The C++ component of DCE 1.2 is described in the Open Software Foundation's
DCE RFC 48.2, prepared by DEC. Instead of providing a complete C++ class
library for DCE programming, RFC 48.2 describes a set of extensions to idl
itself for providing distributable objects. (Remember that OODCE goes out of
its way not to change the input language of idl.) In doing so, it establishes
the "hooks" that a class library (like OODCE) uses to provide complete C++
support for DCE. One very interesting feature of RFC 48.2, which is missing
from OODCE, is object-location transparency, which allows use of objects
without knowing whether they are local or only proxies representing nonlocal
objects. While OODCE may be rewritten at some point to take advantage of this
and other features, we know of no plans to do so.
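The idea behind object-location transparency can be sketched in plain C++: client code is written against an abstract interface and cannot tell whether it holds the real object or a proxy that would forward each call over RPC. (All class names here are invented for illustration; RFC 48.2's actual hooks differ.)

```cpp
#include <cassert>

// Abstract interface shared by local objects and proxies.
class CokeMachineIf {
public:
    virtual ~CokeMachineIf() {}
    virtual unsigned long CokeCount() = 0;
};

// A genuinely local implementation.
class LocalCokeMachine : public CokeMachineIf {
    unsigned long items;
public:
    explicit LocalCokeMachine(unsigned long n) : items(n) {}
    virtual unsigned long CokeCount() { return items; }
};

// A proxy: same interface, but a real one would marshal the call into
// an RPC. Here the "remote" side is simulated with a plain pointer.
class CokeMachineProxy : public CokeMachineIf {
    LocalCokeMachine* remote;     // stands in for a binding handle
public:
    explicit CokeMachineProxy(LocalCokeMachine* r) : remote(r) {}
    virtual unsigned long CokeCount() { return remote->CokeCount(); }
};

// Client code is identical either way -- that is the transparency.
unsigned long Report(CokeMachineIf& m) { return m.CokeCount(); }
```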


Availability


As of this writing, OODCE is only available on HP-UX series machines (HP 9000)
running HP's DCE/9000. There appears to be some interest in supporting OODCE
on other platforms, but no specific ports have been announced. The Open
Software Foundation is currently discussing with HP the possibility of making
OODCE part of the OSF DCE source offering, which would help spread it
throughout the DCE community.
Example 1: (a) The naive way to lock a mutex; (b) the preferred way in OODCE.
(a)

void Function(){ TheMutex.Lock(); DoSomething(); TheMutex.UnLock();}

(b)
void Function(){ DCEPthreadLock TheLock ( TheMutex ); DoSomething();}
Example 2: Locking a mutex for only part of a code block in OODCE.
void Function()
{
 DoSomeInitialWork();
 {
 DCEMutexLock TheLock
 ( TheMutex );
 DoSomething();
 }
 DoSomethingElse();
}
Example 3: Hiding the details of mutex locking using macros.
#define LOCK( m ) for( DCEMutexLock __TheLock( m ); ; )
void Function()
{ 
 DoSomeInitialWork();
 LOCK( TheMutex ) {
 DoSomething();
 }
 DoSomethingElse();
}
Example 4: Hiding the details of mutex locking in Common Lisp.
(defun function ()
 (initial-task)
 (with-locks (mutex-one mutex-two)
 (protected-task-one)

 (with-locks (mutex-three)
 (protected-task-two))
 (unprotected-task)))

Listing One
/* Copyright (c) 1995 Avatar Software, Inc. */
[uuid( 75e76bbc-bc9d-11ce-902c-0800096d6656 ),
 version(1.0)]
interface CokeMachine
{
typedef long int CokeMachineStatus;
typedef long int BeverageType;
 unsigned long CokeCount( [in] handle_t h );
 unsigned long BuyBeverages(
 [in] handle_t h,
 [in] long NumBeverages
 );
 unsigned long AddBeverages(
 [in] handle_t h,
 [in] long NumBeverages
 );
}

Listing Two
/* Generated by HP OODCE IDL++ compiler version 1.0 */
#ifndef CokeMachine_v1_0_included
#define CokeMachine_v1_0_included
#ifndef IDLBASE_H
#include <dce/idlbase.h>
#endif
#include <dce/rpc.h>
#ifdef __cplusplus
 extern "C" {
#endif
#ifndef nbase_v0_0_included
#include <dce/nbase.h>
#endif
extern idl_ulong_int CokeCount(
#ifdef IDL_PROTOTYPES
 /* [in] */ handle_t h
#endif
);
extern idl_ulong_int BuyBeverages(
#ifdef IDL_PROTOTYPES
 /* [in] */ handle_t h,
 /* [in] */ idl_ulong_int NumBeverages
#endif
);
extern idl_ulong_int AddBeverages(
#ifdef IDL_PROTOTYPES
 /* [in] */ handle_t h,
 /* [in] */ idl_ulong_int NumBeverages
#endif
);
typedef struct CokeMachine_v1_0_epv_t {
idl_ulong_int (*CokeCount)(
#ifdef IDL_PROTOTYPES
 /* [in] */ handle_t h
#endif

);
idl_ulong_int (*BuyBeverages)(
#ifdef IDL_PROTOTYPES
 /* [in] */ handle_t h,
 /* [in] */ idl_ulong_int NumBeverages
#endif
);
idl_ulong_int (*AddBeverages)(
#ifdef IDL_PROTOTYPES
 /* [in] */ handle_t h,
 /* [in] */ idl_ulong_int NumBeverages
#endif
);
} CokeMachine_v1_0_epv_t;
typedef struct CokeMachine_v1_0_m_epv_t {
idl_ulong_int (*CokeCount)(
#ifdef IDL_PROTOTYPES
 /* [in] */ handle_t h,
 /* [out] */ error_status_t *_dcecxxsts
#endif
);
idl_ulong_int (*BuyBeverages)(
#ifdef IDL_PROTOTYPES
 /* [in] */ handle_t h,
 /* [in] */ idl_ulong_int NumBeverages,
 /* [out] */ error_status_t *_dcecxxsts
#endif
);
idl_ulong_int (*AddBeverages)(
#ifdef IDL_PROTOTYPES
 /* [in] */ handle_t h,
 /* [in] */ idl_ulong_int NumBeverages,
 /* [out] */ error_status_t *_dcecxxsts
#endif
);
} CokeMachine_v1_0_m_epv_t;
extern CokeMachine_v1_0_epv_t CokeMachine_v1_0_c_epv;
extern rpc_if_handle_t CokeMachine_v1_0_c_ifspec;
extern rpc_if_handle_t CokeMachine_v1_0_s_ifspec;
#ifdef __cplusplus
 }
#endif
#endif

Listing Three
/* Generated by HP OODCE IDL++ compiler version 1.0 */
#ifndef __CokeMachine_1_0_Class_Included__
#define __CokeMachine_1_0_Class_Included__
#include <oodce/Interface.H>
#include <CokeMachine.h>
extern rpc_if_handle_t CokeMachine_v1_0_c_ifspec;
class CokeMachine_1_0: public DCEInterface {
public:
 // Define Class Constructors
 CokeMachine_1_0(DCEUuid& to = NullUuid):
 DCEInterface(CokeMachine_v1_0_c_ifspec, to) {}
 CokeMachine_1_0(rpc_binding_handle_t bh, DCEUuid& to = NullUuid):
 DCEInterface(CokeMachine_v1_0_c_ifspec, bh, to) {}
 CokeMachine_1_0(rpc_binding_vector_t* bvec, DCEUuid& to = NullUuid):

 DCEInterface(CokeMachine_v1_0_c_ifspec, bvec, to) {}
 CokeMachine_1_0(DCENsiObject* nsi_obj, DCEUuid& to = NullUuid):
 DCEInterface(CokeMachine_v1_0_c_ifspec, nsi_obj, to) {}
 CokeMachine_1_0(unsigned char* name,
 unsigned32 syntax = rpc_c_ns_syntax_default,
 DCEUuid& to = NullUuid):
 DCEInterface(CokeMachine_v1_0_c_ifspec, name, syntax, to) {}
 CokeMachine_1_0(unsigned char* netaddr,
 unsigned char* protseq, DCEUuid& to = NullUuid):
 DCEInterface(CokeMachine_v1_0_c_ifspec, netaddr, protseq, to) {}
 CokeMachine_1_0(DCEObjRefT* ref):
 DCEInterface(CokeMachine_v1_0_c_ifspec, ref) {}
 // Member functions for client
 idl_ulong_int CokeCount(
 );
 idl_ulong_int BuyBeverages(
 /* [in] */ idl_ulong_int NumBeverages
 );
 idl_ulong_int AddBeverages(
 /* [in] */ idl_ulong_int NumBeverages
 );
};
#endif

Listing Four
 /* Generated by HP OODCE IDL++ compiler version 1.0 */
#ifndef __CokeMachine_1_0_Mgr_Class_Included__
#define __CokeMachine_1_0_Mgr_Class_Included__
#include <oodce/InterfaceMgr.H>
#include <oodce/DCEObj.H>
#include <CokeMachine.h>
extern CokeMachine_v1_0_m_epv_t CokeMachine_v1_0_mgr;
extern rpc_if_handle_t CokeMachine_v1_0_s_ifspec;
class CokeMachine_1_0_ABS : public virtual DCEObj, public DCEInterfaceMgr {
private:
 CokeMachine_1_0_ABS(DCEObj& obj, uuid_t* type):
 DCEObj(obj.GetId()),
 DCEInterfaceMgr(CokeMachine_v1_0_s_ifspec, obj, type,
 (rpc_mgr_epv_t)(&CokeMachine_v1_0_mgr)) {}
public:
 // Declare Class Constructors
 CokeMachine_1_0_ABS(uuid_t* obj, uuid_t* type):
 DCEObj(obj),
 DCEInterfaceMgr(CokeMachine_v1_0_s_ifspec, (DCEObj&)*this, type,
 (rpc_mgr_epv_t)(&CokeMachine_v1_0_mgr)) {}
 CokeMachine_1_0_ABS(uuid_t* type):
 DCEObj((uuid_t*)(0)),
 DCEInterfaceMgr(CokeMachine_v1_0_s_ifspec, (DCEObj&)*this, type,
 (rpc_mgr_epv_t)(&CokeMachine_v1_0_mgr)) {}
 // Declare Class pure virtual member functions
 // These correspond to the remote procedures
 // declared in CokeMachine.idl
 // These need to be implemented by the developer
 virtual idl_ulong_int CokeCount(
 ) = 0;
 virtual idl_ulong_int BuyBeverages(
 /* [in] */ idl_ulong_int NumBeverages
 ) = 0;
 virtual idl_ulong_int AddBeverages(

 /* [in] */ idl_ulong_int NumBeverages
 ) = 0;
};
class CokeMachine_1_0_Mgr : public CokeMachine_1_0_ABS {
public:
 // Declare Class Constructors
 CokeMachine_1_0_Mgr(uuid_t* obj):
 DCEObj(obj),
 CokeMachine_1_0_ABS(obj, (uuid_t*)(0)) {}
 CokeMachine_1_0_Mgr():
 DCEObj((uuid_t*)(0)),
 CokeMachine_1_0_ABS((uuid_t*)(0)) {}
 // Declare Class member functions. These correspond to the remote 
 // procedures declared in CokeMachine.idl. These need to be 
 // implemented by the developer
 virtual idl_ulong_int CokeCount(
 );
 virtual idl_ulong_int BuyBeverages(
 /* [in] */ idl_ulong_int NumBeverages
 );
 virtual idl_ulong_int AddBeverages(
 /* [in] */ idl_ulong_int NumBeverages
 );
};
#endif

Listing Five
// Copyright (c) 1995 Avatar Software, Inc.
#include <iostream.h>
#include "CokeMachineS.H"
unsigned long CokeMachine_1_0_Mgr::CokeCount( )
{
 // lock access to this coke machine
 DCEPthreadLock ContentLock( &ContentMutex );
 return NumItems;
}
unsigned long CokeMachine_1_0_Mgr::BuyBeverages( idl_ulong_int NumBeverages )
{
 // lock access to this coke machine
 DCEPthreadLock ContentLock( &ContentMutex );
 // Buy as many beverages as possible, up to NumBeverages,
 // and return the number actually bought
 unsigned long ItemsBought = NumBeverages;
 if( ItemsBought > NumItems )
 ItemsBought = NumItems;
 NumItems -= ItemsBought;
 return ItemsBought;
}
unsigned long CokeMachine_1_0_Mgr::AddBeverages( idl_ulong_int NumBeverages )
{
 // lock access to this coke machine
 DCEPthreadLock ContentLock( &ContentMutex );
 NumItems += NumBeverages;
 return NumItems;
}

Listing Six
// Copyright (c) 1995 Avatar Software, Inc.
#include <iostream.h>
#include "CokeMachineS.H"

#include <oodce/Server.H>
int main( int argc, char *argv[] )
{
 try {
 DCEPthread* TheCleanupThread = new DCEPthread(DCEServer::ServerCleanup,
 (void *)(0));
 CokeMachine_1_0_Mgr TheCokeMachine;
 // Register CokeMachine object with the server object
 theServer->RegisterObject(TheCokeMachine, true);
 
 cerr << "Listening..." << endl;
 theServer->Listen(); 
 }
 catch (DCEErr& exc) {
 cerr << "Caught DCE Exception\n" << (const char*)exc;
 }
 exit(0);
}

Listing Seven
// Copyright (c) 1995 Avatar Software, Inc.
#include <stdlib.h>
#include <iostream.h>
#include "CokeMachineC.H"
#include <oodce/Exceptions.H>
int main( int argc, char * argv[] )
{
 try {
 CokeMachine_1_0 TheCokeMachine((unsigned char*) argv[1],
 (unsigned char *) "ip");
 TheCokeMachine.BuyBeverages(atoi(argv[2]));
 }
 catch (DCEErr& exc) {
 cerr << "Caught DCE Exception: " << (const char *) exc << endl;
 exit(1);
 }
 exit(0);
}

Listing Eight
// Copyright (c) 1995 Avatar Software, Inc.
#include <stdlib.h>
#include <iostream.h>
#include "CokeMachineC.H"
#include <oodce/Exceptions.H>
int main( int argc, char * argv[] )
{
 try {
 CokeMachine_1_0 TheCokeMachine((unsigned char*) argv[1],
 (unsigned char *) "ip");
 TheCokeMachine.AddBeverages(atoi(argv[2]));
 }
 catch (DCEErr& exc) {
 cerr << "Caught DCE Exception: " << (const char *) exc << endl;
 exit(1);
 }
 exit(0);
}


Listing Nine
;;; with-locks.lisp
;;; Copyright (c) 1995 Avatar Software, Inc.
;; The WITH-LOCKS macro takes pains to do two things. First, it
;; ensures that the macro is completely expanded at compile time, to
;; speed up code. Second, it ensures that only mutexes which have
;; been locked (through a successful call to LOCK) get unlocked (by
;; calling UNLOCK). In other words, if (LOCK THE-MUTEX) itself causes
;; a non-local exit (due to an error, for example), then we do not
;; want (UNLOCK THE-MUTEX) to be called. This is accomplished through
;; a fairly thick wrapping of UNWIND-PROTECTs and PROGNs.
(defmacro with-locks (mutex-list &body body)
 "Evaluate the forms in BODY with the mutexes in MUTEX-LIST locked."
 (labels ((expand-with-locks (mutex-list body)
 (if (null mutex-list)
 `(progn ,@body)
 `(progn (lock ,(car mutex-list))
 (unwind-protect
 ,(expand-with-locks (cdr mutex-list) body)
 (unlock ,(car mutex-list)))))))
 (expand-with-locks mutex-list body)))
DDJ









































A Tiny Preemptive Multitasking Forth


Better tools for embedded applications




Andy Yuen


Andy, who holds a master's degree in engineering from Carleton University in
Ottawa, Canada, currently works in Sydney, Australia. You can reach him at
andy_yuen@sydney.sterling.com.


Although Forth is a powerful, general-purpose programming language, it is
ideally suited for developing embedded systems: It is extensible; produces
fast, compact code; and provides an interactive development environment. It
is, however, sometimes convenient to organize an embedded application as a set
of cooperating tasks synchronized using semaphores to simplify logic. It also
is imperative that you be able to write interrupt-service routines in Forth
itself, instead of in assembler (the more common practice).
While many implementations of Forth support multitasking, they usually only
support cooperative multitasking, where one task has exclusive use of the CPU
until it gives up the CPU voluntarily. There are very few task-synchronization
mechanisms. There are add-on Forth words that allow writing of
interrupt-service routines in Forth, but most of these general-purpose Forth
implementations are too bloated to be usable in embedded systems.
Instead of starting from scratch to write my own Forth with preemptive
multitasking and interrupt-service routines, I used eForth for the 8086 as a
base. eForth, developed by Bill Muench and C.H. Ting, provides a simple model
that can be ported easily to many 8-, 16-, or 32-bit CPUs. It features a
small, ROMable, machine-dependent kernel, and portable high-level code.
Currently there are implementations for 8051, 6811, 80x86 (16 and 32 bit), and
the like. eForth is widely available at ftp sites, including
taggeta.oc.nps.navy.mil and asterix.inescn.pt.
In this article, I'll describe how you can provide support for preemptive
multitasking, semaphores (for task synchronization), and Forth
interrupt-service routines in the 16-bit 8086 eForth--with about 1K of
additional code. The same features can easily be ported to other
processor/microcontroller-based systems, provided that there is sufficient RAM
for separate stack and user-variable areas for each task.


Multitasking Services


My design criteria were that the multitasking services be simple to implement,
fast in task switching, and small. Synchronization mechanisms were needed to
coordinate task interaction. All multitasking services needed to be atomic;
that is, noninterruptible, so that the integrity of the multitasking kernel's
internal data structures could be guaranteed.
The multitasking services are summarized in Table 1. Figure 1 is the
state-transition diagram of the multitasking kernel.
Like BIOS and DOS functions, the multitasking services are invoked using a
software interrupt. All registers are saved when a multitasking service is
invoked (except by IPREEMPT, whose caller has already saved them) and restored
on exit, so the task's context can be preserved between task switches.
When a task is created, it is put in the ready-to-run queue. The method of
choosing a task to run is governed by the scheduling algorithm. There can only
be one task running at a time. When a PREEMPT or IPREEMPT is executed, the
running task's context (its register contents) is saved on the stack, the task
is put back in the ready-to-run queue, and another task is run. When a task
executes a WAIT on a semaphore, it either continues execution if the semaphore
count is nonzero, or blocks otherwise. In the latter case, the task is moved
to the semaphore queue and another task from the ready-to-run queue is run.
The blocked task will remain blocked until the semaphore it is waiting on is
SIGNALed by another task. When that happens, the blocked task will be moved
back to the ready-to-run queue to await execution.
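The WAIT/SIGNAL protocol just described can be modeled in a few lines of C++ (a sketch only; the real kernel is 8086 assembler, and ToySemaphore and its members are invented names):

```cpp
#include <cassert>

// Toy model of the kernel's semaphore protocol: WAIT either consumes a
// nonzero count or blocks the task; SIGNAL either wakes one blocked
// task or bumps the count.
struct ToySemaphore {
    int count = 0;
    unsigned blocked = 0;          // bitmask of blocked task numbers

    // Returns true if the task may continue, false if it blocked.
    bool Wait(int task) {
        if (count > 0) { --count; return true; }
        blocked |= 1u << task;
        return false;
    }
    // Returns the task number woken, or -1 if the count was bumped.
    int Signal() {
        if (blocked == 0) { ++count; return -1; }
        for (int t = 0; t < 16; ++t)
            if (blocked & (1u << t)) { blocked &= ~(1u << t); return t; }
        return -1;                  // unreachable: blocked was nonzero
    }
};
```

A woken task goes back to the ready-to-run queue, as in Figure 1; the model above only tracks which task that would be.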
There is always a ready-to-run task in the system. This task is the IDLE task,
which is created by the multitasking kernel when GO is called. It simply calls
PREEMPT continuously to switch to another user task. The IDLE task can be
thought of as a special, low-priority task in the system that only executes
when there is no other ready-to-run task. In other words, if a user task is
running and a PREEMPT or IPREEMPT is executed (by itself or by a timer
interrupt-service routine) and the IDLE task is the only task in the
ready-to-run queue, no task switch will result, and the running task continues
execution.
To save memory, I have not implemented queues based on linked lists. Instead,
I use a 16-bit word (not to be confused with a Forth word) to represent the
ready-to-run queue. Each bit of the word represents a task. For example, if
the queue has a value of 8005h, there are three tasks in the queue--task 0,
task 2, and task 15 (the IDLE task). The same approach is used for a semaphore
queue except that bit 15 (the most-significant bit) is used to indicate
whether it is a count or a queue. When bit 15 is clear, the word represents a
semaphore count. When bit 15 is set, it indicates that it is a queue and it
has the same meaning as the ready-to-run queue. Using bit 15 as a queue
indicator limits you to a maximum of 15 user tasks that can wait on a
semaphore, but the system IDLE task is assigned task number 15 (the 16th
task), and it never executes a WAIT anyway. This simple queue
representation saves the trouble of implementing linked lists, thus
conserving memory and simplifying the implementation.
kernel--together with 60 semaphores--occupies only about 700 bytes of memory.
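Sketched in C++, the queue encoding works like this (Enqueue, Dequeue, and IsQueue are invented names for illustration; the kernel manipulates the same bits in assembler):

```cpp
#include <cassert>

typedef unsigned short Word16;      // one 16-bit kernel word per queue

// Ready-to-run queue: bit n set means task n is in the queue.
inline Word16 Enqueue(Word16 q, int task) { return q | (Word16)(1u << task); }
inline Word16 Dequeue(Word16 q, int task) { return q & (Word16)~(1u << task); }
inline bool   InQueue(Word16 q, int task) { return (q >> task) & 1; }

// Semaphore word: bit 15 clear = the word holds a count;
// bit 15 set = it is a queue of waiting tasks (tasks 0..14 only).
inline bool IsQueue(Word16 s) { return (s & 0x8000u) != 0; }
```

Enqueueing tasks 0, 2, and 15 yields the article's example value of 8005h.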
You can implement different scheduling algorithms by scanning the set bits in
the queue. For example, if you always start scanning for set bits (which
represent tasks) from bit position 0 (least significant bit), you have a
priority scheduler where the lower the task number (between 0 and 14 if the
system IDLE task is not counted), the higher its priority. To implement a
round-robin-like algorithm, save the bit position where you found a task and
start scanning for set bits in the queue one position higher the next time
around. For example, if you located a ready-to-run task, say task 1, the next
time you will start scanning from bit 2, and so on. A true round-robin
algorithm is not possible since you don't know in what order the tasks were
placed in the queue. The multitasking kernel supports both algorithms. The
algorithm to use is specified during the initialization of the multitasking
kernel when INIT is called.
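Both scans can be sketched in C++ (PickPriority and PickRotating are invented names; the kernel does this bit scanning in assembler):

```cpp
#include <cassert>

// Priority scheduler: always scan from bit 0, so the lowest-numbered
// ready task wins.
int PickPriority(unsigned short readyq) {
    for (int t = 0; t < 16; ++t)
        if (readyq & (1u << t)) return t;
    return -1;                      // cannot happen: IDLE (15) is always ready
}

// Rotating scheduler: resume scanning one bit past the last winner,
// wrapping around, which approximates round-robin.
int PickRotating(unsigned short readyq, int last) {
    for (int i = 1; i <= 16; ++i) {
        int t = (last + i) % 16;
        if (readyq & (1u << t)) return t;
    }
    return -1;
}
```

With the 8005h queue from the earlier example (tasks 0, 2, and 15 ready), the priority scan always picks task 0, while the rotating scan cycles 0, 2, 15, 0, and so on.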
DOS functions are non-reentrant--they cannot be called by different tasks at
the same time. This is likely to happen if we use an interrupt service routine
to invoke IPREEMPT to achieve preemptive multitasking. The workaround is to
avoid task switching when DOS is in a critical section. You can use the
undocumented DOS function 34h to obtain the critical-section pointer (returned
in ES:BX) for testing. If the byte pointed to by the critical section pointer
is nonzero, DOS is inside a critical section.
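The test itself is a one-byte check (SafeToSwitch is an invented name; on a real system the flag pointer comes back from DOS function 34h in ES:BX):

```cpp
#include <cassert>

// Model of the InDOS test: a nonzero flag byte means DOS is inside a
// critical section, so the scheduler must skip the task switch.
bool SafeToSwitch(const unsigned char* indos_flag) {
    return *indos_flag == 0;
}
```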
Since you may not be using DOS in the final product (to avoid licensing and/or
royalty payments, for instance), conditional assembly instructions are used in
the multitasking kernel source file KERNEL.ASM (available electronically; see
"Availability," page 3) to control the inclusion of the DOS critical-section
test. If the symbol MS-DOS is defined, a DOS critical-section test is
performed by IPREEMPT to check if it is safe to do a task switch. No task
switch is performed if DOS is inside a critical section. If MS-DOS is not
defined, no DOS critical-section test is performed.
eForth includes the multitasking kernel source file KERNEL.ASM in the assembly
process by using MASM's INCLUDE directive.


eForth Integration


Before describing how to integrate the multitasking kernel, I'll first provide
a quick backgrounder on eForth. I chose eForth as my base Forth system not
just because it is small, portable, ROMable and reasonably powerful, but also
because its source-code comments actually help me understand how eForth works.
(I wish I could say the same for some of the Forth implementation source code
I've seen.)
8086 eForth uses the small memory model: It uses the same segment for CS, SS,
ES, and DS, and can therefore only address 64K of memory. It uses SP as the
user stack pointer, BP as the return stack pointer, and SI as the Forth
instruction pointer.
eForth consists of 31 low-level hardware-dependent words (implemented in
assembler) and 168 high-level words (implemented in Forth). It is the
isolation of the hardware-dependent words from the rest of the system that
makes eForth easily portable to other processors.
Figure 2 shows the eForth memory organization. It has separate name and code
dictionaries. The name dictionary grows downward (toward low memory) and the
code dictionary grows upward. Immediately above the name dictionary lies the
user area where user variables are kept. Above that are the user stack,
terminal input buffer, and return stack. High memory is arbitrarily set at
4000h: eForth only uses 16K of memory. (The user is free to redefine the
constant EM, however, so that eForth uses up to 64K.)
In a multitasking environment, each task must have its own user and stack
areas. Consequently, RAM usage increases with the number of tasks in the
system.
As Table 2(a) shows, a number of new Forth words have been defined to
interface to the multitasking kernel. With the exception of THREAD, all
multitasking words call the multitasking kernel services directly via a
software interrupt.
The Forth word THREAD allows the user to execute a defined Forth word as an
independent task. For example, 1 THREAD TASK1 creates task 1 and uses the
user-defined word TASK1 as the body of the task. It is the user's
responsibility to see to it that TASK1 does not terminate--that is, the logic
of TASK1 should be enclosed within BEGIN and AGAIN. Example 1 is the Forth
definition for THREAD. In the source code, this is defined by the macro $COLON
and the assembler directive DW, as in Example 2.
THREAD places the code address of the word TASK1 on the user stack and
allocates memory for the task's user and stack areas. It then invokes DECLARE
to create the task. The memory is allocated from the code area of the most
recently created word. The user should create all tasks at once (see Example
3), and allocate all memory for them within the entry for TASKAREA.
The newly created task is put in the ready-to-run queue to await execution. A
task switch occurs either when the running task executes a PREEMPT or when a
timer interrupt-service routine performs an IPREEMPT. For the latter case to
happen, the user has to write the interrupt-service routine and hook it into
the timer interrupt.
Interrupt service routines are often written in assembler. However, it is much
easier to develop in high-level Forth than in low-level assembler. Also,
eForth does not have a built-in assembler. My advice is: Write them in Forth;
if they're not fast enough, use assembler. However, a well-designed
application should always keep processing to the absolute minimum within an
interrupt-service routine. Now that we have multitasking and semaphores at our
disposal, an interrupt service routine (ISR) can just do minimal,
time-critical processing and SIGNAL a task to continue with more
time-consuming work.
I have provided the word pair ISR:/ISR; for defining an interrupt-service
routine. The user should define an interrupt-service routine with a sequence
such as Example 4, which installs the word SERVICE as an ISR at interrupt 1Ch.
The ISR:/ISR; word pair is tightly coupled to the multitasking kernel. So if
the multitasking kernel is changed to save the CPU registers in a different
sequence, ISR:/ISR; must be changed too. The best way to understand how ISR:
and ISR; work is to compare the pair with the standard :/; word definition
pair.
: and ISR: work in similar fashion. They first parse the input stream for the
name to later be put into the dictionary entry. : then compiles an 8086 call
instruction to the doLIST routine (which executes a list of compiled words)
and puts Forth in the compile mode. ISR:, however, compiles a call instruction
to PUSHALL, allocates memory for the Forth interrupt-service routine's user
and return stacks, and compiles a Forth branch instruction to skip over the
allocated memory area before putting Forth in the compile mode.
When terminating a : definition, ; compiles the Forth word EXIT into the code
area, enters the interpret mode and links the new word to the dictionary. ISR;
does the same, but compiles POPALL instead of EXIT into the code area. A
compiled Forth interrupt-service routine is depicted in Figure 3.
PUSHALL and POPALL are words written in assembler that manipulate the stack.
PUSHALL sets up the Forth environment to execute compiled Forth words in the
interrupt-service routine and POPALL cleans up and returns control to the
interrupted task (see Figure 4).
When an interrupt occurs, control is transferred to the interrupt-service
routine, which makes an 8086 subroutine call to PUSHALL. Consequently, on
entry to PUSHALL, the stack content is like that in Figure 4 (a). PUSHALL
retrieves the return address (which is actually the Forth instruction pointer
indicating the first Forth word in the interrupt-service routine) from the
stack and saves all CPU registers in the same sequence that the multitasking
kernel saves them. This is necessary in case the interrupt-service routine
invokes IPREEMPT to switch to another task. PUSHALL then sets up the Forth
environment by switching to the Forth user and return stacks allocated when
the Forth interrupt-service routine was defined. Once it switches to the user
stack, it pushes the old stack pointer (segment and offset) and the previously
saved call return address (Forth IP) onto the user stack and jumps to doLIST.
The stack content immediately before jumping to doLIST is depicted in Figure 4
(b). doLIST actually uses the return address as the Forth instruction pointer
to locate the list of compiled Forth words (body of the Forth
interrupt-service routine) to execute. The first Forth word it executes is
branch, which skips over the stack area and proceeds to execute the rest of
the Forth interrupt service routine.
The last Forth word executed by the interrupt-service routine is usually
POPALL. POPALL sends an EOI (end-of-interrupt) command to the interrupt
controller so that it can deliver further interrupts, switches back to the
original stack (using the
stack segment/offset saved in the user stack), restores all saved registers
(the interrupted task's context), and executes an 8086 IRET instruction to
return from interrupt.
If the interrupt-service routine definition ends with an IPREEMPT, as in ISR:
SERVICE ... IPREEMPT ISR;, IPREEMPT invokes the multitasking service IPREEMPT
to switch to another task. IPREEMPT, like POPALL, sends an EOI, switches the
stack, and switches to another task. In this case, POPALL is not executed.
The Forth words for interrupt-service support are documented in Table 2(b).

The startup sequence for Forth involves the following steps:
1. Set up the segment registers. 
2. Initialize SP.
3. Install the multitasking service software-interrupt handler (software
interrupt 79h has been chosen arbitrarily).
4. Initialize the multitasking kernel and specify the scheduling algorithm.
5. Install the Forth interpreter as task 14. 
6. Start the multitasking kernel (which creates the system IDLE task). 
The user then can use Forth to create new tasks and write interrupt-service
routines in Forth. I made slight changes to the standard Forth words KEY and
UP.
KEY waits for a character from the input device. I introduced a PREEMPT in the
loop so that it works better with the other tasks; see Example 5.
UP returns the pointer to the user area. Since we now have one user area per
task, the old UP code no longer works. The solution is to define an array CUPP
to store all the user-area pointers and modify UP so that it uses the
running-task number to index into the array to retrieve the correct UP
pointer. Example 6 provides the definitions of CUPP and UP.
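The idea behind CUPP and the new UP can be modeled in C++ (invented names; Example 6 gives the actual Forth definitions): one user-area pointer per task, indexed by the running-task number.

```cpp
#include <cassert>

const int MaxTasks = 16;
static unsigned userAreas[MaxTasks];   // CUPP: one user-area pointer per task
static int runningTask = 14;           // the Forth interpreter is task 14

// UP: return the user-area pointer of whichever task is running now.
unsigned CurrentUserArea() { return userAreas[runningTask]; }
```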
All that was missing from our multitasking Forth was the ability to access
8086 input/output ports: eForth does not have I/O port access words. The only
reason that I can think of for this omission is that eForth is meant to be
portable, and many non-Intel processors/microcontrollers, most notably
Motorola's, use memory-mapped I/O instead of separate I/O instructions. 
To complete the enhancements to eForth, I created words for accessing both
8-bit and 16-bit I/O ports and for accessing memory in any segment:offset.
These words are documented in Table 2(c).


An Example Application


Listing One illustrates a possible organization of an embedded system. It
consists of four user tasks and one Forth interrupt-service routine.
The interrupt-service routine (ISR) is hooked into interrupt 1Ch (the timer
tick, which occurs 18.2 times a second). It SIGNALs tasks 0 and 1 (semaphores 0
and 1, respectively) and performs a task switch.
Task 0 WAITs on the periodic SIGNAL from the ISR and uses the SIGNAL as a
timebase for playing a simple tune using the PC's speaker. It requires the
newly added I/O port-accessing words to control the PC's 8253/8254 timer and
8255 I/O port to generate the musical notes. It then goes back to wait for
another SIGNAL (on semaphore 0).
Task 1 WAITs on the periodic SIGNAL from the ISR, uses the SIGNAL as a
timebase, displays the current time at the top right-hand corner of the screen
using the newly added inter-segment memory access words to update the screen
display buffer directly, and goes back to wait for another SIGNAL (on
semaphore 1). (Note that mutually exclusive access to the variables HOUR,
MINUTE, and SECOND is controlled by a semaphore. The user can use the word
SETTOD to set the current time from task 14, the Forth interpreter, without
causing side-effects.)
Task 3 is a runaway task that does nothing but loop. In a nonmultitasking
environment, this will lock up the processor. But in our system, the processor
is only slowed down a bit.
Task 14 is the Forth development environment that is created automatically
when you start Forth. It allows you to proceed with new development while your
other tasks are running.
The word BYE has been redefined to remove the timer interrupt and silence the
speaker before exiting Forth; otherwise your system would crash.
You could have used the ISR to perform all the work done by tasks 0 and 1 in
the aforementioned example. But when writing more-complicated applications,
the ISR should do the minimal amount of processing and delegate the
time-consuming processing to other tasks. Also, each task should do only a
simple but well-defined job to simplify the logic. Whatever you do, keep it
simple. But remember, only the Forth interpreter (that is, task 14) can start
a compile, wait for user input, and allocate memory. No user tasks can use
Forth words that do any of these things either directly or indirectly. Also,
an interrupt-service routine should only invoke the multitasking services:
SIGNAL, IPREEMPT, and ME.


Conclusion


With the added support for multitasking, semaphores, and Forth
interrupt-service routines, there are now more tools to better organize an
embedded application.
The multitasking Forth presented here can be used as an embedded-system
development environment. However, to put it in ROM for use in the field may
require a bit more work. You may want to exclude DOS (and the associated
royalty payments) from the final product. Since only a few DOS functions are
used, you may consider using the BIOS or your own routines instead.
You also might want to write your own simple I/O routines since neither DOS
nor BIOS functions are reentrant. Switching tasks while a task is inside such
a routine usually crashes the system.
Also, you can change the EXPECT, TAP, ECHO, and PROMPT vectors to point to
your own words that implement an interrupt-driven serial link for
communication with the outside world. This should be relatively simple to do
now that you have multitasking and Forth interrupt-service routines.
Finally, you can redefine the Forth memory map to suit your hardware's ROM/RAM
memory organization. You may want to get an eForth implementation for other
processors or microcontrollers from one of the Forth FTP sites for reference.
Figure 1: Multitasking kernel state-transition diagram.
Figure 2: The eForth memory map.
Figure 3: Compiled Forth ISR definition.
Figure 4: (a) Stack content on entry to PUSHALL; (b) stack content immediately
before jumping to doLIST.
Table 1: Multitasking kernel services.
Service   Description                        Input              Output
INIT      Initializes multitasker and        AH = 0             none
          specifies scheduling algorithm     AL = 0 priority
                                             AL = 1 rotate
DECLARE   Creates a new task                 AH = 1             none
                                             AL = task #
                                             DI = initial IP
                                             CX = initial SP
                                             DX = initial BP
GO        Creates IDLE task and starts       AH = 2             none
          all declared tasks
SIGNAL    Signals a semaphore                AH = 3             none
                                             AL = semaphore #
WAIT      Waits on a semaphore               AH = 4             none
                                             AL = semaphore #
PREEMPT   Switches to another ready task     AH = 5             none
          (called from a task)
IPREEMPT  Switches to another ready task     AH = 6             none
          (called from an ISR)
ME        Retrieves task number of           AH = 7             AL = task #
          running task (0 to 14 inclusive)
Table 2: (a) Forth multitasking words; (b) Forth interrupt-service words; (c)
input/output port and intersegment memory access words.
Word          Description                          Code

(a)
DECLARE       Creates task #u with IP = ca,        assembler
( u ca usp rsp -- )   SP = usp, BP = rsp
              (internal use only,
              use THREAD instead)
SIGNAL        Signals semaphore #u                 assembler
( u -- )
WAIT          Waits on semaphore #u                assembler
( u -- )
PREEMPT       Switches to another task             assembler
( -- )        (called from a task)
IPREEMPT      Switches to another task             assembler
( -- )        (called from an ISR)
ME            Retrieves the running                assembler
( -- u )      task's task #
THREAD        Creates task #u using the            Forth
( u -- <name> )   defined Forth word <name>
              as the body of the task

(b)
INT-ON        Enables interrupts                   assembler
( -- )
INT-OFF       Disables interrupts                  assembler
( -- )
INT-SET       Installs interrupt handler           assembler
( ca u -- )   (address ca) for interrupt #u
              (internal use only,
              use INT-INSTALL instead)
INT-REMOVE    Removes interrupt handler            assembler
( u -- )      for interrupt #u and
              installs a do-nothing handler
PUSHALL       Sets up Forth interrupt              assembler
( ca -- )     support (internal use only)
POPALL        Cleans up after Forth ISR            assembler
( -- )        (internal use only)
ISR:          Starts a Forth ISR definition        Forth
( -- ; <string> )
ISR;          Terminates a Forth ISR               Forth
( -- )        definition
INT-INSTALL   Installs the defined Forth           Forth
( u -- <name> )   ISR <name> for interrupt #u

(c)
P@            Inputs value from 16-bit             assembler
( u -- n )    port #u
P!            Outputs value n to 16-bit            assembler
( n u -- )    port #u
PC@           Inputs value from 8-bit              assembler
( u -- c )    port #u
PC!           Outputs value c to 8-bit             assembler
( c u -- )    port #u
M@            Gets 16-bit value from               assembler
( seg off -- n )   memory at seg:off
M!            Stores 16-bit value n to             assembler
( n seg off -- )   memory at seg:off
MC@           Gets 8-bit value from                assembler
( seg off -- c )   memory at seg:off
MC!           Stores 8-bit value c to              assembler
( c seg off -- )   memory at seg:off
Example 1: Forth definition for THREAD.
: THREAD ( u -- <name> )
 ' \ GET ADDRESS OF WORD
 UP @ HERE 128 CMOVE \ COPY USER AREA
 OVER CELLS CUPP \ SAVE USER AREA POINTER
 + HERE SWAP !
 HERE DUP
 256 ALLOT \ ALLOCATE USER STACK
 HERE CELLM
 DUP ROT 256
 + ! SWAP \ SAVE SP0
 256 ALLOT \ ALLOCATE RETURN STACK
 HERE CELLM
 DUP ROT 384
 + ! \ SAVE RP0
 DECLARE ;
Example 2: Defining THREAD using the macro $COLON and the assembler directive DW.
$COLON 6, 'THREAD', THREAD
DW TICK ;get address of word
DW UP,AT,HERE,DOLIT,US,CMOVE ;copy user area
DW OVER,CELLS,CUPP ;save UP pointer
DW PLUS,HERE,SWAP,STORE
DW HERE,DUPP ;save user area ptr
DW DOLIT,US+SPS,ALLOT ;allocate UPP
DW HERE,CELLM
DW DUPP,ROT,DOLIT,SPPPOS
DW PLUS,STORE,SWAP ;save SP0
DW DOLIT,RTS,ALLOT ;allocate RPP
DW HERE,CELLM
DW DUPP,ROT,DOLIT,RPPPOS
DW PLUS,STORE ;save RP0
DW K_DECLARE,EXIT
Example 3: Creating all tasks at once.
CREATE TASKAREA
0 THREAD TASK0
1 THREAD TASK1
 .
 .
 .
Example 4: Using the word pair ISR: and ISR; to define an interrupt-service
routine.
ISR: SERVICE
  ... interrupt logic ...
ISR;

HEX
1C INT-INSTALL SERVICE
DECIMAL
Example 5: KEY waits for a character from the input device.
 $COLON 3,'KEY',KEY
KEY1: DW K_PREEMPT ;preempt
 DW QKEY ;get key
 DW QBRAN,KEY1 ;repeat if no key pressed
 DW EXIT
Example 6: Definitions of CUPP and UP.
$COLON 4,'CUPP',CUPP
DW DOVAR ;create array
DW MAXTHREAD DUP(UPP) ;to hold all UPs
$COLON 2,'UP',UP 
DW K_ME,CELLS,CUPP,PLUS,EXIT ;retrieve UP

Listing One
\ ***********************************************************
\ Example program to illustrate the use of multi-tasking and
\ Forth interrupt service routines
\ by Andy Yuen 1995 (C)
\ ***********************************************************
\ eForth has no CONSTANT, define one
\ (I have added DOES> to eForth)
: CONSTANT CREATE , DOES> @ ;
\ musical note generation frequency divider
6087 CONSTANT <G
5423 CONSTANT <A
4560 CONSTANT C
4064 CONSTANT D
3630 CONSTANT E
3044 CONSTANT G
\ hardware port and control definitions
HEX 
61 CONSTANT CONTROL-PORT \ speaker control port
43 CONSTANT TIMER-PORT \ timer mode control port
42 CONSTANT COUNT-PORT \ timer count register
0B6 CONSTANT SETUP \ counter 2 square wave generator mode
0FE CONSTANT MASK \ timer-driven speaker disable control mask
1C CONSTANT TIMER-INT \ PC clock tick interrupt
0B800 CONSTANT SCR-SEG
DECIMAL
\ define tune in musical note/duration pair
CREATE TUNE
C , 6 , E , 6 , G , 3 , G , 3 , G , 4 , E , 4 , G , 6 , 
C , 6 , E , 6 , D , 3 , D , 3 , G , 4 , G , 4 , <G , 6 , 
E , 6 , D , 6 , C , 3 , C , 3 , <A , 4 , <A , 4 , C , 6 , 0 , 20 ,
\ 18 ticks equals one second
18 CONSTANT TICKS/SECOND
0 CONSTANT TMUSIC# \ task# for music playing task
1 CONSTANT TTIME# \ task# for time-of-day display task
2 CONSTANT TLOOP# \ task# for runaway task
0 USER POS \ current position in tune
\ enable timer-driven tone-generation
: SPEAKER-ON ( -- ) CONTROL-PORT PC@ 3 OR CONTROL-PORT PC! ;
\ disable timer-driven tone-generation
: SPEAKER-OFF ( -- ) CONTROL-PORT PC@ MASK AND CONTROL-PORT PC! ;
\ delay for n signals
: DELAY ( n -- ) DUP FOR ME WAIT 1 - NEXT DROP ;

\ music playing task logic
: TMUSIC ( -- )
 SETUP TIMER-PORT PC! SPEAKER-ON \ setup timer and enable speaker
 0 POS ! ME WAIT BEGIN \ wait for first signal to start
 SPEAKER-OFF \ disable speaker
 SPEAKER-ON \ and enable it to give a brief pause
 TUNE POS @ + DUP DUP C@ COUNT-PORT PC! \ output frequency divisor to
 1 + C@ COUNT-PORT PC! \ timer count register
 2 + @ DELAY \ delay specified duration
 4 POS +! \ advance tune pointer
 TUNE POS @ + 
 @ 0 = IF 0 POS ! 
 SPEAKER-OFF 20 DELAY THEN \ pause a while when song finishes
 AGAIN ; \ replay
\ replace standard word to quit: need to remove ISR and disable speaker
: BYE ( -- ) SPEAKER-OFF TIMER-INT INT-REMOVE BYE ;
\ declare variable for keeping the time-of-day
VARIABLE XCOUNT
TICKS/SECOND XCOUNT !
VARIABLE SECOND
VARIABLE MINUTE
VARIABLE HOUR
\ a semaphore is used for safe-guarding time-of-day variables access
\ note that all words accessing them do a wait in the beginning
\ and a signal at the end to provide mutually exclusive access
20 CONSTANT TODSEM
TODSEM SIGNAL
\ set time-of-day
: SETTOD ( n n n -- ) TODSEM WAIT SECOND ! MINUTE ! HOUR ! TODSEM SIGNAL ;
\ advance time-of-day clock by one second
: SECOND> ( -- ) TODSEM WAIT 1 SECOND @ + DUP 60 < IF
 SECOND ! ELSE 60 SWAP - SECOND !
 1 MINUTE @ + DUP 60 < IF
 MINUTE ! ELSE 60 SWAP - MINUTE !
 1 HOUR @ + DUP 24 < IF
 HOUR ! ELSE 24 SWAP - HOUR !
 THEN THEN THEN TODSEM SIGNAL ;
\ convert number to two ASCII characters on the stack
\ cannot use words like <#, #, #>, etc. because they allocate
\ memory and may interfere with the interpreter task #14
: DECODE ( n -- n n ) 10 EXTRACT SWAP 10 EXTRACT SWAP DROP ;
\ write ASCII characters to memory location until 0 is reached
: SCRWRITE ( 0 n ... n seg off -- ) BEGIN 
 2 + ROT DUP WHILE 
 2 PICK 2 PICK MC! REPEAT 2DROP DROP ;
HEX
\ display time-of-day HH:MM:SS near top right hand corner of screen
: TODDISPLAY ( -- ) TODSEM WAIT
 0 SECOND @ DECODE 3A 
 MINUTE @ DECODE 3A 
 HOUR @ DECODE
 SCR-SEG 80 SCRWRITE
 TODSEM SIGNAL ;
DECIMAL
\ time-of-day display task logic
: TTIME ( -- ) 0 0 0 SETTOD BEGIN ME WAIT \ set time to 00:00:00
 XCOUNT @ 1 - DUP XCOUNT ! 0 = IF \ increment time every second
 SECOND> TICKS/SECOND XCOUNT ! THEN 
 TODDISPLAY AGAIN ; \ display time

\ runaway task logic
: TLOOP BEGIN AGAIN ; \ loop
HEX 
\ define interrupt service routine
ISR: CLOCKISR TMUSIC# SIGNAL \ signal tasks
 TTIME# SIGNAL IPREEMPT ISR; \ multi-task
DECIMAL
\ install timer ISR
TIMER-INT INT-INSTALL CLOCKISR
\ create and start all tasks
CREATE STACKAREA
TMUSIC# THREAD TMUSIC
TTIME# THREAD TTIME
TLOOP# THREAD TLOOP


Distributed Objects and the Internet


Client/server components go online




John Pompeii


John is founder and principal software architect of Secant Technologies. He
can be contacted at john@secant.com.


Many articles in DDJ and elsewhere have discussed distributed objects and the
Internet, but most have been conceptual in nature (see, for example,
"Networking Objects with CORBA," by Mark Betz, DDJ, November 1995). In this
article, I'll discuss a real-world implementation of distributed-object
technology called "LISA," short for "Leasing Information and Service
Assistant." LISA is a client/server, object-oriented property-management
application that automates many of the day-to-day details of managing
thousands of rental properties throughout Ohio and Michigan. The LISA system
comprises an OS/2 Presentation Manager app, daily processing programs, a set
of Distributed System Object Model (DSOM) object services, a TCP/IP network,
and more than 60 Oracle database servers. 
We developed LISA for Associated Estates Realty Corp., a real-estate
investment trust that stipulated the following business requirements:
Keep track of the suite, lease history, status, prior and present employers,
prior addresses, and credit information of each resident.
Manage the status, lease history, and features of each suite.
Record the time and date of each prospective tenant's visits, their current
addresses, and how they found out about the available suite (drive-by,
newspaper, magazine, or whatever).
If a prospect agrees to rent a suite, enter the application into the system
and transfer it to the credit department at the main office. The application
results, notes, and requests for additional information (which are generated
at the main office) must be transferred back to the site where the application
originated.
Replicate all resident, suite, prospect, and visit information recorded at the
individual properties in the central database at the main office. This
database is used in marketing-analysis tasks to determine where to apply
advertising resources.
Maintain a to-do list to remind property managers of events such as promised
follow-up calls, resident-satisfaction calls, lease expirations, thank-you
cards, and application results.
Print legal documents such as leases, applications, and attachments, as well
as a large list of reports.
Maintain details of user logins and the hours during which the leasing offices
are open.
Maintain high configurability. Since it is used in different cities and
states, the system must comply with the law in all of these areas. 
Use Oracle 7, the corporate standard, as the database.
Clearly, LISA is a complex system. In addition to familiar problems involving
user interfaces, operating systems, and development environments, LISA
introduces new challenges: object services for distributed objects, naming,
event notification, workflow, concurrency control, and Internet connection. In
this article, I'll focus on two of these areas--distributed-object and
Internet-connection services--and how they relate to the three most important
aspects of communications between the sites:
Connecting the main office to one or more of the 60+ remote properties on
demand.
The mechanics of data communications between two sites, including the means of
queueing messages when the two are not connected.
Exchanging complex business objects and workflow-task data. In other words,
"flattening" objects into a form that can be sent over the network.


The Client Application


On the client side, LISA is an OS/2 PM application written using Borland C++
for OS/2. We used a notebook paradigm common to many OS/2 CUA-91 applications
to make the application approachable for nontechnical users.
Each page of this notebook lists items such as prospects, residents, and
suites. An easy-to-use search facility searches for items and returns results
in a list. To edit a particular item, the user simply double clicks on the
desired item in the list.
The application style follows the CUA-91 guidelines. All list entries support
context menus, icons, usage emphasis, and drag and drop. Full
context-sensitive help is available from every part of the application via the
F1 key or the help menu.
To build LISA, we employed object-modeling techniques and software layering to
maximize reuse and achieve high maintainability. Figure 1 illustrates the
software-component layers.
The application's foundation is built on ObjectPM, an application framework
that we wrote. ObjectPM is an OS/2 C++ class library which presents OS/2's
GUI, graphics, and multitasking as a set of easy-to-use software components.
Other classes included in the foundation layer manage database access,
workflow, and reporting. The business and configuration model classes are
built on top of this base.
The business model and foundation classes are combined in the application
layer to form the finished application. The application layer consists of a
set of GUI and application-level classes used to glue the reusable components
underneath. This layer is relatively thin compared to the final assembly,
representing less than one-third of the code.


Overall System Organization


Figure 2 illustrates the global structure of the LISA system. The main AERC
office houses a backbone network supporting NetWare, DECnet, and TCP/IP. The
principal Oracle database is housed in a Compaq Proliant SMP computer with two
100-MHz Pentium processors running OS/2 SMP 2.1. This machine also runs a
number of DSOM object services that handle event management, naming,
properties, workflow routing, and concurrency control for all LISA clients on
the LAN.
Outside the main office, more than 60 properties run the LISA system. Each
site has between one and four computers, depending on the size of the
property. If there is only one machine, then all software--including Oracle,
the object services, and applications--must run on this machine. For larger
properties with more computers, the OS/2 peer services are used to share disk
and printer resources among users.
The main office and individual properties connect on an intermittent basis
using a TCP/IP point-to-point connection service implemented as a DSOM object
service. This server maintains a catalog of remote networks and their Internet
addresses. When a user or processing program needs to contact a distant
network, it finds the connection-service object and invokes the connect method
on it, supplying the name of the desired network. The connection service
responds by dialing the modem and waiting for the receiver to answer. When the
remote site answers, both sides exchange addresses, launch a TCP/IP PPP
driver, and configure the IP routing tables to join the two networks.
Some remote properties are large enough to warrant full-time connections to
the main office. These properties are equipped with an ISDN connection to a
frame relay network at the phone company. The main office is also connected to
this network. The connection service needn't worry about these sites since
they are online all the time.
The main office and each property are connected at least once a day to
exchange and update information. Users at the main office dial into a property
to connect with the local database. Since each Oracle server can be connected
through the TCP/IP network, users on one network can easily use databases from
other networks to perform queries and maintain the system.


Distributed-Object Services


LISA's client/server architecture uses an Oracle database server in addition
to the other servers used for workflow, synchronization, and moving
information throughout the system. These servers were written in C++ using the
MetaWare High C/C++ compiler with Direct-to-SOM (DTS) support. This allowed us
to write a set of SOM-based classes in native C++, and to make objects created
from these classes accessible to remote clients over the network.

Figure 3 shows the basic idea of distributed objects. The (sharable) objects
run in a process called a "DSOM server." To use the object, a client finds the
object through the Object Request Broker (ORB), which returns a "proxy"
object. As far as the client application is concerned, the proxy appears and
works like any other local object. However, when the client invokes a method
on the proxy, the ORB intercepts the method request and forwards the request
(along with the calling arguments) to the server that contains the real
object. The server then locates the object, invokes the operation, and sends
back the return value(s).
IBM's System Object Model (SOM) provides an object manager that allows
object-oriented functionality--defining classes, creating objects, and
invoking methods--outside the bounds of a specific programming language. In
other words, it provides the features of an object-oriented language (such as
Smalltalk or C++) without the language. SOM hides the implementation details
of a class from a program that uses it. For example, a SOM class library can
be coded in C++, then used by a Smalltalk program.
DSOM, a natural extension of SOM, is also an implementation of OMG's Common
Object Request Broker Architecture (CORBA), which defines the standard for
distributed-object technology and services. DSOM and CORBA extend the idea of
separating an object's client and implementation with "location transparency."
This means that a client can use an object without knowing where it's
physically located. Thus, objects can be located around a network and used by
client applications just as though they were located in the same process as
the client. This allows implementation of custom servers as distributed
objects. Figure 4 illustrates the object services used by LISA. Each
distributed service is a general-purpose software component that provides
service to all LISA clients on the network. 
The distributed-object facility implements communication between clients and
servers, as well as between remote properties. Compared with the classical
communications model, which sends raw messages to a distant socket, this
approach is easier to use and more robust, and it allows the message data to
be structured in the form of calling arguments. It also hides the actual
communications transport and the details of low-level socket and IPC
programming.


Property Sets and Sonics


All object services use a common set of data-management structures. The first,
a PropertySet, is a block of memory that contains a collection of name-value
pairs. For instance, Example 1 shows a PropertySet that contains an address. The
PropertySet class contains methods to add, remove, query, and change these
name-value pairs.
PropertySets also can contain any CORBA datatype, including blobs, data
streams, and other PropertySets. Furthermore, there are no pointers in the
PropertySet, which allows PropertySets to be sent from process to process (via
DSOM method calls and event channels). Thus a PropertySet is a mobile, mutable
data object with many uses. One use is to work around a CORBA limitation:
CORBA does not support passing objects by value between client and server. If
an object is passed as a parameter to a remote method call, only its reference
is marshaled, and it arrives at the server as a proxy object.
With a PropertySet, you can flatten one or more objects into a stream in
memory using the Object Externalization Service, and place that block of
memory containing the object(s) in the PropertySet with a given name. The
memory block can then be used as a calling argument or return value to a DSOM
method call (since a PropertySet possesses the same structure as a CORBA
sequence). The receiving process can get the object(s) by extracting the block
of memory from the PropertySet and then using the Internalization Service to
convert the stream back into objects.


Persistence Services


Persistence is an important aspect of managing the LISA data model. More
specifically, we must be able to store and retrieve business objects to and
from an Oracle 7 database and memory-based streams. To solve the problem of
object persistence, LISA business classes were defined and implemented using
our Persistent Object Manager (POM). This tool, an enhanced implementation of
the Persistence Service defined by the Object Management Group (OMG), includes
both a precompiler for reading object definitions and a run-time persistence
engine that handles the mechanics of object storage. The basic idea is that
the precompiler reads a set of class definitions and produces a schema, which
is used to automatically store and retrieve the business objects to and from
the database. As far as the application is concerned, Oracle functions as an
object database. Figure 5 illustrates the architecture of the persistence
service.
Like the LISA application itself, the persistence service is built in layers.
The top layer, POM, provides a virtual object-database interface to the
application. All I/O requests, queries, and transaction operations are
submitted to POM. The next layer, the Persistent Data Service (PDS), contains
the bulk of the persistence engine. It is responsible for creating, reading,
and writing the business objects, and interfacing with the I/O drivers. The
bottom layer is composed of media drivers that accept the I/O requests from
the PDS and translate them into data-manipulation operations for a specific
data store (such as Oracle or I/O streams).
The fundamental idea of the persistence service is to provide a common
interface to all types of data stores for object-oriented applications.
Technologies such as file storage, streams, relational databases, and even
object databases are presented in a common way. This has been the result of
the work done by OMG and the Object Database Management Group (ODMG) consortia
to standardize database and transaction management.
In terms of communications, POM is a key component used to externalize the
business objects exchanged between the main office and the remote sites. The
Externalization Service is a simple class that uses the stream driver of POM
to write and read complex business objects into PropertySets. These
PropertySets, as described earlier, are sent to the receiver via method calls
on distributed objects.


Internet Connection Service


The Internet Connection Service (ICS) is a DSOM object service that plays two
important roles in the LISA system. First and foremost, it is responsible for
connecting and disconnecting remote TCP/IP networks from the calling network
on demand. It also must monitor these connections and restart the connection
if the system encounters trouble. The second purpose of this service is to
orchestrate the transfer of event and workflow data between two networks. 
To develop such a service, you need to first design the interface (operations)
of the service and then define it using the Interface Definition Language
(IDL). Listing One shows the IDL definition of the ICS. This file is compiled
by the SOM IDL compiler, which generates the SOM class tables, headers, and
C++ method templates. From there, the service is completed by implementing the
methods and building native C++ class equivalents of the structures defined in
the IDL file.
The service itself is compiled, linked into a DLL, and executed by the DSOM run
time. We generally start this service when the machine boots, then leave it
running all the time. When it initializes, it reads a configuration file that
contains the catalog of remote network names, phone numbers, protocols, and IP
addresses. Once the service is initialized, these networks can be connected and disconnected
on demand via the object-service methods.
When the client application starts up, it gets a list of the available distant
networks by invoking the list_networks method on the ICS object. This method
returns a collection of NetworkDef objects that the application uses to
initialize the Connect menu. Users choose an item from this menu when they
want to connect to the Oracle database at that property. When this occurs, the
ICS is used to establish the connection (if not done already) and return a
handle that we can use to close the connection when we're finished with it.
Listing Two is a code fragment used to interact with the ICS to accomplish
connection and disconnection. 
As Listing Two shows, using an object service first involves acquiring a proxy
to the server object by invoking the somdFindServerByName method on the DSOM
object manager. When remote methods are invoked that return an object either
as a return value or out parameter, proxy objects are created by DSOM. Once a
proxy is returned, it is used like any other object. The application doesn't
know if the object is local or remote.


Conclusion


Thanks to object technology, LISA can and will grow for a long time. The key
to long-term maintainability lies in the ability to modularize. We do this by
using objects that are software modules that hide their data and
implementation from each other. This eliminates many of the dependencies
inherent in traditional programs and allows updates to these modules without
upsetting the whole system.
These concepts are applied to the client/server components of the system as
well. Each "server" is a set of one or more distributed objects that provide
general-purpose services without disclosing the details of how the services
were implemented or where they exist. This allows them to be moved, extended,
or even completely rewritten in another language without breaking binary
compatibility.
Figure 1: System component layers.
Figure 2: LISA system structure.
Figure 3: Distributed objects.
Figure 4: Distributed object services.
Figure 5: Object persistence services.
Example 1: Typical property set.
Property Name Type Value

"Street" tc_STRING "220 Center St."
"City" tc_STRING "Mentor"
"State" tc_STRING "OH"
"ZipCode" tc_LONG 44060

Listing One
// file: connctsv.idl
#include <somobj.idl>
struct TimeStamp;
enum NPConnectStatus 
{ 
 NPConnected, 
 NPDisconnected, 

 NPTrying 
};
// structure to define a TCP/IP interface
struct NetworkPort
{
 unsigned long handle;
 string port_name;
 string attached_network;
 NPConnectStatus status;
 unsigned short connections;
 long last_error;
 long last_drverror;
};
// defines a typed array of NetworkPort structures
typedef sequence <NetworkPort> NetworkPortList;
// structure definition for specific connection
struct NetConnection
{
 string port_name;
 HConnect hcnct;
 string user_name;
 string user_host;
 TimeStamp connectTime;
};
typedef sequence <NetConnection> NetConnectionList;
typedef sequence<string> TelephoneNumberList;
struct NetworkDef
{
 string netname;
 unsigned long address;
 string description;
 TelephoneNumberList phoneNumbers;
 string setup;
};
typedef sequence <NetworkDef> NetworkList;
// exceptions
exception ctsvException
{
 long errorCode;
 long driverCode;
};
// Internet Connection Service interface (class) definition
interface ConnectService : SOMDServer
{
 HConnect connect(in string network) raises(ctsvException);
 void disconnect(in HConnect hcnct) raises(ctsvException);
 boolean is_connected(in string network) raises(ctsvException); 
 void exchange_events(in HConnect hcnct) raises(ctsvException);
 NetworkPort get_port(in HConnect hcnct) raises(ctsvException);
 unsigned short get_port_count();
 void list_ports(out NetworkPortList ports);
 void list_connections(out NetConnectionList cncts);
 void list_networks(out NetworkList nets);
 #ifdef __SOMIDL__
 implementation
 {
 releaseorder : connect, disconnect, is_connected, 
 exchange_events, get_port, get_port_count, 
 list_ports, list_connections, list_networks;

 callstyle=idl;
 dllname = "ctserv.dll";
 memory_management = corba;
 majorversion = 1;
 minorversion = 0;
 somDefaultInit: override, init;
 somDestruct: override;
 };
 #endif
};

Listing Two
void 
LISAAppFrame :: ConnectNetwork(NetworkDef *net, LISAEnv *appl)
{
 Environment *ev = __SOMenv;
 // obtain a proxy object to the ICS by asking DSOM for the service by its
 // alias name. (defined when the server was installed in the system)
 
 ConnectService *ctsv = (ConnectService *) 
 SOMD_ObjectMgr->somdFindServerByName(ev, "connectService");
 if (!ev->OK())
 throw CORBA::SystemException(ev);
 // now that we have the server object, initiate the connection:
 HConnect handle = ctsv->connect(ev, net->netname);
 if (ev->OK())
 {
 // network connection established OK, now attempt Oracle logon 
 // connection. First, build the Oracle SQL*Net connection
 // string: "userid/password@t:netname"
 char signOn[80];
 sprintf(signOn, "%s/%s@t:%s", appl->user, 
 appl->password, net->netname);
 try
 {
 // connect to the database via the persistence driver
 appl->oracleDb->Connect(signOn);
 // Done! Save the connection handle and return
 appl->hConnect = handle;
 }
 catch(...)
 {
 // No go. Either the database is down or the user does not have
 // permission to access the database. Cancel the connection to the
 // remote network and return the error to the caller.
 ctsv->disconnect(ev, handle);
 throw;
 }
 }
 else
 throw CORBA::SystemException(ev);
}
void 
LISAAppFrame :: DisconnectNetwork(LISAEnv *appl)
{
 Environment *ev = __SOMenv;
 // disconnect from the database via the persistence driver
 appl->oracleDb->Disconnect();
 // obtain a proxy object to the ICS from DSOM

 ConnectService *ctsv = (ConnectService *) 
 SOMD_ObjectMgr->somdFindServerByName(ev, "connectService");
 if (!ev->OK())
 throw CORBA::SystemException(ev);
 // now that we have the server object, destroy the connection:
 ctsv->disconnect(ev, appl->hConnect);
 appl->hConnect = NULLHANDLE;
}


Examining the Cocktail Toolbox


Tools for producing compilers, translators, and more




Rodney M. Bates


Rod, an engineer with Boeing aircraft, can be contacted at
bates@salsv6.boeing.com.


Half a century ago, General Motors and others began convincing railroads to
use diesel instead of steam locomotives. Maintaining steam locomotives, they
claimed, required a tremendous amount of labor, shop facilities, and supplies,
not to mention having valuable capital assets off-line during the maintenance
process. Steam fan that I am, it pains me to admit they were right--diesels
were an improvement, at least when it came to running a transportation
business. 
In particular, one marketing tactic diesel manufacturers used was to put a
demonstrator diesel locomotive on the tracks. In one case (or so the story
goes), the demonstrator wore out its wheels in just three months. Skeptics
immediately jumped on this unprecedented wear rate as a reason to reject the
newfangled machine. Closer investigation revealed that the locomotive ran
87,000 miles in those three months, which was about the normal life of a set
of wheels. However, steam locomotives could never approach that mileage in
such a relatively short time. In short, while steam engines were sitting in
the yards being oiled, greased, and ash-cleaned, diesels were on the tracks
wearing off their wheels. 
Using the Cocktail tool package for translator development makes me feel like
that demo diesel: The work leaves me worn out. But when I look back at
what I have produced in a short time, I realize that a large amount of
relatively routine work has been done for me, allowing me to concentrate on
the real algorithmic meat of the task. The result is high mileage, which more
than compensates for the rapid wear.


What is Cocktail?


Cocktail is a collection of tools for producing compilers, translators, and
similar tools. It was originally developed at the German National Research
Center's subsidiary at Karlsruhe. Cocktail contains implementations for six
special-purpose languages that generate most of the parts of a translator. 
Cocktail is available for both MS-DOS and UNIX. To run the binaries under DOS,
you'll need an 80386/486, at least 2 MB of memory, and the go32 DOS extender
from DJ Delorie. To compile the sources, you'll need the gcc GNU C compiler
(Delorie's GCC port to DOS), the GNU make (gmake), and the ar and ranlib
archive-handling tools. For more information, see the text box on page 82. A
commercial implementation is available from Josef Grosch
(grosch@cocolab.sub.com), whose company also provides support for the free
version, along with translator-development services.
Two of the languages included with Cocktail, rex and lalr, are approximate
functional equivalents of the ubiquitous lex and yacc. They generate lexical
scanners and LALR parsers, respectively. However, they generate much faster
code than lex and yacc and have valuable additional features. Also, they
integrate smoothly with the other languages.
Another language, ell, is actually nearly identical to lalr. However, its
generator produces top-down parsers instead of bottom-up parsers. This
somewhat limits the grammars it can parse, but it also yields faster parsers
and makes some kinds of semantic processing easier.
The ast language supports specifying abstract syntax trees (ASTs). Many
translator programs build ASTs internally, and there is a very large amount of
relatively boring code to be written to support them. Ast generates it for
you. 
The ag language generates evaluator programs for attribute grammars (AGs). AGs
are a very powerful formal system for specifying static semantic analysis. An
AG defines, in a declarative style, how to compute information about the nodes
of a tree. The ag language is a superset of ast, which it uses to define the
AST on which it works. The same generator program, cg, implements both ast and
ag.
Finally, the puma language handles term-rewrite systems, another kind of
powerful formal system. They define how a tree can be transformed into another
tree, using patterns and replacements. 
The generators produce source code ("target code") in either Modula-2 or C,
which is then compiled as usual. Handwritten target code can be embedded in
many places in the input to the generators. This is a powerful escape
mechanism for handling things that can't be expressed in the relevant formal
system. 


Ast and Ag


Ast is a language for defining a tree grammar for an AST; lalr is for defining
a string grammar for a concrete syntax. Although concrete and abstract
syntaxes are considerably different in style, the developers of Cocktail
recognized that the same language could be used to define either style. 
The cg generator has a function that takes a concrete grammar and its related
semantic actions, written in the ast and ag languages, and converts it into
lalr. It can also emit rex code for most of a scanner specification. I find
this the easiest way to use Cocktail, as most of the specification is written
in a single notation.
I will use a primitive programming language called "Lox" for illustrative
purposes. Lox has a PRINT statement, an IF with only a THEN clause, a compound
statement for combining other statements, and simple expressions involving
integer literals, addition, multiplication, and parentheses.
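Cocktail itself emits Modula-2 or C, but to make Lox concrete before diving into the listings, here is a rough sketch in Python (entirely my own illustration, not generator output) of the kind of top-down parser ell would produce for Lox expressions. It assumes the input has already been split into a token list of integers and single-character operator strings.

```python
# Hand-rolled sketch (not Cocktail output) of a top-down parser/evaluator
# for Lox expressions: literals, '+', '*', and parentheses.

def parse_expr(toks, i=0):
    """expr = term { '+' term } ; returns (value, next_index)."""
    value, i = parse_term(toks, i)
    while i < len(toks) and toks[i] == '+':
        rhs, i = parse_term(toks, i + 1)
        value += rhs
    return value, i

def parse_term(toks, i):
    """term = factor { '*' factor }"""
    value, i = parse_factor(toks, i)
    while i < len(toks) and toks[i] == '*':
        rhs, i = parse_factor(toks, i + 1)
        value *= rhs
    return value, i

def parse_factor(toks, i):
    """factor = literal | '(' expr ')'"""
    if toks[i] == '(':
        value, i = parse_expr(toks, i + 1)
        return value, i + 1        # skip the ')'
    return toks[i], i + 1          # an integer literal

# 2 + 3 * (4 + 1)
value, _ = parse_expr([2, '+', 3, '*', '(', 4, '+', 1, ')'])
```

The point of Cocktail is that you never write this kind of routine code by hand; the generators derive it from the grammar.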
The first 50 lines of Listing One show the concrete syntax in ast notation.
I've used the convention that all grammar symbols in the concrete syntax start
with "Cs". Before I adopted this and other, similar conventions, I found
myself confusing grammar symbols with attributes of the concrete and abstract
syntax.
A grammar symbol followed by an equal sign and more symbols to its right is a
production. For example, in line 27, a PRINT statement, denoted by grammar
symbol CsPrintStmt, consists of the literal characters 'PRINT' followed by an
expression, denoted by CsExpr, which is defined by the production beginning on
line 36. 
Alternative right-hand sides are enclosed between "<" and ">" and separated by
commas. Each alternative can have its own name ahead of an equal sign. For
example, a statement (CsStmt, line 26) can be either a PRINT statement, an IF
statement (CsIfStmt, line 28) or a compound statement (CsCompoundStmt, line
32). The names for the alternatives are optional in ast, and they are not
necessary for defining the syntax. However, they will be needed later to build
an AST.


Rex 


Together, cg and lalr can generate an LALR parser. In addition, from the same
input, cg generates the boring but voluminous part of a scanner specification
in the language rex, namely, all the tokens of fixed spelling such as "BEGIN"
and ";".
The rest of the scanner specification must be handwritten; see Listing Two.
For my simple language, the only token of variable spelling is an integer
literal, specified in line 47 as "Digit +." Digit is defined in line 39 as any
digit, and the "+" is the familiar "one or more occurrences of" operator of
regular expressions. 
When a literal is recognized, lines 48 and 49 call the function LiteralValue
to compute its integer value and store this in attribute CaValue. An attribute
is a field attached to a node of a tree that is not part of the tree structure
itself. I use the convention that attributes of symbols in the concrete syntax
start with "Ca". 
Line 50 returns the recognized grammar symbol (TokLiteral, in this case) from
the scanner. The material between braces on these three lines is all target
code, handwritten in Modula-2, which rex copies almost verbatim into the
generated scanner. 
Function LiteralValue is defined in lines 16 through 34. This also is target
code, as the braces around it show. It converts the characters of the literal
into an integer value.
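Transliterated into Python for illustration (the Modula-2 in Listing Two is the real thing), the same fold looks like this:

```python
# Python transliteration, for illustration only, of the handwritten
# LiteralValue function from Listing Two: fold the scanned digit
# characters into an integer, one decimal place at a time.
def literal_value(spelling):
    value = 0
    for ch in spelling:
        value = value * 10 + (ord(ch) - ord('0'))
    return value
```

The generated scanner would store the result in the CaValue attribute of the TokLiteral token.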


Building an Abstract Syntax Tree


Cg can also be used to define and build an abstract syntax tree (AST). Listing
Three contains an abstract syntax for Lox. Although the notation is the same
as for the concrete syntax, the style is quite different. All the delimiting
tokens, such as IF, "(", and the like, are absent from the abstract syntax. 

The concrete syntax describes the structure of input in string form. There,
the delimiters are needed to determine what construct is next and where its
subcomponents begin and end. The abstract syntax, in contrast, defines a tree
rather than a string. Each node in the tree contains a node operator that
tells exactly what construct it is (making delimiter symbols redundant).
You can think of this in object-oriented terms. A grammar symbol such as
AsExpr on line 27 is like an abstract class. (This is a different meaning of
"abstract.") That is, objects (or tree nodes) of type AsExpr are never
actually created. Instead, only nodes of type AsSumExpr, AsProductExpr, and
AsLiteral are created. When it is used on the right side of a rule, AsExpr can
refer to any of these; see line 19. My convention is that "As" begins the
names of abstract-syntax grammar symbols.
Conceptually, there is a tree even for a concrete syntax, usually called a
"derivation tree.'' This form of tree isn't the most desirable for further
processing. So parsers usually just discover the derivation tree one step at a
time but never actually construct it in memory. 
Given only a concrete syntax, cg and lalr generate a parser that doesn't
produce any record of what it has done. You must add definitions that tell the
parser what to produce. I have done this in a way that builds an AST,
according to the grammar in Listing Three, while parsing according to the
grammar in Listing One. 
The construction works by attaching attributes to the nodes of the derivation
tree and computing their values during parsing. In general, the values of
attributes of tree nodes can depend on other attributes of the same node or
attributes of its parent, children, or siblings. Thus, information can flow
all over the tree in all directions.
In cases where a tree is actually built, cg can infer the tree-traversal order
necessary to evaluate a set of attributes and generate a set of procedures
that traverse in this order while computing the attributes. You need to
specify only the local rules. This is the real power of attribute grammars and
evaluator generators like cg.
However, we are evaluating attributes of the concrete grammar--whose tree is
never built--during parsing, which is done bottom-up. So we are limited to
attribute values that depend only on attributes of the same node and its
immediate children. 
Fortunately, this is usually adequate for building an AST. The Build module,
starting at line 51 of Listing One, does this. Lines 71 through 73 declare
that every symbol in the concrete syntax has an attribute named CaAst of type
tAst, which is a pointer to an AST node. CaAst will hold the root of the AST
that corresponds to this concrete grammar symbol.
The attribute declaration just adds CaAst to the previously declared
properties of the grammar symbols CsProgram, CsStmt, and so on. All of their
descendent symbols, such as CsPrintStmt, CsIfStmt, CsSumExpr, and the like
will inherit this attribute.
The property SYNTHESIZED tells cg to copy the value of this attribute by
default from a child node if no explicit rule is given to compute it. 
Rules for computing the attributes CaAst of the concrete symbols begin at line
77. Take line 112: The function mAsIfStmt is one of a complete set of
functions generated by cg from the AST definition of Listing Three. It
allocates and constructs an AST node whose operator is AsIfStmt and whose
children are supplied as parameters. These are just the AST subtrees for the
expression and statement used to build the IF statement.
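In Python terms, with nodes encoded as (operator, children, attributes) tuples (my own encoding for illustration, not cg's actual representation), the generated constructors amount to little more than:

```python
# Hypothetical node encoding: (operator, children, attributes).
# cg generates one such constructor per operator in Listing Three.
def mAsIfStmt(expr, stmt):
    return ('AsIfStmt', [expr, stmt], {})

def mAsPrintStmt(expr):
    return ('AsPrintStmt', [expr], {})

def mAsLiteral(value):
    # AsLiteral has no children, only the attribute AaInitValue
    return ('AsLiteral', [], {'AaInitValue': value})
```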
The notation CsExpr : CaAst in line 115 defines the attribute CaAst of the
child node whose node type is CsExpr. This is target code, as the braces show,
but it is not quite Modula-2. 
Cg recognizes CsExpr : CaAst as a reference to an attribute and replaces it
with the appropriate expression to access the actual attribute. Cg will have
stored this in some temporary place. Also, cg notes that CaAst of the parent
CsIfStmt node depends on CaAst of the CsExpr child node. It uses this
information to determine traversal orders. 
In a more complicated language, other attributes would collect information to
be stored into AST nodes whose construction is postponed. The linking of AST
nodes also could be more complicated. For example, I have a C parser and AST
builder which inverts the inside-out structure of declarators to match the
right-side-out structure of type specifiers and function formal-parameter
lists. 
Notice the parameter to the mAsLiteral constructor in line 146. Although
AsLiteral has no children, it has attribute AaInitValue, declared in line 34
of Listing Three. mAsLiteral sets the value of AaInitValue. The constructor
call just uses the value of attribute CaValue. 


An Attribute Evaluator


The Evaluate module, beginning at line 37 of Listing Three, defines an
attribute evaluator that works on the AST after it is already constructed. I
have written this to compute the values of Lox expressions, which contain only
constants.
Line 46 declares that every AsExpr node has an additional
attribute--AaValue--that AsSumExpr, AsProductExpr, and AsLiteral will inherit.

The property OUTPUT tells cg that the value of this attribute should be
explicitly stored in the tree, because it is expected to be used later.
Attribute AaInitValue has the property INPUT, which means it should have been
computed prior to attribute evaluation, during AST building. Cg will optimize
the storage of attributes that are neither INPUT nor OUTPUT by keeping them in
local variables of the evaluator, rather than in the tree nodes.
The rest of the rules in Listing Three tell how to compute AaValue for the
various subclass nodes of AsExpr. Line 61 just copies from AaInitValue to
AaValue of the AsLiteral node. This seeming redundancy gives every AsExpr node
a consistent attribute (AaValue), while the AaInitValue is unique to the
AsLiteral node and is computed at a different time.
Cg doesn't know that I intend never to create a node with operator AsExpr, so
it insists on a rule to compute its AaValue attribute. Line 63 provides this.
This attribute evaluator happens to work strictly bottom-up, so I could have
done this work at AST build time. In a more complicated language, I might
need an attribute evaluator that traverses the tree in complex ways.
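For illustration, here is a Python sketch of what the Evaluate module computes: the synthesized attribute AaValue, evaluated bottom-up over the expression AST. The (operator, children, attributes) tuple encoding is my own; cg generates equivalent traversal code in Modula-2 or C.

```python
# Compute AaValue bottom-up over an expression AST.
def eval_aa_value(node):
    op, kids, attrs = node
    if op == 'AsLiteral':
        attrs['AaValue'] = attrs['AaInitValue']   # the line-61 copy rule
    else:                                         # AsSumExpr or AsProductExpr
        for kid in kids:
            eval_aa_value(kid)
        left, right = (kid[2]['AaValue'] for kid in kids)
        attrs['AaValue'] = left + right if op == 'AsSumExpr' else left * right
    return attrs['AaValue']

# The expression (1 + 2) * 3:
lit = lambda n: ('AsLiteral', [], {'AaInitValue': n})
tree = ('AsProductExpr', [('AsSumExpr', [lit(1), lit(2)], {}), lit(3)], {})
```

Because every attribute here depends only on the node's children, a single bottom-up pass suffices, which is exactly why this evaluation could also have been done during parsing.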


Tree Transformations


Puma generates procedures that match patterns in trees. Once a pattern is
matched, a new tree can be constructed and used in many ways: for checking
type compatibility, computing result types of functions, or transforming a
tree. 
Listing Four is a simple tree transformer that folds the IF statement in Lox.
Since all expressions are evaluated, it is possible to decide whether the THEN
clause should be executed. If so, the IF statement can be replaced by just the
THEN clause.
To avoid deallocation, I copy all tree nodes that could possibly have changes
in their subtrees. Expressions can't have any subordinate changes, so I reuse
them in place.
Most of the functions and rules in Listing Four simply make copies of AST
nodes. The pattern at line 42 is interesting: It matches an AsIfStmt whose
first child is an AsExpr whose first component (attribute AaValue) has value
zero. 
The pattern also binds local identifier ThenStmt, the second child of the
AsIfStmt. ThenStmt is used in the pattern replacement in line 46. This is just
a Fold done on the THEN clause. 
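The same folding idea can be sketched in Python (my own (operator, children, attributes) node encoding, not puma's): pattern-match a statement node, rebuild anything whose subtree might change, and collapse an AsIfStmt whose condition's AaValue matches the constant tested in Listing Four (zero).

```python
def fold_stmt(node):
    op, kids, attrs = node
    if op == 'AsIfStmt':
        cond, then_stmt = kids
        if cond[2].get('AaValue') == 0:       # the line-42 pattern
            return fold_stmt(then_stmt)       # IF replaced by its THEN clause
        return ('AsIfStmt', [cond, fold_stmt(then_stmt)], attrs)
    if op == 'AsCompoundStmt':                # rebuild; subtrees may change
        return ('AsCompoundStmt', [fold_stmt(kid) for kid in kids], attrs)
    return node                               # expressions reused in place

if_node = ('AsIfStmt',
           [('AsLiteral', [], {'AaValue': 0}),
            ('AsPrintStmt', [('AsLiteral', [], {'AaValue': 7})], {})],
           {})
```

Puma lets you state the pattern and replacement declaratively; the dispatch-and-rebuild plumbing above is what it writes for you.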


Odds and Ends 


In addition to the generators I've just described, Cocktail includes other
useful components. All host-platform-dependent functions are collected in a
single library. These routines can be modified as necessary to make all the
Cocktail tools run on a particular machine. I have installed (by recompiling)
Cocktail on two flavors of UNIX, without change to the portability library.
There also is a modest library of functions likely to be useful in a
translator program, including the generators themselves, which use the
library. Simple examples are manipulation and storage of variable-length
strings and hashed symbol-table mapping of variable-length strings to a
compact series of integers.
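The symbol-table idea is simple enough to sketch in a few lines of Python (my own minimal illustration; the Cocktail library's interface differs): each distinct spelling gets a small, stable integer index.

```python
# A hashed symbol table mapping variable-length strings to a
# compact series of integers.
class StringTable:
    def __init__(self):
        self.index = {}        # spelling -> integer
        self.spellings = []    # integer -> spelling

    def intern(self, s):
        """Return the small integer assigned to s, allocating one if new."""
        if s not in self.index:
            self.index[s] = len(self.spellings)
            self.spellings.append(s)
        return self.index[s]

tab = StringTable()
```

Interning identifiers this way lets the rest of a translator compare names by comparing integers.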
There are translator programs between lex and rex and between yacc and lalr.
Mtc is a good Modula-2-to-C translator that produces amazingly readable code
for a source-to-source translator. I have used mtc to bootstrap fairly large
bodies of Modula-2 code to a platform for which I didn't have a handy Modula-2
compiler. 
There also is a fair-sized collection of example scanners and parsers for
various programming languages. The commercial version has an expanded set,
including difficult languages such as Fortran. 
For the examples in this article, 300 lines of handwritten input code to the
various Cocktail tools generated 3700 lines of Modula-2 code. This ratio is
consistent with the ratio I've achieved on realistic-sized projects. Some of
the generated code might go unused in a particular application, and some is
bulkier than handwritten code would be, but there is a high gain to be had.
Using Cocktail means writing very high-density code. There is a lot of meat in
a small space. It is tiring, but it gives so much leverage, you can afford to
take more breaks and still finish a project much faster. It's like a diesel
running 87,000 miles in three months. But for some reason, I don't have the
same "steam fan's" nostalgia for the old system of handwriting translator
code. 


For More Information


Delorie's GCC port to DOS is available at grape.ecs.clarkson.edu in the
/pub/djgcc directory. Free versions of Cocktail (with source and
documentation) are available at
ftp://ftp.ira.uka.de:/pub/programming/cocktail/ and
ftp://ftp.unistuttgart.de/pub/unix/programming/compilerbau/ (including a DOS
version using DJGPP). 

Listing One
 1 
 2 /* Concrete syntax */ 
 3 
 4 PARSER Parser 

 5 
 6 RULE
 7 
 8 START = CsProgram . 
 9 
10 /* Terminal symbol: */ 
11 
12 TokLiteral : 
13 [ CaValue : CARDINAL ] 
14 { CaValue := 0 ; } . 
15 
16 /* Main grammar: */ 
17 
18 CsProgram = CsStmt .
19 
20 CsStmtPlus
21 = < CsStmtPlusLast = CsStmt .
22 CsStmtPlusMore 
23 = CsStmt ';' CsStmtPlus . 
24 > .
25 
26 CsStmt
27 = < CsPrintStmt 
28 = 'PRINT' CsExpr .
29 CsIfStmt
30 = 'IF' CsExpr
31 'THEN' CsStmt .
32 CsCompoundStmt 
33 = 'BEGIN' CsStmtPlus 'END' . 
34 > . 
35 
36 CsExpr
37 = < CsSimpleExpr = CsTerm .
38 CsSumExpr = CsExpr '+' CsTerm . 
39 > . 
40 
41 CsTerm
42 = < CsSimpleTerm = CsFactor .
43 CsProductTerm = CsTerm '*' CsFactor .
44 > . 
45 
46 CsFactor
47 = < CsLiteral = TokLiteral .
48 CsParenthesizedExpr = '(' CsExpr ')' . 
49 > .
50 
51 MODULE Build
52 
53 /* Actions to build AST. */ 
54 
55 PARSER
56 
57 GLOBAL
58 { FROM Ast IMPORT 
59 AstRoot , tAst 
60 , mAsProgram 
61 , mAsStmtStarNone , mAsStmtStarAnother 
62 , mAsPrintStmt , mAsIfStmt
63 , mAsCompoundStmt , mAsSumExpr

64 , mAsProductExpr , mAsLiteral ;
65 }
66 
67 PROPERTY /* Disable implicit INPUT */
68 
69 DECLARE 
70 
71 CsProgram , CsStmt , CsStmtPlus 
72 , CsExpr , CsTerm , CsFactor
73 = [ CaAst : tAst SYNTHESIZED ] .
74 
75 /* Storing Final tree. */ 
76 
77 CsProgram
78 = { CaAst
79 := { CaAst 
80 := mAsProgram
81 ( CsStmt : CaAst ) ; 
82 AstRoot := CaAst ;
83 } ; 
84 } . 
85 
86 /* Linking statments into lists. */
87 
88 CsStmtPlusLast
89 = { CaAst
90 := mAsStmtStarAnother
91 ( mAsStmtStarNone ( ) 
92 , CsStmt : CaAst
93 ) ;
94 } . 
95 
96 CsStmtPlusMore
97 = { CaAst
98 := mAsStmtStarAnother
99 ( CsStmtPlus : CaAst
100 , CsStmt : CaAst
101 ) ;
102 } . 
103 
104 /* Building statement nodes. */ 
105 
106 CsPrintStmt
107 = { CaAst
108 := mAsPrintStmt
109 ( CsExpr : CaAst ) ;
110 } . 
111 
112 CsIfStmt
113 = { CaAst
114 := mAsIfStmt
115 ( CsExpr : CaAst 
116 , CsStmt : CaAst
117 ) ;
118 } . 
119 
120 CsCompoundStmt
121 = { CaAst
122 := mAsCompoundStmt

123 ( CsStmtPlus : CaAst ) ;
124 } . 
125 
126 /* Building expression nodes. */
127 
128 CsSumExpr
129 = { CaAst
130 := mAsSumExpr
131 ( CsExpr : CaAst
132 , CsTerm : CaAst
133 ) ;
134 } . 
135 
136 CsProductTerm
137 = { CaAst
138 := mAsProductExpr
139 ( CsTerm : CaAst
140 , CsFactor : CaAst
141 ) ;
142 } . 
143 
144 CsLiteral
145 = { CaAst
146 := mAsLiteral
147 ( TokLiteral : CaValue ) ;
148 } . 
149 
150 END Build 
151 

Listing Two
 1 
 2 /* Handwritten part of scanner spec. */
 3 
 4 SCANNER Scanner 
 5 
 6 EXPORT 
 7 { FROM Positions IMPORT tPosition ; 
 8 INSERT tScanAttribute 
 9 } 
10 
11 GLOBAL
12 { INSERT ErrorAttribute } 
13 
14 LOCAL
15 
16 { PROCEDURE LiteralValue ( ) : CARDINAL 
17 
18 ; VAR I , Value : CARDINAL ; 
19 ; VAR String : Strings . tString ; 
20 ; BEGIN
21 GetWord ( String ) 
22 ; Value := 0 
23 ; FOR I := 1 
24 TO Strings . Length ( String ) 
25 DO
26 Value 
27 := Value * 10 
28 + ORD ( Strings . Char 

29 ( String , I )
30 )
31 - ORD ( '0' )
32 END 
33 ; RETURN Value 
34 END LiteralValue ; 
35 }
36 
37 DEFINE
38 
39 Digit = {0-9} .
40 
41 START Literal 
42 
43 RULE
44 
45 INSERT RULES #STD# 
46 
47 #STD# Digit + 
48 : { Attribute . TokLiteral . CaValue
49 := LiteralValue ( ) ;
50 RETURN TokLiteral ;
51 } 
52 

Listing Three
 1 
 2 /* Abstract syntax */ 
 3 
 4 TREE Ast VIEW Ast 
 5 
 6 RULE
 7 
 8 AsProgram = AsStmt .
 9 
10 AsStmtStar
11 = < AsStmtStarNone = . 
12 AsStmtStarAnother
13 = Next : AsStmtStar 
14 Stmt : AsStmt . 
15 > . 
16 
17 AsStmt 
18 = < AsPrintStmt 
19 = AsExpr .
20 AsIfStmt
21 = AsExpr AsStmt .
22 AsCompoundStmt 
23 = AsStmtStar . 
24 > . 
25 
26 AsExpr
27 = < AsSumExpr 
28 = Left : AsExpr 
29 Right : AsExpr . 
30 AsProductExpr 
31 = Left : AsExpr 
32 Right : AsExpr .
33 AsLiteral 

34 = [ AaInitValue : INTEGER ] . 
35 > . 
36 
37 MODULE Evaluate 
38 
39 EVAL Eval
40 
41 PROPERTY 
42 
43 DECLARE 
44 
45 AsExpr 
46 = [ AaValue : INTEGER OUTPUT ] . 
47 
48 AsSumExpr
49 = { AaValue
50 := Left : AaValue
51 + Right : AaValue ; 
52 } . 
53 
54 AsProductExpr
55 = { AaValue
56 := Left : AaValue
57 * Right : AaValue ; 
58 } . 
59 
60 AsLiteral
 61 = { AaValue := AaInitValue ; } .
62 
63 AsExpr
64 = { AaValue := 0 ; } . 
65 
66 END Eval 
67 

Listing Four
 1 
 2 /* Transformation specification. */
 3 
 4 TRAFO Trans 
 5 
 6 TREE Ast 
 7 
 8 PUBLIC FoldProgram 
 9 
10 EXTERN Ast , ReleaseAst 
11 
12 GLOBAL
13 { FROM Ast IMPORT ReleaseAst ; } 
14 
15 FUNCTION FoldProgram 
16 ( AsProgram ) AsProgram 
17 
18 AsProgram ( Stmt )
19 RETURN 
20 AsProgram ( FoldStmt ( Stmt ) ) ; . 
21 
22 FUNCTION FoldStmtStar 
23 ( AsStmtStar ) AsStmtStar

24 
25 AsStmtStarAnother ( Next , Stmt ) 
26 RETURN 
27 AsStmtStarAnother
28 ( FoldStmtStar ( Next ) 
29 , FoldStmt ( Stmt ) 
30 ) ; . 
31 
32 Tree RETURN Tree ; . 
33 
34 FUNCTION FoldStmt 
35 ( AsStmt ) AsStmt 
36 
37 AsCompoundStmt ( Stmts ) 
38 RETURN 
39 AsCompoundStmt 
40 ( FoldStmtStar ( Stmts ) ) ; . 
41 
42 AsIfStmt 
43 ( AsExpr ( { 0 } , .. ) 
44 , ThenStmt 
45 )
46 RETURN FoldStmt ( ThenStmt ) ; . 
47 
48 AsIfStmt ( Expr , Stmt ) 
49 RETURN 
50 AsIfStmt 
51 ( Expr 
52 , FoldStmt ( Stmt ) 
53 ) ; . 
54 
55 Tree RETURN Tree ; . 
56 






























Using JavaScript to Create Interactive Web Pages


A cross-platform object scripting language




Tom Tessier


Tom is a student in the engineering physics department at the University of
Alberta, Canada. He can be reached at tessier@ee.ualberta.ca.


JavaScript is a cross-platform object scripting language that lets you glue
together HTML documents, Java applets, and Netscape plug-ins on both clients
and servers. One way of differentiating between Java and JavaScript is that
Java is typically used by programmers to create new objects and applets, while
JavaScript is used by HTML page authors to dynamically script the behavior of
those objects.
Although it was developed jointly by Netscape and Sun, JavaScript (which is
based on LiveScript, Netscape's HTML scripting language) has already been
licensed by a number of software companies, including Spyglass, Oracle,
Metrowerks, Sega, Borland, Adobe, and Sybase.
Implemented only within Netscape Navigator 2.0, Beta 2 and up, JavaScript
still is in its very early stages and is more a complement to ordinary Java
applets than a stand-alone replacement. However, even in its current,
primitive form, this scripting language transforms ordinary HTML into a
powerful, client-based interpreter. For example, JavaScript, embedded in a Web
page, can recognize and respond to user events such as mouse clicks, proofread
form inputs before sending data off to servers, and more.
JavaScript resembles Java in that support for most of Java's expression syntax
and flow-control features is available, as well as numeric, Boolean, and
string types. But unlike Java, which is a compiled language, JavaScript is
executed on the fly by the Netscape interpreter. JavaScript is relatively
secure in that no writes to a user's hard drive can occur. Also, JavaScript
programs can be run from any page, without requiring root or similar
file-access privileges. This is perhaps the main advantage of
JavaScript--average users can store complex CGI-like scripts on their current
page and still only pay the usual monthly Web rental fee. 


Using JavaScript


Take a look at Example 1 and the resulting page in Figure 1. The <SCRIPT
LANGUAGE="LiveScript"> tag is used to initiate the JavaScript session.
Alternatively, a URL filename may be specified by using SRC in the tag <SCRIPT
SRC="script.ls''>. The script is evaluated once after the page loads.
Functions, however, are stored in memory, allowing for repeated execution upon
user events. Notice the HTML comment tag (<!--) within the JavaScript. This
prevents older browsers from dumping the script contents onto the page. Also,
the <SCRIPT> tag was placed directly after the <HEAD> tag. Since everything
between <HEAD> and </HEAD> is loaded first, placing the script tag right after it ensures
that the JavaScript code is available before the user has a chance to trigger
any event handlers. Keep in mind that the entire page is loaded before any
script tags are evaluated. 
The document object is used in Example 1. Although you cannot create your own
objects, the built-in objects are still quite useful. Figure 2 presents some
of the available objects and their properties, methods, and event handlers,
while Example 2(a) illustrates their uses. It is important to get used to
treating a form's input text areas as outputs: The first and second form
inputs are used to interact with users, while the last exists solely to
provide an output-text area. As you can see in Example 2(a), event handlers are
actually embedded within normal HTML code, inside the declaration of tags such
as <FORM>, <INPUT>, or <A HREF>. These event handlers only activate when the
desired action occurs--a mouse click, mouse over text area, and the like.
<SCRIPT> tags are not required for event handlers.
The onChange handler passes the properties of the calculation text area to a
custom restore function, which acts as the output text area watcher. If the
user changes the calculation area, the function restores the text to its
default value and displays a warning. In the onClick handler, calc(form) is
called with the argument this.form. This identifier passes all of the form's
input properties to the function, allowing for smaller argument lists.
Although not recommended, each required identifier could have
been passed instead, as in Example 2(b).
The eval command in Example 2(a) is a built-in function that evaluates the
mathematical expression stored in brackets. There are four different types of
built-in functions, three of which are listed in Figure 3. 
If you need to change the value of a form object not included in a
function-input list, the properties must be accessed directly. In the case of
forms, the properties are referred to as forms[0], forms[1], and so on. For
example, referring to a text field named "response" in a document's third form
requires using the object document.forms[2].response. To change the value of
the text in that form, you would do something like
document.forms[2].response.value = "New text." This type of direct access also
is illustrated in Example 2(c). Notice that the restore function has been
replaced with actual JavaScript code, semicolons separating each command. 
In its current implementation, JavaScript doesn't offer built-in array
creation. You must define your own "make-array" function to set up the
required properties and methods as in Example 3.


Creating an Interactive Multiple-Choice Program


Listing One utilizes all of the concepts I've discussed so far. When executed,
the program presents the client browser with an interactive form, asking users
to select an answer out of three options; see Figure 4. In essence, it is a
multiple-choice test. When users select a wrong answer, alert is used to
display the correct result. A counter keeps track of the total number of
questions correctly answered.
Note in Listing One that trying to call a function involving document.write
after a script has been loaded is illegal; see Example 4. This is the reason
why form-text-area inputs are used as outputs throughout Listing One. Also,
simply reloading the JavaScript multiple-choice page into Netscape via Ctrl-R
isn't enough to reset the form, since all default values are changed by the
code. You either have to add a Reset button or a link (a link is used in
Listing One; if index.html is the name of the JavaScript testing program, then
the page will be reset).
While Listing One contains everything within a single HTML file, there are two
other ways to output new questions to the client browser. Both techniques
allow for the creation of dynamic Web pages, whereby new images can be placed
alongside new text (and not have all the text appear in form inputs). But both
methods require multiple HTML files to work. 
The first technique, easily implemented using the concepts I've presented
here, demands a new URL be loaded every time a radio button is pressed. This
involves setting up separate URLs for each question--a running total of the
number of correct answers cannot be maintained without using some CGI. 
The second method needs only three HTML files, and allows you to easily keep
track of the number of correct answers. This second technique makes use of a
relatively new concept called "frames" (again available only in Netscape 2.0
and up). Simply by taking advantage of Netscape's ability to share data
between frames and using a "hidden" frame as a temporary storage area, you can
compute new data for the main "display" frame. For example, frame 0 (hidden
from the user's view) would contain a form named "code," which is just an
empty scratch text area. The JavaScript code in frame 1 must access the code
from frame 0 (using parent.frames[0].document.forms[0].code.value) and use
document.write to display it (first reload the display frame with
parent.frames[0].location = "sameurl"). The technique is complicated, and since
Netscape's current implementation of frames and data sharing is very unstable,
I won't go into it any further. After all, it's difficult to debug a page when
GPFs randomly occur. In fact, in the current version I am using (2.0, Beta 4),
simply resizing a Netscape window containing frames can crash the system.


Conclusion


JavaScript is a powerful, interpreted Web language. Used in conjunction with
normal HTML, Java applets, and CGI, it can vastly improve interactive Web
content. Load can be transferred off of Web servers, easing strain and
improving performance. With some imagination and ingenuity, anyone can create
exciting content. Good luck and happy surfing!
Figure 1: The Web page resulting from the code in Example 1.
Figure 2: A list of the common document specific objects available in
JavaScript (method = function): (a) window object; (b) document object; (c)
form object; (d) text-element object; (e) radio-button object.
(a)
Properties:
 parent -- parent frame.
 frames[index] -- array of frame objects, one per frame.
 frames.length -- number of frame objects in the window.
 status -- enables you to set the message in the status bar at the bottom of
the client window.
Example: 
<A HREF="" onClick="this.href=getURL()" onMouseOver="window.status='Stay on
target'; return true">Go!</A>
Methods:
 alert("string") -- pop up a window displaying "string".
 confirm("string") -- pop up a window displaying "string".
 Returns true if OK is clicked, false if Cancel is clicked.
 prompt("string",default) -- pop up a window displaying "string" and prompt
the user for input, where default is the default-input value.
Example:
<INPUT TYPE="text" NAME="inputarea" SIZE=10
onMouseOver="document.forms[0].inputarea.value= prompt('Enter a number:',0)">

(b)
Properties:
 forms[index] -- array of form objects, one per form.
 forms.length -- number of form objects in document.
 links[index] -- array of HREF link objects.
 links.length -- number of link objects in document.
Methods:
 write("HTML") -- write the raw HTML commands to the current window.
Example:
document.write("<HTML><TITLE>Digital Signal Processing</TITLE></HTML>")
 writeln("HTML") -- same as write(), but adds a carriage return to the end.
 clear() -- clears the window.

(c)
Properties:
 action -- string value of the ACTION attribute.
Event Handlers:
 onSubmit() -- executed when the form is submitted.
Example:
<FORM action=post onSubmit="javafunc()">
Methods:
 submit() -- submits the form.

(d)
Properties:
 name -- the value of the NAME attribute (a string).
 value -- the contents of the field (string).
 defaultValue -- the initial contents of the field (string).
Event Handlers:
 onChange -- executes after the user modifies the text contained within the
box.
 onFocus -- executes when input focus enters the field.
 onBlur -- executes when input focus leaves the field.
 onSelect -- executes when something is selected inside the field.

(e)
Properties:
 name -- the value of the NAME attribute (string).
 value -- the value of the VALUE attribute (string).
Event Handlers:
 onClick -- executes when the button is clicked.
Methods:
 click() -- select a radio button.
Note: click() does not actually click on a radio button; it just changes its
value to on. It cannot be used to activate an event handler.
Figure 3: Three built-in objects and functions available in JavaScript (these
built-ins do not have event handlers): (a) math object; (b) string
object; (c) eval function.
(a)
Properties:
 E -- value of the constant E (precision equal to that of a real
number)
 LN10 -- value of LN 10
 LN2 -- value of LN 2
 PI -- value of the constant PI

 SQRT2 -- value of the SQRT of 2
Methods: (standard math functions)
 abs(value)
 acos(value)
 asin(value)
 atan(value)
 cos(value)
 sin(value)
 tan(value)
 exp(value)
 log(value)
 round(value)
 sqrt(value)
Example:
Math.cos(Math.PI/2) gives 0.
Use the "with" statement when calling several math functions. This statement
allows Math properties and methods to be written without the "Math." prefix.
Example:
with (Math) {
 a = cos(PI/2)
 b = sin(PI/2)
}

(b)
Example:
To define a string, simply write the following JavaScript code:
mystring = "What String Equals"
Properties:
 length -- defines the length of a string
Methods:
 substring(i, j) -- takes a substring from i to j inside the string
Example:
result = mystring.substring(0,4) gives "What"
 toLowerCase() -- converts mystring to lowercase
 toUpperCase() -- converts mystring to uppercase
Example:
result = mystring.toUpperCase() gives "WHAT STRING EQUALS"

(c)
eval(expression) -- calculates the result of expression.
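A minimal sketch of eval() in action, written as plain JavaScript of the kind used throughout this article; the expression string is just an illustration:

```javascript
// eval() parses its string argument as JavaScript code and returns
// the value of the expression it contains.
var expression = "3.14 + 5 * 20.333 / 40"
var result = eval(expression)
// result now holds the numeric value of the expression string
```

This is exactly how Example 2 computes its answer: form.calculation.value = eval(form.matharea.value) evaluates whatever the user typed into the matharea field.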
Figure 4: A typical interactive form written in JavaScript.

Example 1: A sample HTML file that makes use of the JavaScript scripting
language.
<HTML>
<HEAD>
<SCRIPT LANGUAGE="LiveScript">
<!-- Use a comment to hide the script contents from unsupported browsers.
var n = 19
var max = 100
document.writeln("Hello world.")
document.writeln("Let's count from ", n+1, " to ",
 max, ":<BR>")
count(n, max)
function count(ninput, maximum) // count from ninput to maximum and
// display in document
 {
 while( ninput < maximum )
 {
 ninput ++
 document.write(ninput," ") // write the current value of n, placing a
// space between each number
 }
 document.writeln("Done writing.")
 }
// End old browser hiding here. -->
</SCRIPT>
</HEAD>
<BODY>
This is the usual HTML text body.
<SCRIPT LANGUAGE="LiveScript">
<!-- hide from old browsers
document.write("<BR>And this is a link coded into ")
document.writeln("the body via JavaScript.")
document.writeln('<A HREF="http://www.ee.ualberta.ca">EE Page</A>')
// -->
</SCRIPT>
</BODY>
</HTML>
Example 2: A sample form in HTML that uses JavaScript for user interactivity.
(a)
<HTML>
<HEAD>
<SCRIPT LANGUAGE="LiveScript">
function calc(form)
 {
 form.calculation.value = eval(form.matharea.value)
 form.calculation.defaultValue = form.calculation.value
// and set the default value in case we have to restore the
// output textarea (ie: if the user puts some text into it)
 }
function restore(input)
 {
 input.value = input.defaultValue
 alert("Please do not touch the result window.")
 }
</SCRIPT>
</HEAD>
<BODY>
<FORM>
Enter a mathematical expression, such as 3.14 + 5 * 20.333 / 40.
<INPUT TYPE="text" NAME="matharea" SIZE=30>
<INPUT TYPE="button" VALUE="Calculate the math" onClick="calc(this.form)">

<br>Calculated result:
<INPUT TYPE="text" NAME="calculation" onChange="restore(calculation)"><BR>
</FORM>
</BODY>
</HTML>

(b)
replace
function calc(form)
with
function calc(finalresult, mathinput)
 {
 finalresult.value = eval(mathinput.value)
 finalresult.defaultValue = finalresult.value
 }
replace
<INPUT TYPE="button" VALUE="Calculate the math" onClick="calc(this.form)">
with
<INPUT TYPE="button" VALUE="Calculate the math"
 onClick="calc(this.form.calculation, this.form.matharea)">

(c)
replace
<INPUT TYPE="text" NAME="calculation" onChange="restore(calculation)">
with
<INPUT TYPE="text" NAME="calculation"
onChange='calculation.value=calculation.defaultValue;
alert("Please do not touch the result window.")'>
Although not necessary in this case,
onChange='document.forms[0].calculation.value=
document.forms[0].calculation.defaultValue;alert("...
may have been used instead.
Example 3: Creating arrays in JavaScript.
<script language="LiveScript">
// This function defines an array such that the first
// property, length, (with index of zero), represents
//the number of elements in the array. The remaining
// properties have an integer index of one or greater,
// and are initialized to zero.
function MakeArray(n) {
 this.length = n;
 for (var i = 1; i <= n; i++) {
 this[i] = 0 }
 return this }
array = new MakeArray(2);
array[1] = "Apple"
array[2] = "Orange"
var i = 0
while (i < 2) {
 i++
 document.writeln(array[i]) }
</script>
Example 4: Illegal code. You cannot write directly to the document after the
page has been loaded.
<SCRIPT LANGUAGE="LiveScript">
function writedoc () {
document.clear()
document.writeln("<HTML><BODY>Hi man.</BODY></HTML>")
 }
</SCRIPT>
<BODY>

<A HREF="la.html" onMouseOver="writedoc()">Place
mouse over me</A>
</BODY>

Listing One
<html>
<head>
<title>Sample JavaScript Testing Application</title>
<script language="LiveScript">
<!-- hide this script tag's contents from old browsers
// editable variables 
 var totalnum = 3 // total # of questions
 var totalans = 3 // total # of answers per question (used to
// generate the question/answer array below)
 var correctans = 1 // the question # (from 0 to N-1) of the
// correct answer to the first question. ***NOTE***: Be sure to 
// initialize this to the correct answer of the first question.
// fixed variables: don't touch these
 var count = 0
 var arrayind = 1 // index into the questans array - must
// start at 1 always, since that's where the first array string is
 var rightans = "none"
 var totalright = 0 // total # of questions answered right
// This function defines an array such that the first property, length, (with
// index of zero), represents the number of elements in array. The remaining 
// properties have an integer index of 1 or greater, and are initialized to 0.
function MakeArray(n) {
 this.length = n;
 for (var i = 1; i <= n; i++) {
 this[i] = 0 }
 return this
 }
// formula for # of array elements:
// total number of questions times total answers allowed per question
// + 2 (plus two because have to have a string for the actual
// question itself and for the value indicating the correct answer).
// eval used to convert (totalnum)*(totalans+2) into a number usable
// to pass to MakeArray... need eval since the expression is inside a
// function call (ie: inside the MakeArray call)
questans = new MakeArray(eval((totalnum)*(totalans+2)));
// array for question list (start with question #2 - define question
// one in the document's html code below)
questans[1] = "Pyconuclear reactions are:"
questans[2] = "reactions which require high density."
questans[3] = "reactions which depend on heat."
questans[4] = "reactions which require low density."
questans[5] = 0 // correct answer here is "reactions which require
// high density" (list answers from 0 to ans#-1)
questans[6] = "A white dwarf maintains its compact shape via:"
questans[7] = "coulombic repulsion."
questans[8] = "fermi neutron pressure."
questans[9] = "fermi electron pressure."
questans[10] = 2 // correct answer: "fermi electron pressure."
// end of editable questions. Don't change the below (just the text
// displayed at the end of the test)
questans[11] = "The test is complete. Thank you."
questans[12] = ""
questans[13] = ""
questans[14] = ""

questans[15] = 255 // correct answer: none
// create new form outputs in response to mouse click on a radio button
function checkout(form, questionnum)
{
 if ( count > totalnum) // if user clicks a radio button after the
// test is complete, display an alert
 {
 alert('To return to the main page, click on "Return to Main Page."')
 }
 else
 {
 if ( questionnum == correctans) // if the currently selected
// question is the correct response, say so
 {
 totalright++
 alert("Correct.")
 }
 else // if wrong answer, display the correct one
 {
 if ( correctans == 0)
 { rightans = form.answer1.value }
 if ( correctans == 1)
 { rightans = form.answer2.value }
 if ( correctans == 2)
 { rightans = form.answer3.value }
 alert("Incorrect. The right answer is:\n"+rightans)
 }
 count++ // increment count so can goto next question
 if ( totalright == 1) // if only one right, make sure use the
// word "answer" instead of the plural form "answers" (To Appease the
// Pro Literacy Net Movement)
 {
 form.completed.value = "You have completed "+count+" of "+
totalnum+" questions, with "+totalright+" correct answer."
 }
 else
 {
 form.completed.value = "You have completed "+count+" of "+
totalnum+" questions, with "+totalright+" correct answers."
 }
 form.completed.defaultValue=form.completed.value // and set the
// default value for the completed text area to the same as the current
// value (in case user enters garbage into the completed text
// area, so can restore it)
// increment arrayind after each use below
 form.question.value = questans[arrayind++]
 form.question.defaultValue=form.question.value // and set the
// default value to the new value in case the web user clicks on the
// actual Question text and changes it... so can restore the new 
// values from the questans array
 form.answer1.value = questans[arrayind++]
 form.answer1.defaultValue=form.answer1.value
 form.answer2.value = questans[arrayind++]
 form.answer2.defaultValue=form.answer2.value
 form.answer3.value = questans[arrayind++]
 form.answer3.defaultValue=form.answer3.value
 correctans = questans[arrayind++] // and set the new correct
// answer
 if ( count == totalnum) // if count = the total # of questions

 {
 count++ // then increment count again so is greater than
// totalnum so can activate the alert above if user tries to click on
// a radio button after the test is complete
 }
 }
}
// restore a form input to its default value if the user messed with it
function restoreval(input)
{
 input.value=input.defaultValue
 alert("Please click on the radio buttons only.")
}
<!-- done hiding from old browsers -->
</script>
</head>
<body>
<h1>JavaScript Multiple Choice Test</h1>
<form method="post">
<TEXTAREA name="question" rows=1 cols=50 wrap=soft onChange=
"restoreval(question)">
Question 1: When our star (Sol) dies it will most likely become:
</TEXTAREA>
<BR>
<BR>
<BR>
<LI><INPUT TYPE="radio" onClick="checkout(this.form, 0)">
<!-- parameter this.form means pass all parameters from this whole 
form to the checkout JavaScript function. The second parameter 
indicates the question # the user clicked on (from 0 to (Number of 
questions)-1) -->
<TEXTAREA name="answer1" rows=1 cols=50 wrap=soft onChange=
"restoreval(answer1)">
A black hole.
</TEXTAREA>
<BR>
<BR>
<LI><INPUT TYPE="radio" onClick="checkout(this.form, 1)">
<!--<input name="answer2" size=40 value="Mary Lou">-->
<TEXTAREA name="answer2" rows=1 cols=50 wrap=soft onChange=
"restoreval(answer2)">
A white dwarf.
</TEXTAREA>
<BR>
<BR>
<LI><INPUT TYPE="radio" onClick="checkout(this.form, 2)">
<!--<input name="answer3" size=40 value="Harry Fesie">-->
<TEXTAREA name="answer3" rows=1 cols=50 wrap=soft onChange=
"restoreval(answer3)">
A neutron star.
</TEXTAREA>
<br>
<br>
<br>
<TEXTAREA name="completed" rows=1 cols=50 wrap=soft onChange=
"restoreval(completed)">
You have completed 0 of 3 questions, with 0 correct answers.
</TEXTAREA>
</form>

<br>
<br>
<LI><A HREF="index.html">Return to main page.</A>
</body>
</html>
DDJ



PROGRAMMING PARADIGMS


Fear and Loathing in the Valley of the Silicon Dolls




Michael Swaine


The homeless woman turned out of the alley in front of me as I walked along
the sidewalk by Mr. Goody's in downtown Santa Cruz. If you ask me how I knew
she was homeless, I have to admit that I didn't. There was an air about her,
though, of someone down on her luck. The stooped posture, the unwillingness to
meet the eyes of passers-by. The big tip-off, of course, was the shopping cart
she was pushing, with its tattered garbage bag tied carefully shut, protecting
I could only guess what meager belongings.
There but for fortune and Moore's Law, I thought, go I; and I wondered, not
for the first time, at this strange passivity that has seized the American
populace. What does it say about us that so many are willing to accept being
homeless, or for that matter, that so many of the rest of us are willing to
accept the homelessness of others? Home ownership is becoming a luxury that
only the rich can afford, the gap between the rich and poor is growing, and
average workers are less well-off than their parents. Yet Americans act as
helpless as Russian peasants mystified at the concept of democracy, clamoring
for "leaders" to fix things, utterly unaware that it really is we who are in
charge.
Although there was a chill in the air, the woman was probably warm enough in a
bright white sweatshirt, across the back of which was emblazoned the
reassuring news that "Borland fights for programmers' rights."
There are two or three good punchlines that this story could support, but I
haven't the heart to deliver them. The best I can do is wistfully observe that
this is just another example of a proud emblem of Borland International
getting recycled and ending up where you'd least expect it. Sort of like Gene
Wang.


Silicon Roadkill


The Gene Wang saga slipped your mind? Never fear, Michael Hyman will remind
you. In PC Roadkill: Twisted Tales from Silicon Valley (IDG Books, 1995),
Hyman dishes up the dirt on that and other scandals, lawsuits, and dirty
tricks that have made life in the personal computer industry just a little
more interesting.
He describes the big look-and-feel lawsuits, and he doesn't miss that apt Guy
Kawasaki observation from 1992. Xerox had sued Apple over the look and feel of
the Macintosh user interface, only to lose because the statute of limitations
had run out. Kawasaki's comment: "Xerox can't even sue you on time." Hyman
also sorts out who sued whom in the spreadsheet world, a particularly
litigious product category. At one time or another, all of the following have
been somehow involved in spreadsheet lawsuits: Lotus, Personal Software,
Borland, Paperback Software, Santa Cruz Operation, WordTech, Mosaic, Novell,
and Borland's insurance carriers.
Of course Hyman reports on the big ones: the DOJ investigation of Microsoft
and the FTC investigation of IBM. The Microsoft case may not be over, and
Hyman reports that the 1956 decision against IBM may not be the final word in
that wrangle, either.
There's a lot more in the book than stories of litigation. Hyman has put
together a grab bag of untold stories and stories that some folks wish would
remain untold. Code names and Easter eggs. Classic office pranks and t-shirt
messages. Product marketing blunders and hiring and firing stories and
industry history. It's a fun book, and he's at work now on the sequel. 
Oh, the Gene Wang story. That's not the one in which Wang Labs, after filing
for bankruptcy protection, sues Microsoft over technology in OLE that it
claims infringes Wang patents, gets 90 million dollars out of Microsoft, and
gets back on its feet, sort of. No, it's the one in which a Borland executive
exchanges e-mail with the president of Symantec, quits to work for Symantec,
has his e-mail read by Borland, is accused of trade-secret theft, and is
threatened with jail time. This one also is still unresolved.
Gene Wang is scarcely the only ex-Borlander to end up in the camp of one enemy
or another. Michael Hyman himself once was the business unit manager for the
languages group at Borland and now works for "a large software company in the
Northwest." That grrmphing sound you hear is Frank Borland turning over in his
grave.


Paradigms Past


Hyman doesn't deal with every legal case in computing history in his coverage
of industry litigation. In fact, he skips what was, in a purely historical
sense, perhaps the most significant and certainly one of the most protracted
and contentious cases: the legal battle over credit for the invention of the
electronic digital computer. (He does, however, give full credit to the winner
of that battle, something that anecdotal computer historians sometimes fail to
do even today.)
This year is a good time to reflect on that battle and its outcome. For one
thing, several of the key figures died last year, so the events and issues
truly are history now. For the record, the computing pioneers who died last
year, all of whom fully deserve to be remembered as inventors of the
technology of electronic digital computers, are: J. Presper Eckert (June 3,
1995), John V. Atanasoff (June 15, 1995), and Konrad Zuse (December 18, 1995).
There's another reason why 1996 is particularly appropriate for a review of
this bit of history. On February 14, the University of Pittsburgh began an
18-month celebration of the 50th anniversary of ENIAC. It will be interesting
to see how delicately the participants tread as they walk around the topic of
the invention of the automatic digital computer, a breakthrough once claimed
by J. Presper Eckert and John Mauchly, the inventors of ENIAC.
In the 1940s, the electronic digital computer was struggling to come into
existence through various avenues: Eckert and Mauchly, John Atanasoff, Howard
Aiken at Harvard, several researchers in England, and Konrad Zuse in Germany.
All were pursuing work that could have led, in fact did lead in each case, to
the design and building of at least a prototype electronic digital computer.
But it was Eckert and Mauchly who first proved that there was peacetime use
for the things, and money to be made from building them. And it was Eckert and
Mauchly who for years got most of the credit, until the issue was brought to
court. Ironically, it was a claim by Sperry Rand, the holder of the patent for
Eckert and Mauchly's work, that caused Eckert and Mauchly to be stripped of
the title of inventors of the automatic digital computer.
From Judge Earl R. Larson's decision, on October 19, 1973: 
The subject matter of one or more claims of the ENIAC was derived from
Atanasoff, and the invention claimed in the ENIAC was derived from Atanasoff.
SR and ISD are bound by their representation in support of the counterclaim
herein that the invention claimed in the ENIAC patent is broadly "the
invention of the automatic digital computer." Eckert and Mauchly did not
themselves first invent the electronic digital computer, but instead derived
that subject matter from one Dr. John Vincent Atanasoff.
Note the "bound by their representation" bit. The case was really about
something else. Honeywell had sued Sperry Rand Corp. (SR above) and Illinois
Scientific Developments (ISD), claiming that they were attempting to enforce a
bogus patent. SR, in its counter argument, claimed that the invention covered
in the ENIAC patent was nothing less than the invention of the automatic
digital computer. I don't know if that claim was essential to their defense of
their patent, but it certainly raised the stakes.
Then things took an unexpected turn. It emerged that:
1. Although he had done related work, Mauchly had developed no plans for an
automatic digital computer before meeting Atanasoff in Ames, Iowa;
2. Atanasoff had by that time developed plans that any competent engineer
could use to build an automatic digital computer; and
3. Atanasoff had shown those plans to Mauchly over the course of several days,
allowed Mauchly to stay at his house, and let Mauchly take the plans home with
him.
The judge had no trouble concluding that the developments covered in the ENIAC
patent were derived from Atanasoff's work. And Sperry Rand had boldly claimed
that those developments were the whole shebang: the invention of the automatic
digital computer.


The Real Inventor


As a result, official computer history now gives that title to John Atanasoff.
Except, that's not the whole story....
As hairy and protracted as the legal battle was, and as clear and decisive as
the decision apparently was, the whole thing was really only a parochial
squabble. History (or some historians' histories) really only credits
Atanasoff and graduate student Clifford Berry with creating the first
functioning prototype electronic digital computer in the United States.
Worldwide, the credit goes to Konrad Zuse. Never heard of him? You're not
alone. Zuse was pretty much unknown outside Germany while he was developing
the computer and remained largely unknown in the United States until recent
years. He did, however, unlike Atanasoff, get patents, make money, start a
business employing 1000 people, and retire unambiguously honored in his own
country.
And Atanasoff? Although he was late to get the credit he felt he deserved and
never profited significantly from his invention, his story is not really a
tragic one. He did not die either homeless or forgotten. His story, including
the long drive in the night he took to clear his head and the roadhouse where
he stopped and designed the electronic digital computer on paper napkins (or
was it the back of the menu?), is a romantic adventure. Howard Rheingold, in
Tools for Thought (Simon & Schuster, 1985; sadly, out of print), called him
"the last of the lone inventors in the field of computation."
Meanwhile, back in present-day Iowa, Gary Sleege and John Erickson are
building a full-scale replica of the Atanasoff-Berry Computer. It's scheduled
to go online in August of this year. I'll keep you posted.
Now, does anyone remember the first successfully marketed toy digital
computing device? The answer is at the end of the column.


AI Fights Back with Humor



Two years ago I reported on the efforts of the artificial intelligence
community to get some respect from the public in general and from research
funding sources in particular. The phrase "artificial intelligence" had been
overused by hype artists, and using it in a research proposal or bragging that
your company used AI technology was regarded, at best, as empty rhetoric, and
at worst, as bad science. But after a long AI winter of disrespect, it looked
like some buds of acceptance were peeking through the snow.
That perspective lends some flavor to an exchange in the Winter 1995 issue of
Artificial Intelligence (Volume 16, Number 4), the journal of the American
Association for Artificial Intelligence. 
It seems that Patrick J. Hayes and Kenneth M. Ford presented some, er,
unofficial awards at the last International Joint Conference on Artificial
Intelligence (IJCAI): the Simon Newcomb Awards, named for a distinguished
astronomer whom they characterize as having been "hilariously wrong about
artificial flight." Newcomb produced important tables for the motions of the
moon and the planets and worked with Michelson in determining the velocity of
light. He also wrote a number of popular articles asserting, in the strongest
terms, the impossibility of heavier-than-air flight, some of them written
after the Wright brothers flew at Kitty Hawk. The Simon Newcomb Awards are
given out to distinguished philosophers or scientists who are hilariously
wrong about artificial intelligence. According to Hayes and Ford, the kind of
wrong arguments they are looking for are "those that a graduate student in
computer science might find hilarious." 
Several readers of Artificial Intelligence thought that the awards were
unwise, that ridicule is not a proper critical method, blah blah blah. No
sense of humor, obviously. One reader sniffed that he didn't recall being
taught the method of ridicule in his graduate experimental methods courses. No
doubt he's right, but the reason is that ridicule comes naturally to graduate
students. If graduate students in computer science are not ridiculing their
professors and anyone else who dares to offer an opinion about their field, I
fear for the future of computer science.
Frankly, I'm glad to see the AI community responding in kind to its most
vociferous detractors, some of whom are more than willing to treat the goals
of AI researchers with ridicule. And after all, the granting of the awards was
not positioned as an AAAI-sanctioned event, but merely as a whimsical
presentation of some AAAI members at a buffet. Oh, the recipient of the Simon
Newcomb Award at that buffet was Roger Penrose, author of The Emperor's New
Mind, which I must admit to having liked, with some (significant)
reservations.
At that IJCAI meeting there was another robotics contest. The participants are
getting pretty adept at picking up trash and throwing it into wastebaskets.
The emergent theme of the conference was Sifting Through the Trash, er, I
mean, Data Mining. A number of presentations dealt with the growing problem of
databases too large to search manually and the need for tools for intelligent
search. One such tool is described in that issue of Artificial Intelligence: a
tool for sifting through records of financial transactions to find possible
money laundering schemes. The cynic might wonder if the most likely customers
for such a program are not, themselves, the biggest money launderers, but we
don't think like that, do we?


The Swiss Army Chainsaw


"We will encourage you to develop the three great virtues of a programmer:
laziness, impatience and hubris." 
--Larry Wall, in Programming Perl
Perhaps while we're sifting through the electronic trash, we'll use a tool
called a "Pathologically Eclectic Rubbish Lister." Of course, some think that
Perl stands for "Practical Extraction and Report Language." Perl is an
interpreted language developed by Larry Wall at NASA and distributed over
USENET. It's not AI. According to that authoritative source, the Jargon File,
it "superficially resembles awk, but is much hairier. UNIX sysadmins, who are
almost always incorrigible hackers, increasingly consider it one of the
languages of choice. Perl has been described, in a parody of a famous remark
about lex, as the 'Swiss-Army chainsaw' of UNIX programming."
I've been describing somewhat obscure little languages in this column for a
few months now, and I have to admit that Perl is not nearly as obscure as
those described thus far. It also is not a new or different paradigm--
explicitly not, according to its creator, who, in a refreshingly agnostic
passage, says: "It has been stated that a language is not worth knowing unless
it teaches you to think differently. Perl is the exception to that rule...
because much of Perl is derived in spirit from other portions of Unix." Okay,
it's not novel. It's UNIX-like and C-like and utterly derivative.
Nevertheless, in the space I have left this month, there's just enough room
for a peek at the Perl paradigm, so let's peek.


Free and Easy


Perl was intended as a data-reduction language. The low-level, non-AI analog
of data mining. It has come to be widely used for Internet programming. One
reason might be the dataflow tracing mechanism that determines which data may
be derived from insecure sources. That alone makes it a Webmeister's dream.
Wall, a recipient of one of this year's Dr. Dobb's Journal Excellence in
Programming awards (see page 16), began by trying to combine what he liked
about C with what he liked about UNIX shell programs. He also decided it would
be good to have all the capabilities of awk and sed, so he lumped them in.
Perl is a capability superset of awk and sed. And he made it an interpreted
script language for speed of development, but built in such strong
pattern-matching and text-manipulation features that it can often outperform C
programs in these areas. Webmeisters also appreciate that it's easy to write
short programs in Perl that respond to text messages with specific actions.
Perl is free and available for most platforms. The distribution comes with a
bunch of libraries and there are translators that turn awk and sed programs
into Perl programs, so your "legacy" awk and sed code can make the transition
to Perl.
But the most intriguing thing about Perl is that it has caught the eye of
Cygnus Support. Cygnus Support does for-pay support for free software,
especially gcc, the C compiler written by free-software champion Richard
Stallman. If Cygnus Support--or another company with the same vision--supports
more free software products, it could make a large amount of excellent, but
not commercial-grade, academically developed software available in a practical
way to the world at large. It may even mark a shift in how people make money
from software. And that would be good.
The first successfully marketed toy digital computing device, at least
according to some sources, was Geniac, developed by Edmund C. Berkeley around
1955-56.




C PROGRAMMING


A Glimpse Into the Future and Quincy 96 Continued




Al Stevens


In his article "Visual Programming in 3-D" (DDJ, December 1995), Marc Najork
writes about "Cube," his tiny, visual programming environment. If you missed
that article, I urge you to find a copy and read it. Cube is a 3-D visual
programming model and environment, albeit a simple one. Pay close attention to
its message because Cube foreshadows what programming will be like early in
the 21st century. Something resembling Cube--something more complex and more
comprehensive--will be a major computer programming model.
In the 50-odd years that digital computer programming has been a significant
human endeavor, the activity has seen numerous shifts in its approach, from
hard-wired plug boards and patch panels to object-oriented and quasi-visual
programming models. Each shift attempted to raise the programmer's level of
abstraction farther from the hardware and closer to the problem being solved;
each one was determined to provide an expression of the solution that more
closely suggested the problem. Some failed; some endured. Yet through all that
change, one thing stayed the same: programmers remained essentially stuck with
a single medium of expression, one characterized by flat, two-dimensional,
textual representations of objects and algorithms. Source code, we call it.
Early on, when the von Neumann stored-program model was recognized as the
obvious way to make computers solve problems, programmers began almost
immediately to search for better and easier ways to apply that model.
Programming thus evolved from cryptic, textual, machine-language source code,
written and read top-down and serially, to cryptic, textual, object-oriented,
high-level source code, written and read top-down and serially.
Indeed, there have been and are other ways to represent programs--flow charts,
structure charts, HIPO charts, schema diagrams, CRC cards, Booch diagrams, and
the like. But none of these expressions could be automatically translated into
executable code. They were meant instead to assist us with our objective,
which is the writing of source code expressed as structurally indented rows of
digits, letters, and symbols written on a flat plane. This pervasive
source-code programming model reflects thousands of years of convention in the
communication of ideas; the format is prescribed and constrained by both the
flat medium--cave walls, papyrus, paper, video screen--and the
message--procedural, serial thought, one thing at a time, expressed and
regarded in a start-to-finish, top-to-bottom fashion.
Even the engineers on Star Trek TNG write their programs on flat panels.
But the flat model just won't do for much longer. As the computer gains power
and the network becomes the operating system, concurrency and parallelism will
take a more essential role in software. The computer can no longer be treated
as if its attention span resembles that of the conscious human mind, doing one
thing at a time, in a prescribed order, interrupting one thread only to
process another. Artificial intelligence never realized its full potential,
and one reason is that we never provided the technology with the essential
component that enables real intelligence: subconscious thought and process.
Once we free ourselves from that notion--that computer programs must be
written and read like prose and equations and must resemble our own limited
conscious ability to think serially--we can see the third dimension in
software. And what will free us from these mental bonds? The hardware will.
The exponential growth of computing power on the desk of self-directing
individuals with vision will not only enable but will compel research in these
areas. It has already begun. Someday, we will be able to deal with software as
we deal with the universe, moving about in three dimensions and interacting
with its objects, some above us, some below us, some behind, some in front,
all doing different things, doing them simultaneously, some related to others,
some not.
Predicting the future in print invites ridicule; the prediction remains a
matter of permanent record regardless of its accuracy. I'll take that chance
and be in some good company. In 10 years--15, tops--you will develop software
from a virtual reality environment. You will meander in three dimensions
throughout your program, rummaging among and examining graphical components to
reveal and modify their behavior. Data conduits will connect components to
pass arguments and results. Source, transform, and sink. This network of
components and connections--the program--will look more like a molecular
structure than like the video printouts of today. You will be able to view the
implementation, behavior, and interface of a component at any of several
levels of abstraction, depending on your mission and your interest. You will
explode components into their assemblies and subassemblies to see what makes
them tick. You will collapse your view of assemblies into component views and
treat them as trusted boxes once you see that they work properly. At the
highest level of abstraction, your program will be collapsed into one
component with its input and output pipes connected to its external devices
and data files.
The work to realize this fantasy is already underway.
Will such 3-D visual programming environments eliminate the need for
programmers to understand source code as we know it today? Not likely. C++ and
Visual Basic have not eliminated assembly language. Or machine language, for
that matter. In the future, when all else fails, you will look at an
occasional memory dump, stack pointer, and interrupt vector, just as you do
today. At levels of abstraction close to the bottom of a component you will be
permitted to view the conventional source code that the 3-D visual development
environment generates to implement the component. The source code will be of a
language that can be interpreted and compiled. You will try at every step to
avoid going that low for a look-see. Many of your younger colleagues will not
know how. The hidden language could look very much like C++.


Quincy 96, Chapter 2 


Return with us now to those thrilling days of yesteryear--er, today. Last
month, I launched the Quincy 96 project, a Windows 95 integrated development
environment for the GNU C and C++ compiler systems. I said then that there was
no debugger yet but that I was working on one. This month, the debugger is
partially completed, and I'll tell you about my experience so far with that
part of the project. I'm using this project as a deep learning experience with
Visual C++ 4.0 and the Microsoft Foundation Classes (MFC). Last month, I began
a list of design patterns about that platform, and I'll continue that
discussion, too.


The Quincy 96 Debugger


Quincy 96 is an MDI editor that launches the GNU compiler to compile and link
C and C++ programs. There's nothing unique about that. You can get some really
good Windows programmer's editors that do the same thing. RimStar from RimStar
Technology (rimstar@world.std.com or 603-778-2500) and Visual SlickEdit from
MicroEdge (http://www.slickedit.com or 800-934-3348) are two examples. Both
packages are Windows-hosted MDI programmer's editors with C-like macro
languages, and both allow you to launch the compiler of your choice from
within the editor. I like both of them. I'd be hard-pressed to choose a
favorite. (This project involved a lot of research. If it seems like I'm
plugging a lot of products and books in this column, it's because those
resources made the project possible.)
I'm developing Quincy 96 as a training tool, one that I can distribute
royalty-free on a CD-ROM. It is launched from within an interactive tutorial
that specifies what program to load, which source file and line number to
display, what breakpoints and watch variables to set, and so on. These
requirements imply OLE automation, which I will address in a future column
after I've figured it out, and an integrated debugger, a feature that
programmer's editors do not usually include.


Writing a Source-Level Debugger


There are three things to learn if you want to roll your own debugger. First,
how does a debugger work to interact with the program being debugged? Second,
where does the executable program file store the necessary symbolic debugging
information? Third, what is the format of that information? These three
questions are answered by delving into the mysteries of three complex
architectures: the Win32 SDK's debug API, the Win32 portable executable file
format, and the GNU symbolic table (stab) system for encoding debug
information.


Inside the Debugger


You can debug a program at the assembly-language level without any debugging
information. The MS-DOS DEBUG program does that, allowing breakpoints and
single-stepping, all of which implies that the hardware must be cooperating.
It is. Back when my brother Fred and I were building embedded systems with
4-MHz Z-80s, I used a homebrew debugger that plugged interrupt op codes into
the instruction stream to generate breakpoints.
Nothing has changed. That's how you debug a program on a Pentium. The x86
architecture includes software interrupts. The 1-byte op code 0xCC is the INT
03 instruction, reserved for debuggers. You can put the INT 03 op code in
place of the program's instruction op code where the break is to occur and
replace the original op code at the time of the interrupt. In the 386 and
later, you can set a register flag that tells the processor to generate an
unintrusive INT 01 trap after every machine instruction executed. That
device supports single stepping. The INT 01 and 03 information and everything
else you want to know about interrupts comes from Ralf Brown and Jim Kyle's
invaluable PC Interrupts, Second Edition (Addison-Wesley, 1994).
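The save-patch-restore dance behind an INT 03 breakpoint can be sketched in plain C on an in-process byte buffer. This is only an illustration of the mechanics: in a real Win32 debugger the two memory accesses would go through ReadProcessMemory and WriteProcessMemory on the debugged process, and the function names here are my own.

```c
#include <assert.h>

#define INT3_OPCODE 0xCC  /* x86 INT 03, the 1-byte breakpoint instruction */

/* Plant a breakpoint: save the original op code at the break address
   and patch the INT 03 op code in its place. */
static unsigned char set_breakpoint(unsigned char *code, unsigned long addr)
{
    unsigned char saved = code[addr];
    code[addr] = INT3_OPCODE;
    return saved;
}

/* Clear a breakpoint: restore the saved op code so that execution can
   resume at the instruction that was broken. */
static void clear_breakpoint(unsigned char *code, unsigned long addr,
                             unsigned char saved)
{
    code[addr] = saved;
}
```

When the breakpoint fires, the debugger restores the saved byte and backs the program counter up by one before resuming, exactly as Listing One's ResumeBrokenProgram does.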
Now, let's raise ourselves up a notch or two in the levels of abstraction. The
Quincy 96 debugger is a Windows 95 32-bit GUI program. At the time I was
planning this project, I had no idea how to connect such a program to
interrupt vectors, but I figured it would be hairy and was mulling it over
when I happened by the Nu-Mega Technologies (http://www.numega.com) booth at
the Software Development '95 East conference. Frank Grossman, the MFWIC ("main
fellow who's in charge") at Nu-Mega, was demonstrating BoundsChecker for
Windows. (Another plug: You should not write another Windows program without
having this tool. It's like air conditioning; once you've had it, you can't do
without it.) Frank used one of his own programs for a demo. I spotted a call
to a WaitForDebugEvent function in the source code and asked him what it was.
As it turns out, the Win32 SDK includes functions that allow one program to
launch another program and debug it. You can forget about how the interrupts
and interrupt vectors get managed. The SDK's debug API takes care of all that.
A debugger program launches a program to be debugged by calling the
CreateProcess function, specifying in an argument that the program is to be
debugged. Then the debugger program enters a loop to run the program. At the
top of the loop, the debugger calls WaitForDebugEvent. Each time
WaitForDebugEvent returns, it sets indicators that tell about the event that
suspended the program being debugged. This is where the debugger traps
breakpoints and single-step exceptions. WaitForDebugEvent fills in an event
structure that contains the address that was interrupted, the event that
caused the interrupt, and so on. The debugger calls GetThreadContext to get
the running context of the debugged program, including the contents of the
registers. The debugger can, as the result of programmer interaction, modify
these values and the contents of the debugged program's memory.
The debugger sets breakpoints by saving the op code at the instruction to be
intercepted and putting the INT 03 op code in its place. When the breakpoint
occurs, the debugger replaces the original op code in the program's
instruction memory, and decrements the interrupted program counter in the
saved context so that execution resumes at the instruction that was broken.
To single step a program, the debugger sets a bit in the context's flags
register that tells the processor to generate an INT 01 for every instruction
cycle. When that interrupt occurs, the debugger checks to see if the
interrupted address is at a new source-code line number. If not, the debugger
continues execution. Otherwise, the debugger displays the new line in the IDE
and waits for the programmer to take an action that resumes the program.
While the debugged program is suspended, the debugger interacts with the
programmer and provides full access to the debugged program's context and
memory. This access permits the programmer to examine and modify variables,
which requires that the debugger know something about the program's source
code, symbols, and memory organization. More about that later.
To resume the debugged program, the debugger resets the program's context by
calling SetThreadContext and calls ContinueDebugEvent. Then, the debugger
returns to the top of the loop to call WaitForDebugEvent again.
Listing One is a fragment that demonstrates this debugging procedure. Take
note. The code that executes as a result of the EXCEPTION_BREAKPOINT and
EXCEPTION_SINGLE_STEP exceptions does not itself call the ResumeBrokenProgram
and ResumeSteppingProgram functions. That logic would have the program getting
deeper and deeper into its own stack. Those exceptions display the current
program counter's source file and line number and then return from
DebugApplicationProgram, which leaves the IDE in control and the debugged
program suspended. Subsequent actions by the user call ResumeBrokenProgram and
ResumeSteppingProgram, which themselves resume the program being debugged.


Debugging Information



Programs can be compiled with or without debug information. If you load a
program that was compiled without debug information into a source-level
debugger, a typical stand-alone debugger will display the unassembled machine
code of the program and then politely tell you that there is no symbolic
information in the executable file. When you compile debugging information
into the program, however, the executable file includes information that
relates symbols to memory addresses and memory addresses to source-code files
and line numbers. Given a path to the source code, the debugger can then
properly display the program's source code and variable values as you step
through the program's execution.
A program compiled with debugging information takes no run-time performance
hit if it is not being debugged. The debugging information is stored in
special sections in the executable file, and these sections are not loaded
when the operating system loads the program to run outside of a debugger. There is nothing
added to the executable code itself to support the debugger. Therefore, the
debugger must read and interpret the debugging information from the debugged
program's .EXE file before launching the program.


The Win32 Portable Executable File Format


To extract debug information from a Win32 executable file, you must understand
the format of that file. I started out by knowing only that the data values
were in there somewhere. After a bit of simple reverse engineering (consisting
of poring over an ASCII hex dump of a small executable file produced with the
-g option of the GCC compiler), I learned that the executable file has two
sections not found in other executable files. Those two sections are named
".stab" and ".stabstr." How nice that they used names that suggest their
purpose. Otherwise, I would still be searching. I knew that .stab and .stabstr
were so-called "sections" because I found them in what appeared to be a table
of fixed-length entries that included entries for .text, .bss, .data, and
.idata. I knew from previous experience that those things are sections into
which compilers put different parts of a program. A quick run of Borland's
TDUMP utility program against the executable verified those findings.
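Locating a named section amounts to a linear scan of that fixed-length table. The sketch below uses a hand-rolled struct that mirrors only the fields a debugger needs; the real declaration is IMAGE_SECTION_HEADER in the Win32 headers, and the struct and function names here are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A hand-rolled mirror of a PE section-table entry. The real layout is
   IMAGE_SECTION_HEADER in <winnt.h>; only the fields needed to find a
   section's raw data in the file are shown. */
struct pe_section {
    char          name[8];          /* ".text", ".stab", ".stabstr", ... */
    unsigned long virtual_size;
    unsigned long virtual_address;
    unsigned long raw_data_size;    /* bytes of section data in the file */
    unsigned long raw_data_offset;  /* file offset of that data */
};

/* Scan the section table for a section by name; returns NULL if absent.
   PE section names are padded to 8 bytes, so strncmp with n == 8 works
   for both short names and exactly-8-character names like ".stabstr". */
static const struct pe_section *
find_section(const struct pe_section *table, int count, const char *name)
{
    int i;
    for (i = 0; i < count; i++)
        if (strncmp(table[i].name, name, 8) == 0)
            return &table[i];
    return NULL;
}
```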
The header part of the executable file did not match my earlier understanding
of that format, and so I dug into Andrew Schulman's Unauthorized Windows 95
(IDG Books, 1995) to learn something about the new Win32 portable executable
file format. Recently, I got a copy of Matt Pietrek's Windows 95 System
Programming Secrets (IDG Books, 1995) and learned a good bit more about
portable executables. If Matt's book had been here a few days sooner, it could
have saved me a lot of time, not only in learning about the format of
executable files but in understanding the details of the SDK debug API. I
highly recommend this book to anyone interested in systems programming and
Windows 95. This month's "Programmer's Bookshelf," by Lou Grinzo (page 127)
discusses Matt's book in more detail. 


GNU Stab Information


There are several different formats for encoding debug information in an
executable file. Borland's Turbo Debugger uses one format. Microsoft's
CodeView uses another. The GNU C/C++ compiler, being implemented on many
platforms, uses at least two different formats, maybe more, depending on which
port you have. The gnu-win32 port from Cygnus is the one that Quincy 96
launches, and it uses the stab format, whose name is apparently short for
"symbol table," although the table contains much more than just symbol
information.
The .stab section in a portable executable file is a table of fixed-length
entries that represent debugging information in the stab format. The .stabstr
section contains variable-length, null-terminated strings into which the .stab
table entries point.
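A sketch of one such fixed-length entry, following the a.out "nlist" convention that the stabs documentation describes: each entry carries a string-table offset, a type code, and a value. The exact field widths here are my assumption about the gnu-win32 tools, not a copy of Quincy's declarations.

```c
#include <assert.h>
#include <string.h>

/* One fixed-length .stab entry, after the a.out nlist convention the
   GNU stabs document describes. Field widths are an assumption. */
struct stab_entry {
    unsigned long  n_strx;   /* offset of this entry's string in .stabstr */
    unsigned char  n_type;   /* stab type code: N_FUN, N_SLINE, N_GSYM, ... */
    unsigned char  n_other;  /* usually zero */
    unsigned short n_desc;   /* meaning varies; often a source line number */
    unsigned long  n_value;  /* an address or a value, depending on n_type */
};

/* Resolve an entry's string by indexing into the .stabstr section,
   which is just a block of null-terminated strings. */
static const char *stab_string(const struct stab_entry *e,
                               const char *stabstr)
{
    return stabstr + e->n_strx;
}
```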
The documentation for the stab format is available in text and .inf formats on
the Cygnus ftp site (ftp.cygnus.com//pub/gnu-win32). That document explains
that the compiler reacts to its -g option by inserting macros into the
assembly-language file. The assembler translates the macros into stab entries
in the .stab and .stabstr sections of the object file. The linker combines the
sections from the object files into two common sections in the executable
file.
Stabs contain, in a most cryptic format, the names and characteristics of all
intrinsic and user-defined types, the memory address of every symbol in
external memory and on the stack, the program counter address of every
function, the program counter address where every brace-surrounded statement
block starts and ends, the memory address of line numbers within source-code
files, and anything else that a debugger needs. The format is complex and
cryptic because it is intended to support any source-code language. It is the
responsibility of a debugger program to translate the stab entries into
something meaningful to the debugger in the language being debugged.
There is a strange bug in the GNU compiler that has been tolerated for a long
time because somebody built a workaround for it in the GNU debugger. The bug
assigns zero memory addresses to external, nonstatic symbols in the macros
that the compiler emits in the assembly-language file. Apparently, the GNU
debugger figures out what the address should be from the symbol's
characteristics, its position among the stab macros, and the sizes of the
other variables that surround it. That looked like a daunting algorithm to me.
Rather than spending forever getting that to work in my debugger, I inserted a
filter into the compile stream. The filter reads the assembly-language file
and replaces the zero in each offending macro with the name of its global
variable. I wonder why the GNU hackers didn't think of that.
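A toy version of that filter idea can be sketched as a line-rewriting function. Everything specific here--the .stabs directive layout, the 'G' symbol-descriptor test for globals, and the leading-underscore i386 symbol convention--is my assumption about the gnu-win32 assembler output, not code from Quincy's actual filter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrite one .stabs directive of the assumed form
       .stabs "counter:G(0,1)",32,0,0,0
   so that the final zero value field becomes the symbol's assembler
   name with an i386 leading underscore:
       .stabs "counter:G(0,1)",32,0,0,_counter
   Returns 1 if the line was rewritten, 0 if it was copied unchanged. */
static int patch_stabs_line(const char *in, char *out, size_t outsize)
{
    const char *quote, *colon, *lastcomma;
    char name[128];
    size_t len;

    strncpy(out, in, outsize - 1);      /* default: pass line through */
    out[outsize - 1] = '\0';

    if (strncmp(in, ".stabs \"", 8) != 0)
        return 0;                       /* not a .stabs directive */
    quote = in + 8;
    colon = strchr(quote, ':');
    if (colon == NULL || colon[1] != 'G')
        return 0;                       /* not a global symbol entry */
    lastcomma = strrchr(in, ',');
    if (lastcomma == NULL || strcmp(lastcomma, ",0") != 0)
        return 0;                       /* value field is not zero */

    len = (size_t)(colon - quote);      /* symbol name precedes the colon */
    if (len >= sizeof(name))
        return 0;
    memcpy(name, quote, len);
    name[len] = '\0';

    /* Keep everything through the final comma, then emit _name. */
    snprintf(out, outsize, "%.*s_%s", (int)(lastcomma + 1 - in), in, name);
    return 1;
}
```

The real filter, of course, streams the whole assembly-language file through logic like this between the compile and assemble steps.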
I cannot begin to describe the stab format to you in this column. If it
interests you, read the document that Cygnus distributes. My efforts to date
have produced a Windows 95 GUI source-level debugger that can set breakpoints,
intercept breakpoints, single step, and examine, modify, and watch variables of
intrinsic types.
The Quincy 96 debugger cannot yet examine or watch pointers, arrays, or
structure and class members. Doing that involves first parsing the stab
entries that describe those things and then implementing an expression parsing
algorithm that translates C/C++ expressions that include symbols, integer
constants, arithmetic operators, address-of operators, pointer operators,
structure member operators, and subscript operators into memory addresses. The
old Quincy interpreter already has such a parser, and I intend to adapt it to
this project. By the time this column is published, I will have everything
working.


About GDB


The GNU compiler includes its own debugger named "gdb." Gdb runs from the
command line and is a line-oriented source-code debugger. The source code is
available, and I learned some of what I needed to know by reading the gdb
code. After all, gdb does most of the low-level stuff that I need to do in the
Quincy 96 debugger. I would have liked to use some of gdb's source code
in implementing my own debugger. That would have made Quincy 96 a "derivative
work" in the language of the GNU license, and I want to avoid that association
for reasons that I explained last month. But there is another reason not to go
that route. Gdb is a portable, platform- and language-independent debugger,
and its code tends to be difficult to read. There are many levels of
indirection through #define macros and tables, all of which are designed to
support several platform-specific portability layers for the particular
implementation.
Then there is the code itself. The gdb programmers use a bewildering
indent-outdent style in the code that adds to the arcane quality of this
program. Before anyone gets testy, let me add that my opinion is a personal
observation not meant to start any religious wars. Anyone who disagrees with
me has that right and should do so. There is no one right way to code C.
Gdb is an impressive piece of programming, but I was certain that I'd spend
less time and learn more if I ignored gdb and went my own way. Every time I
peeked into its code to try to unearth a solution to some new mystery, that
certainty was reinforced. Finally, I deleted the gdb source code from my hard
disk so that I'd stop wasting time that way.


Debugging Console Applications


Quincy 96 compiles, executes, and debugs C and C++ training exercises. It is
not a Windows programming tool. Therefore, the programs that it launches are
assumed to be console applications that use the stdio and iostream libraries
for console input and output. A console program opens an MS-DOS window when it
starts and closes that window when it exits. The console window is the
equivalent of the screen and keyboard when you run a program from a DOS box.
When you run such a program from Quincy 96 without stepping or breakpoints,
the program runs to completion, and the MS-DOS window closes. You have no
opportunity to view the output unless the program includes its own "Any key to
continue" operation at the end. That won't do. To keep the MS-DOS window
around after the program quits, Quincy 96 has to provide the console window
and let the debugged program inherit it. Then, when the debugged program
exits, Quincy 96 can display a message on the console or a dialog box for you
to respond to before it closes the MS-DOS window.
A Windows 95 application creates and destroys a console window by calling the
AllocConsole and FreeConsole functions. In between, there are three handles
open for the standard input, output, and error devices, and the program can
retrieve them by calling GetStdHandle. The CreateProcess function, which
launches the application, includes arguments that allow the creating process
to provide the standard devices to be inherited by the created process. By
using this mechanism, Quincy 96 is able to retain the MS-DOS window and even
write to it after the debugged program has shut down.
There are some side effects to this strategy. If you close the MS-DOS window
with the mouse or system menu, that action terminates Quincy 96, too. If you
have changed any files in the editor without saving them, the closing action
does not prompt you to save them. In an attempt to provide a more hospitable
user interface, Quincy 96 tries to manage the position and condition of the
MS-DOS window, but these actions are often ignored by Windows 95. There are
ways for CreateProcess to control the console window of a program that
provides its own console, but trying to manage your own AllocConsole window is
unreliable. Either I'm missing something or Microsoft's developers did not
give enough attention to the way that console applications work.


Displaying Tokens in the Margins of a CEditView Control


Quincy 96's IDE displays tokens in the left margin of the editor window to
indicate breakpoints and the current program counter. Getting it to do that
was no easy task. A CEditView window expects the edited text to be flush to
the left margin. To make space for the tokens, I called the SetMargins
function from the CTextView::Create function (in the editor's view class) as
shown in Listing Two.
With a margin set, my next task was to get the program to write those tokens
in the margin. After trying several different ways, I found that writing
anything into the margin of a CEditView window involves writing text to the
device context rather than to the CEdit control. Listing Three is the
CTextView::WriteMarginCharacter function that I contrived to do that. I wanted
to use a color bar for the program counter rather than a token, but that meant
that I'd have to redisplay the line of code with different color settings. I
had no problem changing colors but was never able to change the font from the
default font of a CWnd object to the Courier 10 font that the editor uses.
Everything I tried generated strange results, so I postponed the idea until I
have more time to deal with it.


Managing Asynchronous Processes


Quincy 96 runs two kinds of programs. When debugging a program, Quincy 96 runs
the program to be debugged and watches its progress through the
WaitForDebugEvent function. When building a project, Quincy 96 runs the
preprocessor, compiler, assembler, and linker programs in that order, starting
each one only when the previous one has exited. Quincy 96 needs to sense that
a program is still running and that it has stopped.
When you use CreateProcess to launch another program, that program takes off,
and the launching program continues. The created process runs asynchronously
with the creating program unless you are debugging and have called
WaitForDebugEvent. When not debugging, you can periodically call the
GetExitCodeProcess function, which returns an exit code of STILL_ACTIVE while
the program is still running. Quincy 96 cannot simply go into a loop waiting
for that exit code because the Windows 95 event and message system would not
permit any user interaction during that loop, and the user would not be able
to stop the process if it was taking too long or running away.
Quincy 96 handles this matter by overriding the CWinApp::OnIdle function,
which gets called by the system whenever nothing else is going on in the
debugger program. That function calls GetExitCodeProcess to see if the current
compiler program is still running, launching the next one as appropriate.
Listing Four shows a fragment of that operation.


OnContextMenu


This discussion is about those menus that pop up when you click the right
mouse button. I don't know what you're supposed to call them. I've seen them
called popup menus, context menus, right-click menus, and shortcut menus.
Whatever they're called, you have to use them to get Microsoft's blessing to
put the Windows 95 logo on your work. Developer Studio provides such a menu
for document views derived from CEditView. The
popup-context-shortcut-right-click menu that they provide is a subset of the
standard Edit menu. But I wanted to add one to Quincy 96's project document
view, which is derived from CListView, and Developer Studio does not provide
one for that.

When you follow the instructions in the VC++ online help and attach an
OnContextMenu handler to a CListView-derived document view, it takes a
double-click of the right mouse button to open the menu. It looks like a bug
in VC++ 4.0 or MFC to me. To get the preferred behavior (single-click
activation), I overrode OnContextMenu in the derived CMDIChildWnd class. Menus
opened that way are somehow impervious to the global enabling and disabling of
commands through OnUpdate functions in the document and view classes, so I had
to repeat every test and use EnableMenuItem function calls for the context
menu.
Next month I'll fill you in on how far I've gotten with Quincy 96. No doubt
I'll have some more VC++ and MFC design patterns. I hope by then to have
completed the debugger, and I plan to be well into the mechanisms that
integrate Quincy 96's role as the development environment with the host
interactive multimedia tutorial presentation.


Source Code


The source-code files for the Quincy 96 project are free. You can download
them from the DDJ Forum on CompuServe and on the Internet by anonymous ftp;
see "Availability," page 3. To run Quincy, you'll need the GNU Win32
executables from the Cygnus port. They can be found on
ftp.cygnus.com//pub/sac.
If you cannot get to one of the online sources, send a 3.5 inch, high-density
diskette and an addressed, stamped mailer to me at Dr. Dobb's Journal, 411
Borel Avenue, San Mateo, CA 94402, and I'll send you the Quincy source code
(not the GNU stuff, however--it's too big). Make sure that you include a note
that says which project you want. The code is free, but if you care to support
my Careware charity, include a dollar for the Brevard County Food Bank. 

Listing One 
// ---- run an application program with the debugger
void CQuincyApp::RunApplicationProgram(CString& strCmd)
{
    // ....
    CreateProcess(0, strCmd.GetBuffer(0), 0, 0, TRUE,
        DEBUG_ONLY_THIS_PROCESS | DEBUG_PROCESS,
        0, 0, &suinfo, &m_ProcessInformation);
    DebugApplicationProgram();
}
// ------ resume a program that was interrupted by a breakpoint
void CQuincyApp::ResumeBrokenProgram(BOOL bStep,
                                     unsigned char cSaveInst)
{
    // ---- adjust the instruction pointer back to the
    // breakpoint (int 3) op code
    --m_Context.Eip;
    ResumeSteppingProgram(bStep);
}
// ------ resume a program that was interrupted by a single step
void CQuincyApp::ResumeSteppingProgram(BOOL bStepAgain)
{
    if (bStepAgain)
        m_Context.EFlags |= FLAG_TRACE_BIT;
    SetAllBreakpoints();
    SetThreadContext(m_hThread, &m_Context);
    ContinueDebugEvent(m_dwProcessId, m_dwThreadId, DBG_CONTINUE);
    DebugApplicationProgram();
}
// ----- debug an application program
void CQuincyApp::DebugApplicationProgram()
{
    while (!bDone) {
        WaitForDebugEvent(&event, INFINITE);
        switch (event.dwDebugEventCode) {
        case CREATE_PROCESS_DEBUG_EVENT:
            // startup procedures
            break;
        case EXIT_PROCESS_DEBUG_EVENT:
            // shutdown procedures
            break;
        case EXCEPTION_DEBUG_EVENT:
            // breakpoints and single-stepping
            m_Context.ContextFlags = CONTEXT_FULL;
            GetThreadContext(m_hThread, &m_Context);
            switch (event.u.Exception.ExceptionRecord.ExceptionCode)
            {
            case EXCEPTION_BREAKPOINT:
                // --- process breakpoints
                break;
            case EXCEPTION_SINGLE_STEP:
                // --- process single steps
                break;
            }
            break;
        }
        ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE);
    }
}

Listing Two
BOOL CTextView::Create(LPCTSTR lpszClassName, LPCTSTR lpszWindowName,
                       DWORD dwStyle, const RECT& rect,
                       CWnd* pParentWnd, UINT nID,
                       CCreateContext* pContext)
{
    BOOL rtn = CWnd::Create(lpszClassName, lpszWindowName, dwStyle,
                            rect, pParentWnd, nID, pContext);
    // ...
    CEdit& rEdit = GetEditCtrl();
    rEdit.SetMargins(13, 0);    // left margin for tokens
    return rtn;
}

Listing Three
// ----- writes a character in the editor's left margin
void CTextView::WriteMarginCharacter(int nLine, int ch, COLORREF clr)
{
    char strch[] = "   ";   // three characters: the token plus padding
    strch[0] = ch;
    HideCaret();
    CDC* pCDC = GetDC();
    pCDC->SetTextColor(clr);
    pCDC->TextOut(0, nLine, strch, 3);
    ReleaseDC(pCDC);
    ShowCaret();
}

Listing Four
BOOL CQuincyApp::OnIdle(LONG lCount)
{
    CWinApp::OnIdle(lCount);
    DWORD exitcode;
    if (m_CompileStatus != idle) {
        GetExitCodeProcess(m_CompileInformation.hProcess, &exitcode);
        if (exitcode != STILL_ACTIVE) {
            switch (m_CompileStatus) {
            case preprocessing:
                // finished preprocessing, launch the compiler ...
                m_CompileStatus = compiling;
                break;
            case compiling:
                // finished compiling, launch the assembler ...
                m_CompileStatus = assembling;
                break;
            case assembling:
                // finished assembling, launch the linker ...
                m_CompileStatus = linking;
                break;
            case linking:
                // finished linking, project is built
                m_CompileStatus = idle;
                break;
            }
        }
    }
    return (m_CompileStatus != idle);
}






















































ALGORITHM ALLEY


A Compact Logarithm Algorithm




John K. DeVos


John is an engineer at Ircon, a manufacturer of optical thermometers. He can
be contacted at devos@delphi.com.


When you become accustomed to high-level languages, it is easy to forget that
the whole point of using them is to free you from worrying about the
bare-metal details of processing. In a low-level embedded environment,
however, a language like C can be a reminder of exactly how free from this
worry we've become. In these environments, questions such as "Is this causing
the right kind of divide?" and "Is the For or the While smaller?" crop up
regularly. It is also likely that before too long, you will miss a library
routine or two.
This was the situation I faced while calculating the natural logarithm, ln(x),
in a program for an embedded controller; see Figure 1. My resources were
limited to internal RAM and PROM, and timing limitations were a significant
consideration. An "infinite" series (the kind a high-level-language library
routine probably uses) seemed like a good starting point, but unfortunately
would have required the calculation of a large number of terms to provide
accuracy over a useful range of x. This would have resulted in a large,
time-consuming routine.
I suspected, however, that if I could tolerate less-than-stellar accuracy,
there might be a compact way to calculate logarithms. Converting logs to
different bases is trivial, and log-base-2, log2(x), is easier to get at in a
binary environment than ln(x) or log10(x).
In my application, it was necessary to evaluate an expression of the form in
Figure 1. This expression can be rewritten assuming the use of log2(x), as in
Figure 2, in which the constant ln(2) can be included in the constants A and
B, making run-time base conversion unnecessary.


Approximating log2(x)


In another part of the same program, I needed an exponential function
requiring only rough accuracy. The function was implemented using the
equivalence of the expressions 2^x and 1<<x (one left-shifted x bits). This
equivalence suggests that a rough estimate of log2(x) can be obtained by
determining the number of binary digits in x. Refining this idea slightly
reveals that the integer part of log2(x) is equal to the position of the
most-significant bit in x, where the least-significant bit (LSB) is bit 0.
I'll refer to this initial estimate as "L0(x)."


Error Correction


Figure 3 shows that the error between L0(x) and log2(x) depends on x's
relationship to the surrounding powers of 2 and not its absolute magnitude,
suggesting that an additive correction based on this relative value of x is in
order. I would like to report that I found a correction after some methodical
analysis, but the truth is that I guessed a workable solution on the second
try. Trials with linear interpolation showed that the residual error after
such a correction was not only large, but difficult to approximate and
therefore difficult to cancel.
An interesting, and in this case useful, property of logarithms is that the
slope of a logarithmic curve is easy to calculate. Instead of using a straight
line to interpolate, why not use a slope based on that of the function you
hope to match? Examination of the plot of log2(x) and L0(x) shows that a
linear addition to L0(x) would be about right if the slope used was that of
the curve log2(x) at an "effective" x half-way between x and the power of 2
below it. The slope of log2(x) is given by Figure 4(a), and the effective x is
simply the average of x and 2^L0(x); see Figure 4(b). Combining these
expressions with the relative x value of (x - 2^L0(x)) gives the first
correction term C0(x); see Figure 4(c).
After this correction, the estimate has about the right shape, but it is
increasingly low as x increases from one power of 2 to the next. There is also
a discontinuity as x approaches each power of 2 from below. The limit of C0(x)
as x -> 2^n from below evaluates to 2/(3*ln(2)), so dividing C0(x) by this factor
would make the discontinuity vanish. The resulting expression for the
correction term C1(x) resembles a textbook problem in which a
complicated-looking expression evaluates to an integer, as in Figure 5(a); the
first corrected estimate appears in Figure 5(b).


More Error Correction


A plot of the difference L1(x)-log2(x) shows that the residual error is
(conveniently) very nearly parabolic between powers of 2 with the apex of each
curve lying very close to half-way between the powers of 2; see Figure 6.
Also, the height of each apex, like the error in L0(x), is independent of x.
The expression in Figure 7(a) describes a parabola that closely approximates
this residual error, while Figure 7(b) is the twice-corrected approximation.
Since this parabola does not exactly match the actual error, selecting PA to
equal the maximum value of the residual error results in L2(x) having error
with a positive average. In my application, the constant PA was chosen
numerically to make the remaining error symmetric in amplitude. Although the
average error over the range of x is nonzero (albeit small), this approach
minimizes the obligatory +/- error spec.


Implementation


Listing One is the log2(x) routine implemented in Microsoft Visual C/C++ 1.0.
Although this source code works with the Intel iC-96 compiler I use for
embedded development, the embedded version of the routine uses a smaller set
of local variables (to conserve stack space) and is therefore less readable.
This routine works for 2-byte signed integers, although long integers could be
accommodated without much additional code. Table 1 summarizes the routine's
performance.
Although I haven't tested it, I suspect that an algorithm similar to this one
could be developed for floating-point numbers that would probably be faster
and smaller (if less accurate) than a library routine or an algorithm based on
a series expansion. Another possible use for such an algorithm is calculating
the square root of x, since Figure 8 is true for an arbitrary base B. 
Table 1: Summarizing the routine's performance.
Argument range tested: 3 to 32767
Highest relative error: @ log2(3) is 0.0153 percent
Lowest relative error: @ log2(7) is -0.0102 percent
Highest absolute error: @ log2(10815) is 0.000438
Lowest absolute error: @ log2(15199) is -0.000514
Average absolute error: 0.000002
RMS absolute error: 0.000272
Figure 1: Evaluating an expression.
              A
f(x) = -------------   for constants A, B, and C
         B + C*ln(x)
Figure 2: Rewriting the expression in Figure 1.
              A/ln(2)
f(x) = ----------------------
        B/ln(2) + C*log2(x)
Figure 3: Linear interpolation.
Figure 4: (a) Slope of log2(x); (b) average of x and 2^L0(x); (c) the first
correction term C0(x).
(a)
          d                 1
m(x) = ---- (log2(x)) = ---------
         dx              ln(2)*x

(b)
         x + 2^L0(x)
xeff = -------------
              2

(c)
C0(x) = m(xeff)*(x - 2^L0(x))

              2*(x - 2^L0(x))
C0(x) = ---------------------
         ln(2)*(x + 2^L0(x))
Figure 5: (a) Resulting expression for the correction term C1(x); (b) first
corrected estimate.
(a)
            x - 2^L0(x)
C1(x) = 3*--------------
            x + 2^L0(x)

(b)
L1(x) = L0(x) + C1(x)
Figure 6: Error in L1(x).
Figure 7: (a) Parabola that closely approximates the residual error; (b) the
twice-corrected approximation.
Figure 8: Calculating the square root of x.

Listing One
/* log2.h */
/* Prototypes */
long log2K (int);
/* Defines & Macros */
#define LOG2_ERROR  666L    /* Impossible result */
#define BASE_FACTOR 16384
#define BASE_SHIFT  14
#define ERR_C1      242     /* PA * BASE_FACTOR */
#define ERR_C2      726     /* 3 * ERR_C1 */
/* log2K -- Returns the log-base-2 of the argument relative to
   BASE_FACTOR. If the parameter is not positive, it returns a defined
   impossible result. */
long log2K (int x) {
    int j, hiBitPos, C1, C2;
    long L0;
    if (x <= 0) return LOG2_ERROR;
    /* Zero-order estimate */
    j = 1;
    while (x >> j++);
    hiBitPos = j - 2;
    j = 1 << hiBitPos;                  /* j = 2 ^ hiBitPos(x) */
    L0 = (long)hiBitPos << BASE_SHIFT;
    if (x == j) return L0;              /* Catch powers of 2 */
    /* First correction */
    C1 = (int)(((((long)x - j) * 3) << BASE_SHIFT) / ((long)x + j));
    /* Second correction */
    j >>= 1;                            /* j = 2 ^ (hiBitPos(x) - 1) */
    C2 = (int)(((long)x * ERR_C1) / j) - ERR_C2;
    C2 = ERR_C1 - (int)(((long)C2 * C2) / ERR_C1);
    return (L0 + C1 - C2);
}
























































PROGRAMMER'S BOOKSHELF


Secrets of Windows 95 Programming




Lou Grinzo


Lou is a programmer, consultant, and technical writer who lives in Endwell,
NY. He is the author of the recently published Zen of Windows 95 Programming
(Coriolis Group, 1995) and can be contacted at 71055.1240@compuserve.com.


Matt Pietrek, author of Windows Internals and coauthor (with Andrew Schulman
and Dave Maxey) of Undocumented Windows, is back. His recently published
Windows 95 System Programming Secrets is a 760-page volume so densely packed
with details about the inner workings of Windows 95 that it's hard to believe
it was written by someone outside of Redmond. If you're familiar with
Pietrek's earlier books, you may wonder if Secrets is "Undocumented Windows
95," or if it's more akin to "Windows 95 Internals." As it turns out, it's
much closer to "Internals" than "Undocumented," but it's a more useful and
better book than either.
Pietrek gives us only ten chapters, the first clue that he covers topics in
exhaustive detail. The chapters are as follows: "Putting Windows 95 in
Perspective;" "What's New in Windows 95;" "Modules, Processes, and Threads;"
"USER and GDI Subsystems;" "Memory Management;" "VWINKERNEL32386" (Pietrek's
contraction of VWIN32.VXD, KERNEL32.DLL, and KRNL386.EXE); "Win16 Modules and
Tasks;" "The Portable Executable and COFF OBJ Formats;" "Spelunking on Your
Own;" and "Writing a Win32 API Spy."
After starting off with a good overview of Windows 95 and its place in the
Microsoft world, the book begins its heavy lifting in Chapter 3. Microsoft
Systems Journal editor Eric Maffei mentions in the foreword that "Pietrek has
a degree in physics." Normally, I don't care about such biographical material,
but in this case, it's surprisingly relevant. Read this book and you'll likely
imagine Pietrek firing up his personal bit accelerator to whack Windows 95
with high-energy ones and zeros, causing structures and functions, documented
and otherwise, to fly out in all directions. Pietrek then collects the pieces
and presents them to the reader in the form of numerous pseudocode listings
and detailed structure definitions. This approach and underlying philosophy
explain the strengths and one potential weakness of this book.
Secrets is packed with details not likely documented anywhere else, certainly
not to this extent. Some of my favorites include the following:
How to create and run Ring 0 code from within a Ring 3 application program.
How Windows 95 invokes dozens of INT 21h services from 32-bit code, including
KERNEL32.DLL.
Thread-local storage (declared via __declspec(thread), not allocated via
TlsAlloc() calls) is available to a DLL only if it is implicitly loaded by a
program.
The presence of Krn32Mutex, another mutex besides the now-infamous Win16Mutex,
which apparently controls access to certain parts of the kernel (although
Pietrek only mentions it in passing).
KERNEL32 functions that can be blocked by the Win16Mutex, even though
Microsoft says this isn't the case.
The biggest eyebrow raiser of all is that Windows 95 fudges the "free system
resource" numbers. At boot time, Windows takes a snapshot of its resources and
then biases all ensuing calculations; "80 percent free system resources" no
longer means 80 percent of all resources but 80 percent of what was free after
the system booted and already used part of the total.
You get the idea. A complete listing of the goodies would fill several pages.
There's one drawback to Pietrek's approach, however. Secrets is almost
entirely descriptive, and it sometimes leaves the reader to draw conclusions.
For example, Pietrek talks about the mechanisms for locating .DLL-resident
routines by name or by export ordinal. He says it's more efficient to use
ordinals instead of function names when establishing linkage between an .EXE
and a .DLL, since no system memory is needed to store the list of names. This
is true as far as it goes, but it overlooks one of the truly perverse
situations that can trip up a Windows program--an .EXE and .DLL that don't
agree on the mapping of ordinals to functions. For example, if a .DLL is
changed after its matching .EXE is built and the .DLL now assigns the ordinal
5 to Fred() instead of Barney(), the application can accidentally call Fred(),
even though its source refers to the function as Barney(). Given that programs
are often used with the wrong .DLLs in the real world, this can lead to
disaster. Secrets is an excellent scientific treatment but offers little
advice on application engineering. In all fairness, such coverage isn't the
book's intent, either stated or implied, but it's something readers should
keep in mind as they watch Pietrek crack Windows 95 like a walnut.
At a time when most Windows programmers are struggling just to stay current
with the latest releases of compilers, frameworks, and third-party libraries,
it's natural to ask what place an "under the covers" book has on our already
sagging shelves. While not directly justifying his book, Pietrek himself
addresses this issue on page 184, where, after detailing some of the hoops one
of his sample programs had to jump through, he says: 
The "black box" approach to programming that Microsoft wants us to take is
nice when writing "Hello World" programs, but it fails miserably when
attempting to write anything other than toy applications.
I couldn't agree more. Microsoft is largely responsible for making the
creation of robust, intelligent, and accommodating 32-bit Windows programs as
tough as it is. The Win32 documentation seems vast, almost luxurious, until
you begin serious work and you discover its shortcomings, like the fact that
extended error codes aren't documented, and numerous APIs are documented
either incorrectly or so poorly that you must burn precious time testing them
to see how they really work (or debugging your code once you, or your users,
see that your assumptions were wrong). This sorry situation creates a genuine
need for third-party books that fill in the gaps. Secrets provides so much
solid information and does it in such an enjoyable and readable fashion that,
even at $50.00, it's easily one of the best titles you can add to your Windows
shelf.
Windows 95 System Programming Secrets
Matt Pietrek
IDG Books Worldwide, 1995
759 pp., $49.95 
ISBN 1-56884-318-6



























SWAINE'S FLAMES


Allegislation and Lexicunking


Last winter, the alleged representatives of the people of the United States in
the House Conference Committee on Telecommunications Reform approved a
proposal to make the Internet the most-censored medium in the country, voting
to impose massive fines and prison terms on anyone posting such material as
James Joyce's Ulysses or using what George Carlin once called "the seven dirty
words."
This month's column, which contains neither redeeming social value nor appeal
to prurient interest, no dirty words and no politics except for that first
paragraph, is nevertheless dedicated to James Joyce and George Carlin, those
two masters of social value and prurient interest. And, more to the point,
wordplay.
I recently found myself reflecting on the relationship between the words
"speleology" and "spelunking." One is serious, the other fun. One is science,
the other recreation. One ends in "-eology," the other in "-unking." Working
out the implied analogies, I came up with some interesting facts. For example,
did you realize that the recreational use of language--the sort of thing we're
up to here--is properly known as philunking? Philology, philunking. See?
When my cousin Corbett drops by to see me here at Stately Swaine Manor, he
usually has some invention or innovation to share with me, and we engage in
some casual technunking. With Corbett, you couldn't call it technology. It's
technunking.
Up here at the Manor this winter, we're clearing some land to put in a garden
in the South 0.40, and there's a lot of digging involved. While leaning on my
shovel, I pointed out to my Significant Other that we were not just digging,
but archunking. Or possibly just gunking. I spelled that for her, so she
wouldn't think I meant junking. Archunking, gunking, whichever, I said, we're
sure redefining the topunking of the South 0.40. She responded that, while I
was unquestionably redefining things, I wasn't going to have much effect on
the topunking until I quit leaning on the shovel and started using it for its
intended purpose. Rather than get bogged down in a futile discussion of
methodunking, I went back to thinking about words.
You might try the -eology/-unking game yourself. For example, you could
observe that sparks always fly when a cosmunker and a thunker get together, as
each argues the eschatunking of his or her idunking. Just watching them go at
it is quite a little exercise in socunkical psychunking. But be careful. It's
easy for words to get you into trouble. If I were to suggest that you
occasionally engaged in the recreational use of controlled substances, I would
be accusing you of pharmacunking, and you might demand an apology, or at least
an apunking. So I don't suggest that, nor do I suggest that you might ever
employ liquid refreshments in a recreational manner, which is called--but
you're way ahead of me. That's right, hydrunking.
I could go on like this indefinitely. I imagine I could put together a little
anthunking of this sort of terminunking. I suspect, though, that readers of
this neunkical anthunking would soon tire of working out the etymunking of
each word. A little too much like cryptunking, I'm afraid. Just then, my S.O.
interrupted with a suggestion that I was reluctant to ignore, and I tabled
this line of thought.
So the moral I wish you to take away from this month's column is the
following: If your friends advise you to get a life, do not assume that they
consider you deficient biologically. No, they've probably just concluded that
you could use a little bunking.
Michael Swaine, editor-at-large
mswaine@cruzio.com 













































OF INTEREST
Hypercube has announced HyperView++ for MFC 1.0, a class library for adding
dialog/view editor capability into an application. HyperView++ comes with a
base set of default objects and controls. Custom objects and controls can be
easily created for use within the editor and run time. 
For instance, according to Hypercube, the CHyperView class, which is based on
MFC's CView, is somewhat like a "supercharged" CFormView. CHyperView provides
additional functionality for editing, creating, and using dialogs/views
without recompiling, and creating new custom objects using inheritance. The
class library, which is compatible with Win16/Win32/DBCS, sells for $2000.00
and comes with C++ source code.
Hypercube Inc.
10542 Bradbury Road
Los Angeles, CA 90064
310-559-2354
harlan@hcube.com
AccuSoft has released its AccuSoft Image Format Library 5.0 as an OCX for
Windows 95. The OCX32 version of the imaging library provides 32-bit
performance for image import, display, rotation, zooming, compression,
printing, and the like. Features specific to OCX32 Pro Gold include rotate to
screen, subdegree rotation, decompress rectangle, and the like. Both versions
support 36 different image formats, including TIFF, JPEG, PCX, Photoshop, and
more. OCX32 sells for $795.00, and OCX32 Pro Gold for $1995.00.
AccuSoft Corp.
Two Westborough Business Park
Westborough, MA 01581
508-898-2770
Bradly Associates is shipping Windows NT and Windows 95 versions of its GINO
graphics libraries. GINO is a family of graphics programming tools consisting
of a 2-D/3-D low-level toolkit, scientific graphing toolkit, contour and
surface toolkit, and programmable GUI toolkit. This release is built with the
Salford FTN77 for Win32 Fortran compiler and delivered as a DLL. The Win32GINO
libraries are provided with a driver that accesses the Windows SDK directly. 
Bradly Associates Ltd.
Manhattan House
140 High Street
Crowthorne, Berkshire RG45 7AY
England
+44-0-1344-779381
http://www.bradassoc.co.uk/
Motorola's Advanced Microcontroller Division has announced a real-time
embedded kernel called RTEK for 68300 32-bit microcontrollers based on the
CPU32 core. The RTEK kernel offers a real-time operating-system framework for
a broad range of embedded software applications. This kernel supports both
static and dynamic kernel objects.
Motorola's RTEK kernel provides over 180 services to manage system resources
such as the CPU, program tasks, memory, and time. The kernel services are
divided into eight classes: task, semaphore, queue, mailbox and message,
memory, mutex, timer, and interrupt processing services. You can also choose
from three methods for scheduling tasks: preemptive, time-sliced, and round
robin. 
The RTEK API for the 68300 family is written in both C and assembly languages;
the API for the PowerPC MPC500 family is written in C. Included with the RTEK
package is a graphical-system generation program called RTEKgen for
configuring the system. It also lets you specify System and Object Class
properties and define the properties of static objects. RTEKgen produces
source-language structures that can be compiled or assembled to give accurate,
error-free system tables. 
The PowerPC RTEK SDK for the MPC500 family runs on a Sun workstation host and
is priced at $3950.00 per seat. The 68300 (CPU32) RTEK SDK runs on a PC
(Windows 3.1) host and is priced at $3450.00 per seat. Both SDKs include RTEK
object code, source code for drivers and utilities, the RTEKgen
system-generation utility, language interface libraries, and complete
documentation. 
Motorola Advanced Microcontroller
Division
P.O. Box 13026
Austin, TX 78711
800-765-7795 x921
Atemi's recently released AtemiConnect software lets you connect Macintoshes
to remote AppleTalk LANs via the Internet, giving you access to all the data,
programs, and services on that LAN. When used with NetShade, Atemi's
network-security utility, AtemiConnect sets up a secure virtual network across
the Internet.
Once you have established an AtemiConnect link to a remote LAN, all of the
services available on that network--file servers, databases, groupware
applications, AppleTalk e-mail systems, network printers, network games, and
the like--can be accessed as if you were locally connected to that network.
The client and server must both be running the AtemiConnect software before
the link can be established, and the remote LAN's AppleShare security settings
will apply to the link.
Atemi will include a free copy of NetShade 1.1--an AppleTalk encryptor that
combines RSA's public key technology with secure ciphers such as DES and
Triple DES--with every purchase of AtemiConnect through March 1996. The
AtemiConnect Personal Server (allowing one in-bound or out-bound connection at
a time) sells for about $200.00, and includes a two-user license for NetShade;
the Small Business Server (allowing up to five simultaneous connections) sells
for about $400.00. 
Atemi Corp.
202 W. Hill, Suite 3W
Champaign, IL 61820-3539
217-352-3688
http://www.atemi.com 
ATC and Bluestone have released an integrated GUI Builder for Ada developers
called "Ada-UIM/X." The software combines Bluestone's UIM/X Builder Engine
with the ability to generate Ada code conforming to ATC's proven AXI Ada
interface to X and Motif. Ada-UIM/X is priced at $4000.00 for a
single-seat license, with volume discounts available. The tool is initially
supported on the SunOS, Solaris, and HP platforms.
Advanced Technology Center
22982 Mill Creek Dr.
Laguna Hills, CA, 92653
714-583-9119
Qualitas is shipping Qualitas MAX 8, Version 8 of its memory manager. In addition
to providing traditional DOS-memory management, MAX 8 also manages Windows
resources through its new "Go Ahead" technology, which uses extended memory
instead of low-DOS memory for DLLs. Instead of using RAM compression
techniques to manage Windows resources, Go Ahead helps Windows use global DOS
memory more efficiently by managing low DOS resources. The memory manager
supports Windows 95 and Plug and Play machines. Included in MAX 8 is DOSMAX
for Windows 95, which provides up to 736 KB of low DOS memory. Also included is a
set of Windows tools to make memory management easier: a Windows resource
monitor called "MAXmeter," a Windows file editor called "MAXedit," and a MAX 8
configuration utility called "Toolbox." The package is available for a
suggested retail price of $99.95; the upgrade price is $29.95.
Qualitas Inc.
7101 Wisconsin Avenue, Suite 1024
Bethesda, MD 20814
301-907-6700
http://www.qualitas.com
Microsoft has begun shipping Visual C++ 4.0, Cross-Development Edition for
Macintosh. The toolset enables you to work with one set of sources and tools to
create applications for the Windows NT, Windows 95, and Windows operating systems,
and the Macintosh and Power Macintosh platforms. You can also add MIPS R/4000,
DEC's Alpha AXP, and PowerPC machines (running Windows NT) to its list of
targets. New features of Version 4.0 include OLE support, Power Mac targeting,
Windows 95 common controls, and ODBC support. 
The development kit sells for $1999.00. Visual C++ 2.0, Cross-Development
Edition for Macintosh users can upgrade for $499.00. Visual C++ 4.0, RISC
Edition for PowerPC, MIPS, and Alpha sells for $499.00.
Microsoft Corp.
One Microsoft Way
Redmond, WA 98052
206-882-8080
http://www.microsoft.com/devonly 
Promptus Communications has begun shipping its ISDN SDK 3.20V1. When used with
Promptus ISDN network access modules, this toolkit lets you build integrated
ISDN access features into your applications.

The SDK 3.20V1 supports single, dual, and quad Basic Rate ISDN (BRI) network
interface cards, single and dual Primary Rate ISDN (PRI) network interface
cards, BONDING-compliant Inverse Multiplexing (IMUX) cards, and dual
data/dialer modules.
The SDK 3.20V1 includes drivers, function libraries, firmware, downloaders,
demo programs, and an API.
Promptus Communications
207 Highpoint Avenue
Portsmouth, RI 02871
401-683-6100 
http://www.promptus.com/promptus/ 
HiTecSoft has announced the release of a Network Management Extension (NMX)
standard for creating open-architecture NetWare Loadable Module (NLM)
libraries that share functionality through a set of common APIs. These APIs let you create NLMs
that share common libraries and call on other NLMs from other vendors as if
they were part of their own library. The libraries are created using Novell's
NLM SDK and C/C++ compiler.
NMX libraries can be accessed within applications developed using C/C++ and
ManageWare or any other NetWare-based languages (including scripting
languages). They can automatically load and unload into and out of memory,
providing functionality similar to DLLs. They also support dynamic parameter
passing, allowing the development of high-level routines that can perform
multiple tasks based on the type and number of passed parameters (parameter
overloading, similar to C++). Finally, they provide run-time parameter type
checking, allowing applications to guard against abnormal termination due to
type-mismatch errors.
HiTecSoft Corp.
3370 North Hayden Road
Suites 123-175
Scottsdale, AZ, 85251
602-970-1025 
Hewlett-Packard has introduced Version 5.0 of its SoftBench development
environment for C/C++ and Cobol programmers. SoftBench 5.0 includes a new
Graphical Class Editor that lets you create and manipulate C++ code, then
automatically generate that code. Extending the browsing and navigation
capabilities of the C++ SoftBench Static Analyzer's Class Graph, the Graphical
Class Editor enables the graphical creation, deletion, and modification of
classes, relationships, member functions, and data members. Changes are
reflected throughout the source code, not just in the module being edited. 
The tool also incorporates a CodeAdvisor, which provides rule-based C++ code
checking that identifies and locates potentially serious problems in C++ code.
CodeAdvisor comes with a shared library of common C++ rules and can be
extended to include additional user-defined rule sets.
Finally, SoftBench CM gives geographically dispersed development teams the
ability to work together on the same software project. Available for all
programming languages, SoftBench CM snaps into existing SoftBench environments
and is accessible from SoftBench tools, including the Editor, Builder, or
Static Analyzer. SoftBench CM is built on the Revision Control System (RCS)
standard.
At the same time, HP introduced a SoftBench SDK that lets you extend and
customize SoftBench environments. The SoftBench SDK consists of the existing
SoftBench Encapsulator, the SoftBench Static Read API, and instructions for
creating user-defined rules for C++ SoftBench CodeAdvisor. 
The SoftBench suite is available for HP-UX and Solaris platforms starting at
$1895.00.
Hewlett-Packard 
321 East Evelyn Avenue
Mountain View, CA 94041
800-742-6795
Lead Technologies has introduced Leadtools OCX, an imaging SDK that enables
the integration of black/white (1-bit), grayscale (8-bit) and color
(4-,8-,16-,24-and 32-bit) images into applications. Designed as an OLE Custom
Control, Lead's OCX is available in a 16-bit version for Windows 3.1, and a
32-bit version for Windows 95, Windows NT, and Win32s for Windows 3.1.
Leadtools OCX contains over 130 methods and properties, and is compatible with
any application that can make use of an OCX. Specific examples are provided
for Visual C/C++, Visual Basic 4.0, Visual FoxPro, and Access.
Leadtools OCX provides methods and properties which read, write, compress
(JPEG, CMP, CCITT G3/G4, and more), convert, and display images. Developers
can scroll, zoom, dither, and process images (such as resize, rotate, flip,
invert, reverse, transpose, crop, sharpen, blur, edge and line detection,
mosaic, hue and saturation, combine, histogram equalize, gamma correction and
intensity detection, shear, grayscale, halftone, auto-deskew, despeckle, and
more). Leadtools OCX can import and export over 40 standard-image file formats
including JPG, PNG, BMP, TIFF, WMF, EPS, PCX, TGA, RAW FAX, WINFAX, CALS,
IOCA, PICT, PCD, PSD, GEM, RAS, WPG, and more. This toolkit also supports
printing (multi-image and text), TWAIN-compliant single and multi-page
scanning devices, video capture, and the Windows clipboard. 
The toolkit starts at $295.00, and is available free of royalties or license
fees. 
Lead Technologies Inc. 
900 Baxter Street
Charlotte, NC 28204
800-637-4699 
http://www.leadtools.com
Executive Tools has released ET Desktop, a port of the Common Desktop
Environment (CDE) to Linux. CDE is a GUI and application toolbox that is, or
soon will be, provided by major UNIX system vendors, including DEC, IBM, HP,
and Sun. It is an application framework giving you the ability to create
portable, machine-independent applications. 
Executive Tools, Inc.
PO Box 215
Round Rock, Texas, 78680-0215
800-864-5150
http://www.etools.com 


























EDITORIAL


Coming In Out of the Code


If the Blizzard of '96 reminded us of anything, it's that when hype butts
heads with reality, reality usually wins. At least that was the case this
winter when wind and snow caused thousands of people to log on instead of dig out.
From Maine to Missouri, digital gridlock resulted as workers telecommuted
until storms subsided. In some parts of Massachusetts, Nynex reported a 50
percent increase in phone-line demands. ("It's like everybody who has an
automobile trying to drive on the Massachusetts Turnpike at the same time,"
said Nynex spokesperson John Johnson from his home office.) In parts of
Missouri, Southwestern Bell phone traffic was 168 percent above normal.
Goodness knows, I'm as big a fan of telecommuting and home-based businesses as
anyone else on my party line. But before we cash in our subway tokens for
modems, it looks like the telecommunications infrastructure has a way to go.
And with the number of home-based businesses expected to double over the
decade (it's estimated that there are currently more than 45 million
home-based workers), phone companies will be hard-pressed to keep up with
demand.
***
After a two-year judicial bobsled ride, Phil Zimmermann couldn't be happier
about coming in out of the cold. Zimmermann, the author of Pretty Good Privacy
encryption, was facing up to 51 months in a federal prison if convicted of
violating munitions export laws because his software was posted on the
Internet. 
Without explanation, federal prosecutors suddenly dropped charges against
Zimmermann. Still, said U.S. Attorney William Keane, "people should not read
anything into this decision," leaving the courtroom door ajar for similar
investigations in the future. In the end, tax dollars were wasted, Zimmermann
was besmirched, detained, and bankrupted, and Keane and crew go merrily about
their business.
***
Tough talk is cheap, except when high-priced corporate attorneys are doing the
yapping. That's what makes me think the European Computer Manufacturers
Association (ECMA) and Willows Software, a startup backed by former Novell
kingpin Ray Noorda, will end up before the bench with Microsoft. 
At the heart of the brouhaha is the ECMA-blessed APIW specification, a
platform-independent, vendor-neutral interface to the Windows API. Willows
Twin Cross-Platform Developers Kit (XPDK), available in both source and binary
forms, is APIW based. APIW essentially allows you to run Windows binaries on
just about any flavor of UNIX, as well as Macintosh, NetWare, and OS/2
platforms without having to pay Microsoft. 
Not only is Willows charging as little as $79.00 for the source and binary on
CD-ROM, but you can also download it at no cost for individual, noncommercial
uses from http://www.willows.com. Fees for commercial licenses, starting at
$250.00, seem modest and reasonable.
Microsoft is not amused. It lobbied hard against the ECMA vote and one of its
lawyers recently said that "Microsoft objects to the publication of the ECMA
APIW standard and expressly reserves its intellectual property rights." In the
meantime, the ECMA will be submitting the initiative to the ISO.
***
From the "If the snowshoe fits..." department: Looking for a warm refuge on a
cold day, I recently ducked into the Spencer Museum of Art. As luck would have
it, the exhibit "Roger Shimomura: Delayed Reactions," had just opened and
included paintings featuring sushi and Superman, geisha girls and Marilyn
Monroe, and American Pop art and traditional Japanese wood-block prints. 
Of relevance here was the painting entitled "Sansei Story, 1989-90" which, to
my surprise, was on loan from Microsoft. As we've come to expect from
Microsoft, the painting wasn't the best-of-show, although it was the biggest.
Still, Microsoft should be commended for its support of the arts and for
making that art available to the public.
Jonathan Erickson
editor-in-chief



LETTERS


Executable Content and Prior Art


Dear DDJ,
In reading Greg Guerin's letter concerning patents and executable content (DDJ
"Letters", January 1996), I was surprised to see no mention of an old system
that indeed provided the capability of including executable code as part of a
"page" on a BBS, which would be sent to the client and executed to provide
animation during the online session.
Grafterm for the 8-bit Atari was never upgraded beyond 300 bps (there were
problems with updating the graphics screens at high speeds, though text
screens in other telecommunications programs could run at up to at least 4800
bps), and it did provide, in the later version I saw, the capability of
sending executable code with samples of animation (as I recall). The only
copy I saw was in the mid-to-late '80s (long after all Grafterm BBSs had gone the way of
the dodo), which did have sample pages of code and which seemed quite
impressive. As the system was machine specific, there was no need to worry
about drivers--the code was in machine language, making direct use of the
client's hardware (the CPU, sound system, graphics, and so on), all of which
was standardized.
While the idea of sending executable code in telecommunications (interactively
run on the client while connected) may be new to some, it is old hat to
others. Unlike the current generation of executable content, the pages
containing the code were proprietary to the BBS system. Nor was an interpreter
used to run the code.
Unfortunately, all I saw was the aforementioned letter about prior art (which
did not target contemporaneous execution of code sent in real time from a
telecommunications server to improve the interface). Still, I wonder if there
were similar systems available on machines such as the venerable Apple II and
Commodore 64.
John McGowan 
Brooklyn, New York 
jmcgowan@inch.com


ANSI C++


Dear DDJ,
In his January 1996 "C Programming" column, Al Stevens talked about his recent
experiences with the ANSI C++ Standards Committee. I've been programming in C
for nearly a decade, and recently started using C++. I've spent some time
experimenting with the language and looking through C++ programming books to
see what works and what doesn't (using Watcom C++ 10.0 running under MS-DOS).
Here are some of my thoughts on the subject.
The for statement should be left the way it is. New notation should be created
to implement the new, desired behavior, possibly using square brackets
immediately after the for keyword to declare variables. So a loop that
declared its own counter variable would look like Example 1(a). This would
have several advantages over the proposed change.
It would not break any existing code.
It would allow several different types of variables to be declared. 
It could also be applied to do/while loops. This would make Example 1(b)
equivalent to the for loop in Example 1(a) (with the exception of a "continue"
in the body of the while loop). It would have to be determined if the
initializers in the brackets executed before or after the for..int statement. 
One of the things I tried to do would have benefited from being able to
declare template-member functions. I had to use global-template functions that
used a public member to gain access to the class (because you can't use
templates in friend declarations). There would have to be some restrictions on
template member functions (for instance, they could not be virtual).
I wrote an array class that would allow the arrays to increase their size as
new elements were added. This would occasionally require me to allocate more
storage for the array.
I've always viewed enums as just convenient ways to name a related series of
constants (which would treat them all as ints). If the stronger typing is
desired (as Al implied was the case with the Microsoft implementation), then
this should be made explicit by either a new keyword (there seem to be plenty
of those) or perhaps by using class enum or enum class notation.
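The growable array described earlier in this letter can be sketched along the
following lines. This is only a minimal illustration of the usual doubling
strategy, not the letter writer's actual class; every name here is invented.

```cpp
#include <cassert>
#include <cstring>

// Minimal sketch of an array class that grows as elements are added.
// When the array runs out of room, a block twice as large is allocated
// and the old elements are copied over; doubling keeps the amortized
// cost of each append constant.
class IntArray {
    int *data;
    int count, capacity;
public:
    IntArray() : data(new int[4]), count(0), capacity(4) {}
    ~IntArray() { delete[] data; }
    void append(int value) {
        if (count == capacity) {            // out of room: grow
            capacity *= 2;                  // doubling strategy
            int *bigger = new int[capacity];
            memcpy(bigger, data, count * sizeof(int));
            delete[] data;
            data = bigger;
        }
        data[count++] = value;
    }
    int size() const { return count; }
    int operator[](int i) const { return data[i]; }
};
```

The occasional reallocation the writer mentions happens inside append, and is
invisible to the caller.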
One other moderately useful change would be the ability to break out of an
outer enclosing loop from an inner one. This could be done by specifying a
number after a break (or continue), as in break 2;. This would remove some of
the need for gotos or flag variables; see Example 1(c).
Glenn Dill
Latrobe, Pennsylvania 
gdill@westol.com


About Face


Dear DDJ,
In his review of Alan Cooper's book, About Face: The Essentials of User
Interface Design (DDJ, January 1996), Michael Swaine perpetuates glaring
inaccuracies about the Magic Cap user interface. While Cooper indeed "rips
apart the MagiCap [sic] interface" (to quote Swaine), much of his criticism is
factually incorrect. It's shocking that such a well-known and acclaimed
individual is so cavalier with the truth. For example:
Cooper: "I'll bet [Magic Cap] is a real pain to use. Once you have learned
that the substantial-looking building with the big 'AT&T' on its facade is the
phone company, you must forever live with going in and out of that building to
call people." 
In fact, the AT&T building is not where one makes phone calls in Magic Cap.
The Magic Cap interface lets users place a call by simply touching the phone
on the desk, or they can call a person by touching that person's name in the
Address Card file and then touching the phone number to automatically dial
(for example, home, work, cellular, pager). 
The tight integration of all parts of the Magic Cap UI enables users to
contact someone, anywhere within Magic Cap. Users can tap the Contact button
to call, fax, or e-mail someone from every scene.
What's more, if a user wishes to frequently visit a particular place such as
the AT&T building, Magic Cap provides obvious and simple shortcuts to jump
directly there. Over 1000 user tests indicate that the Magic Cap interface is
not "a pain to use," and, on the contrary, many people are convinced it is
superior to other user interfaces.
Cooper: "You enter a building to begin a service, which is represented by a
walk down a hallway that is lined with doors representing functions. This
heavy reliance on the metaphor means that you can intuit the basic functioning
of the software, but its downside is that the metaphor restricts all
navigation to a very rudimentary, linear path. You must go out on the street
to go to another service."
Magic Cap enables users to intuit the basic functioning of the software, and
as users become more familiar with the interface, they can take advantage of
the shortcuts just mentioned. 
Ease of learning and ease of use are not exclusive as Cooper implies; Magic
Cap provides an easy-to-learn, real-world framework, but this same framework
provides the navigational accelerators power users appreciate, much as
keyboard shortcuts are used by some computer users once they have mastered the
easy-to-learn menu interface.
Cooper: "Wouldn't it be better to go beyond those confining technologies
[address books that look just like a paper version, phones that look like
phones] and deliver some of the real power of the computer? Why can't our
communications software allow multiple connections or make connections by
organization or affiliation, or just hide the use of phone numbers
altogether?"
Magic Cap does exactly what Cooper desires. Such a design does not require
giving up what people already understand. Magic Cap provides a unified mailbox
that collects mail from multiple mail services automatically. Magic Cap
provides dialing by name and location. 
Magic Cap tightly integrates the Address Cards, Phone, Fax, Datebook, and
Electronic Mail connectivity through multiple services such as the Internet
and AOL. Address Cards can be grouped by organization or affiliation. Make an
appointment and Magic Cap can automatically send an invitation to that person,
complete with the correct contact information. Magic Cap is not only a tightly
integrated, easy-to-use communications product but also a powerful platform
for developers and third parties to add their own applications and services.
More examples are available, and I encourage DDJ readers to verify my
statements by reading (or rereading) the book review and then trying Sony's
Magic Link PIC-2000 or Motorola's Envoy communicator, or requesting a version
of Magic Cap for Windows from General Magic (you can download the software
from our web site at http://www.genmagic.com). Suffice it to say that Cooper's
irresponsible and inaccurate comments about the Magic Cap user interface
should lead any reasonable reader to seriously question his credibility as an
author and wonder whether he even bothered to use Magic Cap before writing
about it.
In his review of Cooper's book, Swaine writes, "I think that everyone would be
better off if they would just commit the entirety of About Face to memory." As
it relates to Magic Cap, this would leave DDJ readers with an entirely
inaccurate impression of the Magic Cap interface--an interface which has been
refined based on thousands of hours of user testing and consistently described
by responsible reviewers as easy to learn, use, and customize.
Steve Schramm
VP & General Manager, 
Magic Cap Division
General Magic Inc.
Example 1: ANSI C++.
(a)
for [int i = 0;] (; i < 10; i++) {
    // ...
}

(b)
while [int i = 0;] (i < 10) { /* ... */ i++; }

(c)
extern int a[], b[];
int i, j;
for (i = 0; i < 10; i++) {
    for (j = 0; j < 10; j++) {
        if (a[i] == b[j]) break 2;
    }
}
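For comparison, the effect of the proposed break 2; in Example 1(c) can be
had in today's C++ with either a goto or a flag variable, the two workarounds
the letter alludes to. A minimal sketch (the function names are mine):

```cpp
// Two standard-C++ ways to leave both loops as soon as a[i] == b[j].

// Workaround 1: a forward goto out of the nested loops.
static bool have_match_goto(const int a[], int na, const int b[], int nb) {
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            if (a[i] == b[j])
                goto found;            // plays the role of "break 2;"
    return false;
found:
    return true;
}

// Workaround 2: a flag variable tested in the outer loop condition.
static bool have_match_flag(const int a[], int na, const int b[], int nb) {
    bool found = false;
    for (int i = 0; i < na && !found; i++)
        for (int j = 0; j < nb; j++)
            if (a[i] == b[j]) { found = true; break; }
    return found;
}
```

Both are more verbose than the proposed break 2;, which is the letter's point.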



An Interview with Donald Knuth


DDJ chats with one of the world's leading computer scientists




Jack Woehr


Jack reaches over his shoulder for his "knuth" from his office in Golden,
Colorado, where he can be reached at jax@well.com.


For over 25 years, Donald E. Knuth has generally been considered one of the
world's leading computer scientists. Although he's authored more than 150
publications, it is Knuth's three-volume The Art of Computer Programming which
has become a staple on every programmer's bookshelf. In 1974, Knuth was the
recipient of computer science's most prestigious prize, the Turing Award. He
also received the National Medal of Science in 1979.
In addition to his work developing fundamental algorithms for computer
programming, Knuth was a pioneer in computer typesetting with his TeX,
MetaFont, and WEB applications. He has written on topics as singular as
ancient Babylonian algorithms and has penned a novel. Knuth currently is
Professor Emeritus at Stanford University.
Born in Milwaukee, Knuth exhibited an early aptitude for patterns, as
evidenced by his creation of crossword puzzles for the school newspaper. This
ability culminated in the eighth grade when Knuth won a national contest
sponsored by a candy manufacturer. According to Dennis Shasha and Cathy
Lazere, authors of Out of Their Minds: The Lives and Discoveries of 15 Great
Computer Scientists, the challenge was to compose as many words as possible
using the letters in the phrase "Ziegler's Giant Bar." The judges had about
2500 words on their master list; Knuth came up with approximately 4500 words
without using apostrophes. 
Still, it was music that Knuth chose to study at Case Institute (later Case
Western Reserve) until he was offered a physics scholarship. This led him to
his first encounter with a computer in 1956--an IBM 650. 
In this interview, Knuth chats with frequent DDJ contributor Jack Woehr about
these and other topics. 
DDJ: What distinguishes a "computer scientist" from a "computer programmer?"
DK: The difference between a computer programmer and a computer scientist is a
job-title thing. Edsger Dijkstra proudly wants to be called a "computer
programmer," although he hasn't touched a computer now for some years. He
wrote a really terrific essay, "The Humble Programmer," discussing this. To
me, "computer programmer" is an honorable term, but to some people a computer
programmer is somebody who just follows instructions without understanding
what he's doing, one who just knows how to get through the idiosyncrasies of
some language.
To me, a computer scientist is somebody who has a way of thinking, which
resonates with computer programming. The way a computer scientist views
knowledge in general is different from the way a mathematician views
knowledge, which is different from the way a physicist views knowledge, which
is different from the way a chemist, lawyer, or poet views knowledge.
There's about one person in every fifty who has this peculiar way of viewing
knowledge. These people discovered each other at the time computers were born.
There's a profile of different intellectual capabilities which makes somebody
resonate, which makes somebody really in tune with computer programming.
There were computers in the 19th century, the 17th century... I imagine there
are computer scientists in the pygmy forest. I haven't really carried this out
as an experiment, but I imagine that people may not have machines but one in
fifty of them, wherever you go, has this profile, this ability. I'm not a
sociologist, nor an anthropologist, but reading publications, reading
literature, I can sense how much people think like I do, even if they were
writing from a different century.
This is the true explanation of why computer science became a university
department so fast all around the world. The reason is not that computers are
important tools for mankind, or something like that. The reason is that there
were these people out there who had this way of thinking that never had a home
before. They get together, they can communicate at high bandwidth with each
other,
the same kind of analogies are meaningful to them. All of a sudden they could
come together and work much more efficiently, not in someone else's territory
that wasn't for them.
There was a time when there were no physics departments, there was "Natural
Philosophy," which combined all kinds of different skills. Over the years,
these strong areas of focus materialize, become recognized, and then they get
a name. "Computer Science" happens to be one of the more recent ones to
crystallize in this way. 
DDJ: When can we buy volume four of The Art of Computer Programming?
DK: I'm going to be putting it out 128 pages at a time, about twice a year
over the next eight years. I'm estimating it now at a little more than 2000
pages. There will be volume 4-a, volume 4-b, and volume 4-c. Volume 4 in
general is combinatorial algorithms. Volume 4-a is about finding all
possibilities: There's a lot to be said about generating them in good
ways--problems where finding all reasonable solutions is not a trivial task.
4-b is going to be about graph and network algorithms, and 4-c is about
combinatorial optimization. So 4-a is "find all arrangements," 4-b is "find
arrangements that have to do with graphs and networks," and 4-c is "find the
best arrangement."
Into those 2000 pages, I have to compress about 200,000 pages of literature.
I've been working on it a long time.
DDJ: Is the hiatus between volumes 3 and 4 writer's block, that you say, "If I
study this more..."
DK: No, that's a pretty good hypothesis. But I had started volume 4 and then
realized I had to work on typography. There was a revolution in the printing
industry. The printing industry became computer science. It changed from
metallurgy to bits, to 0s and 1s. There was no way to print my books with the
quality they had before.
I was going to take a year and give a computer scientist's answer to how to
print books, and that took ten years. The systems are in common use today. So
I think I'm going to be able to recoup that. It wasn't that I didn't have
anything to say in volume 4, but that I had this other thing to do that was
very timely. My students and I worked very hard on the desktop-publishing
revolution.
DDJ: One goes to accomplish a task on the computer and realizes that to finish
it requires another task, and to finish that one requires another....
DK: Niklaus Wirth always wanted to design airplanes, but he needed a better
tool, so he designed many computer languages and microcomputers and so on. I
needed to write my books in some way that would be immune to changes in
technology, to capture the book in some electronic form that wasn't going to
change when the printing establishment changed.
DDJ: I've read the laments of the memorizers of Egypt that were recorded by
the scribes when writing was introduced in the ancient kingdom. You're one
who is not
going to curse the darkness, you're getting ready for the time when books
aren't printed anymore.
DK: I have books that are going to exist no matter how the technology changes.
DDJ: Has your digression into TeX and MetaFont been profitable, as well as
rewarding?
DK: I put it in the public domain, but I do get royalties on the books. I give
one-third of those to the user group. The important thing, once you have
enough to eat and a nice house, is what you can do for others, what you can
contribute to the enterprise as a whole.
DDJ: I wonder if that same philosophy informs your scientific discipline. To
the vast majority of the computer literate, you made a large contribution in
stating once and for all how one, for instance, divides two binary numbers
efficiently. A programmer wonders how to do something and reaches over his or
her shoulder for "knuth," it's like reaching for a "thermos."
DK: I tried to write things up in a way that was jargon-free, so the
nonspecialist would understand it. What I succeeded in doing is making it so
that the specialist can understand it, but if I hadn't tried to write
jargon-free, then I would have written for specialists, and then the
specialists wouldn't be able to get it either. So I've been reasonably
successful in boiling down a large volume of material, but still my books
aren't easy reading except for specialists.
I'm about to publish a book, Selected Papers on Computer Science, which is a
collection of papers I've written over the years for nonspecialists. It's
being published jointly by the Center for the Study of Language and
Information at Stanford (CSLI) and Cambridge University Press. It has 15
chapters, each of which
was an expository lecture. I enjoyed reading them again, though I've edited
them to take out sexist language! I think this book is something that might be
of interest to the scientific community as a whole.
I plan eight books collecting all the papers I've written: There's going to be
Selected Papers on Analysis of Algorithms, Selected Papers on Digital
Typography, Selected Papers on Fun and Games, Selected Papers on Programming
Languages, and so on. This is the second volume; the first volume was Literate
Programming.
DDJ: What has the reward you offer for finding bugs in TeX reached? I had
heard it was up to $40.96.
DK: Oh, that! The reward doubled every year until it reached $655.36, and I
froze it at 2^16 pennies. It wouldn't have taken another ten or fifteen years
before it exceeded the GNP, you know! I paid out a couple of those last
February.
DDJ: That was the problem posed by the inventor of the chessboard in ancient
India, who asked for one grain of wheat on the first square, two on the
second, four on the third...
DK: It was al-Biruni in 10th-century Baghdad who explained how to calculate
2^64 efficiently by repeated squaring.
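The repeated-squaring technique Knuth credits here computes x^n with about
log2(n) squarings instead of n-1 multiplications, by walking the bits of the
exponent. A minimal sketch in C++ (the function name is mine; note that 2^64
itself overflows a 64-bit word, which is rather the point of the wheat story):

```cpp
#include <cstdint>

// Exponentiation by repeated squaring: examine the exponent bit by
// bit, squaring the base at each step and folding it into the result
// whenever the current bit is set.
static uint64_t pow_by_squaring(uint64_t base, unsigned exp) {
    uint64_t result = 1;
    while (exp > 0) {
        if (exp & 1)       // low bit set: multiply in the current power
            result *= base;
        base *= base;      // square for the next bit position
        exp >>= 1;
    }
    return result;
}
```

So the chessboard total can be reckoned with a handful of squarings rather
than 63 doublings, which is how al-Biruni knew the answer without doubling 64
times.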
DDJ: This is one of the computer scientists of other eras about whom you spoke
earlier.
DK: He was definitely a computer scientist. He knew how many grains of wheat
there were without doubling 64 times. al-Khwarizmi, who lived about 150 years
before that, gave us the word "algorithm" from his name. There were great
books about
chess already in the 9th century. The successor to Haroun al-Rashid of the
Thousand and One Nights, Caliph al-Ma'mun, established a great center of
learning and a 25-year flowering of art and science -- al-Khwarizmi published
there about algebra and arithmetic and geography. One of his books is about
the Jewish calendar. I discuss this in Selected Papers in Computer Science.
DDJ: I understand you are not entirely a partisan of the C++ language.
DK: C++ has a lot of good features, but it has a lot of dirty corners. If you
don't mind those, and you stick to the stuff that can be counted on to be
portable, it's just fine. There are many ambiguous constructions, with no way
to parse them and decide what they mean, that you can't trust the compiler to
handle. For example, you use the "less-than" and "greater-than" signs not only
to mean less-than and greater-than but also in templates. There are lots of
little things like this, and many things in the implementation, that you can't
be sure the compiler will do anything reasonable with.
Languages keep evolving, and that's necessary. I find it impossible to write
books for archival without resorting to the English language, though. Whatever
computer language is in fashion, you can guarantee that within a decade or two
it will be completely out of fashion. In my books, I try to write things that
aren't trendy, but are things that are going to be worth remembering for other
generations. I'm trying to distill what, in my best judgment, out of thousands
and thousands of things that are coming out now, is most deserving to be
remembered.
DDJ: You've mentioned Edsger Dijkstra. What do you think of his work?
DK: His great strength is that he is uncompromising. It would make him
physically ill to think of programming in C++.
DDJ: There's a great difference between his scrupulous examination of each and
every algorithm, and the practice of commercial programming, where megabytes
of code with known and unknown bugs are thrust at the user.
DK: I know, I know...I'm somewhere in the middle. I try to clean up all bugs
and I try to write in such a way that, aware of the formal methods, I can be
much more sure than I would be without the formal methods. But I don't see any
hope of proving everything in these large systems, because the proof would be
even more likely to have bugs!
DDJ: Programs nowadays depend on huge bodies of code that aren't written by
the main author.
DK: And you look at them and see how each spends most of its time trying to
defeat the other. It's all these black boxes you can't open, so you have to
build your own firewall. 

This is nothing new. In the '60s, someone invented the concept of a "jump
trace." This was a way of altering the machine language of a program so that
it would change the next branch or jump instruction to retain control. That
way, you could execute the program at fairly high speed, instead of
interpreting each instruction one at a time, and record in a file just where
the program diverged from sequentiality. By processing this file, you could
figure out where the program was spending most of its time.
So the first day we had this software running, we applied it to our Fortran
compiler supplied by, I suppose it was in those days, Control Data
Corporation. We found out it was spending 87 percent of its time reading
comments! The reason was that it was translating from one code system into
another into another.
DDJ: GUIs haven't made this any better.
DK: We now have so much more processing power that the only people who see
what's happening are people writing game programs, who need real high-speed
animation. I got new software at Christmas, and I'm really discouraged by the
number of failures that I noticed. I'm giving up the idea of using this
software to get much done. I'm going to go back and write my books with good
old reliable Emacs and TeX.
I still haven't found an editor on the Macintosh where I can create a 1-byte
file that has the letter "a" in it that I can send to the PC and read on an
Emacs-like editor. I got optical-character-recognition software that has a
choice of 50 output formats. Each one of them included all kinds of
font-changing mechanisms, and so on. Finally, I found one called "database
text," and when I output it in database text, I got what I expected--raw
ASCII characters.
It's a natural way to get job security, to make computer systems that need
one's expertise.
My TeX system is trying to go the other way, so I wouldn't have to go through
the professional typesetters, the professional font designers. I could still
use these professionals, but I could use them to provide added value. I didn't
have to go through a level of writing something for them to do and then check
on it. I can write something, and they can tell me how to improve, but I don't
need to write something that they already have on the menu.
DDJ: Is the profile of a programmer (which we were discussing earlier) one of
an individual who needs this sort of control?
DK: That's an interesting concept, the need for power! I've always thought of
it more in other terms, that the psychological profiling is mostly the ability
to shift levels of abstraction, from low level to high level. To see something
in the small and to see something in the large.
When you're writing a program, you're saying, "Add one to the counter," but
you know why you're adding one to the counter. You can step back and see a
picture of the way a process is moving. Computer scientists see things
simultaneously at the low level and the high level.
The other main characteristic of the computer scientist is dealing with
discrete, nonuniform things. We have ten cases instead of one. We don't have
one law of thermodynamics, or something. We have case one,
case two, case three -- each is different. We build models out of nonuniform
parts. We're so good at that, we don't see sometimes that a uniform part would
work, if it would. But people who are only good at the uniform things could
never build a model out of nonuniform parts, could never do the things that a
computer scientist does, because they have to find a single rule that's going
to cover everything.
We have this aesthetic of cost, how much work it takes to do things. If your
mental model is APL, you optimize in different ways than if your mental model
is a RISC machine.
DDJ: When you look back at the first three volumes of The Art of Computer
Programming is there anything you would change?
DK: I have about 900K of changes to the first three volumes, plus changes to
other books that I've written, that I plan to make available on my Web page.
There are four kinds of changes, and the different kinds are distinguished
typographically.
One kind is a "bug" and that means that I have to correct something that is
technically wrong. One kind is an "amendment," which means that there is some
goodie that deserves to be in there. One kind is an "improvement," something
which would go in a future edition of the book, but is probably not worth
people's writing it in their own copy. The fourth kind is called a "plan":
something still under construction, where the picture is changing so fast that
I don't think it's cost-efficient for me to write it up, since I'll just have
to do it again (the kettle is still boiling), but I wish to state that I
intend to retool something in a certain way.
It will probably be a while before publishing changes enough that entire books
can be available online. I don't know how to convert the present system to a
better one that will still have the proper incentive structure. There's
something all fouled up about the way that software is compensated and font
designers are compensated and musicians are compensated, and other
intellectual-property rights issues. A scientist can be supported by the
National Science Foundation but a font designer is not supported by a National
Font Foundation. Both of them are doing things that contribute to the public
good.
DDJ: Is this just an expression of a love for symmetry, or is there a social
injustice being performed here?
DK: I think that the fact that somebody's expertise is in the field of beauty
and graceful strokes and another's is in the field of integrals and
differential equations shouldn't mean they have completely different ways of
getting paid.
The Free Software Foundation people are putting out good stuff. It's hard for
the untrained person to wade through the jargon to install it. Richard
Stallman and I don't agree all the way down the line, but he can be an
effective arguer! Stallman is one of my heroes, of course. He probably likes
some of the things I do, too!
It offends me when people get patents on trivial stuff that we would expect
any student to do. I come from a culture where the compensation came because
one's work was recognized as advancing civilization. Of course, in literature
there were royalties, not grants. But mostly it was that people had done good
work, so you figured they deserved a continuing job. If I were to consider a
strategy of becoming rich, it would be so I could support people who need
support, people I consider to be doing things for the future, the way dukes
and duchesses used to support Mozart. 
DDJ: If you could climb in the pulpit and scold, exhort, and encourage every
working programmer in the United States, what would you tell them?
DK: The first thing I would say is that when you write a program, think of it
primarily as a work of literature. You're trying to write something that human
beings are going to read. Don't think of it primarily as something a computer
is going to follow. The more effective you are at making your program
readable, the more effective it's going to be: You'll understand it today,
you'll understand it next week, and your successors who are going to maintain
and modify it will understand it.
Secondly, ideas that are mathematical in nature should be the property of the
world and not of the individual who thinks of the theorem. I'd prefer that all
but the most sophisticated algorithms be made public and that everybody use
them, and not that every time you use such-and-such a method you should pay a
nano-penny to some fund.
I wrote an open letter to the head of the U.S. Patent Commission, published in
the current printing of the CWEB manual. I said, "What if lawyers were to have
rights to their precedents? What if people had patents on words of the English
language, and every author who wanted to write a novel would have to check
which words they were using and pay royalties to the owners of those words?
Can't you see how obvious it is that the quality of the legal system and the
quality of published books would go down? Because you're taking away the
building blocks that people need to do their job."
The basic building blocks that software designers need to do their jobs are
algorithms and languages and mathematics. It's traditionally impossible to
patent a mathematical formula, for very good reason. No one who wishes to
calculate the area of a circle using πr² should have to pay a royalty for
that: it's exact, it's a universal thing. I think that algorithms should be in
exactly the same category. Algorithms are mathematics.
Algorithms are the building blocks to create large, useful systems. The
service that you're providing for people is making those systems more
accessible, packaging them better, giving better help on the phone, but not
just having a method that other people could put into another system.
I would encourage programmers to make their work known the way mathematicians
and scientists have done for centuries. It's a comfortable, well-understood
system, and you get a lot of satisfaction knowing people like what you did.
The whole thing that makes a mathematician's life worthwhile is that he gets
the grudging admiration of three or four colleagues.


Acknowledgments


The author wishes to acknowledge the help of Steven R. Wheeler of Vesta
Technology in Wheat Ridge, Colorado, in preparing for this conversation with
Dr. Knuth.
Figure 1: Donald E. Knuth



Hashing Rehashed


Is RAM speed making your hashing less efficient?




Andrew Binstock


Andrew is editor-in-chief of UNIX Review and coauthor of Practical Algorithms
for Programmers (Addison-Wesley, 1995). He can be reached at
abinstock@mfi.com.


Hashing algorithms occupy a unique place in the hearts of programmers.
Discovered early on in computer science, they are among the most approachable
algorithms and certainly the most enjoyable to tinker with. Moreover, the
endless search for the Holy Grail, a perfect hash function for a given data
set, still consumes considerable ink in each year's batch of computer-science
journals. For developers who program for a living, however, recondite
refinements to unusual hashing functions are of little use. In general, when
you need a hash table, you want a quick, convenient hashing function. And
unless you know where to get one, you will almost certainly fall prey to the
fallacy that you can write and optimize one quickly for your particular
application.
In this article, I will discuss a good, general-purpose hashing algorithm and
then analyze some of the standard wisdom on how hashing can be made more
efficient. Finally, I'll examine the surprising effect of hardware advances on
hashing--in particular, the widening gap between CPU performance and
memory-access latency.


Basics


Hash tables are just like most tables (the old-fashioned word for "arrays"). A
table becomes a hash table when nonsequential access to a given element is
achieved through the use of a hash function. A hash function is generally
handed a data element, which it converts to an integer that is used as an
index into the table. For example, if you had a hash table into which you
wanted to store dates, you might take the Julian date (a positive integer
assigned to every date since January 1, 4713 BC) and divide it by the size of
the hash table. The remainder, termed the "hash key," would be your index into
the hash table.
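As a minimal sketch of that date example (the table size and identifiers here are illustrative, not from the article):

```c
#include <stddef.h>

/* Illustrative sketch: derive a hash key from a Julian date by
 * dividing by the table size and keeping the remainder. */
#define TABLE_SIZE 101          /* a prime, per the tuning advice later on */

size_t date_hash(unsigned long julian_date)
{
    return julian_date % TABLE_SIZE;    /* the remainder is the hash key */
}
```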
In a closed hash table, the hash key is the index of the element into which
you store the date and the data associated with that date. If the slot in the
table is occupied by another date (this is termed a "collision"), you can
either generate another hash key (called "nonlinear rehashing") or step
through the table until you find an available slot (linear rehashing).
In an open hash table (by far the most common), each element in the table is
the head of a linked list of data items. In this model, a collision is handled
by adding the colliding element to the linked list that begins at the slot in
the table that both keys reference. This approach has the advantage that the
table does not implicitly limit the number of elements it can hold, whereas
the closed table cannot accept more data items than it has elements.
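A bare-bones open-table insert might look like the following sketch (the identifiers are my own, not the article's code); a collision simply extends the chain at the shared slot:

```c
#include <stdlib.h>
#include <string.h>

#define NSLOTS 31               /* illustrative table size (prime) */

struct node {
    char key[32];
    struct node *next;
};

static struct node *table[NSLOTS];      /* each slot heads a linked list */

static unsigned slot_of(const char *key)
{
    unsigned h = 0;
    while (*key)
        h = h * 31 + (unsigned char)*key++;
    return h % NSLOTS;
}

int insert(const char *key)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    strncpy(n->key, key, sizeof n->key - 1);
    n->key[sizeof n->key - 1] = '\0';
    unsigned s = slot_of(key);
    n->next = table[s];         /* colliding keys share this chain */
    table[s] = n;
    return 1;
}
```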


Hash Functions


Hashing is useful anywhere a wide range of possible values needs to be stored
in a small amount of memory and be retrieved with simple, near-random access.
As such, hash tables occur frequently in database engines and especially in
language processors such as compilers and assemblers, where they elegantly
serve as symbol tables. In such applications, the table is a principal data
structure. Because all accesses to the table must be done through the hash
function, the function must fulfill two requirements: It must be fast and it
must do a good job distributing keys throughout the table. The latter
requirement minimizes collisions and prevents data items with similar values
from hashing to just one part of the table.
In most programs, the hash function occupies an insignificant amount of the
execution time. In fact, it is difficult to write a useful program in which
the hash function occupies as much as 1 percent of the execution time. If your
profiler tells you otherwise, you need to change hash functions. I present
here two excellent, general-purpose hash functions. HashPJW() (see Example 1)
is based on work by Peter J. Weinberger of AT&T Bell Labs, and is better
known. ElfHash() (Example 2) is a variant of HashPJW() that is used in UNIX
object files that use the ELF format. The functions accept a pointer to a
string and return an integer. The final hash key is the modulo generated by
dividing the returned integer by the size of the table. The portable version
of Weinberger's algorithm in Example 1 is attributable to Allen Holub.
Unfortunately, Holub's version contains an error that was picked up in
subsequent texts (including the first printing of Practical Algorithms for
Programmers, by John Rex and myself). The bug does not prevent the code from
working, it simply gives suboptimal hashes. The version in Example 1 is
correct. Example 2 is taken from the "System V Application Binary Interface"
specification. This algorithm is occasionally misprinted (see the Programming
in Standard C manual for UnixWare, for instance) in a form that will give very
bad results. The version I presented here is correct.
Both functions attain their speed by performing only bit manipulations. No
arithmetic is performed and the input data is dereferenced just once per byte.
Because a program spends so little time hashing, you should not tinker with an
algorithm to optimize its performance. You should focus instead on the quality
of its hashes. To do this, you will need a small program that will hash a data
set and print statistics on the quality of the hash. The wordlist.exe program
(a self-extracting source-code file) is available electronically (see
"Availability," page 3) and will test your functions. The file also contains
some useful test data sets. The program reads a text file and stores every
unique word and a count of its occurrences into a hash table. A variety of
command-line switches (all documented) allow you to specify the size of the
hash table, print table statistics, and even print all the unique words and
their counts.
This test program shows that the two algorithms mentioned previously do a
superb job distributing hash values throughout the table. In 16-bit
environments, HashPJW() is marginally better at hashing text while ElfHash()
is significantly better at hashing numbers. In 32-bit environments, the
algorithms are identical.


Performance Concerns


While a speedy algorithm does not lend itself to optimization, its hashes
should be given attention if they are of poor quality. However, this attention
should be viewed in context. Even hash-table-intensive programs (compilers do
not figure in this group) will rarely be improved by more than 4 or 5 percent
by migration from a so-so implementation to a perfect hash algorithm. So if
poor-to-best under the most favorable circumstances generates only a 5 percent
improvement, common sense suggests that tuning hash functions should be one of
the very last optimizations you perform. This is doubly true because hash
tuning requires considerable empirical testing: You do not want to improve
performance on one data set while degrading it significantly on another.
Again, unless you have exceptional needs, use one of the aforementioned
functions. Several aspects of your hash table can be tuned easily to improve
performance, however.
The first of these is to make sure the table has a prime number of slots. As
stated previously, the final hash index is the modulo of the hash value
divided by the table size. The widest dispersal of modulo results will occur
when the hash value and the table size are relatively prime (no common
factors), which is most often assured when the table size is prime. For
example, hashing the Bible, which consists of 42,829 unique words, into an
open hash table with 30,241 elements (a prime number) shows that 76.6 percent
of the slots were used and that the average chain was 1.85 words in length
(with a maximum of 6). The same file run into a hash table of 30,240 elements
(evenly divisible by integers 2 through 9) fills only 60.7 percent of the
slots and the average chain is 2.33 words long (maximum: 10). This is a rather
substantial improvement for adding only a single element to make the table
size prime.
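The effect is easy to reproduce: hash values that share a common factor with the table size can reach only a fraction of the slots. This sketch (a hypothetical helper, not from the article) counts the distinct slots hit by the hash values 0, step, 2*step, and so on:

```c
#include <stdlib.h>

/* Count how many distinct slots the hash values 0, step, 2*step, ...
 * reach in a table of the given size.  When step and table_size share
 * a factor, only table_size/gcd(step, table_size) slots are ever used. */
size_t distinct_slots(size_t table_size, size_t step, size_t n)
{
    char *seen = calloc(table_size, 1);
    size_t used = 0;
    if (seen == NULL)
        return 0;
    for (size_t i = 0; i < n; i++) {
        size_t slot = (i * step) % table_size;
        if (!seen[slot]) {
            seen[slot] = 1;
            used++;
        }
    }
    free(seen);
    return used;
}
```

With 30,240 slots and hash values stepping by 6, only one-sixth of the slots (5040) are ever reachable; with the prime 30,241, every slot is.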
A second easy improvement is increasing the size of the hash table. This
optimization is one of the classic memory-for-speed trade-offs and is
effective as long as the data set has more elements than your hash table. (I'm
still talking about open hash tables here. Closed hash tables will be
discussed shortly.) Going back to the example of the Bible, when it was hashed
(with HashPJW()) into a table of 7561 elements, 99.7 percent of the slots were
occupied and the average chain was 5.68 words long; at 15,121 elements,
occupancy was 93.9 percent and an average chain was 3.01 words. The values for
a table of 30,241 elements were shown previously. In terms of performance for
this example, quadrupling the table size nearly halved the chain length.
Consistent with the (overall) small role played by hash operations, the
program's timed performance only improved by 1.7 percent. However, after all
other optimizations are performed, this difference could be meaningful.
Finally, be careful with the construction of the linked lists emanating from
the table. Significant savings can be attained by not allocating memory each
time a data element is added to the table, but by using a pool of nodes
allocated as a single block. Likewise, the lists should be ordered, either by
ascending value or by recency of access. Compiler symbol tables, for example,
often place the most recently accessed node at the head of the list, on the
theory that references to a variable will tend to be grouped together. This
view is especially validated in languages such as C and Pascal, which have
local variables.
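A node pool along those lines can be as simple as the following sketch (hypothetical names, with error handling pared down): one malloc() up front, then nodes handed out by bumping an index.

```c
#include <stdlib.h>

struct dnode {
    int value;
    struct dnode *next;
};

struct pool {
    struct dnode *block;        /* one big allocation up front */
    size_t next_free;
    size_t capacity;
};

int pool_init(struct pool *p, size_t capacity)
{
    p->block = malloc(capacity * sizeof *p->block);
    p->next_free = 0;
    p->capacity = capacity;
    return p->block != NULL;
}

struct dnode *pool_alloc(struct pool *p)
{
    if (p->next_free == p->capacity)
        return NULL;            /* pool exhausted */
    return &p->block[p->next_free++];
}
```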


Closed Hash Tables


As mentioned previously, the two primary methods of resolving hashing
collisions in closed tables are linear and nonlinear rehashing. The
idea behind nonlinear rehashing is that it helps disperse colliding values
throughout the table.
Closed hash tables are particularly effective when the maximum size of the
incoming data set is known and a table with many more elements can be
allocated. It has been proven that once a closed table becomes more than 50
percent full, performance deteriorates significantly (see Data Structures and
Program Design in C, by Robert Kruse, Bruce Leung, and Clovis Tondo, Prentice
Hall, 1991). Closed tables are commonly used for rapid prototyping (they're
easy to code quickly) and where fast access (no link list) is a priority and
memory is easily available.
Traditional computer science has found that, given equal load (the ratio of
occupied slots to table size), tables tend to perform better by using
nonlinear rehashing than by stepping through the table. This view has
generally been accepted without much debate because the math behind it is
fairly compelling.
However, hardware advances (of all things!) are beginning to change the math
in favor of linear rehashing. Specifically, the reasons are memory latency and
CPU cache construction.


Memory Latency



High-end CPUs today commonly operate at speeds of 250 MHz and beyond. At 250
MHz, each instruction cycle takes just 4 nanoseconds. By comparison, the
fastest widely available dynamic RAM runs at 60 nanoseconds. In effect, this
means that a single memory fetch is approximately 15 times slower than a CPU
instruction. From this large disparity, you can see that your programs should
avoid memory fetches as much as possible.
Most CPUs today have data caches on board, and the more your programs use this
memory, rather than general DRAM, the better they will perform. As with all
caches, their benefit accrues only if the data you need to access is already
in the cache (when it is, you have a cache "hit"). CPU caches tend to hold
chunks of data brought in over previous operations, and these chunks tend to
include data past the byte or bytes used for the one instruction. As a result,
cache hits will increase if you access adjoining bytes in subsequent
instructions. Hence, you will have far more cache hits if you use linear
rehashing and simply query the adjoining table element than if you use
nonlinear rehashing.
How significant is this difference? The program memtime.c (available
electronically) shows you the effects of memory latency. It allocates a
programmer-defined block and performs as many reads as there are bytes. In one
pass, it does the reads randomly; in the second, it does the reads
sequentially. Table 1 presents the results for 10 million reads into a 10-MB
table.
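The memtime.c source is available electronically; its core measurement can be sketched like this (a simplification with my own names, not the article's source). Passing a sequential index array versus a shuffled one exposes the latency difference described in the text.

```c
#include <stddef.h>
#include <time.h>

/* Time n reads from buf at the positions given in idx. */
double time_reads(const unsigned char *buf, const size_t *idx, size_t n)
{
    volatile unsigned sum = 0;  /* keep the reads from being optimized away */
    clock_t t0 = clock();
    for (size_t i = 0; i < n; i++)
        sum += buf[idx[i]];
    (void)sum;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```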
Predictably, the ratio becomes even more unfavorable when the program runs on
operating systems with paged memory, such as UNIX. When the previous program
was run on a 486/66-MHz PC with 70-ns memory running UnixWare, the sequential
reads took 3 seconds and the random reads took 35 seconds.
The point is clear: Memory accesses that occur together should be located
together.
You could argue that saving a handful of seconds across 10 million memory
accesses is hardly worth the effort. In many cases this will be true.
Nonetheless, coding for this optimization may often be a simple matter of
choosing between equal efforts (as in the case of rehashing approaches).
Moreover, every indication suggests that CPU speed will continue to increase,
while DRAM access speeds have remained stable for several years. As time
passes, the benefits of considering memory latency will increase.
In the specific case of rehashing, the linear approach has an additional
benefit: A new hash key does not have to be computed. So, for closed hash
tables, keep them sparsely populated and use linear rehashing.
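That closing advice can be illustrated with a minimal linear-rehash insert (a sketch with made-up names; 0 marks an empty slot, so keys are assumed positive):

```c
#define CSIZE 17                /* illustrative closed-table size (prime) */

/* Insert key into a closed table using linear rehashing: on a
 * collision, step to the adjacent slot, which is very likely
 * already in the CPU cache.  Returns the slot used, or -1 if full. */
int probe_insert(int *slots, int key)
{
    int i = key % CSIZE;
    for (int tries = 0; tries < CSIZE; tries++) {
        if (slots[i] == 0 || slots[i] == key) {
            slots[i] = key;
            return i;
        }
        i = (i + 1) % CSIZE;    /* adjacent element: cache-friendly */
    }
    return -1;
}
```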
Memory latency is bound to become more significant in other areas of
programming. Bear this in mind as you code.


Acknowledgments


I'd like to thank Marshall K. McKusick and John Mashey for their ideas, and
John Rex and Ralph Barker for help in refining them.
Example 1: HashPJW(). An adaptation of Peter Weinberger's generic hashing
algorithm.
/*--- HashPJW ---------------------------------------------------
 * An adaptation of Peter Weinberger's (PJW) generic hashing
 * algorithm based on Allen Holub's version. Accepts a pointer
 * to a datum to be hashed and returns an unsigned integer.
 *-------------------------------------------------------------*/
#include <limits.h>
#define BITS_IN_int ( sizeof(int) * CHAR_BIT )
#define THREE_QUARTERS ((int) ((BITS_IN_int * 3) / 4))
#define ONE_EIGHTH ((int) (BITS_IN_int / 8))
#define HIGH_BITS ( ~((unsigned int)(~0) >> ONE_EIGHTH ))
unsigned int HashPJW ( const char * datum )
{
 unsigned int hash_value, i;
 for ( hash_value = 0; *datum; ++datum )
 {
 hash_value = ( hash_value << ONE_EIGHTH ) + *datum;
 if (( i = hash_value & HIGH_BITS ) != 0 )
 hash_value =
 ( hash_value ^ ( i >> THREE_QUARTERS )) &
 ~HIGH_BITS;
 }
 return ( hash_value );
}
Example 2: ElfHash(). The published hash algorithm used in the UNIX ELF format
for object files.
/*--- ElfHash ---------------------------------------------------
 * The published hash algorithm used in the UNIX ELF format
 * for object files. Accepts a pointer to a string to be hashed
 * and returns an unsigned long.
 *-------------------------------------------------------------*/
unsigned long ElfHash ( const unsigned char *name )
{
 unsigned long h = 0, g;
 while ( *name )
 {
 h = ( h << 4 ) + *name++;
 if ( g = h & 0xF0000000 )
 h ^= g >> 24;
 h &= ~g;
 }
 return h;
}
Table 1: The results for 10 million reads into a 10-MB table.
Processor/Memory                 Sequential   Random

100-MHz Pentium/60-ns memory     1 second     7 seconds
66-MHz Cyrix 486/70-ns memory    2 seconds    9 seconds





A Cubic Spline Extrema Algorithm


Determining discrete data-set extremes




Mike J. Courtney


Mike is a senior software engineer at Computerized Medical Systems in St.
Louis, Missouri. He can be reached at mikec@cms-stl.com.


When collecting or computing discrete data (particularly time-varying data),
the resulting resolution often is limited by influences such as computational
burden, A/D sampling rates, storage limitations, and transmit rates. This can
be a problem when you are interested in the data that describes the minimum
and/or maximum values within the continuous function. Examples of this include
the determination of the true peak of a computed-signal cross-correlation
profile and the reconstruction of the precise peaks and valleys of an
environmental measurement.
Determining these minima and maxima, collectively referred to as "extrema,"
requires interpolation between the known discrete data points. However, if the
interpolation resolution is too coarse, the true extrema might be overlooked;
if the resolution is too fine, the computation takes entirely too long.
William Press et al. point out a number of algorithms for more efficiently
finding the extrema of a data set in Numerical Recipes in C, Second Edition
(Cambridge University Press, 1992). These algorithms typically consist of
bracketed searches and are often iterative.
In this article, I'll present an alternative approach derived from the
well-established equation for cubic-spline interpolation. (For background
information on splines, see the accompanying text box entitled "Splinal Tap,
or What's My Spline?") This approach performs a direct, noniterative
determination of the extrema of a discrete data set. 


Algorithm Overview


The cubic-spline extrema algorithm computes the relative extrema of the
continuous function that describes the discrete data set. A relative or local
extremum is the highest or lowest value within a finite portion of the input
data set. A global or absolute extremum is the highest or lowest value within
the entire data set. Therefore, given all the relative extrema, the absolute
extrema can be determined.
The method uses all the given data to compute second derivatives at each point
(also called "knots"). The space between each knot (an "interval") is analyzed
in the cubic-spline sense to determine if and where extrema exist. This
process directly yields the x values, with no iterative searching. The x
values are then used to compute their corresponding y values. This entire
process is fast and accurate, and its implementation is quite robust.


Algorithm Derivation


The goal of cubic-spline interpolation is to derive an interpolation formula
in which the first and second derivatives of the spline polynomials are equal
at the knots. This results in a formula with interval splines that touch at
the knots and exhibit a smooth transition from interval to adjacent interval.
Given a data set described by the general function yj = y(xj), the cubic-spline
interpolation in the interval between xi and xi+1 can be expressed as Figure 1
(a), where ″ denotes the second derivative and Figure 1 (b) holds. The first
derivative of Figure 1 (a) is denoted as dy/dx and is solved as Figure 1 (c).
The derivation of these equations is available in a number of books, including
Press et al. However, if you take this further and set Figure 1 (c) to zero,
then a new equation can be derived that represents the maximization and
minimization of Figure 1 (a). This new equation will allow identification of
the points at which y remains constant with respect to finite changes in x.
Finally, Figure 1 (c) can be expressed in the quadratic form ax² + bx + c = 0,
such that x can be solved to yield the cubic-spline extrema equation of Figure
1 (d).
Using these cubic-extrema quadratic coefficients and a quadratic-root solver
yields the candidate extrema x1 and x2. If one or both of these abscissa lie
within the current interval of examination xi to xi+1, then the candidate is a
valid abscissa value at which an ordinate extremum exists. If neither lies
within the interval, then no extremum lies within the interval.


Implementation


Listing One is cubic_io.c, a data input and output program that reads and
parses an input ASCII data file, calls FindCubicExtrema(), and prints the
returned relative extrema. The format of the ASCII data is one x, y point per
line, where the x and y are either real or integer and are separated by a
comma. Each x value should be greater than the previous one.
In main(), the data file is opened and the number of points is determined so
that the input data arrays x_in[] and y_in[] can be allocated to a length of
num_pnts. Then the file pointer is reset to the beginning of the file and each
line is read and parsed into the x_in and y_in arrays. Also allocated is the
structure which will contain the first of a singly linked list of computed
extrema. This structure is defined in cubic.h (Listing Two). Each allocation
should be checked for success before continuing. However, these checks and
their associated error handling (for example, message posting, memory freeing,
returning an error code) are not shown in the listings.
Because I am not using a doubly linked list that could provide pointers to the
previous data, I return to the first structure in the list by saving its
address in the pointer variable first_extr prior to calling
FindCubicExtrema().
FindCubicExtrema() is then called and passed the number of input data points,
the input x and y arrays, and the first output extr structure. If no extremum
is found, the function returns a Failure and the user is notified. If the
function returns Success, the linked-list pointer is reset to the first one
and each computed extremum is listed as the linked list is stepped through
until a Null pointer is reached.
All relative extrema are returned--not just the absolute extrema--because no
assumptions are made about the user's application. For example, computing the
average period of the input data would require all extrema, and examining the
rate of decay of an underdamped oscillatory function would require all maxima.
If only an absolute extremum were desired (for a cross-correlation peak for
example), then it would be very simple to loop through all the relative
extrema to find the absolute maximum. In fact, the loop that lists the
relative extrema could be easily modified to perform this task. For these
reasons, the implementation of FindCubicExtrema() is generalized such that all
relative extrema are computed and returned to the calling program.


Cubic-Extrema Computation


The primary routine that computes the data extrema is FindCubicExtrema() (see
Listing Three), which performs the following:
1. Calls ComputeSecDerivs() to compute the second derivatives of each
interval. 
2. Calculates the cubic extrema quadratic coefficients and calls
FindQuadRoots() to compute the roots of the quadratic equation. 
3. Determines if the roots, which are candidate extremum abscissa, lie within
the current interval.
4. Calls ComputeY() to determine the corresponding ordinate value for each
valid extremum abscissa.


Computing the Second Derivatives



The derivative form of the cubic-spline equation in Figure 1 (c) has
everything required to perform the calculation of the quadratic coefficients
except the second derivatives of the input data ordinates, denoted as y".
Therefore, these must first be computed using the routine ComputeSecDerivs().
This computation is performed as the solution of a system of N spline
equations in N unknowns of the general form a_11x_1 + a_12x_2 + ... + a_1Nx_N =
b_1 through a_N1x_1 + a_N2x_2 + ... + a_NNx_N = b_N. These equations may be
represented in matrix form as Ax = b, where A is the matrix of a's. This system of
equations is a symmetric tridiagonal form that is easily solved with a
degenerate version of Gaussian elimination. In the software implementation,
however, different variables are used to avoid confusion with the quadratic
coefficients of FindCubicExtrema(). The matrix variables denoted as a are
instead represented in the code as main_diag[] and diag[], which correspond
respectively to the main diagonal and off-diagonal elements. The b's are
represented as right[] to indicate the right side of the equations. You are
solving for the second derivative variables x, which are denoted as
sec_deriv[].
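The forward-elimination and back-substitution pass for such a symmetric tridiagonal system is short. The sketch below is my own, not Listing Three, though it borrows the article's variable names; note that it overwrites main_diag[] and right[] as it works.

```c
#include <stddef.h>

/* Degenerate Gaussian elimination (the Thomas algorithm) for a
 * symmetric tridiagonal system: main_diag[] holds the diagonal,
 * diag[] the off-diagonal, right[] the right-hand side, and the
 * solution lands in sec_deriv[]. */
void solve_tridiag(size_t n, double *main_diag, const double *diag,
                   double *right, double *sec_deriv)
{
    /* forward elimination: zero out the sub-diagonal */
    for (size_t i = 1; i < n; i++) {
        double m = diag[i - 1] / main_diag[i - 1];
        main_diag[i] -= m * diag[i - 1];
        right[i] -= m * right[i - 1];
    }
    /* back substitution */
    sec_deriv[n - 1] = right[n - 1] / main_diag[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        sec_deriv[i] = (right[i] - diag[i] * sec_deriv[i + 1]) / main_diag[i];
}
```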
The computation of the second derivatives in ComputeSecDerivs() involves
division by the diagonals, which are computed from the abscissa data x[]. The
requirement that the abscissa values be increasing implies that the data must
be unique, which is the case for the problem I have posed. If the abscissa did
not change but the ordinate did, then the data would represent an impulse
function and the corresponding derivative would be infinite.
If these abscissa data were nonunique, a divide-by-zero error would occur.
Therefore, to practice defensive programming, the values of diag[] and
main_diag[] are asserted to ensure that they are not zero. A debug-environment
assertion approach is used instead of an "if...then return Failure" approach
because this is a nonrecoverable error and, according to the rules I have
established, this condition should theoretically never occur. If it does,
something has gone awry in allowing this data to get through earlier defenses.
If a "ship" version of this routine were not guaranteed unique data,
additional error handling would be required beyond the assert().


Computing the Quadratic Roots


Because the second derivatives at the knots are now available, the
cubic-extrema quadratic coefficients are easily solved using the equations in
Figure 1 (d). These coefficients are then passed to the routine
FindQuadRoots() to determine candidate abscissa extrema. Because one or more
of the coefficients may be very small, the results are computed using a
quadratic solver that avoids associated accuracy errors.
The software implementation of FindQuadRoots() includes checks for conditions
that could result in disastrous operations--taking the square root of a
negative number or dividing by zero. Either one of these indicates early-on
that the interval under scrutiny does not have extrema and thus a Failure is
returned. Furthermore, if only one of the divisors is zero, then its
respective status (Failure1 or Failure2) will be returned to
FindCubicExtrema() so that it knows which of the two roots is a valid
candidate. If all these tests pass, then a general status is returned to
indicate that both are candidate extrema in the associated interval.
Assertions are not used because these conditions can occur in a final product
and are not the result of improper programming.


Performing Bounds Checking


If FindQuadRoots() returns a status other than Failure, then at least one of
the roots is a valid abscissa candidate. The roots' return status is tested,
and if they are accepted for further testing, bounds checking is performed to
determine if the root lies within the current interval. If it does, then it is
a valid root and thus an abscissa location at which an extremum lies.
Returning the specific Failure1 or Failure2 status from FindQuadRoots() avoids
finding an invalid root in the current interval. For instance, if the variable
a were computed to be zero and thus x1 were not computed, the previously
computed value for x1 would still persist in FindCubicExtrema(). It would then
be quite possible for this value to fall outside of the previous interval
bounds but now be within the current interval bounds and thus be incorrectly
selected as a valid root. Additionally, the variables x1 and x2 could not
simply be set to a known value, such as zero, to indicate failure because that
may also be a valid abscissa within the next interval.


Extremum-Ordinate Computation


If a valid abscissa has been found, it is then utilized to compute the
corresponding ordinate value with ComputeY(). The second derivatives have
already been computed so this computation is easy using the standard equation
for a cubic spline.
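That standard cubic-spline evaluation, in the form given by Press et al., can be sketched as follows (hypothetical names, not the article's ComputeY()):

```c
/* Evaluate the cubic spline at x within the interval [xi, xi1],
 * given the knot ordinates yi, yi1 and second derivatives d2i, d2i1. */
double spline_y(double x, double xi, double xi1,
                double yi, double yi1, double d2i, double d2i1)
{
    double h = xi1 - xi;
    double a = (xi1 - x) / h;           /* linear-interpolation weights */
    double b = (x - xi) / h;
    return a * yi + b * yi1
         + ((a * a * a - a) * d2i + (b * b * b - b) * d2i1) * (h * h) / 6.0;
}
```

With both second derivatives zero, the formula reduces to plain linear interpolation, which makes it easy to spot-check.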
If one or more roots were found in FindCubicExtrema(), Success is returned to
main(). Thus, no interrogation of the list is required unless valid data
exists. Furthermore, since the first structure was allocated in main(), this
is the only way to determine if the structure contained valid data, outside of
passing back the number of extrema. If no extrema were found, then the next
pointer would never have been set to Null. Therefore, prior to returning
Failure in FindCubicExtrema(), the next pointer is Nulled so that the loop in
main() responsible for freeing the linked list can recognize the end of the
list.


Test Cases and Results


Ensuring that software is accurate and consistent requires that it be
validated using input data that yield calculable and thus comparable outputs.
These different data sets represent test cases that provide performance
verification and "stress" testing. I developed several test cases for testing,
debugging, and validating the cubic-spline extrema algorithm and its software
implementation. 
Four-point Data. The first data set is simple, consisting of the four data
points plotted in Figure 2, with a polynomial interpolating curve overlaid for
visual aid. Symmetry implies that a maximum is expected at an x value of 0.0
and a y value of approximately 1.1. This data set permits validation that the
program can handle negative input data and solve for an extremum at an
abscissa of zero. I ran the program using these data, and the computed
extremum was as expected:
Expected:
x = 0.0, y ~ 1.1

Computed:
x = 0.00, y = 1.15
Three-point Data. The last three of the four data points from the previous
data were then used to illustrate two other important attributes of the
algorithm and software implementation. It is intuitively apparent that two
data points cannot yield an extremum--only a straight line. At least three
points are required, so this data set is minimal. This "stress" test also
shows that nothing in the algorithm or software precludes an extremum from
being located in the first or last interval. This data set was input and
yielded the results of 0.0774 and 1.096 for the x and y extremum,
respectively. These represent errors of less than 10 percent, which is quite
satisfactory given the small amount of data provided to the algorithm.
Trajectory Data. The next data set is an extension of the first in that it
yields a single maximum. However, this seven-point data set is derived from
the equation expressing the trajectory of a projectile. Therefore, the
expected abscissa and ordinate values are calculable. The data plotted in
Figure 3 was hand-calculated based upon a projectile shot at 1500 feet/second
at an initial angle of +45 degrees and a position sampling at 10-second
intervals until ground impact. 
The standard trajectory equations for the maximum height and corresponding
distance were then used to hand calculate the expected extremum for comparison
to the algorithm results. 
Expected:
x = 34,937.89, y = 17,468.94

Computed:
x = 34,896.04, y = 17,469.07
The x error was only about 0.10 percent and the y error was negligible.
Cubic splines operate best when provided exact data. But what if the data were
approximate due to real-world measurement errors?
of this on the algorithm, the trajectory data was rounded to the nearest 20
feet. This provided a good simulation of the errors that might be expected if
the global positioning system (GPS) were used to track the projectile with an
accuracy of approximately +/-18 feet. This rounded data set was input to the
program and yielded the results of x = 34,960.77 and y = 17,485.27, which each
translate to errors of less than 0.10 percent.
System-Response Data. The next data set is from control-system theory.
Evaluating the step response of the underdamped second-order system
y(t) = 1 - 1.414e^(-t)cos(4t - 45 degrees) from t = 0 to 3.5 seconds in
0.25-second steps yields the 15 points shown in Figure 4.
As illustrated, this data set should yield both maxima and minima. An
interpolating curve is not shown in Figure 4 because even a single fifth-order
polynomial is not of high enough order to characterize the function
represented by the data. However, by computing the first derivative of the
given function y(t), its expected theoretical extrema can be hand calculated.
Five expected extrema were found using this function-dependent calculation.
Supplying the data set to the cubic extrema algorithm yielded the following
results:
Expected: 
t = 0.135, y(t) = -0.198 
t = 0.920, y(t) = 1.547 
t = 1.706, y(t) = 0.751 
t = 2.491, y(t) = 1.114 
t = 3.277, y(t) = 0.949 

Computed:

t = none, y(t) = none
t = 0.915, y(t) = 1.551
t = 1.707, y(t) = 0.751
t = 2.492, y(t) = 1.114
t = 3.279, y(t) = 0.949
The first extremum was not found, although the cause is obvious from
observation of Figure 4. The data supplied to the algorithm indicated no trend
toward the first minimum. Thus, the algorithm could not have been expected to
locate this extremum without more finely sampled data. However, of the extrema
detected, the largest extremum error noted in the time axis was only 0.5
percent and the maximum y(t) error was about 0.25 percent.
Two-Root Data. The last data set I'll present validates several algorithm and
software characteristics that together represent a stress-test situation: the
ability to perform with irregularly sampled and/or missing data, to find two
extrema within a single interval, and to find very subtle extrema. This
combination is a rare occurrence that represents a worst-case scenario.
The eight-point data set depicted in Figure 5 comprises samples of the
function f(x) = x^3 - 2x^2 + x + 1.
Note that the fourth interval is quite large compared to the others. Manually
taking the derivative of this function shows both a maximum and a minimum in
this interval. Intuitively, it makes sense that there can be no more than two
extrema in an interval. The combination of irregular sampling and two subtle
extrema makes this situation challenging and uncommon. The function's expected
and resulting data are:
Expected: 
x = 0.333, f(x) = 1.148 
x = 1.000, f(x) = 1.000 

Computed:
x = 0.318, f(x) = 1.147
x = 1.030, f(x) = 1.007
These data indicate abscissa errors of less than 5 percent and ordinate errors
of less than 1 percent--quite satisfactory given the circumstances.
However, it has been shown empirically that if the first data point were not
provided to the program, then the first extremum would not be detected. This
is because although a root is calculated in the third interval, its value
falls within the second interval due to extremum subtlety, and the root is
therefore invalidated by the bounds checking. In such situations, the more
data, the better.


Miscellaneous


I conducted several additional tests to validate the performance of the
algorithm and code. These included the degenerate cases of positively and
negatively sloped straight ramps; for these, no extrema exist and none were
detected. I also tested a data set with decreasing abscissa values, which
violates the increasing-data condition. Although no extrema were calculable
from such data, the software did not fault, either. Additionally, I used a
data set that itself contains an
extremum. This was important to ensure that the root computations and
bounds-checking logic would not preclude an extremum that coincides with an
interval knot.


Conclusions


The cubic-spline extrema algorithm effectively determines the relative extrema
of a given data set. It can accommodate negative input data and solve for an
extremum at an abscissa of zero. An extremum can be found with just three data
points, and nothing in the algorithm or software precludes an extremum from
being located in the first or last interval. Two extrema can be detected per
interval, subtle extrema are calculable, and accurate results are achievable
with and without exact input values. Finally, the higher the data resolution
and accuracy, the more accurate the computed extrema will be.
Splinal Tap, or What's My Spline?
Curve or data fitting finds a function that matches a set of observed values.
The primary methods are interpolation and least-squares data fitting.
Interpolation assumes the given data to be exact and attempts to find a smooth
function that passes through all the data points. Least-squares fitting
assumes the given data to be approximate and determines a function that passes
through or near all the points as well as possible.
A disadvantage of some interpolators, such as polynomial interpolators, is
that a single polynomial function may not accurately satisfy all of the data
points unless it is of high order, and thus error may be introduced. Spline
interpolation, on the other hand, computes a different polynomial in each
interval. These polynomials are computed such that every point, or knot, is
exactly intersected and a smooth transition exists from interval to interval.
The term "spline" originates from drafting, where flexible pieces of wood
(splines) were used to draw smooth curves by bending them between knots. The
shape assumed by the spline between the knots is a third-degree (or cubic)
polynomial.
--M.J.C.
Figure 1: Equations for deriving the algorithm.
Figure 2: Simple maximum data.
Figure 3: Trajectory data.
Figure 4: Underdamped second order system.
Figure 5: Two-root function.

Listing One
/* File: cubic_io.c -- Input/output program for the FindCubicExtrema()
routine.
 Reads an ascii file of x,y data, calls the routine, and lists the
 returned extrema. -- by Mike J. Courtney
*/
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <math.h>
#include "cubic.h"
int main(void)
{
 /* the following are required for main() */
 unsigned int i; /* array indices */
 unsigned int num_extr; /* number of extrema found */
 BYTE len; /* input data parsing variable */
 char data_file[80]; /* ascii data file name */
 char str[40], line[80]; /* input data parsing variables */
 FILE *fp; /* ascii data file pointer */
 struct point *first_extr; /* pointer to the first extreme structure */

 /* the following are required for FindCubicExtrema() */
 unsigned int num_pnts; /* number of x,y pairs */
 float *x_in, *y_in; /* arrays of x and y input data values */
 struct point *extr; /* pointer to current extreme structure */
 /* open input data file and determine number of input points */
 printf("\n Enter name of input file: ");
 fgets(data_file, sizeof(data_file), stdin); /* safer than gets() */
 data_file[strcspn(data_file, "\n")] = '\0'; /* strip the trailing newline */
 if((fp = fopen(data_file, "r")) != NULL)
 {
 num_pnts = 0;
 while(!feof(fp))
 {
 fgets(line, 80, fp);
 num_pnts++;
 }
 num_pnts -= 1;
 printf("\n The number of input points was %d.",num_pnts);
 
 /* allocate the input data arrays */
 x_in = malloc(num_pnts * sizeof(float));
 y_in = malloc(num_pnts * sizeof(float));
 
 /* read each data line and parse it into the x and y arrays */
 rewind(fp);
 for (i = 0; i < num_pnts; i++)
 {
 /* get a line */
 fgets(line, 80, fp);
 len = strcspn(line,",");
 /* get the x value */
 strncpy(str, line, len);
 /* NULL terminate the x value */
 str[len] = '\0';
 x_in[i] = atof(str);
 /* get the y value */
 strcpy(str, line+len+1);
 y_in[i] = atof(str);
 }
 fclose(fp);
 /* allocate first structure of linked list of output extrema */
 extr = (struct point *)malloc(sizeof(struct point));
 /* save the address of the first structure in the list */ 
 first_extr = extr;
 /* call the routine that computes the extrema */
 if (FindCubicExtrema(num_pnts, x_in, y_in, extr) == FAILURE)
 printf("\n\7 No extrema found !\n");
 else
 {
 /* print the linked list of extrema */
 printf("\n\n Relative extrema computed:");
 extr = first_extr;
 num_extr = 0;
 while (extr)
 {
 printf("\n %d. y = %f at x = %f", num_extr+1, extr->y, extr->x);
 extr = extr->next;
 num_extr++;
 }
 }

 free(x_in);
 free(y_in);
 /* free the linked list of extreme structures */
 do
 {
 /* point to first structure */
 extr = first_extr;
 /* save address of next extreme structure so that when we free the
 current one, we still have a pointer to the next one */
 first_extr = extr->next; 
 free(extr);
 }
 while (first_extr != NULL);
 }
 else
 printf("\n Couldn't open %s !", data_file);
 return 0;
} 

Listing Two
/* File: cubic.h. -- #defines, typedefs, and prototypes for cubic_io.c & 
 cubic_extrema.c. -- by Mike J. Courtney 
*/
/* status defines */
#define SUCCESS 1
#define FAILURE 0
#define FAILURE1 -1
#define FAILURE2 -2
/* flag defines */
#define TRUE 1
#define FALSE 0
/* typedefs */
typedef int BOOL;
typedef unsigned char BYTE;
/* structures */
struct point {
 struct point *next;
 float x;
 float y;
};
/* prototypes */
BOOL ComputeSecDerivs (unsigned int, float *, float *, float *);
BOOL FindCubicExtrema (unsigned int, float *,float *, struct point *);
BOOL FindQuadRoots (float, float, float, float *, float *);
void ComputeY (unsigned int, float *, float, float *, float *, float *);

Listing Three
/* File: cubic_extrema.c -- Contains FindCubicExtrema() and supporting
routines
 that implement the cubic spline extrema algorithm. Given a set of x,y data 
 points, determine in the cubic spline sense the relative extrema of the 
 function describing the data. -- by Mike J. Courtney
*/
#include <stdlib.h>
#include <math.h>
#include <stdio.h>
#include <assert.h>
#include "cubic.h"
/* Primary routine that implements the cubic spline extrema algorithm. Calls
 ComputeSecDerivs() to compute the second derivatives, computes quadratic 
 coefficients, calls FindQuadRoots() to solve quadratic roots, determines if
 roots are valid abscissas, and calls ComputeY() to compute ordinates.
*/
BOOL FindCubicExtrema (
 unsigned int num_pnts, /* input - number of x & y points */
 float *x, /* input - array of x values */
 float *y, /* input - array of y values */
 struct point *extr) /* output - singly linked list of extrema */
{
 float a, b, c; /* coefficients of quadratic equation */
 float x1, x2; /* roots of quadratic equation to be computed */
 unsigned int i; /* array index */
 float *sec_deriv; /* array of second derivatives of each data interval */
 BOOL root_stat; /* computation status returned by FindQuadRoots() */
 BOOL valid_flag; /* TRUE if at least one valid root found */
 BOOL first_flag; /* TRUE if current root is the first one */
 
 /* allocate array for second derivatives */
 sec_deriv = malloc(num_pnts * sizeof(float));
 /* compute the second derivatives */
 ComputeSecDerivs(num_pnts, x, y, sec_deriv);
 /* initialize extrema flags */
 valid_flag = FALSE;
 first_flag = TRUE;
 /* loop through all the input points and find the extrema */
 for (i = 0; i < num_pnts - 1; i++)
 {
 /* compute the quadratic coefficients */
 a = 3.0 * (sec_deriv[i+1] - sec_deriv[i]);
 b = 6.0 * (x[i+1] * sec_deriv[i] - x[i] * sec_deriv[i+1]);
 c = 6.0 * (y[i+1] - y[i]) - (2.0 * x[i+1] * x[i+1] - x[i] * x[i] + 2 *
 x[i] * x[i+1]) * sec_deriv[i];
 c -= (x[i+1] * x[i+1] - 2.0 * x[i] * x[i+1] - 2.0 * x[i] * x[i]) *
 sec_deriv[i+1];
 /* determine the roots of the cubic extrema quadratic equation */
 root_stat = FindQuadRoots(a, b, c, &x1, &x2);
 if (root_stat != FAILURE)
 {
 /* if root x1 was calculated successfully */
 if (root_stat != FAILURE1)
 {
 /* Determine if root is within the interval */
 if ((x1 > x[i]) && (x1 < x[i+1]))
 {
 /* first root (extremum) */
 if (first_flag == TRUE)
 first_flag = FALSE;
 /* beyond first valid root so allocate next extremum structure */
 else
 {
 extr->next = (struct point *)malloc(sizeof(struct point));
 extr = extr->next;
 }
 extr->next = NULL;
 extr->x = x1;
 /* compute the corresponding value of y at the extreme x value */
 ComputeY(i, x, extr->x, y, sec_deriv, &extr->y);
 valid_flag = TRUE;
 }
 }

 /* if root x2 was calculated successfully */
 if (root_stat != FAILURE2)
 {
 /* Determine if root is within the current interval */
 if ((x2 > x[i]) && (x2 < x[i+1]))
 {
 /* first root (extremum) */
 if (first_flag == TRUE)
 first_flag = FALSE;
 /* beyond first valid root so allocate next extremum structure */
 else
 {
 extr->next = (struct point *)malloc(sizeof(struct point));
 extr = extr->next;
 }
 extr->next = NULL;
 extr->x = x2;
 /* compute the corresponding value of y at the extreme x value */
 ComputeY(i, x, extr->x, y, sec_deriv, &extr->y);
 valid_flag = TRUE;
 }
 }
 } /* end of if(root_stat ! = FAILURE) */
 } /* end of for(i) */
 free(sec_deriv);
 if (valid_flag == TRUE)
 return SUCCESS;
 else
 {
 /* Set next to NULL just in case it was not set in the loop - this is
 so that free loop will operate properly upon return */
 extr->next = NULL;
 return FAILURE;
 }
 }
/* Use input x,y data to form tridiagonal matrix and compute second 
 derivatives of function in the cubic spline sense. */
BOOL ComputeSecDerivs (
 unsigned int num_pnts, /* input - number of x & y points */
 float *x, /* input - array of x values */
 float *y, /* input - array of y values */
 float *sec_deriv) /* output - array of 2nd derivatives of intervals */
{
 unsigned int i; /* index */
 float ftemp; /* temporary float */
 float *main_diag; /* ptr to matrix main diagonal array */
 float *diag; /* ptr to matrix diagonal array */
 float *right; /* ptr to array of right sides of matrix equations */
 main_diag = malloc((num_pnts - 2) * sizeof(float));
 diag = malloc((num_pnts - 2) * sizeof(float));
 right = malloc((num_pnts - 2) * sizeof(float));
 
 /* compute the matrix main and off-diagonal values */
 /* even though the calling program is supposed to have guaranteed that the
 x values are increasing, assert that neither of the diagonal 
 differences is zero to avoid a divide-by-zero condition */
 for (i = 1; i < num_pnts - 1; i++)
 {
 main_diag[i-1] = 2.0 * (x[i+1] - x[i-1]);

 assert(main_diag[i-1] > 0);
 }
 for (i = 1; i < num_pnts - 1; i++)
 {
 diag[i-1] = x[i+1] - x[i];
 assert(diag[i-1] > 0);
 }
 /* compute right hand side of equation (the x differences are used directly
 so that the first row does not index diag[] out of bounds) */
 for (i = 1; i < num_pnts - 1; i++)
 right[i-1] = 6.0 * ((y[i+1]-y[i])/(x[i+1]-x[i])-(y[i]-y[i-1])/(x[i]-x[i-1]));
 /* forward eliminate tridiagonal */
 sec_deriv[0] = 0.0;
 sec_deriv[num_pnts - 1] = 0.0;
 for (i = 1; i < num_pnts - 2; i++)
 {
 ftemp = diag[i-1] / main_diag[i-1];
 right[i] -= (right[i-1] * ftemp);
 main_diag[i] -= (diag[i-1] * ftemp);
 }
 /* backward substitution to solve for second derivative at each knot */
 for (i = num_pnts - 2; i > 0; i--)
 sec_deriv[i] = (right[i-1] - diag[i-1] * sec_deriv[i+1]) / main_diag[i-1];
 free(main_diag);
 free(diag);
 free(right);
 return SUCCESS;
}
/* Solve for roots x1 and x2 of a quadratic equation of the form 
 a * (x * x) + b * x + c = 0 using the following formula x1 = d / a and
 x2 = c / d, where d = -0.5 * [b + sgn(b) * sqrt(b*b - 4ac)].
 This algorithm is particularly good at yielding accurate results when 
 a and/or c are small values.
*/
BOOL FindQuadRoots (
 float a, /* input - coefficient a of quadratic equation */
 float b, /* input - coefficient b of quadratic equation */
 float c, /* input - coefficient c of quadratic equation */
 float *x1, /* output - first root computed */
 float *x2) /* output - second root computed */
{
 float d; /* root algorithm variable */
 BOOL root_stat; /* status of root computations */
 
 d = b * b - 4 * a *c;
 if (d < 0)
 return FAILURE;
 else
 {
 d = (float)sqrt((double)d);
 /* make the result of sqrt the sign of b */
 if (b < 0 )
 d = -d;
 d = -0.5 * (b + d);
 /* solve for the roots of the quadratic */
 /* if both root computations will yield divide by zero ... forget it! */
 if ( (a == 0) && (d == 0) )
 return FAILURE;
 
 root_stat = SUCCESS;

 /* compute first root if denominator a is not zero */
 if (a == 0)
 root_stat = FAILURE1;
 else
 *x1 = d / a;
 /* compute second root if denominator d is not zero */
 if (d == 0)
 root_stat = FAILURE2;
 else
 *x2 = c / d;
 return root_stat;
 }
 }
/* Given an abscissa (x) location, computes the corresponding cubic spline
 ordinate (y) value.
*/
void ComputeY (
 unsigned int i, /* input - array index */
 float *x, /* input - array of x values */ 
 float x_value, /* input - x value at which to solve for y */
 float *y, /* input - array of y values */
 float *sec_deriv, /* input - array of second derivatives of each data
interval */
 float *y_value) /* output - address of y extreme value at x */
{
 float A, B, C, D; /* cubic spline coefficients */
 float ftemp; /* temporary float */
 /* compute the standard cubic spline coefficients */
 A = (x[i + 1] - x_value) / (x[i + 1] - x[i]);
 B = 1 - A;
 ftemp = (float) pow((double)(x[i + 1] - x[i]), 2.0) / 6.0;
 C = (A * A * A - A) * ftemp;
 D = (B * B * B - B) * ftemp;
 /* compute the ordinate value at the abscissa location */
 *y_value = A * y[i] + B * y[i + 1] + C * sec_deriv[i] + D * 
 sec_deriv[i + 1];
 return;
}


























Calling 16-bit DLLs from Windows 95


Incorporating 16-bit DLLs into 32-bit applications




Steve Sipe


Steve is a developer with GE Fanuc Automation in Charlottesville, Virginia. He
can be reached at steve.sipe@cho.ge.com.


Porting 16-bit Windows 3.x applications to 32-bit Windows 95 can sometimes be
difficult, especially if the 16-bit apps depend on third-party DLLs. In some
cases, the 32-bit, third-party DLLs may not yet be available or, if they are,
the expense of replacing these DLLs with 32-bit versions is prohibitive. These
are the problems I encountered when converting a Windows 3.x application that
used a 16-bit TWAIN scanner DLL to a Windows 95 app.
In this article I'll present a technique for easily incorporating 16-bit DLLs
into 32-bit applications. The source code and DLLs (available electronically;
see "Availability," page 3) will let you build interfaces between 32-bit
applications and 16-bit DLLs using only Visual C++ 2.x and Visual C++ 1.x.


The Problem


Most scanners currently have only 16-bit device drivers that communicate with
16-bit TWAIN DLLs. Unfortunately, 32-bit applications can't directly call a
16-bit DLL. Microsoft suggests two ways to accomplish this: 
Build a 16-bit server .EXE and communicate through DDE.
Implement a thunking layer.
On one hand, implementing a server .EXE seemed to be the easiest approach. It
didn't involve generating, assembling, and linking "thunk scripts" to
thunking-layer DLLs. On the other hand, the server .EXE approach had a major
drawback--I could not display TWAIN modal dialog boxes in my application and
have it automatically disable the application's main window. Since each .EXE
has its own task space, making a dialog box in one task appear modal to
another task is not a simple process. Implementing a thunking-layer DLL would
solve this problem because the DLL would be in the same task as my
application's main window. I was hesitant about thunking mainly because of its
reputation as a "black art." In spite of this, I began to implement a thunking
layer so that my 32-bit application could call 16-bit TWAIN DLL functions
directly.


Why is Thunking Necessary?


Windows 95 and Windows NT use flat memory addressing. Pointers exist as 0:32
(flat) addresses. Windows 3.x is based on segmented memory addressing, and
pointers exist as 16:16 (segment, offset) addresses. The two types of pointers
are not directly interchangeable. 
Enter the thunking layer. Windows 95 implements a thunking layer to allow
existing 16-bit Windows applications to call 32-bit Windows 95
operating-system code. This thunking layer has several useful functions. One
function is the translation of 0:32 pointers to 16:16 pointers, something I
definitely needed to accomplish. Another important function is the ability to
call 16-bit code from 32-bit applications. The Windows 95-specific thunking
solution (known as the "flat thunk") directly supports the thunking layer
built into Windows 95. This would enable me to pass pointers across 32-bit to
16-bit boundaries as well as call 16-bit code. The only drawback is that this
thunking approach is only available when running on Windows 95. For my
application, this was not a concern because our target platform was Windows 95
anyway.


The Adventure Begins


Armed with my Microsoft Developer's CD, Win32 SDK, MASM, and Visual C++, I
began to explore the basics of building thunking DLLs. I found a good article
on the Developer's CD that gave me an idea of exactly what I wanted to do. It
described the steps involved in building a thunking layer:
Write a thunking script that describes each function and parameter in the
16-bit DLL.
Run the thunk compiler to generate the assembly-language file that does the
actual thunking.
Assemble the assembly-language file as a 16-bit and 32-bit object.
Build a 32-bit DLL that initializes the 32-bit side of the thunking layer and
link the 32-bit object file to it.
Build a 16-bit DLL that initializes the 16-bit side of the thunking layer and
link the 16-bit object file to it.
Run the Win32 SDK 16-bit version of RC to mark the 16-bit DLL as "Windows 4.0
compatible".
Build a 32-bit application and link the 32-bit thunking DLL to it.
I started by analyzing the sample program from the Developer's CD. I examined
each piece of code to determine which pieces were important for my
implementation. Some things became immediately apparent. Building and
maintaining my thunking layer involved incorporating some new (and old) tools
into my build environment: the thunk compiler, assembler, and the new 16-bit
version of the RC compiler (required for version stamping). 
I began to implement my first thunking function call. First, I added a
prototype in my 32-bit application for the 32-bit side of the thunking
function; see Example 1(a). Then, I added the code for MyNew32Func() to the
32-bit thunk DLL and exported it; refer to Example 1(b). Next, I wrote an
entry for the 16-bit function call in the thunk script using the C-like
language supported by the thunk compiler; see Example 1(c). Finally, as
Example 1(d) illustrates, I added the code for MyNew16Func() to the 16-bit DLL
and exported it. 
I then ran the make files that generated an assembly-language file from the
thunk script, assembled it, compiled the DLLs, and linked in the thunk object
files. 
It quickly became apparent that adding new functions would be a tedious task.
I soon began to search for an approach that would yield the desired results
with much less effort, preferably one that didn't require using the thunk
compiler and assembler when adding new functions.


The Design


One approach for simplification would be to devise some "generic" function
that I could call in my 32-bit and 16-bit thunking DLLs. Using one function
meant that I would only need to write the thunk script once, compile it, then
assemble it. From that point on, I would only have to link to the thunk object
because the function definition would not need to change. I formulated a plan.
I knew that I could pass variable arguments to C-style functions, but the
thunk compiler requires Pascal functions (which have a fixed number of
arguments). I also wanted the thunking layer to translate my pointers from
0:32 style pointers to 16:16 pointers. 
I decided to build a simple function in the 32-bit DLL that is exported with
the C calling convention and then use the C-variable-argument routines to
access the parameters. This function would be responsible for copying the
variable parameters I specified to fixed parameters and calling my 16-bit
thunk function. The next challenge was specifying the types of parameters that
I wanted to pass. Were they pointers requiring translation, or just values? 
Having done some MFC OLE programming, I was familiar with the proxy-dispatch
setup it uses. A proxy specifies an array of parameter types that it passes to
InvokeHelper() along with a dispatch ID. The parameter-type array specifies
the type of each parameter (see the DTC_xxx values in parm.h for the types
that my implementation uses). The dispatch ID corresponds to an entry in the
dispatch table of the connected OLE object. This entry specifies the
appropriate function to call.

The OLE interface that InvokeHelper() calls also performs what Microsoft
describes as "marshaling" and "unmarshaling" of data. This basically means
that data is grouped together on the sending end and ungrouped on the
receiving end. The Invoke16Func() function that I eventually implemented
serves the same purpose as the OLE InvokeHelper() method. It is called by a
proxy method and is responsible for grouping variable parameters into fixed
parameters so the thunking layer can convert them properly. On the 16-bit
side, the thunk function ungroups the parameters, places them on the stack
frame using inline assembly statements, then looks up the corresponding
function in the dispatch table and calls it. 


The New Process


I now had an easy way to define new 16-bit functions and call them from 32-bit
applications--a way that no longer required the tedious iterations of
building, compiling, and assembling thunk scripts for each new function. 
First, I defined a simple proxy in my 32-bit application; see Example 2(a). I
chose to use C++ but could have just as easily used C. This proxy described my
parameter types for the marshaling interface, then called the interface. The
return value (ulError) holds the return code from the 16-bit function.
In the 16-bit thunk DLL, I added the dispatch table in Example 2(b) and added
the code for MyNewFunc(); see Example 2(c). 
That's it. I just recompiled the 16-bit thunk DLL and recompiled my 32-bit
application. No more thunk compiler, thunk scripts, MASM, or the like. I had a
simple way to implement 16-bit function calls in 32-bit code. In fact, I could
use the 16-bit header files as a starting point for building my proxies. 


A Small Hurdle


Then came my first obstacle. I found that quite a few of the functions in my
favorite graphing package need to use floats and doubles, but the thunk
compiler doesn't support these data types. After a little research, I found
that converting numbers between 32-bit and 16-bit apps was fairly
straightforward. The only tricky data type is int. The int type is 32 bits in
a 32-bit application and 16 bits in a 16-bit application. I was familiar with
this ambiguity because OLE forces you to use short to describe 16-bit integers
and long to describe 32-bit integers. It turned out that all the other C data
types are the same size on both the 32-bit and 16-bit sides.
Realizing that manually converting numbers would not be a problem, I still had
one last hurdle to overcome--passing floating-point values across the thunk
boundary. I speculated that it might be possible to build an array of double
values and load it with the various double parameters that I would need, then
pass a pointer to the array across the thunk boundary. This pointer could then
be translated to a proper 16-bit pointer. This works because the values don't
really need any translation across the boundary. I can load up all my values
and just pass one pointer to the appropriate array. This also turned out to be
a more-efficient approach. I could now pass quite a few values using only one
fixed parameter.


Some Minor Restrictions


My implementation has a few minor restrictions, mainly because of limitations
in the thunk compiler. You can pass a maximum of 11 pointers (values only take
a maximum of two pointers: one for doubles and one for all other value types).
Of course, you can easily get around this restriction by passing structure
pointers. I haven't found any functions yet that require 11 pointers. 
Each of the major data types is defined in the header file parm.h. Keep in
mind that the only important thing is the size of the data. In other words, an
unsigned short is 16 bits and a signed short is 16 bits. The data definition
DTC_SHORT defines a 16-bit value; the sign bit does not matter to the
Invoke16Func() interface.


What Tools do I Need?


Given the DLLs and samples provided, you should be able to implement your own
interfaces to 16-bit DLLs. As long as you have no need to modify the
thunk-script behavior, you will only need Microsoft Visual C++ 2.x for the
32-bit side and Visual C++ 1.x for the 16-bit side. I have also included a
quick-and-dirty program called "stamp" to update the version stamp of the
16-bit DLL. This eliminates having to run the new 16-bit RC compiler to update
the version stamp.


Helpful Hints


There are a few things to keep in mind when implementing your proxies and
dispatch methods:
Be sure you choose the proper data size. For example, the DTC_INT type is 16
bits (same as short) because int values are 16 bits in 16-bit DLLs, but int is
32 bits in your 32-bit calling code. A good rule of thumb is to always use
short and long for data definitions instead of int. If you are writing proxies
for third-party packages, be sure to translate any ints in the prototypes to
shorts in your proxy.
Make sure that the number of parameters in the parms[] array exactly matches
the number and type of parameters that the 16-bit function expects. Otherwise,
the "illegal instruction" gremlin will haunt you. If you do crash the 16-bit
DLL, you will probably have to reboot. Unfortunately, crashing often leaves
the 32-bit and 16-bit thunking DLLs loaded in memory.
Don't pass pointers to pointers. This will defeat the 0:32 to 16:16
translation. If your 16-bit function needs to allocate and return data, use
GlobalAlloc(), and pass back the memory handle.
Be careful with structure alignment. 32-bit and 16-bit applications align
structures on different boundaries. A good idea is to use the #pragma pack(1)
directive around the structure definitions to ensure that no boundary
alignment occurs. Also, don't pass pointers in structures because they won't
get translated.
You can always write local functions in the thunking DLL to manipulate data,
then call 16-bit imported functions. For instance, suppose you had an imported
16-bit function MyRect(HWND hwnd, RECT rect) that required a RECT structure.
Because the proxy interface does not directly support passing structures by
value, you could not call MyRect() through the proxy directly. Instead, you
could write an intermediate local function in the 16-bit thunking DLL,
_MyRect(HWND hwnd, RECT *lpRect), that calls the imported function as
MyRect(hwnd, *lpRect), passing the RECT structure by value.
Pointers can reference a maximum of 64K of data. This limitation exists
because the thunking layer maps the flat segment of a 0:32 pointer into a
one-segment 16:16 pointer. If you need to address more than 64K of data, use
GlobalAlloc() and pass the memory handle.
One last thing: The 32-bit DLL can fail to load for various reasons, but
usually for reasons related to the 16-bit DLL. Verify that the 16-bit DLL's
version stamp was updated. Also, verify that any imported 16-bit DLLs are in
the path.


The Sample Code


The PARMTEST sample application (available electronically) is an MFC
application that calls a simple 16-bit DLL named "DummyDLL." DummyDLL
implements a couple of simple functions. One displays a modal message box; the
other draws random shapes into the caller's device context. The file
dummypxy.cpp implements the proxy methods for DummyDLL. The dispatch table for
the functions is contained in the file parm16ds.c. 
Example 1: Implementing a thunking function call. (a) Adding a prototype in a
32-bit application for the 32-bit side of the thunking function; (b) adding
and exporting code for MyNew32Func() to the 32-bit thunk DLL; (c) entry for
the 16-bit function call in the thunk script; (d) adding and exporting code
for MyNew16Func() to the 16-bit DLL.
(a)
void MyNew32Func(HWND hDlg, LPSTR lpszSomeText);

(b)
void MyNew32Func(HWND hDlg, LPSTR lpszSomeText)
{
 // Call the 16 bit thunked function
 MyNew16Func(hDlg,lpszSomeText);
}


(c)
void MyNew16Func(HWND hDlg, LPSTR lpszSomeText)
{
 lpszSomeText = input;
}

(d)
void FAR PASCAL _export MyNew16Func(HWND hDlg, LPSTR lpszSomeText)
{
 // Set a dialog-box field from the 16-bit side. SetDlgItemText() needs a
 // control ID; IDC_TEXT here is a placeholder for the target control.
 SetDlgItemText(hDlg, IDC_TEXT, lpszSomeText);
}
Example 2: (a) Defining a proxy in a 32-bit application. (b) Adding the
dispatch table in the 16-bit thunk DLL; (c) adding code for MyNewFunc().
(a)
LONG CMyProxy::MyNewFunc (HWND hDlg, LPSTR lpszSomeText)
{
 unsigned long ulError;
 BYTE parms[] = {DTC_HWND,DTC_PTR,DTC_END};
 Invoke16Func(0x01,&ulError,hDlg,lpszSomeText);
 return((LONG) ulError);
}

(b)
BEGIN_DISPTABLE
 DISP_FUNCTION(0x01,MyNewFunc)
END_DISPTABLE

(c)
LONG FAR PASCAL _export MyNewFunc(HWND hDlg, LPSTR lpszSomeText)
{
 MessageBox(hDlg,lpszSomeText,"Hello from 16 Bits!",MB_OK);
 return(0L);
}






Tuning Java Performance


Fast execution for a dynamic language




Paul Tyma


Paul, who is a PhD candidate in computer engineering at Syracuse University,
is president of preEmptive Solutions, an Internet technologies company. He can
be reached at ptyma@preemptive.com.


Although the Java programming language has opened up new dimensions in the
world of programming, it's also uncovered some new challenges. Given that Java
is platform independent and interpreted, writing code that performs well is no
longer cut-and-dried. Java programmers need to focus their optimization
efforts on a higher level, independent of architectural idiosyncrasies. 
In this article, I'll examine how compiled Java runs, then present techniques
for speeding things up. The coding guidelines I'll present will perform well
on any platform. As always, your first consideration in getting the best
performance out of your code should be your choice of algorithms and data
structures. Binary searches, quick sorts, and hash tables offer great benefits
in the right situations. 
The data in this article were recorded from running Java on several different
machines. The optimizations I discuss do not assume any platform-specific
features. For the most part, these are well-published techniques, but focused
for Java.


What to Optimize?


Your greatest performance gains will come from speeding up the most-used code,
such as highly iterated loops and popularly called methods. Focusing
optimization on these areas will give you the best gain-for-effort ratio.
One of Java's greatest selling points is its architectural independence, which
it accomplishes by compiling its code into its own intermediate
representation, not any specific machine language. Though it's typical for
compilers to generate intermediate code and then create actual machine code,
Java doesn't go that far. Instead, it leaves its executable in this
intermediate form.
Java's intermediate representation is made up of instructions termed
"bytecodes." The individual bytecodes look much like the assembly language of
any given machine. The intermediate representation (and, subsequently, the
Java virtual machine) is designed as a stack-based architecture. Most people
who know the assembly language of a modern machine are used to register-based
machines. In a stack-based machine, there are no registers. There is a stack
where you can push, pop, and manipulate data, but that is the extent of the
design.
Another point to consider in deciding where to optimize is that Java bytecodes
are currently interpreted. The future promises dynamic ("just-in-time")
compilers that jump in just as a Java program is executed to compile the
bytecodes into the target machine code. The final result is that the Java
program is running the same as any other compiled program (with a short pause
at the start while the dynamic compiler works its magic). Unfortunately, these
compilers might not be widely available for a long time. So for now, you're
forced to live with interpretation. 
A significant portion of interpretation time is spent dealing with overhead.
Instruction execution requires the following steps:
Fetch the instruction.
Decode the instruction.
Fetch the arguments of the instruction.
Execute the instruction.
These steps are there whether it's a CPU executing its machine code or the
Java interpreter executing its bytecodes. The difference is that in an
interpreter, the first three steps are overhead. For a CPU executing machine
code, much of the work of the first three steps is soaked up in the pipe of
modern superscalar processors. 
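The four steps can be made concrete with a toy interpreter. The sketch below is illustrative only (a hypothetical three-opcode machine, not the real Java VM): the fetch and decode work at the top of the loop repeats for every single instruction, which is exactly the overhead a hardware CPU hides in its pipeline.

```java
// Toy stack-machine interpreter: a hypothetical three-opcode design,
// not the real JVM. Steps 1-3 (fetch, decode, fetch arguments) repeat
// for every instruction and are pure overhead; only step 4 does work.
public class MiniInterp {
    static final int PUSH = 0, ADD = 1, HALT = 2;

    static int run(int[] code) {
        int[] stack = new int[16];             // the operand stack: no registers
        int sp = 0, pc = 0;
        while (true) {
            int op = code[pc++];               // 1. fetch the instruction
            switch (op) {                      // 2. decode it
                case PUSH:
                    stack[sp++] = code[pc++];  // 3. fetch argument; 4. execute
                    break;
                case ADD: {
                    int b = stack[--sp];       // 4. execute: pop two, push sum
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case HALT:
                    return stack[--sp];        // result is on top of the stack
            }
        }
    }
    public static void main(String[] args) {
        // "Bytecode" for the expression 2 + 3
        System.out.println(run(new int[] {PUSH, 2, PUSH, 3, ADD, HALT}));
    }
}
```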
There is no real way of removing this overhead for each instruction, short of
dynamic compilation. Because of Java's architectural independence, we can't
rely on the performance tricks of any given architecture. You don't know if
the target processor has an on-chip cache, what its jump-prediction algorithms
are, or anything else. Even the CISC concept of expensive versus inexpensive
instructions doesn't carry over very well: Interpretation tends to smooth over
individual instruction costs. Given these uncertainties, the best solution is
to simply reduce the total number of instructions that are executed. 
This doesn't necessarily mean to reduce the size of your code. Small bits of
code can certainly loop themselves into painfully slow performance, but we're
more interested in decreasing the number of executed instructions wherever
possible--while still achieving the desired output. All this talk has been at
the level of interpreters, but don't fret--even when dynamic compilation comes
around, it too will benefit from fewer instructions to compile and run.
Many general programming-language optimization rules apply to Java. The int
data type is faster than long (as is float compared to double). A long is
twice the size of an int, making accesses comparably slower. In addition,
Java's int data type is already bigger than some programmers are used to (it
is 32 bits, whereas 16 bits is common for many PC C compilers). So, wherever you
can, keep things down to int and float.
In general, iterative code is faster than equivalent recursive code. This is
true in any language, since method calls are expensive.
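As a simple illustration (hypothetical helper methods, not from the listings), both versions below compute the same sum, but the recursive one pays method-call overhead for every element:

```java
public class SumDemo {
    // Recursive: one method call, and one stack frame, per element.
    static long sumRecursive(int n) {
        return (n == 0) ? 0 : n + sumRecursive(n - 1);
    }
    // Iterative: same result with no call overhead inside the loop.
    static long sumIterative(int n) {
        long total = 0;
        for (int i = 1; i <= n; ++i)
            total += i;
        return total;
    }
    public static void main(String[] args) {
        System.out.println(sumIterative(1000)); // same as sumRecursive(1000)
    }
}
```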


Use the Best Algorithm, Use the Best Data Structure


The class file (bytecodes) that results from compiling Java programs is
amazingly small. Even intensive graphical programs will often be only a few
kilobytes, because accesses to Java's API library rely on finding the classes
on the machine executing the program. For example, drawing a line actually
calls code in the java/classes directory. Unfortunately, calls to system
functions (drawing graphics, opening files, and the like) are largely a
crapshoot. Often when optimizing, you are (transparently) leaving the
beautiful, architecture-independent world of Java and implicitly executing
native code. For our interests, it's important to note that you can't count on
the speed of such operations for different architectures. Drawing a line may
be swift on some machines but a painful operation on others. In general,
performance of code that is heavily dependent upon API calls is largely out of
your hands.
Squeezing performance out of object-oriented languages has never been easy.
For one thing, object-oriented programming encourages reusable classes. From a
grand code-development standpoint, the ability to reuse code for future
projects increases long-term programmer productivity. However, generic classes
usually require many conditional statements to support a relatively generic
set of input. This level of support increases code size and costs processing
time.
Consider Java's prebuilt stack class. This class accepts any object to be
pushed on the stack. Certainly, that encompasses a broad usage. However, often
you only need to stack something as mundane as integers. The built-in stack
class will do this--at least after you perform conversions of your integers to
and from object status. To achieve better performance, you would be better off
creating a stack class specifically designed to stack integers.
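A minimal sketch of such a class (IntStack is a name invented here, not part of the Java API): it stores ints directly, so no value is ever converted to or from object status.

```java
// A stack specialized for int values. Unlike the generic stack class,
// no conversion of each int to an object is needed on push or pop.
public class IntStack {
    private int[] data;
    private int top = 0;               // index of the next free slot

    public IntStack(int capacity) { data = new int[capacity]; }

    public void push(int v) {
        if (top == data.length) {      // grow the array when full
            int[] bigger = new int[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, top);
            data = bigger;
        }
        data[top++] = v;
    }
    public int pop()       { return data[--top]; }
    public boolean empty() { return top == 0; }

    public static void main(String[] args) {
        IntStack s = new IntStack(2);
        s.push(1); s.push(2); s.push(3);   // third push forces a growth
        int sum = 0;
        while (!s.empty())
            sum += s.pop();
        System.out.println(sum);           // prints 6
    }
}
```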


Inlining


Inlining is everybody's favorite optimization technique, and for good
reason--performance gains can be dramatic. Good program design usually
involves relatively fine-grained modularity. With increasing numbers of
discrete methods comes increasing numbers of method calls. From a readability
and design standpoint, that's great: Discrete tasks in a class are logically
placed in their own methods. Unfortunately, from a performance standpoint,
method calls are expensive. Depending upon the implementation, this involves
saving the current state, performing a jump (which may entail moving to
currently uncached memory), and keeping track of where to return. 
Again, interpreted environments are hit with extra instructions that don't
actually work towards performing the goal of the program but instead deal with
overhead. Object-oriented programming often exacerbates the problem by
providing countless accessor methods within a class to get (effective)
read-only access to its private variables. Accessor methods are nothing more
than a variable access (a theoretical single memory read) but still incur all
the calling overhead. To overcome this problem, compilers often attempt to
inline methods--in effect, placing the method body at the call site and
ignoring any semblance of an actual subroutine call. Listing One is part of a
class intended to store and manipulate matrices.
The tester class creates two matrix_work objects and multiplies them together.
The matrices are large enough to expose the effect of
optimizations. The A.multiply(B) statement took several seconds on all tested
machines. For future comparisons, I'll use the relative run time of 5000
milliseconds.
Getting the javac compiler to inline methods requires two steps. First, it
will only inline static, private, or final methods, so you won't always be
able to ensure inlining--that is, a private accessor method is only so useful.
Second, you need to specify the -O option when compiling. After testing (and
debugging) is done, there is rarely a good reason not to compile your classes
with this option. For Listing One, I inlined the three accessor methods by
making them final, as in Example 1. This brought the run time from 5000
milliseconds to approximately 3500. Across all architectures tested, the
savings was at least 25 percent. Considering that adding the final modifier to
these methods is a trivial act, it's certainly worth the effort. The change
in the bytecode generated was evident, as a comparison between Listing Two and
Listing Three shows. The only change is from a method invocation to a getfield
(that is, an instance variable access). 
Declaring methods as static or private also allows inlining where applicable.
According to post-inlined bytecodes, the A object is accessing private
variables of the B object. The compiler runs its checks and ensures that your
code does not violate any access restrictions. After that, it takes whatever
liberties it can to create faster code. To put it another way, once the
compiler is sure you've followed the rules, it knows where it can safely break
them.
You can also inline values by declaring variables as final. In general, this
is analogous to utilizing the preprocessor in C to replace some string by a
value (specified via the #define macro). This technique improves readability
and maintainability, and saves a memory access at run time.
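For instance (a hypothetical fragment, analogous to a C #define):

```java
public class Geometry {
    // A final value: the compiler can substitute the constant directly
    // into the bytecode, saving a field access at run time.
    static final double TWO_PI = 2.0 * Math.PI;

    static double circumference(double radius) {
        return TWO_PI * radius;        // no memory read for TWO_PI needed
    }
    public static void main(String[] args) {
        System.out.println(circumference(1.0));
    }
}
```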



Synchronized Methods


Synchronization of methods is applicable when using multithreading, and it is
effectively how Java utilizes the operating-system concept of monitors.
Declaring a method as synchronized requires the method to obtain a lock before
it is permitted to execute. If another method currently has that lock, the
first must wait for the lock until it is available. 
This level of synchronization is key in assuring the protection of critical
sections of code. However, it is important to remember that every object and
every class has exactly one lock. Therefore, if a thread has the lock for a
given object, then no other thread may enter any of the synchronized methods
within that object. If a thread must needlessly wait for a lock, you may want
to redesign your class. To ensure data integrity in Example 2, it is important
that not more than one thread is accessing any one account at a time.
Logically, there is no reason one thread can't deposit into checking while
another thread is withdrawing from savings (assuming these are separate
accounts). With the aforementioned code, for a given banking object, that is
not possible. The problem is that as soon as any thread enters one of the
methods, it obtains a lock for the entire myMoney object, generating an
unnecessary performance hit. 
There are several solutions to this dilemma. First, consider whether the class
is designed with logical contents. For the aforementioned example, it would
likely have made more sense to have two separate classes, a savings account
class and a checking account class. Another solution would be to not declare
methods as synchronized. Protection of critical sections is vital, but overuse
hits performance and increases the chance of deadlock. 
Besides the possibility of uselessly blocking a thread, the overhead generated
by synchronizing a method is significant. Tests have shown that a noninlined,
unsynchronized accessor method can run four times faster than a synchronized
one. In addition to slowing running time, declaring a method as synchronized
removes the compiler's ability to inline that method.
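The trade-off is easy to see in a class sketch (hypothetical, not from the listings): the two accessors below return the same value, but the synchronized one acquires the object's lock on every call and cannot be inlined.

```java
public class Counter {
    private int count = 0;

    public void increment() { ++count; }

    // Acquires this object's lock on every call; cannot be inlined.
    public synchronized int getSynchronized() { return count; }

    // Lock-free, and final, so javac -O may inline it at call sites.
    public final int getPlain() { return count; }

    public static void main(String[] args) {
        Counter c = new Counter();
        c.increment();
        c.increment();
        System.out.println(c.getSynchronized() == c.getPlain()); // same value
    }
}
```

Use the synchronized form only where more than one thread can actually touch the data.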
To bring synchronization down to a finer level, you can specify objects to be
used as synchronization objects. Example 3 is a slightly modified version of
the code in Example 2. The synchronizations do not lock the myMoney object;
they lock the checking and savings objects. Again, it would be more desirable
to split this type of class in two. However, in some situations that may not
be possible and the use of synchronization objects can be useful.


Code Motion


The JDK Beta 2.0 version of the compiler surprisingly did not support code
motion; see Example 4(a). 
Many contemporary compilers realize the value-2 expression has no dependencies
within the loop. Therefore, a compiler could calculate value-2 prior to loop
execution and use the result in the conditional with g (as a temporary
variable). Unfortunately, in the javac compiler, the expression value-2 is
calculated for each iteration of the loop even though the result is the same
every time. In Example 4(a), the effect is not very detrimental. But it should
be obvious that complex operations in that situation could needlessly waste
processing time. Eliminating this type of transgression is nearly free and can
be done with a temporary variable; see Example 4(b).
The practice of eliminating redundant calculations can be applied to array
indexing. Every time you access an array, the system must determine the
indexed memory location--that is, an implicit calculation. On top of that,
Java's VM does array range checking. Although run-time array bounds checking
has become a well-optimized science, its cost can still be noticeable.
Wherever you can, eliminate redundant array references. Looking back to
Listing One, there are a significant number of redundant array index
calculations embedded in the multiply method. Listing Four shows a new
multiply method exhibiting (somewhat overzealous) array index calculation
movement.
The temprow array is assigned to a row of the matrix array. For all
processing past this point, the code will be interested only in one row of
matrix, so pointing to the row directly will save having to follow the row
pointer each time. Additionally, a temporary variable is used to accumulate
the values and is assigned to the T array location after the loop. This way,
calculating the index into T is done only once instead of for each iteration
of the k loop (i and j do not change inside the k loop). 
Recall that inlining the accessor methods brought the run time down from a
relative value of 5000 milliseconds to approximately 3500. Implementation of
the code change in Listing Four brought the run time down to 2400
milliseconds. Inlining and code motion (array index calculation motion, to be
specific) reduced run time by over 50 percent. As a parting triviality, you'll
also notice that the loop variable declarations (variables i, j, and k) were
moved. The Java VM peculiarly treats the first three local variables of a
method (parameters are counted first) slightly differently than all subsequent
ones. Several bytecode instructions are tailored specifically for the first
three locals, providing a tiny performance improvement (a 2 percent speed
increase was measured for Listing Four). 


Conclusion


Given Java's close relationship to C, many of the same high-level
optimizations pay off. Java's architecture neutrality forbids using CPU tricks
to speed up code. No assumptions are allowed. Subsequent generations of Java
compilers and interpreters (and dynamic compilers) are likely to improve
performance all on their own. Java code that relies heavily on
system-dependent API calls is destined to be unpredictable. Good
object-oriented design and performance coding have always had their conflicts.
This is not to say that good design can't be fast--it just takes a smart
programmer and compiler. 
Example 1: Inlining accessor methods.
public final int getrows() { return rows; }
public final int getcols() { return cols; }
public final int getcoord(int r, int c) { return matrix[r][c]; }
Example 2: Ensuring data integrity.
class banking {
 synchronized public void deposit_checking() {...}
 synchronized public void withdraw_checking() {...}
 synchronized public void deposit_savings() {...}
 synchronized public void withdraw_savings() {...}
 }
// instantiated elsewhere with
 banking myMoney = new banking();
// multiple threads can now access myMoney
 .
 .
Example 3: Modified version of Example 2.
class banking {
 Object savings = new Object();
 Object checking = new Object();
 public void depositChecking() {
 synchronized(checking) {...}}
 public void withdrawChecking() {
 synchronized(checking) {...}}
 public void depositSavings() {
 synchronized(savings) {...}}
 public void withdrawSavings() {
 synchronized(savings) {...}}
 }
Example 4: (a) The JDK Beta 2.0 version of the compiler does not support code
motion; (b) using a temporary variable to improve performance.
(a) for (g=0;g<value-2;++g) {
 x[g] = g*2;

 }
(b) int temp = value-2;
 for (g=0;g<temp;++g) {
 x[g] = g*2;
 }

Listing One
/*============ Matrix Work class ==============*/
 class matrix_work {
 private int matrix[][];
 private int rows,cols;
 matrix_work(int r, int c) {
 rows = r;
 cols = c;
 matrix = new int[rows][cols];
 populate();
 }
 public int getrows() { return rows; }
 public int getcols() { return cols; } 
 public int getcoord(int r, int c) { return matrix[r][c]; }
/* Matrix multiplication - returns number of elements in new matrix */
 public int multiply(matrix_work B) {
 if (cols != B.getrows()) throw new badMultException();
 int numels = rows * B.getcols();
 int T[][] = new int[rows][B.getcols()];
 int i,j,k; 
 for (i=0;i<rows;++i) {
 for (j=0;j<B.getcols();++j) {
 T[i][j] = 0;
 for (k=0;k<cols;++k) {
 T[i][j] += matrix[i][k] * B.getcoord(k,j);
 }
 }
 }
 matrix = T;
 cols = B.getcols();
 return numels;
 }
/* Populates the matrix */
 public void populate() { /* ..enlightening population code here */ }
 }
/* ============= Testing class =========== */
 class tester {
 public static void main (String args[]) {
 matrix_work A = new matrix_work(80,40);
 matrix_work B = new matrix_work(40,65);
 int numels = A.multiply(B);
 }
 }

Listing Two
Method int getcols()
 0 aload_0
 1 getfield #19 <Field matrix_work.cols I>
 4 ireturn
 . 
 .
 .
 90 iinc 5 1
 93 iload 5

 95 aload_1
 96 invokevirtual #20 <Method matrix_work.getcols()I>
 99 if_icmplt 35
 .
 .
 .

Listing Three
Method int getcols()
 0 aload_0
 1 getfield #13 <Field matrix_work.cols I>
 4 ireturn
 .
 .
 .
 92 iinc 5 1
 95 iload 5
 97 aload_1
 98 getfield #13 <Field matrix_work.cols I>
 101 if_icmplt 35
 .
 .
 .

Listing Four
public final int multiply(matrix_work B) {
 if (cols != B.getrows()) throw new badMultException();
 int j,k,i;
 int numels = rows * B.getcols();
 int T[][] = new int[rows][B.getcols()];
 int temp,temprow[];
 for (i=0;i<rows;++i) {
 temprow = matrix[i];
 for (j=0;j<B.getcols();++j) {
 temp = 0;
 for (k=0;k<cols;++k) {
 temp += temprow[k] * B.getcoord(k,j);
 }
 T[i][j] = temp;
 }
 }
 matrix = T;
 cols = B.getcols();
 return numels;
 }






Using the Parallel Adapter as a Host Interface Port


New uses for old tools




Dhananjay V. Gadre


Dhananjay is a scientific officer for the instrumentation laboratory at the
Inter-University Centre for Astronomy & Astrophysics (IUCAA) in Pune, India.
He can be contacted at dvg@iucaa.ernet.in.


There's a world of difference between cameras used to collect astronomical
image data and those we use for home videos. Personal video cameras typically
provide 500x500-pixel resolution, but astronomical imaging cameras usually
offer 2048x2048 resolution, while emerging state-of-the-art cameras called
"mosaic CCDs" provide 8096x8096 resolution. With data-acquisition cameras, the pixel
voltage is almost always encoded to 16 bits. Consequently, each frame of data
can range in size from 8 MB to 12 MB. 
In a recent "star-gazing" project involving data acquisition for
astronomy-related applications, I needed to connect a high-performance, charge
coupled devices (CCD) camera controller to a PC. Eventually the controller
would be connected to the PC via a high-speed fiber-optic link. For
prototyping and diagnostic purposes, however, I decided to use the PC's
parallel-printer port as a link to the camera controller. (The RS-232
interface was immediately ruled out because the required data-transfer rate,
more than 20 KB/sec, was too high for it. Nor, for that matter, did I want to
invest in a dedicated async transmission circuit on the controller.) In this article,
I'll describe the link between a PC and the controller using the
parallel-printer adapter. While this discussion is based on a digital-signal
processor (DSP), the design is flexible enough to accommodate virtually any
microprocessor or controller, even an 8051.
The camera controller is built around a fixed-point DSP from Analog
Devices--the ADSP-2101. Several features make this processor an ideal camera
controller: It is fast, executes any instruction in one clock cycle, has
zero-overhead looping, has on-chip program and data memory, and has two
buffered synchronous serial ports capable of transmission rates up to 5
Mbits/sec.
As a camera controller, the DSP helps acquire CCD images. The image parameters
are set by the user through a host computer. These parameters define the
exposure time, size of the CCD image, pixel and row binning, and so on. To do
this, the ADSP-2101 performs the following:
Receives commands from the host.
Waits for the signal-integration time.
Generates CCD clock waveforms to shift out each pixel signal.
Reads CCD pixel signal voltage through an analog/digital converter (ADC).
Transfers the ADC data over a suitable link to the host.
Figure 1 is the block diagram for the camera controller. The serial link
between the host and the controller is implemented with a high-speed,
fiber-optic link.
The components of the controller are: 
A backplane bus that carries interconnections between the various cards of the
CCD controller.
An ADSP-2101 processor card to implement a programmable-waveform generator,
optional host-communication link (using the synchronous serial interface of
the DSP), serial ADC interface, and backplane bus interface to connect the
other components. The waveform generator is a crucial component of the CCD
controller. Having a programmable-waveform generator allows users to operate
the CCD camera in a variety of modes by downloading a new waveform-description
table from the host in real time. 
A high-speed (50 Mbits/sec) serial link using a TAXI chipset that interfaces
to an 850-nm fiber-optic physical link. This card connects to the backplane
bus. The TAXI chipset receives 8-bit-wide characters from the DSP card to be
transmitted on the fiber link to the host. Characters received from the host
are read by the DSP, eight bits at a time.
A temperature controller, shutter driver, and telemetry card connect to the
backplane bus. This card has a proportional-control temperature controller to
maintain the CCD chip temperature, which is set by a preset potentiometer. A
voltage multiplier using a high-frequency transformer charges a reservoir
capacitor that discharges into the shutter through a FET switch when the
shutter is to be operated. The DSP controls the voltage multiplier and shutter
operation. A multichannel, 12-bit serial ADC reads chip temperature, dewar
temperature, shutter status, and so on. This card also has stepper-motor
drivers for controlling two stepper motors.
An analog-signal processor (with double-correlated sampling) and serial ADC
card. This card also connects to the backplane. The ASP circuitry receives the
CCD pixel voltage from the backplane bus. This voltage is encoded by the
16-bit serial ADC. The serial-ADC-control signals on the backplane bus are
derived from the serial port of the DSP.
A CCD clock bias and driver card that uses the majority of waveform signals
(generated by the DSP card) present on the backplane bus. These signals pass
through appropriate line drivers before being filtered on the backplane card.
The CCD clock levels are referenced by the bias voltage generator on this
card. The bias voltages are generated by 24 DACs initialized and controlled by
the DSP.


Testing the Camera Controller


The most important components of the controller are the DSP processor card,
bias/clock-driver card, and signal-conditioning/ADC card. Prototyping a CCD
controller begins with these functions. However, testing the functions and the
performance of the CCD controller require that the host computer send
parameters and receive encoded data.
A host interface port (HIP) to connect the controller hardware to the PC via
the parallel-printer adapter simplifies evaluation of the ADC performance and
allows testing the waveform-generator algorithm and noise characteristics of
the signal-conditioning circuit. Bidirectional data transfer over the PC
parallel-printer port is fast enough to support prototyping. Also, the
parallel-port interface can be built from common latches, buffers, and
decoders. At the software level, only device-level
routines need to be modified, as the final optical link is a parallel port at
the host end.


The Parallel-Printer Adapter


The printer adapter has three independent TTL-compatible ports:
The data port, an 8-bit output port.
The control port, a 4-bit output port with open collector drivers. 
The status port, a 5-bit input port.
The data and control output ports have input registers at the same address
that can be read back to get the latched value. The control-port outputs have
open collector drivers, typically with 4.7-Kohm pull-up resistors. The control
port can also be used as an input port, by writing 1s to the control register
and then performing a read operation. One of the status-port pins can be
connected to one of the PC IRQs, under control of the control register. The
port addresses are stored in the BIOS data area (0040:0008 for LPT1) of the PC.
Table 1 shows the pin position of the adapter's three ports on the 36-pin
centronics connector (toward the printer end) and the DB-25 connector (on the
PC), the corresponding bit location in a byte-wide register, and the logic
relation between the register contents and the output pin. A True relation
means that if the register contains 1, the output pin also is at 1. An Invert
relation indicates that if the register bit contains 1, the output pin is at
logic 0.
The parallel-printer adapter can be converted into a simple HIP using the
circuitry in Figure 2. (Figure 3, on the other hand, shows how to connect the
HIP to an 8051-based circuit.) The port can be used to connect virtually any
processor to the PC for bidirectional data transfer. A conservative estimate
of the data-transfer rates between a 486/66-MHz PC and the CCD controller is
in the range of 20-50 KB/sec. The test program uses routines in Listing One
for communication. Suitable programs are written for the DSP on the CCD
controller. Coding parts of the routines in Listing One in assembler can
significantly increase data-transfer rates.
To convert the parallel-printer adapter into a host interface, the host uses
the data port to transmit eight bits of data to the application. A flip-flop
U6-A is reset to indicate to the application hardware that a data byte is
available. This flag could also be used to interrupt the application hardware.
In my case, the DSP application hardware reads the data bits through its input
port buffer U5 and sets up the flag U6-A. The host program monitors U6-A
before transmitting a new byte.
To receive a byte from the application hardware, the host monitors flag U6-B.
If it is reset, a byte is ready for it. Reading this byte is tricky, as the
parallel-printer adapter can read only five bits at a time, so a set of
tristate buffers U1 and U2 are used. U1 allows eight bits at its input and
transmits only four of these inputs to the output. Nibble control pins 1G and
2G on U1 and U2 are controlled by the decoder outputs of U4 to determine which
four of the 16 possible inputs are connected to the output pins. The four
output pins of U1 and U2 are connected to the status-port bits.
The host program manipulates the decoder U4 to enable the lower four bits of
the incoming byte to reach the status port. The status port is then read and
its contents stored temporarily away. The decoder is then manipulated to read
the upper four bits of the byte into the status port. The actual byte is
reconstructed by shifting the first status-port read four bits to the right
and bitwise ORing with the second status-port read result. The seventh and
third bits are complemented to reconstruct the actual byte.
Thereafter, flag U6-B is set to indicate to the application hardware that the
current byte has been read by the host.
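The nibble-by-nibble reconstruction just described can be sketched as a pure C function. The function name is mine; the two arguments stand for the raw values of the two decoder-selected status-port reads, and the final XOR with 0x88 models the complementing of bits 7 and 3 described above (presumably because the status port's hardware-inverted line carries one bit of each nibble).

```c
#include <assert.h>

/* Reassemble a byte from two status-port reads, as described in the text.
   'first' carries the lower nibble of the incoming byte in its top four
   bits; 'second' carries the upper nibble in its top four bits. */
unsigned char reassemble(unsigned char first, unsigned char second)
{
    unsigned char lo = first >> 4;    /* first read, shifted right four bits */
    unsigned char hi = second & 0xf0; /* second read supplies the upper nibble */
    return 0x88 ^ (hi | lo);          /* join nibbles, complement bits 7 and 3 */
}
```

Running the transformation backward gives test vectors: for a target byte of 0x5A, the two raw reads would be 0x20 and 0xD0.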
A note of caution here: On the host side, reading the eight bits of incoming
data and setting up the flag is not atomic; the host program sets the flag by
executing several instructions. On my 486/66 PC, this takes about 5 ms, so
for error-free transmission, the application hardware should wait, say, 10 ms
after the host has read a byte before transmitting a new one. No such
precaution is required for transfers from the host to the application
hardware, as reading the byte and setting up the flag U6-A is atomic.


HIP Software 



Listing One is a sample program that communicates bidirectionally through the
HIP. The program contains four routines that help regulate data flow.
(Additional support software is available electronically; see "Availability,"
page 3.)
The function rxstat() reads the status of U6-B. U6-B is reset by the
controller to indicate that a byte has been latched into U3. rxstat() is used
by the main program to detect the presence of a new byte before reading the
byte.
Function txstat() returns the status of U6-A. When the host program transmits
a byte to the controller, it resets U6-A. After the controller reads the byte,
U6-A is set by the controller. The host program uses txstat() to ensure that
the previously transmitted byte has been read by the controller.
The functions rx() and tx() receive and transmit a byte, respectively.
Function rx() reads the byte sent by the controller (as indicated by a logic 0
of flag U6-B). After the byte is read, rx() sets U6-B to logic 1. Function
tx() resets flag U6-A after a byte is latched into the data-port register of
the printer adapter.
The main program reads the status of the printer-adapter ports and transmits a
sequence of 256 bytes to the DSP application circuit. The application circuit
echoes back the received bytes. This is a good test for missing bytes. 
It is possible to add interrupt capability to the HIP. The interrupt signal
is derived from pin 10 of the 25-pin D-shell (or 36-pin Centronics)
connector, which is connected to bit 6 of the status port. A logic 1 on
control-port bit 4 enables pin 10 to interrupt the PC. The interrupt signal
can be derived from either U6-A or U6-B. Besides an interrupt service
routine, however, this would require modifications to the current HIP design.
At today's PC speeds, standard printer cables can pick up glitches on the
control-port lines, so I'd advise using a cable with a twisted pair of wires
for each signal.


References


Gadre, Dhananjay V. "Parallel-Printer Adapters Find New Use as Instrument
Interfaces." EDN (June 22, 1995).
Gadre, Dhananjay V., Pramod K. Upadhyay, and Vijaya S. Varma. "Choosing the
Right Bus, Part II: Using the Parallel Printer Adapter as an Inexpensive
Interface." Computers in Physics (January/February 1994).
Hook, Brian and Dennis Shuman. "Digital I/O with the PC." Dr. Dobb's Journal
(April 1994).
Smith, Roger. "Programmable Digital Waveform Generator." Microprocessors and
Microsystems (April 1989).
Webb, Steve. "Two Chip Video Digitizer." Electronics World + Wireless World
(June 1995).
Figure 1: CCD camera-controller block diagram.
Figure 2: ADSP-2101-based HIP design.
Figure 3: 8051-based HIP design.
Table 1: Pin number for the bits of the three printer-adapter ports. *
Indicates that the pin-to-bit relation is inverted: A logic 0 at the pin
corresponds to a logic 1 in the corresponding register. The $ entry at bit 4
in the control-port register is the interrupt-enable bit. 

Listing One
/* Connect PC through the printer adapter to the DSP-based CCD controller.*/
/* Dhananjay V. Gadre */
#include<stdio.h>
#include<conio.h>
#include<dos.h>
#include<process.h>
#include<time.h>
/* external variables */
extern unsigned dport, sport, cport;
/* external routines. gets addresses of 3 ports from the DOS data RAM */
extern void set_lpt_base_address(int);
/* status port */
#define pin_11 0x80
#define pin_10 0x40
#define pin_12 0x20
#define pin_13 0x10
#define pin_32 0x08
/* control port */
#define pin_17 0x08
#define pin_16 0x04
#define pin_14 0x02
#define pin_1 0x01
/* op & ip semaphores */
#define ip_buffer_flag 0x04
#define ip_buffer_Flag 0xfb
/* this flag is on bit 2 (pin 16) of the control port
and is set by a logic low on pin 16 */
#define op_latch_flag 0x08
#define op_latch_Flag 0xf7
/* this flag is set by pulsing a low on pin 17 (bit 3) of the control port. 
SET condition of this flag indicates that the oplatch contains a new byte */
/* local routines */
unsigned char txstat(void); 
/* nonzero if the o/p latch is empty and can be written to again */

unsigned char rxstat(void); 
/* zero if the i/p buffer holds a byte waiting to be read */
void tx(unsigned char); /* transmit the char to the latch */
unsigned char rx(void); /* receive a char from the buffer */
void enable_nibble(unsigned char); 
/* this function controls which nibble gets connected to status port pins */
/* txstat: This routine checks pin 13 (flag U6-A) of the printer-adapter
status port. The host RESETS U6-A when it latches a byte; the DSP SETS it
again after reading the byte, after which the latch can be written to again */
/* return value: nonzero if latch empty, 0 if the previous byte is unread */
unsigned char txstat(void)
{
char latch_status;
enable_nibble(1); /* this function connects the sport to nibble 1*/
latch_status=inportb(sport) & pin_13;
return latch_status;
}
/* rxstat: this function checks pin 12 (flag U6-B) of the status port. The
DSP RESETS U6-B when it latches a byte for the host; rx() SETS it again
after the byte has been read. */
/* return value: 0 if a byte is waiting to be read, nonzero otherwise */
unsigned char rxstat(void)
{
char buffer_status;
enable_nibble(1); /* this function connects the sport to nibble 1*/
buffer_status=inportb(sport) & pin_12;
return buffer_status;
}
/* tx: This routine latches a byte into the o/p latch */
/* return value: none */
void tx(unsigned char op_byte)
{
unsigned char temp;
outportb(dport, op_byte); /* latch the byte*/
/* now set up the op_latch_flag to indicate that a new byte is available */
temp=inportb(cport) & (0xff ^ op_latch_flag);
temp=temp ^ op_latch_flag; /* set bit 3: inverted pin 17 goes low */
outportb(cport, temp);
temp=temp ^ op_latch_flag; /* clear bit 3: pin 17 returns high */
outportb(cport, temp);
return;
}
/* rx: This routine reads the i/p 8 bit buffer */
/* return value: the byte read from the buffer */
unsigned char rx(void)
{
unsigned char ip_byte, temp;
enable_nibble(3); /* set the buffer to read the lower nibble */
temp=inportb(sport);
temp=temp >> 4;
enable_nibble(2); /* set up the buffer to read upper nibble */
ip_byte=inportb(sport);
ip_byte = ip_byte & 0xf0; /* reset lower 4 bits */
ip_byte=0x88 ^ (ip_byte | temp); /* concatenate 2 nibbles & flip bits 7 & 3 */
/* now reset the flag to indicate that the byte has been read */
temp=inportb(cport) & (0xff ^ ip_buffer_flag);
outportb(cport, temp);

temp = temp | ip_buffer_flag;
outportb(cport, temp);
return ip_byte; /* return the converted byte */
}
void enable_nibble(unsigned char nibble_number)
{
unsigned char cport_status;
cport_status=( inportb(cport) & 0xfc) ; /* clear bits 0 & 1 */
nibble_number = nibble_number & 0x03;
nibble_number = 0x03 ^ nibble_number; /* invert bit 0 & 1 */
cport_status=cport_status | nibble_number;
outportb(cport, cport_status);
return;
}
main()
{
unsigned long count;
unsigned char portval, tempp=0, tempq;
time_t t1,t2;
FILE *fp1;
int temp=1;
clrscr();
printf("\n\nFinding Printer adapter lpt%d...", temp);
set_lpt_base_address(temp);
if(dport == 0) {printf("\nPrinter adapter lpt%d not installed...", 
 temp); exit(0); }
else
{
printf("found. Base address: %xhex", dport);
portval=inportb(sport);
printf("\n\n D7 D6 D5 D4 D3 D2 D1 D0");
printf("\nStatus port value = %x %x %x %x %x X X X ", \
(portval & pin_11)>>7, (portval & pin_10)>>6, (portval & pin_12)>>5, \
(portval & pin_13)>>4, (portval & pin_32)>>3 );
portval=inportb(cport);
printf("\nControl port value = X X X X %X %X %X %X ", \
(portval & pin_17)>>3, (portval & pin_16)>>2, (portval & pin_14)>>1, \
(portval & pin_1) );
portval=inportb(dport);
printf("\nData port value = %X %X %X %X %X %X %X %X ", \
(portval & 0x80)>>7, (portval & 0x40)>>6, (portval & 0x20)>>5, \
(portval & 0x10)>>4, (portval & 0x08)>>3, (portval & 0x04)>>2, \
(portval & 0x02)>>1, portval & 0x01 );
printf("\n\n\n");
}
/* set up reset states on the control port, all pins to logic 1 */
outportb(cport,0x04);
fp1=fopen("tx_rx", "w");
t1=time(NULL); /* just to log time */
for (count=0;count<256;count++)
{
while (!txstat()); /* wait till the DSP application reads the previous byte*/
tx(tempp); /* transmit a byte*/
while(rxstat()); /* wait till a byte is transmitted by the DSP */
tempq=rx(); /* byte is available, read it */
fprintf(fp1, "TX=%x, RX=%x\n ", tempp, tempq); /* store it in a file */
tempp++;
}
fclose(fp1);

t2=time(NULL);
printf("time taken = %ld secs", t2-t1);
}




























































The Harvest Object Cache


Making Internet information services scale better




Peter B. Danzig


Peter is a professor of computer science at the University of Southern
California. He can be reached at danzig@usc.edu. The software system described
here is the result of a collaboration between Anawat Chankhunthod, Chuck
Neerdaels, and Peter Danzig of USC, and Michael Schwartz and Duane Wessels of
the University of Colorado-Boulder.


As traffic on the Internet continues to grow and new kinds of information
(such as audio and video data) are transmitted across it, some of the design
limitations behind popular Internet applications become increasingly apparent.
If you've recently tried to reach a popular Web site such as Lycos or Yahoo,
for instance, you've probably encountered delays. Likewise, if your Internet
provider has added lots of new users lately, you've likely found the system
unusably slow at peak times during the day.
Although some of these problems are strictly local, others result from
nonlocal design limitations. Internet information services such as FTP,
Gopher, and the World Wide Web have evolved so rapidly that their designers
and implementors postponed performance and scalability in favor of
functionality and easy deployment. These popular services have been designed
with little regard for efficient use of network bandwidth. As an example, they
lack caching support in their core protocols.
There are a variety of approaches for addressing Net-related performance
problems. The Harvest cache, for example, is a hierarchical object cache
designed to make Internet information systems scale better. It has been in use
for two years at about 100 sites on the Net and it can function both as a
proxy-cache and as an httpd accelerator. As an httpd accelerator, the cache
works in conjunction with existing HTTP daemons (Web server software) to
increase throughput dramatically. This can be accomplished in a mostly
transparent manner. 
In this article, I'll present measurements that show that the Harvest cache
achieves an order-of-magnitude performance improvement over other proxy
caches, such as the cache used in the CERN 3.0 server software. Our results
demonstrate that HTTP is not an inherently slow protocol, but rather that many
popular implementations have ignored the sage advice to make the common case
fast.
The Harvest cache is designed to support a highly concurrent stream of
requests with minimal queueing for operating-system-level resources. This is
achieved by use of implementation techniques such as nonblocking I/O,
application-level threading, and virtual memory management.
The Harvest cache runs under several operating systems, including SunOS,
Solaris, DEC OSF/1, HP/UX, SGI, Linux, and IBM AIX. Binary and source
distributions for the cache are available from http://excalibur.usc.edu.
General information about the Harvest system, including the user's manual, is
available from http://harvest.cs.colorado.edu. A commercial version should be
available at press time.


Origins of the Harvest Cache


Hierarchical caching distributes server load away from server hot spots raised
by globally popular information objects, reduces access latency, and protects
the network from erroneous clients. High performance is particularly important
for higher levels in the cache hierarchy, which may experience heavy
service-request rates. The Harvest cache allows individual caches to be
interconnected hierarchically in a way that mirrors the topology of an
internetwork, resulting in additional efficiency increases.
In a hierarchical cache, misses at one level are passed to caches located at
higher levels, as illustrated in Figure 1. In addition to the parent-child
relationships, the cache supports a notion of "siblings" (that is, caches at
the same level in the hierarchy) provided to distribute cache server load.
Each cache in the hierarchy independently decides whether to fetch the
reference from the object's home site or from its parent or sibling caches,
using a simple resolution protocol that works as follows.
If the URL contains any of a configurable list of substrings, then the object
is fetched directly from the object's home, rather than through the cache
hierarchy. This feature is used to force the cache to resolve noncacheable
("cgi-bin") URLs and local URLs directly from the object's home. Similarly, if
the domain name of a URL matches a configurable list of substrings, then the
object is resolved through the particular parent bound to that domain.
Otherwise, when a cache receives a request for a URL that misses, it performs
a remote procedure call (RPC) to all of the siblings and parents appropriate
for that URL, asking whether the URL hits any of them. The cache then
retrieves the object from the site with the lowest measured latency.
Additionally, a cache option can be enabled that tricks the referenced URL's
home site into implementing the resolution protocol. When this option is
enabled, the cache sends a "Hit" message to the UDP echo port of the object's
home machine. When the object's home echoes this message, the cache treats it
like a hit generated by a remote cache that had the object. This option allows
the cache to retrieve the object from the home site if it happens to be closer
than any of the sibling or parent caches.
A cache resolves a reference through the first sibling, parent, or home site
to return a UDP "Hit" packet, or through the first parent to return a UDP
"Miss" message if all caches miss and the home's UDP "Hit" packet fails to
arrive within two seconds. However, the cache will not wait for a home machine
to time out; it will begin transmitting as soon as all of the parent and
sibling caches have responded. The resolution protocol's goal is for a cache
to resolve an object through the source (cache or home) that can provide it
most efficiently. This protocol is really a heuristic: a fast ping response
indicates low latency, but not necessarily high available bandwidth. We plan
to evolve toward a metric that combines both. Hierarchies as deep as three caches add
little noticeable access latency. The only case where the cache adds
noticeable latency is when one of its parents fails, but the child cache has
not yet detected it. In this case, references to this object are delayed by
two seconds, which is the length of the parent-to-child-cache timeout. As the
hierarchy deepens, the root caches become responsible for more clients. To
keep root cache servers from becoming overloaded, we recommend that the
hierarchy terminate at the first place in the regional or backbone network
where bandwidth is plentiful.
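The resolution heuristic above can be modeled as a small selection function. This is a toy model of the decision rule, not the Harvest wire protocol: replies are assumed to arrive in latency order, the first "Hit" wins outright, and if every cache misses, the request falls through to the first parent (or, lacking parents, the object's home).

```c
#include <assert.h>
#include <string.h>

enum kind { SIBLING, PARENT, HOME };

struct reply {
    const char *name;
    enum kind   from;
    int         hit;    /* 1 = UDP "Hit", 0 = UDP "Miss" */
};

/* replies[] is ordered by arrival time (lowest measured latency first) */
const char *resolve(const struct reply *replies, int n)
{
    int i;
    for (i = 0; i < n; i++)          /* first source to answer "Hit" wins */
        if (replies[i].hit)
            return replies[i].name;
    for (i = 0; i < n; i++)          /* all missed: first parent "Miss" */
        if (replies[i].from == PARENT)
            return replies[i].name;
    return "home";                    /* no parents: fetch from the source */
}
```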
Our trace-driven simulation study of Internet traffic in 1993 showed that
hierarchical caching of FTP files could eliminate half of all file transfers
over the Internet's WAN links. Other studies seem to arrive at different
conclusions. For example, both "Long-term Caching Strategies for Very Large
Distributed File Systems," by Rafael Alonso and Matthew Blaze (Proceedings of
the USENIX Summer Conference, June 1991), and "Multi-level Caching in
Distributed File Systems, Or Your Cache Ain't Nuthin' but Trash," by D. Muntz
and P. Honeyman (Proceedings of the USENIX Winter Conference, January 1992),
show that hierarchical caches can, at best, achieve 20 percent hit rates and
cut server workload in half. We believe the different conclusions reached by
these studies are a result of examining different kinds of workloads. Our study
traced wide-area FTP traffic from a switch near the NSFNET backbone. In
contrast, the other studies analyzed LAN workstation file-system traffic.
Because LAN files rarely change over, say, a five-day period, the other
studies found little value in hierarchical caching over flat-file caches at
each workstation.
In contrast to workstation file systems, applications such as FTP, WWW, and
Gopher facilitate read-only sharing of autonomously owned and rapidly evolving
object spaces. We found that over half of NSFNET FTP traffic is due to sharing
of read-only objects, and since Internet topology tends to be organized
hierarchically, that hierarchical caching can yield a 50 percent hit rate and
can reduce server load dramatically. Claffy and Braun reported similar
statistics for Web traffic, which has displaced FTP traffic as the largest
component of Internet packets.


An HTTPD Accelerator


As an httpd accelerator, the Harvest cache pretends to be a site's primary
HTTP server (running on well-known port 80). This faux server forwards
references that are not present in the cache to the site's real HTTP server,
which is attached to port 81. References to cacheable objects such as HTML
pages and GIF images are served by the Harvest cache; references to
noncacheable objects, such as queries and cgi-bin programs, are served by the
true httpd daemon on port 81. If a site's workload is biased toward cacheable
objects, this configuration can dramatically reduce the site's Web workload.
This configuration is easily deployed and does not require major changes to
existing software or document contents. 
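The accelerator's routing decision reduces to a cacheability test on the URL. The sketch below follows the two noncacheable cases the text names, queries and cgi-bin programs; real Harvest configurations are more elaborate, and the function name is mine.

```c
#include <assert.h>
#include <string.h>

/* Port 80 front end: serve cacheable objects from the cache, forward
   everything else to the real httpd listening on port 81. */
int served_by_cache(const char *url)
{
    return strstr(url, "cgi-bin") == NULL   /* cgi-bin programs: forward */
        && strchr(url, '?') == NULL;        /* queries: forward */
}
```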
While the benefit of running an httpd accelerator depends on a site's specific
workload of cacheable versus noncacheable objects, the httpd accelerator
cannot degrade a site's performance. Further note that objects that don't
appear to be cacheable at first glance can be cached with some slight loss of
transparency. For example, if a site is providing stock quotes or sports
scores, and the workload becomes overwhelming, one can decide to have these
objects cached also (this assumes that your user population accepts
information being out-of-date by 30 seconds or so).
The Harvest cache supports three access protocols: encapsulating,
connectionless, and proxy-http. The encapsulating protocol encapsulates
cache-to-cache data exchanges to permit end-to-end error detection via
checksums, and eventually, digital signatures. This protocol also enables a
parent cache to transmit an object's remaining time-to-live (TTL) value to the
child cache. The cache uses the UDP-based connectionless protocol to implement
the parent-child resolution protocol. This protocol also permits caches to
exchange small objects without establishing a TCP connection, for efficiency.
While the encapsulating and connectionless protocols both support end-to-end
reliability, the proxy-http protocol is supported by most Web browsers. In
that arrangement, clients request objects via one of the standard
information-access protocols (FTP, Gopher, or HTTP) from a cache process. (The
term "proxy" arose because the mechanism was primarily designed to allow
clients to interact with the Web from behind a firewall gateway.) To reduce
the costs of repeated failures (for example, from erroneously looping clients)
we implemented two forms of "negative caching." First, when a DNS lookup
failure occurs, we cache the negative result for five minutes (chosen because
transient Internet conditions are typically resolved within that time frame).
Second, when an object-retrieval failure occurs, we cache the negative result
for a parameterized period of time, with a default of five minutes.
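A negative-cache entry of the kind just described needs little more than a timestamp and a TTL check. The struct and field names below are illustrative, not Harvest's; the five-minute default is the one quoted in the text.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define NEG_TTL 300   /* five minutes, the default cited above */

struct neg_entry {
    char   host[64];    /* hostname that failed to resolve or fetch */
    time_t failed_at;   /* when the failure was observed */
};

/* Answer repeated requests from the negative cache while the entry
   is still fresh, instead of retrying the failing operation. */
int still_negative(const struct neg_entry *e, time_t now)
{
    return now - e->failed_at < NEG_TTL;
}
```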


Other Implementation Techniques


For efficiency and portability across UNIX-like platforms, the cache
implements its own nonblocking disk and network I/O abstractions directly atop
a BSD select() loop. The cache avoids forking except for misses to FTP URLs;
we retrieve FTP URLs via an external process because the complexity of the
protocol makes it difficult to fit into our select() loop state machine. The
cache implements its own DNS cache, and when the DNS cache misses, the cache
performs nonblocking DNS lookups, although without currently respecting DNS
time-to-live (TTL) values. As the referenced bytes pour into the cache, they
are simultaneously forwarded to all sites that referenced the same object and
written to disk, using nonblocking I/O. The only way the cache will stall is
if it takes a virtual-memory page fault; the cache avoids page faults by
managing the size of its VM image. The cache employs non-preemptive,
run-to-completion scheduling internally, so it has no need for file- or
data-structure locking. However, to its clients, it appears multithreaded.
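The skeleton of such a single-process event loop is sketched below: the descriptor is made nonblocking and a central select() dispatches whatever is ready. This toy version just drains one descriptor; the real cache multiplexes many network and disk streams through the same loop.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Read everything available on rfd into buf without ever blocking,
   returning the number of bytes collected. */
int drain(int rfd, char *buf, int buflen)
{
    int total = 0;
    fcntl(rfd, F_SETFL, O_NONBLOCK);        /* never block inside read() */
    for (;;) {
        fd_set rset;
        struct timeval tv = { 0, 100000 };  /* 100-ms poll timeout */
        FD_ZERO(&rset);
        FD_SET(rfd, &rset);
        if (select(rfd + 1, &rset, NULL, NULL, &tv) <= 0)
            break;                          /* nothing ready: stop */
        if (FD_ISSET(rfd, &rset)) {
            int n = read(rfd, buf + total, buflen - total);
            if (n <= 0)
                break;                      /* EOF or would-block */
            total += n;
        }
    }
    return total;
}
```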
The cache keeps all metadata for cached objects (URL, TTL, reference counts,
disk file reference, and various flags) in virtual memory. The amount of
memory required per entry is 48 bytes (on machines with 32-bit word size),
plus the length in bytes of the URL string. The cache will also keep
exceptionally hot objects loaded in virtual memory if this option is enabled.
However, when the quantity of VM dedicated to hot-object storage exceeds a
parameterized high-water mark, the cache discards hot objects by LRU until VM
usage hits the low-water mark. Note that these objects still reside on disk;
only their VM image is reclaimed. The hot-object VM cache is particularly
useful when the Harvest cache is deployed as an httpd accelerator.
The Harvest cache is write-through rather than write-back; even objects in the
hot-object VM cache appear on disk. We considered memory mapping the files
that represent objects, but could not apply this technique because it would
lead to page-faults. Instead, objects are brought into cache via nonblocking
I/O, despite the extra copies.
Objects in the cache are referenced via a hash table keyed by URL. Cacheable
objects remain cached until their cache-assigned TTL expires and they are
evicted by the cache replacement policy, or the user manually evicts them by
clicking the browser's Reload button. If a reference touches an expired Web
object, the cache refreshes the object's TTL with an HTTP "get-if-modified."
The cache keeps the URL and per-object data structures in virtual memory but
stores the object itself on disk. We made this decision on the grounds that
memory should buy performance in a server-bottlenecked system: The metadata
for one million objects will consume 60 to 80 MB of real memory. If a site
cannot afford the memory, then it should use a cache optimized for memory
space rather than performance.
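A back-of-the-envelope check of that sizing claim: 48 bytes of fixed metadata per object plus the URL string itself. The 20-byte mean URL length below is my assumption (the article quotes only the total), and it puts one million objects inside the stated 60-to-80-MB range before allocator overhead.

```c
#include <assert.h>

/* Total metadata footprint: per-object fixed cost plus URL bytes. */
unsigned long metadata_bytes(unsigned long objects, unsigned long avg_url_len)
{
    return objects * (48UL + avg_url_len);
}
```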


Performance


We evaluated performance in two ways: comparing the cache against the CERN
proxy-http cache, and measuring the performance gain achieved by the Harvest
cache when used as an httpd accelerator with the Netscape Netsite, NCSA 1.4,
and CERN 3.0 Web servers. These measurements were taken on Sparc 20 model 61
and Sparc 10 model 30 workstations. 

Our figures show that the Harvest cache is an order of magnitude faster than
the CERN cache on hits, and on average, about twice as fast on misses. For
misses, there is less difference because the response time is dominated by
retrieval costs across the WAN. For hits, our figures show that 60 percent of
the time the CERN cache returns a hit in under 500 milliseconds (ms), while 95
percent of the time the Harvest cache returns an object in under 100 ms. The
median response time for hits is 20 ms for Harvest versus 280 ms for CERN. The
average response time is 27 ms for Harvest, versus 840 ms for CERN. Note that
in the cumulative distribution of response times, the CERN response-time tail
extends out to several seconds, so its average is three times its median.
When used as an httpd accelerator, the Harvest cache serves documents that are
present in the cache with a median response time of 20 ms. By comparison, the
medians for hits with Netscape's Netsite and NCSA's 1.4 httpd are each about
300 ms. In the case of a miss, the Harvest cache adds only about 20 ms to the
response times of the companion Web servers running on port 81.
We attribute the Harvest cache's high performance to our disk directory
structure, our decision to place metadata in virtual memory, and our threaded
design.


Conclusion


The Internet's autonomy and scale present difficult challenges to the way we
design and build system software. Once a piece of software becomes accepted as
a de facto standard, both its merits and its deficiencies may live forever.
For this reason, the real-world complexities of the Internet make us face
difficult design decisions. The maze of protocols, independent software
implementations, and well-known bugs that make up the Internet's upper layers
frequently force tradeoffs between design cleanliness and operational
transparency. In building and using the Harvest cache, we faced these
tradeoffs. In running the cache over time, we also encountered more subtle
problems, such as the interaction between the DNS and our negative cache.
These issues go beyond the scope of this article and are discussed in the
research report available at http://catarina.usc.edu/danzig/cache.ps.


References


Alonso, Rafael and Matthew Blaze. "Long-term Caching Strategies for Very Large
Distributed File Systems," Proceedings of the USENIX Summer Conference, June
1991. 
Danzig, Peter, Michael Schwartz, and Richard Hall. "A Case for Caching File
Objects Inside Internetworks," ACM SIGCOMM '93, September 1993.
Muntz, D. and P. Honeyman. "Multi-level Caching in Distributed File Systems
Or: Your Cache Ain't Nuthin' but Trash," Proceedings of the USENIX Winter
Conference, January 1992.
Figure 1: Hierarchical cache arrangement.













































Speeding Up C-tree Plus Database Searches


In search of the homogenized quantum effect




John Mudd


John, a programmer in Florida, can be contacted at mudd%nat@satchmo.oau.org.


When we began evaluating database systems to develop a customer application,
our needs were fairly straightforward: We needed fast, indexed access,
portability, source code, and an inexpensive price. FairCom's c-tree Plus
file-handling software seemed to fit the bill. C-tree is a cross-platform C
function library for performing database I/O. Based on B+tree routines, the
software has been ported to over 100 platforms, ranging from Windows to HP
9000, and provides low-level routines and high-level, multi-key ISAM routines
for high-speed random or sequential access. C-tree also supports variable
record lengths, key compression, client/ server architecture,
ascending/descending key segments, space reclamation, and variable-length key
fields. The software includes transaction processing, alternate collating
sequence, duplicate keys and/or automatic sequence numbers, dynamic space
reclamation, high-speed hashed data and index caching, and more. 
Still, one thing c-tree doesn't provide is a means of performing searches on
partial keys, unless the key is a leading subset of the index. For example, a
compound index such as [Group_ID + Account + Amount] can't be used to reduce
search time unless the criterion includes at least a Group_ID clause. A
search such as "Amount > 100" would require sequentially reading all records
and testing each against the Amount criterion, since the compound index
doesn't make records available in purely Amount-value order. In this article,
I'll present
an algorithm that enhances c-tree searching by treating all parts of an index
identically. This search technique can decrease access time, no matter what
parts of an index are specified in the criteria--even if it's in the middle or
the trailing parts of a compound index.


Homogenized Search Criteria 


In a query such as Bank_ID = 1234 and Account < 123456, a [Bank_ID + Account]
index can speed processing by limiting database reads to only records for Bank
1234. This would eliminate reading all records with Bank IDs less than 1234 or
greater than 1234. 
The query code would then sequentially filter out records whose account
numbers were not less than 123456. Unfortunately, my first program to process this sort of
SQL-style query continued to read records until it hit the end of the file,
turning what should be a subsecond query into one that lasts several minutes.
This problem occurred because my program was bogged down in the countless
possible permutations of fields and operands that could appear in the ad-hoc
search criteria. There was too much custom code for each different type of
operand and too little understanding of how different clauses in the search
criteria could be used to limit reading data records. One solution to this
problem is to transform (homogenize) the different operators so that each can
be expressed in identical terms. Once homogenized, it becomes relatively easy
to use the entire search criteria in controlling the direction and progress of
a search.
It probably never would have occurred to me that any clause can be expressed
in a uniform fashion if it hadn't been for the range operator. While range
isn't standard SQL, I still had to implement it to enable a user interface
that provides a range-type query capability to the users. Eliminating the
range operator from my code would mean one less operator to support. While
daydreaming about removing support for range, it occurred to me that a better
approach was to instead convert all other operators to, of all things, ranges.
Table 1 illustrates how SQL-style clauses can be described as one or two
ranges. Note that the range operator is listed first. Obviously it doesn't
require any sort of conversion to be described as a range. The = operator is
also an easy conversion, since the min and max values in the range are both
simply the target value (10) taken from the original criteria. To convert the
rest of the operators, I had to use implied criteria. 
The <= operator example explicitly states that the amount must be less than or
equal to a value of 10. But, since all database types have absolute minimum
and maximum values, the same amount <= 10 criterion also implies that the
amount must be greater than or equal to the absolute minimum value associated
with the Amount field. If the Amount field is a 4-byte unsigned integer field,
then the respective absolute min and max values are 0 and 4,294,967,295. The
end result is that amount <= 10 is equivalent to amount range 0 10.
The < and > operators add another twist to the conversion. The value (10)
specified in the criteria cannot be used directly in the equivalent range. The
solution is to decrement the target value for the < operator and increment the
target for the > operator. Since all database fields are stored as discrete
binary codes, they all can be adjusted one higher or lower as part of the
transformation. I use a collection of C functions to handle the job of
incrementing and decrementing database fields. There is one function for each
basic data type that I support, including char, short, int, long, and float.
The tedious production of this repetitious code is simplified by a single
generic C preprocessor macro that is invoked once for each data type at
compile time. I use another, similar macro to generate all of the functions
necessary for comparing the different types of database field values.
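The article's increment/decrement helpers are C functions generated by preprocessor macros; the idea can be sketched in Python for an integer-coded field (the function names and the 4-byte unsigned bounds are illustrative, not the article's actual code):

```python
# Absolute bounds for a 4-byte unsigned integer field, as in the example.
UINT32_MIN, UINT32_MAX = 0, 4_294_967_295

def field_succ(value, lo=UINT32_MIN, hi=UINT32_MAX):
    # Next discrete code above value; saturates at the field's maximum.
    return value if value >= hi else value + 1

def field_pred(value, lo=UINT32_MIN, hi=UINT32_MAX):
    # Next discrete code below value; saturates at the field's minimum.
    return value if value <= lo else value - 1
```

The same idea works for float fields at the representation level, where "one step" means the next representable value rather than plus or minus one.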
The last two example range conversions, != and (null criteria), use variations
on the techniques just described. The != is the same as amount < 10 or amount
> 10. This compound criterion is then converted into two separate ranges using
the same method as for the < and > operators. The null criterion is what a
user implicitly specifies whenever there's no criterion for a field. No
criterion implies that any value is acceptable, so this criterion simply
converts to the absolute range corresponding to the field type.
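Taken together, the conversions in Table 1 amount to one small dispatch. A sketch in Python, using the 4-byte unsigned bounds from the example (the function and constant names are my own, not the article's):

```python
# Absolute bounds for the field type (4-byte unsigned integer here).
MIN, MAX = 0, 4_294_967_295

def to_ranges(op, value=None):
    """Convert one SQL-style criterion into one or two inclusive ranges."""
    if op is None or op == "null":   # no criterion: any value is acceptable
        return [(MIN, MAX)]
    if op == "range":                # already a range; no conversion needed
        lo, hi = value
        return [(lo, hi)]
    if op == "=":
        return [(value, value)]
    if op == "<=":
        return [(MIN, value)]
    if op == ">=":
        return [(value, MAX)]
    if op == "<":                    # decrement the target value
        return [(MIN, value - 1)]
    if op == ">":                    # increment the target value
        return [(value + 1, MAX)]
    if op == "!=":                   # two ranges, as for < or >
        return [(MIN, value - 1), (value + 1, MAX)]
    raise ValueError(op)
```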


Quantum Effect Search Algorithm 


The search algorithm I eventually implemented takes advantage of this more
easily expressed search criterion. It does so by skipping over blocks of data
records that a conventional approach would stubbornly read one after another.
Even though I developed this solution, it was still amazing to watch the
program shift between indexed and sequential access in order to skip over data
records that couldn't satisfy the search criteria. Since this reminded me of
how electrons jump across quantum energy levels, I named the algorithm
"quantum effect" (QE, for short).
The QE is achieved by checking each record as it is read to see if it is too
high or low with respect to the current set of ranges. If too high or low,
there is an opportunity to switch from sequentially reading records to jumping
ahead by performing an indexed read. Reducing the number of reads necessary to
satisfy a query provides a real increase in performance. I label a key value
as too high if the first component to violate its current range is above its
range. I check the components from most significant to least significant--in
other words, from left to right. A compound key value is too low if the first
component to violate its range is below the range. If there are no range
violations, the search may proceed with sequential reads. Figure 1 illustrates
these situations.
At position 1, the initial read is based on a target composed of the three
minimum values [10 20 40] for the three index components. The read searches for
a record with a key value greater than or equal to this target. The subsequent
three records are read sequentially. Each fits within the ranges until
position 2.
At position 2, the value of "c" has fallen below its range of 40 to 40. A new
target value is built based on the current values of fields to the left of the
low field ("c"). The current values for "a" and "b" are 10 and 21. The minimum
value of "c," the field in violation, also is used to build the new target.
The resulting target is [10 21 40]. Searching for a record with an index value
greater than or equal to this target jumps over several unusable records.
Sequential reading resumes at position 3.
At position 4, field "c" is above its maximum of 40. This record is therefore
too high. A new target is constructed to jump ahead: fields further left keep
their current values, the field just to the left of the field in violation
("b") is set to its current value plus one, and minimum values are used for
the violating field and everything to its right. The new target for this
example is [10 26 40]. Again, a search is performed for a value greater than
or equal to the target.
The quantum jump lands at position 5 and another block of unusable records is
avoided. Sequential reading resumes. At position 6, the record is too high.
The next target requires incrementing the field to the left of "b," which is
"a." Since "a" is already at its maximum value there is no chance of finding
further matching records. The search can be terminated.
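The walkthrough above can be sketched in Python under stated assumptions: the compound index is a sorted list of tuples, and each indexed read is modeled by a binary search. All names are illustrative; this is not the article's C implementation.

```python
from bisect import bisect_left

def qe_search(keys, ranges):
    """keys: the compound index, a sorted list of equal-length tuples.
    ranges: one inclusive (lo, hi) pair per key component.
    Yields every key that satisfies all ranges, using indexed reads
    (modeled here by bisect_left) to jump over unusable blocks."""
    mins = [lo for lo, _ in ranges]
    pos = bisect_left(keys, tuple(mins))        # initial indexed read
    while pos < len(keys):
        key = keys[pos]
        # Check components left to right for the first range violation.
        for j, (v, (lo, hi)) in enumerate(zip(key, ranges)):
            if v < lo:
                # Too low: keep the fields to the left, use minimums from
                # the violating field rightward, and jump ahead.
                pos = bisect_left(keys, key[:j] + tuple(mins[j:]))
                break
            if v > hi:
                # Too high: increment the nearest field to the left that
                # can still grow; if none can, the search is finished.
                k = j - 1
                while k >= 0 and key[k] >= ranges[k][1]:
                    k -= 1
                if k < 0:
                    return
                target = key[:k] + (key[k] + 1,) + tuple(mins[k + 1:])
                pos = bisect_left(keys, target)
                break
        else:
            yield key                           # record fits: report it
            pos += 1                            # sequential read
```

Running this over the data of Figure 1 with the ranges a 10-10, b 20-30, c 40-40 reproduces the jumps at positions 2 and 4 and the termination at position 6.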
Remember that, because of the null criterion, the algorithm doesn't depend on
the search criteria to specify ranges for the leading portion of the index.
Any missing portions are automatically filled by using the null criterion. The
earlier example would perform roughly the same even if the a = 10 clause were
not specified. The missing clause would simply be substituted with the default
"a" range 0 max_integer clause. The only change to the previous results is
that the search would have continued past the position 6 record since there
would be no restriction on incrementing "a" past a value of 10.
An unexpected advantage of this new freedom to use any part of an index is
that it is possible to merge similar indexes. Figure 2(a) lists three fairly
redundant indexes that have been replaced by the single index in Figure 2(b).
The QE search algorithm makes the single new index perform the jobs of all
three former indexes. Clearly, reducing the number of indexes increases
performance for inserts and updates and saves disk space. Even without the
keys beginning with the Account field, it is still possible for the QE search
to perform a quick Account lookup. One indexed read per unique Group ID value
quickly checks each group for the presence of a target Account value. In my
particular data there may be as few as ten different Group ID codes in a
database of one million records. The ten indexed reads necessary to scan such
a database for a particular Account deliver approximately the same response
time to the end user as directly reading account records using one of the
original redundant indexes. Decisions regarding index selection still depend
on the amount and distribution of the working data. The QE algorithm helps by
making otherwise-impossible indexing options a reality.
The use of homogenized clauses simplified my code so that my new version is
actually shorter than the first crude attempt that had such poor performance.
It's not often I find a way to simplify my code, drop indexes, and speed up
the queries.


Future Possibilities 


Combinations of ranges are an important concept in this approach. Figure 3
shows how a search criterion can generate a series of combinations of ranges.
This aspect of the approach lends itself to additional performance maneuvers.
The simplest method is to process each search-range combination separately.
Having multiple combinations really doesn't add much complexity to the search
algorithm. It's just a matter of repeating the QE search process once for each
distinct combination of ranges. The overall search continues until all of the
combinations have been processed.
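Generating the distinct combinations is just a Cartesian product of each field's list of ranges. A sketch using Figure 3's values:

```python
from itertools import product

# Each field contributes one or more (min, max) ranges; the Cartesian
# product yields the distinct combinations to search, in index order.
field_ranges = [
    [(0, 10)],                 # a <= 10
    [(0, 19), (21, 256)],      # b != 20
    [(0, 29), (31, 256)],      # c != 30
]
combos = list(product(*field_ranges))
# Run the QE search once per combo until all combinations are processed.
```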
The combinations are listed for Figure 3 in the order necessary to read
records in the sorted order of the index. An alternative is to process
multiple combinations in parallel. A multithreaded version can distribute the
job of searching by handing out individual combinations to separate processes.
Results can then be collected by a central controlling process to be merged,
sorted if necessary, and returned to the calling application. The search time
should be dramatically reduced.
A simple query such as amount > 100 doesn't generate multiple combinations of
ranges needed for a multithreaded approach. Still, it's even possible to
divide this plain search into two parts. The trick is to add the ability to
perform a QE search in either direction. That is, either starting at the
minimum values and reading forward (as described earlier) or starting at the
maximum values and reading backwards. This way it would be possible to run any
query in a minimum of two parallel processes. The opposing threads would
execute until it becomes apparent that they have crossed paths.


For More Information


c-tree Plus 
FairCom Corp

4006 W. Broadway
Columbia, MO 65203-0100
314-445-6833
http://www.faircom.com
Table 1: SQL-style clauses can be described as one or two ranges.
 Example Equivalent
 Criteria Ranges
 amount range 1 10 1 10
 amount = 10 10 10
 amount <= 10 min 10
 amount >= 10 10 max
 amount < 10 min (10-1)
 amount > 10 (10+1) max
 amount != 10 min (10-1)
 or
 (10+1) max
 (null criteria) min max
Figure 1: The algorithm checks each record as it is read to see if it is too
high or too low with respect to the current set of ranges.
 Index: [a b c]
 Search Criteria: a = 10 and b range 20 30 and c = 40
 Assumptions: All fields are integer type. The index permits
 duplicate values.
 Data:
 a b c
 ----- ----- -----
 10 0 40
 10 15 40
 1) 10 20 40 <-- Initial read starts here.
 10 20 40 <-- Read next.
 10 20 40 <-- Read next.
 2) 10 21 0 <-- Too low (c < 40).
 10 21 10
 10 21 20
 10 21 30
 3) 10 21 40 <-- Quantum jump to here.
 10 25 40 <-- Read next.
 10 25 40 <-- Read next.
 4) 10 25 42 <-- Too high (c > 40).
 10 25 43
 10 25 44
 10 25 45
 5) 10 29 40 <-- Quantum jump to here.
 10 30 40 <-- Read next.
 10 30 40 <-- Read next.
 10 30 40 <-- Read next.
 6) 10 31 40 <-- Too high (b > 30). The end.
 10 31 40
Figure 2: Redundant indexes that have been replaced by a single index. (a)
Before; (b) after.
(a) 
[GroupID Account Serial]
[Account Serial]
[Account Amount]

(b)
[GroupID Account Amount Serial]
Figure 3: How a search criterion can generate a series of combinations of
ranges.
 Search Criteria: a <= 10 and b != 20 and c != 30
 Equivalent Ranges: a range 0 10
 b range 0 19 or b range 21 256
 c range 0 29 or c range 31 256
 Range combinations:
 a b c 
 -----------------+---------------+------------------
 min max min max min max
 ---------+-------+-------+-------+-------+----------
 Combo 1 0 10 0 19 0 29
 Combo 2 0 10 0 19 31 256
 Combo 3 0 10 21 256 0 29
 Combo 4 0 10 21 256 31 256




HTML Conversion and FTP Automation


Automating the conversion process while empowering users




Lauren Hightower


Lauren is a general partner of the Calico Company in Tallahassee, FL. Her
company specializes in client/server and Internet development. She can be
contacted at calico@supernet.net.


With the growing popularity of the Internet, information systems (IS)
departments are finding themselves in the odd position of maintaining and
managing unfamiliar data on their companies' World Wide Web sites. But it
doesn't really make sense for IS departments to interpret data developed and
maintained by other departments just so it can be translated to a different
format for distribution. This is equivalent to having IS print reports for
other departments, rather than giving those departments the printer and
software to do it themselves. The people responsible for the data should
prepare it for any type of distribution--paper or electronic. After all, IS is
in the business of managing information, not creating and maintaining it. 
The HTML tags used in World Wide Web documents are not designed to be viewed
in a WYSIWYG environment. Instead, HTML codes are applied to an ASCII document
and interpreted by a browser when it displays the document. This open
architecture allows an enormous amount of flexibility and ensures that any
browser capable of parsing and interpreting HTML tags can display any Web page
that adheres to the HTML standard. 
Unfortunately, HTML is nonsense to many people accustomed to working with
slick, user-friendly business applications. Until software developers create
HTML filters for business software (as they have done for most word
processors), IS is faced with maintaining the information or training end
users to mark up and update their own HTML documents. 


What HTML Automator Does


Overwhelmed with the volume of information my company wanted to deliver across
the Web, I developed a program called "HTML Automator" to automate the process
of converting data stored in spreadsheets and databases to HTML format; see
Figure 1. I also added an FTP module to allow users to replace existing pages
on our Web server without IS intervention. As a result, HTML Automator
eliminates unnecessary work for IS, automates a task that might otherwise
require a long training process, and ensures a standard on all HTML documents
churned out by departments within the company. 
The design of HTML Automator contains two major components--file conversion
and file transfer. The file conversion module has two requirements:
Support the most popular database and spreadsheet file formats represented in
the company.
Reliably convert a selected file to an ASCII file that can then be marked up
using a custom filter designed to format the information into company-standard
HTML files.
The FTP module has four requirements:
Run over a Windows socket that can be used over a physical or dial-up
connection to a TCP/IP network.
Log in to the FTP server with a unique ID and password that allows access to
WWW directories.
Change directories on the FTP server.
Reliably transfer the converted file to the appropriate directory on the FTP
server. 
After researching available tools, I settled on IST's OpenExchange library to
handle the file-conversion process and Distinct Corp.'s TCP/IP Winsock library
and SDK to transfer the resulting HTML file to the Web server. Distinct's tool
offers excellent documentation to the Winsock APIs and a reliable, high-speed
dial-up or direct-connection TCP/IP stack. Because both tools are DLLs, they
can be called from any development tool that supports calls to external
libraries, including C++, PowerBuilder, and Visual Basic. I chose Delphi as my
development tool because of the speed and deployment advantages of its
compiled executables and its underlying object-oriented language, Object
Pascal. 
HTML Automator uses configuration files to track the important pieces of
information it needs to create HTML files. Once you create a configuration
file, you can open it from the command line and perform the conversion and
transfer without user intervention. Table 1 shows the contents of a
configuration file.
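The configuration file is a standard INI-style file; the section and key names below appear in Listing One, while the values are made up for illustration. A sketch of the equivalent round trip in Python:

```python
from configparser import ConfigParser
from io import StringIO

# Build a configuration like the one Save_File writes in Listing One.
cfg = ConfigParser()
cfg["AUTOHTML Configuration File"] = {
    "InputFile":   r"C:\EXCEL\WORK\RATES.XLS",   # illustrative paths
    "OutputFile":  r"C:\EXCEL\WORK\RATES.HTM",
    "FTPServer":   "www.example.com",
    "FTPAutomate": "Yes",
}

buf = StringIO()
cfg.write(buf)                      # same key=value layout as the listing

# Read it back, as Open_File does with TIniFile.
cfg2 = ConfigParser()
cfg2.read_string(buf.getvalue())
server = cfg2.get("AUTOHTML Configuration File", "FTPServer", fallback="")
```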
IST's OpenExchange DLL supports Lotus, Excel, Quattro Pro, Symphony, dBASE,
and Paradox file formats. Knowing that most word processors have built-in HTML
filters and most dynamic data comes from spreadsheet and database applications
(not word processing documents), I chose to focus my filter on spreadsheet and
database files. In a typical spreadsheet, there is a title or a row of column
headings on the first row, a column of row numbers or identifiers in the first
column, and data in the remaining columns and rows. Similarly, databases have
a set of field names that serve as column headings and data stored in the
rows. Using this as the basis for my layout scheme, I developed a filter that
converts either a database or spreadsheet file to an HTML marked-up document
using the <PRE></PRE> tags to properly format the information in evenly spaced
columns.
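The filter's padding scheme can be sketched as follows; the three-space gutter matches the article's description later in this section, but the function name and structure are illustrative:

```python
def rows_to_pre(headings, rows):
    # Pad every column to its widest value plus a three-space gutter,
    # then wrap the result in <PRE></PRE> to preserve the alignment.
    widths = [max(len(str(v)) for v in col) + 3
              for col in zip(headings, *rows)]
    def fmt(row):
        return "".join(str(v).ljust(w) for v, w in zip(row, widths)).rstrip()
    return "\n".join(["<PRE>", fmt(headings)]
                     + [fmt(r) for r in rows] + ["</PRE>"])
```

Feeding it the loan-rate data produces output shaped like Figure 3.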
To enforce a company-wide standard on all our Web pages, I developed standard
header and footer files. The header file determines the graphic at the top of
the page, the background color for the body, and a place holder for the title;
see Example 1(a). The footer file, see Example 1(b), contains the markup for a
line separating the body of the document from the standard .GIF file that
allows a user to move to different pages in our Web site.
HTML Automator uses header and footer files on the network. If the standard
header and footer files change, HTML Automator incorporates the changes in
every Web page it produces.
As Figure 2 illustrates, HTML Automator generates the output file by knitting
together the header, body, and footer files to create one HTML document. The
conversion component of HTML Automator generates the body of the document.
HTML Automator incorporates the standard header and footer files at the top
and bottom of each HTML file it generates. The result is a Web page consistent
with all of the other pages on our site.
HTML Automator lets me define a setup for a particular conversion, then save
it in a configuration file. I can develop as many configuration files as
needed for any number and type of input files. I can pass the configuration
filename on the command line for an automatic conversion, or open the
configuration file from inside HTML Automator. 
Using the command-line option, I can use a timer application to update files
on the server on a regular basis without user intervention. I can also run the
application from a remote machine with access to a shared directory that
contains the data file across a peer-to-peer network such as Windows for
Workgroups. If the volume of conversion is high enough, I can dedicate one
machine to convert and transfer files from multiple shared directories to the
server on a regular basis.


The Conversion Module


OpenExchange comes with C, Delphi, and Visual Basic header files. I added the
Delphi header file to my project and openex to the uses clause of the
implementation section of my unit, giving my code access to the data in the
input file.
The procedure convert_file (available electronically; see "Availability," page
3), uses oe_OpenInputFile to open the input file, oe_ReadDict to read the
structure of the file, and oe_GetNCases to determine the number of rows. A
call to oe_SelectAllVars returns the number of columns in the spreadsheet or
database. Alternatively, you could use oe_SetVarSelect to select individual
columns or fields. Calling oe_InitReadOnly performs initialization on the
input file. OpenExchange has several high-level functions designed to transfer
data from one supported type to another, for example from Paradox to Excel.
These functions automatically initialize the input file. If you handle the
conversion and output yourself, you must call oe_InitReadOnly to force the
initialization because the high-level functions are not being used.
oe_ReadRecord performs the bulk of the work, reading in an entire record. Once
read, the oe_CopyVarDataStr function copies individual columns in the record
to string variables. Because I am interested in formatting the text in
fixed-width columns, I use oe_GetVarPrintWidth to find the physical width of a
database field or the print width of a spreadsheet column. I add three spaces
to each width to allow some white space between columns, then pad each field I
read with the appropriate number of spaces. To tidy up, I call oe_CloseFiles
to close the input file. 
I use the <PRE> tag at the beginning of the data file to force a Web browser
to display the data in a fixed-width font. At the end of the data file, I use
the </PRE> tag to end fixed-width formatting. The result is a table-like
appearance for the data in the input file (see Figure 3).
To incorporate the standard header and footer files, I use the Pascal readln
and writeln functions to read the information from the header file and write
it to the output file before I convert the data. Similarly, I read information
from the footer file and write it to the output file after the data
conversion. 
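The knitting step reduces to simple concatenation. A minimal sketch, assuming the configured title is substituted into the header's empty <TITLE></TITLE> placeholder (the article describes a title placeholder but not the exact substitution mechanism):

```python
def knit(header, body, footer, title=""):
    # Fill the header's empty title placeholder (an assumption on my
    # part), then concatenate the three pieces into one HTML document.
    header = header.replace("<TITLE></TITLE>",
                            "<TITLE>" + title + "</TITLE>")
    return header + body + footer
```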


The Transfer Module


Once the output file is complete, the FTP process moves it to the server where
it replaces the "live" version of the file. I always encourage users to view
the output file as a local HTML file on their PC before they transfer it to
the server. Using the URL file:///C/EXCEL/WORK/RATES.HTM, users can open a
local file in their browser the same way they would any other URL.
The FTP process is handled through calls to WINSOCK.DLL. I used Distinct's
TCP/IP SDK and Windows socket to implement my file transfer. The Distinct FTP
API documentation provides in-depth instructions for using and calling
standard FTP services in the WINSOCK.DLL. Assuming users are physically
connected (or have established a SLIP or PPP connection) to the TCP/IP network
and are set up to communicate across that connection, they should be able to
open an FTP session and transfer a file to any directory on the server to
which they have access. 
To establish a connection to the server, I call the ftp_open function. Once
the connection is made, I call ftp_login to pass the login name and password.
The trickiest and most important part of the transfer uses the ftp_put
function. ftp_put transfers the file from the local drive to the server.
ftp_put requires an additional function, sendfilecontents, to read data from
the input file and move bytes of information into a buffer. The
sendfilecontents function returns the number of bytes in the buffer while
there is still information in the input file, and 0 when it is finished
reading the file. The ftp_put uses the results of sendfilecontents to transfer
the correct number of bytes from the buffer to the server.
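The callback contract that sendfilecontents implements (fill the caller's buffer, return the byte count, return 0 at end of file) can be sketched in Python; the real code is Object Pascal driven by Distinct's DLL, and the buffer size here is arbitrary:

```python
import io

BUFSIZE = 512

def make_sender(fileobj):
    """Return a callback with the same contract as sendfilecontents."""
    def send_chunk(buf):
        data = fileobj.read(BUFSIZE)
        buf[:len(data)] = data
        return len(data)          # 0 tells the caller the file is done
    return send_chunk

# Demo driver standing in for ftp_put's read loop:
sender = make_sender(io.BytesIO(b"<HTML>rates</HTML>"))
buf = bytearray(BUFSIZE)
sent = b""
while True:
    n = sender(buf)
    if n == 0:
        break
    sent += bytes(buf[:n])
```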

To use my function, ftp_put must know where to find it. The last parameter of
the ftp_put function is a pointer to my function, sendfilecontents. I obtain
the pointer by calling MakeProcInstance, specifying the exported function as
the first parameter (see Listing One). When the transfer is complete, I call
ftp_close to close the connection.


Enhancements


HTML Automator was designed for a very specific purpose. A few enhancements
can open the door to a host of other useful possibilities. For starters, in
its current design, HTML Automator reads an entire spreadsheet or database.
OpenExchange has the capacity to narrow its focus to a particular range of
fields or cells. Users accustomed to keeping several sources of data in one
spreadsheet can specify particular ranges in different configuration files to
create several HTML files from one spreadsheet.
For the most part, my HTML Automator ignores security issues. Our Web server
runs on the same box as our FTP server, making it easy to transfer files
directly to the appropriate directories on the Web server. If your
organization requires a greater degree of security, you might devise a system
to place the files in a temporary directory where they can be reviewed before
they go "live."
In the current implementation of HTML Automator, there is no way to add text
to an HTML file, other than what is stored in the header, data, and footer
files. Adding a memo box might help users write their own HTML to include
before or after the data, as introductory or summary text. 
For aesthetic reasons, you might prefer to write your filter to produce HTML
files that incorporate the <TABLE></TABLE> tags. Tables give your page a
professional look that doesn't come through with the traditional <PRE></PRE>
tags. Unfortunately, tables (defined in HTML 3.0) are a new and evolving
standard on the Web. Some browsers don't support them yet and your filter
might change as the standard develops.
HTML Automator assumes users are connected physically to a TCP/IP network or
already have established a connection through a SLIP or PPP connection.
Additional calls to functions in WINSOCK.DLL can check for a TCP/IP connection
before the FTP process and initiate the call if it doesn't find one. This is
especially useful if a significant number of users are remote.
If your organization maintains several Web servers, it would be prudent to
have a list of FTP servers and the corresponding login names and passwords
stored so that users can choose a server from a list rather than type in a
cryptic IP address or domain name. 


For More Information


IST
OpenExchange DLL 1.0
P.O. Box 3774
Joplin, MO 64803
417-781-3282
74777.2651@compuserve.com

Distinct Corp.
Distinct TCP/IP SDK 
12901 Saratoga Ave. 
P.O. Box 3410 
Saratoga, CA 95070 
408-366-8933
mktg@distinct.com
Figure 1: Main HTML Automator screen.
Figure 2: HTML Automator conversion process.
Figure 3: Sample output.
<PRE>
<EM>TYPE OF LOAN</EM> <EM>PERCENTAGE</EM>
2 Year Fixed Loan 4.50
3 Year Fixed Loan 5.50
4 Year Fixed Loan 6.00
5 Year Fixed Loan 7.00
6 Year Fixed Loan 8.25
7 Year Fixed Loan 9.00
8 Year Fixed Loan 10.25
9 Year Fixed Loan 10.25
10 Year Fixed Loan 10.75
11 Year Fixed Loan 11.00
12 Year Fixed Loan 11.25
13 Year Fixed Loan 11.50
14 Year Fixed Loan 12.10
15 Year Fixed Loan 12.25
</PRE>
Example 1: (a) Standard header file; (b) footer file.
(a)

<HTML>

<HEAD>
<TITLE></TITLE>
</HEAD>

<BODY BGCOLOR="#FFFFFF">
<P>
<CENTER><IMG SRC=logo.gif ALIGN=MIDDLE></CENTER>
<HR>

(b)

<HR>
<CENTER><IMG SRC=banner.gif ALIGN=MIDDLE></CENTER>
</BODY>
</HTML>
Table 1: HTML Automator configuration-file contents.
Information Purpose
Input Filename Name of the file to be converted.
Output Filename Name of the resulting HTML file.
Type of File Type of input file that OpenExchange supports.
FTP Server Name or IP address of the FTP server.
Login Name Login Name for the user on the FTP server.
Password Password for the user on the FTP server.
Server Directory Directory where the FTP module will place the output file.
HTML Header Name of the HTML file to be used as the header
 on the output file.
HTML Footer Name of the HTML file to be used as the footer
 on the output file.
Title Title for the resulting output file.

Listing One
 .
 .
 .
{Check to see if you want to automatically FTP the output file}
 if CheckBox_FTPAutomate.State=cbchecked then
 FTP_file;
end;
procedure THTML_AUTOMATE.Save1Click(Sender: TObject);
var s_filename : string;
begin
 if SaveDialog_CFG.Execute then
 begin
 s_filename := SaveDialog_CFG.Filename;
 save_file(s_filename);
 end;
end;
procedure Open_File(s_filename : string);
var AutoConfigFile : Tinifile;
begin
AutoConfigFile := TIniFile.Create(s_Filename);
 with AutoConfigFile do
 begin
 HTML_Automate.Edit_Server.text :=
 ReadString('AUTOHTML Configuration File', 'FTPServer', '');
 HTML_Automate.ComboBox_FileType.text :=
 ReadString('AUTOHTML Configuration File', 'FileType', '');
 HTML_Automate.Edit_LoginName.text :=
 ReadString('AUTOHTML Configuration File', 'FTPLoginName', '');
 HTML_Automate.Edit_Password.text :=
 ReadString('AUTOHTML Configuration File', 'FTPPassword', '');
 HTML_Automate.Edit_ServerDirectory.text :=
 ReadString('AUTOHTML Configuration File', 'FTPDirectory', '');

 HTML_Automate.Edit_HTMLHeader.text :=
 ReadString('AUTOHTML Configuration File', 'HTMLHeader', '');
 HTML_Automate.Edit_HTMLFooter.text :=
 ReadString('AUTOHTML Configuration File', 'HTMLFooter', '');
 HTML_Automate.Edit_Input.text :=
 ReadString('AUTOHTML Configuration File', 'InputFile', '');
 HTML_Automate.Edit_Output.text :=
 ReadString('AUTOHTML Configuration File', 'OutputFile', '');
 HTML_Automate.Edit_HTMLTitle.text :=
 ReadString('AUTOHTML Configuration File', 'HTMLTitle', '');
 If ReadString('AUTOHTML Configuration File','FTPAutomate','No')<>'No' then
 HTML_Automate.CheckBox_FTPAutomate.State := cbchecked
 else
 HTML_Automate.CheckBox_FTPAutomate.State := cbunchecked;
 end;
AutoConfigFile.Free;
end;
function sendfilecontents(ftphandle : integer; buf : pChar; len : integer) : 
 Integer; EXPORT;
var read_line : string;
begin
 if eof(send_file) then
 sendfilecontents:= 0
 else
 begin
 readln(send_File, read_line);
 HTML_Automate.Label_Status.caption := read_line;
 read_line := read_line + Chr(13) + Chr(10);
 StrPCopy(buf, read_line);
 sendfilecontents := length(read_line);
 end;
end;
procedure FTP_File;
var host : array[0..80] of Char;
 user_name : array[0..80] of char;
 user_pass : array[0..80] of char;
 file_transfer : array[0..80] of char;
 ftphandle, i : integer;
 send_file_proc : pointer;
 ftpfile : string;
begin
 {Open a connection to FTP Server}
 StrPCopy(host, HTML_Automate.Edit_Server.Text);
 HTML_Automate.Label_Status.caption := 'Connecting to host ' + 
 HTML_Automate.Edit_Server.Text;
 ftphandle:=ftp_open(Application.Handle,host);
 if ftphandle <> 0 then
 HTML_Automate.Label_Status.caption := 'Successful connection with handle #' +
inttostr(ftphandle)
 else
 begin
 HTML_Automate.Label_Status.caption := 'Can not connect';
 exit;
 end;
 {Login to the server, pass user name and password}
 StrPCopy(user_name, HTML_Automate.Edit_LoginName.Text);
 StrPCopy(user_pass, HTML_Automate.Edit_Password.Text);
 HTML_Automate.Label_Status.caption := 'Logging in...';
 i:=ftp_login(ftphandle, user_name, user_pass);
 if i = FTP_OK then

 HTML_Automate.Label_Status.caption := 'Successful login'
 else
 begin
 HTML_Automate.Label_Status.caption := 'Can not login';
 ftp_close(ftphandle);
 exit;
 end;
 {change file type to ASCII}
 i:=ftp_type(ftphandle, TYPE_ASCII);
 if i = FTP_OK then
 HTML_Automate.Label_Status.caption := 'Changed type to ASCII'
 else
 begin
 HTML_Automate.Label_Status.caption := 'Can not change type to ASCII';
 ftp_close(ftphandle);
 exit;
 end;
 {get the address of the sendfilecontents function}
 send_file_proc := MakeProcInstance(@sendfilecontents, HInstance);
 ftpfile:=HTML_Automate.Edit_Output.Text;
 {parse the output file to get only the filename}
 while Pos('\', ftpfile) > 0 do
 ftpfile := Copy(ftpfile, Pos('\', ftpfile) + 1, length(ftpfile));
 {Transfer file}
 ftpfile:= HTML_Automate.Edit_ServerDirectory.Text + '\' + ftpfile;
 StrPCopy(file_transfer, ftpfile);
 HTML_Automate.Label_Status.caption := 'Sending file ' + 
 HTML_Automate.Edit_Output.Text;
 Assign(send_file, HTML_Automate.Edit_Output.Text);
 Reset(send_file);
 i := ftp_put(ftphandle, FALSE, file_transfer, Send_File_Proc);
 if i = FTP_OK then
 HTML_Automate.Label_Status.caption := 'File sent successfully'
 else
 HTML_Automate.Label_Status.caption := 'Error sending file ' + 
 inttostr(i);
 Close(send_file);
 {close FTP session}
 ftp_close(ftphandle);
end;
procedure Save_File(s_filename : string);
var F : Text;
begin
 assign(F, s_filename);
 rewrite(F);
 writeln(F, '[AUTOHTML Configuration File]');
 writeln(F, 'InputFile=' + HTML_Automate.Edit_Input.Text);
 writeln(F, 'FileType=' + HTML_Automate.ComboBox_FileType.Text);
 writeln(F, 'OutputFile=' + HTML_Automate.Edit_Output.Text);
 writeln(F, 'FTPServer=' + HTML_Automate.Edit_Server.Text);
 writeln(F, 'FTPLoginName=' + HTML_Automate.Edit_LoginName.Text);
 writeln(F, 'FTPPassword=' + HTML_Automate.Edit_Password.Text);
 writeln(F, 'FTPDirectory=' + HTML_Automate.Edit_ServerDirectory.Text);
 writeln(F, 'HTMLHeader=' + HTML_Automate.Edit_HTMLHeader.Text);
 writeln(F, 'HTMLFooter=' + HTML_Automate.Edit_HTMLFooter.Text);
 writeln(F, 'HTMLTitle=' + HTML_Automate.Edit_HTMLTitle.Text);
 if HTML_Automate.CheckBox_FTPAutomate.state = cbchecked then
 writeln(F, 'FTPAutomate=Yes')
 else

 writeln(F, 'FTPAutomate=No');
 close(F);
 HTML_Automate.Caption := 'HTML Automator ' + '[' + s_filename + ']';
end;
procedure THTML_AUTOMATE.OutputFile1Click(Sender: TObject);
var F : system.text;
 S : string;
begin
f_view.Memo_HTM.Clear;
If FileExists(Edit_Output.Text) then
begin
System.Assign(F, Edit_Output.Text);
Reset(F);
 repeat
 readln(F, S);
 f_view.Memo_HTM.Lines.Add(S);
 until eof(F);
system.close(F);
f_view.ShowModal;
end;
end;
procedure THTML_AUTOMATE.Button2Click(Sender: TObject);
begin
 FTP_file;
end;
procedure THTML_AUTOMATE.FTP1Click(Sender: TObject);
begin
 FTP_file;
end;
procedure THTML_AUTOMATE.FormCreate(Sender: TObject);
begin
 Open_File(paramstr(1));
 HTML_Automate.Caption := 'HTML Automator ' + '[' + paramstr(1) + ']';
end;
procedure THTML_AUTOMATE.Panel1Click(Sender: TObject);
begin
end;
end.



PROGRAMMING PARADIGMS


VTOS, VRML, and other Virtualities




Michael Swaine


As we went to press, The Wall Street Journal was reporting that Apple Computer
was about to be purchased by Sun Microsystems. If you keep your ear close to
the ground, you hear one such rumor a month, on a rough average. This one,
though, sounded more serious than most. The Journal said that the announcement
was "imminent." Apple immediately denied that the company was for sale. The
Journal also opined, in its trademarked deadpan style, that Apple could
benefit from Sun's "focused management." Focused management. What a concept.
Focused management would certainly be a novelty for Apple, but it does cross
my skeptical mind that this is the same Sun Microsystems about which the
following story was once told:
Sun Microsystems is a three-billion-dollar firm that produces a variety of
advanced computing hardware and software. Sun focuses on powerful solutions to
the big problems facing companies in the '90s: too many vice presidents and
too few promotion slots in upper management.
 Sun Microsystems today announced the creation of yet another subsidiary,
bringing the total number of Sun wholly owned subsidiaries to 1,207. After
successfully creating SunSoft, Sun Microsystems Computer Corporation, SunPICS,
SunConnect, and a number of other smaller firms, Sun has created SunLHMPPP.
SunLHMPPP is tasked with addressing the specific niche market of users who
require both Left Handed Mice and a Parallel Printer Port on their
workstations.
The SunLHMPPP announcement is part of a longer story by Chuck Musciano posted
to Usenet back in 1991. Chuck went on to report that IBM was entering into a
joint venture with Denny's restaurants: IBM's logic being, well, the
competition has been eating our lunch for years, so.... Both stories are made
up. They are parodies, but in the computer industry it's often hard to tell
the difference between a parody and focused management. Andrew Davison,
fortunately, can tell the difference, and has collected a bookful of parody
press releases, song lyrics, true but incredible stories, and untrue but
credible stories, that is, stories that are not actually true but that ring
true--virtually true stories, you might say.
Davison's Humor the Computer was published last year by MIT Press (ISBN
0-262-54075-4). Davison, who is a frequent contributor to DDJ, collected the
stories from the Usenet, Datamation, Byte, Creative Computing, and other
computer magazines, technical journals, popular magazines, books, newsletters,
and a few other odd sources. Well, mostly odd sources, judging by the odd bits
he collected.
I particularly liked Tony Karp's report from Datamation on the language
BABBAGE, which implements the WHAT IF, OR ELSE, BRIEF CASE, and DON'T DO WHILE
NOT statements. BABBAGE runs on top of VTOS, the Virtual Time Operating
System. 
While virtual memory systems make the computer's memory the virtual resource,
VTOS does the same thing with CPU processing time. The result is that the
computer can run an unlimited number of jobs at the same time. Like the
virtual memory system, which actually keeps part of memory on disk, VTOS has
to play tricks to achieve its goals. Although all of your jobs seem to be
running right now, some of them are actually running next week.
Virtually true stories, virtual time. All part of the gradual virtualizing of
reality, I suppose.
Reading Humor the Computer sent me off searching for online humor on the
Usenet and the Web, and that in turn got me following my many links to
computer history sites on the Internet--and that should serve to usher us
gracefully into the latest installment of "Paradigms Past."


Paradigms Past


Last month, in discussing the historical dispute over credit for the invention
of the automatic digital computer, I mentioned that several of the key
inventors of computer technology died last year. I failed to mention Allen
W.M. "Doc" Coombs, who was one of the principal designers of the British Mark
2, the production version of the Colossus code-breaking machines. Coombs died
peacefully at home on January 30, 1995. Doc Coombs, like the rest of his
British colleagues, was under a ban of secrecy about the Colossus work until
recently. An obituary can be found at
http://ei.cs.vt.edu/~history/Coombs.html.
Sadly, I'm sure Doc Coombs and J. Presper Eckert, John V. Atanasoff, and
Konrad Zuse, all of whom I mentioned last month, are not the only computer
pioneers who died last year. 1995 was the 50th anniversary of the culmination
of their work, which puts those pioneers in that awkward age bracket, the
three-score-and-ten-somethings.
1995 was the 50th anniversary of many important events in the history of
computers, not to mention the 50th anniversary of the atomic age, the baby
boom, the United Nations, and, according to my mother, "Swaine's Flames." This
year, 1996, is notable as the 100th anniversary of the death of Alfred Nobel
(December 10). While some famous people are remembered on their birth dates,
there is particular significance in Nobel's death date: When he died, he left
behind a rather unusual will establishing the Nobel prizes. The prizes are
even awarded on that date every year.
Nobel's will was a surprise to almost everyone who knew him, because he was
not the sort of guy you'd expect to establish a Peace prize. It wasn't just
that he was the inventor of dynamite. He made his fortune (over 7 million
19th-century dollars) partly on the invention of dynamite, smokeless powder,
and the detonator, and partly as a munitions merchant. So when the will set up
the Nobel prizes, emphasizing the peace angle, and put the interest on
practically all his millions into the prize pot, everyone was stunned.
Everyone, that is, except Bertha Kinsky, a long-time friend of Nobel's whom he
met through a personal ad placed in an 1876 issue of the Vienna Neue Freie
Presse. It was Kinsky, a pacifist activist, who inspired Nobel to establish
the Nobel Peace prize in his will. The will was written without the benefit of
a lawyer, and it's not likely that the way the prizes have been distributed
over the years exactly matched Nobel's intentions. An inventor himself, Nobel
apparently had encouraging young Edisons in mind rather than the usually
towering figures of theoretical science who typically win. Nobel might very
well have thought it scandalous that no award was ever given for the invention
of the automatic digital computer.
The Peace prize, though, has always been awarded, apparently as Nobel
intended, for efforts in the advancement of peace in the world. It's possible
that Nobel has done more to advance peace since he died than he did to advance
war while he was alive. At least it's pleasant to think so. When old computers
die...


Scrap Iron


When old computers die, either as the victims of the onrush of technological
progress or of focused management, they are occasionally reanimated,
Frankenstein-like, by scavenging technophilic computer history buffs. From
Douglas W. Jones at
gopher://wiretap.spies.com:70/00/Library/Techdoc/Lore/oldiron:
Back at surplus, I bought the third PDP-8. It was battered, but it had an RX8E
interface in it that I needed if I was to hook an RX01 drive to my machines,
and besides, it only cost $7. I kept what I needed and shipped the remainder
to Charles Lasner in NY. Later, for $40, I picked up another surplus DEC relay
rack so I could get everything up off the floor and nicely rack mounted.
You picked up on the crucial detail in that excerpt, right? There's some guy
in New York that you can ship all your obsolete computer junk to. Charles,
watch for the UPS truck delivering that Sol-20 that I dropped a crate of books
on back in 1985. And thanks!
Just kidding, Charles. But technophilic computer history buffs aside, most of
us wouldn't want to give up the living room or office space that a PDP-8 would
demand, even if we could buy one for $7. Maybe that explains the appeal of old
slide rules: the small footprint. Anyone interested in becoming obsessive
about slide rules should visit The Slide Rule Home Page at
http://photobooks.atdc.gatech.edu/~slipstick/. The maintainer of that page
keeps track of sources for old slide rules, among his other slipstickish
obsessions.
I suspect, though, that part of the appeal is the fact that anything you can
carry in a leather holster clipped to your belt is by definition cool. That's
how I felt about my Newton PDA until it died recently. I don't know whether it
was from a broken heart when it learned that the recently released and
way-cool (relative to version 1) version 2 of the Newton operating system
won't run on it, or the fact that the warranty was up, but it's one DOA PDA. I
don't suppose it has a measurable street value, but if anybody out there is
interested, I've kept it in really good condition.


Daisy May


That PDP-8 is well out of warranty, too. I wonder what Douglas W. Jones ever
did with it? Once you've got that obsolete computer cleaned up and set up in
the spare bedroom, apart from showing it off to your friends, what do you do
with it?
I suppose you could use it to play music. You can always use any computer to
play music.
Paul Freiberger and I described one technique in our computer history book,
Fire in the Valley. A fanatic computer hacker, Steve Dompier, bought one of
the first MITS Altair computers. Although it wasn't obsolete yet, the focused
management at MITS hadn't delivered any software or peripherals to speak of,
so he was faced with the question: What do I do with this thing? His solution
was to write a machine-language program, loaded through the toggle switches
that served as the input device, that cleverly manipulated the RFI the
unshielded machine put out, causing a nearby radio to emit static at
controlled frequencies and produce that old favorite "Daisy" or "A Bicycle
Built for Two."
Yes, I've told that story before. It had all been done before, too. Steve
Dompier wasn't the first programmer to cause a computer to play music, not
even the first to make a computer play that particular tune. The best-known
computer ever to play "Daisy" was HAL in the movie 2001. I have it on good
authority that you could make Dompier music on the beloved Trash-80, or TRS-80
Model I, as Tandy called it, as well as on many other early microcomputers,
all now pushing up daisies. On the theory that a peripheral is a peripheral,
many disturbed individuals have stooped to using disk drives and printers to
play music. Scott H. Redd reports:
One day I stopped by the machine room at the University of Maryland and someone
had trotted out an archaic piece of software that played Christmas carols
by writing variable block sizes to the old Uniservo 7 track tape, causing
different notes to emanate from the vacuum columns. Unfortunately, nobody
updated the program for the 9 track drives.
 For its day, the Commodore 64 was quite capable of making music. However, I
remember a program that someone had made that played "Bicycle Built For Two"
by vibrating the 1541's disk drive head at different rates. [There] was very
quiet but distinct music coming out of the floppy disk drive.
It gets worse: Both dot-matrix printers and daisy-wheel (you make up the joke)
printers have been bent to the task of playing "Daisy." Ric Werme confesses:
I designed some fonts that were variable numbers of vertical bars in
half-inch-wide characters. The printer's horizontal resolution was 0.001",
better than laser printers, but not good enough for decent music. I had to
compute line spacings in 0.0001" units and round to the nearest 0.001". About
an octave-and-a-half would fit in a 2Kb PROM (this was before 16K RAM chips
made downloaded fonts practical). I arranged "A Bicycle Built for Two." It
attracted a fair amount of attention at the trade shows.
I have a LaserWriter IINTX that squeals like a pig if I don't oil it; has
anybody written PostScript code to make a laser printer squeal "Daisy"? There
are recorded cases of computer operators making practical use of those pig
squeals. I'm not referring to the Macintosh with its diagnostic chord; that
was deliberately put in for diagnostic purposes, and sounds sort of nice. But
diagnosing a computer by the unhappy noises it makes dates back at least to
1943 and the British work on the Colossus code-breaking machines. Ironically,
when Doc Coombs took over supervision of the production of the machines, they
were soon running too well to make the noises, and a useful diagnostic tool
was lost.


A World of Your Own



Future histories will record that on February 14, 1994, a three-dimensional
banana appeared on the Internet. Visitors to the banana could execute banana
fly-bys using their mice as joysticks, examining all sides of the fruit and
poking around in the dark corners of this banana space. The banana was hooked
into the World Wide Web: You could click on a link in a Web page and find
yourself face-to-face with the banana, or click on the banana to return to the
familiar two-dimensional Web.
According to Mark Pesce, who, along with Tony Parisi, put that banana out
there, that event marked the beginning of virtual reality on the Web. By
October 17th, Parisi and Gavin Bell had presented a draft spec for VRML 1.0,
the Virtual Reality Modeling Language, a language for describing 3-D objects
and worlds on the World Wide Web. Here's the long-form definition from the
spec: 
VRML is a language for describing multi-participant interactive
simulations--virtual worlds networked via the global Internet
and hyperlinked with the World Wide Web.
Here in the back of the column in recent months I've been talking about little
or old or obscure languages. VRML is not old or obscure, but it does qualify
as a little language, in some sense, and I've recently spent some time
learning it. Thus the following brief sketch of VRML: VRML 1.0 is designed to
provide platform independence, extensibility, and the ability to work over
low-bandwidth connections.
The language is, at the highest level of abstraction, a way for objects to
read and write themselves. The objects can contain 3-D geometry, MIDI data,
JPEG images, or whatever.
VRML defines useful objects for doing 3-D geometry and calls them "nodes."
"Scene graphs" are structures of nodes. Nodes contain fields with specified
field values, and can contain other nodes as child nodes. A node is
represented in VRML like this: DEF objectname objecttype { fields children }
but only the object type and the curly braces are required. So a sphere, which
has one field, radius, could be represented like this: Sphere { radius 2.3 }.
Not all nodes actually result in something being drawn; there are property
nodes and coordinate-system nodes, for example, that set characteristics of
the drawing environment.
Extensibility is supported via self-describing nodes. A self-describing node
is a node that is not defined in the language definition, and defines itself
by specifying its own fields:
Cube {
  fields [ SFFloat width, SFFloat height, SFFloat depth ]
  width 10
  height 4
  depth 3
}
Anyone wanting to write a VRML browser will want to get hold of QvLib, the
essential library, and read about building browsers and QvLib in the
definitive book on VRML, VRML: Browsing and Building Cyberspace, by Mark Pesce
(New Riders Publishing, 1995). Good Web sites for following VRML developments
are the VRML Repository at http://www.sdsc.edu/vrml/, Mecklermedia's site at
http://www.mecklerweb.com/netday/vrml.html, and Wired magazine's VRML page at
http://vrml.wired.com/.

















































C PROGRAMMING


Plumbers, Programmers, and Quincy 96




Al Stevens


I've been speculating about the future of programming and what it will look
like in the next couple of decades. I invite you to respond with your own
notions about where we might be headed.
Programming has gone through several shifts since its beginnings about 50
years ago. Each new programming model has tried to make the task of programming
easier and more disciplined, so that software systems would be timely,
responsible, extensible, reliable, and maintainable. Noble objectives.
A timely system is available when it is needed. A responsible system supports
the requirements of its user. An extensible system can grow and change when
the requirements change. A reliable system rarely fails. A maintainable system
can be repaired. These are the attributes that describe a system of good
quality.
To maintain the social level established by fellow columnist Michael Swaine, I
had a swimming pool installed a couple of years ago in my Florida winter home.
(It's also my summer home, but Michael doesn't need to know that.) I asked the
contractor in advance what variables could affect the schedule. There were
two: weather and the remote possibility that my backyard would turn out to be
an undiscovered archeological site or burial ground. I asked what variables
could affect the quality of a pool. Again, there were two: the expertise of
the installers and the quality of the materials. Those guys had a clear
understanding of their craft. The weather cooperated, no departed Seminoles or
Spanish ruins were unearthed, the installers were a family with years of
experience, they used trusted materials, and a good pool was finished on time.
For years our industry has tried to achieve the same level of performance, to
remove the obstacles that impair our ability to deliver quality software on
time. Many so-called methodologies have been advanced. Writers sold books,
lecturers hit the seminar trail, consultants ran their meters, and few, if
any, of the methodologies ever worked consistently. The excuse offered by the
methodology peddlers is always the same: Their clients did not adhere
dogmatically to the disciplines of the methodology. Otherwise, the methodology
would have worked.
For a software-development methodology to have any chance at all, it must be
thoroughly understood by its practitioners, and it must have the total
commitment of everyone on the team from start to end of the project. That
never happens. The looming specter of a missed deadline always eliminates all
those good intentions, and strict adherence to some time-consuming methodology
is sacrificed. All able coders drop whatever else they are doing and get down
to writing code. Those who cannot write code stand around wringing their
hands.
The Psychology of Computer Programming (Van Nostrand Reinhold, 1971), written
by Gerald Weinberg 25 years ago, advances the notion of "egoless programming"
as the basis for overcoming the problems of human ego versus technical merit
in the resolution of technical issues. Presumably your ego is emotionally
wrapped around your work, and any suggestion from someone else that you modify
your work is taken as an assault on your competence and intelligence.
According to Weinberg, if everyone on the team regularly reviews everyone
else's work, each individual is less likely to react personally to criticism.
Review begets change. Programmers resist change, not so much because they
don't like interference, but because they don't want to revisit finished work.
Some insecure programmers view each suggestion as a mustache painted on their
Mona Lisa, but those poor souls are usually in the minority. Most programmers
do not have that particular problem. Consequently, I think that although much
of Weinberg's wisdom has endured the test of years, egoless programming is a
naive concept upon which to base a methodology. First, where ego is a factor,
you cannot eliminate it no matter how you organize the people and the work.
Everyone has an ego. Second, ego is not the real problem.
Every programmer knows what a pain it will be to change his or her own work to
implement someone else's suggestion. Every programmer is the sole authority on
the complexities of his or her own code. That's the problem. Periodic code
reviews do not change this condition.
Consider this. If I agree that your change has merit, I have just signed up
for a bunch of work that might compromise my ability to complete other
assignments. Only I know by how much because only I know what has to be
changed. It is usually far easier to argue against the merit of your
suggestion, no matter how ineffective the argument and no matter how
reasonable the suggestion, than it is to implement the change. Knowing that, I
naturally seek the path of least resistance and beat you down with rhetoric
rather than search for the merit in your idea. You, knowing how you would have
built my part of the system, cannot understand why I am resisting what should
be such a trivial change. We argue endlessly from two different perspectives,
neither one related to the original issue. The question is eventually solved
based on the relative debating skills of the combatants rather than on the
technical merits of the question.
Such debates sink to the level of ego only because there is no common
technical platform, even when one should naturally exist. The original issue
gets lost when the overriding issue becomes who is right. The person making
the suggestion is defending his or her right to make suggestions and be
respected. The person arguing against the change might be guarding what could
be exposed as a nonextensible piece of work and is usually defending the
schedule, too. The technical merits of the issue get lost in the fray almost
from the start.
Why does this happen? It happens because programmers work alone, designing and
building their components mostly out of view of the rest of the team. A
feature does not get effectively examined and reviewed until a significant
body of individual work has been invested in the feature.
Weinberg tries to solve that problem by making everyone's work the collective
intellectual property of the team. Programmers tend to resist that idea. No
one wants to expose unfinished code. Everyone wants time to smooth the
bumps before peers, bosses, and the public get a look.
I've seen projects attempt to implement egoless programming after someone in
charge has read Weinberg. If the development team is a hierarchy, only the
code of the lower levels ever gets reviewed. Those in positions of authority
always exempt their own work from the common scrutiny of their subordinates.
That's not what Weinberg had in mind, but that's what has always happened.
All this chaos is supported and encouraged by what I call the "private code
paradigm" in which a programmer codes in private. Not until we eliminate the
private code paradigm will we begin to find solutions to the problem.
The makers of methodology often try to associate software design and
implementation with other construction disciplines. The designer is an
architect. The programmer is a carpenter. This analogy fails. A real architect
designs a building, and, when you look at the design, it looks like a picture
of a building viewed from different angles. Furthermore, the architect does
not have to design known components, such as toilet tanks, breaker boxes, and
down spouts, from scratch, but simply inserts those reusable components into
the design where they are known to fit. What a concept.
Anyone can, prior to the construction of that building, look at the design and
have a good idea of what the building will look like. Anyone can, during the
construction of that building, look at the design, look at the building, and
make a reasonable guess as to whether the building complies with the design.
As carpenters, masons, electricians, plumbers, roofers, drywall hangers,
painters, tile layers, and so on, add their individual components to the
building, everyone gets a look at the collective work in progress. Everyone
can tell at a glance if things are shaping up, how the schedule is looking,
how the budget is holding up, if codes are being met, and so on. The
designers, builders, inspectors, and users can walk through the construction,
see what is happening, and make concrete comments in a language that everyone
understands. Everyone's work is public.
It has been observed that the practitioners of building design and
construction have simply gotten it right after centuries of experience. That
is true. It has been observed that software designers and builders, with only
about 50 years of experience and with a constantly moving technology, have had
little opportunity to get it right. That, too, is true.
The construction industry is regulated by local governments and regulations.
Inspectors who have no vested interest in the outcome can declare the product
to be unusable because it does not adhere to established standards. Bring the
building "up to code" or you may not occupy it. Luckily, software builders are
not regulated that way--we have no inspectors, but the freedom that we enjoy
is part of the problem. Our work is private.
Several years ago, I had an abominable assignment, mercifully brief, but one
that I hope never to repeat. I was the government's technical advisor on a
hardware/software project being built by a contractor. There was a bunch of
new hardware and a big program designed and written by one programmer. My role
was like that of a building inspector. I was to observe the project in
progress and tell the government if they were getting what they were paying
for. The programmer, engineers, and managers who worked for the contractor
hated me. During what was to be the final acceptance test, the government
operators sat at the console and ran the system. The contractor programmer sat
beside them. Every time an operator had a problem, the programmer explained
why the procedure didn't work and what to do differently. No one else--not the
engineers, not the documentation writers, not the managers, certainly not
me--knew how to operate that software. I suggested that, as an experiment,
they put a muzzle on the programmer and see how far the operators could get on
their own. Guess how far they got? The government project manager asked me to
write my opinion of the test results. I reported that the system should be
accepted only if that particular programmer was part of the delivery and
available to baby-sit the system 24 hours a day. Without him, in my opinion,
the system was not usable.
We've got to get programmers out of the closet so that everyone can view all
the work in progress all the time. Last month, I described a visual,
virtual-reality, software development environment of the future in which
designers and programmers meander around the design, adding components and
working with the ones already in place. Taken one step further, that
environment is populated by all the members of the team. As you cobble away at
your part, you can look a few nodes over and see virtual renderings of your
coworkers busily doing their parts. You'll be like the drywall hanger who
knows not to hang anything because the insulation isn't up, the electrician
isn't finished running wire through the studs, and the in-wall plumbing is
incomplete.


C FILE Stream Text and Binary Mode


The Standard C Library, as defined by ANSI, provides two modes for the fopen
function. You can open the file as a text file or as a binary file. If you do
not specify a mode, the default is text mode.
MS-DOS programmers understand this requirement well. MS-DOS C and C++ programs
translate newline characters ('\n') in memory into the CRLF ("\r\n") pair when
writing to a text file. Those programs convert a CRLF pair into a single
newline when reading a text file into memory. UNIX programs make no such
conversion. The newline in memory is a newline in a text file. Therefore, no
apparent difference exists between text and binary files that are read and
written by UNIX programs. UNIX programmers are aghast when they hear that
MS-DOS programs employ two different file formats. The following quote, taken
from a newsgroup discussion among programmers of both platforms, is typical of
the UNIX programmer's reaction when they discover text and binary modes: "It's
not clear MS-DOS needs a different file format for text (they may, but it's a
mistake)."
This argument is wrong on three counts. First, the two modes have nothing to
do with operating systems and everything to do with compilers and hardware.
Second, MS-DOS compilers do use a different format than those of UNIX. Third,
it was no mistake. The difference was intentional, and when you understand
why, you see that it was a reasonable solution to a prevailing problem.
This is how I remember it, although some details are fuzzy. Maybe this
perspective will enable those who were not involved with computers 20 years
ago to understand why MS-DOS and UNIX compilers have different formats for
text files and why, given the perfect wisdom of 20/20 hindsight, neither
approach is inferior. I invite those of you whose memories are better than
mine to correct any errors I might make in this reminiscence.
When Dennis Ritchie built the C language, he worked with a PDP-11. The
PDP-11's console, a typewriter-like device called a "DECwriter," behaved, either
in hardware or through its device driver, like a typewriter with respect to
the Enter key. The Enter keystroke sent to the computer the line-feed
character, which, when echoed to the console, moved the type ball down one
line and to the left margin. Consequently, C adopted the convention wherein a
single newline character, which is the line feed in ASCII character sets,
signifies left margin, down one line.
Early, so-called "home" computers--the ones that predate the IBM PC--used TTY
devices as consoles. Some were traditional paper-based Teletypes; others were
dumb video terminals, called "glass Teletypes." The early terminals did not
translate a newline character into a CRLF pair. Send a newline character to
one of those terminals, and the cursor, type ball, or whatever simply moves
down one line without returning to the left margin. Send only a carriage
return, and the type ball moves to the left margin staying on the same line.
It takes two characters to get the effect of what we now think of as '\n'.
Many of the video terminals had an option to enable a CRLF insertion, but the
programmer could not count on everyone having the same setting.
The single newline character did not work on the Teletype, either when typing
or when copying a text file to the console or printer device. When CP/M, an
anarchistic operating system, became the dominant OS for microcomputers,
nothing concrete could be assumed about its character devices and their device
drivers. Consequently, C compiler builders for those early machines built into
their I/O libraries the translation of LF in memory to CRLF on output and CRLF
on input to LF in memory. This convention gave natural birth to the two modes,
text and binary, for file streams because, of course, such translations mutate
nontextual data.
The text-mode translation was contrived in the interest of source-code-portable
programs, but at the expense of portable data files. No problem, because the C
statement while ((c=getchar())!=EOF) putchar(c);, when compiled with a PC
compiler, converts a UNIX text file into a PC text file.
When UNIX programmers cite this two-mode contrivance as evidence of yet
another weakness of MS-DOS and of the clear superiority of UNIX, they aim
their shots in the wrong direction. Nothing in the MS-DOS API has anything to
do with text and binary file open modes. The DOS API functions open and close
files and read and write binary streams just like UNIX. Text and binary modes
were invented for and implemented in language translators and file-system
libraries to accommodate hardware that UNIX compilers did not anticipate.
Early IBM PC C compilers perpetuated the convention presumably so that files
and programs would be convertible from the then-dominant CP/M platform.
Eventually, the PC's overwhelming dominance of the desktop market mandated
that the text/binary convention become a part of the ANSI C Standard, and,
whether we like it or not, text and binary modes are with us to stay.
In an ideal world, the solution to this problem would have been implemented in
the device drivers of the character devices instead of in the file systems of
the compilers. Inasmuch as CP/M depended on installers to write their own
device drivers in assembly language, such an assumption could have been made
except for one thing: C compilers for microcomputers came along after there
was already a substantial installed CP/M base with devices and drivers that
performed no such translation.
If the designers of the PC had known that their machine would become a
dominant C platform, and if they had had sufficient vision, they could have
put the translation in the MS-DOS character device drivers, an obvious choice
in retrospect. That's a bit of a stretch, however. No one could have
accurately made that prediction. Because the PC designers did not anticipate
the problem and because the PC's behavior became a de facto standard, the C
compiler builders had to do something about newlines. They had no choice.
Microcomputer systems programmers of the 1970s and early 1980s could not know
that an obscure, cult language would take over the microcomputer programming
world and assume things about hardware.


Quincy 96


Quincy 96 is the current ongoing "C Programming" column project. It is a
Windows 95 GUI application that serves as an integrated development
environment for the Win32 port of the GNU C and C++ compilers. I am using
Quincy 96 as the environment for programming exercises in a C and C++ training
CD-ROM that DDJ and I are developing.
Quincy 96 is close to being completed. There are a few knots left to untangle,
and I'm sure that new requirements will surface as I use it in the development
of the training tutorial. Since writing last month's column, I added an
expression parser to the debugger and the ability for an external program to
send commands to Quincy 96.



Parsing Debug Expressions


The expression parser supports the examining and watching of variables during
a debug session of a C or C++ program. During debugging you often want to look
at a subscripted element in an array or a member of a class. You want to use
variables for subscripts and arithmetic operators in the expression. You want
to dereference pointers and references. The expression parser provides that
capability. It's similar to the parser in an interpreter or compiler. In fact,
I adapted the old Quincy expression parser for just this purpose. The new
parser is not a complete C++ expression parser. You cannot call functions or
perform floating-point math. The parser does not include assignment or logical
operators. It's just a simple expression parser.
The parser needs the cooperation of the stabs section of the debugger. I
described stabs last month. They are the debugging information tables that are
embedded in a program compiled by GNU C or C++ with the -g option. In parsing
an expression, the parser eventually will need a value from the debugged
program's memory as defined by a symbol in the stabs symbol table. If the
identifier turns out to be the name of a structure or class object, the parser
needs to tuck it away until it finds a member operator and the identifier of a
member. And more. So, it stands to reason that you cannot parse an expression
unless a target program has been loaded and its stabs tables have been
initialized from its .EXE file.
The parser uses a typical recursive-descent algorithm, so it takes advantage
of C++ exception handling to find its way out of the descent when it finds an
error in the expression. C++ exception handling does what setjmp and longjmp
do in C except that in addition to unwinding the stack, a thrown exception
calls destructors for any local objects that were declared along the way. C++
exception handling is a language feature rather than the function kludge that
setjmp and longjmp use, so exception handling is a lot more intuitive.
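The pattern is easy to sketch outside of Quincy. The fragment below is not Quincy's parser -- just a minimal recursive-descent evaluator for integer +, -, *, /, and parentheses that throws a C++ exception to unwind out of the descent when it finds an error:

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Minimal recursive-descent expression evaluator. A syntax error anywhere
// in the descent throws; the exception unwinds the recursion, destroying
// any local objects along the way, and lands in the caller's catch block.
class Parser {
    const char* p;
    void skip() { while (std::isspace((unsigned char)*p)) ++p; }
    long expr();
    long term();
    long factor();
public:
    long eval(const std::string& s) {
        p = s.c_str();
        long v = expr();
        skip();
        if (*p) throw std::runtime_error("trailing junk in expression");
        return v;
    }
};

long Parser::expr() {             // expr := term (('+'|'-') term)*
    long v = term();
    for (;;) {
        skip();
        if (*p == '+')      { ++p; v += term(); }
        else if (*p == '-') { ++p; v -= term(); }
        else return v;
    }
}
long Parser::term() {             // term := factor (('*'|'/') factor)*
    long v = factor();
    for (;;) {
        skip();
        if (*p == '*')      { ++p; v *= factor(); }
        else if (*p == '/') { ++p; v /= factor(); }
        else return v;
    }
}
long Parser::factor() {           // factor := number | '(' expr ')'
    skip();
    if (*p == '(') {
        ++p;
        long v = expr();
        skip();
        if (*p != ')') throw std::runtime_error("missing ')'");
        ++p;
        return v;
    }
    if (!std::isdigit((unsigned char)*p))
        throw std::runtime_error("number expected");
    long v = 0;
    while (std::isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}
```

Precedence falls out of the call structure: term() binds tighter than expr() simply because expr() calls it.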
Expression parsing is a two-step process. First comes the lexical scan, which
translates the expression into tokens and eliminates unnecessary white space.
Then comes the evaluation, which scans the tokens left to right and evaluates
them one at a time. Operator precedence and associativity are managed by the
algorithm. I discussed these subjects several times over the years in this
column when I published interpreters, scripts, and query languages. Rather
than revisit them, I'll refer you to the Dr. Dobb's/CD Release 3, which has
the text and code from January 1988 to June of 1995 and a very fast search
engine. If you want to look at Quincy 96's expression parser, download the
code and open the parser.h and parser.cpp files.


The Tutorial and the IDE


The interactive tutorial, which I am building using Asymetrix Toolbook
Multimedia, must be able to launch Quincy 96 with the source files loaded for
a specific exercise. The tutorial must also be able to specify the variables
to watch and the breakpoints for the exercise. I decided to use the format of
the Windows API's private profile variables to define each of the exercises.
An .INI file for each exercise has text settings that specify everything the
tutorial needs to run the exercise automatically for the student. By
overriding the CWinApp::OnDDECommand function and intercepting opens of .INI
files, the program can load the exercise's source-code files and set the
watches and breakpoints.
The tutorial also needs to send Step, Run, Step Over, and Stop commands to
Quincy 96's debugger. I decided to use DDE commands for these operations as
well. Then I discovered a neat hack for testing. Long before the interactive
tutorial is ready, I can test every exercise. I used the Windows 95 registry
to associate files with the .Qcmd extension with the Quincy 96 executable
program. I put dummy text files on the desktop with the names Step.Qcmd,
Run.Qcmd, and so on. Dragging and dropping an exercise's .INI file onto the
Quincy 96 icon starts Quincy 96 with the associated exercise loaded and ready
to go. Double clicking the .Qcmd file icons sends DDE commands to Quincy 96 to
step through, run, and stop the program as if the user had clicked the buttons
or the tutorial had sent the DDE commands.
As a final aid to development, I put a tool button onto Quincy 96's toolbar.
It's off to the right and labeled TUT. It won't be in the released version on
the CD-ROM, but I'll leave it intact on the download version so that you can
play with it. I can manually set up a tutorial exercise in the Quincy 96 IDE
with source-code files, breakpoints, and watch variables. When I click that
button, the program generates an .INI file that appropriately describes the
exercise.


The Color Bar Cursor


Last month I reported that I had not figured out how to display a color cursor
bar for the program counter during a source-level debugging session. Now I
know how, and I also know why Visual C++ and other Windows-hosted debuggers
use a token in the margin for that purpose instead of a cursor bar.
Example 1 is the code that you put into a member function of a class derived
from the MFC CEditView class. In this example, "newfont" is the name of a
CFont object representing the font that the text editor is using. The nLine,
and nWidth values are the line number and the width in characters of the text
to be displayed. The lpText pointer points to the text of the line, a value
that you get by using the technique that I described in the February column.
This technique works fine, except in one case: If the user does any horizontal
scrolling, the cursor bar still displays text from the left margin. There is
no adjustment in display units that I know of for the CDC::TextOut function.
It uses character units. I must decide if this behavior is acceptable for
Quincy 96 or find another way to define a cursor bar.


Source Code


The source-code files for the Quincy 96 project are free. You can download
them from the DDJ forum on CompuServe and on the Internet by anonymous ftp;
see "Availability," page 3. To run Quincy, you'll need the GNU Win32
executables from the Cygnus port. They can be found on ftp.cygnus.com in
/pub/sac. Get Quincy 96 first and check its README file to see
which version of gnu-win32 you need. Every time they release a new beta, I
have to make significant changes to Quincy 96. As I write this, the latest
beta is Version 12 and Quincy 96 build 1 works with Version 10.
If you cannot get to one of the online sources, send a 3.5-inch high-density
diskette and an addressed, stamped mailer to me at Dr. Dobb's Journal, 411
Borel Avenue, San Mateo, CA 94402, and I'll send you the Quincy source code
(not the GNU stuff, however--it's too big). Make sure that you include a note
that says which project you want. The code is free, but if you care to support
my Careware charity, include a dollar for the Brevard County Food Bank.
Example 1: Displaying a Color Cursor Bar in a CEditView Control.
HideCaret();                              // hide the caret while painting
CDC* pCDC = GetDC();
CFont* pOldFont = pCDC->SelectObject(&newfont);
pCDC->SetBkColor(0x00ff00);               // COLORREF 0x00bbggrr: green bar
pCDC->TextOut(0, nLine, lpText, nWidth);  // repaint the line over the color
pCDC->SelectObject(pOldFont);             // restore the original font
ReleaseDC(pCDC);
ShowCaret();


ALGORITHM ALLEY


A Fast Integer Square Root




Peter Heinrich


Peter is a video and computer game programmer who has worked on products for
Amiga, PC, Sega, 3DO, and Macintosh. He's currently working for Starwave and
can be contacted at peterh@starwave.com.


Complex calculation has always frustrated speed-conscious programmers, since
mathematical formulas often form bottlenecks in programs that rely on them. To
cope with this problem, three primary tactics have evolved: eliminate,
simplify, and be tricky.
Rarely will a programmer eliminate a calculation completely. (If a program
operates without it, why was it there in the first place?) Instead, integer or
fixed-point may replace expensive floating-point math. At the same time, a
simpler version of the formula may be sought--one which is easier to compute
but gives roughly the same result.
If this proves difficult (as it often does), a tricky solution may provide the
answer. This approach requires almost as much luck as programming skill, and
is definitely the most difficult. Then again, the fun is in the challenge.


Trick or Treat


The square-root function certainly qualifies as a complex calculation, as
anyone who has actually computed one by hand will readily attest. In general,
square roots are avoided in speed-critical code, and rank even higher than
division on the list of things to avoid. The technique I present here is an
iterative approach to finding floor(sqrt(N)), the largest integer less than
or equal to the square root of N. Like many tricky solutions, it's also
simple, fast, and
elegant.
Before attacking the actual algorithm, it might be useful to look briefly at
two other iterative methods for computing the square root. Example 1(a) simply
applies Newton's Method, a straightforward way to zero in on a value given an
initial guess. This method is theoretically fast, having order O(log2N).
Unfortunately, it uses a lot of multiplication, which may form a bottleneck in
itself.
Example 1(b) uses a different approach, summing terms until they exceed N. The
number of terms summed to that point is the square root of N. While this
method eliminates the multiplication, it has a higher order of O(N).
It would be nice to find a practical algorithm that also is efficient, that
is, one which requires only elementary operations but also is of low order.
The Binomial Theorem suggests a possible approach. Assume N is the sum of two
numbers, u and v. Then N = (u+v)^2 = u^2 + 2uv + v^2. Choosing u and v carefully may
simplify calculation of the quadratic expansion. But what constitutes a good
choice?


Finding Your Roots


For any number N, it's easy to determine floor(log2 N)--simply find the
position of the highest set bit. Similarly, floor(log2 floor(sqrt(N))) =
floor(log2 N^(1/2)) = floor(1/2 log2 N) indicates the position of the highest
bit set in the result, floor(sqrt(N)). Now the problem just entails finding
which of the remaining (less significant) bits, if any, also are set in
floor(sqrt(N)).
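The highest-set-bit search can be sketched directly; this is the same shift loop Example 2(a) uses to compute l2:

```cpp
// Position of the highest set bit, i.e. floor(log2(n)); assumes n >= 1.
// Halving this value gives the position of the highest bit that is set
// in the integer square root.
unsigned highest_bit(unsigned long n)
{
    unsigned pos = 0;
    while (n >>= 1)
        ++pos;
    return pos;
}
```

For N = 25 (binary 11001), highest_bit returns 4, so the root's highest bit sits at position 4/2 = 2, making the initial u equal to 4; indeed the root, 5, is binary 101.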
Let u = 2^floor(1/2 log2 N); that is, let u take the value of the highest bit
set in the result, floor(sqrt(N)). It isn't known whether the next-lower bit
is also set in the result, so let v take its value, then solve
u^2 + 2uv + v^2. This calculation is
easy because each term is a simple shift. Since v is known to be a power of
two, even the middle term, 2uv, reduces to a shift operation.
If the sum of all three terms is less than or equal to N, the next-lower bit
must be set. In that case, the result just computed will be used for u^2 and
u = u+v for the next iteration. If the sum is greater than N, the next-lower bit
isn't set, so u remains unchanged. In either case, move on to the next-lower
bit and repeat the process until there are no more bits to test.
Example 2(a) implements (in C) an algorithm that appears to satisfy both
design goals. It uses only elementary operations (addition and shift) and is
extremely efficient, weighing in at O(log2 N). However, a few minor
optimizations still can be performed: determining floor(1/2 log2 N) can be
improved; v doesn't have to be recomputed from scratch every iteration; and
noticing that 2uv + v^2 = v(2u+v) simplifies some computation inside the loop.
Example 2(b) is the final result.
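One way to gain confidence in the routine is to check the defining inequality exhaustively. A throwaway harness, with Example 2(b)'s sqroot() restated so the fragment stands alone:

```cpp
// Example 2(b)'s optimized routine, restated for self-containment.
unsigned long sqroot(unsigned long N)
{
    unsigned long l2, u, v, u2, n;

    if (2 > N)
        return N;
    u = N;
    l2 = 0;
    while (u >>= 2)          // l2 = floor(1/2 log2 N)
        l2++;
    u = 1L << l2;            // highest bit of the root
    v = u;
    u2 = u << l2;            // u^2
    while (l2--) {
        v >>= 1;             // candidate next-lower bit
        n = (u + u + v) << l2;
        n += u2;             // n = u^2 + v(2u + v)
        if (n <= N) {
            u += v;          // bit v belongs in the root
            u2 = n;
        }
    }
    return u;
}

// r = sqroot(N) is correct exactly when r*r <= N < (r+1)*(r+1).
bool check_range(unsigned long limit)
{
    for (unsigned long N = 0; N < limit; N++) {
        unsigned long r = sqroot(N);
        if (!(r * r <= N && N < (r + 1) * (r + 1)))
            return false;
    }
    return true;
}
```

check_range(1000000UL) verifies every value below one million.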
Actually, many assembly languages make the first optimization moot. In fact,
two of the three assembler listings presented here use a shortcut. Only the
ARM processor lacks a specialized instruction to find the highest set bit in a
number (but it's a RISC chip, after all). Listings One through Three present
implementations of the optimized algorithm for the Motorola 68020, Intel
80386, and ARM family of processors, respectively. 


Conclusion


For programmers developing high-performance code, complex mathematical
calculation is not always practical. Some may spurn floating-point math
altogether, especially if a math coprocessor isn't guaranteed to be present on
the target platform. The algorithm I present here computes an integer square
root suitable for just such situations. Even as hardware speeds increase,
programs demand more and more. Fast and elegant little tricks like this one
can still be useful.
Example 1: (a) Newton's Method; (b) summing terms.
(a)
// Newton's Method -- O( log2 N )
unsigned long sqroot( unsigned long N )
{
    unsigned long n, p, low, high;

    if( 2 > N )
        return( N );
    low = 0;
    high = N;
    while( high > low + 1 )
    {
        n = (high + low) / 2;
        p = n * n;
        if( N < p )
            high = n;
        else if( N > p )
            low = n;
        else
            break;
    }
    return( N == p ? n : low );
}

(b)
// Summing terms -- O( sqrt N )
unsigned long sqroot( unsigned long N )
{
    unsigned long n, u, v;

    if( 2 > N )
        return( N );
    u = 4;
    v = 5;
    for( n = 1; u <= N; n++ )
    {
        u += v;
        v += 2;
    }
    return( n );
}
Example 2: (a) Binomial theorem; (b) optimized binomial theorem.
(a)
// Binomial Theorem -- O( 1/2 log2 N )
unsigned long sqroot( unsigned long N )
{
    unsigned long l2, u, v, u2, v2, uv2, n;

    if( 2 > N )
        return( N );
    u = N;
    l2 = 0;
    while( u >>= 1 )
        l2++;
    l2 >>= 1;
    u = 1L << l2;
    u2 = u << l2;
    while( l2-- )
    {
        v = 1L << l2;
        v2 = v << l2;
        uv2 = u << (l2 + 1);
        n = u2 + uv2 + v2;
        if( n <= N )
        {
            u += v;
            u2 = n;
        }
    }
    return( u );
}

(b)
// Optimized Binomial Theorem
unsigned long sqroot( unsigned long N )
{
    unsigned long l2, u, v, u2, n;

    if( 2 > N )
        return( N );
    u = N;
    l2 = 0;
    while( u >>= 2 )
        l2++;
    u = 1L << l2;
    v = u;
    u2 = u << l2;
    while( l2-- )
    {
        v >>= 1;
        n = (u + u + v) << l2;
        n += u2;
        if( n <= N )
        {
            u += v;
            u2 = n;
        }
    }
    return( u );
}

Listing One 
 MACHINE MC68020
 EXPORT sqroot
;; unsigned long sqroot( unsigned long N ).
;; This routine assumes standard Macintosh C calling conventions, 
;; so it expects argument N to be passed on the stack. Macintosh C register 
;; conventions specify that d0-d1/a0-a1 are scratch.
sqroot PROC
 ; If N < 2, return N; otherwise, save non-scratch registers.
 move.l 4(sp),d0 ; just past the return address
 cmpi.l #2,d0
 bcs.b done
 movem.l d2-d3,-(sp)
 ; Compute the position of the highest bit set in the root.
 ; Using a loop instead of BFFFO will make this code run 
 ; on any 680x0 processor.
 movea.l d0,a0 ; preserve N for later
 bfffo d0{0:0},d3 
 neg.l d3
 addi.l #31,d3
 lsr.l #1,d3
 ; Determine the initial values of u, u^2, and v.
 moveq.l #1,d0
 lsl.l d3,d0 ; u
 move.l d0,d1 ; v starts equal to u
 movea.l d0,a1
 lsl.l d3,d1 ; u^2
 exg.l d1,a1
 ; Process bits until there are no more.
checkBit dbf.w d3,nextBit
 movem.l (sp)+,d2-d3
done rts
 ; Solve the equation u^2 + 2uv + v^2.
nextBit lsr.l #1,d1 ; v = next lower bit
 move.l d1,d2

 add.l d0,d2
 add.l d0,d2 ; n = 2u + v
 lsl.l d3,d2
 add.l a1,d2 ; n = u^2 + v(2u + v)
 ; = u^2 + 2uv + v^2
 ; If n <= N, the bit v is set.
 cmpa.l d2,a0
 bcs.b checkBit
 add.l d1,d0 ; u += v
 movea.l d2,a1 ; u^2 = n
 bra.b checkBit
 END 

Listing Two
 NAME sqroot
 PUBLIC _sqroot
;; unsigned long sqroot( unsigned long N ). 
;; This routine assumes the argument N is passed on the stack, and eax-edx 
;; are scratch registers.
TEXT SEGMENT PUBLIC 'CODE'
 ASSUME CS:TEXT
 P386
_sqroot PROC FAR
 ; If 2 > N, return N; otherwise, save the non-scratch registers.
 mov eax,[esp+4] ; just past the return address
 cmp eax,2
 jb short done
 push edi
 push esi
 ; Compute position of the highest set bit in the root. It's just 
 ; half the position of the highest bit set in N.
 mov esi,eax ; preserve N for later
 bsr ecx,eax
 shr ecx,1
 ; Determine the initial values of u, u^2, and v.
 mov eax,1
 shl eax,cl ; u
 mov ebx,eax ; v starts equal to u
 mov edx,eax
 shl edx,cl ; u^2
 ; Process bits until there are no more.
checkBit dec ecx
 js short restore
 ; Solve the equation u^2 + 2uv + v^2.
 shr ebx,1 ; v = next lower bit
 mov edi,eax
 add edi,eax
 add edi,ebx ; n = 2u + v
 shl edi,cl
 add edi,edx ; n = u^2 + v(2u + v)
 ; = u^2 + 2uv + v^2
 ; If n <= N, the bit v is set.
 cmp edi,esi
 ja short checkBit
 add eax,ebx ; u += v
 mov edx,edi ; u^2 = n
 jmp short checkBit 
restore pop esi
 pop edi

done ; Return to caller.
 mov edx,eax
 shr edx,16 ; necessary, but seems silly...
 retf
_sqroot ENDP
TEXT ENDS
 END

Listing Three
 AREA object,CODE
 EXPORT sqroot
;; unsigned long sqroot( unsigned long N ).
;; This routine observes the ARM Procedure Call Standard (APCS), so it expects
;; the argument N to appear in r0 (referred to as a1 by the APCS). Likewise,
;; the first four registers, r0-r3 (a1-a4 in the APCS), are treated as
;; scratch.
sqroot ROUT
 ; If N < 2, return N; otherwise, save non-scratch registers.
 cmp a1,#2
 movcc pc,lr
 stmfd sp!,{v1,v2,lr}
 ; Compute position of the highest bit set in root. It's just 
 ; half the position of the highest bit set in N.
 mov a2,a1 ; preserve N for later
 mov a3,a1
 mov v1,#0
findlog2 movs a3,a3,LSR #2
 addne v1,v1,#1
 bne findlog2
 ; Determine the initial values of u, u^2, and v.
 mov a1,#1
 mov a1,a1,LSL v1 ; u
 mov a3,a1 ; v starts equal to u
 mov a4,a1,LSL v1 ; u^2
 ; Process bits until there are no more.
checkBit cmp v1,#0
 ldmeqfd sp!,{v1,v2,pc}
 sub v1,v1,#1
 ; Solve the equation u^2 + 2uv + v^2.
 mov a3,a3,LSR #1 ; v = next lower bit
 add v2,a3,a1,LSL #1 ; n = 2u + v
 add v2,a4,v2,LSL v1 ; n = u^2 + v(2u + v)
 ; = u^2 + 2uv + v^2
 ; If n <= N, the bit v is set.
 cmp v2,a2
 addls a1,a1,a3 ; u += v
 ldmeqfd sp!,{v1,v2,pc} ; exit early if n == N
 movls a4,v2 ; u^2 = n
 b checkBit
 
 END



UNDOCUMENTED CORNER


What Is Undocumented MFC?




Scot Wingo and George Shepherd


Scot is a cofounder of Stingray Software, an MFC extension company. He can be
contacted at ScotWi@aol.com. George is a senior computer scientist with
DevelopMentor where he develops and delivers courseware for developers using
MFC and OLE. George can be contacted at 70023.1000@compuserve.com. They are
the coauthors of MFC Internals (Addison-Wesley, 1996).


MFC comes with full source code and a great set of online documentation.
However, while writing our book, MFC Internals, we discovered a plethora of
interesting undocumented classes, functions, and MFC behavior. Since then,
we've spent a great deal of time learning how these undocumented aspects of
MFC work, what they do, and documenting them.
Microsoft documents only the non-implementation portions of MFC so that it
can change implementation details from release to release. For a C++ class
library provider, this is desirable since it allows maximum flexibility to
rearrange classes between releases. However, MFC programmers will
find themselves having to decipher undocumented MFC behavior time and time
again when writing MFC applications that push the bounds of the MFC
documentation. For example, have you ever ended up in the middle of
undocumented MFC classes when debugging? Or have you ever tried to customize
the MFC print-preview engine? Do you need to know how MFC OLE Automation is
implemented so you can extend it? How about OLE documents or OLE controls?
In this series of articles, we will expose interesting undocumented MFC
behavior discovered during our many MFC spelunking sessions and in the process
answer many of the aforementioned questions. In addition, we will show you how
to exploit the undocumented features in your MFC applications. Whenever
possible, the MFC source-code filename will be mentioned so you can follow
along in your editor, or do some further exploring on your own.
Of course, all undocumented disclaimers apply--Microsoft can (and probably
will) change undocumented areas of MFC, so use any undocumented details you
find here with caution and at your own risk. Also, for the time being, we'll
always refer to the latest version of MFC (currently 4.0). Older versions of
MFC do behave differently than what we're describing--more proof that the
undocumented MFC internals always are in flux. We also would like to note that
Microsoft (specifically the MFC team) has been very gracious in helping us
discover some of the more obscure behavior of MFC. In particular, we would
like to thank Dean McCrory, MFC lead at Microsoft, for his valuable time and
insight.


Interesting Undocumented CDocument Behavior


MFC 4.0 introduced an improvement to the CDocument class. The new virtual
member functions, GetFile() and ReleaseFile(), let you specify a specialized
CFile derivative in a CDocument derivative. Whenever CDocument needs to do
something with a file, it calls GetFile() with the filename, file permissions,
and some other arguments. GetFile() returns a CFile pointer, which can be any
CFile derivative. This includes CDocument::Serialize(). The undocumented fun
begins when we take a peek at the default implementation of
CDocument::GetFile() in Listing One.
The implementation of GetFile() is pretty close to what you would imagine,
except for the first line. What is this undocumented CMirrorFile class? Why is
your document using it instead of CFile? What implications does this have for
your programs? Is CMirrorFile connecting to the Microsoft Network and sending
your documents to Microsoft? 


CMirrorFile Exposed


A quick search reveals that CMirrorFile is declared in the AFXPRIV.H header
file. Listing Two contains the declaration of this CFile derivative.
This certainly is a minimal CFile derivative. Aside from overriding Open(),
Close(), and Abort(), CMirrorFile adds one data member, m_strMirrorName. 
The implementation for CMirrorFile lives with most of CDocument's source in
DOCCORE.CPP. Listing Three shows the pseudocode for CMirrorFile::Open().
CMirrorFile::Open() breaks into two logical blocks. In the first, Open()
checks for the modeCreate flag, indicating that the caller wants to create a
new file or truncate an existing file. Next, Open() calls CFile::GetStatus().
GetStatus() returns nonzero if the file exists (which implies that the user
wants to truncate it). If the file exists, Open() then calls
GetDiskFreeSpace() and determines how many bytes are available on the drive.
Open() then compares this result with twice the size of the existing file.
If there is enough room for twice the size of the current file, Open()
creates a temporary file and stores its name in m_strMirrorName.
The second block of code in Open() only runs if m_strMirrorName is not empty,
which implies that the contents of the first block ran and obtained a
temporary file. In the second block, if there is a non-empty m_strMirrorName,
Open() goes ahead and opens the mirror file using CFile::Open(). Next, Open()
copies the file time and file security from the original file to the mirror
file. If the second block has executed, Open() returns TRUE. If the second
block does not execute, Open() falls through to CFile::Open(), returning
whatever it returns.
To summarize, CMirrorFile::Open() actually opens a different file from the
one specified when a write operation uses CFile::modeCreate and an existing
file is being overwritten.
For example, in Scribble (which uses document/view), if you wrote a fresh
scribble and saved it in UNDOCMFC.SCR, the code would not execute. However, if
you load the UNDOCMFC.SCR file, make some changes, and then save, this code
will execute and Scribble will actually write to a mirror file.
Turning our attention to closing the file, Listing Four contains the
pseudocode for CMirrorFile::Close() from DOCCORE.CPP.
First, Close() stores the name of the file in m_strName (a deceptively named
variable) because the CFile::Close() call clears out m_strFileName.
After calling CFile::Close(), CMirrorFile::Close() checks to see if there is a
mirror file in use. If so, Close() deletes the specified file and copies the
mirror file over to the specified file.
In a nutshell, CMirrorFile is protecting your document by saving the original
and writing to a temporary file. If there is a problem writing, the original
file is safe and sound. If there isn't a problem, CMirrorFile copies the new
file over the original and you are none the wiser (until now). CMirrorFile even
makes sure that the original security and file-creation information is copied
over correctly.
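The pattern CMirrorFile implements is worth having outside MFC, too. A minimal sketch in portable C++, using rename() in place of CMirrorFile's copy step and an arbitrary ".mirror" suffix:

```cpp
#include <cstdio>
#include <string>

// Safe-save pattern: write the new contents to a temporary "mirror" file,
// and only replace the original once the write has fully succeeded. If
// anything fails mid-write, the original file is left untouched.
bool safe_save(const std::string& path, const std::string& contents)
{
    std::string mirror = path + ".mirror";
    std::FILE* f = std::fopen(mirror.c_str(), "wb");
    if (!f)
        return false;
    std::size_t written = std::fwrite(contents.data(), 1, contents.size(), f);
    bool ok = (written == contents.size()) && (std::fclose(f) == 0);
    if (!ok) {                          // failed write: discard the mirror
        std::remove(mirror.c_str());
        return false;
    }
    std::remove(path.c_str());          // replace original with the mirror
    return std::rename(mirror.c_str(), path.c_str()) == 0;
}
```

CMirrorFile additionally copies the original's file time and security information onto the mirror, which this sketch omits.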
Some of you may not like MFC doing something like this behind your back.
Additionally, there are the cycles lost checking the disk space, copying the
security/file-time information, and copying the mirror file over the original
file. If you don't like the fact that CDocument uses CMirrorFile, you can
override GetFile() to return a good-old-fashioned vanilla CFile.
CMirrorFile really is the only undocumented CDocument behavior of interest.
Next, we examine the undocumented CView derivative CPreviewView and how it
implements MFC print previewing.


Undocumented Print Preview 


MFC's print-preview support is based on the document/view architecture. The
key advantage to MFC document/view is that the normal display-drawing code in
your view is transparently reused to provide basic print-previewing
functionality. The default MFC print preview supports one- and two-page
display and three-level zooming. Figure 1 shows the MFC print-preview engine
in a zoomed two-page state. Notice that the application's normal menu and
view have been replaced by the print-preview view and its toolbar.
We first discovered CPreviewView while tracing the MFC print-preview logic,
which starts in CView::DoPrintPreview(). Listing Five contains the
CView::DoPrintPreview() pseudocode so you can see how CPreviewView is created,
initialized, and used.
DoPrintPreview() first stores a pointer to the main frame window in pParent.
It then creates a local CCreateContext structure and populates its fields.
Next, DoPrintPreview() creates a CPreviewView instance with dynamic creation
using the CRuntimeClass pointer argument. DoPrintPreview() then calls
CFrameWnd::OnSetPreviewMode() and creates a toolbar based on the toolbar
resource argument.
After creating and storing the toolbar in the CPreviewView pointer,
DoPrintPreview() creates the view and makes it the active view. When setting
the CPreviewView to the active view, DoPrintPreview() saves the previously
active view in the CPrintPreviewState structure. Once the CPreviewView and
toolbar are created and activated, DoPrintPreview() updates them and returns
TRUE.
We've discovered a good deal about print preview. DoPrintPreview() sets the
active view in the main frame to switch from the regular MDI or SDI window to
the print-preview view. You could use this technique in your MFC applications
if you ever wanted to have a "full window mode" or wanted to produce a similar
effect. To learn more about how it works, check out
CFrameWnd::OnSetPreviewMode().
While it's pretty clear what DoPrintPreview() is doing, it is still not clear
how print preview actually works. To understand MFC print preview, you have to
get to know CPreviewView.



CPreviewView Revealed


Inside MFC's private header file, AFXPRIV.H, we found the declaration for
CPreviewView, which reveals a great deal about the class. Listing Six contains
the abbreviated declaration.
Notice that CPreviewView is a CScrollView derivative. We haven't covered this
class, but as the name implies, CScrollView provides scrolling support. When
you zoom in on a print preview, you'll notice that it displays scroll bars;
that's the CScrollView inheritance at work.
CPreviewView is brimming with interesting behavior. Table 1 describes the uses
for key data members. SetPrintView(), called in CView::DoPrintPreview() (see
Listing Five), is one important member for CPreviewView. It takes care of
initializing the print-preview view and preparing it to start the
print-previewing process. Keep that in mind as you review the pseudocode for
CPreviewView::SetPrintView(), as found in VIEWPREV.CPP (see Listing Seven). 
SetPrintView() first sets the value of m_pPrintView to the document's view.
The document's view actually does the rendering in OnDraw(). CPreviewView
keeps a copy of the document's live view because CPreviewView uses the view's
OnDraw() function to render the data on the preview screen.
Next, SetPrintView() creates a new CPrintInfo object. It's not really obvious
from the pseudocode, but the CPrintInfo constructor creates a CPrintDialog (a
Windows printing common dialog). After creating the CPrintInfo object,
SetPrintView() sets some flags. The most important flag tells the CPrintDialog
not to return a DC. You'll see why in a second.
After creating the CPrintInfo object, SetPrintView() creates a CPreviewDC
object. CPreviewDC is yet another undocumented MFC class. MFC's CDC class
contains two handles to device contexts: m_hDC and m_hAttribDC. The m_hDC
member represents the device context for the output and the m_hAttribDC is
used for information purposes. Usually these device contexts represent the
same thing, the screen or the printer.
The CPreviewDC object maintains two different device contexts--one for the
screen (m_hDC) and one for the printer (m_hAttribDC). CPreviewDC is
implemented this way because print preview has to draw something as though it
would appear on the printer, but must be able to translate so it appears
correctly on the screen. SetPrintView() sets up the CPreviewDC accordingly
(setting the m_hDC to the view's hDC and setting m_hAttribDC to the printer's
hDC). To learn more about how CPreviewDC handles displaying what the printer
will display onscreen, check out the MFC header AFXPRIV.H and the source file
DCPREV.CPP.
After initializing the DCs to their proper print-preview state, SetPrintView()
initializes the m_sizePrinterPPI, m_nPages, and m_nZoomOutPages data members.
Then, SetPrintView() initializes the scroll bars in the scroll view and sets
the current page.
At this point, we still haven't seen where the actual drawing of the print
preview happens. If you look back at Listing Five
(CView::DoPrintPreview()), you'll notice that this routine "forces" an update
of the frame window that contains the freshly created CPreviewView. Like any
other MFC view, this will cause CPreviewView's OnDraw() member function to be
called.
First, OnDraw() creates two pens. rectPen is for the box that is drawn to
represent the page. shadowPen is used to draw a shadow around the page and
give it a fancy three-dimensional effect.
Next, OnDraw() enters a For loop for each page, taking the following steps:
1. It draws the page outline and shadow using the paint DC that is passed as
an argument to OnDraw(). It draws the rectangle with a series of
MoveTo()/LineTo() calls. Then the For loop calls FillRect() to draw in the
white color of the pages.
2. It displays the page number by calling OnDisplayPageNumber().
3. Once the fresh piece of paper has been drawn, OnDraw() calls
m_pPrintView->OnPrint() with the CPreviewDC, thus generating the print-preview
output. OnPrint() calls your view's OnDraw() code with a CPreviewDC that
reflects the printed output on the display as a print preview.
After the For loop has iterated through all of the pages to be previewed,
OnDraw() frees the pens it created.


Recap


Now that you know how CPreviewView and CPreviewDC work, the big question is
how do you apply this knowledge? In our next installment, we'll show you how
to customize the MFC print preview and we'll also introduce a couple of new
undocumented MFC classes.
Figure 1: MFC print-preview engine in a zoomed two-page state.
Table 1: CPreviewView data members.
Member Description
m_pOrigView Pointer to the previously active view.
m_pPrintView Pointer to the view for "printing."
m_pPreviewDC Pointer to a CPreviewDC.
m_dcPrint The actual printer DC (a CDC object, not a pointer).
PAGE_INFO Structure that defines the dimensions of one 
 of the preview pages.
m_pPageInfo Pointer to an array of PAGE_INFO structures.
m_pageInfoArray Embedded array of two PAGE_INFO structures 
 (hard-coded to 2).
m_bPageNumDisplayed Flag that indicates if the page number has been 
 displayed.
m_nZoomOutPages Stores the number of pages to be displayed on one 
 print-preview view. 
m_nZoomState Can be one of four zoom states (set via 
 SetZoomState()).
 ZOOM_OUT Default, shows a smaller than 1:1 ratio. 
 ZOOM_MIDDLE Zoom level between out and in. 
 ZOOM_IN Shows a larger than 1:1 ratio. 
 ZOOM_OFF No zooming.
m_nMaxPages Maximum number of pages, used for sanity checks 
 (always 2 for CPreviewView).
m_nCurrentPage Current page being displayed.
m_nPages Number of pages. Can be either one or two in 
 the default CPreviewView.
m_nSecondPageOffset Specifies the left coordinate of where the 
 second preview page is drawn.
m_hMagnifyCursor Handle to the magnifying glass cursor.
m_sizePrinterPPI Stores the printer's pixels per inch; retrieved 
 by calling GetDeviceCaps() with LOGPIXELSX and 
 LOGPIXELSY.
m_ptCenterPoint Used to center the print preview pages on the 
 page.
m_pPreviewInfo Pointer to a CPrintInfo structure that is used 
 to store printing information.

Listing One
CFile* CDocument::GetFile(LPCTSTR lpszFileName, UINT nOpenFlags, 
 CFileException* pError)
{
 CMirrorFile* pFile = new CMirrorFile;
 if (!pFile->Open(lpszFileName, nOpenFlags, pError)) {
 delete pFile;
 pFile = NULL;
 }
 return pFile;
}

Listing Two
class CMirrorFile : public CFile
{
// Implementation
public:
 virtual void Abort();
 virtual void Close();
 virtual BOOL Open(LPCTSTR lpszFileName, UINT nOpenFlags,
 CFileException* pError = NULL);
protected:
 CString m_strMirrorName;
};

Listing Three
BOOL CMirrorFile::Open(LPCTSTR lpszFileName, UINT nOpenFlags, 
 CFileException* pError)
{
 m_strMirrorName.Empty();
 CFileStatus status;
 if (nOpenFlags & CFile::modeCreate) {
 if (CFile::GetStatus(lpszFileName, status)){
 CString strRoot;
 AfxGetRoot(lpszFileName, strRoot);
 DWORD dwSecPerClus, dwBytesPerSec, dwFreeClus, dwTotalClus;
 int nBytes = 0;
 if (GetDiskFreeSpace(strRoot, &dwSecPerClus, &dwBytesPerSec, 
 &dwFreeClus, &dwTotalClus)){
 nBytes = dwFreeClus*dwSecPerClus*dwBytesPerSec;
 if (nBytes > 2*status.m_size){
 // get the directory for the file
 TCHAR szPath[_MAX_PATH];
 LPTSTR lpszName;
 GetFullPathName(lpszFileName,_MAX_PATH, szPath, &lpszName);
 *lpszName = NULL;
 GetTempFileName(szPath, _T("MFC"), 0,
 m_strMirrorName.GetBuffer(_MAX_PATH+1)); 
 m_strMirrorName.ReleaseBuffer();
 }
 }
 }
 if (!m_strMirrorName.IsEmpty() &&
 CFile::Open(m_strMirrorName, nOpenFlags, pError)){ 
 m_strFileName = lpszFileName;
 FILETIME ftCreate, ftAccess, ftModify; 
 if (::GetFileTime((HANDLE)m_hFile, &ftCreate, &ftAccess, &ftModify)){
 AfxTimeToFileTime(status.m_ctime, &ftCreate);
 SetFileTime((HANDLE)m_hFile,&ftCreate,&ftAccess, &ftModify);
 }

 DWORD dwLength = 0;
 PSECURITY_DESCRIPTOR pSecurityDescriptor = NULL;
 GetFileSecurity(lpszFileName, DACL_SECURITY_INFORMATION,NULL, 
 dwLength, &dwLength);
 pSecurityDescriptor = (PSECURITY_DESCRIPTOR) new BYTE[dwLength];
 if (::GetFileSecurity(lpszFileName, DACL_SECURITY_INFORMATION,
 pSecurityDescriptor, dwLength, &dwLength)){
 SetFileSecurity(m_strMirrorName, DACL_SECURITY_INFORMATION, 
 pSecurityDescriptor); 
 }
 delete[] (BYTE*)pSecurityDescriptor;
 return TRUE;
 }
 m_strMirrorName.Empty();
 }
 return CFile::Open(lpszFileName, nOpenFlags, pError);
}

Listing Four
void CMirrorFile::Close()
{
 CString m_strName = m_strFileName;
 CFile::Close();
 if (!m_strMirrorName.IsEmpty()) {
 CFile::Remove(m_strName);
 CFile::Rename(m_strMirrorName, m_strName);
 }
}
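The mirror-file technique in Listings Three and Four boils down to a general safe-save pattern: write the new contents to a temporary file in the same directory, and only on a successful close remove the original and rename the temporary into its place. Here is a minimal, MFC-free sketch of that idea using std::filesystem instead of CFile; SafeSave() and its behavior are illustrative, not part of MFC.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Write `contents` to a mirror file next to `target`, then swap it into
// place. If the write fails, the original file is left untouched.
bool SafeSave(const fs::path& target, const std::string& contents) {
    fs::path mirror = target;
    mirror += ".mirror";                       // temp file in the same directory
    {
        std::ofstream out(mirror, std::ios::binary | std::ios::trunc);
        if (!out || !(out << contents))
            return false;
    }                                          // the "Close()" happens here
    fs::remove(target);                        // CFile::Remove(m_strName)
    fs::rename(mirror, target);                // CFile::Rename(mirror, name)
    return true;
}
```

Keeping the mirror in the same directory matters: a rename within one volume is cheap and atomic in practice, which is exactly why CMirrorFile checks free disk space and builds its temp name from the target's own path.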

Listing Five
BOOL CView::DoPrintPreview(UINT nIDResource, CView* pPrintView, CRuntimeClass*
 pPreviewViewClass, CPrintPreviewState* pState)
{
 CFrameWnd* pParent = (CFrameWnd*)AfxGetThread()->m_pMainWnd;
 CCreateContext context;
 context.m_pCurrentFrame = pParent;
 context.m_pCurrentDoc = GetDocument();
 context.m_pLastView = this;
 // Create the preview view object
 CPreviewView* pView = (CPreviewView*)pPreviewViewClass->CreateObject();
 pView->m_pPreviewState = pState; // save pointer
 pParent->OnSetPreviewMode(TRUE, pState); // Take over Frame Window
 // Create the toolbar from the dialog resource
 pView->m_pToolBar = new CDialogBar;
 if (!pView->m_pToolBar->Create(pParent, MAKEINTRESOURCE(nIDResource),
 CBRS_TOP, AFX_IDW_PREVIEW_BAR)){
 TRACE0("Error: Preview could not create toolbar dialog.\n"); 
 return FALSE;
 }
 pView->m_pToolBar->m_bAutoDelete = TRUE; // automatic cleanup
 if (!pView->Create(NULL, NULL, AFX_WS_DEFAULT_VIEW,
 CRect(0,0,0,0), pParent, AFX_IDW_PANE_FIRST, &context)) { 
 TRACE0("Error: couldn't create preview view for frame.\n"); 
 return FALSE;
 }
 pState->pViewActiveOld = pParent->GetActiveView();
 CView* pActiveView = pParent->GetActiveFrame()->GetActiveView();
 pActiveView->OnActivateView(FALSE, pActiveView, pActiveView);
 pView->SetPrintView(pPrintView);
 pParent->SetActiveView(pView); // set active view - even for MDI

 // update toolbar and redraw everything 
 pView->m_pToolBar->SendMessage(WM_IDLEUPDATECMDUI, (WPARAM)TRUE); 
 pParent->RecalcLayout(); // position and size everything 
 pParent->UpdateWindow();
 return TRUE;
}

Listing Six
class CPreviewView : public CScrollView
{
 DECLARE_DYNCREATE(CPreviewView)
// Constructors
public:
 CPreviewView();
 BOOL SetPrintView(CView* pPrintView);
// Attributes
protected:
 CView* m_pOrigView;
 CView* m_pPrintView;
 CPreviewDC* m_pPreviewDC; // Output and attrib DCs Set, not created
 CDC m_dcPrint; // Actual printer DC
// Operations *omitted
// Overridables *omitted
// Implementation * some omitted
public:
 virtual void OnPrepareDC(CDC* pDC, CPrintInfo* pInfo = NULL);
protected:
 afx_msg void OnPreviewClose(); 
 afx_msg int OnCreate(LPCREATESTRUCT lpCreateStruct); 
 afx_msg void OnSize(UINT nType, int cx, int cy); 
 afx_msg void OnDraw(CDC* pDC); 
 afx_msg void OnLButtonDown(UINT nFlags, CPoint point); 
 afx_msg BOOL OnEraseBkgnd(CDC* pDC); 
 afx_msg void OnNextPage(); 
 afx_msg void OnPrevPage(); 
 afx_msg void OnPreviewPrint(); 
 afx_msg void OnZoomIn(); 
 afx_msg void OnZoomOut();
 void DoZoom(UINT nPage, CPoint point);
 void SetScaledSize(UINT nPage);
 CSize CalcPageDisplaySize();
 CPrintPreviewState* m_pPreviewState; // State to restore
 CDialogBar* m_pToolBar; // Toolbar for preview
 struct PAGE_INFO {
 CRect rectScreen; // screen rect (screen device units)
 CSize sizeUnscaled; // unscaled screen rect (screen device units)
 CSize sizeScaleRatio; // scale ratio (cx/cy)
 CSize sizeZoomOutRatio; // scale ratio when zoomed out (cx/cy)
 };
 PAGE_INFO* m_pPageInfo; // Array of page info structures
 PAGE_INFO m_pageInfoArray[2]; //Embedded array for default implementation
 BOOL m_bPageNumDisplayed; // Flags whether or not page number has yet
 // been displayed on status line 
 UINT m_nZoomOutPages; // number of pages when zoomed out 
 UINT m_nZoomState;
 UINT m_nMaxPages; // for sanity checks
 UINT m_nCurrentPage;
 UINT m_nPages;
 int m_nSecondPageOffset; // used to shift second page position

 HCURSOR m_hMagnifyCursor;
 CSize m_sizePrinterPPI; // printer pixels per inch
 CPoint m_ptCenterPoint;
 CPrintInfo* m_pPreviewInfo;
 DECLARE_MESSAGE_MAP()
};

Listing Seven
BOOL CPreviewView::SetPrintView(CView* pPrintView)
{
 m_pPrintView = pPrintView;
 m_pPreviewInfo = new CPrintInfo;
 m_pPreviewInfo->m_pPD->SetHelpID(AFX_IDD_PRINTSETUP);
 m_pPreviewInfo->m_pPD->m_pd.Flags = PD_PRINTSETUP;
 m_pPreviewInfo->m_pPD->m_pd.Flags &= ~PD_RETURNDC;
 m_pPreviewInfo->m_bPreview = TRUE; // signal that this is preview
 m_pPreviewDC = new CPreviewDC; // must be created before any
 if (!m_pPrintView->OnPreparePrinting(m_pPreviewInfo))
 return FALSE;
 m_dcPrint.Attach(m_pPreviewInfo->m_pPD->m_pd.hDC);
 m_pPreviewDC->SetAttribDC(m_pPreviewInfo->m_pPD->m_pd.hDC);
 m_pPreviewDC->m_bPrinting = TRUE;
 m_dcPrint.m_bPrinting = TRUE;
 m_dcPrint.SaveDC(); // Save pristine state of DC
 HDC hDC = ::GetDC(m_hWnd);
 m_pPreviewDC->SetOutputDC(hDC);
 m_pPrintView->OnBeginPrinting(m_pPreviewDC, m_pPreviewInfo); 
 m_pPreviewDC->ReleaseOutputDC();
 ::ReleaseDC(m_hWnd, hDC);
 m_dcPrint.RestoreDC(-1); // restore to untouched state
 // Get Pixels per inch from Printer 
 m_sizePrinterPPI.cx = m_dcPrint.GetDeviceCaps(LOGPIXELSX); 
 m_sizePrinterPPI.cy = m_dcPrint.GetDeviceCaps(LOGPIXELSY);
 m_nPages = m_pPreviewInfo->m_nNumPreviewPages; 
 m_nZoomOutPages = m_nPages;
 SetScrollSizes(MM_TEXT, CSize(1, 1)); // initialize mapping mode only
 if (m_pPreviewInfo->GetMaxPage() < 0x8000 &&
 m_pPreviewInfo->GetMaxPage() - m_pPreviewInfo->GetMinPage() <= 32767U)
 SetScrollRange(SB_VERT, m_pPreviewInfo->GetMinPage(),
 m_pPreviewInfo->GetMaxPage(), FALSE);
 else
 ShowScrollBar(SB_VERT, FALSE);
 SetCurrentPage(m_pPreviewInfo->m_nCurPage, TRUE);
 return TRUE;
}



















PROGRAMMER'S BOOKSHELF


Getting Hooked on Java




John H. McCoy


John is a member of the computer science staff at Sam Houston State University
in Huntsville, Texas. He can be contacted at csc_jhm@shsu.edu.


Last year's book-publishing love affair with the Internet has already morphed
into this year's doting on Java, the programming environment from Sun
Microsystems. Although there are numerous Java-related books in the works,
Hooked on Java, by Java development team members Arthur van Hoff, Sami Shaio,
and Orca Starbuck, is one of the first. For $29.95 you get the book and a
CD-ROM containing documentation, sample applets, and development tools in Sun
Solaris 2.x, Windows 95, and Windows NT formats. Both a Java compiler and an
applet viewer are included, and neither an Internet connection nor a browser
is needed to explore the language or the sample applets on the CD. You do,
however, need Netscape Navigator 2.0 to run the Java-powered Web pages. 
The book, which consists of six chapters and numerous appendices, is intended
for anyone involved with the creation of Web pages and/or interested in
learning the basics of Java applets. Chapters 3 and 4 constitute a "cookbook"
for integrating the applets into Web pages. All the applets covered are
included on the CD along with an example Web page using the applet. You should
be familiar with HTML, but progression beyond the HTML bunny slopes will be
needed to creatively apply much of the information. Chapters 5 and 6 cover
basic Java syntax and building a Java applet from scratch. Here, the primitive
state of Java and the lack of development tools become apparent. The authors'
statement in chapter 5 (repeated in chapter 6) that "You should know a little
about programming to get the most out of this chapter" is an understatement to
say the least.
In the cookbook section, the authors thoroughly describe the applets and how
to incorporate them into Web pages. The reading is easy and the discussion on
the Applet tag in chapter 3 is excellent. After a few examples, however,
you'll likely want to skip to the CD and run the applets from the menu and use
this part of the text for reference. In looking over the Applet tag examples,
I did notice that quotes are used inconsistently. In one instance, the code
filename is quoted, in the next it isn't. Knowing that quotes are only
required if a string contains a blank is helpful when you are getting started.
Rather than cutting and pasting from the examples, I wrote my own entry from
the Applet Tag Definition in Appendix G. Nothing in the definition indicated
to me that "=" signs are needed, but they are and the examples have them. DOS
readers would also benefit from a discussion about the use of "\" versus "/"
in paths.
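To make those rules concrete, here is a minimal applet tag of the kind the cookbook chapters describe (the class name and attribute values are illustrative, not taken from the book):

```html
<applet code="Clock.class" width=120 height=120>
<param name=bgcolor value="light blue">
</applet>
```

Every attribute needs its "=" sign, and quotes are optional except where a value, such as light blue, contains a blank.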


Java Overview


The overview of the Java language in chapter 5 is good, but it is essentially
C code. Sticking to the green slopes of my C experience, I found it easy
going. Programmers who aren't C coders may need a reference, but the level is
about right. The
assertion that data types are platform independent left me wondering how I
missed the signing of the peace treaty in the Big Endian/Little Endian holy
war. There is an unfortunate typo in the discussion of control-flow
statements. It probably won't be a problem because most programmers will
probably just look at the syntax and skip the verbiage. The coverage is
adequate for its intended use and the more adventurous can download the Java
Language reference from the Internet or print it from the JAVASPEC.PS file in
the Psfiles.zip archive.
Chapter 6 is the programming guide for creating your own applets. Sample code
(which you can cut and paste into your own applet) is included on the CD and
is discussed in this chapter. Documentation for the Java Developers Kit has to
be downloaded from the Internet. The Java programming page on the CD is blank,
so the hardcopy chapter and appendices are all there is. DOS users will be
frustrated by some of the compiler's idiosyncrasies. For example, the compiler
is case sensitive in file names entered on the command line. Attempting to
compile an applet with "javac uiapplet.java" instead of "javac UIApplet.java"
will cause the compiler to bomb with the message "Warning! Public class
UIApplet must be defined in a file 'UIApplet.java'." This is definitely not
what a DOS-oriented person would expect, and some warning is needed.


Example Applets


The applets on the CD are obviously intended to be a dazzling show of flashy
animation to get you "hooked on Java." Once you get it set up, the applets
perform pretty much as advertised. Some are incomplete and the animation is
slow. Most are merely frustrating--the ballistic simulation is so slow that it
is effectively unusable on my DX2-66. The tortuously slow antics of the "Duke"
mascot make him appear more as the personification of an abscessed tooth than
the jovial mascot he's supposed to be. It is at least implied in the book that
the authors in their positions at Sun are responsible for the current
incarnation of Java. Thus, while it is not totally correct to say that the
performance problem is beyond their control, I think it is fair to say that
applets are likely to be more impressive and addictive on a Sun workstation
than on a typical PC.
The Readme.txt file in the Win95 directory of the CD correctly states that you
can't run the Windows 95 Java Development Kit from the CD. It also states that
you can run the Web pages from the CD. However, the \Win95\index.html file
does not exist and the subdirectories in the Solaris directory have been
compressed into zip files in the Win95 directory. To run anything you must
first install the files on your hard disk using an unzip program that can
handle long file names. I first unzipped it using OS/2 and copied the files to
Win95 over my peer network. Since then, I have acquired WinZip 6 and
reinstalled everything. It works fine.
During the unzip of docs.zip you will encounter a duplicate file name. If you
open the docs archive in the WinZip window you will see java.awt.image.html
and java.awt.Image.html. If you have Navigator 2.0 installed you can view any
of the HTML pages by dragging it and dropping it on the Netscape icon. I
should say you can view any of them except java.awt.Image.html. Since the
Windows 95 file system is not case sensitive it always finds the first one.
java.awt.image.html documents the package java.awt.image, while
java.awt.Image.html documents the class java.awt.Image. No matter what you do
it will be wrong. I used a different name for the first file. It ends up as an
orphan, but it is accessible if you ever need it and can remember what you
named it. Once installed, all the applets can be run in Navigator 2.0 by
loading one of the menu files /hooked/book/building.html or
/hooked/book/cool-applets.html and selecting the desired applet.
Only two applets had a problem I couldn't correct. UIApplet 2 and UIApplet 3
both use a setBackground() and a setForeground() method to set colors. I could
set any color I wished and recompile the applet, but it always displayed black
on gray. This was true whether I used the Applet viewer or Netscape.
There also is a problem with the align attribute in the Applet tag. This shows
up in both the clock and the image loop Web pages. If the align appears in the
Web page when it is loaded, the applet will not run. You won't see the clock
ticking or "Duke" waving at you. If you leave the align attribute out, both
applets will run, but they display in the wrong place on the page and the text
does not wrap around the applet. Surprisingly, if you edit the page while it
is active and add the align-right attribute, then click reload, the applet
works as it should. I tried loading the pages both from local files and from
an OS/2 http server, and got the same results. The Netscape browser I am using
is a beta version and the problem may be resolved in the final release.
The documentation for the Java classes and packages is included on the CD in
hypertext form. A search tool is mentioned, but I could never locate it.
Consequently, finding a method is hit or miss if you don't know what class or
package it is in. I tried to look up the setForeground() and setBackground()
methods and essentially had to scan all of the class methods. When you do
locate it, you wonder why you bothered because it is typical UNIX man-page
jargon. If you prefer hard copy, you can print it from the Postscript files.


Conclusion


Whether you think Java is a curse or a blessing is irrelevant. Java is a fact
to be dealt with. If you need to get acquainted with it, I recommend Hooked On
Java as a good starting point, particularly if you don't have an Internet
connection. Like most things associated with Java, Hooked On Java is not quite
finished, but I thought it was a good buy when I bought it and I still do.
Hooked On Java
Arthur van Hoff, Sami Shaio, and Orca Starbuck
Addison-Wesley Publishing, 1996
181 pp., $29.95
ISBN 0-201-48837-X











































































SWAINE'S FLAMES


Cast of Characters


I've had a run of letters recently along the lines of "Who are you to
criticize Steve Jobs and Bill Gates?" and "Where/what the heck is this 'Swaine
Manor' you keep talking about?" and "Do you get paid for writing this stuff?"
(All actual quotes or close paraphrases.) I think maybe it's time to review
the basic plotlines and sets, and run down the cast of characters for those
who tuned in this soap opera in midseason and are a little disoriented.
Mike Swaine: Me. The protagonist of our ongoing drama, a humble scrivener by
trade, a high-tech low-life by rep, and a Webbish nebbish from way back.
Cousin Corbett: My frequently visiting relative, a typical
entrepreneur/inventor.
Billg: William H. Gates III, thirtysomething boy billionaire, big cheese, and
prime rennet of Microsoft Inc. Here's Bill: "IBM is a big company." "The
Internet is a great phenomena."
Zelda: The product-testing Lab. She's pretty hard on UPS drivers, too.
Apple: The patience-testing company.
Basic: She's only in the show because she's Aaron Spelling's daughter.
Dylan: The recently orphaned sexy loner.
C: The most popular girl in school, and it's not because she's pretty. 
Swaine's World: http://www.cruzio.com/~mswaine/. Party time. Excellent. 
Foo Bar: A secret Silicon Valley watering hole much frequented by high-tech
CEOs and gentlefolk of the fourth estate, where I moonlight as a relief
bartender.
Stately Swaine Manor: The mountain fastness where I weave this screed. (Also a
nod to Byte magazine columnist Jerry Pournelle, who refers to his home as
"Chaos Manor.")
I hope this helps those of you having a hard time telling the actors from the
props. If you hate to have the jokes explained, skip the preceding section.
In a truly chaotic manner, the dog-shy UPS driver recently flung another load
of books from the truck, including one by an author who has written much on
the subject of chaos, Clifford Pickover. The latest Pickover offering is Black
Holes: A Traveler's Guide (John Wiley & Sons, 1996). I don't have the physics
background to appreciate this fully, but it's surprising how far Pickover can
take a non-physicist with his engaging way of motivating an equation. Yes,
there are equations in the book; in fact, each chapter revolves around one
important equation regarding black holes (there also are numerous programs, in
C and Basic, that demonstrate key concepts). For example, the chapter on
gravitational time dilation presents the equation for what happens to time in
the vicinity of a black hole, gives some sample data, offers some implications
for science-fiction writers trying to be realistic, and frames it all in
Pickover's own science fiction story. It all works, and it's all just
sufficiently odd. Pickover conveys the weirdness of black holes better than
anybody else who's tried.
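For the record, the gravitational time-dilation equation that chapter revolves around is presumably the standard Schwarzschild result:

```latex
t_{\mathrm{far}} = \frac{t_{\mathrm{near}}}{\sqrt{1 - \frac{2GM}{rc^{2}}}}
```

where M is the hole's mass, r the distance from its center, G the gravitational constant, and c the speed of light; as r approaches the Schwarzschild radius 2GM/c^2, clocks near the hole appear, to a distant observer, to grind to a halt.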
I want to end the column this month with a challenge. Because I'm curious as
to just how historically savvy today's developer is, I've put together the
following quiz. The task is simply to arrange these 15 more or less famous
computers in their proper chronological order, based on the date each was
first delivered. In the case of one early computer that may never actually
have been built, the date it was first clearly described will do. But you
don't have to get the dates, just the order. I've numbered the 15 computers in
hexadecimal; a solution consists of the 15 nonzero hex digits in some order,
such as 1984BA72EFD365C. Send your solution to mswaine@cruzio.com. As is our
tradition, no prizes will be awarded and your immortal soul becomes the
property of Miller Freeman Publishing. But you may get your name in print. 
1 Atanasoff-Berry Computer 6 Data General Nova B IBM PC/AT
2 CDC 6600 7 DEC VAX 11/780 C MITS Altair 8800
3 Colossus Mark I 8 DEC PDP-1 D Sperry Rand UNIVAC 1103
4 Commodore VIC-20 9 ENIAC E Xerox Star 8010
5 Cray-1 A IBM 360/50 F Zuse Z1
Michael Swaine, editor-at-large, mswaine@cruzio.com


































OF INTEREST
Azalea Software has begun shipping Carrick 1.0, a Windows-based encryption
tool. Carrick works either as a stand-alone app or from within any Windows
application that can access a DLL. Carrick is based on the Blowfish algorithm
that was designed by cryptographer Bruce Schneier, author of Applied
Cryptography and columnist for Dr. Dobb's Journal. Blowfish is several orders
of magnitude stronger than DES, the current standard private-key encryption
algorithm. 
The Carrick DLL can be called from within C/C++, Visual Basic, and other
development tools. Carrick doubles as an SDK, which includes four API calls, a
standardized file header, a default file extension (.CAR), and a complete
cryptographic protocol that ensures file security. Carrick allows optional
sender authentication, date/time stamping, and other features. There are hooks
for you to add functionality within the Carrick framework. 
Carrick is priced at $159.00 for single users and $199.00 for a two-copy
bundle. The DLL is also available for licensing. Carrick currently supports
Macintosh, DOS, UNIX, and VAX/VMS. 
A working demo, which encrypts and decrypts files using a limited set of
passwords (001-999), can be downloaded from Azalea's home page. 
Azalea Software Inc.
P.O. Box 16745
Seattle, WA 98116-0745
206-932-0234
http://www.encryption.com 
Willows Software has announced availability of its TWIN cross-platform
development software and services. The APIW-based TWIN Technology for UNIX is
available in both source and binary forms directly from Willows and on the
Internet. The company also announced a four-tier Professional Services program
(Basic Subscription Service, Standard Support, Premium Support, and Strategic
Consulting) designed to assist as you use TWIN XPDK 1.5 for UNIX to build and
deploy applications based on the technology. Professional Services programs
are purchased on a yearly basis, and include a basic subscription to Willows
Software updates on a quarterly CD-ROM. 
The Willows TWIN XPDK for UNIX allows you to build Windows applications that
will run on most UNIX variants. TWIN XPDK is a source-porting kit that allows
you to migrate applications to SCO UNIX and UnixWare, SunOS and Solaris, SGI
UNIX, IBM AIX, HP HP-UX, MIPS ABI, DEC Alpha OSF/1, Linux, and Macintosh. 
TWIN is the first product based on Application Programming Interface for
Windows (APIW), a European standard adopted on December 15, 1995 by the
European Computer Manufacturers Association (ECMA). The APIW is waiting for
approval by the International Standards Organization (ISO) to become an
international standard for cross-platform development.
The Willows TWIN XPDK 1.5 for UNIX is available from Willows Software and on the
Internet. Willows' Professional Services are priced at U.S. $250.00 for a
basic subscription, $1000.00 for standard, and $5000.00 for premium support.
Pricing for strategic consulting services is negotiable. The software also is
available in source and binary form on a single CD-ROM for $79.00. Individual,
noncommercial users who wish only to download the source and binary versions
of the technology may do so at no charge from the Willows Web site. The
noncommercial license does not allow distribution for commercial purposes. 
Willows Software
12950 Saratoga Avenue, Suite A
Saratoga, CA 95070-4670
408-777-1820
http://www.willows.com
Borland International has announced it has licensed Rogue Wave's
implementation of the Standard C++ Library. Borland C++ 5.0, which is
scheduled to begin shipping early in 1996, will integrate Rogue Wave's
Standard C++ Library 1.0, which includes the Standard Template Library (STL),
string, and other classes.
The Standard C++ Library 1.0 is based on the CD registration version of the
library. This version began public review in July 1995 and precedes final
approval. It includes the latest changes and revisions approved by the joint
ANSI/ISO Standard C++ Library Committee. 
Borland International
100 Borland Way
Scotts Valley, CA 95066
408-431-1000
http://www.borland.com
Version 8 of Metrowerks' CodeWarrior development environment for the Macintosh
supports Object Pascal, C, and C++, and can generate executables for Macintosh
(PowerPC and 680x0), Windows (95 and x86 NT), Magic Cap, and Be OS. The
$399.00 CodeWarrior Gold package includes a year's worth of free updates.
CodeWarrior is also available in Bronze ($149.00, only compiles for 680x0
Macintosh) and Academic ($99.00, identical to Gold) versions.
Metrowerks also has released its $79.00 "Discover Programming for Macintosh"
package, which includes CodeWarrior Bronze (but without the free updates,
MacApp, OpenDoc, or the MPW Shell) and online versions of three popular
Macintosh programming books.
The $299.00 CodeManager provides Macintosh programmers with a source-code
control system completely compatible with Microsoft's Visual SourceSafe.
Metrowerks Inc.
3925 West Braker Lane, Suite 310
Austin, TX 78759-5321
800-377-5416
http://www.metrowerks.com
Netscape Communications has announced that a number of companies have created
software components that extend the capabilities of the Netscape Navigator 2.0
platform using the Netscape Navigator Plug-In API. The Plug-Ins are available
from the Netscape Plug-In Clearinghouse on the Netscape Web site. 
The Netscape Navigator Plug-In API lets you extend the functionality of
Netscape Navigator 2.0 with native support for new data types and additional
features. Plug-Ins appear as additional capabilities of Netscape Navigator and
can add multimedia capability such as streaming audio and video, VRML-based
3-D graphics, and animation to Web sites. Additionally, utility Plug-Ins
enable OLE objects, VBXs, and OCXs to be embedded in Netscape Navigator. 
Among the new Plug-Ins are: Shockwave for Director from Macromedia; VDOLive
from VDONet, which compresses video images without compromising quality on
the receiving end; RealAudio from Progressive Networks, which provides live
and on-demand real-time audio straight from a Web site into Netscape
Navigator; ToolVox for the Web from Voxware, which lets you stream
high-compression (53:1) speech audio from Web pages without the need for an
audio server; WebFX from Paper Software, a 3-D VRML platform that lets users
fly through VRML worlds on the Web and run interactive, multiuser VRML
applications written in Java; VR Scout VRML Plug-In from Chaco
Communications, a VRML viewer that sends users through 3-D graphical scenes
and uses multithreading to let users view scenes while they are downloading;
WIRL from VREAM, which lets users experience interactive virtual reality
within a Web page; Corel Vector Graphics CMX Viewer, which enables you to use
scaleable, high-resolution vector graphics files; Lightning Strike from IION,
an optimized wavelet image codec; OpenScape Toolkit from Object Power, a
component-based rapid application development Plug-In built with OpenScape's
Visual Basic-compatible scripting language; and OLE Control Plug-In from
NCompass, which lets you embed OLE controls as applets created using tools
such as Visual C++, Visual Basic, and the Microsoft Windows Game SDK. 
The Netscape Navigator Plug-In SDK contains the tools and documentation to
develop Plug-Ins for supported platforms, and is freely available on the
Netscape Web site.
Netscape Communications Corp.
501 East Middlefield Road
Mountain View, CA 94043
415-528-2555
http://home.netscape.com
The Crescent Internet ToolPak from the Crescent division of Progress Software
is a toolset that enables Visual Basic 4.0 programmers to build internetworked
Windows apps and utilities. The Internet ToolPak consists of six 32-bit OCX
controls and an Internet mail Wizard for building Internet-enabled apps in VB
4.0. The tools include: two Internet Mail controls, a Newsgroup control, a
Web/http control, an FTP control, a client/server control, and the mail
Wizard. The Internet ToolPak sells for $199.00.
Crescent Progress Software
14 Oak Park
Bedford, MA 01730 
617-280-3000
http://www.progress.com/crescent
Spyglass has released its Spyglass Server SDK for Web developers. At the same
time, the company released a free, standalone Web HTTP server that was
developed using the Spyglass Server SDK. The Spyglass Server SDK includes an
API for the Application Development Interface (ADI), an alternative to CGI;
the Spyglass Modular Security Framework, an API which allows customized
security systems to be added to the server; the User Interface Programming
Interface (UIPI) for creating custom user interfaces; support for Secure
Sockets Layer (SSL) bulk encryption; documentation, sample ADI applications,
and source libraries.
The Spyglass Server SDK starts at $75,000 including run-time licenses and
support. Platforms supported are UNIX and Windows 95/NT. A free standalone
server can be downloaded at the Spyglass home page. 
Spyglass Inc. 
1230 E. Diehl Road
Naperville, IL 60563
708-505-1010
http://www.spyglass.com 
RSA Data Security has announced it is working with leading firewall and TCP/IP
stack vendors to bring the Internet Engineering Task Force's (IETF) proposed
"IPSec" security standard to realization. The initiative, called S/WAN,
designates specifications for implementing IPSec to ensure interoperability
among firewall and TCP/IP products.

S/WAN's goal is to use IPSec to allow companies to mix-and-match the firewall
and TCP/IP stack products to build Internet-based Virtual Private Networks
(VPNs). S/WAN is based on the IETF's Security Architecture for the Internet
Protocol, RFCs 1825-1829, known as "IPSec." S/WAN supports encryption at the
IP level, which provides security at a lower level than protocols such as
Secure Sockets Layer (SSL) and Secure HyperText Transfer Protocol
(S/HTTP). 
To guarantee IPSec interoperability, S/WAN defines a common set of algorithms,
modes, and options. In addition, S/WAN uses RSA's most advanced
block-encryption cipher, the RC5 Symmetric Block Cipher, at key sizes ranging
from 40 bits (for exportability) to 128 bits (able to withstand trillions of
MIPS-years of computer attack). S/WAN can also be implemented
using the government's DES algorithm. 
RSA Data Security also announced BSAFE 3.0 cryptography engine and Version 2.0
of its TIPEM toolkit. BSAFE 3.0 is built around RSA's Layered Open Crypto
Toolkit (LOCT) architecture, which comprises four layers: application-specific
tools, a token interface, a cryptography engine, and a certificate engine. 
BSAFE includes modules for popular encryption techniques, such as RSA, DES,
Diffie-Hellman, RC2, and RC4, and also supports improved routines for
pseudorandom number generation, as well as digital signatures and
certificates. 
BSAFE 3.0 now supports the RC5 algorithm for implementing secure,
high-bandwidth applications (such as secure video) without resorting to
expensive special-purpose crypto hardware. 
TIPEM 2.0 is a toolkit for interoperable privacy-enhanced messaging that
facilitates development of secure e-mail and other messaging applications.
TIPEM 2.0's API simplifies implementation of authentication via RSA Digital
Signatures, encryption using RSA Digital Envelopes, and certificate-based key
management. TIPEM 2.0 supports CCITT X.509 V1 and V3 Certificates, PKCS, PEM
message formats, and also complies with the new S/MIME specification. TIPEM
2.0 features an expanded algorithm palette, allowing for support of SHA1, RC5,
and Triple-DES. 
BSAFE 3.0 and TIPEM 2.0 sell for $290.00 each.
RSA Data Security Inc.
100 Marine Parkway
Redwood City, CA 94065-1031
415-595-8782
http://www.rsa.com
SunSoft has announced ProWorks/Visual XP, a tool that automatically generates
native GUIs from a single design for Motif and Microsoft Windows. The new
optional extension to the visual development toolset for SunSoft Visual
WorkShop for C++ is a cross-platform GUI builder that lets C and C++
developers rapidly build application user interfaces under Solaris, then
deploy them on Motif and Windows with native look-and-feel. The tool is available for
both Solaris SPARC and Solaris Intel platforms. Code generated for Solaris
calls the Motif API, while code generated for Microsoft Windows calls the
Microsoft Foundation Class (MFC) library.
ProWorks/Visual XP for Solaris 2.5 Intel edition sells for $395.00.
SPARCworks/Visual XP, for Solaris 2.5 for SPARC, requires SPARCworks/Visual,
and sells for $495.00.
SunSoft Inc.
2550 Garcia Avenue
Mountain View, CA 94043-1100
415-960-3200
http://www.sun.com/sunsoft 
Btrieve Technologies and Magic Software are working together to develop an
integrated toolkit comprising Btrieve's Scalable SQL and Btrieve database
engines and Magic's visual RAD tools. The toolkit integrates Btrieve's
MicroKernel Database Architecture and database engine with Magic's
table-driven RAD tool for building client/server applications. 
Btrieve Technologies Inc.
8834 Capital of Texas Highway North, 
Suite 300
Austin, TX 78759
512-794-1719
http://www.btrieve.com





































EDITORIAL


When the Telecomm Bill Comes Due


Once the dust settles around the Telecommunications Act of 1996, what we'll be
left with is a subsidy program for ravenous lawyers. Congress, which has more
than its fair share of failed or otherwise "reformed" attorneys, has shown it
still knows how to take care of its own. From the A-word to the V-chip, the
Telecommunications Act has already turned into a lawyer's sweetest dream and a
taxpayer's worst nightmare.
The first legal salvo, fired before the presidential ink was dry, centered on
the Communications Decency Act (CDA), which prohibits providing "indecent"
material to minors over the Internet. Anyone who does, according to the CDA,
can end up with two years in prison and $500,000 in fines. 
Civil suits challenging the CDA have been filed by the ACLU, the Center for
Democracy and Technology, and the Citizens Internet Empowerment Coalition, a group
including Apple, Microsoft, the American Society of Newspaper Editors, the
American Library Association, and the American Booksellers Association.
What the plaintiffs want are the same First Amendment rights as print
publishers. Congress, on the other hand, claims that online speech should be
subject to the same regulations governing radio and TV broadcasts. In the
meantime, U.S. District Judge Ronald Buckwalter has prohibited the government
from enforcing most of the CDA because it is unconstitutionally vague.
Buckwalter did leave intact the CDA's reference to "patently offensive"
material, allowing government prosecution within certain guidelines.
Nevertheless, the Justice Department won't prosecute until a court decision is
reached.
Because the bill requires expedited judicial review of any challenges to it,
there's little doubt that the CDA will end up before the Supreme Court within
a year. When this happens, the Supreme Court will be in a conundrum. Although
it does not currently have an official Web site, the Court is going online in
the near future and will provide full-text versions of Supreme Court
decisions--including the now famous (or infamous, depending on how you look at
it) 1978 FCC v. Pacifica Foundation decision dealing with a radio broadcast of
George Carlin's 12-minute "Filthy Words" monologue. A verbatim transcript of
"Filthy Words," prepared by the FCC and replete with CDA-banned words, is part
of the decision. (The complete decision is available "unofficially" at various
law-oriented Web sites around the country.) By posting the decision online,
the Court will be in violation of the law it will have to review.
Putting CDA aside, other legal challenges to the Telecommunications Act
revolve around Rep. Henry J. Hyde's (R-Ill.) attempt to prohibit online
discussions of abortion by tacking on the 100-year-old Comstock law. Hyde's
addition makes "using interactive computer services to provide or receive
information about abortion" punishable by a $250,000 fine and/or five years in
prison. 
(Perhaps confused about his own proposed legislation, Hyde said on the House
floor that "any discussion about abortion, both pro-life and pro-choice
rights, is protected by the First Amendment guarantee of free speech," adding
that "this amendment...does not prohibit serious discussions about the moral
questions surrounding abortion, the act of abortion itself, or the
constitutionality of abortion." Pontificating aside, the new law says
otherwise.)
Not wanting to be left out, lawyers for both local and long-distance telephone
service providers are lining up at the legal trough, too. Although delighted
that the Telecommunications Act opens the switch to local-service
opportunities, long-distance providers like AT&T and MCI are already
challenging the long-standing requirement that rural--and less
profitable--customers be allowed equal access to universal service.
Long-distance providers also are squawking about "must-carry" provisions,
previously limited only to cable TV, which now apply to phone companies, too.
Nor do they like the provisions that require discounted Internet access rates
for schools and libraries. (Paradoxically, the Telecommunications Act mandates
$2 billion for bringing the Internet into classrooms, while other
Congressional proposals cut $3.1 billion from the education budget.) Clearly,
what the long-distance companies want are the benefits of expanding their
customer base, without the responsibilities that go along with it.
The Telecommunications Act also greases the skids for cable television
providers to get into the telephone and Internet service business, and vice
versa. Within a year or so, a lot of us will be able to get high-speed
Internet access over the same line as cable TV. This means it will be legal
for "indecent" audio and video coming from an HBO movie to enter our homes for
TV viewing, but illegal for the same words and pictures to come via the
Internet--and across the same cable, too. Go figure.
If you're confused by the Byzantine contradictions of the Telecommunications
Act, you're not alone. Yes, the Act does have its pluses. I'm all for my local
phone provider having competition, but not at the expense of the First
Amendment. The final straw, however, was last week's notice that my monthly
cable TV rates are going up $2. A person can only take so much.
Jonathan Erickson, editor-in-chief












































LETTERS


The Cost of a Free Lunch


Dear DDJ,
The article "Proposing a Standard Web API" (DDJ, February 1996), by Michael
Doyle, Cheong Ang, and David Martin, incorrectly states that "the applet
developer must purchase a compiler from Sun or its licensees at considerable
cost." The Java Developers Kit contains a Java compiler with a very liberal
licensing agreement and can be obtained free of cost. Check out
http://java.sun.com for more information.
Eric Kuzniar
Asheville, North Carolina 
kuzniar@cs.unca.edu
DDJ Responds: Eric, thanks to you and others, such as Noel Gorelick
(gorelick@tesla.asu.edu), for pointing out the oversight.
Dear DDJ,
I would like to comment on Jonathan Erickson's "Editorial" in the November
1995 issue of DDJ, the letter from Michael Doyle in the December 1995 issue,
and Erickson's retraction in the January 1996 issue.
I believe that, as application developers and operating-system implementers,
we need to examine the battle of Web APIs very seriously. What is at stake, as
described by Erickson, is a standard that will benefit the Internet. I contend
that what is at stake is the creative flexibility of our industry. Let me
explain.
Promoting one Web API is very similar to allowing Microsoft's Win32 API to
dominate 100 percent of the market. This is bad for computing, users,
developers, OEMs, ISVs, IHVs, and Microsoft. No innovation into vertical
markets and general competition is fostered by the universal dominance of one
system API for an entire class of applications.
The description of the consortium to define the Web API, mentioned in the
letter by Doyle, does not expound upon the rules of membership or the model of
decision making. If all licensees are permitted to be involved in the
consortium, then all API developers would be involved. This would include
operating-system makers, application makers, and hardware manufacturers. How
would the members vote on the design of the Web API? Would the more-important
members, such as Sun, Netscape, and Microsoft, have majority control? Would
startups be allowed to add to the API to support their emergent technology?
A parallel can be drawn with a platform API such as the Standard UNIX system
calls, Xlib, or Win32. All commercial implementations of these APIs either
directly support, or coexist with, an extensible API set. These extensions do
not necessarily have to be a universal standard. They are the innovation that
drives the computing industry. They are the made-for-Netscape slogans. They
are the value added.
By forcing one standard API, Eolas makes all existing operating systems
virtually a hardware abstraction layer for the Web API. Thus, for HTTP-based
embedded languages, a standard Web API reduces the value added by any company
involved with Internet computing to null.
Standards are the only way to make quantitative technological progress, but
heterogeneity is the life and the capitalism of the computing industry.
Angus McCollum
gusm@msn.com


Multiple Encryption Correction


Dear DDJ,
In our article "Multiple Encryption: Weighing Security and Performance" (DDJ,
January 1996), we state (on page 124) that "Three-key triple encryption is
still vulnerable to a meet-in-the-middle attack requiring 2^(2k) words of
memory and about 2^(k+1) operations." Instead, the numbers should be reversed
so the sentence reads, "Three-key triple encryption is still vulnerable to a
meet-in-the-middle attack requiring 2^(k+1) words of memory and about 2^(2k)
operations." 
Burton S. Kaliski, Jr.
Matthew J.B. Robshaw
RSA Laboratories
Redwood City, California


Putting the Squeeze On


Dear DDJ,
I was enthused to read Jason Mathews' article "Comparing Data Compression
Algorithms" (DDJ, January 1996). I have a similar task except my data are call
detail records. Two things in his article stood out:
Although he mentions it once, Mathews ignores decompression time even though
CDF data is surely read at least as often as it is written. It is a pity that
this critical aspect was not discussed further.
Mathews claims the compress and gzip programs have very similar compression
ratios. Although I've only directly compared the two algorithms for several
gigabytes of data, I have never seen compress come close to gzip; compress
typically gets 2-3:1, and gzip is typically 3-4:1.
Andrew Hume
andrew@plan9.att.com


Shocking Stuff


Dear DDJ,
I read the December 1995 "Editorial" entitled "Shock Treatment" with great
interest. I am a masters student in the management department at the
University of Canterbury in New Zealand. In part fulfillment of my degree, I
am carrying out a research project for Trans Power New Zealand Ltd., the
government-owned company that operates the national electricity grid.
The electricity industry in New Zealand has been in a state of ongoing change
since the commencement of a government deregulation and privatization drive in
1986. Electricity generation (previously the preserve of the government) has
been deregulated, and ECNZ (the state-owned electricity generation company)
has literally been carved in two in an effort to bootstrap a fledgling
competitive generation industry.
Trans Power (the grid operator) has been charged with the role of providing a
neutral and transparent dispatch service to the industry. The matching of
generation to demand will be achieved by a wholesale electricity spot market
(an interim spot market is being launched in around six weeks' time), in
addition to contract and hedging facilities.
The electricity retail sector is now fully deregulated (and, in fact, some
power companies have been privatized and even listed on the NZ stock
exchange). While these utilities have maintained their natural monopoly status
with physical distribution networks, customers may now pick and choose between
energy suppliers. 
Interesting stuff, but to the point: One aspect of my project is to
investigate potential business opportunities for Trans Power. The company
already has an extensive telecommunications infrastructure (the telecom
industry here was deregulated and privatized around five years ago). Utilizing
bandwidth into the home on existing reticulation systems for home
communications, demand side management, and intelligent appliance control is
an area I would like to investigate.
If any readers can provide me with more information--a journal reference,
newspaper article, or the like--it would be great. Thank you in advance for
any information you may be able to provide.
Mark Lilley

Christchurch, New Zealand
misc2311@cantva.canterbury.ac.nz


Absence of Malice


Dear DDJ,
I read Jonathan Erickson's "Editorial" (DDJ, March 1996) about Randall
Schwartz, and I will be sending Schwartz some money.
Until I read the editorial, I was ambivalent about what CPU I put in my
motherboards. No longer. I now know I will no longer use Intel CPUs. Period.
I have begun using these tactics more and more. I do not care for some of
Microsoft's attitudes these days, so I don't buy Microsoft products for
personal use any more, unless I absolutely have to (for example, Windows 95,
but not Office). Instead, I use Linux to do all my fun stuff. 
I also informed President Clinton, by e-mail, that I will not vote for him (or
anyone else involved) in the next election if he signs any bill containing the
Communications Decency Act. I have never had cable television, because I don't
believe people should be paying ridiculous sums of money for very little of
anything. I will not use a cellular telephone service, because: 1. The
cellular telephone companies have tried to make it illegal to even own a
device that can receive their frequencies; 2. They attempted to put a
high-powered transmitter antenna across the street from a grammar school in my
neighborhood; and 3. It appears that they are doing their best to convince
people that airwaves should cost lots of money. Ask yourselves this: Is the
Internet worth your human rights or your freedom? Are computers? I have become
convinced over the years that politics and monetary greed go hand-in-hand. So
be it; from this point on, I speak with my wallet.
Chuck Bermingham
Chicago, Illinois
bermin19@starnetinc.com
Dear DDJ,
While reading Jonathan Erickson's "Absence of Malice" editorial in the latest
edition of DDJ, I noticed that it says:
Schwartz ill-advisedly ran Crack, a commercially available password-cracking
program that uses brute force to discover vulnerable passwords.
As may have already been pointed out, Crack is not a "commercially available"
program--it is a "freely available" package easily downloaded over the
Internet. In fact, any knowledgeable high-school student could easily install
and run Crack (and do it in such a way so as to not be easily discovered).
This is what makes maintaining computer security a major issue and why Intel
actually should be thanking Schwartz for revealing their appallingly inept
situation (the bozo with the "pre$ident" password should have been reported
and given the boot...).
By the way, although Crack does resort to "brute" force to try and crack a
password, it first checks any dictionary lists made available to check against
the encrypted passwords. It is unfortunately surprising how often people use
simple words as passwords.
Mark J. MacLennan
Iowa City, Iowa 
maclenna@cgrer.uiowa.edu 
Dear DDJ,
As programming has become more important, it has become evident that we cannot
restrict ourselves to simply exchanging programming tips. Our interactions
with society are significant enough that we need the vision to see what is
right and wrong and, what is rarer, the courage to do something about it.
Jonathan Erickson's "Absence of Malice" editorial (DDJ, March 1996) on the
silly but tragic prosecution of Randall Schwartz is a fine example of this
vision and courage. I applaud you for seeing what was necessary to say in this
matter and saying it.
Jeffrey Kegler
Sunnyvale, California
jeffrey@rahul.net


































Direct Port I/O and Windows NT


Undocumented features for direct control of hardware devices




Dale Roberts


Dale works with data acquisition and control software at the Vestibular
Laboratory of the Johns Hopkins University School of Medicine. He can be
reached at roberts@ishtar.med.jhu.edu.


Port I/O instructions allow all 80x86 CPUs to communicate with other hardware
devices in the system. For low-level, direct control of a hardware device, the
C functions _inp() and _outp() (implemented using the 80x86 processor's IN and
OUT instructions) let you read from or write to an I/O port. However,
inserting _inp() or _outp() in a Windows NT application gives you a
privileged-instruction exception message and the option of terminating or
debugging the offending app. If you attempt port I/O from a 16-bit DOS app in
an NT console window, the I/O is either ignored or emulated by NT's virtual
device drivers--you don't get an exception, but you don't get the direct I/O
either.
This isn't a bug; NT is supposed to work this way. The NT architects decided
that it would be too risky to allow applications to directly access the system
hardware. With unrestricted I/O access, an application could turn off all
interrupts, take over the system, and trash the display or the hard drive. A
buggy program could unintentionally do the same. NT's architecture requires
that all hardware be accessed via kernel-mode device drivers--special, trusted
pieces of software that essentially become part of the operating system when
loaded. These device drivers have complete access to the entire system memory,
all hardware devices, and all privileged processor instructions. In contrast,
applications run in user mode, where they have restricted access to
memory--and where the CPU can't execute certain privileged operating-system
instructions, including I/O instructions.
The restriction on I/O port access is both a blessing and a curse. On one
hand, it makes NT exceptionally stable. Generally, application programmers can
write and crash and debug programs all day long without shaking NT. Several
applications can run without adversely affecting one another. On the other
hand, I/O restrictions prevent you from communicating directly and quickly
with the hardware without taking the relatively large amount of time required
for a call to a device driver. Whenever you want to communicate with a device
driver, you must send a request through NT's I/O subsystem. This can take
thousands of processor-clock cycles; a port I/O instruction, by contrast,
takes only about 30 clock cycles.
Why would you ever need to put I/O instructions in user-mode code? When
writing a device driver, it might make things easier if you could write a
quick program to interact with the device, sprinkling printf()s and getchar()s
among port I/O instructions so that you could verify that you are driving the
device correctly before you put the code into an actual device driver and
chance a system lockup. Or you may want to write a portion of a driver in a
user-mode DLL (as with video drivers, for instance) to achieve a desired level
of performance. One of my favorite uses of I/O is for using an oscilloscope to
debug programs and time sections of code. To do this, you need to set and
clear a bit in a digital output port and monitor the voltage on a scope.
Since direct, user-mode port I/O in NT seems so useful, you'd think there
would be an accepted way to achieve it. A quick look through the sample source
code in the Windows NT Device Driver Kit (DDK) reveals a program called
"PORTIO." Initially, I thought this would provide direct port I/O from an app.
However, PORTIO is merely an example showing how to use Win32
DeviceIoControl() calls to a kernel-mode device driver, which implements the
actual I/O. Using PORTIO, each I/O operation requires a costly, time-consuming
call to the device driver. This was useless for my oscilloscope timings. I
needed a better way.


Accomplishing I/O Protection in NT


To figure out how to grant I/O access to a user-mode app, you have to
understand how I/O protection is implemented in Windows NT. NT does not
actually implement the I/O protection on its own. Since the CPU can trap
attempted I/O port accesses, NT depends on this 80x86 feature. The first
mechanism that must be understood is the privilege-level system used by the
80x86 processors. Four privilege levels are defined by the processor--0, 1, 2,
and 3--and the CPU always operates at one of these levels. The most privileged
level is 0; the least privileged, 3. NT uses only levels 0 and 3. Privilege
level 0 is used for the full-access kernel mode, and 3 for the
more-restrictive user mode. The current privilege level (CPL) of the processor
is stored in the two least-significant bits of the CS (code segment) register.
Rather than statically defining which privilege levels can have I/O access,
the CPU defines an I/O privilege level (IOPL) value, which is compared against
the CPL to determine if I/O instructions can be used freely. The IOPL is
stored in two bits of the processor's EFLAGS register. Any process with a CPL
that is numerically greater than the IOPL must go through the I/O protection
mechanism when attempting port I/O access. Because the IOPL cannot be less
than 0, programs running at privilege level 0 (like kernel-mode device
drivers) will always have direct port I/O access. NT sets the IOPL to 0.
User-mode code always has a CPL of 3, which is larger than the IOPL.
Therefore, user-mode port I/O access attempts must go through the protection
mechanism.
Determining if CPL>IOPL is the first step in the protection mechanism. I/O
protection is not all-or-nothing. The processor uses a flexible mechanism that
allows the operating system to grant direct access to any subset of I/O ports
on a task-by-task basis.
The CPU accomplishes this by using a bitmask array, where each bit corresponds
to an I/O port. If the bit is a 1, access is disallowed and an exception
occurs whenever access to the corresponding port is attempted. If the bit is a
0, direct and unhampered access is granted to that particular port. The I/O
address space of the 80x86 processors encompasses 65,536 8-bit ports. The
bitmask array is 8192 (0x2000) bytes long, since it is packed so that each
byte holds eight bits of the array. There is even flexibility in
how much of the bitmask array must be provided. You can provide anywhere from
0 to the full 8192 bytes of the table. The table always starts from I/O
address 0, but you can choose not to provide the bitmask for upper I/O
addresses. Any part of the bitmask that you do not provide is assumed to be 1,
and therefore access is not granted to those ports.
The bitmask array, called the I/O Permission bit Map (IOPM), is stored in the
Task State Segment (TSS) structure in main memory, which is contained in a
special segment referenced by the segment selector in the processor's Task
Register (TR). The location of the IOPM within the TSS is flexible. The offset
of the IOPM within the TSS is stored in a 2-byte integer at location 0x66 in
the TSS; see Figure 1.


NT TSS Specifics


The 80x86 TSS was designed so that each task in the system could have its own
TSS. In NT, however, the TSS is not fully used. The TR, which points to the
TSS segment descriptor, is never modified. Each process uses the same copy of
the TSS, so each process uses the same copy of the IOPM.
In NT, the default IOPM offset points beyond the end of the TSS. This
effectively denies access by user-mode processes to all I/O ports. To grant
access to I/O ports for user-mode processes, you must modify the IOPM offset
so that it is within the TSS, or extend the TSS so that the original default
offset falls within the TSS.


The Video-Port Routines


Since I didn't want to reinvent the wheel, I looked through the NT DDK
documentation to see if there was a facility to deal with user-mode I/O
access. In the Kernel Mode Driver Reference Manual, I came across the
video-driver support routines VideoPortMapMemory() and
VideoPortSetTrappedEmulatorPorts(). The former grants user-mode portions of
video drivers direct access to memory and I/O ports, presumably for
performance. Source-code examples in the DDK show user-mode portions of the
VGA video drivers using the IN and OUT port I/O instructions. The latter
video-port function grants full-screen DOS-mode programs direct access to a
subset of the VGA I/O ports. The description given in the DDK documentation
for this second routine even makes reference to the IOPM and notes that it is
shared by all of the virtual DOS machines (more accurately, it is shared
across all NT processes).
The video-port routines suggest that there is a mechanism within NT for
allowing user-mode access to I/O ports. Initially, I tried to use the video
routines in a kernel-mode driver to grant I/O access to my user-mode test
program, but this turned out to be complicated. The kernel-mode device driver
has to pretend that it is a video driver to use these routines. Video header
files must be included, the video library must be linked to, and video
initialization routines must be called.
The presence of these two routines demonstrates why user-mode I/O can be
useful, and their descriptions in the DDK documentation are enlightening. But
the functions are intended to be used only with video drivers. Using the video
routines with a nonvideo driver was messy, so I dropped this as an option.


Delving Further


The video-port functions are the only documented method for enabling
direct port I/O. Since I found them difficult to use, I decided to create my
own. I first tried increasing the size of the TSS so that the default IOPM
offset would land within the TSS; see Figure 2. I had to modify the TSS
segment descriptor in the global descriptor table (GDT) directly and change
the default segment size of 0x20AB to 0x20AB+0xF00 to allow access to the
first 0xF00 I/O ports. The processor's TR then had to be reloaded for the
change in the TSS descriptor to take effect. It isn't a good idea to extend
segments in a haphazard fashion, because all memory must be accounted for by
the 80x86 paging system. A page fault could occur during a reference to the
IOPM, which would crash the system. But because the physical page size is 4 KB
and I did not extend the TSS beyond the end of a physical page, there was no
trouble. Since there were only zeros beyond the original end of the TSS,
increasing its size granted universal I/O access across all applications. The
TOTALIO device driver in Listing One illustrates this.
At first, this may seem like the best possible method to grant I/O access--you
set it once and don't need to grant access to each process individually.
However, this method is dangerous and unrestrictive. It would allow, for
instance, DOS programs to directly access the video registers, even if they
were not running in full-screen mode. It would allow DOS disk utilities to
access the hard drive directly and wreak havoc on NTFS partitions. NT device
drivers keep information on the state of the devices they control, and TOTALIO
would allow applications to completely violate this arrangement. As soon as
you start up a DOS program, or any other program with port I/O, you risk
trashing the whole system.


Granting Access to a Single Process


Since TOTALIO was risky, I looked for a method that would allow a kernel-mode
driver to grant I/O access to a single process.

Using a debugger, I examined the NT TSS and found a block of 0xFFs extending
from offset 0x88 up to the end of the TSS. I assumed that in NT, the block of
0xFFs was where the IOPM was intended to sit, even though the default IOPM
offset points beyond this area. There were 0x2004 bytes of 0xFF. The extra
four bytes are present because the 80x86 requires at least one extra byte of
0xFF at the end of the IOPM. The 80x86 requires the extra byte because it
always accesses two bytes of the IOPM at a time.
I moved the IOPM offset to point to the start of the 0xFFs, as in Figure 3. I
zeroed a few bytes of the IOPM and tried to access ports. Nothing happened. My
application still caused exceptions. The kernel-mode device-driver fragment in
Listing Two illustrates this attempt.
What was wrong? A visual inspection of a memory dump of an NT process
structure showed that NT stores the IOPM offset in a location of its own,
within the process structure. The actual IOPM offset in the TSS is loaded from
the value in the process structure whenever a process gains control, so
changing the TSS directly is of no use. To change the IOPM base address, the
value in the process structure must be changed. Once the IOPM offset in the
process structure is changed, user-mode I/O access is granted to that process
for all ports whose corresponding IOPM access bit is 0. Listing Three
illustrates direct modification of the process structure.


Yet Another Way


Early on, I ran across some kernel-mode function names in the NTOSKRNL library
(which contains kernel-mode device-driver support routines) that weren't
documented in the DDK. Among these functions were Ke386SetIoAccessMap(),
Ke386QueryIoAccessMap(), and Ke386IoSetAccessProcess(). From their names,
these functions sounded like they might do what I needed, but because they
were not documented, I initially had difficulty getting them to work. Only
after I completely understood the 80x86 I/O protection mechanism and had my
own implementation working, did I have the knowledge to go back and decipher
them.
Ke386SetIoAccessMap() takes two arguments: an integer that must be set to 1 in
order for the function to work, and a buffer pointer. It copies a supplied I/O
access bitmap of length 0x2000 from the buffer into the TSS at offset 0x88.
Ke386QueryIoAccessMap() takes the same arguments but does the opposite,
copying the current IOPM from the TSS into a buffer of length 0x2000. If the
integer argument is set to 0, the set function copies 0xFFs to the IOPM, and
the query function copies 0xFFs to the user's buffer.
Ke386IoSetAccessProcess() takes two arguments: a pointer to a process
structure obtained from a call to PsGetCurrentProcess(), and an integer that
must be set to 1 to grant I/O access, or to 0 to remove I/O access. When the
integer argument is 0, the function disables I/O access by setting the IOPM
offset of the passed process to point beyond the end of the TSS. When the
integer argument is 1, the function enables I/O access by setting the IOPM
offset of the passed process to point to the start of the IOPM at offset 0x88
in the TSS.
Using set and query together, it is possible to read, modify, and write back
the IOPM, adding access to the desired ports by setting their respective
permission bits to zero. Ke386IoSetAccessProcess() then enables the IOPM
lookup for the desired process. The kernel-mode device driver in Listing Four,
GIVEIO.C, sets the IOPM to 0s to allow full user-mode access to all I/O ports.
Listing Five, a user-mode test application called TSTIO.C, uses direct port
I/O to exercise the PC's internal speaker.


Direct--for Real?


Once a user-mode process is given permission to access an I/O port, the I/O
access proceeds without any further help from the device driver. The purpose
of the device driver is to modify the IOPM and the process's copy of the IOPM
offset. Once that's done, the application's I/O port accesses proceed
unhindered. In fact, the device driver could be unloaded once the IOPM is
modified, and the application could still do direct I/O. Listing Five
illustrates this by opening and closing the GIVEIO driver, giving the
application I/O access, before it performs the port I/O.


I/O Timing


Using port I/O from an application isn't a free ride. There's overhead in the
protection mechanism, so the 80x86 IN and OUT instructions take longer in user
mode, where CPL > IOPL. The number of processor-clock cycles it takes to
execute the IN and OUT instructions varies depending on the CPU mode. In
so-called real mode (plain-vanilla, nonextended DOS), an OUT instruction takes
16 processor-clock cycles to execute on a 486; in virtual-8086 mode (a DOS
program running in a Windows DOS box or an NT console window), it takes 29
cycles. In protected mode, the execution time depends on whether CPL > IOPL.
In the context of NT, this means that it depends on whether a process is
executing in kernel mode or user mode. In kernel mode, an OUT instruction
takes a mere ten cycles. In user mode it takes a whopping 30 cycles! So the
execution time of a "direct" I/O operation is in fact three times longer for a
user-mode process, but it is still tiny compared to a device-driver call,
which might take on the order of 6000 to 12,000 clocks (somewhere in the
100-200 microsecond range on my 486). The extra time taken when CPL > IOPL, and when the
processor is in virtual 8086 mode, is the time it takes the processor to check
the bits in the IOPM.


Careful with that Axe, Eugene!


Pardon the Pink Floyd reference, but it seems appropriate to provide warnings
about this potentially dangerous tool. 
With I/O access knowledge, you may be tempted to start using it for
everything, but remember that I/O protection exists in NT for good reasons.
I/O protection helps give the operating system its seemingly bullet-proof
stability by forcing all access to a device to go through a single, controlled
channel. Frivolous use of user-mode port I/O would tend to erode NT's
stability. Circumventing an existing kernel-mode device driver is a bad idea.
Device drivers maintain information about the state of the devices they
control. Bypassing a driver and accessing hardware directly may cause the
driver and the hardware to get out of sync, with unpredictable results.
Imagine the chaos that would result if every application tried to directly
access the network card.
User-mode I/O may be useful for developing device drivers. It might serve as a
development tool for quickly testing new hardware. Direct I/O from user-mode
processes should find very little use in software that is distributed to end
users. It should never occur in an application. If you are accessing a device,
you should be doing it from a device driver. User-mode port I/O might
occasionally be useful in user-mode portions of a device driver to achieve
better overall performance for the driver.
Even with applications ruled out, most device drivers are unlikely to benefit
from user-mode port I/O. It may be tempting to use it in every driver, just to
squeeze out that last bit of performance, but most devices would not become
appreciably faster. In many devices, the time delays perceived by the user are
not in the calls to the device driver, but in the action of the device. The
user isn't usually waiting for the device-driver call itself to complete, but
rather for the disk drive to spin, the read/write head to move, or the paper
to feed through the printer. User-mode I/O should only be used if there is a
definite bottleneck in port I/O access from applications, and then, only if
direct user-mode I/O access would improve the driver by making a noticeable
and significant difference to the user. Even Microsoft uses this technique
sparingly. The only device driver in the system's DRIVERS directory that
references the three undocumented routines is the VIDEOPRT.SYS driver, which
contains the VideoPort...() functions.
If I/O access is done in the user-mode section of a driver, kernel mode may
still be needed for, among other things, servicing interrupts and controlling
DMA. User-mode port I/O does not remove the necessity of writing kernel-mode
device drivers.
If you decide that you want to use port I/O in the user-mode portion of your
device driver, your kernel-mode driver should modify only the IOPM permission
bits that correspond to the I/O ports required by the user-mode portion of the
driver. Use Ke386QueryIoAccessMap() to get the current IOPM, zero each of the
permission bits required by the driver, then use Ke386SetIoAccessMap() to
write the IOPM back. If and when your driver is unloaded, it should set each
permission bit back to 1. A single IOPM is shared by all processes in the
system, possibly including the video driver. For this reason, the IOPM must
not simply be overwritten with 0xFFs when access is no longer needed. Of
course, the usual device-driver rules
given in the DDK manual for allocating I/O ports and keeping track of them in
the registry would still apply.
Does user-mode port I/O violate system security and integrity? No, because a
device driver is still required to grant I/O access to the application, so an
application cannot gain access to I/O ports on its own. I/O access may be
granted on a per-process basis, and
the kernel-mode device driver that grants I/O access could be modified to
grant access only to those processes that it trusts. Only a user running with
administrator privileges can load device drivers, so in general, a user-mode
application cannot load a device driver and grant itself I/O access unless the
administrator is running it. Granting I/O access to a user-mode process is not
directly related to NT's security system and does not make any attempt to foil
it.
Granting I/O access to a process is a very specific action. It does not enable
the use of the other protected 80x86 instructions, such as STI (enable
interrupts) and CLI (disable interrupts). These, and the other privileged
instructions, can be executed only by a kernel-mode driver.


Portability


The technique described here is specific to 80x86-compatible CPUs. Still, NT
runs on several other platforms, including the DEC Alpha, MIPS, and PowerPC.
Although this specific implementation is not portable to those processors,
there shouldn't be any reason why the same effect could not be achieved on
them. None of the other processors have I/O instructions; all hardware is
memory mapped. Since any physical memory can be mapped into a user-mode
process's memory space (see the MAPMEM example program in the DDK), it should
be possible to make any hardware accessible to a user-mode process.
None of the techniques presented here are documented by Microsoft, so
portability across releases of NT, even on the same processor platform, could
be problematic. It is not likely that the whole mechanism would be removed,
since the video drivers rely on it. But the names and functionality of
specific undocumented routines could change, or the routines could go away
altogether and perhaps become embedded in the video-port library.
It appears that the undocumented functions were added to NT to increase
video-driver performance. On the 80x86 platform, increasing video performance
required allowing access to some of the video I/O ports in user mode. Since
this is the only use of the mechanism, and since it is documented indirectly
through the VideoPort...() routines, there was no reason for the underlying
Ke386...() routines to be documented. 
Another reason Microsoft may have chosen not to document this mechanism is
that it is not fully implemented. Currently, the IOPM is shared by all
user-mode processes in the system. To be safer and more useful, the system
should maintain a separate IOPM for each process. One way to do this would be
to save the IOPM (or a pointer to one) in the process structure and copy it
into the TSS each time a process changes. But copying 8192 bytes would add a
large amount of overhead to a task switch. Another way to give each process
its own IOPM would be to give each process its own TSS. NT's process structure
could be stored in the TSS, since Intel reserves only the first 0x68 bytes of
the TSS for the processor's use and allows the rest to be used by the
operating system. Switching the TSS for each process requires reloading the
TR. The LTR instruction, which loads the 80x86 task register, has only a tiny
overhead of 20 clocks on a 486. Each TSS could have its own segment descriptor
in the GDT. Or, since GDT entries are a limited resource, a single segment
descriptor for the TSS could be modified on each process switch. In any case,
keeping the entire IOPM would require an overhead of 8 KB for each process. An
alternative would be to store only as much of the IOPM as is needed, and to
only create a new TSS and store the IOPM for processes that require user-mode
I/O access. Most I/O devices exist below the 0x400 port address, so this would
require only 0x80 (128) bytes of storage. To achieve full generality, NT could
just save as much of the IOPM as is necessary to map the highest I/O address
that needs to be accessed.


Conclusion


The availability of direct port I/O in user-mode processes opens new doors for
NT programmers. Hopefully this technique will prove useful in dealing with
hardware devices in an 80x86 NT environment.
Figure 1: The segment selector in the processor's TR points to the segment
descriptor in the GDT, which defines the location and size of the TSS in
memory. The IOPM is stored as part of the TSS. Its offset is stored in a
2-byte integer at location 0x66 in the TSS.
Figure 2: The default TSS size is 0x20AB. We need to extend it to 0x2FAB so
that the IOPM offset falls within the TSS.
Figure 3: NT places the IOPM at offset 0x88 in the TSS. We need to modify the
IOPM offset to point to this area.

Listing One
/******************************************************************************
TOTALIO.SYS -- by Dale Roberts

Compile: Use DDK BUILD facility
Purpose: Give direct port I/O access to the whole system. This driver grants
total system-wide I/O access to all applications. Very dangerous, but useful 
for short tests. Note that no test application is required. Just use control 
panel or "net start totalio" to start the device driver. When the driver is
stopped, total I/O is removed. Because no Win32 app needs to communicate with
the driver, we don't have to create a device object. So we have a tiny driver 
here. Since we can safely extend the TSS only to the end of the physical
memory page in which it lies, the I/O access is granted only up to port 0xf00.
Accesses beyond this port address will still generate exceptions.
******************************************************************************/
#include <ntddk.h>
/* Make sure our structure is packed properly, on byte boundary, not
 * on the default doubleword boundary. */
#pragma pack(push,1)
/* Structures for manipulating the GDT register and a GDT segment
 * descriptor entry. Documented in Intel processor handbooks. */
typedef struct {
 unsigned limit : 16;
 unsigned baselo : 16;
 unsigned basemid : 8;
 unsigned type : 4;
 unsigned system : 1;
 unsigned dpl : 2;
 unsigned present : 1;
 unsigned limithi : 4;
 unsigned available : 1;
 unsigned zero : 1;
 unsigned size : 1;
 unsigned granularity : 1;
 unsigned basehi : 8;
} GDTENT;
typedef struct {
 unsigned short limit;
 GDTENT *base;
} GDTREG;
#pragma pack(pop)
/* This is the lowest level for setting the TSS segment descriptor limit field.
 * We get the selector ID from the STR instruction, index into the GDT, and 
 * poke in the new limit. In order for the new limit to take effect, we must 
 * then read the task segment selector back into the task register (TR).
 */
void SetTSSLimit(int size)
{
 GDTREG gdtreg;
 GDTENT *g;
 short TaskSeg;
 _asm cli; // don't get interrupted!
 _asm sgdt gdtreg; // get GDT address
 _asm str TaskSeg; // get TSS selector index
 g = gdtreg.base + (TaskSeg >> 3); // get ptr to TSS descriptor
 g->limit = size; // modify TSS segment limit
//
// MUST set selector type field to 9, to indicate the task is
// NOT BUSY. Otherwise the LTR instruction causes a fault.
//
 g->type = 9; // mark TSS as "not busy"
// We must do a load of the Task register, else the processor
// never sees the new TSS selector limit.

 _asm ltr TaskSeg; // reload task register (TR)
 _asm sti; // let interrupts continue
}
/* This routine gives total I/O access across the whole system. It does this
 * by modifying the limit of the TSS segment by direct modification of the TSS
 * descriptor entry in the GDT. This descriptor is set up just once at system 
 * init time. Once we modify it, it stays untouched across all processes.
 */
void GiveTotalIO(void)
{
 SetTSSLimit(0x20ab + 0xf00);
}
/* This returns the TSS segment to its normal size of 0x20ab, which
 * is two less than the default I/O map base address of 0x20ad. */
void RemoveTotalIO(void)
{
 SetTSSLimit(0x20ab);
}
/****** Release all memory 'n' stuff. *******/
VOID
TotalIOdrvUnload(
 IN PDRIVER_OBJECT DriverObject
 )
{
 RemoveTotalIO();
}
/****** Entry routine. Set everything up. *****/
NTSTATUS DriverEntry(
 IN PDRIVER_OBJECT DriverObject,
 IN PUNICODE_STRING RegistryPath
 )
{
 DriverObject->DriverUnload = TotalIOdrvUnload;
 GiveTotalIO();
 return STATUS_SUCCESS;
}

Listing Two
/*****************************************************************************
This code fragment illustrates the unsuccessful attempt to directly modify
the IOPM base address. This code would appear in a kernel-mode device driver.
Refer to the GIVEIO.C listing for a complete device driver example.
******************************************************************************/
/* Make sure our structure is packed properly, on byte boundary, not
 * on the default doubleword boundary. */
#pragma pack(push,1)
/* Structure of a GDT (global descriptor table) entry; from the processor
 * manual. */
typedef struct {
 unsigned limit : 16;
 unsigned baselo : 16;
 unsigned basemid : 8;
 unsigned type : 4;
 unsigned system : 1;
 unsigned dpl : 2;
 unsigned present : 1;
 unsigned limithi : 4;
 unsigned available : 1;
 unsigned zero : 1;
 unsigned size : 1;
 unsigned granularity : 1;
 unsigned basehi : 8;
} GDTENT;
/* Structure of the 48 bits of the GDT register that are stored
 * by the SGDT instruction. */
typedef struct {
 unsigned short limit;
 GDTENT *base;
} GDTREG;
#pragma pack(pop)
/* This code demonstrates the brute force approach to modifying the IOPM base.
 * The IOPM base is stored as a two byte integer at offset 0x66 within the TSS,
 * as documented in the processor manual. In Windows NT, the IOPM is stored 
 * within the TSS starting at offset 0x88, and going for 0x2004 bytes. This is
 * not documented anywhere, and was determined by inspection. The code here 
 * puts some 0's into the IOPM so that we can try to access some I/O ports, 
 * then modifies the IOPM base address. This code is unsuccessful because NT 
 * overwrites the IOPM base on each process switch. */
void GiveIO()
{
 GDTREG gdtreg;
 GDTENT *g;
 short TaskSeg;
 char *TSSbase;
 int i;
 _asm str TaskSeg; // get the TSS selector
 _asm sgdt gdtreg; // get the GDT address
 g = gdtreg.base + (TaskSeg >> 3); // get the TSS descriptor
 // get the TSS address
 TSSbase = (PVOID)(g->baselo | (g->basemid << 16) |
 (g->basehi << 24));
 for(i=0; i < 16; ++i) // poke some 0's into the
 TSSbase[0x88 + i] = 0; // IOPM
 *((USHORT *)(TSSbase + 0x66)) = 0x88;
}

Listing Three
/* From inspection of the TSS we know that NT's default IOPM offset is 0x20AD.
 * From an inspection of a dump of a process structure, we can find the bytes 
 * 'AD 20' at offset 0x30. This is where NT stores the IOPM offset for each 
 * process, so that I/O access can be granted on a process-by-process basis. 
 * This portion of the process structure is not documented in the DDK.
 * This kernel mode driver fragment illustrates the brute force
 * method of poking the IOPM base into the process structure. */
void GiveIO()
{
 char *CurProc;
 CurProc = (char *)IoGetCurrentProcess();
 *((USHORT *)(CurProc + 0x30)) = 0x88;
}

Listing Four
/*********************************************************************
GIVEIO.SYS -- by Dale Roberts
Compile: Use DDK BUILD facility
Purpose: Give direct port I/O access to a user mode process.
*********************************************************************/
#include <ntddk.h>
#include <mondebug.h>

/* The name of our device driver. */
#define DEVICE_NAME_STRING L"giveio"
/* This is the "structure" of the IOPM. It is just a simple character array 
 * of length 0x2000. This holds 8K * 8 bits -> 64K bits of the IOPM, which 
 * maps the entire 64K I/O space of the x86 processor. Any 0 bits will give
 * access to the corresponding port for user mode processes. Any 1
 * bits will disallow I/O access to the corresponding port. */
#define IOPM_SIZE 0x2000
typedef UCHAR IOPM[IOPM_SIZE];
/* This will hold simply an array of 0's which will be copied into our actual 
 * IOPM in the TSS by Ke386SetIoAccessMap(). The memory is allocated at 
 * driver load time. */
IOPM *IOPM_local = 0;
/* These are the two undocumented calls that we will use to give the calling 
 * process I/O access. Ke386SetIoAccessMap() copies the passed map to the TSS.
 * Ke386IoSetAccessProcess() adjusts the IOPM offset pointer so that the newly
 * copied map is actually used. Otherwise, the IOPM offset points beyond the 
 * end of the TSS segment limit, causing any I/O access by the user-mode 
 * process to generate an exception. */
void Ke386SetIoAccessMap(int, IOPM *);
void Ke386QueryIoAccessMap(int, IOPM *);
void Ke386IoSetAccessProcess(PEPROCESS, int);
/***** Release any allocated objects. ******/
VOID GiveioUnload(IN PDRIVER_OBJECT DriverObject)
{
 WCHAR DOSNameBuffer[] = L"\\DosDevices\\" DEVICE_NAME_STRING;
 UNICODE_STRING uniDOSString;
 if(IOPM_local)
 MmFreeNonCachedMemory(IOPM_local, sizeof(IOPM));
 RtlInitUnicodeString(&uniDOSString, DOSNameBuffer);
 IoDeleteSymbolicLink (&uniDOSString);
 IoDeleteDevice(DriverObject->DeviceObject);
}
/*****************************************************************************
 Set the IOPM (I/O permission map) of the calling process so that it is given
full I/O access. Our IOPM_local[] array is all zeros, so IOPM will be all 0s.
If OnFlag is 1, process is given I/O access. If it is 0, access is removed.
******************************************************************************/
VOID SetIOPermissionMap(int OnFlag)
{
 Ke386IoSetAccessProcess(PsGetCurrentProcess(), OnFlag);
 Ke386SetIoAccessMap(1, IOPM_local);
}
void GiveIO(void)
{
 SetIOPermissionMap(1);
}
/******************************************************************************
Service handler for a CreateFile() user mode call. This routine is entered in
the driver object function call table by DriverEntry(). When the user-mode 
application calls CreateFile(), this routine gets called while still in the 
context of the user-mode application, but with the CPL (the processor's
Current Privilege Level) set to 0. This allows us to do kernel-mode operations.
GiveIO() is called to give the calling process I/O access. All the user-mode 
app needs do to obtain I/O access is open this device with CreateFile(). No
other operations are required.
*********************************************************************/
NTSTATUS GiveioCreateDispatch(
 IN PDEVICE_OBJECT DeviceObject,
 IN PIRP Irp
 )
{
 GiveIO(); // give the calling process I/O access
 Irp->IoStatus.Information = 0;
 Irp->IoStatus.Status = STATUS_SUCCESS;
 IoCompleteRequest(Irp, IO_NO_INCREMENT);
 return STATUS_SUCCESS;
}
/*****************************************************************************
Driver Entry routine. This routine is called only once after the driver is 
initially loaded into memory. It allocates everything necessary for the 
driver's operation. In our case, it allocates memory for our IOPM array, and 
creates a device which user-mode applications can open. It also creates a 
symbolic link to the device driver. This allows a user-mode application to 
access our driver using the \\.\giveio notation.
******************************************************************************/
NTSTATUS DriverEntry(
 IN PDRIVER_OBJECT DriverObject,
 IN PUNICODE_STRING RegistryPath
 )
{
 PDEVICE_OBJECT deviceObject;
 NTSTATUS status;
 WCHAR NameBuffer[] = L"\\Device\\" DEVICE_NAME_STRING;
 WCHAR DOSNameBuffer[] = L"\\DosDevices\\" DEVICE_NAME_STRING;
 UNICODE_STRING uniNameString, uniDOSString;
 // Allocate a buffer for the local IOPM and zero it.
 IOPM_local = MmAllocateNonCachedMemory(sizeof(IOPM));
 if(IOPM_local == 0)
 return STATUS_INSUFFICIENT_RESOURCES;
 RtlZeroMemory(IOPM_local, sizeof(IOPM));
 // Set up device driver name and device object.
 RtlInitUnicodeString(&uniNameString, NameBuffer);
 RtlInitUnicodeString(&uniDOSString, DOSNameBuffer);
 status = IoCreateDevice(DriverObject, 0, &uniNameString,
 FILE_DEVICE_UNKNOWN, 0, FALSE, &deviceObject);
 if(!NT_SUCCESS(status))
 return status;
 status = IoCreateSymbolicLink (&uniDOSString, &uniNameString);
 if (!NT_SUCCESS(status))
 return status;
 // Initialize the Driver Object with driver's entry points.
 // All we require are the Create and Unload operations.
 DriverObject->MajorFunction[IRP_MJ_CREATE] = GiveioCreateDispatch;
 DriverObject->DriverUnload = GiveioUnload;
 return STATUS_SUCCESS;
}

Listing Five
/*********************************************************************
TSTIO.EXE -- by Dale Roberts
Compile: cl -DWIN32 tstio.c
Purpose: Test the GIVEIO device driver by doing some direct
 port I/O. We access the PC's internal speaker.
*********************************************************************/
#include <stdio.h>
#include <windows.h>
#include <math.h>

#include <conio.h>
typedef struct {
 short int pitch;
 short int duration;
} NOTE;
/* Table of notes. Given in half steps. Communication from "other side." */
NOTE notes[] = {{14, 500}, {16, 500}, {12, 500}, {0, 500}, {7, 1000}};
/***** Set PC's speaker frequency in Hz. The speaker is controlled by an
 ***** Intel 8253/8254 timer at I/O port addresses 0x40-0x43. *****/
void setfreq(int hz)
{
 hz = 1193180 / hz; // clocked at 1.19MHz
 _outp(0x43, 0xb6); // timer 2, square wave
 _outp(0x42, hz);
 _outp(0x42, hz >> 8);
}
/*********************************************************************
Pass a note, in half steps relative to 400 Hz. The 12 step scale is an 
exponential thing. Speaker control is at port 0x61. Setting lowest two bits 
enables timer 2 of the 8253/8254 timer and turns on the speaker.
*********************************************************************/
void playnote(NOTE note)
{
 _outp(0x61, _inp(0x61) | 0x03); // start speaker going
 setfreq((int)(400 * pow(2, note.pitch / 12.0)));
 Sleep(note.duration);
 _outp(0x61, _inp(0x61) & ~0x03); // stop that racket!
}
/*********************************************************************
 Open and close the GIVEIO device. This should give us direct I/O
access. Then try it out by playin' our tune.
*********************************************************************/
int main()
{
 int i;
 HANDLE h;
 h = CreateFile("\\\\.\\giveio", GENERIC_READ, 0, NULL,
 OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
 if(h == INVALID_HANDLE_VALUE) {
 printf("Couldn't access giveio device\n");
 return -1;
 }
 CloseHandle(h);
 for(i=0; i < sizeof(notes)/sizeof(NOTE); ++i)
 playnote(notes[i]);
 return 0;
}
















Pipes for Macintosh 


Pipes aren't just for UNIX anymore




Kristiaan Coppieters


Kris moved from Belgium to New Zealand in September 1995. Currently, he
manages the service department of CMC, a Wellington-based Apple Centre. He can
be reached at 100025.2724@compuserve.com.


After switching back and forth between Macintosh and UNIX systems, it occurred
to me that the filter-and-pipe principle--as implemented on UNIX, MS-DOS, and
OS/2--could exist on the Macintosh too. Consequently, I implemented pipes for
the Macintosh, analogous to UNIX pipes.
UNIX filters are relatively easy to create and anyone with a basic knowledge
of C can write one. I wanted my Macintosh filters to be just as easy to
create, so I developed a C program shell for a Macintosh filter. This shell
makes it easy for anyone with a basic understanding of C to develop new
filters.
My Macintosh pipes are visual and are created by arranging icons in a Finder
window. Because a Macintosh pipe is an arrangement of icons, it is more
persistent than a UNIX pipe--once created, you can use the pipe multiple times
without retyping. My pipes are pure Mac applications: No extensions are made
to the basic Macintosh OS, although you do need System 7.


The Basic Behavior


You use a filter by dropping a text file on the filter's icon. The filter
reads the dropped file, presents a standard file save dialog, and writes the
processed results to the output file. The resulting file is a text file that
can be opened from any suitable application. The sample filters include
converters between Macintosh and MS-DOS file formats and a sort filter that
sorts the lines in a text file.
When you arrange two or more filter icons visually on a horizontal row in a
folder, you are constructing a pipe. (Use the command key to make icons
"stick" on grid positions while moving icons.) If you drop a document on the
left-most icon of this pipe, data is sent from left to right through the
consecutive filters. The data hops from icon to icon, until it reaches the
right-most filter of the pipe, which saves the result in a file. After
processing the dropped document, all filters in the pipe automatically quit.
You can change the behavior of some filters by changing their names. For
example, if you rename the sort filter to "sort column 3-6", it will sort the
incoming data on character columns 3 to 6 instead of sorting on complete
lines. These application-name options are analogous to UNIX command-line
options.
When you double-click the left-most icon of a horizontal row of one or more
filter icons, all filters in the pipe start up and remain idle. I call this a
"server pipe." If one or more documents are dropped on the left-most filter
icon of the active pipe, they are each processed as previously described.
After processing, the complete pipe remains idle and does not quit. By
choosing File-Quit in the menu of the left-most filter, all filters in the
pipe will quit. Currently, there is no need for a server pipe, but they could
be used to exchange data among multiple Macintoshes.


Developing Your Own Filters


The program shell handles all the details of dealing with the Macintosh
operating system. You only need to develop a routine called ProcessChar(int
c), which is called once for each character in the file. When there is no more
data, ProcessChar is called an additional time with the pseudocharacter cEOF
(equal to -1). Whenever it has data to send further down the pipe, it can call
PutChar(int c) or PutStr(char *s). The easiest way to develop a new filter is
to copy and modify the template file copy.c; see Listing One.
The ProcessChar function in simple filters accepts a character, modifies it,
and immediately passes it along. More complex filters--like sort--receive and
store the complete file before processing it and sending it further down the
pipe.


Handling Options


When a filter starts, it parses its own file name. If you want to handle
parameters in the application name, you need to expand a few functions to
handle the details. The sort filter in Listing Two is a good example of this
type of processing.
The AcceptAppNam(char *appNam) function is called with the first word of the
filter's name. This allows a filter to change function depending on its name.
For example, if the icon is called "Sort Descending Field 2-3,"
AcceptAppNam("sort") is called.
The ParamNeedsValue(char *param) function is called for each word that follows
the program name. It returns True if that word is followed by an argument. In
the previous example, ParamNeedsValue("descending") would return False and
ParamNeedsValue("field") would return True. If False is returned, the engine
passes the parameter to AcceptParameter(char *param), which can set
appropriate global state variables. Otherwise, it grabs the next word and
treats it as an argument to this parameter.
The argument parsing is geared to handling numeric ranges, such as "column
2-4,7". The engine parses the word following the parameter and calls
AcceptParameterFromTo(char *param, char *from, char *to) for each range, and
AcceptParameterValue(char *param, char *value) for each lone value. In this
example, there would be calls to AcceptParameterFromTo("column","2","4") and
AcceptParameterValue("column","7"). You can redefine these functions to
control how parameters are handled. Use atoi() if you need to convert strings
into integers.
You may need to use global variables in your filter module, either to store
state information from the application name parameters, or to store the data
you receive before processing it. For example, the sort filter uses global and
dynamic data storage extensively.


Overview of the Engine


Each filter consists of two C source files: pipe.c contains the engine, and
another file contains ProcessChar() and other filter-specific functions.
Filter-specific files for sort and copy are shown in Listings One and Two,
respectively. The remaining source code is available electronically; see
"Availability," page 3.
At startup, the routine InitPipe is called. The filter scans through all files
in the folder where it is located, looking for application icons that are at
the same horizontal level and to its right. If it finds more than one, it
selects the one that is closest. It then prepares to handle Apple Events.
If an application was found, the filter opens a PPC (Program-to-Program
communication) port using the Macintosh PPC Toolbox. A PPC port is identified
by two strings, the type and name. Here, the name is chosen to be a random
string, and the type is the name of the application whose icon it found to its
right. After opening the port, the application is ready to be launched.
The filter now tries to open a PPC connection established by a previous
filter. The PPC connection's type is the same as this filter's name. This PPC
call, like most PPC calls in pipe.c, is executed asynchronously so the filter
will not stall if one of the calls is unsuccessful.
After InitPipe, GetParameters is called. The filter then parses its own file
name, and calls the corresponding routines in the filter module.
After the necessary preparations, the filter enters its main event loop, where
it regularly checks for incoming PPC connections (from an application to its
right) or for Apple Events (from documents dropped on the filter).
Internally, there are data structures that implement two state machines: one
for the incoming connection (gInConnection) and one for the outgoing
connection (gOutConnection). Depending on the events that occur, these
machines cycle through a set of states allowing them to process incoming and
outgoing data.
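The state-machine idea can be illustrated with a small sketch. The state names and single-step driver below are my own invention, not the actual gInConnection/gOutConnection implementation, but they show the pattern: each pass through the event loop advances a connection only when its pending asynchronous call has completed, so nothing ever blocks.

```c
#include <assert.h>

/* Illustrative state machine for one connection; the states and
 * transitions are hypothetical, not copied from pipe.c. */
typedef enum {
    csIdle,       /* nothing happening yet */
    csOpening,    /* asynchronous PPC open in progress */
    csTransfer,   /* reading or writing data */
    csClosing,    /* end of data seen, shutting down */
    csDone
} TConnState;

typedef struct {
    TConnState state;
} TConnection;

/* One poll from the main event loop. `completed` is nonzero when the
 * connection's pending asynchronous PPC call has finished; if it has
 * not, the filter simply polls again later instead of stalling. */
void StepConnection(TConnection *c, int completed)
{
    if (!completed)
        return;
    switch (c->state) {
        case csIdle:     c->state = csOpening;  break;
        case csOpening:  c->state = csTransfer; break;
        case csTransfer: c->state = csClosing;  break;
        case csClosing:  c->state = csDone;     break;
        case csDone:     break;
    }
}
```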
If a file is being processed, and there is not another filter application to
the right, the filter brings up a standard file-save dialog. Before doing so,
the filter checks to see if it is running in the foreground or in the
background. If it is running in the background, it uses the Notification
Manager to notify the user and waits until it is put in the foreground.



Example Filters


I've included a variety of simple filters in the source (available
electronically). The copy filter is primarily a template for developing new
filters. The Toupper and Tolower filters convert all letters in a file to
uppercase or lowercase.
The Mac2dos and Dos2mac filters convert text files between Macintosh and DOS
format. They change the end-of-line characters and also convert most upper
ASCII characters, including accented letters. Because many DOS files are not
recognized as text files, I included the file type 'bina' in the bundle resource
for the filters in pipe.r, so even untyped DOS files should be accepted by the
Dos2mac filter. 
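The end-of-line half of that conversion is easy to sketch against the ProcessChar/PutChar interface from Listing One. The version below is illustrative only (the real filter also remaps upper-ASCII accented characters, which is omitted here), and it supplies a stand-in PutChar so the sketch is self-contained.

```c
#include <assert.h>
#include <string.h>

#define cEOF (-1)

/* Stand-in for the engine's PutChar, collecting output for inspection. */
static char gOut[64];
static int  gOutLen = 0;
static void PutChar(int chr)
{
    if (chr != cEOF && gOutLen < (int)sizeof(gOut))
        gOut[gOutLen++] = (char)chr;
}

/* Illustrative Mac-to-DOS end-of-line conversion: Mac text ends lines
 * with CR (0x0D), DOS with CR/LF (0x0D 0x0A). Accent remapping and the
 * reverse (Dos2mac) direction are omitted from this sketch. */
void ProcessChar(int chr)
{
    if (chr == 0x0D) {          /* Mac end of line */
        PutChar(0x0D);
        PutChar(0x0A);
    } else {
        PutChar(chr);           /* PutChar ignores cEOF, as in Listing One */
    }
}
```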
The Datascope filter does no processing, but simply shows the data that passes
through it in a window. This allows you to peek at the data as it is being
processed. The Unique filter removes consecutive duplicate lines from the
input.
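As a second example of a filter module, here is one way the Unique behavior could be written against the same interface. The buffer sizes and the stand-in PutChar are my own choices for the sketch; the electronically available source may differ.

```c
#include <assert.h>
#include <string.h>

#define cEOF     (-1)
#define cMaxLine 256

/* Stand-in for the engine's PutChar, collecting output for inspection. */
static char gOut[1024];
static int  gOutLen = 0;
static void PutChar(int chr)
{
    if (chr != cEOF && gOutLen < (int)sizeof(gOut))
        gOut[gOutLen++] = (char)chr;
}

/* Illustrative Unique filter: emit a line only when it differs from the
 * line immediately before it. */
static char gCur[cMaxLine + 1];
static char gPrev[cMaxLine + 1];
static int  gCurLen = 0;
static int  gHavePrev = 0;

void ProcessChar(int chr)
{
    if (chr == '\n' || chr == cEOF) {
        gCur[gCurLen] = '\0';
        if (!gHavePrev || strcmp(gCur, gPrev) != 0) {
            int i;
            for (i = 0; i < gCurLen; i++)
                PutChar(gCur[i]);
            if (chr == '\n')
                PutChar('\n');
            strcpy(gPrev, gCur);
            gHavePrev = 1;
        }
        gCurLen = 0;            /* start collecting the next line */
    } else if (gCurLen < cMaxLine) {
        gCur[gCurLen++] = chr;
    }
}
```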
The Sort filter, which sorts its input by line, is the most complex of those
I've described here. Sort supports several parameters. If you add the
"descending" (or "desc" or "d") parameter, Sort sorts in descending order
instead of the default ascending order. To sort on a specific range of
characters, add a "column" option such as "column 3-10" to sort on character
positions 3 to 10. If the input consists of tabbed text, you can use the
"field" option. "Sort field 2-3" sorts on fields 2 and 3. Tabbed text files
are created by most spreadsheet and database programs when you export to text
format. Because Sort stores the complete file in memory while sorting, it
needs a fairly large application size, unlike most of the other filters, which
can live with 100K or less. Sort does not recognize comma-separated
parameters, but it could easily be extended to do so.


Compiling the Examples


The example filters can be compiled with either Symantec C++ 7.0 or MPW. An
MPW makefile (pipe.make) and an MPW script (makeall) are included for building
the example filters. MPW is handy for compiling a set of filters with a
single command; with the Symantec IDE, you must compile each filter
separately.
Because the console emulator of Symantec C++ suited my needs better than the
corresponding MPW SIOW (Simple Input Output Window) library for showing data
being processed, the Datascope filter must be compiled using Symantec C++ 7.0.
It needs a resource file datascope.π.rsrc, which is just a compiled form of
the resource source file pipe.r. If you modify pipe.r, you can use the
pipe.make makefile to recreate datascope.π.rsrc. To remove some MPW-specific
lines from the source, I added the line #define __THINKC__ 1 to the
Edit-Options-Think C-Prefix settings in the project datascope.π.
While compiling with MPW, you will see a few warnings about parameters being
unused. These warnings are harmless.


Future Plans


I plan to extend this filter package in several ways. One improvement will be
to rewrite pipe.c to remove a few old Toolbox calls and compile PowerPC-native
filters. I would also like to improve the handling of parameters. In
particular, the 31-character limit on file names is too short for some
applications. It would be nice to store parameters in a separate file and just
specify the file name in the application name, such as "sort @settings" to
instruct Sort to read the settings text file for more parameters.
I also have many ideas for new filters:
Monitor will monitor a folder for dropped documents. Each document that is
dropped in the folder will be sent through the pipe. This allows you to create
server pipes, so you can just drop data in a folder to process it.
Serial will send the data it receives to one of the serial ports, where you
can connect a printer, a plotter, or another peripheral. This will allow you
to create serial spoolers.
Transmit and Receive will be a pair of filters that allow you to send data
over the network from one Macintosh to another for further processing. A
system could be set up where a single transmitter could send to multiple
receivers, distributing a workload over several Macintoshes.
Transmit TCP and Receive TCP. These will do the same as the aforementioned
items, but to and from a UNIX machine. They will allow you to connect UNIX
pipes to Macintosh pipes and vice versa.
There are also many UNIX filters that could be implemented using this
framework. If you would like to use my programs in your own projects, you are
free to do so. I do ask, however, that my name and involvement be mentioned on
startup screens and in manuals.

Listing One
/* copy.c -- The generic version of processchar that should be used as 
* a base for new pipes. Contains the routine ProcessChar(int chr). 
* ProcessChar should call PutChar(chr) after or while processing incoming 
* characters. At the end of the file, there is an extra call to ProcessChar 
* with chr == cEOF to signal EOF. chr is an unsigned char in the range 0 to 255 
* when calling ProcessChar with a valid character.
* There is no harm done by calling PutChar(chr) with chr == cEOF.
*/
#ifndef _processchar_
#include "processchar.h"
#endif
/****************************************************************************
* Empty filter. Use as a starter for new filters
*/
void ProcessChar(int chr)
 {
 if (chr != cEOF)
 {
 PutChar(chr);
 } 
 } 
/* ************************************************************************ */
/* This routine will be called at startup with the name of the application
* You can set some of your global variables to influence ProcessChar.
* All characters are converted to lowercase before the call.
*/
void AcceptAppNam(char *appNam)
 {
 }

/* ************************************************************************ */
/* This routine is called for each command-line parameter encountered. You 
* should test the parameters and return TRUE if the parameter needs a value, 
* FALSE if not. All characters are converted to lowercase before the call.
* Example: sort ascending column 3-5
* ParamNeedsValue("ascending") should return FALSE
* ParamNeedsValue("column") should return TRUE
* You can use if (strcmp(param,"name") == 0) for comparisons...
*/
int ParamNeedsValue(char *param)
 {
 return(FALSE);
 }
/* ************************************************************************ */
/* Called for each value-less command-line parameter encountered.
* All characters are converted to lowercase before the call.
* Example: sort ascending column 3-5
* AcceptParameter("ascending") is called
* You can use if (strcmp(param,"name") == 0) for comparisons...
*/
void AcceptParameter(char *param)
 {
 }
/* *************************************************************************
*/
/* Called for a single-valued command-line parameter. All characters are 
* converted to lowercase before the call.
* Example: sort ascending field 3,4
* AcceptParameterValue("field","3") and AcceptParameterValue("field","4") 
* are called.
* You can use if (strcmp(param,"name") == 0) for comparisons...
* Use atoi() for converting numeric strings into integers.
*/
void AcceptParameterValue(char *param,char *value)
 {
 AcceptParameterFromTo(param,value,value);
 }
/* *************************************************************************
*/
/* This routine is called for a two-valued command-line parameter.
* All characters are converted to lowercase before the call.
* Example: sort ascending field 3-4,5-7
* AcceptParameterFromTo("field","3","4") and 
* AcceptParameterFromTo("field","5","7") are called
* You can use if (strcmp(param,"name") == 0) for comparisons...
* Use atoi() for converting numeric strings into integers.
*/
void AcceptParameterFromTo(char *param,char *valueFrom,char *valueTo)
 {
 }

Listing Two
/* sort.c -- Store input file in unbalanced binary tree. Sort criterion can 
* either be complete lines, range of character columns, or range of tabbed 
* text fields. 
*/
#ifndef _processchar_
#include "processchar.h"
#endif
/* *************************************************************************
* SORT filter

*/
typedef struct SLine
 {
 char *line;
 int from;
 int to;
 struct SLine *pSmaller;
 struct SLine *pGreater;
 } TLine;
typedef TLine *TpLine;
/* Some globals */
TpLine gpTreeLine = NULL;
char gLine[cMaxStrLen+1];
char *gpLine = gLine;
int gLen = 0;
int gFromPos = 0;
int gToPos = 0;
int gFieldCount = 1;
/* Parameters */
int gAscending = TRUE;
int gCharFromPos = 1;
int gCharToPos = cMaxStrLen;
int gFromField = 1;
int gToField = cMaxStrLen;
void AddLine(char *line, int len, int from, int to)
 {
 TpLine pLine;
 TpLine pPrvLine;
 int direction;
 pLine = gpTreeLine;
 pPrvLine = NULL;
 direction = 0;
 while (pLine != NULL)
 {
 char *pStr1;
 char *pStr2;
 char c1, c2;
 pPrvLine = pLine;
 pStr1 = pLine->line + pLine->from - 1;
 c1 = *(pLine->line + pLine->to);
 *(pLine->line + pLine->to) = '\0';
 pStr2 = line + from - 1;
 c2 = *(line + to);
 *(line + to) = '\0';
 direction = strcmp(pStr1,pStr2);
 *(pLine->line + pLine->to) = c1;
 *(line + to) = c2;
 if (direction > 0)
 pLine = pLine->pSmaller;
 else
 pLine = pLine->pGreater;
 }
 pLine = (TpLine) NewPtr(sizeof(TLine));
 if (pLine != NULL)
 {
 pLine->pSmaller = NULL;
 pLine->pGreater = NULL;
 pLine->from = from;
 pLine->to = to;

 pLine->line = (TpChr) NewPtr(len+1);
 if (pLine->line != NULL)
 strcpy(pLine->line,line);
 else
 {
 DisposePtr((Ptr) pLine);
 pLine = NULL;
 }
 }
 if (pLine != NULL)
 { 
 if (pPrvLine == NULL)
 gpTreeLine = pLine;
 else
 {
 if (direction > 0)
 pPrvLine->pSmaller = pLine;
 else
 pPrvLine->pGreater = pLine;
 }
 } 
 }
void DumpTree(TpLine pLine)
 {
 if (pLine != NULL)
 {
 if (gAscending)
 DumpTree(pLine->pSmaller);
 else
 DumpTree(pLine->pGreater);
 PutStr(pLine->line);
 PutChar('\n');
 if (gAscending)
 DumpTree(pLine->pGreater);
 else
 DumpTree(pLine->pSmaller);
 DisposePtr((Ptr) pLine->line);
 DisposePtr((Ptr) pLine);
 }
 }
void ProcessChar(int chr)
 {
 if (chr == '\n' || (chr == cEOF && gLen > 0))
 {
 *gpLine = '\0';
 if (gFromPos == 0) 
 {
 gToPos = 0;
 gFromPos = 1;
 }
 AddLine(gLine,gLen,gFromPos,gToPos);
 gpLine = gLine;
 gLen = 0;
 gFromPos = 0;
 gToPos = 0;
 gFieldCount = 1;
 }
 else if (chr != cEOF)
 {

 if (gLen < cMaxStrLen)
 *(gpLine++) = chr;
 gLen++;
 if (gFromPos == 0)
 {
 if (gFromField <= gFieldCount && gCharFromPos <= gLen)
 gFromPos = gLen;
 }
 if (gFieldCount <= gToField && gLen <= gCharToPos)
 gToPos = gLen;
 if (chr == '\t')
 gFieldCount++;
 }
 if (chr == cEOF)
 {
 DumpTree(gpTreeLine);
 gpTreeLine = NULL;
 }
 } 
/* *************************************************************************
*/
/* This routine will be called at startup with the name of the application
* You can set some of your global variables to influence ProcessChar.
* All characters are converted to lowercase before the call.
*/
void AcceptAppNam(TpChr appNam)
 {
 }
/* *************************************************************************
*/
/* Called for each command line parameter encountered. You should
* test parameters and return TRUE if parameter needs a value, FALSE if not.
* All characters are converted to lowercase before the call.
* Example: sort ascending column 3-5
* ParamNeedsValue("ascending") should return FALSE
* ParamNeedsValue("column") should return TRUE
*/
int ParamNeedsValue(char *param)
 {
 return(strcmp(param,"column") == 0 || strcmp(param,"field") == 0 ||
 strcmp(param,"col") == 0 || strcmp(param,"fld") == 0 ||
 strcmp(param,"c") == 0 || strcmp(param,"f") == 0);
 }
/* *************************************************************************
*/
/* Called for each value-less command-line parameter encountered.
* All characters are converted to lowercase before the call.
* Example: sort ascending column 3-5
* AcceptParameter("ascending") is called
*/
void AcceptParameter(char *param)
 {
 if (strcmp(param,"ascending") == 0 ||
 strcmp(param,"asc") == 0 ||
 strcmp(param,"a") == 0)
 gAscending = TRUE;
 else if (strcmp(param,"descending") == 0 ||
 strcmp(param,"desc") == 0 ||
 strcmp(param,"d") == 0)
 gAscending = FALSE;
 }
/* *************************************************************************
*/

/* This routine is called for a single valued command line parameter.
* All characters are converted to lowercase before the call.
* Example: sort ascending field 3
* AcceptParameterValue("field","3") is called
*/
void AcceptParameterValue(char *param,char *value)
 {
 AcceptParameterFromTo(param,value,value);
 }
/* *************************************************************************
*/
/* This routine is called for a two-valued command-line parameter.
* All characters are converted to lowercase before the call.
* Example: sort ascending field 3-4
* AcceptParameterFromTo("field","3","4") is called
*/
void AcceptParameterFromTo(char *param,char *valueFrom, char *valueTo)
 {
 if (strcmp(param,"field") == 0 ||
 strcmp(param,"fld") == 0 ||
 strcmp(param,"f") == 0)
 {
 gFromField = atoi(valueFrom);
 gToField = atoi(valueTo);
 }
 else if (strcmp(param,"column") == 0 ||
 strcmp(param,"col") == 0 ||
 strcmp(param,"c") == 0)
 {
 gCharFromPos = atoi(valueFrom);
 gCharToPos = atoi(valueTo);
 }
 }



Examining VxD Service Hooking


Monitoring, altering, or otherwise changing parts of Windows




Mark Russinovich and Bryce Cogswell


Mark is a developer at NuMega Technologies, and Bryce is a researcher in the
computer-science department at the University of Oregon. Mark can be reached
at markr@numega.com, and Bryce can be reached at cogswell@cs.uoregon.edu.


One reason DOS ruled the desktop is that programmers could easily augment and
even change its behavior. A popular way of doing this was to write a terminate
and stay resident (TSR) program that would intercept application requests for
various DOS services and modify the requests before passing them along to DOS,
or even service the requests directly. Thus, even without access to DOS source
code, anybody with a C compiler or assembler could create their own flavor of
DOS.
Windows 3.1 and Windows 95 also make their internal services visible to any
program written to take advantage of them. The heart of Windows consists of a
few dozen virtual devices (VxDs) that together create a substrate upon which
the more-familiar USER.EXE, KERNEL.EXE, and GDI.EXE run. Sometimes VxDs manage
the underlying hardware privately, but in most cases, VxDs export services
that other VxDs, including the Windows kernel VxD, VMM, can call. Windows is
designed so that a client requesting a service provided by a VxD is
dynamically connected with the service provider. In other words, a client VxD
can request a service from another VxD without knowing the location of that
service in memory.
Dynamic linking is achieved by calling services indirectly, through a jump
table associated with each VxD, analogous to the interrupt vector table (IVT)
used for DOS services. This dynamic-linking functionality allows VxDs to hook,
or intercept, the services of other VxDs, much the same as one interrupt
service routine (ISR) could be substituted for another under DOS. The hooking
VxD can examine the request's parameters, modify them, and even replace the
original service's functionality with a new one. This flexibility makes
Windows, like DOS in the old days, fully extensible and replaceable.
In this article, we'll describe the different types of service hooking and
their implementation under Windows 3.1 and Windows 95. We'll then present
VCMon, an application that uses service hooking to monitor some behavioral
aspects of the Windows 95 disk cache (VCACHE), such as cache hit and miss
rates, which cannot be seen by any other means.


Why Hook?


Service hooking falls into three categories: monitoring, altering, and
replacing.
Service monitoring. This is a passive form of hooking where the hooking
routine simply examines the input and output parameters before passing them to
the original service. For example, a serial-port monitoring application can
hook the Windows serial port VxD's (VCOMM) services and record all data sent
and received over a serial line by Windows programs. Monitoring hooks also
allow one to determine when certain system states have been entered, such as a
DOS box becoming a full-screen DOS session or memory resources falling below a
certain threshold.
Service altering. This is less common than monitoring because the clients'
assumptions about the service being performed must be carefully preserved. If
the parameters are altered so that the assumptions are violated, the system
may become unstable. One safe and useful application of parameter altering is
keyboard remapping: The VxD service that receives keyboard input is
hooked, and the hook routine modifies the appropriate scan codes before
passing them on to Windows. 
Service replacing. This, too, is rare because the hooking VxD usually must
take over the entire functionality of the VxD whose services it is hooking.
Assumption preservation applies here as well: The hooker must fully understand
the entire specification of the services being replaced. Service hooking has
recently been used in some memory-compression software, including some
best-selling Windows programs. These programs replace the services of the
Windows pagefile VxD with their own versions, which cache compressed paging
data in RAM for faster access.


How Service Hooking Works


All VxDs contain a Device Declaration Block (DDB), a data structure that
contains all the information required by Windows to integrate a VxD into the
Windows kernel; see Figure 1. This includes the device's name, unique numeric
identifier (assigned upon request by Microsoft), and initialization order
number, as well as the address of the device's control entry point and a
pointer to a list of additional entry points (known as the "service table") to
all the services the VxD exports. Windows uses the device identifier and the
service table to dynamically look up and call VxD services.
When programming in assembly, the programmer codes VxD requests to other VxD
services as an assembly-language macro (VxDCall or VMMCall). This expands to a
software interrupt 20h followed by a 4-byte constant in the VxD's assembly
code. The constant contains the device identifier of the VxD that provides the
requested service in the high word, and the service number, as determined by
the service's index in the service table, in the low word. For instance, a
call to the Windows PageFree service looks to you like Example 1(a), but is
translated into the code in Example 2(a). If programming in C, the programmer
codes a library call as shown in Example 1(b), which results in the invocation
of the previous code.
When the Windows kernel executes an interrupt 20h, it looks at the constant
following it and searches its table of registered VxDs for one with a matching
identifier. If it finds one, it indexes into the VxD's service table to find
the address of the requested service. For the sake of efficiency in future
executions of the code that generated the trap, it replaces the int
20h/constant combination with an indirect call instruction that jumps through
the target VxD's service table. Both forms require exactly six bytes of code,
so the replacement is a perfect fit; see Example 2.
The level of indirection through the service table makes service hooking
possible. Windows 3.1 introduced the basic form of service hooking using a VxD
service in the Windows kernel called Hook_Device_Service. Example 3 presents
the assembly language and equivalent C form (from the VtoolsD VxD C wrapper
library) for a call to this service. The GetVxDServiceOrdinal statement is a
macro that simply takes the name of a service, converts it into its VxD
identifier/service number pair, and stores it into the specified register. The
header files in the DDK define __service macros yielding ((device_id << 16) |
service_number) for all services exported by Windows.
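A quick C sketch makes the layout concrete. The macro names below are mine, not the DDK's, but the arithmetic follows the layout just described; with VMM's device identifier of 0001h and a service number of 55h, it yields the constant 10055h seen in Example 2.

```c
#include <assert.h>

/* Illustrative macros (names are mine, not the DDK's) for the
 * ((device_id << 16) | service_number) ordinal layout. */
#define MAKE_ORDINAL(dev, svc)  (((unsigned long)(dev) << 16) | (unsigned long)(svc))
#define ORDINAL_DEVICE(ord)     ((unsigned)((ord) >> 16))
#define ORDINAL_SERVICE(ord)    ((unsigned)((ord) & 0xFFFFUL))
```

The kernel performs the reverse split when it finds the constant after an int 20h: the high word selects the VxD, and the low word indexes its service table.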
When a service is hooked, Windows takes the new hook procedure's address,
places it in the service table, and returns the address originally contained
in the service table to the calling code. From that point on, the hook
procedure becomes the service handler (since all calls are done indirectly
through the table). Therefore, it is at the hook procedure's discretion to
call the routine whose address it received as a result of the hook action. By
the time a VxD hooks a service, other VxDs may have already done so, and the
returned address may be of another hook procedure rather than the original
service. Hooking a service creates a hook chain that starts at the service
table and can extend through hook routines all the way to the original
service, as in Figure 2. Example 4 is a Windows 3.1 version of an example hook
procedure.
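The mechanism is easy to model in plain C. The sketch below is not VxD code; it just mirrors the structure: services are called indirectly through a table, and hooking swaps in a new handler while handing the old address back to the hooker so it can chain.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C model of service hooking, not actual VxD code. */
typedef int (*Service)(int);

static Service gServiceTable[1];   /* one service, for the sketch */
static Service gPrevInChain;       /* the hooker's saved "previous" */

static int OriginalService(int x)
{
    return x + 1;                  /* stand-in for the real work */
}

static int HookProc(int x)
{
    /* Examine or modify the parameters, then chain to the previous
     * handler; calling it at all is at the hook's discretion. */
    return gPrevInChain(x * 2);
}

/* Model of Hook_Device_Service: install the new handler and return the
 * address previously stored in the table. */
static Service HookDeviceService(int index, Service newProc)
{
    Service prev = gServiceTable[index];
    gServiceTable[index] = newProc;
    return prev;
}
```

Because every call goes through gServiceTable, installing HookProc instantly reroutes all clients, and a second hooker would simply chain to HookProc the same way, extending the chain.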
Before Windows 95, once a VxD had hooked another VxD's services, it remained
on the service chain forever. There was no way to unhook a service because
Windows couldn't determine where the next hooker stored its
"previous-on-the-chain" address in order to restore it (the analogy to
unhooking an ISR under DOS should be apparent). To solve this problem, Windows
95 introduced a special hook-procedure definition that lets Windows determine
the location of the procedure next on the chain following the VxD. 
To take advantage of Windows 95's unhook capability, a hook procedure must be
defined as type HookProc (if a C wrapper library designed for Windows 95 is
being used, this is taken care of by the run-time library); see Example 5.
When this code is assembled, a special header at the front of the routine is
created for unhook support, as in Example 6. The first jump instruction and
the opcode of the following jump in the generated code form a 4-byte constant
(in effect, a signature), since the jump is relative. The Hook_Device_Service
routine checks for this constant. If it finds the constant,
Hook_Device_Service treats the procedure as a Windows 95-friendly hook
procedure. This allows it to store the address of the variable that is to
receive the previous hooker's address as the target of the second jump
instruction. These instructions are never executed; they only exist as part of
the header structure.
When a service is unhooked, Windows traverses the service chain down to the
procedure being removed by following the pointers in each hooker's header.
Windows splices the procedure out of the chain by moving the procedure's
PrevInChain pointer into the PrevInChain pointer of the hooker at which it pointed
or into the service-table entry. Unhooking is demonstrated in Figure 3, and
unhooking example code is shown in Example 7. In Figure 3, the middle hook
procedure is being removed from the service chain. The next hooker (to the
left) in the chain has its previous pointer, pictured with scissors cutting
it, modified to point at the same location as did the removed procedure. In
this case, that location is the original service.
If any service on the chain between the service-table entry and the hook
procedure being removed does not comply with the Windows 95 hook-procedure
definition, the unhook will fail and the VxD must support the hook procedure
until Windows exits.
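The splice itself is ordinary linked-list surgery. This sketch models each hooker as a node whose prev pointer leads toward the original service; UnhookService is my stand-in for what Windows 95 does when every node on the chain uses the cooperative header.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C model of unhooking: each hooker points at the handler it
 * chains to; removing one means patching the pointer that points at it. */
typedef struct Hook {
    struct Hook *prev;   /* next handler toward the original service */
    int          id;     /* just to tell the nodes apart */
} Hook;

/* *head plays the role of the service-table entry. Returns 1 if the
 * target was found and spliced out, 0 otherwise (the unhook fails). */
int UnhookService(Hook **head, Hook *target)
{
    Hook **link = head;
    while (*link != NULL) {
        if (*link == target) {
            *link = target->prev;   /* splice out, as in Figure 3 */
            return 1;
        }
        link = &(*link)->prev;
    }
    return 0;
}
```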


VCMon: Service Hooking in Action


VCMon is an application that demonstrates interesting aspects of service
hooking. (The entire VCMon system is available electronically; see
"Availability," page 3.) It is made up of a Win32 GUI and a VxD that together
monitor and display information about the behavior of VCACHE, the Windows 95
disk cache VxD. System Monitor (under "Start/Programs/Accessories/System
Tools" if you installed it during Windows 95 setup) can show the current size
of VCACHE, but nothing else, whereas VCMon is able to monitor much more by
utilizing service hooking; see Figure 4.
VCMon hooks several VCACHE services, including one that does a cache lookup to
see if a block associated with a designated disk buffer is present (a cache
hit) or not (a cache miss). The hook procedure for the routine converts
certain types of requests into two separate requests with modified parameters.
This allows VCMon to distinguish cache hits from misses. Converting one
request into two is an example of modifying a service to do more than was
originally intended. When VCMon determines the result of a request, it updates
a statistics data structure that is shared with the GUI.
Another service that is hooked is the service that Windows calls to see if
VCACHE wants to report any misses or hits. This information, in conjunction
with the current amount of free memory and page fault rate, is used by Windows
to determine if VCACHE should be told to shrink or grow. For informational
purposes, VCMon also hooks the pagefile VxD's read/write routine, which is
invoked whenever there is a page fault.
The VCMon GUI communicates with the VxD via DeviceIOControl calls where, at a
rate that can be specified through the Rate button on the GUI, the VxD wakes
up a thread in the GUI, telling it that new statistics are available in the
shared-statistics data structure. The GUI displays the statistics as
accumulated numbers in the dialog and, if a check box associated with an entry
is toggled, as a graph. The Reset button clears all the accumulated statistics
and sets all open graphs to start drawing at their left-hand sides.
While the VCMon VxD is very short, it provides information about Windows'
inner workings that is not otherwise available. With little additional effort,
VCMon can be extended to hook and display statistics about any VxD service.


Conclusion


VxD service hooking is not useful for everyone, just as DOS TSRs were not, but
the ability to monitor, alter, or change parts of Windows means that
enhancements are limited only by imagination and programming ability.

For more information: The Windows 95 DDK. See VMM.HLP\Virtual Machine
Manager\Miscellaneous Services for documentation on service hooking and
STDVXD.HLP\Virtual File Cache Services for information on VCACHE.
Figure 1: The DDB and service table.
Figure 2: A hook chain.
Figure 3: Unhooking.
Figure 4: Running the VCMon program.
Example 1: Programmer's view of a VxD call. (a) Assembly version; (b) C
version.
(a)
VMMcall _PageFree, <hMem, flags>

(b)
PageFree(ULONG hMem, ULONG flags);
Example 2: Generated code for a VxD call. (a) Before first execution; (b)
after first execution.
(a)
int 20h ; VxD service trap
dd 10055h ; hi word == VMM.VXD id, lo word == service #

(b)
call [VMMServiceTable+55h*4] ; indirect through table
Example 3: Hooking a service. (a) C version; (b) assembly version.
(a)
PrevInChain = Hook_Device_Service(
 GetVxDServiceOrdinal(Service),
 HookProc, &thunkHookProc);

(b)
GetVxDServiceOrdinal eax, Service
mov esi, OFFSET32 HookProc ; points to the hook
 ; procedure to install
VMMcall Hook_Device_Service 
mov PrevInChain, esi
Example 4: Windows 3.1 hook procedure.
PrevInChain dd ? ; address of previous service
BeginProc HookProcedure
 call [PrevInChain] ; chain to previous service
 ret
 EndProc HookProcedure
Example 5: Windows 95 hook procedure.
PrevInChain dd ?
BeginProc HookProcedure, Hook_Proc PrevInChain
 Call PrevInChain
 ret
EndProc HookProcedure
Example 6: Windows 95 generated hook-procedure header.
 jmp HookProcedure ; signature
 jmp [PrevInChain] ; previous-in-chain ptr
HookProcedure:
 ...
Example 7: Unhooking a service. (a) C version; (b) assembly version.
(a)
Unhook_Device_Service(
 GetVxDServiceOrdinal(Service),
 HookProc,
 &thunkHookProc );

(b)
GetVxDServiceOrdinal eax, Service
mov esi, offset32 HookProc
VMMCall Unhook_Device_Service




Building a DOS Serial Network


More functionality than DOS's Interlnk




Kyle York


Kyle is a programmer for TGV and can be contacted at noesis@ucscb.ucsc.edu.


MS-DOS 6's Interlnk program enables simple resource sharing by allowing you to
connect two PCs--a client and a server--via their serial or parallel ports. In
this way, you can use one PC to run programs and access data that resides on
another. Unfortunately, the server PC must be dedicated, cannot run Windows,
and does not allow remote access via modem. Consequently, I ended up
building my own client/server MS-DOS resource-sharing package that supports
remote logins, security, and other features. Only three pieces of software are
needed to build this utility--a serial port driver, basic networking, and an
MS-DOS communication layer--all of which are available in various forms. Once
I brought these pieces together, my job was relatively easy.
Still, the package (available electronically in both source and executable
form; see "Availability," page 3) is approximately 6000 lines of C and 3000
lines of assembly code. I wrote most of this package in C for readability.
However, certain functions are simply easier to write as assembly stubs.
Chaining to another interrupt is easily accomplished in
assembly, for instance, and is much faster than any C function. Also, the
CRC32 routine was written in assembly for speed: A 4.77-MHz 8088 can process
25K/sec in the assembly-language CRC routine. This was a five-fold increase
over my best effort in straight C. The code/algorithms were adapted from
information gleaned from the comp.compression FAQ. The original algorithm is
attributed to Rob Warnock (rpw3@sgi.com).
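The assembly routine itself is not reproduced here, but the algorithm is the standard table-driven CRC-32 (reflected polynomial 0EDB88320h) described in the comp.compression FAQ; a straightforward C rendering looks like this (my sketch, not the package's hand-tuned code).

```c
#include <assert.h>
#include <stddef.h>

/* Generic table-driven CRC-32 sketch (reflected polynomial 0xEDB88320),
 * not the package's assembly routine. CrcInit fills the 256-entry table
 * once; Crc32 then processes one table lookup per input byte. */
static unsigned long crcTable[256];

void CrcInit(void)
{
    unsigned long c;
    int i, k;
    for (i = 0; i < 256; i++) {
        c = (unsigned long)i;
        for (k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320UL ^ (c >> 1) : c >> 1;
        crcTable[i] = c;
    }
}

unsigned long Crc32(const unsigned char *buf, size_t len)
{
    unsigned long c = 0xFFFFFFFFUL;   /* standard pre-inversion */
    size_t i;
    for (i = 0; i < len; i++)
        c = crcTable[(c ^ buf[i]) & 0xFF] ^ ((c >> 8) & 0x00FFFFFFUL);
    return c ^ 0xFFFFFFFFUL;          /* standard post-inversion */
}
```

The table lookup replaces the eight-iteration inner loop with a single XOR and shift per byte, which is the same trade the assembly version exploits for speed.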


MS-DOS Internals


For all of its shortcomings, MS-DOS is not horribly laid out. Specifically,
all important data and structures are contained in an area of contiguous data,
roughly 2KB long, known as the "swappable data area" (SDA). This area contains
information such as the process ID of the currently running process, any
errors encountered, the current disk-transfer address, and so on. This layout
is good because all you need to do to change the context of MS-DOS is swap the
contents of this area for each active task, making MS-DOS reentrant and
multitasking. Luckily, only a handful of data is needed to successfully
multitask and run the installable file system. These are all noted in the
dosdata.c module of my package.
This SDA has remained basically unchanged since MS-DOS 3.2, with the exception
of MS-DOS 4.x, where Microsoft attempted to build multitasking into the MS-DOS
kernel. Because of bugs and incompatibilities, this kernel was abandoned with
the release of MS-DOS 5.0. This is only worth mentioning because my DOSRIFS
CLIENT module (and the multitasking version of the SERVER module) is not
compatible with MS-DOS 4.x.
Since MS-DOS was not designed to be reentrant or multitasking, there are a few
precautions you must take before swapping the SDA. Certain functions of the
BIOS must be protected; for instance, you don't want to interrupt a pending
disk operation. The file _server.asm provides all of the necessary checks to
keep MS-DOS and BIOS functioning correctly during a task switch.


MS-DOS Redirector Interface


The most difficult part of the project involved communicating with MS-DOS.
MS-DOS 3.1 included the network-redirector interface attached to Int 0x2f,
function 0x11. Although this is often referred to as the MSCDEX API and
associated with CD-ROMs, it is actually an installable file-system API that
can be used as a translation layer to allow access to any foreign file system.
Using the network redirector requires knowledge of many of the MS-DOS
internals: the list of lists (LOL), SDA (also known as the "MS-DOS data
segment"), current directory structure (CDS), and system file table (SFT). All
of the necessary information can be found in Ralf Brown's interrupt list
(available from all major MS-DOS FTP sites) and in the book Undocumented DOS,
by Andrew Schulman et al. (Addison-Wesley, 1993). The 22 installable file
system (IFS) functions are laid out in a seemingly random order. To simplify
things, I grouped them (see Table 1).
Also note that there is no single way to determine whether a particular call
should be intercepted. All loaded drivers in the chain receive all requests.
If a driver doesn't process a request, it is supposed to chain to the next
driver. The logical drives (A:, B:, C:, and so on) are processed by the last
driver in the chain. To determine that a call should be intercepted, you need
to follow two rules:
For maintenance commands, the MS-DOS CDS pointer will point to the drive for
which the maintenance is to be performed, so the driver need only check the
first three characters in the path name for x:\ where x is the drive you are
intercepting.
For file I/O commands, ES:DI points to the file information in the system file
table. The device on which the file was opened is in the low 6 bits of the
dev_info field, numbered 1...26.
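The two rules reduce to simple comparisons. The helpers below are a hedged C sketch (the names are mine, not the driver's): one checks a fully qualified path against the intercepted drive letter, the other checks the drive number packed into the low 6 bits of an SFT dev_info value.

```c
#include <ctype.h>

enum { DEV_DRIVE_MASK = 0x3f };   /* low 6 bits of the SFT dev_info field */

/* Rule 1: maintenance commands -- does the fully qualified path begin
   with "x:\" for the drive we are intercepting? */
int pathIsOurs(const char *path, char ourDrive)
{
    return path[0] != '\0' && path[1] == ':' && path[2] == '\\'
        && toupper((unsigned char)path[0]) == toupper((unsigned char)ourDrive);
}

/* Rule 2: file-I/O commands -- does the SFT entry's drive number
   (1 = A:, 2 = B:, ... 26 = Z:) match ours? */
int sftIsOurs(unsigned devInfo, int ourDriveNum)
{
    return (int)(devInfo & DEV_DRIVE_MASK) == ourDriveNum;
}
```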


Client Module


All IFS dispatching occurs in clientDispatch. The functions in _dispatchTable
are laid out based on the subfunction number. The table contains the address
of the function that handles this request, the type of check necessary to
determine if the request is for a RIFS device, and the RIFS function number
(for the server).
If a function does not exist in the table (Null entry), or the call is not
directed to a RIFS device, the driver restores the state of all the registers
and chains to the next driver.
To conserve memory, each packet is retrieved from a common packet queue. If a
packet cannot be found within 30 seconds, a General Failure error is returned.
If the packet is found, the necessary fields are initialized, and the packet
is sent to the correct function for further processing.
Each function packages the required parameters (Table 2), transmits the
request, and waits for a reply. To enable retrying a command, the client
module maintains two active packets: one send and one receive.
The most complex functions are file read and file write, because the length of
the requested operation can exceed the packet size. To circumvent this, I
simply break all requests into 1024-byte (or smaller) pieces and process them
one at a time.
Finally, any active packets are released, and the correct status information
is returned to MS-DOS.
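The splitting arithmetic is straightforward. The sketch below (hypothetical helper names, not from the DOSRIFS source) shows how a request larger than the packet payload breaks into 1024-byte pieces plus a short tail.

```c
#define MAX_CHUNK 1024   /* largest payload carried per packet */

/* Number of packets a transfer of `total` bytes requires. */
unsigned numChunks(unsigned long total)
{
    return (unsigned)((total + MAX_CHUNK - 1) / MAX_CHUNK);
}

/* Size of piece `idx` (counting from 0, idx < numChunks(total))
   of a `total`-byte transfer: full chunks, then the remainder. */
unsigned chunkLen(unsigned long total, unsigned idx)
{
    unsigned long left = total - (unsigned long)idx * MAX_CHUNK;
    return left > MAX_CHUNK ? MAX_CHUNK : (unsigned)left;
}
```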


Packeting


Packet processing is carried out in dispatch.c (see Listing One). This makes
it easy to change the algorithms and protocols without looking at any other
code. The two packeting functions are Transmit and Receive. A packet contains
the following packet-specific fields:
ID, two bytes identifying the destination. I used KY to identify packets
headed for the server, and LY to identify packets headed toward the client.
length, the total number of bytes in the packet, including the header.
notLength, the 1's complement of the length, for error checking.
CRC32, the CRC 32 value calculated across all data, plus the header (for the
calculation, the CRC32 field is set to 0).
The remainder of the fields are data fields. To transmit a packet, you supply
a packet, with the length field set to just the length of the variable-sized
data buffer. The Transmit function adds the length of the header, calculates
the notLength and CRC32 fields, and sends the packet.
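The header layout can be pictured as a small C struct. The field widths are an assumption (16-bit lengths, a 32-bit CRC, and a two-byte ID whose first byte is 'K' or 'L' as in Listing One), compiler padding means sizeof need not match the on-wire size, and finishHeader simply mirrors the fixups Transmit performs.

```c
#include <stdint.h>

typedef struct {
    char     ID[2];       /* "KY" toward the server, "LY" toward the client */
    uint16_t length;      /* total bytes in the packet, header included */
    uint16_t notLength;   /* one's complement of length: cheap sanity check */
    uint32_t crc32;       /* CRC32 over header+data, computed with this field zeroed */
} PacketHeader;

/* Mirror of Transmit's fixups: the caller supplies only the data length,
   and the remaining header fields are derived from it. */
void finishHeader(PacketHeader *h, uint16_t dataLen, int toServer)
{
    h->ID[0]     = toServer ? 'K' : 'L';
    h->ID[1]     = 'Y';
    h->length    = (uint16_t)(dataLen + sizeof *h);
    h->notLength = (uint16_t)~h->length;
    h->crc32     = 0;     /* zeroed so the CRC can be taken over the whole packet */
}
```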
Receive is somewhat more interesting: reception proceeds byte by byte, using a
simple state machine (see Table 3). If any state fails the validity check, the
process begins again. The packets are assumed to come in blocks. If any byte
is not received within about half a second after the last byte, the state
times out and returns to 0.
After a full packet is received, it is submitted to either the client or the
server for further processing.



Server Module


The server module is easier to implement than the client, and it can also be
made portable to any environment (OS/2, Windows NT, Windows 95, UNIX, and so
on) with a little effort.
All server processing occurs in _serverDispatch, which is called by the
assembly routine _serverTest in _SERVER.ASM. The _serverTest checks for the
following conditions: DOS can be interrupted; the program is not trying to
interrupt itself; BIOS can be interrupted; and at least one server request is
pending.
Once at _serverDispatch, the MS-DOS data area is swapped with the saved
version, and the various interrupts are set to prevent accidental
interruption: Ctrl-Break handling is simply ignored, and the Abort, Retry,
Fail? prompt is set to always return Fail.
Next, a check is made that a valid function number was used and a valid
connection exists. Again, the connection can fail without warning or a correct
connection might never have been made (invalid password given during login).
Finally, the packet is sent off for processing, and transmitted back to the
client. The server never resends a packet, so the same one is reused when
packaging the result.
When running under MS-DOS, the server runs in the background, interrupting
only when necessary to process a packet. Unfortunately, this makes the server
very system (and version) dependent.


Security


I've implemented two levels of security, login and share. When a client first
establishes a connection with the server, it sends a password. If the password
matches with the server, a user validation code is returned to the client with
a result code of 0; otherwise, the result code ACCESS_DENIED is returned and
the port is reset. This user-validation code must be present in all packets
sent to the server.
In addition, there is share-level security. When a drive is shared by the
server, the following security flags are allowed in addition to a password:
directory search, directory create, directory remove, file create, file
remove, file read, and file write. To connect to a shared drive, the client
sends an IFS_SHAREREQUEST message with the share name and password. If the
server validates the password, it returns a verification number along with a
result code of 0. This verification number must be present in the
shareValidation field for any request to the shared drive, or the result code
will be set to ACCESS_DENIED.
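Both gates amount to comparing the codes carried in a request against the codes the server issued. A hedged sketch (the function and parameter names are illustrative; only the 0x05 access-denied value comes from the article's tables):

```c
#define ACCESS_DENIED 0x05   /* matches the 0x0005 result code in Table 2 */

/* Hypothetical per-request check: every packet must carry the user
   validation code issued at login, and any request against a mapped
   drive must also carry the share verification code. */
int checkRequest(unsigned issuedUser, unsigned issuedShare,
                 unsigned userCode, unsigned shareCode)
{
    if (userCode != issuedUser)
        return ACCESS_DENIED;       /* login-level security */
    if (shareCode != issuedShare)
        return ACCESS_DENIED;       /* share-level security */
    return 0;                       /* result code 0 = request allowed */
}
```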


Parallel-Port Sharing


Port sharing is accomplished by intercepting interrupt 0x17 (the BIOS
parallel-port service) and routing requests to the server. Due to packet
overhead, sending
one byte at a time is inefficient, so a packet is sent only when it becomes
full (1024 bytes), a time-out is reached, or function 0x01 ("initialize
parallel port") is called. All of these functions return a status of printer
on line and printer not busy.
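The flush decision described above reduces to a small predicate; this is an illustrative restatement, not code from the driver.

```c
#define PORT_BUF_SIZE 1024   /* buffered printer bytes before a forced send */

/* A packet of buffered printer data goes out when the buffer fills, the
   time-out expires, or Int 0x17 function 0x01 (initialize port) is seen. */
int shouldFlush(unsigned buffered, int timedOut, int sawInitCall)
{
    return buffered >= PORT_BUF_SIZE || timedOut || sawInitCall;
}
```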
When the server receives a packet to be routed to a port, it sends the data to
the printer using a simple loop. For this reason, you should run a print
spooler on any server that shares its parallel ports.
Serial ports are currently not shareable. The interface exists, but since the
BIOS routines do single-character I/O, the overhead would quickly eat up any
savings. The data could be buffered in both directions, but I have not
implemented it.


Communication


The communications interface is handled through the IO_BASE structure. This
allows a very generic interface from which the specialized device drivers can
be derived. Currently, only three such drivers are provided:
IO_LOOP, the debugging driver that simply echoes all characters received at
the Receive(...) function in dispatch.c. Since this guarantees no
communication errors, it is useful for debugging the packeting and
packet-processing functions. It also sets a baseline from which the speed of
connections can be measured.
IO_SERIAL, derived from a simple asynchronous library, provides serial port
I/O with a throughput of up to 115,200 bps when using a null modem and a
16550a UART. Since interrupt-driven serial processing has been discussed at
great length, I will not go into detail here. The code is fairly well
documented and complete. I had no luck getting the transmitting code to work
reliably in an interrupt-driven state, so sending packets is done in a loop,
pumping out data as fast as the UART will accept it. Since the packet size is
around 1060 bytes, this loop takes at most about 1 second at
14.4 Kbits/sec. (or about 8 seconds at 1200 bps). Most modems have built-in
buffers for compression and error correction, so I would not expect any
noticeable delay to occur in the server, even on slow machines.
IO_PARALLEL, which provides a parallel-port interface with a maximum
throughput of around 512 Kbits/sec. The parallel driver is interrupt driven,
but because of the high overhead of interrupting for each character, the
characters are sent in packet buffers of less than 2KB, preceded by a length
and check code.


Configuration


The configuration files are simple ASCII text files that list various
variables and values to associate with them.
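The article doesn't give the exact file grammar, so as an assumption, a minimal parser for "variable=value" lines might look like this; the real DOSRIFS format may differ in detail.

```c
#include <stddef.h>
#include <string.h>

/* Parse one "variable=value" configuration line into name/value buffers.
   Returns 1 on success, 0 if the line has no usable separator. */
int parseConfigLine(const char *line, char *name, size_t nameSize,
                    char *value, size_t valueSize)
{
    const char *eq = strchr(line, '=');
    size_t n;

    if (!eq || eq == line)              /* no '=', or empty variable name */
        return 0;
    n = (size_t)(eq - line);
    if (n >= nameSize)
        n = nameSize - 1;               /* truncate oversized names */
    memcpy(name, line, n);
    name[n] = '\0';
    strncpy(value, eq + 1, valueSize - 1);
    value[valueSize - 1] = '\0';
    return 1;
}
```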


Stubs


The stub files clstub and svrstub eliminate the client and server code,
respectively. This is useful for machines that do not want or need to share
resources and for machines running non-MS-DOS operating systems (where the
server does not run in the background, but rather in a loop).


Running the Driver


Table 4 lists the functions available via the user interrupt. A sample
application (pmap, available electronically in both source and executable
format) exercises all of the user functions via a simple command-line
interface.


Endless Possibilities


The exploration into basic client/server packaging will come in handy in
future products. While the following tips are common knowledge to most
professional programmers, it is good to occasionally be reminded of them:

Develop the client and server simultaneously. This is immensely important in
testing because the communication layer is eliminated as a source of errors.
Begin with a well-defined, robust, and expandable messaging system. Attention
to the details of each function is vitally important here, because any change
must be propagated through both the client and server.
Keep the transmit/receive mechanism completely independent of all other code.
Now that I have a working IFS stub, I will be able to port my encrypted file
system to DOS. Actually, with this stub, it is possible to port any file
system to DOS, allowing endless possibilities. It is surprising that in the
past 15 years, no one has mass-marketed an encrypted file system for DOS. The
code for encryption is freely available, as is the code for several different
file systems, and now at least one fully functional IFS for MS-DOS is
available. With all the concern about security, encrypted file systems will
probably become commonplace in the future.
Table 1: IFS functions.
 Directory Maintenance
 0x01 Remove Directory
 0x03 Make Directory
 0x05 Change Directory
 Directory Searching
 0x1b Find first
 0x1c Find next
 File Maintenance
 0x0e set file attributes
 0x0f get file attributes
 0x11 rename
 0x13 delete
 0x16 open existing
 0x17 create/truncate
 0x2e extended open
 File I/O
 0x06 close
 0x07 commit
 0x08 read
 0x09 write
 0x0a lock
 0x0b unlock
 0x21 seek from end
 Miscellaneous
 0x0c get disk space
 0x20 flush all disk buffers
 0x23 process termination
Table 2: IFS messages.
IFS Messages Request Response
IFS_RMDIR ASCIIz fully qualified directory Error returned by DOS
 to remove
IFS_MKDIR ASCIIz fully qualified directory Error returned by DOS
 to create
IFS_CHDIR ASCIIz fully qualified directory Error returned by DOS
 to change to
IFS_FINDFIRST WORD -- attribute Error returned by DOS
 ASCIIz -- file mask struct ffblk
IFS_FINDNEXT struct ffblk (from findfirst) Error returned by DOS
 struct ffblk
IFS_SETATTR WORD -- new file attribute Error returned by DOS
 ASCIIz-- fully qualified file name
IFS_GETATTR ASCIIz-- fully qualified file name Error returned by DOS
 WORD-- file attributes
 DWORD-- file length
IFS_RENAMEFILE ASCIIz-- old filename Error returned by DOS
 ASCIIz-- new filename
IFS_DELETEFILE ASCIIz-- filename to delete Error returned by DOS
IFS_OPENFILE WORD-- open mode Error returned by DOS
 ASCIIz-- file name WORD-- handle
 WORD-- attribute
 WORD-- file time
 WORD-- file date
 DWORD-- file length
IFS_CREATEFILE WORD-- create mode Error returned by DOS

 ASCIIz-- file name WORD-- handle
 WORD-- attribute
 WORD-- file time
 WORD-- file date
 DWORD-- file length
IFS_EXTOPEN WORD-- action Error returned by DOS
 WORD-- open mode WORD-- handle
 WORD-- create attributes WORD-- status
 (0=opened,
 1=created,
 2=replaced)
 ASCIIz-- filename WORD-- file time
 WORD-- file date
 DWORD-- file length
IFS_CLOSEFILE WORD-- handle of file to close Error returned by DOS
 struct ftime-- time
 from client SFT
IFS_COMMITFILE WORD-- handle of file to commit 0
IFS_READFILE WORD-- handle of file from which
 to read Error returned by DOS
 DWORD-- offset into file from WORD-- number of
 which to begin bytes read
 WORD-- number of bytes to read BYTE[]-- bytes read
IFS_WRITEFILE WORD-- handle of file to which to
 write Error returned by DOS
 DWORD-- offset into file from WORD-- number of bytes
 which to begin written
 WORD-- number of bytes to write
 BYTES[]-- data
IFS_LOCKFILE WORD-- handle of file to lock or
 unlock Error returned by DOS
 WORD-- function (0=lock, 1=unlock)
 DWORD-- offset into file from which to begin
 DWORD-- number of bytes to lock
IFS_GETSPACE BYTE-- logical drive number Error returned by DOS
 (0 = A:, 1 = B:, ...) struct dfree
IFS_CLOSEALL none None
IFS_PORTOUT WORD-- port # 0...2 0x9000
 WORD-- number of bytes
 BYTE[]-- data
IFS_LOGIN ASCIIz password 0x00 validated
 0x05 invalid password
IFS_SHAREREQUEST ASCIIz-- share name 0x0000 (OK); 0x0005
 (access denied)
 ASCIIz-- password WORD-- access bits
IFS_LIST WORD-- first share name to 0x0000 (OK); 0x0012
 retrieve (no more shares)
 WORD-- number of share
 names returned
 WORD-- number of bytes
 in buffer
 BYTES[]-- share data
 WORD (access)
 ASCIIz name
 ASCIIz path
 ASCIIz comments
Table 3: Receiving a packet.
 State Description
 0 Nothing found yet

 1,2 Looking for ID (KY or LY)
 3,4 Looking for length
 5,6 Looking for notLength
 > 6 Waiting for remainder of the packet
Table 4: User functions.
USER_LOAD_CHECK
 entry: none
 exit: URESULT_OK; URESULT_NOT_LOADED
USER_GET_STATS
 entry: p1 = port #
 p4 = ^status buffer
 exit: URESULT_OK; URESULT_PORT_NOT_OPEN; URESULT_INVALID_PORT
USER_RESET
 entry: p1 = port #
 exit: URESULT_OK; URESULT_PORT_NOT_OPEN; URESULT_INVALID_PORT
USER_UNLOAD
 entry: p4 = ^segment buffer (2 bytes)
 exit: URESULT_OK; URESULT_NOT_LOADED
USER_CONNECT
 entry: p1 = port #
 p2 = 0 (disconnect), 1 (connect), 2 (connect/force)
 p4 = ASCIIz password + ASCIIz timeout + ASCIIz retry count
 exit: URESULT_OK; URESULT_PORT_NOT_OPEN; URESULT_INVALID_PORT;
 URESULT_BAD_PASSWORD; URESULT_IN_USE
USER_DRIVE_MAP
 entry: p1 = local drive ('a'..'z')
 p2 = 0 (unmap), 1 (map), 2 (map/force)
 p4 = ASCIIz share name + ASCIIz password
 exit: URESULT_OK; URESULT_DRIVE_INVALID; URESULT_SHARE_NOT_FOUND;
 URESULT_DRIVE_IN_USE; URESULT_NOT_CONNECTED;
 URESULT_BAD_PASSWORD; URESULT_MISMATCH
USER_DRIVE_MAP_GET
 entry: p1 = local drive ('a'..'z')
 p4 = ^128 byte buffer
 exit: URESULT_OK; URESULT_DRIVE_NOT_MAPPED; URESULT_DRIVE_INVALID
USER_PORT_MAP
 entry: p1 = port (0..2 (lpt1..lpt3))
 p2 = 0 (unmap), 1 (map), 2 (map/force)
 p4 = ASCIIz share name + ASCIIz password
 exit: URESULT_OK; URESULT_SHARE_NOT_FOUND; URESULT_PORT_IN_USE;
 URESULT_NOT_CONNECTED; URESULT_BAD_PASSWORD; URESULT_MISMATCH;
 URESULT_INVALID
USER_PORT_MAP_GET
 entry: p1 = port (0..2)
 p4 = ^128 byte buffer
 exit: URESULT_OK; URESULT_PORT_NOT_MAPPED; URESULT_PORT_INVALID;
USER_LIST
 entry: p1 = first share name to get
 p2 = buffer size
 p4 = ^buffer (buffer holds server name)
 exit: URESULT_OK; URESULT_SHARE_INVALID
 buffer format: WORD number of shares returned
 ...WORD access, ASCIIz name, ASCIIz path, ASCIIz comments,...
USER_SHARE
 entry: p1 = access flags (high bit set if port + port # lower 15 bits)
 p2 = 0 (delete), 1 (add)
 p4 = ASCIIz name + ASCIIz path + ASCIIz password +
 ASCIIz remarks
 exit: URESULT_OK; URESULT_NAME_IN_USE; URESULT_PATH_NOT_FOUND;

 URESULT_SHARE_BUFFER_FULL
USER_SHARE_GET
 entry: p1 = share # to get
 p2 = # of bytes in buffer
 p4 = ^ buffer
 exit: URESULT_OK; URESULT_INVALID_SHARE; URESULT_BUFFER_OVERFLOW
 buffer holds: WORD (access) + ASCIIz name + ASCIIz path +
 ASCIIz remarks
USER_SET_DIRECT
 entry: p1 = port #
 p2 = 1 (lock port) 0 (unlock port)
 exit: URESULT_OK; URESULT_PORT_NOT_OPEN; URESULT_PORT_INVALID
USER_READ_DIRECT
 entry: p1 = port #
 p2 = buffer length
 p4 = ^buffer
 exit: URESULT_OK; URESULT_PORT_NOT_OPEN; URESULT_PORT_INVALID
 buffer = WORD (# of bytes returned) + data
USER_WRITE_DIRECT
 entry: p1 = port #
 p2 = buffer length
 p4 = ^buffer
 exit: URESULT_OK; URESULT_PORT_NOT_OPEN; URESULT_PORT_INVALID

Listing One
#include <stdlib.h>
#include <dos.h>
#include "dispatch.h"
#include "asm.h"
#include "server.h"
#include "client.h"
#include "misc.h"
/* Receive state machine (see Table 3):
 *   byte 1: must be 'K' (server-bound) or 'L' (client-bound), else restart
 *   byte 2: must be 'Y', else re-test the byte as a possible first ID byte
 *   bytes 3-4: 16-bit length; must not exceed PACKETSIZE
 *   bytes 5-6: notLength; must be the one's complement of length
 *   remaining bytes: collected until `length` bytes are in, then the CRC32
 *   is verified and the packet is dispatched to the client or server
 * If more than 1/2 sec passes since the last byte, the packet pointer is
 * reset and scanning for a header starts over.
 */
void Receive(PORT *port)
{
    register int key;
    IO_BASE *io;
    RCVINFO *rcv;

    if (!port)                          /* validate before dereferencing */
        return;
    io  = port->io;
    rcv = port->rcvInfo;
    key = io->readByte(io, 0);
    while (key >= 0) {
        rcv->stats.bytes.rcvd++;
        if (!rcv->packet) {             /* start collecting a new packet */
            rcv->packet = getPacket();
            if (!rcv->packet)
                return;
            rcv->ptr = (void *) rcv->packet;
            rcv->ct = 0;
        } else if (Elapsed(rcv->lastTime) > 180) {
            rcv->stats.errors.timeout++;
            rcv->ct = 0;                /* too long since last byte: resync */
        }
        rcv->lastTime = CLOCK;
        rcv->ptr[rcv->ct++] = key;
        switch (rcv->ct) {
        case 2:
            if (key == 'Y')
                break;
            rcv->stats.errors.packetID++;
            rcv->ptr[0] = key;          /* re-anchor: byte may start a header */
            rcv->ct = 1;
            /* fall through to re-test it as a first ID byte */
        case 1:
            if ((key != 'K') && (key != 'L')) {
                rcv->stats.errors.packetID++;
                rcv->ct = 0;
            }
            break;
        case 4:                         /* length has been received */
            if (rcv->packet->length > PACKETSIZE) {
                rcv->stats.errors.length++;
                rcv->ct = 0;
            }
            break;
        case 6:                         /* ~length has been received */
            if (rcv->packet->length != ~rcv->packet->notLength) {
                rcv->stats.errors.length++;
                rcv->ct = 0;
            }
            break;
        default:
            if (rcv->ct == rcv->packet->length) {
                DWORD oldCRC = rcv->packet->crc32;
                rcv->packet->crc32 = 0;
                if (crc32(0, rcv->packet, rcv->packet->length) != oldCRC) {
                    rcv->stats.errors.crc++;
                    rcv->ct = 0;
                } else {                /* good packet: hand it off */
                    rcv->fwdPacket = rcv->packet;
                    rcv->packet = 0;
                    if (rcv->fwdPacket->ID[0] == 'K') {
                        rcv->stats.server.packetsRcvd++;
                        serverSubmit(io->port->server, rcv->fwdPacket);
                    } else {
                        rcv->stats.client.packetsRcvd++;
                        clientSubmit(io->port->client, rcv->fwdPacket);
                    }
                }
            }
            break;
        }
        key = io->readByte(io, 0);
    }
}
/* send (packet) to (port) */
void Transmit(PORT *port, PACKET *packet)
{
    IO_BASE *io = port->io;

    packet->length += sizeof(*packet);
    packet->notLength = ~packet->length;
    packet->crc32 = 0;
    packet->crc32 = crc32(0, packet, packet->length);
    io->writeString(io, packet, packet->length);
}

RCVINFO *createRcvInfo(void)
{
    return calloc(1, sizeof(RCVINFO));
}

void freeRcvInfo(RCVINFO *rcvInfo)
{
    free(rcvInfo);
}





























Conditional Compilation


Make programs more portable with this simple tool




David Epstein


David is a member of the X3J3 Fortran committee and one of the creators of F,
a new language for educational and professional programming. He can be reached
at david@imagine1.com.


In C, conditional compilation is a preprocessor feature that provides text
substitution, macro expansion, and file inclusion. While often useful, these
features can lead to a number of problems, which contributed to their being
replaced or augmented in C++ (examples are const, inline functions, and
type-safe linkage).
Fortunately, modern Fortran provides safe replacements for these C++ features.
Fortran's PARAMETER provides named constants, PURE or internal procedures
offer the speed benefits of C's macro functions without the risks, and
Fortran's MODULE provides a safe way for one file to use facilities defined in
another. All that's missing is the ability to choose which parts of the
program will be compiled.
The Fortran 90 Conditional Compilation Facility (CCF) provides this missing
functionality. CCF is a line-based language that can easily be adapted to
other programming languages. CCF also is easy to implement; Listing One gives
a simple Fortran CCF processor written in under 200 standard Fortran
statements.
Although Listing One implements CCF as a preprocessor, this isn't mandatory.
In fact, the first implementation of CCF handled conditional compilation as an
integral part of the compiler. The benefits of this approach are improved
speed and consistent error messages. This kind of full CCF processor would use
a scanner, parser, symbol table, and expression analyzer, and most likely
would not be free or readily available on new hardware. 


CCF for Fortran


CCF is based on a subset of regular Fortran, which makes it easy to learn,
implement, and adapt to other languages. To define a CCF for your favorite
programming language, just choose a subset of that language that you'd like to
use for conditional compilation, and a way to distinguish CCF statements from
the regular program statements. With this in hand, you can write a simple CCF
processor using the approach in Listing One.
To keep the Fortran CCF simple, I included only the minimum subset of Fortran
that supports conditional compilation. This required at least two features: an
IF statement, to support the "conditional" aspect, and variables, to control
the conditionals. Combining these minimal requirements with a few usability
statements yields the nine statements in Table 1.
The difference between initializing a CCF variable with a value in the INTEGER
or LOGICAL statements and assigning a value to a CCF variable with the
assignment statement is that the assignment statement value cannot be
overridden with an optional invocation value.
The aforementioned nine statements are distinguished in the source code by the
characters !CCF$ in columns one to five of a line. Since ! starts a comment in
Fortran, all CCF statements look like comments to a Fortran compiler.
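The column-one-to-five rule makes line classification trivial. As a sketch (in C rather than Fortran, and assuming the uppercase marker form shown in the article's examples), a processor needs only a prefix comparison to sort input lines:

```c
#include <string.h>

/* Line kinds a CCF processor must distinguish. A line is a CCF line only
   when its marker occupies columns one to five exactly. */
typedef enum { PLAIN, CCF_STMT, CCF_COMMENTED, CCF_SHIFTED } LineKind;

LineKind classify(const char *line)
{
    if (strncmp(line, "!CCF$", 5) == 0) return CCF_STMT;       /* statement */
    if (strncmp(line, "!CCF*", 5) == 0) return CCF_COMMENTED;  /* big replace */
    if (strncmp(line, "!CCF>", 5) == 0) return CCF_SHIFTED;    /* big shift */
    return PLAIN;                       /* ordinary program text */
}
```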


CCF Processor Output 


The CCF processor reads the source file, interprets the CCF commands, and
produces one of three different output formats: short, big replace, and big
shift.
The CCF short output contains only the lines that will actually be compiled.
All CCF statements and all lines that are in the FALSE branches of the CCF IF
constructs are deleted. The advantage of this file format is that it's easy to
produce; it's the only format supported by the preprocessor in Listing One.
The CCF big replace output contains the same number of lines as the input
file. The only difference between the big-replace file and the input file is
the lines in the FALSE branch(es) of the CCF IF constructs. These lines are
commented out by placing the characters !CCF* in columns one to five. Because
it has the same number of lines, it's easy to match compiler warnings against
the original file. Also, no lines become longer. However, some of the original
file might be lost if the lines being commented out already had something in
the first five columns.
The CCF big shift is similar, but the lines in the FALSE branches are shifted
to the right five characters, and the string !CCF> is placed in columns one to
five of this new line. Like the big-replace file, the big-shift file makes it
easy to match compiler warnings to the original source. Unlike the big-replace
output, however, big-shift files preserve all of the original program text.
The big-shift and big-replace files make it possible to simply discard the
original file after processing. The processed file can be used as subsequent
input to the CCF processor, which simply recognizes lines beginning with !CCF*
or !CCF> as lines commented by a previous run through CCF. For instance,
Example 1(a) has already been run through a CCF processor to produce code for
DOS. Without any additional editing, it can be reprocessed to produce UNIX
code, as in Example 1(b).
Having only one file helps avoid problems with accidentally changing the wrong
file. With the short output, it may be tempting to make temporary changes to
the processed file, which can cause confusion if these changes don't get
carried into the original source.


CCF Processor-Invocation Options 


The CCF processor must know two pieces of information when it is invoked:
which CCF variables to override, and what type of output file to produce. The
simple CCF processor in Listing One accepts these options from the keyboard or
a batch file.


A Simple CCF Processor


Because CCF statements are a subset of the statements in the target language,
it's relatively easy to use the compiler for your target language to help
process CCF input. Listing One implements a CCF processor by creating a new
program that, when compiled and run, outputs the final program. This involves
simply uncommenting the CCF lines and replacing all other lines with
corresponding PRINT statements. Example 2(a) is sample input to this
algorithm, and the corresponding output is Example 2(b). When Example 2(b) is
compiled and run, it will output Example 2(c)--the final, processed output.
This simple processor doesn't support either the big-shift or big-replace
formats. (For the more complete CCF_95 processor, contact ccf@imagine1.com.)


Other Uses of Conditional Compilation


Although portability is the most popular use of conditional compilation, it
can simplify many other tasks as well. Conditional compilation can be used to
selectively enable debugging code. This allows the program to be compiled with
different levels of debugging enabled, ranging from simple "Entering FUNCTION
Foo" messages to more extensive checks on complex processes. Example 3 shows
how this might be used. To avoid using the processor on every compile, you can
place !CCF> and !CCF* comments manually.
Because the CCF processor completely ignores all non-CCF lines, you can place
comments or other text in the FALSE branch of a CCF condition. You can even
maintain several different kinds of source code in a single file, such as the
Fortran source and a batch file; suitable CCF options will extract either one.

I've used CCF to test Fortran 90 expression parsers by including CCF
statements to select parts of a large test file. This made it easy to narrow
down a failure to a specific part of the input without manually editing the
file to identify the problem line.


The Future of CCF


CCF has been submitted to X3J3 as a possible addition to the Fortran standard.
The most recent summary (available from ftp://ftp.ncsa.uiuc.edu/x3j3) lists
CCF as a "high priority" for the next update. Additional information is
available from http://www.fortran.com/fortran/market.html.


Acknowledgments


Many have contributed to the development of CCF, including Karen Barney, John
Ehrman, Dick Weaver, Bruce Pumplin, Mike Dedina, Mark Epstein, Kelly Flanagan,
and Harris Hall. The use of conditional compilation for debugging was
introduced to me by Don Rose. Support for the standardization of CCF has come
from many X3J3 members, but particularly from Jeanne Martin.


References


Epstein, David. "Imagine A Conditional Compilation Facility (CCF)." Fortran
Journal (May/June 1994).
------. "CCF Here and Now." Fortran Journal (September/October 1994).
------. "CCF Is Unlike The Others." Fortran Journal (January/February 1995).
Table 1: CCF statements.
Statement Description
INTEGER Declare and initialize an integer CCF variable.
LOGICAL Declare and initialize a logical CCF variable.
IF Classical conditional statement.
ELSEIF Part of the IF construct.
ELSE Part of the IF construct.
ENDIF Part of the IF construct.
assignment Assign a value to a CCF variable.
PRINT Output to the screen during CCF processing.
STOP Halt during CCF processing.
Example 1: (a) File processed for DOS; (b) file processed for UNIX.
(a)
!CCF$ IF (system == DOS) THEN
      filename = '\back\slash\eightdot.3'
!CCF$ ELSEIF (system == UNIX) THEN
!CCF* filename = '/forward/slash/manydot.many'
!CCF$ ELSE
!CCF$ STOP 'Set CCF variable "system" to DOS or UNIX'
!CCF$ ENDIF

(b)
!CCF$ IF (system == DOS) THEN
!CCF* filename = '\back\slash\eightdot.3'
!CCF$ ELSEIF (system == UNIX) THEN
      filename = '/forward/slash/manydot.many'
!CCF$ ELSE
!CCF$ STOP 'Set CCF var "system" to DOS or UNIX'
!CCF$ ENDIF
Example 2: (a) Original file; (b) intermediate file; (c) output file.
(a)
!CCF$ INTEGER :: version_num = 2
!CCF$ IF (version_num == 2) THEN
      max = 1024
!CCF$ ELSE
      max = 512
!CCF$ ENDIF

(b)
      INTEGER :: version_num = 2
      IF (version_num == 2) THEN
PRINT *,' max = 1024'
      ELSE
PRINT *,' max = 512'
      ENDIF

(c)
max = 1024
Example 3: Using CCF for debugging.
!CCF$ IF (debug_level > 0) THEN
!CCF> PRINT *, 'Entering FUNCTION SnickersBar'
!CCF$ IF (debug_level > 1) THEN
!CCF> PRINT *, 'with argument "foo" = ',foo
!CCF> PRINT *, ' and argument "moo" = ',moo
!CCF$ ENDIF ! (debug_level > 1) THEN
!CCF$ ENDIF ! (debug_level > 0) THEN

Listing One
!******************************************************************************
! Simple CCF Processor

! This code is free. You are also free (do whatever you want with this code).
! David Epstein and imagine1 appreciate any donations (comments, 
! coding suggestions or checks made payable to imagine1).
! e-mail : ccf@imagine1.com
! telephone : +1-503-383-4846
! address : imagine1
! PO Box 250
! Sweet Home, OR 97386
! BASIC IDEA
! Since the Conditional Compilation Facility (CCF) is a subset of Fortran,
! a simple CCF processor is achieved by using a Fortran processor to
! execute the CCF statements.
! STEPS
! 1. Compile this CCF processor with your Fortran processor.
! You now have a CCF processor.
! 2. Invoke this CCF processor and follow the prompts for required input.
! 3. Compile the ccf-temp file (ccftemp.f90 for dos and unix systems).
! 4. Execute the ccf-temp file to create the resulting output file.
! ALGORITHM
! Turn !ccf$ lines into Fortran lines by replacing !ccf$ with blanks.
! Turn all other lines into output statements (include handling of !ccf*
! and !ccf> lines as lines that CCF previously turned into comments).
! Special handling of CCF variables is required due to Fortran requirement
! that the specification-part precedes the execution-part. All CCF variable
! declaration lines are buffered until written to a MODULE. This MODULE
! is also used to handle initialization of CCF variables when the user
! wants to override initial values in the source.
! DESCRIPTION OF OUTPUT
! This simple CCF processor creates a file that consists only of the lines
! that the Fortran processor would see. In other words, all CCF lines
! (those with !ccf$ in columns 1 to 5) will not appear in the created file
! and all lines that are in a FALSE branch of a CCF if-construct will not
! appear in the created file.
! DIFFERENCES BETWEEN THIS CCF AND FULL CCF
! This simple CCF processor differs from the CCF in your Fortran processor
! in a few ways:
! 1) Two files must be maintained because this CCF processor does not
! give the option of creating a file with the same number of lines
! as the input file. A full CCF processor can comment lines in
! the FALSE branch of a CCF if-construct with a !ccf* or !ccf>. This
! CCF processor will not include CCF lines or the lines in a FALSE
! branch of a CCF if-construct.
! 2) This CCF processor creates two temp files that must be compiled and
! run in order to create the CCF output file. One of these temp files
! is INCLUDEd in the other temp file. A full CCF processor does not
! require creating extra files.
! 3) The CCF PRINT and STOP statements are executed at compile time in
! the CCF in your Fortran processor. In this CCF processor the CCF
! PRINT and STOP statements are executed when the ccf temp file is
! executed. This slightly hinders the usability of these CCF stmts.
! 4) There is a limit on the number of CCF variable declaration lines
! (you may change this limit by changing MAX_CCF_VAR_DECL_LINES).
! 5) Fortran keywords INTEGER and LOGICAL are reserved words for this
! CCF processor (do not name a CCF variable "integer" or "logical").
! 6) Diagnostic checks are not made on the CCF statements. Please stick
! to the CCF language definition (CCF variables do not have "kinds",
! CCF statements do not have labels, use free source form for the
! CCF lines (even if your source is fixed source form), etc.).
!******************************************************************************

program SimpleCcf
implicit NONE
 ! for buffering CCF variable declarations so they can be written to a MODULE
 integer, parameter :: MAX_CCF_VAR_DECL_LINES = 100
 character (len = 132) :: ccf_var_lines(MAX_CCF_VAR_DECL_LINES)
 integer :: num_ccf_var_decl_lines = 0
 ! for storing lines of input
 character (len = 132) :: line
 ! file names (limit of 132 is an arbitrary choice and can be changed)
 character (len = 132) :: in_file, & ! Input file to be CCFed
 out_file ! Output file excludes CCF lines and
 ! lines inside FALSE branches
 character (len = 132) :: temp_file1, & ! Middle-step file MAIN
 temp_file2, & ! Middle-step file MODULE
 batch_file ! Option to run CCF batch mode
 ! batch mode for input (input and output filenames and initial values)
 logical :: batch ! TRUE if batch_file exists
!CCF$ integer :: dos = 1
!CCF$ integer :: unix = 2
!CCF$ integer :: your_system = 3 ! Edit here for system other than dos or unix
!CCF$ integer :: system = 1
 ! set filenames
!CCF$ if (system == dos .OR. system == unix) then
 temp_file1 = "ccftemp.f90"
 temp_file2 = "ccfmod.f90"
 batch_file = "ccfbatch.dat"
!CCF$ else ! system other than dos or unix
!CCF$ !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!CCF$ stop 'Please edit here and above for system other than dos or unix'
!CCF$ !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!CCF> temp_file1 = "ccftemp filename for your system"
!CCF> temp_file2 = "ccfmod filename for your system"
!CCF> batch_file = "ccfbatch filename for your system"
!CCF$ endif
 inquire (FILE=batch_file, EXIST=batch)
 if (batch) then
 open (UNIT=7, ACTION="READ", &
 STATUS="OLD", POSITION="REWIND", ERR=101, FILE=batch_file)
 read (UNIT=7, FMT="(A)", END=102) in_file
 read (UNIT=7, FMT="(A)", END=102) out_file
 else
 print *, "Enter the name of the file to be CCFed:"
 read *, in_file
 print *, "Enter the name you want for the output file:"
 read *, out_file
 endif
 open (UNIT=8, ACTION="READ", &
 STATUS="OLD", POSITION="REWIND", ERR=103, FILE=in_file)
 open (UNIT=9, ACTION="WRITE", &
 STATUS="REPLACE", POSITION="REWIND", ERR=104, FILE=temp_file1)
 call WriteLine(9, "INCLUDE '" // TRIM(temp_file2) // "'" )
 call WriteLine(9, "PROGRAM CcfIt" )
 call WriteLine(9, "USE CcfVars ! Module with CCF vars and init values" )
 call WriteLine(9, "implicit NONE" )
 call WriteLine(9, " " )
 call WriteLine(9, "CALL mCcfInits ! Init CCF values" )
 call WriteLine(9, 'open (UNIT=9, ACTION="WRITE", STATUS="REPLACE", ERR=7,&')
 call WriteLine(9, ' POSITION="REWIND", FILE="' //TRIM(out_file)// '")')
 !**************************************************************************!

 !** Read each line of the input file looking for **!
 !** 1) CCF lines **!
 !** 2) lines previously commented out by CCF in a Fortran processor **!
 !** 3) other lines **!
 !** 1) CCF lines - **!
 !** a) CCF variable declarations are buffered until written to **!
 !** the CCF module **!
 !** b) all other CCF lines are turned into Fortran lines by **!
 !** replacing the !ccf$ in columns 1-5 with blanks **!
 !** 2) lines previously commented out by CCF in a Fortran processor - **!
 !** a) !ccf* in columns 1-5 - Replaced columns 1-5 with blanks **!
 !** b) !ccf> in columns 1-5 - Shift the line left 5 columns **!
 !** Double all double quotes (") as in 3) **!
 !** 3) other lines - **!
 !** Other lines are simply echoed to the output file. Doing **!
 !** this requires doubling all the double quotes ("). **!
 !**************************************************************************!
 do ! until end of file
 read (UNIT=8, FMT="(A)", END=50) line
 ! !ccf$ - a ccf line
 if ((line(1:5) == '!ccf$') .OR. (line(1:5) == '!CCF$')) then
 ! if a CCF var declaration, save this line for the CcfVar MODULE
 ! else replace the !ccf$ with blanks (turn into a Fortran statement)
 if (CcfVarDeclaration()) then
 num_ccf_var_decl_lines = num_ccf_var_decl_lines + 1
 if (num_ccf_var_decl_lines > MAX_CCF_VAR_DECL_LINES) then
 print *, 'CCF halts. Limit of CCF variable declaration lines'
 print *, MAX_CCF_VAR_DECL_LINES,'exceeded on line:'
 print *, line
 stop
 endif
 line(1:5) = ' ' ! replace !ccf$ with blanks
 ccf_var_lines(num_ccf_var_decl_lines) = line
 else
 line(1:5) = ' ' ! replace !ccf$ with blanks
 call WriteLine(9, line)
 endif
 else ! not a CCF line
 ! !ccf* - a non-ccf line, commented by ccf by replacing columns 1 to 5
 ! !ccf> - a non-ccf line, commented by ccf by shifting the line
 ! right 5 columns and placing !ccf> in columns 1-5
 if ((line(1:5) == '!ccf*') .OR. (line(1:5) == '!CCF*')) then
 line(1:5) = ' ' ! blank out "!ccf*"
 elseif ((line(1:5) == '!ccf>') .OR. (line(1:5) == '!CCF>')) then
 line = line(6:) ! shift left 5 char
 else
 ! No !ccf* or !ccf> to blank out or shift.
 endif
 ! Double each double quote (") and output the new line
 call HandleDoubleQuotesThenWriteLine()
 endif
 enddo
 50 continue ! end of input file
 call WriteLine (9, "goto 8 ! We are done." )
 call WriteLine (9, "7 print *, 'Trying to open CCF output file with the&")
 call WriteLine (9, "& name you supplied >" // TRIM(out_file) // "'" )
 call WriteLine (9, "stop 'CCF Error: opening this output file.'" )
 call WriteLine (9, "8 continue" )
 call WriteLine (9, "END")

 call WriteLine (9, " ")
 call WriteLine (9, "subroutine WriteLine(unit_num, line) ")
 call WriteLine (9, " integer :: unit_num ")
 call WriteLine (9, " character (len=*) :: line ")
 call WriteLine (9, " ")
 call WriteLine (9, ' write (UNIT=unit_num, FMT="(A)") TRIM(line) ')
 call WriteLine (9, "end subroutine WriteLine ")
 ! Write the MODULE file which contains all the CCF variable declarations
 ! and any initial values.
 call WriteCcfVarsModule()
 ! Reminder of middle-step
 if (.NOT. batch) then
 print *
 print *, "CCF is creating ", TRIM(temp_file1), &
 " as the temporary file to compile and run"
 endif
 ! Close files and we are done
 close (UNIT=7, STATUS="KEEP", ERR=105)
 close (UNIT=8, STATUS="KEEP", ERR=106)
 close (UNIT=9, STATUS="KEEP", ERR=107)
 goto 110
 !**** ERROR messages ******************************************************!
 101 print *, 'Trying to open CCF batch file "', TRIM(batch_file), '"'
 stop 'CCF Error: opening this input file.'
 102 print *, 'Trying to read a line from CCF Batch file "', &
 TRIM(batch_file), '"'
 stop 'CCF Error: Batch file expecting input and output filenames'
 103 print *, 'Trying to open CCF input file "', TRIM(in_file), '"'
 stop 'CCF Error: opening this input file.'
 104 print *, 'Trying to open CCF temp file "', TRIM(temp_file1), '"'
 stop 'CCF Error: opening this input/output file.'
 105 print *, 'Trying to close CCF batch file "', TRIM(batch_file), '"'
 stop 'CCF Error: closing this input file.'
 106 print *, 'Trying to close CCF input file "', TRIM(in_file), '"'
 stop 'CCF Error: closing this input file.'
 107 print *, 'Trying to close CCF temp file "', TRIM(temp_file1), '"'
 stop 'CCF Error: closing this input/output file.'
 ! end ERROR messages ******************************************************!
 110 continue ! no I/O errors. We are done.
contains
!******************************************************************************
subroutine HandleDoubleQuotesThenWriteLine()
! Echo a non-CCF line to the middle-step temp file. This is done by turning
! the line into a character constant. Turning a line into a character constant
! requires placing it in between two double quotes (") and doubling each 
! occurrence of a double quote. The new line is then passed to a WriteLine 
! subroutine. Note that the original line could be 132 double quotes.
 character (len = 264) :: did_quotes_line ! line after doubling the "
 integer :: new_len ! line len after doubling the "
 character (len = 285) :: new_line ! 285 handles a line of 132 "s as:
 ! 1-19 : call WriteLine(9, "
 ! 20-283: 264 "s (132 "s doubled)
 ! 284-285: ")
 integer :: pos, & ! pos in original line
 new_pos ! position in did_quotes_line
 did_quotes_line = ' '
 new_pos = 1
 ! Double each occurrence of a double quote (").
 do pos = 1, LEN_TRIM(line)

 if (line(pos:pos) == '"') then
 did_quotes_line(new_pos:new_pos+1) = '""'
 new_pos = new_pos + 2
 else
 did_quotes_line(new_pos:new_pos) = line(pos:pos)
 new_pos = new_pos + 1
 endif
 enddo
 new_len = new_pos - 1
 new_line(1:19) = 'call WriteLine(9, "'
 new_line(20:20+new_len -1) = did_quotes_line(1:new_len)
 new_line(20+new_len:20+new_len+1) = '")'
 ! Set new_len to the length of the Fortran statement that may need splitting
 new_len = 20+new_len+1
 if (new_len <=132) then
 call WriteLine(9, new_line(1:new_len))
 elseif (new_len <= 262) then
 ! need to split line once
 call WriteLine(9, new_line(1:131)//"&")
 call WriteLine(9, "&"//new_line(132:new_len))
 else
 ! need to split line twice
 call WriteLine(9, new_line(1:131)//"&")
 call WriteLine(9, "&"//new_line(132:261)//"&")
 call WriteLine(9, "&"//new_line(262:new_len))
 endif
end subroutine HandleDoubleQuotesThenWriteLine
!******************************************************************************
function CcfVarDeclaration()
! Determine if the current CCF line ("line") is a CCF variable declaration
! line. This simple CCF processor expects free source form, keywords are
! reserved words, and there are no "kinds" on these variables.
 logical :: CcfVarDeclaration ! TRUE if CCF INTEGER or
 ! CCF LOGICAL statement
 integer :: pos ! pos in line
 ! skip over the !ccf$ and the blanks
 pos = 6
 do while (line(pos:pos) == ' ')
 pos = pos + 1
 end do
 ! This Simple CCF processor expects a blank or a ':' after
 ! INTEGER or LOGICAL. Note that INTEGER and LOGICAL are reserved words.
 if (((line(pos:pos+6) == 'INTEGER') .OR. &
 (line(pos:pos+6) == 'LOGICAL') .OR. &
 (line(pos:pos+6) == 'integer') .OR. &
 (line(pos:pos+6) == 'logical')) &
 .AND. &
 ((line(pos+7:pos+7) == ' ') .OR. &
 (line(pos+7:pos+7) == ':'))) THEN
 CcfVarDeclaration = .TRUE.
 else
 CcfVarDeclaration = .FALSE.
 endif
end function CcfVarDeclaration
!******************************************************************************
subroutine WriteCcfVarsModule()
! Write the CcfVars MODULE. CcfVars contains all the CCF variable
! declarations and a subroutine called mCcfInits. mCcfInits contains
! assignments of any initial values for CCF variables supplied either

! in the CCF batch file or from standard input.
 integer :: i ! counter for loop
 character (len = 32) :: init_var, & ! CCF variable to be initialized
 init_val ! initial value for CCF variable
 open (UNIT=10, ACTION="WRITE", STATUS="REPLACE", &
 POSITION="REWIND", ERR=201, FILE=temp_file2)
 call WriteLine(10, "MODULE CcfVars")
 call WriteLine(10, "implicit NONE")
 do i = 1, num_ccf_var_decl_lines
 call WriteLine(10, ccf_var_lines(i))
 enddo
 call WriteLine(10, "CONTAINS")
 call WriteLine(10, "SUBROUTINE mCcfInits")
 init_var = "foo" ! any value other than '0', since '0' terminates the loop
 do while (init_var /= '0')
 if (batch) then
 read (UNIT=7, FMT="(A)", END=202) init_var
 if (init_var(1:1) /= '0') then
 read (UNIT=7, FMT="(A)", END=202) init_val
 line = init_var // "=" // init_val
 call WriteLine(10, line)
 endif
 else
 print *, "Enter the name of a CCF variable to be initialized"
 print *, "(or '0' when you are done initializing):"
 read *, init_var
 if (init_var /= '0') then
 print *, "Enter the value you want for this CCF variable: "
 read *, init_val
 line = init_var // "=" // init_val
 call WriteLine(10, line)
 endif
 endif
 enddo
 call WriteLine(10, "END SUBROUTINE mCcfInits")
 call WriteLine(10, "END MODULE CcfVars")
 close (UNIT=10, STATUS="KEEP", ERR=203)
 goto 210
 !**** ERROR messages ******************************************************!
 201 print *, 'Trying to open CcfVars Module file "', TRIM(temp_file2), '"'
 stop 'CCF Error: opening this output file.'
 202 print *, 'Trying to read a line from CCF Batch file "', &
 TRIM(batch_file), '"'
 stop 'CCF Error: Batch file expecting initial var-val pairs or 0'
 203 print *, 'Trying to close CcfVars Module file "', TRIM(temp_file2), '"'
 stop 'CCF Error: closing this output file.'
 ! end ERROR messages ******************************************************!
 210 continue ! no I/O errors
end subroutine WriteCcfVarsModule
end program
!******************************************************************************
subroutine WriteLine(unit_num, line)
implicit NONE
 integer :: unit_num
 character (len=*) :: line
 write (UNIT=unit_num, FMT="(A)") TRIM(line)
end subroutine WriteLine

Benchmarking Real-Time Operating Systems


Using a modified Dhrystone to represent application workload




Eric McRae


Eric, an embedded-systems consultant, can be contacted at eric@elmi.com or by
telephone at 206-885-4107.


As an embedded-systems consultant, I'm regularly confronted with questions
like "Should we use a commercial operating system? If so, which one?" While I
always have an opinion, I remain frustrated with the lack of useful, objective
data to help answer these questions. While there are numerous test suites for
real-time operating systems (RTOS), they generally don't give prospective
customers an intuitive feeling for how a product will perform on their
hardware. For example, knowing that Vendor A's RTOS can task switch in 26
microseconds on a 60-MHz SuperZotsCPU doesn't tell me a whole lot about
whether I can use that RTOS on my 20-MHz ZotsCPU.
Consequently, designers waste time reinventing RTOSs even though they probably
would use a commercial RTOS if they understood its benefits and performance
drawbacks. I began to study existing benchmark work with the intent of
developing a more useful set of metrics. After conducting numerous e-mail,
phone, and fax conversations with other engineers and vendor representatives,
I came up with the suite of benchmarks described here. I hope that your
comments will lead to further refinements.


Target Class


Although they could easily be modified for other targets, these benchmarks are
designed to approximate demands placed on an RTOS by what I call "medium-sized
target systems." These systems have the following characteristics:
Use a 32-bit CPU.
Require 20 or fewer tasks.
Do not use dynamic task creation and destruction.
Have "sufficient" memory.
Have high-priority interrupt service requirements.
Have time-varying processing-load levels.
The tests are intended to characterize those aspects of an RTOS most important
to embedded-systems designers:
Task-switch performance.
Task-priority management performance.
Memory-allocation performance.
Message-passing performance.
 Interrupt latency.
Determinism (the measure of predictability of performance).


The Dhrystone Metric


While the Dhrystone benchmark is well known, it is generally not associated
with measuring RTOS performance. The standard Dhrystone benchmark (available
as file dhry-c at http://www.netlib.org/benchmark/) was designed to be run in
an environment with an interactive user interface. The test has three execution
phases: 
Initialization.
Looping.
Reporting. 
During initialization, the program asks users to enter the desired number of
Dhrystones to execute, then records execution start time. During the looping
phase, the program executes the specified number of Dhrystone loops. Upon
entering the reporting phase, the execution stop time is recorded and the
Dhrystone results are computed and displayed based on the time required for
execution.
I use a modified version of the Dhrystone benchmark to represent the
application workload for an RTOS. The Dhrystone benchmark as modified for RTOS
benchmarking does not interact with a user. When the hard-coded initialization
is complete, the program commences an endless Dhrystone loop, which instead of
decrementing the traditional limit count, will pulse a digital "DhryPass"
output. (Interrupts will be disabled before and enabled after the instructions
used to pulse the output.) Additionally, the code examines the state of
certain global variables and may branch depending on their state. For each
test, the RTOS will be restarted with specific values placed in global
variable locations. The mcp-main() function in the test suite (see Listing
One) will initialize all devices and start up tests appropriate to the
benchmark being run.
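The modified loop described above might look like the following minimal C sketch. All names here are hypothetical stand-ins for the real article code: dhrystone_pass for one pass of the Dhrystone body, irq_disable/irq_enable for the board-support interrupt mask routines, and g_pulse_count for the digital DhryPass output.

```c
#include <stdint.h>

/* Hypothetical globals the test harness sets before each run. */
volatile int  g_test_mode   = 0;  /* selects per-test behavior          */
volatile long g_pulse_count = 0;  /* stands in for the DhryPass output  */

/* Stubs for the board-support routines (assumed names). */
static void irq_disable(void)    { /* mask interrupts on real hardware  */ }
static void irq_enable(void)     { /* unmask interrupts                 */ }
static void dhrystone_pass(void) { /* one pass of the Dhrystone body    */ }

/* Modified loop: instead of decrementing a limit count, each pass
 * pulses the DhryPass output with interrupts masked around the pulse,
 * then checks harness-controlled globals and may branch on them. */
void dhrystone_workload(long passes)  /* pass -1 for the endless case */
{
    for (long i = 0; passes < 0 || i < passes; i++) {
        dhrystone_pass();
        irq_disable();
        g_pulse_count++;              /* toggle the digital output here */
        irq_enable();
        if (g_test_mode != 0) {
            /* per-test branching on global state would go here */
        }
    }
}
```

On target hardware the loop would run endlessly while the Benchmark Control/Monitor counts pulses; the bounded form here exists only so the sketch can be exercised off-target.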
The Benchmark Control/Monitor hardware counts pulses on the DhryPass output
for arbitrary intervals (see the accompanying text box entitled "Benchmark
Laboratory Setup"). The measurement interval can be increased to any
reasonable time to gain resolution. The average rate at which the output
pulses reflects the speed of the hardware and the time spent executing RTOS
code. The average pulse rate drops as the RTOS consumes more processing time.
The key to this approach is that the benchmark results are presented as a
percentage of the CPU/hardware resource consumed by the RTOS. For example, "9
percent of the CPU is consumed by managing 20 time-sliced tasks." This ratio
approach allows designers using different hardware configurations to better
understand expected performance in their particular hardware configuration.
The ratio is determined by running the system with the RTOS disabled to get a
baseline and then running tests during which the RTOS must perform certain
duties.


RTOSB Baseline


The RTOSB Baseline test establishes the baseline performance of the system. It
is used to normalize subsequent test results so they are relatively
independent of the hardware used for benchmarking.
The modified Dhrystone loop is allowed to run as a stand-alone function with
interrupts disabled and no operating-system calls made. The number of DhryPass
transitions is counted over a 20-second period.
The resulting number is the baseline Dhrystone performance of the system. The
value is scaled to 100. The scale factor will be applied to the results of
subsequent tests so that they can be stated in percentage of the baseline
performance. The difference between subsequent results and 100 represents the
percentage of the processing resource consumed by the RTOS.
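The normalization arithmetic can be sketched as two small C helpers (the function names are my own, not from the article): a burdened pulse count is scaled against the baseline count, and the shortfall from 100 is the RTOS overhead.

```c
/* Scale a burdened test's pulse count against the baseline count
 * taken over the same interval; the result is percent of baseline. */
double scaled_result(long baseline_pulses, long test_pulses)
{
    return 100.0 * (double)test_pulses / (double)baseline_pulses;
}

/* The difference from 100 is the percentage of the processing
 * resource consumed by the RTOS during the burdened test. */
double rtos_overhead_pct(long baseline_pulses, long test_pulses)
{
    return 100.0 - scaled_result(baseline_pulses, test_pulses);
}
```

For instance, a baseline of 1,000,000 pulses against a burdened count of 910,000 scales to 91, implying 9 percent RTOS overhead.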
The graphical representation of this benchmark is a single-valued result. The
value measured here is published only for subsequent verification of results.

It is important that any RTOS benchmark be largely independent of the hardware
platform used to measure performance. Since CPU performance in different
platforms varies widely, benchmarks that are valid only for a specific target
are fairly useless to those who are not using that platform in their projects.
By measuring both baseline and burdened performance, a relative metric can be
created that will hold true for any system that is not specifically designed
for a particular RTOS product.


RTOSB RoundRobin


The RTOSB RoundRobin test determines the CPU resource consumed by the RTOS
when it has to manage task switching. This test uses the same benchmark
function as Baseline RTOSB, but with a periodic interrupt and one or more
tasks ready to run at the same priority level. Control switches from task to
task once every interrupt. The test will be made with 1, 5, 10, and 20 tasks,
each of which will be instances of the same basic Dhrystone routine. Each task
still pulses the same DhryPass output. The DhryPass output could pulse twice
very quickly if a task that has just pulsed the output is preempted by a task
that immediately pulses the output. The Benchmark Control/Monitor, however,
can safely account for transitions occurring 2 microseconds apart. This could
be substantially shorter than any task-switch time.
The length of time that a particular task has control of the CPU is determined
by the period of the timer interrupt. If the interrupt period is short, the
RTOS will consume more processing resources than if the period is long because
of the time required to manage task switching. In order to cover a range of
nominal timer-interrupt periods, the test described in the next paragraph will
be run for periods of 1, 5, 10, 25, and 50 milliseconds.
The results of this benchmark show the processing resource consumed by the
RTOS as a function of the number of running tasks and the task-switch rate.
According to the graph in Figure 1, if 7 tasks are running with a
10-millisecond timer period, you can assume that this RTOS will consume
approximately 5 percent of the available processing resource in order to
manage task switching.
This benchmark makes use of round-robin tasking and therefore does not
specifically involve priority-management influences. In spite of this,
round-robin tasking exercises a significant portion of RTOS task-management
code and thus gives a measure of the implementation.


RTOSB Priority


The RTOSB Priority metric reveals the performance resource consumed by the
RTOS when it uses priority-based task management.
This test is similar to RTOSB RoundRobin except that, instead of round-robin
scheduling, each of the 1, 5, 10, or 20 tasks will be started with a different
priority. Tests will be run with timer-interrupt periods of 10 and 20
milliseconds. The timer-interrupt handler will suspend or resume tasks
according to the following algorithm:
1. The initial mode will be "suspend." In this mode, the timer-interrupt
handler suspends the highest-priority task in the active set. 
2. When the size of the active set reaches 1, the mode is switched to
"resume." In this mode, the timer-interrupt handler resumes the
lowest-priority task in the suspended list. 
3. When no tasks remain in the suspended list, the mode returns to suspend.
4. In the case of one task, the mode will always be resume. This last case is
somewhat degenerate in that no suspend or resume calls will be made. However,
the results from this case will identify the overhead associated with the RTOS
timer-interrupt processing. This will help calibrate the results from the
cases of 5, 10, and 20 tasks.
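The suspend/resume algorithm above amounts to a small state machine in the timer-interrupt handler. The following C sketch models one tick of it, tracking only the size of the active set; suspend_highest_active() and resume_lowest_suspended() are hypothetical names for the actual RTOS calls.

```c
enum mode { SUSPEND, RESUME };

struct prio_sched {
    int total;        /* tasks in the test             */
    int active;       /* tasks currently runnable      */
    enum mode m;      /* current handler mode          */
};

/* One timer tick: suspend the highest-priority active task until one
 * remains, then resume the lowest-priority suspended task until none
 * remain suspended, and repeat. */
void prio_tick(struct prio_sched *s)
{
    if (s->total == 1) return;          /* degenerate single-task case */
    if (s->m == SUSPEND) {
        s->active--;                    /* suspend_highest_active()    */
        if (s->active == 1) s->m = RESUME;
    } else {
        s->active++;                    /* resume_lowest_suspended()   */
        if (s->active == s->total) s->m = SUSPEND;
    }
}
```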
This benchmark shows the processing resource consumed by the RTOS's management
of tasks with differing priorities, as a function of the number of tasks being
run and the rate of priority change.
According to the graph in Figure 2, if 10 tasks are running, you could assume
that this RTOS will consume approximately 7 percent of the available
processing resource in order to manage priority switching at a rate of 100
priority changes per second. A similar system that changes task priorities
only 10 times per second should consume only 0.7 percent of the CPU.
It is difficult to define a priority-management benchmark that produces
universal results since the use and manipulation of task priority can be
varied. This benchmark assumes that the overhead of priority management is
linearly proportional to the rate at which task priorities are changed. If the
expected rate of priority change is known, this benchmark produces data from
which linear interpolation or extrapolation may provide a reasonable estimate
of the resulting overhead.
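Under the linearity assumption just stated, the interpolation is a one-line scaling. This helper (my naming, not the article's) reproduces the earlier example: 7 percent at 100 changes per second extrapolates to 0.7 percent at 10 changes per second.

```c
/* Extrapolate priority-management overhead, assuming it scales
 * linearly with the rate of priority changes: known_pct was the
 * overhead measured at known_rate changes per second. */
double prio_overhead_at(double known_pct, double known_rate, double rate)
{
    return known_pct * (rate / known_rate);
}
```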


RTOSB Semaphore


The RTOSB Semaphore test reveals the performance resource consumed by the RTOS
when it has to manage semaphores and occasional priority inversion.
The test uses one normal and one high-priority task. The normal task is
free-running, but it requests and releases a semaphore once during each Dhrystone
pass. The high-priority task requests the same semaphore, runs one Dhrystone
pass, releases the semaphore, and suspends itself. The timer-interrupt handler
will resume the high-priority task. Tests will be run with timer-interrupt
periods of 5 and 20 milliseconds and with no timer interrupts. This benchmark
shows the processing resource consumed by the RTOS's implementation of
semaphores.
For example, Figure 3 shows heavy semaphore usage and occasional priority
inversion. You could assume that this RTOS will consume approximately 1
percent of the available processing resource in order to manage 200
inversions/sec.
Semaphore performance is difficult to benchmark because overhead is highly
dependent on the state of the system. For this test, I assume that in an
average system, semaphore conflicts are rare and they never require more than
a single level of priority inversion. Constant requesting of semaphores is
probably not the average case. Since semaphore overhead is generally low, this
test makes heavy use of them in order to obtain significant result data.


RTOSB Memory


The RTOSB Memory benchmark determines the overhead associated with allocation
and deallocation of memory blocks. Many RTOS vendors supply both fixed and
variable-sized block management. Where available, this benchmark will be run
against both types of allocation.
The basic task load is the same as in RTOSB RoundRobin. The timer interrupt,
however, sets global variables that cause the current task to either allocate
or free memory. Pointers to allocated memory are maintained in a global
allocation table.
To cause allocation of memory, the interrupt handler places the index of an
allocation table entry in a global alloc variable. The Dhrystone tasks check
this variable at the end of each pass; if it is not -1, they will request the
allocation and place the returned pointer in the allocation table. To cause a
block to be freed, the timer handler sets a global free variable to the index
of the table entry containing the pointer to the block. The Dhrystone tasks
also check this variable; if it is not -1, the pointer stored in the
allocation table at the given index is passed to the free function.
When the test begins, no blocks are allocated and the mode is alloc. For every
timer interrupt, one block will be allocated. When the total number of
allocated blocks reaches 64, the mode will be set to free. In free mode, the
interrupt handler causes deallocation of blocks from the list using a Gray
code counter to supply the index of the block to be freed. (A Gray code
counter changes only 1 bit as the counter is incremented or decremented. It
still can identify the same number of objects as a straight binary counter.
However, when used as a set index, it will "jump around" rather than
progressing linearly through the set.) This creates an allocation table with
"holes." When the number of allocated blocks drops to 16, the mode will be set
back to alloc.
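The alloc/free cycle and the Gray-code indexing can be sketched in C as follows; gray_index is the standard binary-to-Gray conversion, and mem_tick (a hypothetical name) models one timer tick of the mode logic with only the allocated-block count tracked.

```c
/* Binary-to-Gray conversion: successive values differ in exactly one
 * bit yet still enumerate every index once, so frees "jump around"
 * the allocation table instead of progressing linearly. */
unsigned gray_index(unsigned n)
{
    return n ^ (n >> 1);
}

enum mem_mode { ALLOC, FREE_BLOCKS };

/* One timer tick of the memory test: allocate until 64 blocks are
 * held, then free (in Gray-code order) down to 16, and repeat. */
enum mem_mode mem_tick(enum mem_mode m, int *allocated, unsigned *ctr)
{
    if (m == ALLOC) {
        (*allocated)++;                       /* request one block    */
        if (*allocated == 64) m = FREE_BLOCKS;
    } else {
        unsigned idx = gray_index((*ctr)++);  /* table slot to free   */
        (void)idx;                            /* pass to free() here  */
        (*allocated)--;
        if (*allocated == 16) m = ALLOC;
    }
    return m;
}
```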
This test shows the processing resource consumed by the RTOS while managing
multiple tasks and memory allocation as a function of the number of tasks and
the rate of allocation activity. The actual results are determined by
subtracting the results determined in the RTOSB RoundRobin from this
benchmark's corresponding test results. This gives a value that reflects the
overhead due to memory management. 
In Figure 4, if 10 tasks are running, you can assume that this RTOS will
consume approximately 10 percent of the available processing resource in order
to manage memory allocation at a rate of 100 operations/second.
If we can assume that processing associated with memory management is
independent of that associated with priority management, then someone who is
doing both can add the results shown in RTOSB Priority and RTOSB Memory to get
a reasonable estimate of the CPU resources consumed by the RTOS.


Baseline Interrupt Latency


The Baseline Interrupt Latency test establishes the baseline interrupt latency
inherent in the hardware. The results of this test will be used to normalize
subsequent interrupt-to-task latency tests. Task switching and other RTOS
operations are disabled during this test.
The RTOSB Baseline test is rerun, but with the addition of a high-level
interrupt handler that will pulse the MEASURE output. The interrupt request
signal will be asserted by the Benchmark Control/Monitor with a period
sweeping from 200 microseconds to 10 milliseconds over a period of 100 seconds
in steps of 2 seconds. For example, during the first 2 seconds, there will be
an interrupt every 200 microseconds. For the nth subsequent 2-second period,
the interrupt period in microseconds will be (n+1)x200.
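The sweep schedule reduces to a small formula; this helper (my naming) gives the interrupt period for the nth 2-second step, counting the first 2 seconds as step 0.

```c
/* Interrupt period in microseconds during the nth 2-second step of
 * the 100-second sweep: 200 us at n = 0, rising in 200-us steps to
 * 10,000 us (10 ms) at n = 49. */
long sweep_period_us(int n)
{
    return (long)(n + 1) * 200;
}
```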
The Benchmark Control/Monitor will measure the time from the assertion of the
interrupt until the assertion of the MEASURE signal, resulting in minimum,
maximum, and average interrupt-response times. These values will provide a
baseline against which subsequent interrupt-to-task latency can be compared.
The graphical representation of the test result is three scalar values.
It is necessary to determine this baseline performance experimentally even
though the interrupt latency is predominantly a function of the particular CPU
and instructions used in the target. Variations in the hardware environment
supporting the CPU can also have significant effect on interrupt latency.
Determining this baseline will assure the hardware independence of the next
benchmark, RTOSBInterrupt. In addition, this test will serve as a validation
of the Benchmark Control/Monitor test measurements, since the results here
should be in general agreement with latencies published by the CPU vendor.


RTOSB Interrupt



The RTOSB Interrupt test measures the overhead incurred when using the RTOS to
manage the connection between an interrupt and the routine that must respond
to the interrupt.
This test uses the load specified in RTOSB Memory at a fixed execution period
of one millisecond. In addition, a highest-priority-handler task will
initially start and then block on a signal. The interrupt service routine
(ISR) is activated directly by the interrupt and behaves according to RTOS
rules in that it makes use of any required entry and exit wrapper functions.
The ISR will assert a signal to the handler function and then exit. The
handler function, upon receipt of the signal, will immediately pulse the
MEASURE output and reblock, awaiting the next signal.
The time from the assertion of the interrupt-request signal until the
assertion of the MEASURE signal by the handler will be recorded. Minimum,
maximum, and average times for the series of 131,072 interrupts, as described
in the Baseline Interrupt Latency test, will be recorded for task loads of 1,
5, 10, and 20 tasks.
The results of this test are the times required for the RTOS to receive an
interrupt and invoke a handler while managing a variety of task loads and
memory allocation. These results are tabulated differently in that they are
presented as multipliers of the baseline interrupt latency (see Figure 5). If
the baseline interrupt latency average is 10 microseconds and the average
latency from this test is 35 microseconds, the result would be tabulated as
3.5. The distance between the minimum and maximum curves represents the
predictability or determinism of the RTOS's response to an interrupt. When the
minimum, average, and maximum are close together, the RTOS's
interrupt-response time is very predictable. The latency factors for the three
curves are determined by dividing the measured response times by the
corresponding baseline response times.
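The factor computation itself is simple division. As a minimal sketch (the measured and baseline values here are hypothetical, not results from the article), the three curves of Figure 5 could be derived like this:

```c
#include <assert.h>

/* Min/avg/max latency statistics, in microseconds. */
typedef struct {
    double minT, avgT, maxT;
} Latency;

/* Divide each measured statistic by the corresponding baseline
** statistic to produce the three latency-factor curves. */
Latency latencyFactors(Latency measured, Latency baseline)
{
    Latency f;
    f.minT = measured.minT / baseline.minT;
    f.avgT = measured.avgT / baseline.avgT;
    f.maxT = measured.maxT / baseline.maxT;
    return f;
}
```

With a baseline average of 10 microseconds and a measured average of 35 microseconds, the average curve gets the value 3.5, as described above.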
This test produces an approximation of the performance of the RTOS. The
minimum and maximum values recorded may not represent the actual best and
worst cases, which depend on the synchronization of many small, independent
time windows.
Since the baseline latency of the hardware may be quite small, the calculated
latency factors may be large. It is still necessary to avoid using the actual
response times here to maintain result independence from the hardware test
platform. For a baseline interrupt latency of 3.2 microseconds, for example,
it may be desirable to add a footnote to a result chart saying, "A latency
factor of 30 implies a response time of 96 microseconds on the test platform."



RTOSB Message


The RTOSB Message metric reveals the performance of the RTOS when it has to
manage queue-based messaging.
This test uses a fixed set of 10 tasks and timer-interrupt periods of 50, 25,
10, 5, and 1 milliseconds. The tasks all have assigned message queues. Nine
tasks are instances of a simple retrieve and forward routine, which blocks
until it receives a message and then immediately forwards that message to the
next task. The remaining task is the Dhrystone task. It will start and remain
in the Dhrystone loop until a global-message flag value changes. The Dhrystone
task will then send one or two messages to the first message-forwarding task,
depending on the new state of the global-message flag. The Dhrystone task will
then block until it has received the same number of messages from its queue. 
At each timer interrupt, the global-message flag is toggled between 0 and 1.
The total number of messages passed cycles between 10 and 20 during every
timer interrupt, corresponding to the current state of the flag. No Dhrystone
looping takes place until all messages have passed through all tasks.
This benchmark shows the overhead consumed by the RTOS when managing message
origination and retrieval as a function of message rate. In Figure 6, a system
expecting to manage 1500 messages/second will consume approximately 10 percent
of the CPU resources.
This test forces the RTOS to manage bursts of messages. It is difficult to
predict whether an RTOS will behave differently if the messages are dispersed
in time. It is assumed that the difference is small.
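Assuming the overhead percentage in Figure 6 is inferred from the drop in Dhrystone throughput relative to the unloaded baseline rate (the convention the other tests in this series use), the calculation is a one-liner. This helper is hypothetical, not from the benchmark source:

```c
#include <assert.h>

/* CPU overhead, in percent, inferred from the drop in the Dhrystone
** rate under a given message load relative to the baseline rate. */
double cpuOverheadPct(double loadedRate, double baselineRate)
{
    return 100.0 * (1.0 - loadedRate / baselineRate);
}
```

For example, a loaded rate of 75 percent of the baseline corresponds to 25 percent RTOS overhead.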


Summary 


These benchmarks provide an intuitive sense of an RTOS's performance. The
benchmarks are not designed for comparing different vendors' offerings, and
they are not intended to be combined into a single figure of merit. The task
of evaluating an RTOS still is complex, but I hope these benchmarks will
assist you in matching an RTOS product to your system requirements.
I would like to thank the many people who have shown support for this project.
Special thanks to Linda Thompson, Robin Kar, and John Fogelin, each of whom
provided significant technical input.
After modifying the current benchmark suite as indicated by feedback from you,
I might run the tests against the various RTOS products, create a large-system
equivalent of these tests, or create an equivalent for POSIX-compliant RTOSs.
I welcome your comments and suggestions.
Benchmark Laboratory Setup
The figure shown here illustrates the components necessary to implement the
benchmarks I describe. In general, these include the following: 
Test Target System. The target for this test series is any average
microcontroller meeting the aforementioned target class assumptions. Three I/O
connections to this system are required. The first is a digital "Measure"
output used to indicate that the test system is ready for measurement. The
second is an interrupt request (IRQ) input that will be used to invoke
asynchronous interrupts. The remaining signal is a DhryPass output. Most
microcontroller evaluation boards make suitable test targets.
Benchmark Control/Monitor. An off-the-shelf evaluation board with custom
software will be used to provide stimulus and take performance measurements on
the RTOS test target. The Benchmark Control/Monitor (BenchMon) consists of a
20-MHz Motorola 68332 Evaluation Board. The MC68332 contains an integral
module called the Time Processor Unit (TPU), capable of high-resolution timing
and control of discrete I/O signals. Custom TPU microcode developed for this
benchmark will be used to measure the time between the assertion of signals
and to count the transitions of a signal. Minimum, maximum, and average time
values can be calculated with a 200-nanosecond resolution over millions of
test cycles. Control software running on the BenchMon CPU will interact with
the TPU microcode to control the onset of each test and to display results via
the serial data port.
In-Circuit Emulator. An in-circuit emulator (ICE) is connected to the target
CPU as a suggested part of the test setup. It provides a means of control and
visibility into the target without the inclusion of an RTOS monitor task in
the system. Also, the trace and overlay features of the emulator facilitate
implementing each RTOS and verifying their real-time behavior.
Other Equipment. A workstation capable of compiling target code and operating
the ICE and BenchMon program must be available. Oscilloscopes and other test
equipment may also be needed.
--E.M.
Figure 7: Benchmark Laboratory Setup
Figure 1: RTOSB RoundRobin.
Figure 2: RTOSB Priority.
Figure 3: RTOSB Semaphore test.
Figure 4: RTOSB Memory test.
Figure 5: RTOSB Interrupts.
Figure 6: RTOSB Message.

Listing One
/* RTOS Benchmark Master Control Program
** The main() function here is invoked as the first user task after the RTOS
** has been configured and started. It is responsible for configuring the
** task set and operation mode based on the global test configuration
** parameters. The Dhrystone code is not included here due to its size. 
** However its flow is modified to be as follows.
** Initialize data;
** while( 1 )
** {
** Invoke callback routine supplied when this task started;
** Do normal Dhrystone computations;
** }
** When a Dhrystone task is started, it is supplied with a callback argument
** pointing to a function in this file. That callback is invoked at the end
** of every pass through the Dhrystone loop. Some RTOSes may be very restrictive
** in what can be done during an interrupt. If task control cannot be exerted
** during that time, parts of this code will have to be restructured.
** Caveat emptor: This code has not yet been tested on any RTOS.

** There are subtle differences between this code and the article text.
** The code is more recent.
*/
/* RTOS specific routines compiled separately */
extern void enableSliceModeV(); /* Enable RoundRobin multitasking */
extern void enableTaskModeV(); /* Enable multitasking */
 /* Install a callback in the timer interrupt */
extern void installTimerHandlerV( void (*callbackPF)(void) );
 /* Start a high priority task */
extern void startHighTaskV( void (*callbackPF)(void) );
extern void waitForSignalV( void ); /* For High Priority latency task */
extern void signalHighTask( void ); /* signal high priority task */
 /* Install interrupt handler */
extern void installIntHandler( void (*callbackPF)(void) );
extern void killSelfV( void ); /* Stop current thread */
extern void suspendTaskV( int ); /* Suspend this task */
extern void suspendSelfV( void ); /* Suspend current task */
extern void resumeTaskV( int ); /* Resume a given task */
extern void createSemaphoreV( void *semaphorePV );/* construct a semaphore */
extern void releaseSemaphoreV( void * );
extern void * requestSemaphorePV( void );
 /* Starts one task at a given priority,
 ** passes it the given callback, rtns ID */
extern int startTaskN( int priorityN, void (*callbackPF)(void) );
 /* Starts countN tasks at same priority */
extern void startRRTasksV(int priorityN, int countN, void
(*callbackPF)(void));
 /* Starts countN tasks at different priority */
extern void startTasksV(int priorityN, int countN, void (*callbackPF)(void) );
extern void rtosFreeV( void * ); /* free a block */
extern void *rtosAllocPV( void ); /* allocate a block */
extern void waitMsgV( void ); /* wait for message from someone */
extern void sendMsgV( void ); /* send message to next task */
extern void sendFirstMsgV( void ); /* send message to first task */
/* CPU specific macros */
#define DISABLE_INTERRUPTS /* Assembler statement for disable */
#define ENABLE_INTERRUPTS /* Assembler statement for enable */
/* Target board specific functions compiled separately */
extern void resetMeasureV( void ); /* de-asserts the Measure output */
extern void startMeasureV( void ); /* Asserts the Measure output */
extern void dhrypulseV( void ); /* Toggles the Dhrypulse output */
/* Compiler specific definitions */
#define interrupt
#define MAXTASKS 20 /* Maximum number of tasks in list */
#define MAXALLOC 64 /* Maximum number of allocated blocks */
#define MINALLOC 16 /* Minimum number of allocated blocks */
enum TEST
{
 BASELINE = 1,
 ROUNDROBIN = 2,
 PRIORITY = 3,
 SEMAPHORE = 4,
 MEMORY = 5,
 BASEINTLAT = 6,
 INTLATENCY = 7,
 MESSAGE = 8
};
/* Global test configuration values set by emulator at startup */
enum TEST testN; /* Determines which test to run */
int taskCountN; /* Number of tasks (where appropriate) */

int highPriorityIsBig; /* Selects priority direction for OS */
int basePriN; /* Starting task priority */
enum STATE
{
 NotStarted = 0,
 Running,
 Suspended
};
enum MODE
{
 Suspend,
 Resume
};
struct TASKLISTENT
{
 int idN; /* numeric task ID */
 int priN; /* task priority (bigger is higher) */
 enum STATE stateN; /* Current state */
} taskListAH[MAXTASKS];
static void *semaphorePV; /* generic semaphore pointer */
static int resumeN; /* resume flag for semaphore test */
static int taskN; /* Task ID */
static int allocN, allocFillN; /* allocation table indices */
static int messageCountN; /* determines when and how many */
static void *allocAP[MAXALLOC]; /* Allocation table */
/* Gray Decode Table */
static int const grayTabAN[MAXALLOC] = 
{
 0, 1, 3, 2, 6, 7, 5, 4, 12, 13, 15, 14, 10, 11, 9, 8,
 24, 25, 27, 26, 30, 31, 29, 28, 20, 21, 23, 22, 18, 19, 17, 16,
 48, 49, 51, 50, 54, 55, 53, 52, 60, 61, 63, 62, 58, 59, 57, 56,
 40, 41, 43, 42, 46, 47, 45, 44, 36, 37, 39, 38, 34, 35, 33, 32
};
/* Function declarations */
static void performanceHandlerV( void );
static void nullCB( void );
static void semaSuspendV( void );
static void semaphoreV( void );
static void semaphoreHandlerV( void );
static void memoryHandlerV( void );
static void memoryCB( void );
static void latencyHandlerV( void );
static void interrupt intHandlerV( void );
static void interrupt latencyHandlerV( void );
static void latencyTaskV( void );
static void waitSendMsgV( void );
static void messageCB( void );
static void messageHandlerV( void );
/* The entry point for the first user task */
void mcp_main( void )
{
 int i;
 resetMeasureV(); /* Reset Measure output port */
 switch(testN)
 {
 case BASELINE:
 startTaskN( basePriN, nullCB); /* Start 1 task, minimal callback */

 DISABLE_INTERRUPTS; /* No more interrupts */
 startMeasureV(); /* Begin measurement */
 break;
 case ROUNDROBIN: /* Run with various timer tick rates */
 enableSliceModeV(); /* Enable RR multitasking */
 /* Start countN tasks, min. callback */
 startRRTasksV(basePriN, taskCountN, nullCB);
 startMeasureV(); /* Begin measurement */
 break;
 case PRIORITY:
 enableTaskModeV(); /* Enable multitasking */
 /* Start countN tasks, min. callback */
 startTasksV(basePriN, taskCountN, nullCB);
 /* Install routine in timer int. handler */
 installTimerHandlerV( performanceHandlerV );
 startMeasureV(); /* Begin measurement */
 break;
 case SEMAPHORE:
 createSemaphoreV( &semaphorePV ); /* construct a semaphore */
 /* Start a normal task that uses a semaphore */
 enableTaskModeV(); /* Enable multitasking */
 startTaskN( basePriN, semaphoreV );
 /* Then start a higher priority task that suspends itself
 ** and then resumes after every timer tick. */
 resumeN = 0; /* prevent next task from going far */
 if( highPriorityIsBig )
 taskN = startTaskN( basePriN + 1, semaSuspendV );
 else
 taskN = startTaskN( basePriN - 1, semaSuspendV );
 installTimerHandlerV( semaphoreHandlerV );
 startMeasureV(); /* Begin measurement */
 break; 
 case MEMORY:
 for( i = 0; i < MAXALLOC; i++ )
 allocAP[i] = 0; /* clear allocation table */
 allocFillN = 0; /* initialize index */
 allocN = -1; /* No allocations yet */
 enableSliceModeV(); /* Enable RR multitasking */
 startRRTasksV(basePriN, taskCountN, memoryCB);
 installTimerHandlerV( memoryHandlerV );
 startMeasureV(); /* Begin measurement */
 break;
 case BASEINTLAT:
 startTaskN( basePriN, nullCB );
 installIntHandler( intHandlerV ); /* Handler pulses Measure */
 break;
 case INTLATENCY:
 for( i = 0; i < MAXALLOC; i++ )
 allocAP[i] = 0; /* clear allocation table */
 allocFillN = 0; /* initialize index */
 allocN = -1; /* No allocations yet */
 enableSliceModeV(); /* Enable RR multitasking */
 startRRTasksV(basePriN, taskCountN, memoryCB);
 installTimerHandlerV( memoryHandlerV );
 /* At this point we have a good system load. Now
 ** set up the high priority task and the interrupt handler. */
 startHighTaskV( latencyTaskV );
 installIntHandler( latencyHandlerV );
 break;

 case MESSAGE:
 startRRTasksV(basePriN, 9, waitSendMsgV ); /* start msg tasks */
 /* start Dhrystone/initiator task */
 startRRTasksV(basePriN, 1, messageCB );
 installTimerHandlerV( messageHandlerV ); /* message trigger */
 break;
 default:
 break;
 }
 killSelfV(); /* Remove this thread */
}
/* Function: addNewTaskV
** Purpose: Callback from RTOS specific task start routines. This function
** updates the active task list.
*/
void addNewTaskV( int idN, int priorityN )
{
 static int startedTasksN = 0;
 taskListAH[startedTasksN].idN = idN;
 taskListAH[startedTasksN].priN = priorityN;
 taskListAH[startedTasksN].stateN = Running;
 startedTasksN++;
}
/* Function: performanceHandlerV
** Purpose: Called from the RTOS timer interrupt code. This routine
** suspends and resumes tasks in the task list.
*/
void performanceHandlerV( void )
{
 static enum MODE modeN = Suspend;
 int i, theTaskN, itsPriN, tasksN;
 if( taskCountN == 1 ) return; /* don't do anything if just 1 task */
 tasksN = taskCountN;
 if( modeN == Suspend )
 { /* search for the highest priority task that is still running */
 if( highPriorityIsBig ) itsPriN = 0;
 else itsPriN = 32767;
 for( i = 0; i < taskCountN; i++ )
 {
 if( taskListAH[i].stateN == Running )
 { /* if the task is running, check its priority */
 if( highPriorityIsBig )
 {
 if( taskListAH[i].priN > itsPriN )
 {
 itsPriN = taskListAH[i].priN;
 theTaskN = i; /* remember which task is highest */
 }
 }
 else
 { /* Low numbers are higher priority */
 if( taskListAH[i].priN < itsPriN )
 {
 itsPriN = taskListAH[i].priN;
 theTaskN = i; /* remember which task is highest */
 }
 }
 }
 }

 suspendTaskV( taskListAH[theTaskN].idN ); /* suspend highest task */
 taskListAH[theTaskN].stateN = Suspended;
 tasksN--;
 if( tasksN == 1 ) modeN = Resume;
 } /* End of mode == suspend */
 else /* Mode must be Resume */
 {
 if( highPriorityIsBig ) itsPriN = 32767;
 else itsPriN = 0;
 tasksN = 0;
 for( i = 0; i < taskCountN; i++ )
 {
 if( taskListAH[i].stateN == Suspended )
 { /* if the task is suspended, check its priority */
 if( highPriorityIsBig )
 {
 if( taskListAH[i].priN < itsPriN )
 {
 itsPriN = taskListAH[i].priN;
 theTaskN = i; /* remember which task is lowest */
 }
 }
 else
 { /* Low numbers are higher priority */
 if( taskListAH[i].priN > itsPriN )
 {
 itsPriN = taskListAH[i].priN;
 theTaskN = i; /* remember which task is lowest */
 }
 }
 }
 else tasksN++; /* Count number of running tasks */
 }
 resumeTaskV( taskListAH[theTaskN].idN ); /* resume the task */
 taskListAH[theTaskN].stateN = Running; /* mark it Running again */
 tasksN++;
 if( tasksN == taskCountN ) /* If all tasks are now running */
 modeN = Suspend; /* Switch back to suspend mode */
 }
}
/* Function: nullCB
** Purpose: Do nothing call back
*/
void nullCB( void )
{
}
/* Function: semaphoreV
** Purpose: Callback for normal priority semaphore task. Releases semaphore
** (if owned), then grabs it right back. If a higher priority task has 
** requested it, then I guess we won't come right back from the release.
*/
static void semaphoreV( void )
{
 static int haveIt = 0;
 if( haveIt ) /* No semaphore to release the first time */
 releaseSemaphoreV( semaphorePV );
 semaphorePV = requestSemaphorePV();
 haveIt = 1;
}

/* Function: semaSuspendV
** Purpose: Callback for high priority semaphore task. Releases semaphore 
** (if owned) and then suspends itself. When resumed, it requests semaphore.
*/
static void semaSuspendV( void )
{
 static int haveIt = 0;
 if( haveIt )
 {
 releaseSemaphoreV( semaphorePV );
 suspendSelfV();
 }
 semaphorePV = requestSemaphorePV();
 haveIt = 1;
}
/* Function: semaphoreHandlerV
** Purpose: Called during a timer interrupt. Resumes the desired task. */
static void semaphoreHandlerV( void )
{
 resumeTaskV( taskN );
}
/* Function: memoryHandlerV
** Purpose: Invoked from the timer interrupt handler. This routine places an
** index of an allocation table entry in a location that can be read by the 
** memory callback routine. This routine has alloc and free modes. In alloc 
** mode, it finds the index of the first non-allocated entry. In free mode,
** it uses a Gray code hash to get the index of an entry to be freed.
*/
static void memoryHandlerV( void )
{
 static int freeModeN = 0;
 int i;
 if( ! freeModeN )
 { /* If allocating blocks */
 for( i = 0; allocAP[i]; i++ ) ; /* Scan for unallocated slot */
 allocN = i; /* set this for callback routine */
 allocFillN++; /* track number of allocated blocks */
 if( allocFillN == MAXALLOC )
 freeModeN = 1;
 }
 else
 { /* Must be freeing blocks, use Gray code index */
 allocN = grayTabAN[--allocFillN];
 if( allocFillN == MINALLOC ) /* if we hit lower allocation limit */
 freeModeN = 0;
 }
}
/* Function: memoryCB
** Purpose: Called from the dhrystone loop. Looks at the allocAP[allocN] entry
** and requests an alloc or free depending on whether the value is 0.
*/
static void memoryCB( void )
{
 if( allocAP[allocN] )
 { /* if this entry is allocated */
 rtosFreeV( allocAP[allocN] );
 allocAP[allocN] = 0;
 }
 else /* entry was null, assign it a block */

 allocAP[allocN] = rtosAllocPV();
}
/* Function: intHandlerV
** Purpose: invoked directly by the discrete interrupt input
** Used for establishing baseline interrupt latency
*/
static void interrupt intHandlerV( void )
{
 startMeasureV(); /* Asserts the Measure output */
 resetMeasureV(); /* de-asserts the Measure output */
}
/* Function: latencyHandlerV
** Purpose: invoked directly by the discrete interrupt input. Used for 
** establishing baseline interrupt latency. Signals task waiting for interrupt
*/
static void interrupt latencyHandlerV( void )
{
 signalHighTask(); /* assert signal to waiting task */
}
/* Function: latencyTaskV
** Purpose: Runs as a thread, waits on a signal and then pulses the
** measure output. The test measures the time from the onset
** of the interrupt to the assertion of the measure output.
*/
static void latencyTaskV( void )
{
 while( 1 )
 {
 waitForSignalV(); /* Hang here waiting for interrupt */
 startMeasureV(); /* Asserts the Measure output */
 resetMeasureV(); /* de-asserts the Measure output */
 }
}
/* Function: waitSendMsgV
** Purpose: Self-contained thread that lives for message exchange. The RTOS
** specific message functions determine the recipient of each message sent, so
** that all running tasks send and receive messages. These tasks are started
** from a normal Dhrystone loop but never return, so they don't contribute to
** the Dhrystone rate.
*/
static void waitSendMsgV( void )
{
 while( 1 )
 {
 waitMsgV(); /* wait for message from someone */
 sendMsgV(); /* send message to next task */
 }
}
/* Function: messageCB
** Purpose: Called from the dhrystone loop. If the messageCountN
** value has changed, initiate a message loop.
*/
static void messageCB( void )
{
 static int lastCountN = 0;
 if( messageCountN != lastCountN )
 { /* if it's time to do some messages */
 if( (lastCountN = messageCountN) == 2 )
 { /* if sending/receiving 2 messages */

 sendFirstMsgV();
 sendFirstMsgV();
 waitMsgV();
 waitMsgV();
 }
 else
 { /* just sending one message */
 sendFirstMsgV();
 waitMsgV();
 }
 }
}
/* Function: messageHandlerV
** Purpose: Called from the timer interrupt handler. Bounces the global
** message counter between 1 and 2.
*/
static void messageHandlerV( void )
{
 if( ++messageCountN == 3 )
 messageCountN = 1;
}
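Incidentally, the Gray decode table near the top of Listing One is the standard binary-reflected Gray code, gray(i) = i ^ (i >> 1), so it could be generated at startup rather than hard-coded. A minimal sketch (not part of the original listing):

```c
#define MAXALLOC 64 /* Maximum number of allocated blocks */

/* Binary-reflected Gray code: entry i is i XOR (i >> 1). This
** reproduces the 64 hard-coded values of grayTabAN in Listing One. */
static int grayTabAN[MAXALLOC];

void buildGrayTableV( void )
{
    int i;
    for( i = 0; i < MAXALLOC; i++ )
        grayTabAN[i] = i ^ (i >> 1);
}
```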










































Writing a Portable Transport-Independent Web Server


TLI makes HTTP transport independent




John Calcote


John is an engineer on Novell's Directory Services team. He can be contacted
at jcalcote@novell.com.


Like all standards, HTTP is being asked to do things its designers never
considered--implementing reliable user authentication for secure business
transactions over the Internet, displaying information in newly created
formats, and the like. Consequently, new versions of the standard are under
development in an attempt to solve emerging problems without losing backward
compatibility.
One of the more solvable limitations of most contemporary HTTP servers (those
based on the NCSA or CERN implementations) is that they are implemented on top
of Berkeley Sockets, and thus closely tied to TCP/IP. To address this problem
in a recent project, I chose to implement my Web server on top of the
Transport Layer Interface (TLI). Because TLI is not a transport service
provider (TSP), but rather an abstraction layer over various TSP's, my Web
server will still communicate with existing Sockets-based browsers, as long as
I choose to listen on a TCP socket for Web requests. As it turns out, TLI
source code is 98 percent transport independent (even transport unaware). I
demonstrate this fact in the accompanying source code (available
electronically; see "Availability," page 3) by creating two listen threads,
one for TCP/IP and one for Novell's SPX II. Both types of requests are mapped
to TLI file handles, and are serviced by the same Web server code path.


The OSI Communications Model


TLI was developed by AT&T in an effort to solve some of the problems
associated with Berkeley Sockets and to stratify some of the many inconsistent
communication interfaces in the UNIX kernel at the time. In the first place,
BSD Sockets are actually part of the operating-system kernel. The BSD Sockets
functions are system calls. The left side of Figure 1 shows the seven layers
of the OSI communications model. The right side indicates the dividing line
between those portions of the model that are implemented in the
operating-system kernel and those portions implemented in application space.
BSD Sockets are implemented at the network layer of the OSI model. Because of
this design, the interface is "hard-wired" into the TCP/IP and UDP/IP protocols.
TLI is implemented at the OSI model's transport layer. These functions
actually become a part of the network application. TLI is implemented as a
library of functions that are either linked into the application statically at
compile time or dynamically at run time in the form of a loadable DLL. In
either case, the code for TLI runs in the same address space as the
application, rather than in kernel space. The implications of this difference
are subtle. First, since TLI is not part of the kernel, some method must be
provided for it to communicate with system resources and networking hardware.
Since portability was a major goal in TLI's design, the UNIX STREAMS interface
was chosen as a system-level interface because of its multipurpose, generic
functionality. Most ports of TLI are implemented on top of a port of the
STREAMS subsystem (although this is not a requirement), which provides an
efficient, integrated, full-duplex pipeline to kernel device drivers.
As Figure 2 shows, there are actually three components to TLI: 
User-level library, containing the application-visible interface.
Transport-service provider, chosen by the programmer when a TLI end point is
opened.
STREAMS subsystem, providing the POSIX open(), close(), read(), write(), and
ioctl() functions necessary to establish network-communication channels in a
manner consistent with file-system access. 
TLI is implemented as a state machine. The state of a TLI end point at any
given time depends on a number of factors, including the initialization stage
of the end point and asynchronous events that TLI has received on the end
point.


Local Management


Local management of TLI data structures is straightforward. The t_open()
function initializes and returns an OS file descriptor based on a particular
Transport Service Provider (TSP), such as TCP or Novell's SPX II. How you
specify the TSP is implementation dependent, but generally, it takes the form
of a string containing a UNIX-like file-system path to the desired device;
for example, /dev/tcp or /dev/nspx.
Once a file descriptor has been opened and assigned to a TSP, it may be bound
to a specific or arbitrary network address using t_bind(). This phase is one
of the few places in TLI application code that is transport specific; however,
the transport-specific information is concise enough to be passed
into a generic routine as a parameter. For instance, an OpenEndPoint()
function might take a string signifying the desired transport provider. The
t_bind structure contains two fields, a netbuf structure and an integer, qlen,
used to specify the maximum number of simultaneous, outstanding indications a
server is willing to handle; see Example 1.
The netbuf structure is generic enough to handle any sort of
transport-specific network address or data imaginable. It is basically a sized
buffer of variable length. The maxlen field tells TLI how large the buffer
actually is (allocation size), while the len field tells the application how
much data was returned by TLI.
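The transport-specific part of binding boils down to the sockaddr placed in the netbuf. As a sketch of the idea (the t_bind and netbuf declarations below are reproduced from the TLI specification only so the fragment is self-contained; real code gets them from <tiuser.h> or <xti.h>, and prepareTcpBind is a hypothetical helper):

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* struct sockaddr_in, htons() */

/* As defined by the TLI specification (normally from <tiuser.h>). */
struct netbuf { unsigned int maxlen; unsigned int len; char *buf; };
struct t_bind { struct netbuf addr; unsigned int qlen; };

/* Fill in a t_bind request for a TCP end point. Only the sockaddr_in
** in the netbuf is transport specific; the rest is generic TLI. */
void prepareTcpBind( struct t_bind *req, struct sockaddr_in *sin,
                     unsigned short port, unsigned int qlen )
{
    memset( sin, 0, sizeof *sin );
    sin->sin_family = AF_INET;
    sin->sin_port = htons( port );        /* e.g., 80 for HTTP */
    req->addr.buf = (char *)sin;
    req->addr.maxlen = req->addr.len = sizeof *sin;
    req->qlen = qlen;  /* max simultaneous outstanding indications */
}
```

The filled-in structure would then be handed to t_bind() along with the descriptor returned by t_open().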
Once bound to a network address, a TLI file descriptor may be used by a client
to request a connection with t_connect(), or it may be used by a server to
listen for a connect indication with t_listen(). When t_listen() returns,
signifying that a connect indication has arrived, the server may accept the
indication on the server socket or on another opened and bound end point. If
the server accepts on the server socket, then no other indications may be
serviced until the session has been terminated, at which point the server may
reopen the socket and return to a blocked state by calling t_listen(). If the
server accepts on another end point (as is typical), then it may immediately
return to t_listen() on the server socket after calling t_accept() and passing
the new end point to a responder thread or process.
The t_accept() function takes a listening-file descriptor, a receiving-file
descriptor, and the t_call structure used in t_listen(). This structure
contains three netbuf structures and an integer value representing the
indication sequence number; see Example 2. For most situations the opt and
udata buffers are not used. The addr buffer contains the address of the
calling client, in a transport-specific format, and the sequence value
specifies which of the incoming indications this call is associated with.
The trivial case of qlen is 1. If qlen is set to 1 when a server port is bound
to a TLI end point, then all attempts to connect to the server on that
descriptor while it is processing a previously received connect indication are
rejected by the underlying TSP. This case is far easier to code, but makes for
a rather nonrobust server. If qlen is greater than 1, then outstanding connect
indications must be queued up by the server while it is processing the current
indication. The difficulty in coding servers using a qlen greater than 1 lies
in the fact that asynchronous incoming indications must be gathered off the
wire before t_accept() will work. The server is notified of such an
asynchronous event by t_accept() failing with an error code of TLOOK. At this
point, the server needs to call t_look() to retrieve the event, and then
process it according to the nature of the event.
Two possible events may occur during t_accept(). The T_LISTEN event indicates
that another connect indication has arrived, while the T_DISCONNECT event
means that a client previously queued up for connect wants to cancel its
connect indication. If the event is T_LISTEN, you call t_listen() with a new
call structure and queue it up for later processing. If the event is
T_DISCONNECT, you call t_rcvdis() to retrieve the sequence number of the
disconnecting client, scan the indication queue for the t_call block
containing this sequence number, and then remove it from the waiting queue.
You then return to t_accept() to try accepting the original connect indication
again. Asynchronous events could keep you from accepting the current
indication until you have filled your outstanding indication queue with qlen
connect indications.
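The bookkeeping this requires is just a small queue keyed by sequence number. Here is a sketch of that queue logic alone, with the actual t_listen()/t_rcvdis() calls stubbed out (the function and variable names are mine, not from any TLI implementation):

```c
#define QLEN 5   /* qlen chosen at bind time */

/* Outstanding-indication queue: one slot per queued connect
** indication, identified by its TLI sequence number. */
static int pendingSeqAN[QLEN];
static int pendingN = 0;

/* T_LISTEN event: a t_listen() call produced a new indication. */
int queueIndicationN( int seqN )
{
    if( pendingN == QLEN ) return -1;   /* queue full */
    pendingSeqAN[pendingN++] = seqN;
    return 0;
}

/* T_DISCONNECT event: t_rcvdis() yields the sequence number of the
** departing client; scan the queue and remove the matching entry. */
int dropIndicationN( int seqN )
{
    int i;
    for( i = 0; i < pendingN; i++ )
        if( pendingSeqAN[i] == seqN )
        {
            pendingSeqAN[i] = pendingSeqAN[--pendingN];
            return 0;
        }
    return -1;  /* no such indication queued */
}
```

After either event is absorbed, the server returns to t_accept() and retries the original indication.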
When you have finally accepted the indication (establishing the connection on
a new end point), you either fork a new process to handle the client's
request, or begin a new thread in a multithreaded environment, passing the
client file descriptor.
The server must handle all outstanding connect indications before returning to
t_listen() to wait for another incoming indication. It's quite apparent that
on a busy day the server may never get back to blocking on t_listen()!


TLI File Descriptors


As in Berkeley Sockets, the end points created by TLI are operating-system
file descriptors. TLI opens such a descriptor by calling the STREAMS open()
function, and pushing the TIMOD STREAMS module onto the stream for this
descriptor. This is where the real fun begins. The request-handler thread
cannot tell the difference between a TLI file descriptor and a standard file
descriptor, because all file descriptors in the UNIX environment are STREAMS
based. This means that the standard C library function fdopen() may be called
on the descriptor to open an I/O stream buffer for the client file handle.
(Note that these standard C library functions are not at all related to the
UNIX STREAMS interface, which is actually based on POSIX.) Once this is done,
all of the standard file I/O functions may be used on the descriptor. Input is
gathered from the client using fgets(), fgetc(), or fread(), and data may be
sent to the client using fprintf(), fputs(), fputc(), or fwrite().
Furthermore, in the UNIX heavyweight-process environment, it is common for
the server to redirect the child process's stdio to the client's network file
descriptor before forking, by using the POSIX dup2() function to duplicate the
client descriptor on the system's stdin and stdout file descriptors. This
trick allows the responder to retrieve all client input using the even simpler
stdio interface provided in gets() and printf(). It makes for some really
interesting network application code to see all client I/O being handled with
stdio functions.
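The whole trick rests on dup2() semantics: after the call, two descriptors refer to the same open file. A minimal, self-contained demonstration on a pipe (using an arbitrary descriptor number rather than stdin/stdout, so the sketch can run anywhere; a real server would dup2() the client descriptor onto STDIN_FILENO and STDOUT_FILENO before forking the responder):

```c
#include <unistd.h>   /* pipe(), dup2(), read(), write(), close() */

/* Duplicate the write end of a pipe onto descriptor 42 and show that
** writes through the duplicate arrive on the pipe's read end.
** Returns the number of bytes read back, or -1 on error. */
int demoDup2( char *out, int outlen )
{
    int pfd[2];
    int n;

    if( pipe( pfd ) < 0 ) return -1;
    if( dup2( pfd[1], 42 ) < 0 ) return -1;  /* 42 now aliases pfd[1] */
    if( write( 42, "GET /", 5 ) != 5 ) return -1;
    close( 42 );
    close( pfd[1] );
    n = (int)read( pfd[0], out, (size_t)outlen );
    close( pfd[0] );
    return n;
}
```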
When a session is complete, the server simply calls fclose() on the file
pointer to terminate the session. If stdio is redirected, as in the UNIX
server environment, the startup code of the responder process automatically
calls fclose() on the stdio file descriptors when the main function returns.
The HTTP protocol depends on orderly release of the connection for correct
operation. Beware that some ports of TLI do not correctly handle orderly
release of the connection when buffered I/O is used in this manner. Since TLI
is an open specification, not all TLI ports implement the full functionality
of the specification. The specification even documents those portions that are
optional. The way to handle this deficiency is to use the t_sndrel() and
t_rcvrel() functions on the embedded file descriptor before calling fclose()
on the file pointer. 
Essentially, when the server is finished sending data, it calls t_sndrel(),
and then blocks on a call to t_rcvrel() while waiting for the client to
indicate that it has finished reading this data. The server will wake up when
the client either calls t_sndrel(), or aborts the connection with t_close().
This handshake ensures that the client has received and processed all the data
sent by the server before the connection is torn down. By definition, an
abortive release on a TLI file descriptor will truncate all data not
explicitly received by the client.


The Hypertext Transfer Protocol


HTTP is a line-oriented, ASCII protocol. The most common method of handling
I/O in the responder is to call fgets() to retrieve the client's request
strings, and then send data back using fprintf() or fputs().

There are several (ever-changing) types of HTTP requests, including GET, HEAD,
PUT, POST, LINK, UNLINK, and DELETE. Of these, GET is by far the most popular,
followed by POST. Since 95 percent of all requests are GET requests, I'll
limit my discussion to it.
An HTTP GET request is made up of a single request line, followed by several
MIME header fields, and then a blank line. If, as in the case of POST or PUT,
the request contains any data, then one of the MIME header fields should be
"content-length." The value in this field will specify how many bytes of data
are to follow the blank line. Since GET requests don't contain data, this is
not a concern for us. You simply read until you hit a blank line, and then you
can start sending your response. In reality, since these TLI sessions are full
duplex, you can send data before you have retrieved it all. This means that
you could read the request line, and ignore any MIME headers sent by the
client. Some of the most-common MIME headers are listed in Table 1. An HTTP
response is formatted in a manner similar to a request. The first line
contains a signature string, a status code, and a human-readable form of the
status. A valid "OK" response might look like "HTTP/1.0 200 Here she comes!". 
After the status-code line, the server sends zero or more MIME headers
indicating the type and length of the file being sent. Again, a blank line
follows the header, and then the file data itself. At the very least, a server
should send a content-type header field. If a content-length header field is
sent, the client can display a progress bar to the user as the transfer is
taking place. 
Writing an entire Web server is a topic for more than one article, but the
front-end server code is a large portion of any server (and a daunting one at
that). The sample code in the source accompanying this article should give you
a good headstart if you're planning on using TLI for a transport interface.
Like all programming interfaces, TLI has a learning curve, but it is well
documented and was designed to be easy to learn and easy to use.


References


Graham, Ian S. The HTML Sourcebook. New York, NY: John Wiley & Sons, 1995.
NLM Transport Interfaces (included with the Network C for NLMs Software
Developer's Kit). Novell Inc.: September 1991. 
Rago, Stephen A. Unix System V Network Programming. Reading, MA:
Addison-Wesley, 1993.
The Novell Software Developer's Kit CD, volume 5. Novell Inc.: 1995. 
Figure 1: OSI communications model.
Figure 2: TLI component block diagram. 
Example 1: The t_bind structure contains two fields, a netbuf structure and
an unsigned integer, qlen.
struct netbuf {
 unsigned int maxlen;
 unsigned int len;
 char *buf;
};
struct t_bind {
 struct netbuf addr;
 unsigned int qlen;
};
Example 2: The t_call structure used in t_listen(). 
struct t_call {
 struct netbuf addr;
 struct netbuf opt;
 struct netbuf udata;
 int sequence;
};
Table 1: Common HTTP request headers (top) and response headers (bottom).
Accept:
Accept-encoding:
Authorization:
Content-Length: (POST or PUT requests only)
Content-Type: (POST or PUT requests only)
From:
If-Modified-Since:
Pragma:
Referrer:
User-Agent:

Content-Encoding:
Content-length:
Content-Transfer-Encoding:
Content-type:
Date:
Expires:
Last-modified:
MIME-version:
Server:
Title:
WWW-Authenticate:
WWW-Link:



































































Improving Usenet News Performance


A message-passing operating system is one place to start




Robert Krten


Robert is a principal with PARSE Software Devices. He can be contacted at
rk@parse.com.


Usenet is a worldwide network of computers that allows users to post and read
messages (articles) categorized hierarchically into topical groups
(newsgroups); for instance, comp.os.qnx is the newsgroup for the QNX operating
system, within the "os" (operating systems) branch of the "comp" (computers)
hierarchy.
In this article, I'll examine how Usenet news works on existing UNIX systems.
I'll then describe how the speed and efficiency of this arrangement can be
improved under the QNX operating system.


How Does News Work?


People from around the world post articles to various newsgroups. News servers
then distribute these articles to neighboring machines. Neighboring machines
distribute the articles to other machines until the article has been
propagated around the world. Machines check incoming articles to see if they
already have a copy, and delete the incoming article if they do. I won't
detail the machines' methods of getting articles, except to say there is a
program on the user's system that is responsible for getting articles from
other machines and storing them. 
For instance, a typical system accepts articles for about 4000 newsgroups
(this is by no means a huge system, either!). As each article comes in, the
article's header is scanned, and the news software determines which newsgroup
the article should be stored in.
Before news traffic escalated, it seemed like a good idea to store one article
per file. Consequently, newsgroup names were simply converted into pathnames.
For example, the 1143rd article in the newsgroup comp.os.qnx might be stored
in a file called "/usr/spool/news/comp/os/qnx/1143." The next article that
came in for that newsgroup would go into /usr/spool/news/ comp/os/qnx/1144 and
so on.
There are a number of reasons why this approach is no longer ideal. For one
thing, articles can arrive in any order: We're not always going to get all of
the articles for comp.os.qnx, then all of the articles for comp.os.rsx11, and
then comp.os.svr4. This means that as articles arrive, the news storage
program is creating files in an ad hoc manner all over the disk. Additionally,
it creates anywhere from a few thousand to several hundred thousand files per
day, depending on the size of the feed!
Given the volume of news these days, disks can fill up quickly. Consequently,
most news systems have an "expiry policy." How long do you retain articles
before deleting them? This usually depends upon two factors--how much disk
space you have and which newsgroups you value. If the disk is never to
overflow, the volume of news coming in must, on average, equal the volume of
news being expired. On a typical news system, the expiry processing is done
by a batch file that runs periodically, say, once or twice a day. With current
implementations, expiry processing can take a few hours. Not only are there
many files being created in random places on the disk each second, there also
are roughly equal numbers of files being deleted from different random places
each second. This is suboptimal. 
Once the files are in their appropriate places, it is hoped that a large
number of users will read a large percentage of the articles that have been
stored. Sadly, the vast majority of news articles are never read. They simply
expire.
To make matters worse, the news systems in common use end up copying each
article many times over before placing it in its final resting place. Then,
when it comes time to expire the article, an additional read of the article
takes place; see Figure 1.


How Can This Process Be Made Better?


Rather than using a general-purpose file system to (suboptimally) manage a
news system, you can use your knowledge of how these files are created, read,
and deleted to create a file system uniquely tailored to handling this
information. However, the drawback of a "new" filing strategy is that a huge
body of free software available on the Internet is oriented toward the current
organization (the /usr/spool/news structure, for instance).


The KNews Program


Still, there are situations when you can leverage operating-system features to
improve things. KNews, the program I present here, takes advantage of some
features of the QNX operating system to more efficiently handle news articles.
Under KNews, one or more programs get and store articles, while a different
program implements a "virtual file system" to keep track of the articles.
The first program (the newsfeeder) knows how to get articles from other
machines. (There actually are two main variants of the newsfeeder: one that
knows how to get articles from NNTP-based hosts, and another that knows how to
get articles from UUCP/rnews feeds.) When the newsfeeder gets an article, it
looks at the header. In conjunction with the local expiration policy file, the
newsfeeder program immediately determines when the article will expire. The
article is then appended to a file that has other articles that expire at that
same time. When the article is placed into the file, the article's position
and its length are noted.
For example, assume you were examining article 677, posted to ott.forsale on
January 15, 1996, at 10:34 GMT. The expiration period for that particular
newsgroup is four days. Consequently, you know that the article will expire on
January 19, 1996, at 10:34 GMT. Therefore, you store the article at the end of
a file called 960119.10 (ignore the :34 because you've defined the expiry
granularity to be one hour). You then make a note of the article's position
within the storage file, along with its length and number: ott.forsale,
art#677, exp 1996 01 19 10:00, pos 36554, len 495.
By the time you received a day's worth of news, you would have created a
number of files, each containing collections of articles. The number of files
created is a function of the expiration policy and the expiration granularity.
For example, with a maximum expiration of ten days and expiration granularity
of six hours, you would create 40 files (10*24/6). This is three to four
orders of magnitude fewer files than conventional means create!


Retrieving Articles


Okay, so now all the articles are in a few big files--which I call
expiry-ordered heap (EOH) files--stored in a directory. By convention, this
directory is /usr/spool/knews. What you really want to do is have the
operating system present a /usr/spool/news directory structure to all
applications. This way, none of the applications have to be modified to work
with KNews. What this means, in effect, is that you map files under
/usr/spool/news into portions of your EOH files; see Figure 2.
The newsfeeder program already provides the information you need: the
newsgroup name, article number, EOH file name, position within the file, and
length of the article.
Under most operating systems, creating a virtual file system that does this
kind of mapping is a major undertaking. Under the QNX operating system, it
simply involves writing a few hundred lines of code.


Taking Over /usr/spool/news


The first step to implementing a virtual file system is to fake the
/usr/spool/news directory structure. To applications programs, it must appear
that the directory exists and operates just like a disk-based file system
would.

Our virtual file system (VFsys.knews) should export the illusion of two main
object types--subdirectories and files--and it should support read-only
attempts to access these objects; see Example 1. Since it is a read-only file
system, you'll also need some method of telling VFsys.knews about new
articles, by giving it the article number, expiry date, offset, and length.


A Little Background


QNX is a microkernel, message-passing operating system with most services
implemented via a message. For example, when an application (client) opens a
file, the application's C library open code places the arguments passed to the
open() call into a message, then sends this message to the process that is
managing the file system, Fsys (server).
Fsys decides if the open should succeed or not (based on permissions,
existence of the file, or whatever) and returns a status. Sometime later, the
client might request a read of 200 bytes. Again, the client's library composes
a message, which it then sends out to the server, requesting that a read be
performed. The server will go out to disk, then return the data to the client.

Fsys isn't the only program that is allowed to receive and process open and
read calls, though. Under QNX there is no sacred attribute associated with
programs like Fsys versus user-written programs. However, you can do a neat
trick under QNX that you can't easily do under other operating systems: You
can write a program that assumes responsibility for a portion of the pathname
space. This functionality is a subset of what is generally termed a "resource
manager" under QNX.


What Does a Resource Manager Have to Do?


To successfully take over /usr/spool/news, you have to: 
1. Tell someone that you are now responsible for a portion of the pathname
space (that is, you register yourself).
2. Handle requests from clients (open, read, close, and so on).
3. Handle other messages from a cooperating newsfeeder program (new article
has arrived, expire, and so on).
In order to implement step 1, the function resmgrInit in resmgr.c (available
electronically; see "Availability," page 3) calls qnx_prefix_attach to
associate the program with a particular pathname prefix; in this case,
/usr/spool/news. This causes any process opening files and directories in this
portion of the pathname space to send its messages to our process instead of
the process managing the actual disk drive.
To implement steps 2 and 3, the function serve in resmgr.c waits for messages
from clients (the Receive function call). This is a blocking call: The program
does not continue past Receive until a message has been received. Once you get
a message, you call the appropriate routine from the jump table. You can
receive a whole range of messages, each corresponding to either some
C-function call that a client can execute, or some of the special newsfeeder
messages. In terms of regular messages, you receive messages corresponding to
the open, read, write, readdir, and stat calls. To implement the virtual file
system, you just have to handle these calls. 
The open call specifies which file is being opened. Because you can get
requests from multiple clients (for example, three different terminal sessions
can all do an ls on some portion of /usr/spool/news at the same time), you
have to track processing based upon who is sending the message. So, in the
open handler you will notice an open control block (OCB) associated with the
particular request, via map_ocb. This OCB holds the state information
associated with a particular client's use of the virtual file system.
There are really two types of requests that you can handle: file requests and
directory requests. File requests deal with the functions in Table 1(a).
Directory requests are similar; see Table 1(b). Once you have associated the
context with the open call, you can then expect that you will receive a read
request, followed by a close request. The close request is where you
disassociate the context from the request.
The real work is done in the open (in the case of files) or read (in the case
of directories) functions. For a file, open has to verify that the file
actually exists, then it has to initialize the context for that file.
To determine if the file exists, you look at some internal data structures
that VFsys.knews maintains (in database.c, available electronically). This
data is a network of directories and files, similar to what a file-system
manager would actually maintain on disk; see Figure 3. By following the chain
of directories (shown as circles in Figure 3), you eventually get to a file
entry (shown as a rectangle in the figure). The file entry tells us the
article number, EOH filename, position, and length.
You then open the EOH file, seek to the article's starting position, and store
the file descriptor in the context block. Then, whenever the client performs a
read, your server just reads bytes from the file descriptor and transfers them
back to the client.
There is a similar flow for directories. The directory path is evaluated
against the internal database structure and checked for validity. If invalid,
an error status is returned to the client. If valid, an OK status is returned,
and you can expect that the client will send the server io_readdir messages to
get the directory entries. Again, you make use of the context block, but this
time, you drive the processing by a finite-state machine. The context block's
state variable, ocb --> state, can have one of these state values:
OCBState_Initial, OCBState_GetFiles, OCBState_GetDirectories, or
OCBState_Done.
OCBState_Initial generates the virtual directory entries "." and ".." (each
directory must have these), then transits to either the OCBState_GetFiles or
OCBState_GetDirectories state, depending upon what is contained in the
internal database structure. OCBState_GetFiles returns directory entries for
files, one by one, from the internal database structure.
OCBState_GetDirectories does the same thing for subdirectories.
I use functions like io_stat (for the client's stat function call), io_chdir,
and so on. Their operation is analogous to the operation of the simple
open/read/close and handle/readdir/close messages.
Apart from the standard resource manager messages, there also are a number of
messages that deal specifically with VFsys.knews itself; for example, messages
that tell VFsys.knews about a new article that's just become available, or
tell VFsys.knews to expire articles.


Handling the Expiry Message


When a client sends the expiry message, the Receive in procedure serve
(resmgr.c) gets the message and the newsmsgsJT jump table ends up calling the
procedure newsExpire. The procedure newsExpire calls journalExpire and
databaseExpire, which do the work. databaseExpire runs through all of the
internal database elements, finds the articles that have expired, and removes
them. I optimized for speed here by having the database entries sorted by
expiration. On my system, a 486/33 with about 10 KB of articles being expired,
this took under one second. You could run expire once per minute if you wanted
to, although a practical limit would be once per hour. I run mine every three
hours.


So What's Available?


VFsys.knews is an ongoing project. In addition to the work I've described
here, other developers around the world are working on source code for
configuration, newsfeeders, and management software. If you are interested in
participating in the project, send me e-mail at rk@parse.com. I'm particularly
interested in hearing from anyone who would like to port this architecture to
other operating systems. It might not be as simple as under QNX, but if we
keep the same basic functionality, we can save a lot of grief for system
administrators on other operating systems.


For More Information


QNX Software Systems Ltd.
175 Terence Matthews Crescent
Kanata, ON Canada, K2M 1W8
613-591-0931
http://www.qnx.com
Figure 1: Conventional news flow.
Figure 2: Mapping /usr/spool/news to EOH files in /usr/spool/knews.
Figure 3: Internal database organization.
Table 1: (a) File request functions; (b) directory requests.
File Request resmgr.c Procedure
(a)
Open a file. io_open
Read that file. io_read

Close that file. io_close

(b)
Open a directory. io_handle
Read entries out of
 that directory. io_readdir
Close that directory. io_close
Example 1: A virtual file system for news.
 $ cd /usr/spool/news/comp/os/qnx
 $ ls
 1134 1136 1137 1142
 $ cat 1134
 <contents of article 1134 show up>
 $ cd ../vms
 $ pwd
 /usr/spool/news/comp/os/vms















































Sharing Data Between Web Page Frames Using JavaScript


Implementing a hidden-frame technique




Tom Tessier


Tom is a student in the engineering physics department at the University of
Alberta, Canada. He can be reached at tessier@ee.ualberta.ca.


Although Java and client-side image maps are significant features in Netscape
Navigator 2.0, the addition of frames and JavaScript may make a more
immediate impact on average Web users. With these, the very nature of the Web
will be changed from static, boring, noninteractive pages to sites that are
different each time users visit them, that display table-of-contents
information at all times, that offer interactive games, and more.
In this article, I'll examine how to create such dynamic pages. 


The <Frameset> Tag


Frames divide a Web page into multiple scrollable regions, allowing different
HTML documents to be displayed in a single window. Frames can be targeted by
other URLs on the same page, share data, display table of contents and
copyright information at all times, and so on.
The HTML code that initiates a frame-based page is actually quite small, its
only purpose being to divide the page up into regions and load documents into
these different areas. Two distinct tags are required: 
<Frameset>, used to specify exactly how the frames will appear on the Web
page. 
<Frame>, which defines the various aspects of each frame, including which URL
to load. 
A COLS="parameter" attribute placed within the <Frameset> tag divides the page
vertically, while a ROWS="parameter" attribute splits the page horizontally. For
example, to divide the page into two equal halves--one frame on the left and
one on the right--you would enter the tag <Frameset COLS="50%,50%">. To
implement both column and row frames, simply nest the <frameset> tags. The
placement of the nested frames is important, since the frame order is based on
the sequence in which the tags are positioned in the HTML; see Example 1 and
the resulting page in Figure 1. 
Frames, by default, are empty and non-functional. You must point each frame to
a document using the identifier <FRAME SRC="URL">. The physical frame
referenced by the SRC command depends on the location of the current <Frame>
tag. For example, if the second FRAME SRC statement within a <Frameset> points
to THISURL, then THISURL will be loaded into the second frame; see Example 1
and Figure 1. For a complete list of attributes available for both the
<Frameset> and <Frame> tags, see Figure 2. Note that a frame HTML document has
no BODY, and no tags normally associated with the <Body> identifier may appear
before the <Frameset>.


Referencing Frames


Frames can be referenced using the TARGET command. By placing TARGET=
"window_name" into the anchor, client-side image map, or form tags, the URL
pointed to by the tag will be loaded into the frame called window_name; see
Example 2.
Another important JavaScript document object useful to frames is
location.href, which lets you dynamically alter the location of the current
window or frame. A particularly powerful use of this object is to make the
frame point to itself, allowing the JavaScript code to generate a different
page at various times. This is done by simply entering
location.href=location.href. When encountered, this causes the page to reload
and reexecute. The problem is, all of the data in the page will be reset every
time the URL reloads. Obviously, if you plan to keep track of variables
between frame sets, a different scheme must be used.
The key is to store the page in a frame. Then, using an undocumented feature
of JavaScript, a dynamically loadable page may be created. The undocumented
object, parent.variable_name, allows a child frame to freely access the
parent's JavaScript data, as illustrated in Example 3 (the page is displayed
in Figure 3). Notice that at least two frames must be set up before Netscape
will initiate the frame session. The second frame is given zero size, while
the parent frame holds all of the child's variables; the child may reload at
will with no risk of the parent losing its data.
You also can use the parent frame to store actual HTML code. This is useful in
creating interactive pages by writing the variable's contents to the page at
the start of the document (writing to the page after a document has loaded is
illegal). For example, insert the line document.write(parent.code); see
Example 3. It is even possible to have code change with time by simply writing
new text into the parent.code variable. Conceivably, one could use this
technique to store all of the required HTML documents for a page into a single
file. 
It should be noted that Netscape Navigator 2.0, although now in release, is
still buggy and can have problems accessing frame data. For example, resizing
the window will cause the entire document to reset. Keep this in mind when
using frame-data sharing.


Creating an Interactive Questionnaire


In my article, "Using JavaScript to Create Interactive Web Pages" (DDJ, March
1996), I presented JavaScript code that generated a simple multiple-choice
page: It prompted the user to select an option and displayed the results via
form inputs. Listings One and Two are that same multiple-choice code converted
to the dynamic-frame method. As can be seen in Figure 4, the resulting page is
much more attractive, looking more like a CGI-generated document than a cheap
form hack. The advantages of using the dynamic-frame method, as opposed to the
form-output technique, are many. The resulting Web pages are not restricted
solely to forms for output: Pictures, fonts, colors, and anything else that
would ordinarily be placed in HTML may be used.


Conclusion


Frames are a powerful addition to HTML. Simply adding frames to a Web page
can make the documents it contains much easier to use. Frame-based table of
contents information, form results, and so on, increase the quality of the
World Wide Web. Used in conjunction with JavaScript, frames allow the
creation of impressive pages previously realizable only through complex
server-based CGI.
Figure 1: The Web page resulting from the HTML code in Example 1.
Figure 2: Summary of the frame tags and their attributes: (a) <Frameset> tag;
(b) <Frame> tag; (c) <Noframe> tag.
(a)
<FRAMESET parameter> frame data </FRAMESET>. Divide a page into frames, where
frame data holds the <FRAME> and nested <FRAMESET> tags, while parameter is
one of the following:
 ROWS = "height value list" --specifies the number and height of row
(horizontal) frames to divide the page into 
COLS = "width value list" --indicates the number and width of column
(vertical) frames to divide the page into.
To have both columns and rows on a page, the <FRAMESET> tags must be nested
(see Example 1). The height and width value lists follow the same syntax: a
comma-separated list, with the total number of items indicating the desired
number of row or column frames. The format of each item in the list is as
follows:
Integer Value. Indicates the size, in pixels, of the frame (if not used in
conjunction with the values below, Netscape will frequently override this
value to force the frame to fill up the entire window).

Percentage Value. Indicates the percentage size of the frame (if the total
percentage does not equal 100 percent, Netscape will force the frames to fill
up the entire window). For example: <FRAMESET COLS="50%,50%"> divides the page
into two equally sized column frames, one on the left and one on the right.
Relative Value. Implied with the *, relative values specify the size of each
frame in relation to the remaining items in the list. For example:
 <FRAMESET ROWS="*,*,*"> creates a total ratio of 3 (1+1+1), dividing the page
equally into three rows each taking up 1/3 of the space: one frame on top, one
in the middle, and one on the bottom.
 <FRAMESET ROWS="2*,*"> creates a total ratio of 3 (2+1), giving 2/3 of the
space to the top frame, 1/3 to the bottom.
 <FRAMESET ROWS="2*,*,*"> creates a total ratio of 4 (2+1+1), giving 1/2 of
the space to the top frame, 1/4 to the middle, and 1/4 to the bottom.
 <FRAMESET ROWS="3*,*,2*,*"> creates a total ratio of 7 (3+1+2+1), giving 3/7
for the top row, 1/7 for the second, 2/7 for the third, and 1/7 for the
bottom-most. 
In general, it is easiest to use percentages. Mixing of the three types of
values is also allowed. For example:
 <FRAMESET COLS="100,*,100"> assigns 100 pixels for the width of the left-most
and right-most frames, while the rest of the space is given to the middle. 
 <FRAMESET COLS="20%,*"> assigns 20 percent of the space for the left frame,
and gives the remaining space to the right-most. A value of 80 percent in
place of * is equivalent.

(b)
<FRAME parameters>. Define the document attributes of each frame, where the
parameters are:
 SRC="url" specifies the document URL to load into this frame. Without SRC,
the frame is blank (and will not be functional).
 NAME="window name" indicates the name to assign to this frame so that it can
be targeted by documents in separate frames, via the TARGET attribute.
 MARGINWIDTH="value" controls the frame margin width in pixels (used to
control how the document text will be constrained by the left and right sides
of the frame).
 MARGINHEIGHT="value" controls the frame margin height in pixels.
 SCROLLING="yes|no|auto" defines whether the frame should have a scrollbar or
not. Auto instructs Netscape to place scroll bars as necessary.
 NORESIZE if present, indicates that the frame cannot be resized by the user.
Useful in implementing a hidden frame.
(c)
<NOFRAMES> Alternate HTML code </NOFRAMES>. The NOFRAMES tag provides
alternative content viewable by non-frame-capable clients. Netscape Navigator
2.0 will ignore all tags and data contained between the start and end
NOFRAMES tags.
Figure 3: The page resulting from Example 3.
Figure 4: Typical page using JavaScript for frame data sharing (dynamic-frame
method).
Example 1: Sample HTML file illustrating how to divide a page into frames.
<HTML>
<!-- COLS divide the page vertically; ROWS divide the page
horizontally -->
<FRAMESET COLS="50%,50%"> <!-- divide page vertically down the
middle - creating two equal halves: one on the left, the other on the
right -->
 <FRAMESET ROWS="40%,20%,40%"> <!-- divide the left half horizontally
into three pieces: one on top, a smaller one in the middle, and one on
the bottom -->
 <FRAME SRC="frame1.html"> <!-- set the top 40% pointing to
frame1.html -->
 <FRAME SRC="frame2.html"> <!-- point the middle 20% to URL
frame2.html -->
 <FRAME SRC="frame3.html"> <!-- set the bottom 40% to
frame3.html -->
 </FRAMESET>
 <FRAMESET ROWS="*,*"> <!-- divide the right half horizontally
into two equal pieces: top and bottom. Could have used "50%,50%"
instead. -->
 <FRAME SRC="frame4.html"> <!-- set the top half to
frame4.html -->
 <FRAME SRC="frame5.html"> <!-- set the bottom half to
frame5.html -->
 </FRAMESET>
</FRAMESET>
</HTML>
Example 2: Sample HTML code showing how to use the TARGET command to load URLs
into another frame. (a) index.html contents; (b) text1.html contents; (c)
init.html contents.
(a)
<HTML>
<FRAMESET COLS="35%,*">
 <FRAME SRC="text1.html">
 <FRAME SRC="init.html" NAME="main_frame">
</FRAMESET>
</HTML>

(b)

<HTML>
<BODY>
Table of Contents:<BR>
<!-- by default, target all links to frame "main_frame" -->
<BASE TARGET="main_frame">
<A HREF="learn.html">1. Learning Opportunities</A><BR>
<A HREF="codes.html">2. Activation Codes</A><BR>
<A HREF="hiring.html">3. End of Year Hiring Positions</A><BR>
<A HREF="firing.html">4. End of Year Firing Positions</A><BR>
<!-- place a clickable client side image map here now -->
<IMG SRC="navimage.gif" ALIGN="center" USEMAP="#navbar">
<MAP NAME="navbar">
<!-- target the client side image map link to the main_frame -->
<AREA SHAPE="RECT" COORDS="0,0,19,19" HREF="copyright.html"
TARGET="main_frame">
<AREA SHAPE="RECT" COORDS="20,0,39,19" HREF="logo.html"
TARGET="main_frame">
</MAP>
<!-- target the form results to the frame "main_frame" -->
<FORM METHOD="GET" ACTION="cgi-bin/process.cgi"
TARGET="main_frame">
Please enter the job you are planning on applying for:<BR>
<INPUT TYPE="text" SIZE=50 NAME="JOBTITLE"><P>
<INPUT TYPE="submit" VALUE="Send">
<INPUT TYPE="reset" VALUE="Clear">
</FORM>
</BODY>
</HTML>

(c)
<HTML>
<BODY>
<H1 ALIGN="center">Joe's Pizza Corp. Job Information</H1>
</BODY>
</HTML>
Example 3: Sample HTML files illustrating dynamic frame reloading and frame
data sharing using JavaScript; (a) index.html contents; (b) main.html
contents.
(a)
<HTML>
<SCRIPT LANGUAGE="LiveScript">
// variables defined here may be accessed by any of the frames defined
// below via the parent.variable_name command. NOTE: if the Netscape window
// is resized, the variables will be reset to their initial values.
var count=0
var code="<BR>HTML code stored in Parent frame."
code=code+"<BR>Standard <H1>HTML</H1> may be used."
</SCRIPT>
<!-- must define two frames... although the second frame is not used, a
minimum of two frames is needed before Netscape will initiate a frame
setup -->
<FRAMESET ROWS="*,0">
 <FRAME SRC="main.html">
 <FRAME SCROLLING="no" NORESIZE>
</FRAMESET>
</HTML>

(b)
<SCRIPT LANGUAGE="LiveScript">
<!-- hide this script tag's contents from old browsers
function refresh()

{
location.href=location.href // reload this frame
}
parent.count++ // increment variable which counts the number of
// times the page has been loaded
document.write ('<FORM><LI><INPUT TYPE="radio" onClick="refresh()">')
document.write ("The page has been loaded "+parent.count+" times.") 
document.write ("<BR>"+parent.code) // write the Parent frame HTML
// code stored in the code variable
<!-- done hiding from old browsers -->
</SCRIPT>

Listing One
<!-- contents of index.html file -->
<HTML>
<SCRIPT LANGUAGE="LiveScript">
<!-- hide from old Netscape browsers -->
// variables defined here may be accessed by any of the frames defined
// below via the parent.variable_name command. NOTE: if the Netscape window
// is resized, the variables will be reset to their initial values.
// ie: these variables are available to the child frames.
// variables
var totalnum=3 // total # of questions
var correctans=0 // the question # (from 0 to N-1) of the correct
// answer to each question. Set in Listing Two (scratcha.html)
var count=0 // counter variable. Initialized to zero
var arraycount=1 // index into the questans array. Initialized to 1
// since that is where the first array string begins.
var totalright=0 // total # of questions answered right (0 initially)
var correcttext="blank" // the text of the correct answer
<!-- done hiding from old Netscape browsers -->
</SCRIPT>
<!-- must define two frames... although the second frame is not used, a
minimum of two frames is needed before Netscape will initiate a frame
setup -->
<FRAMESET ROWS="*,0"> <!-- give the 2nd frame 0 pixels, and assign 
the rest of the space to frame 1 via the "*" -->
 <FRAME SRC="scratcha.html">
 <FRAME SCROLLING="no" NORESIZE>
</FRAMESET>
</HTML>

Listing Two
<!-- contents of scratcha.html file -->
<SCRIPT LANGUAGE="LiveScript">
<!-- hide this script tag's contents from old browsers
// editable local frame variables
 var totalans = 3 // total # of answers per question (used to
// generate the question/answer array below)
// This function defines an array such that the first property, length, (with
// index of zero), represents the number of elements in array. The remaining 
// properties have an integer index of 1 or greater, and are initialized to 0.
function MakeArray(n) {
 this.length = n;
 for (var i = 1; i <= n; i++) {
 this[i] = 0 }
 return this
 }
function checkout(questionnum)

{
 if ( parent.count > parent.totalnum ) // if user 
// clicks a radio button after the test is complete, display an alert
 {
 alert('To return to the main page, click on "Return to Main Page."')
 }
 else
 {
 if ( questionnum == parent.correctans) // if the currently
// selected question is the correct response, say so
 {
// increment the parent frame's count of correct answers
 parent.totalright++
 alert("Correct.")
 }
 else // if wrong answer, display the correct one
 {
 alert("Incorrect. The answer is:\n"+parent.correcttext)
 }
 parent.count++ // increment count so can goto next question
 if ( parent.count == parent.totalnum) // if count =
// the total # of questions
 {
 parent.count++ // then increment count again so is 
// greater than totalnum so can activate the alert above if user
// tries to click on a radio button after the test is complete
 }
 }
location.href=location.href // reload the page (will reload the page
// into which this function is placed) so can update using the new
// values stored in the parent frame's variables
}
// formula for # of array elements:
// total number of questions + 1 (have to include the Test is 
// Complete question at the end of the question data) times total
// answers allowed per question + 2 (plus two because have to have a
// string for the actual question itself and for the value indicating
// the correct answer). eval used to convert (totalnum)*(totalans+2) 
// into a number usable to pass to MakeArray... need eval since the 
// expression is inside a function call (ie: inside the MakeArray call)
questans = new MakeArray(eval((parent.totalnum+1)*(totalans+2)));
// array for question list (start with question #2 - define question
// one in the document's html code below)
questans[1] = "When our star (Sol) dies it will most"+" likely become:"
questans[2] = "A black hole."
questans[3] = "A white dwarf."
questans[4] = "A neutron star."
questans[5] = 1 // correct answer here is "A white dwarf" (list
// answers from 0 to ans#-1)
questans[6] = "Pyconuclear reactions are:"
questans[7] = "reactions which require high density."
questans[8] = "reactions which depend on heat."
questans[9] = "reactions which require low density."
questans[10] = 0 // correct answer here is "reactions which require
// high density" (list answers from 0 to ans#-1)
questans[11] = "A white dwarf maintains its compact shape via:"
questans[12] = "coulombic repulsion."
questans[13] = "fermi neutron pressure."
questans[14] = "fermi electron pressure."

questans[15] = 2 // correct answer: "fermi electron pressure."
// end of editable questions. Don't change the below (just the text
// displayed at the end of the test)
questans[16] = "The test is complete. Thank you."
questans[17] = ""
questans[18] = ""
questans[19] = ""
questans[20] = 255 // correct answer: none
// NOTE: will also need some variables in hiddenfr.html stored into
// separate forms so that can save data between URL reloads (ie: when
// location.href is set the variables will be destroyed if not stored
// in the new URL... easier to code if just place data in the hidden frame)
if ( parent.count > parent.totalnum ) // if user 
// completed all questions, display the Finished page.
 {
 document.write("<H1>The Test is Complete. You got "+
parent.totalright+" correct out of "+parent.totalnum+
" questions.</H1>")
 }
else
 {
// make arrayind = to the parent frame variable arraycount
arrayind=parent.arraycount
// write the form tag
document.write ('<FORM METHOD="post">')
// increment arrayind after each use below
// and place a <BR> tag after each to separate each text item.
// write the question
document.write ("<H1>",questans[arrayind++],"</H1>")
// write radio button for answer 1
document.write ('<LI><INPUT TYPE="radio" onClick="checkout(0)">')
// and check if the correct answer for this question is answer 1
if ( questans[arrayind+3] == 0 )
// and if so, set the rightans variable to the text of the correct answer
 { parent.correcttext = questans[arrayind] }
// write answer 1
document.write (questans[arrayind++],"<BR>")
// write radio button for answer 2
document.write ('<LI><INPUT TYPE="radio" onClick="checkout(1)">')
// and check if the correct answer for this question is answer 2
if ( questans[arrayind+2] == 1 )
// and if so, set the rightans variable to the text of the correct answer
 { parent.correcttext = questans[arrayind] }
// write answer 2
document.write (questans[arrayind++],"<BR>")
// write radio button for answer 3
document.write ('<LI><INPUT TYPE="radio" onClick="checkout(2)">')
// and check if the correct answer for this question is answer 3
if ( questans[arrayind+1] == 2 )
// and if so, set the rightans variable to the text of the correct answer
 { parent.correcttext = questans[arrayind] }
// write answer 3
document.write (questans[arrayind++],"<BR>")
// write the closing form tag
document.write ("</FORM>")
parent.correctans = questans[arrayind++] // and set the new
// correct answer
// and save the new arrayind value into the parent frame variable
parent.arraycount=arrayind

// skip two lines
document.write ("<BR><BR>")
// now display the number of questions the user has chosen correctly so far
document.write ("You have completed "+parent.count+" of "+
parent.totalnum+" questions, with "+parent.totalright)
 if ( parent.totalright == 1 ) // if exactly one right, use the
// singular "answer" instead of the plural form "answers"
 {
 document.write (" correct answer.")
 }
 else
 {
 document.write (" correct answers.")
 }
 }
// skip two lines
document.write ("<BR><BR>")
// link to main page:
document.write ('<LI><A HREF="index.html">Return to main page.</A>')
<!-- done hiding from old browsers -->
</SCRIPT>










































PROGRAMMING PARADIGMS


Programming Perfection




Michael Swaine


This item is from the "Programming Paradigms" and "Swaine's Flames" Buglist
(along with a heartfelt "Gee, thanks!" to all you beta testers out there):
"Won't bunking lead to an increase in the 'american populous'?" e-mails Jim
Weaver from Oregon, suggesting that "perhaps Cowlishaw's observation [in the
March 1996 DDJ article 'A Conversation with Michael Cowlishaw'] about
programmers and debuggers applies equally well to writers and spell checkers."
Jim is using a term that I invented in my March 1996 "Swaine's Flames" column
to comment on a misspelling that I perpetrated in that same issue's
"Programming Paradigms" column. This marks Jim as a Double-Columned Editorial
Ego Stroker and guarantees him a mention here. The only problem is, I didn't
make the mistake Jim cites. (I refer now to the erroneous use of "populous"
for "populace." Lowercasing "American" is Jim's own error--unless it's some
kind of political statement.) No, the mistake I made was an entirely different
mistake. What I wrote was "populus," which is not even an English word. An
editor fixed it into "populous."
As I told Jim, rather than Cowlishaw's observations, I would point to
Poncifrete's advice: "Correcting an error is the most error-prone activity in
programming. Never do it."
Sound advice in any field, with the possible exception of politics. I'm
thinking of New Jersey Congressman Bill Martini's pet phrase, "Correct me if
I'm not mistaken," the logic of which makes me dizzy. 
But I digress.
Some programmers aspire to Poncifrete's standard of perfection. Such
perfectionists really believe, in the tightly-knotted ganglia of their neural
nets, that they ought to be able to sit down cold and code that routine clean
and complete in one pure act of immaculate cerebration. But I want to talk
about one kind of perfectionist, the all-inclusive visionary kind. The kind of
perfectionist who says, "All of computing can be viewed as (fill in the
blank)." 


Everything is Compression


Gerry Wolff believes that all of computing can be viewed as compression. Wolff
is a professor at the School of Electronic Engineering and Computer Systems at
the University of Wales at Bangor. His vision encompasses other fields, too:
Besides "Computing as Compression," he has written papers on "Cognition as
Compression" and "Language Learning as Compression." To be followed, I have no
doubt, by "Coal Production as Compression," a hot topic in Wales, and "Editing
as Compression," which--uh, yeah, I can take a hint.
Wolff's views seem well thought out and original. He has demonstrated their
applicability in many areas of computation, too (he'd have to). None of his
demonstrations have set any records for performance, but he argues that that's
because they are hybrids--his paradigm implemented with tools and on hardware
based on a different paradigm. We may have to wait for Wolff to build a
computing-as-compression machine before we can truly evaluate his program, but
he's working on that, too. (You can tap into his research program at
http://www.sees.bangor.ac.uk/~gerry/sp_summary.html.)
His basic conjecture is that "all kinds of computing and formal reasoning may
usefully be understood as information compression by pattern matching,
unification, and search." Several of these terms require definition, not the
least of them being the word "usefully."
By "unification," Wolff means something less than logicians and Prolog
programmers mean by the term. (In Prolog, correct me if I'm not mistaken, the
term means something like "banging two assertions together to see what
conclusions they spark.") Wolff's unification means a simple merging of the
matching patterns returned by the pattern-matching phase. For example, a
simple compression algorithm might look for repeating patterns (aaaaaaa,
161616), using some matching methods to recognize repetitions, and unifying
the patterns by merging them in some redundancy-reducing fashion (ax7, 16x3).
These examples of matching and unification are two methods among many, and
that's where the search comes in: Wolff's model, which he calls "SP" (short
for "simplicity and power"), involves narrowing the options using some
metrics-guided search technique like hill climbing.
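For the single-character case, Wolff's two phases can be sketched in a few lines of JavaScript (a sketch only; the function name is mine, and the multicharacter "16x3" pattern would additionally require searching over candidate pattern lengths, which is exactly where the search phase earns its keep):

```javascript
// Sketch of matching and unification for single-character runs:
// the matching phase finds a maximal run of one repeated character,
// and the unification phase merges the run into "<char>x<count>".
function unifyRuns(s) {
  var out = "";
  var i = 0;
  while (i < s.length) {
    var j = i;
    while (j < s.length && s.charAt(j) === s.charAt(i)) j++; // matching: extend the run
    var count = j - i;
    out += (count > 1) ? s.charAt(i) + "x" + count : s.charAt(i); // unification: merge it
    i = j;
  }
  return out;
}
```

Given the column's example input, unifyRuns("aaaaaaa") produces "ax7"; runs of length 1 pass through unchanged, so nothing is lost when no redundancy is found.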


The Road Not Taken


Wolff thinks that the multiple-alignment technique used in DNA analysis is a
good approach to matching and unifying information content, and he proposes
using it in his model. He points out that it has been promoted for use in
unsupervised learning, fuzzy-pattern recognition, deductive and probabilistic
reasoning, and other tasks.
Wolff predicts that computers built and programmed along the SP model would
produce less-brittle software, greater flexibility, better performance in the
face of uncertain or incomplete information, at least partial automation of
the process of software development, and elimination of the need to write
search routines (because powerful search techniques would be built into the
model). Some of these sound like the predicted benefits of neural-net or
fuzzy-logic systems, and SP has a lot in common with those approaches.
I don't know whether or not Wolff is onto something promising, but there is
one thing about his notions that I find intriguing. The current trend in
software development--toward building from components--makes it hard to judge
the overall efficiency of a system. When lots of people build roads to get
them where they want to go, and then you design a route to your goal over
these roads, all you can say with certainty is that you've chosen an efficient
path to your goal using the available roads. This says nothing about the
crow-fly distance from where you are to where you want to be. Wolff's SP
model, and I hope he'll correct me if I'm not mistaken, is very much concerned
with searching for an efficient path from here to there. And that's certainly
worth looking into.
There must be a reason why these grand visions seem so often to pop up in
out-of-the-way places like Bangor, Wales. Actually, Bangor is the center of
the universe compared to the place where Konrad Zuse created his grand vision.


Springtime for Hitler


April 1945 was not a good time to be in Berlin. As the Allied troops
approached the city, one lesson to be drawn from recent events was the
importance of choosing the right name for your product. No matter that the
Versuchsmodell 4 at Göttingen University was merely a computer. Any device
called the V4 was bound to attract the attention of the occupation troops,
even if it had no connection with the V1 and V2 buzzbombs. Konrad Zuse had
invented the V4, as well as the V1 through V3--which were destroyed in the
bombing--and he got permission to transport the surviving V4 to a safer
location.
There ensued a hairsbreadth escape. Zuse and the V4 trucked south into
Bavaria, deep into the Bavarian Alps, along the Austrian border, to the tiny
town of Hindelang, and then to the even smaller village of Hinterstein. In a
small shed in Hinterstein, the V4 found a seemingly good hiding place.
A few days later, the French army was in Hinterstein.
The V4 managed to escape notice, hidden in a basement that time, but the next
time it wasn't so lucky. It was discovered by British Secret Service officers,
who, however, lost all interest in it when they determined that it had nothing
to do with buzzbombs.
What with funds dried up, his staff dispersed, and the need to hide the
machine in the basement whenever company was expected, there was no way Zuse
could get any work done on the V4. There in Hinterstein, with time on his
hands and ideas in his head, Zuse settled into theoretical work. Outside,
spring was breaking through with Alpine earnestness. Inside, Zuse began the
design of a programming language. Since he had no machine to run it on, this
was a purely theoretical enterprise, which suited Zuse all right.


Everything is Bits


Zuse had already developed some thoughts about language. All of computing, he
believed, can be viewed as starting at the bit. This was in fact the guiding
principle behind his machines, the V1 through V4. Zuse didn't just mean that
he wanted to build machines that calculated in binary, although he did mean
that, in part, and that was not by any means a given in the early 1940s.
Historically, calculating devices had often worked in decimal, and it was not
immediately obvious that binary was better.
"The first principle of the Plankalkül," Zuse wrote later, "is: data
processing begins with the bit." Building on the bit, you can make programs of
arbitrary complexity because you can derive all of predicate and propositional
logic; and you can also create the most complex data structures. The bit is
it.
There were various other nifty insights in Zuse's language. Program flow in
Plankalkül was tightly controlled. It had no GOTO statement; in fact, its
version of the IF statement had no ELSE clause. It did, however, have a
subscripted FIN statement that caused a jump out of a number of iteration
levels.
Zuse introduced into his notation what we would today call invariants,
tracking the mathematical relationships between variables used, an idea that
didn't reappear in programming languages until C.A.R. Hoare brought it up in
his 1971 correctness proof of the program FIND. Example 1(a) is an approximation of Zuse's
notation. This statement (it's a single statement, although it spans three
lines) maps an 11-element floating-point array into an array of 11 ordered
pairs, with data types defined elsewhere. Or so they tell me. The notation is
a little obscure.
But Plankalkül's most outstanding feature was its hierarchical data
structures, the outgrowth of Zuse's insistence on the bit as central to
everything. Floating point was not a built-in datatype, but was defined as
(3xS0, 7xS0, 22xS0). This means 3 bits, 7 bits, 22 bits (S0 was the sole
provided primitive datatype, the bit), and these three segments were used,
respectively, for sign and other markers, exponent, and mantissa. Once
floating point was defined it could, in turn, be used in defining more complex
datatypes, which could be arbitrarily complex, with heterogeneous components
of variable length.
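As a rough modern illustration (not Zuse's notation, and the field names are mine), the (3xS0, 7xS0, 22xS0) definition can be modeled as three bit fields packed into a single 32-bit word:

```javascript
// Hypothetical sketch of a (3xS0, 7xS0, 22xS0) composite: three bit
// fields -- markers, exponent, mantissa -- packed into one 32-bit word.
// Only the widths (3, 7, 22) come from the text; the names are illustrative.
function pack(marker, exponent, mantissa) {
  return ((marker & 0x7) << 29) | ((exponent & 0x7F) << 22) | (mantissa & 0x3FFFFF);
}
function unpack(word) {
  return {
    marker:   (word >>> 29) & 0x7,    // top 3 bits
    exponent: (word >>> 22) & 0x7F,   // next 7 bits
    mantissa: word & 0x3FFFFF         // low 22 bits
  };
}
```

The point of the exercise is Zuse's: once the composite is defined in terms of bits, it can itself become a component of still larger structures.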
Nothing like this came into programming languages again for a decade.
Curiously, Plankalkül didn't have much impact on subsequent programming
languages. In particular, it had little effect on the design of Algol, even
though some of the Algol designers knew of Plankalkül and even though it had
many features Algol was to have, as well as some it lacked. Some who were
involved in the Algol spec say that the reason was that Plankalkül had a
horrible notation and was, well, too perfect.
Zuse saw it differently, of course. History doesn't always speak with one
voice.



History is Bunk


Which brings us back to the "Programming Paradigms" and "Swaine's Flames"
Buglist: Because history does not speak with one voice, writing about history
is a sure way to get mail. If you don't make your own mistakes, you will
repeat the mistakes of your sources, and your readers will nail you either
way.
Robert Patterson writes to say, "I'm sure I'm the zillionth person to tell
you, but (for the record) I believe that the ENIAC revival of Feb. 14 occurred
at the University of Pennsylvania in Philadelphia, rather than at the
University of Pittsburgh as you reported in the March '96 issue of Dr. Dobb's.
I used to walk by the ENIAC every day when I was attending the U of Penn, and
occasionally I went in to peek at it."
Robert is right, of course.
And Gary Brown writes, "I hold in my hot little hand (no mean task when you
are touch typing) the manual 'GENIACS: Simple Electric Brain Machines, and How
to Make Them' (c) 1955 by Oliver Garfield. I think your reference to Mr.
Berkeley may be in error (unless Mr. Garfield only wrote the manual...who
knows...). I can still remember putting all the little screws on the masonite
disks!"
I'm still researching this one. I welcome e-mail from anybody who knows for
sure who created Geniac.


Everything is a Function


Not all my mail goes to the Buglist. Says Matteo Vaccari, "I've read with
interest your columns about languages other than C, in Dr. Dobb's. May I
suggest you to take a look at an interesting language called Haskell? It is a
functional language, with a few twists. For one thing, it is entirely
functional: Unlike Lisps and Schemes, there is no imperative subset." Aha, I
thought, a pure functional language with no flaw in that crystalline paradigm.
That's the kind of thing I'm talking about this month. Haskell is a lazy,
purely functional programming language designed by an international committee
of researchers. It has been implemented, at least in part, for DOS, Mac OS,
Atari, Amiga, and Sun OS in the form of "Gofer" (no relation to the Internet
utility Gopher), a free interpreter written by Mark Jones at Oxford
University, and its Macintosh port, MacGofer. You can find Gofer and MacGofer
at http://www.inesc.pt/free-dir/free-S-1.53.html. A lot of info on Haskell
itself is available at http://www.cs.yale.edu/HTML/YALE/CS/haskell/. Haskell has
many of the sorts of features you'd expect in a more or less modern
programming language, and, as Example 1(b) illustrates, has an unsurprising
syntax.
Haskell has the usual built-in datatypes, including lists, which are as useful
in Haskell as they are in that other functional language, LISP. List functions
like map, which applies a function to all elements of a list, are central to
the way you program in Haskell. Strings are implemented as lists of
characters.
Pattern matching also is central to Haskell. The declaration of a function, in
fact, involves the specification of patterns to be matched when the function
is invoked. These patterns are just the formal arguments of the function, so
the same assertion could be made regarding any programming language that
supports functions, but in Haskell these patterns can be more complex,
including lists, tuples, wildcards, and other forms. Haskell also supports a
LET statement, which makes its claims to be a pure functional language without
assignment statements suspect, but Haskell's LET is not Basic's LET. It's used
in a very limited way for local declarations.
But I wanted to focus on the essence of Haskell, which is this grand vision of
pure functional programming.
The briefest definition of functional programming is "everything is a
function," although "programming without assignment" is a close second. It may
come as a surprise, not to you but to self-trained Basic hackers, that it is
possible to live without an assignment statement. (Zuse, incidentally,
invented the assignment statement. That many other computer pioneers also
invented it independently doesn't take away from this original act of
invention. Like many successes, the assignment statement has many parents.)
But it's more revealing to point out the lack of side effects of pure
functional programming. It has several desirable implications: Individual
functions can be analyzed in isolation, and you can convince yourself of their
correctness without having to consider other parts of the program; you can
perform sophisticated program transformations; and you can do lazy evaluation.
Lazy evaluation allows for some intuitively satisfying techniques, like
defining the Fibonacci series as an infinite list and selecting from that list
the nth element. Without lazy evaluation an infinitely long list can only be
dealt with indirectly if you don't have infinite time at your disposal, and
who does, these days.
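The idea can be imitated outside Haskell. As a rough sketch in JavaScript (using a generator, a feature from long after this column, to play the role of the lazy infinite list), only as many elements of the series are computed as the selection actually demands:

```javascript
// A generator stands in for Haskell's lazy infinite list: each next()
// call computes exactly one more Fibonacci number, on demand.
function* fibs() {
  var a = 0, b = 1;
  while (true) {
    yield a;
    var t = a + b; a = b; b = t;
  }
}
// Select the nth element (0-based) of the "infinite" series; the
// series beyond index n is never computed.
function nthFib(n) {
  var gen = fibs(), value;
  for (var i = 0; i <= n; i++) value = gen.next().value;
  return value;
}
```

Asking for nthFib(10) pulls eleven values out of the generator and returns 55; the rest of the infinite series never exists.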
Some readers would like to rewrite my columns, like Glenn Linderman, who
rewrote the closing sentence in my February 1996 column. That column discussed
the language ReWrite and ended, "I hope to write about it [Mathematica] soon."
Glenn thinks this would have had more punch if I had said, "I hope to ReWrite
about it [Mathematica] soon." "Since you've discussed it before," Glenn says,
"you could have gotten away with it." Heck, Glenn, I wish I'd thought of that.
What a perfect way to end the column.
Example 1: (a) Approximation of Zuse's notation; (b) Haskell syntax.
(a)
P2 / R(V) => R
 V/ 0 0
 A/ 11 X d1 11 X 2

(b)
fact n = product [1..n]
comb n r = fact n `div` (fact r * fact (n-r))
comb 5 2
entered at the interpreter's prompt gets the response
10





























C PROGRAMMING


SGML, Singapore, Bill's Book, and Quincy 96's DDE




Al Stevens


Several years ago, I worked on a government project developing what can best
be described as a "document processor" application. The project was based on a
master text database of construction-engineering specifications. The database
constituted a comprehensive document boilerplate from which sections were
extracted to build custom specifications for the construction of buildings.
The project resulted in a software application that runs on PCs and supports
several agencies. The version that I worked on was a DOS text-mode application
running on AT-class machines.
We integrated a word processor into the application, selecting one with an
all-ASCII format for the document database so that our software could readily
extract and renumber selected paragraphs, reconcile references, and so on. The
project took some criticism because of the commercial word processor we used.
Contractors who wanted to bid on government construction projects had to use
our system, which meant that they had to buy the commercial word processor.
That was viewed as an undue burden to assess on contractors who were
struggling to make ends meet and still remain competitive. However, short of
writing a custom word processor, we had no alternative. No one in the
government wanted to fund the development of a word processor when perfectly
good commercial ones existed.
Eventually, when Microsoft and WordPerfect took all the DOS word-processor
business away from the little guys, the vendor of our word processor dropped
the product. The government bought rights to the software so that the project
could continue to support it. The burden of paying for the word processor was
thus lifted from subsequent users of the system.
Several years ago, the project team began converting the system to Windows
3.1. At about the same time, they modified the database to support the
Standard Generalized Markup Language (SGML) because that format is a
Department of Defense standard for text databases. SGML is an all-ASCII text
format that uses embedded tags to associate parts of a document with the
document's Document Type Definition (DTD), which defines the document's
structure and rules. The structure depends on the hierarchical format that
most documents adhere to. The SGML format defines only a document's structure,
whereas the proprietary formats of word processors define a document's
appearance. An SGML document is therefore portable to any platform that
supports SGML and can be rendered on that platform's printers, CD-ROMs, Web
pages, and so on. The Hypertext Markup Language (HTML) is itself implemented
as a DTD within SGML.
No SGML Windows word processor existed three years ago when the project team
elected to support SGML. Consequently, they undertook the development of a
Windows-hosted text editor that looks like a typical Windows word processor,
supports SGML, and incorporates some of the application-specific functions.
Funding was made available because there was no other choice. This approach
had the added advantage that, once again, users would not be required to
purchase a commercial product in order to use the application.
Freeze frame at this point. Design decisions were made based on what was known
to be available at that time. Crystal balls were not--and still are
not--standard government issue.
Now, fast forward three years to the present. The Web reigns. Electronic
publishing is ubiquitous. CD-ROM authoring systems abound. SGML is a hot item.
Microsoft introduces "SGML Author for Word," a two-way filter that converts between SGML and
Word formats. Novell introduces WordPerfect 6.1 SGML Edition, a native SGML
word processor. There are many commercial, freeware, and shareware SGML
parsers and editors. The world is about to embrace SGML.
Back home, the project's custom SGML editor is less than an overwhelming
success. It runs slowly on low-end machines and was released too soon, giving
users early exposure to the dreaded Windows GPF. What's more, by the time the
editor was released, users had become accustomed to, and spoiled by,
feature-laden commercial Windows word processors. They didn't like the
application's slow, unreliable, feature-poor, custom SGML editor.
Quite recently, and with the perfect wisdom of 20/20 hindsight, someone in the
user community wrote a letter to his senator suggesting an investigation into
why the government is wasting money developing an SGML editor when commercial
Windows word processors with SGML capabilities are available. The senator's
staff, not knowing anything whatsoever about the issue, sent an inquiry under
the good senator's signature to the agency, probably just to satisfy the
would-be whistle-blower. The matter should have died there, but letters of
inquiry from U.S. senators cannot be ignored. This one resulted in many hours
of effort and piles of paperwork at all levels from the agency's top
management down to the project team, as everyone scrambled to explain what
they were doing and why. Your tax dollars at work.
Which brings me to the point. Software technology shifts are constantly and
rapidly stimulated by market pressures and advances in hardware bandwidth. No
one can predict with any certainty or accuracy how things are going to change.
When you launch a development project that has much more than a few months on
its schedule, by necessity you must base it on technology that can be
obsolete, dramatically changed, or even unavailable by the time of your first
scheduled release.
The earlier experience with the soon-to-be-obsolete DOS text-mode word
processor is a case in point. At the time, that product was the only one
available that suited the project's needs. The fallout from that decision
could have been costly. If, in the course of going belly-up, the vendor had
shredded the source code, a major government project would have been seriously
impacted.
Programmers today are faced with those kinds of decisions all too often. I get
criticized for endorsing the Microsoft Foundation Classes when there are many
other fine framework class libraries available. But I am above all a
pragmatist. A tool's potential and propensity for endurance are as important
as its momentary usefulness. Any development project of any consequence will
span years. Our selection of a tool must consider the likelihood that the
tool's life span will equal that of the project that uses it. Don't sweat the
small stuff. You can change editors in midstream. You can switch to a better
debugger if one comes along. But if you base your product on the proprietary
architecture of someone else's software, you'd better be sure that support
will continue and that the software will grow and evolve along with the other
parts of the system.


CD-ROMs and Long Filenames


Take a look at the CD-ROMs that Microsoft ships with Windows 95 development
products on them. Visual C++. Windows 95 itself. Do you see long filenames?
Not many. Wonder why? I think I just found out. While developing a companion
CD-ROM for a Windows 95 C++ programming book, I built a prototype of the
CD-ROM on 100-MB Zip disks, which I sent to the publisher to get mastered onto
a CD-ROM. When the CD-ROM master was written, my publisher called in a panic
to say that she could not read some of the subdirectories and files on her
Windows 95 computer. She sent the CD-ROM to my coauthor who had no problem
with it. He sent it to me, and I tried it out on three different machines. It
worked fine on all but one of my computers, the one with the Teac SuperQuad.
That drive has abominably slow Windows 95 drivers, so I'm using the DOS 16-bit
drivers instead. The penalty is that I cannot share the drive on the network.
With that as a clue, I went to one of the other machines, one that worked okay
with the CD-ROM, removed the Windows 95 drivers, and installed the DOS 16-bit
drivers into CONFIG.SYS and AUTOEXEC.BAT. With that configuration, long
filenames are unseen by Windows 95; their 8.3-format counterparts are unseen,
too. I verified this conclusion by trying to read some of the files on the
Microsoft Office CD-ROM. It uses long filenames, and the same problems
occurred. 
So beware. If you are using Windows 95 to construct a CD-ROM, either alert
your users that they must have Windows 95 CD-ROM drivers or truncate all the
filenames to the 8.3 format. This can be a problem if you are distributing
source code on CD-ROMs. A typical Visual C++ Developer Studio session
generates header and CPP files with long filenames to represent the classes
they implement. If you distribute a GNU-derived program, you must also
distribute (or offer) the corresponding GNU source code. The GNU compiler suite includes many files
with long filenames.


To Singapore and Back


I spent the first week in February in Singapore teaching the concepts of C++
frameworks in game development to a small class of college freshmen at Ngee
Ann (pronounced "neon") Polytechnic. This trip, made at the invitation of the
Center for Computer Studies, was one of the perks that comes with the
notoriety that this column and my books afford me. 
The staff at Ngee Ann Polytechnic was using game development as a vehicle to
inspire the students' interest in computer programming. The 15 students who
signed up needed no inspiration. They were bright, attentive, polite, and
extremely well versed in the C language. I used the occasion to reveal some of
the behavior of C++ classes, and to a man they soaked the knowledge up like so
many sponges.
Singapore is a beautiful, tropical island country about 85 miles north of the
equator. It's small enough that there are no telephone area codes.
Unemployment and crime are not a problem. It was in Singapore that a young
American lad was caned for spraying graffiti on a car. To be caned is to be
swatted several strokes across the backside with a big stick. The incident got
a lot of publicity when President Clinton displayed his diplomatic skill by
negotiating a sentence reduction--four strokes instead of six. Where was he
when I got caught pilfering Dad's Camels? Drug dealing is not tolerated in
Singapore, being a capital offense with swift execution of the sentence.
Chewing gum is forbidden, too. You cannot import or sell chewing gum. The law
was passed to prevent kids from leaving gum stuck on public places as kids
will do. Rather than pass a law that they cannot enforce--don't stick gum
where it does not belong--they passed a law they could enforce--don't import
or sell chewing gum. Effective. We read about that after we arrived in the
country. As a consequence, Judy is now an undetected but nonetheless guilty
international smuggler of a controlled substance. I offered her several
strokes of the cane, but she firmly declined.


The Road Ahead


On the plane to Singapore I read Bill Gates' new book, The Road Ahead,
(Penguin Books, 1995). I was skeptical about how much of this book Gates
actually wrote. He mentions two collaborators inside, but not on the cover.
Most likely, he wrote the foreword without help. I say that because it's not
as well written as the rest of the book. Odd that he would write his own
foreword. Forewords are traditionally written by someone else. Authors
persuade some luminary in their field to write a foreword to add credibility
to the work. Gates, having written forewords for other authors, knows that but
seemingly decided to write his own foreword anyway. I guess when you are Bill
Gates, there is no brighter luminary.
The book relates Bill's vision of the information superhighway, what it is and
what it will become. Clearly, he intends for Microsoft to be a major player in
that arena, and this book introduces that intention to the rest of us--as if
we hadn't already guessed. The Web explosion happened while Windows 95 was in
beta, and Microsoft was caught short. Gates means to catch up, and this book,
apparently, is the launch. A few days ago, Microsoft announced a major
reorganization toward that goal.
For example, Microsoft recently made its new Web browser and server programs
available for free download, an obvious assault on Netscape's success. An
acquaintance of mine downloaded and installed the Web server program under the
NT server operating system. Then he used both Netscape's Navigator and the
Microsoft Web browser to access his own pages. He noticed a significant
performance lag with Netscape and speculates that the Microsoft server could
be intentionally sensing its own browser and running better with it than with
competing browsers. I chided him that such shenanigans are clearly beneath
Microsoft, a company that would never resort to dirty tricks to make the
competition look bad. Oh, yeah? This sounds like a job for Andrew Schulman.
The Road Ahead does a good job of explaining what Gates calls simply "the
highway," its origins and its current state. Then the book lays out what Gates
sees as the future of the technology and its impact on society, business,
industry, and the international community. It's an easy read, covers the
subject comprehensively at the lay level, and its prophecies are probably
reasonable. 
But The Road Ahead is not without some speed bumps. In a discussion of a
society where your every move is recorded and documented by the government
with concealed cameras, Gates calls the practice "unremarkable," citing it as
being welcomed by citizens concerned about crime. He says, "Almost everyone is
willing to accept some restrictions in exchange for a sense of security."
Chilling. I am reminded of the words of another of our wealthy and famous
leaders from the past who said, "Those who desire to give up Freedom in order
to gain Security, will not have, nor do they deserve, either one." Those
words, spoken by Thomas Jefferson, come to mind whenever someone suggests to
me that society is best served by citizens being asked or forced to sacrifice
yet one more freedom for the common good. On balance, I should point out that
one of the freedoms that Jefferson himself was not willing to give up was the
freedom he enjoyed to own slaves. His philosophy, however, outlived his
contradictory practices, but I'm afraid we learned little from it. We are
allowed to chew gum, however.


Quincy 96 and DDE


I suppose some of this column ought to be dedicated to C/C++. I'm wrapping up
the development of Quincy 96, the Windows 95-hosted IDE that acts as a front
end for the gnu-win32 port of the GNU C/C++ compilers. Since Quincy 96 is
meant to be used from inside an interactive tutorial, it needs to be launched
and controlled from another program, in this case, an Asymetrix Toolbook
script. I chose DDE as the protocol for the commands that the script sends to
Quincy 96 for two reasons: First, Toolbook's OpenScript language supports
launching and sending DDE commands to other programs. Second, I can test the
commands in the absence of the script by setting up dummy files with the names
of the commands and associating them with Quincy 96 through the Windows 95
File Type mechanism.
I thought I had everything working prior to integrating the two parts of the
system. I had allowed the Visual C++ Developer Studio to install DDE protocols
into Quincy 96, and I had put the necessary program-launching and
command-sending code into the Toolbook script. I had installed the command
functions into Quincy 96 and tested them by double-clicking the dummy files
that simulate the commands. But, when I put the two parts--the Toolbook
application and Quincy 96--together, the command exchange did not work. As a
test, I substituted a launch and command from Toolbook to Microsoft Word to
see if the mechanism worked from the script. It did. Then I tried using
Notepad as the target application. That did not work. The return value from
the script's command said that no DDE server was running to accept the
command, which was the same thing that I was getting from Quincy 96. This did
not make sense. You can drag and drop files and double-click registered files
to open them into Notepad, Word, and Quincy 96. The calls to DragAcceptFiles,
EnableShellOpen, RegisterShellFileTypes, ParseCommandLine, and
ProcessShellCommand that Developer Studio puts into the derived CWinApp class
are supposed to take care of that. Why did the command mechanism work only for
Word and not for the others?
In desperation, I did what we all hate to do. I went to the documentation. It
turns out that in order to be a DDE command server, an application must
register itself as such and provide a callback function to process the DDE
commands. Listing One shows the code that I added to Quincy 96 to do that.
None of the functions are members of any of the MFC classes. The overridden
CWinApp::InitInstance function calls the RegisterDDE function to register the
protocol. The DdeCallback function calls the overridden CWinApp::OnDDECommand
function to process the command. I took this code from the examples in the
Win32 SDK documentation and whittled away at it until it fit into the MFC
design. Some of it might not be necessary, particularly some of the cases in
the switch statement in the callback function. The code works, however, and I
am reluctant to mess with it. Anyone more familiar with this architecture is
encouraged to send me a message and set me straight. By the way, this exercise
involved a lot of operating system crashes before I finally got everything
working.



Source Code


The source code files for the Quincy 96 project are free. You can download
them from the DDJ forum on CompuServe and on the Internet by anonymous ftp;
see "Availability," page 3. To run Quincy, you'll need the GNU Win32
executables from the Cygnus port. They can be found on ftp.cygnus.com/ in the
/pub/sac directory. Get Quincy 96 first and check its README file to see which
version of gnu-win32 you need. Every time they release a new beta, I have to
make significant changes to Quincy 96. As I write this, the latest beta is
Version 13 and Quincy 96 works with Version 10.
If you cannot get to one of the online sources, send a 3.5-inch high-density
diskette and a self-addressed, stamped mailer to me at Dr. Dobb's Journal, 411
Borel Avenue, San Mateo, CA 94402, and I'll send you the Quincy source code
(not the GNU stuff, however--it's too big). Make sure that you include a note
that says which project you want. The code is free, but if you care to support
my Careware charity, include a dollar for the Brevard County Food Bank. 

Listing One
TCHAR szApp[] = TEXT("Quincy"); // DDE service name
DWORD idInst = 0; // our DDEML instance object
HSZ hszAppName = 0; // the generic hsz for everything
UINT OurFormat; // our custom registered format
DWORD count; // advise data item used in DdeCallback (declared here so the listing compiles)
HDDEDATA CALLBACK DdeCallback(WORD wType, WORD wFmt, 
 HCONV hConv, HSZ hszTopic,HSZ hszItem, HDDEDATA hData, 
 DWORD lData1, DWORD lData2);
void RegisterDDE()
{
 DdeInitialize(&idInst,
 (PFNCALLBACK)MakeProcInstance((FARPROC)DdeCallback, hInstance),
 APPCMD_FILTERINITS |
 CBF_SKIP_CONNECT_CONFIRMS |
 CBF_FAIL_SELFCONNECTIONS |
 CBF_FAIL_POKES,
 0);
 hszAppName = DdeCreateStringHandle(idInst, szApp, 0);
 OurFormat = RegisterClipboardFormat(szApp);
 DdeNameService(idInst, hszAppName, 0, DNS_REGISTER);
}
HDDEDATA CALLBACK DdeCallback(
WORD wType,
WORD wFmt,
HCONV hConv,
HSZ hszTopic,
HSZ hszItem,
HDDEDATA hData,
DWORD lData1,
DWORD lData2)
{
 LPTSTR pszExec;
 switch (wType) {
 case XTYP_CONNECT:
 return((HDDEDATA)TRUE);
 case XTYP_ADVREQ:
 return(DdeCreateDataHandle(idInst, (PBYTE)&count, 
 sizeof(count), 0, hszAppName, OurFormat, 0));
 case XTYP_ADVSTART:
 return(HDDEDATA)
 ((UINT)wFmt == OurFormat && hszItem == hszAppName);
 case XTYP_EXECUTE:
 pszExec = (LPTSTR)DdeAccessData(hData, NULL);
 if (pszExec)
 theApp.OnDDECommand(pszExec);
 DdeUnaccessData(hData);
 return (HDDEDATA)DDE_FACK;
 case XTYP_REGISTER:
 return((HDDEDATA)TRUE);
 }
 return(0);
}






























































ALGORITHM ALLEY


Continued Fractions Versus Farey Fractions




Andreas Bender


Andreas is a Ph.D. student in the University of Cambridge's Department of Pure
Mathematics and Mathematical Statistics. He can be contacted at A.O.Bender@
pmms.cam.ac.uk.


Introduction
Speed and accuracy are common concerns in the study of algorithms. In
time-critical code, it can even be difficult to do accurate arithmetic. Say,
for example, you need to multiply by pi in a fixed-point computation. One
obvious approach is to convert pi into a fixed-point number, and do a single
multiplication. Unfortunately, accuracy requires working with large
intermediate values. It's often better to multiply by a small value (to avoid
overflow) and divide by another small value. If you choose your fraction
carefully, you can actually obtain better precision than with the fixed-point
approach.
In his October 1995 "Algorithm Alley," Louis Plebani examined one way of
finding such fractions using Farey series. This month, Andreas Bender takes a
closer look at the Farey series and presents an alternative approach based on
continued fractions.
--Tim Kientzle
Writing a number such as ".12345" as a fraction is entirely
straightforward--f=12345/100000. However, what fraction best approximates f if
its numerator or denominator may not exceed a given size H? How small can the
numerator or denominator of the approximating fraction be if we can tolerate
an error of at most e?
Careful readers of this column will remember that Louis Plebani addressed
these questions in his article "Common-Fraction Approximation of Real Numbers"
(DDJ, October 1995). Instead of the Farey fractions he used, however, I'll
approach the problem using continued fractions, then compare the merits of the
two methods. While I hope that this article is reasonably self-contained, it
will certainly be easier to follow if you have read Plebani's article as well.
What is a continued fraction? Figure 1 provides a concise definition. To build
a continued-fraction approximation to a rational number, we apply the
Euclidean algorithm to the numerator and denominator of f=r/s; see Figure 2.
Since f<1, q0 is equal to 0. The numbers qi are called the "partial
quotients." Truncating the continued fraction of Figure 1 at qi leaves the
"complete quotient" ni/di. For example, the complete quotients of 12/17 are
0/1, 1/1, 2/3, 5/7, and 12/17. The theory of Diophantine approximation
produces the results:
Every complete quotient ni/di is the best approximation to f with denominator
<= di.
The error |ni/di - f| is less than or equal to 1/di^2.
This suggests the following procedure: Calculate one complete quotient after
the other and test every one of them for our condition on approximation
accuracy or size of denominator until it is satisfied.
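As a minimal sketch in C, assuming f is supplied as a rational r/s with 0 < r < s (the function name cf_approx and the denominator bound H are my inventions, not from the article), the procedure looks like this:

```c
/* Successive complete quotients of f = r/s, stopping at the last one
   whose denominator does not exceed H. A sketch only; cf_approx and
   the bound H are illustrative names, not from the article. */
void cf_approx(long r, long s, long H, long *num, long *den)
{
    long n0 = 0, d0 = 1;            /* fictitious quotient before q0 */
    long n1 = 1, d1 = 0;
    while (s != 0) {
        long q = r / s;             /* partial quotient qi */
        long n2 = q * n1 + n0;      /* standard recurrence for ni, di */
        long d2 = q * d1 + d0;
        if (d2 > H)
            break;                  /* next denominator too large */
        n0 = n1; d0 = d1;
        n1 = n2; d1 = d2;
        long t = r % s; r = s; s = t;   /* one Euclidean step */
    }
    *num = n1;
    *den = d1;
}
```

For 12/17 with H = 10 this yields the complete quotient 5/7; with a large enough H it reproduces 12/17 itself.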
The Farey approximation, as described by Plebani, is calculated by subdividing
an interval. The interval from a/b to a'/b' is subdivided at (a+a')/(b+b');
for the start interval [0,1] this would, of course, be 1/2. You then check in
which of the two newly generated subintervals f lies. A median can now be
inserted into this interval and this process iterated until some endpoint
satisfies the desired condition.
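A matching sketch of the mediant subdivision, under the same assumptions (f given as r/s in (0,1); the name farey_approx is mine):

```c
/* Farey/mediant subdivision for f = r/s in (0,1): best fraction with
   denominator at most H. A sketch; farey_approx and H are my names. */
void farey_approx(long r, long s, long H, long *num, long *den)
{
    long a = 0, b = 1;                       /* left endpoint  a/b */
    long c = 1, d = 1;                       /* right endpoint c/d */
    while (b + d <= H) {
        long mn = a + c, md = b + d;         /* mediant (a+c)/(b+d) */
        if (mn * s == md * r) {              /* mediant equals f exactly */
            *num = mn; *den = md;
            return;
        }
        if (mn * s < md * r) { a = mn; b = md; }   /* f in right half */
        else                 { c = mn; d = md; }   /* f in left half  */
    }
    /* compare |f - a/b| with |c/d - f| by cross-multiplication */
    if ((r * b - a * s) * d <= (c * s - r * d) * b) {
        *num = a; *den = b;
    } else {
        *num = c; *den = d;
    }
}
```

For 12/17 with H = 10 this returns 7/10, which is closer than the complete quotient 5/7, because 7/10 is a semiconvergent that the plain continued-fraction walk skips.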
The important facts about Farey series are as follows:
The nth Farey series contains all fractions between 0 and 1 with denominator
at most n.
There is a u/v in this series for which |u/v - f| <= 1/((n+1)v) holds.
This u/v, calculated by the aforementioned procedure, is the best
approximation of f by a fraction with denominator at most n.
These two approaches have an important connection: Each subsequent complete
quotient in the continued-fraction calculation gives the same result as
computing qi medians (see R.L. Graham et al, Concrete Mathematics,
Addison-Wesley, 1989). As a result, the continued-fraction approach gives more
accurate results more quickly, but the Farey series approach is more sensitive
to bounds on the size of the denominator.
It's possible to mix these two approaches, starting with a continued-fraction
estimate, then extending the result to the optimal Farey approximation. To do
this, I need to find the successor (or predecessor) of nl/dl in the Farey
series of order dl. Start with the complete quotient nl/dl and assume that
f > nl/dl (for the other case, only the sign of gcd(r,s) needs to be
reversed). By tracing the steps of the Euclidean algorithm, you can solve the
equation dl*x - nl*y = gcd(r,s). Now divide the whole equation by gcd(r,s)
and, henceforth, assume the gcd to be one. If (x0,y0) is a solution, then
(x0+j*nl, y0+j*dl) also is one for every integer j, so we can pick a j such
that 0 < y0+j*dl <= nl, and this j gives us the successor of nl/dl in the
Farey series
of order dl. Note that the most computation-intensive part of this
procedure--finding (x0,y0)--amounts to just finishing the Euclidean algorithm
for r and s and tracking the partial quotients. That is the algorithm used for
computing the complete quotients as well.
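Tracing the Euclidean algorithm this way is just the extended Euclidean algorithm. A generic sketch (the name ext_gcd is mine; a solution of dl*x - nl*y = 1 then follows by calling it with a = dl, b = nl and negating y):

```c
/* Extended Euclidean algorithm: returns g = gcd(a,b) and fills in
   x, y with a*x + b*y = g. Generic sketch, not code from the article. */
long ext_gcd(long a, long b, long *x, long *y)
{
    if (b == 0) {
        *x = 1; *y = 0;
        return a;
    }
    long x1, y1;
    long g = ext_gcd(b, a % b, &x1, &y1);
    *x = y1;                        /* back-substitute one Euclidean step */
    *y = x1 - (a / b) * y1;
    return g;
}
```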
The larger the qi, the more interesting the question of grouping these qi
steps together becomes. Given an arbitrarily chosen f, what are the likely
sizes of the qi? A thorough discussion of this question entails delicate
arithmetic estimates and an examination of the underlying assumptions (see
A.M. Rockett et al., "Continued Fractions," World Scientific, 1992). 
A crude summary is that qi is "not bounded, but not too big too often either."
In The Art of Computer Programming, Vol.2: Seminumerical Algorithms
(Addison-Wesley, 1969), Donald Knuth notes two interesting facts:
qi is equal to a with probability P(a) = log2((a+1)^2/((a+1)^2 - 1)). The
first few values are P(1)=0.415, P(2)=0.17, P(3)=0.093, P(4)=0.059.
If the denominator is equal to s, we can expect the Euclidean algorithm to
take about 1.94 log10(s) steps, where I disregard certain correction terms.


Conclusions


There are basically three different methods of calculating our desired
approximation:
Compute successive complete quotients of the continued fraction of f and check
each one according to the desired condition. This method is appropriate if
your requirement is for accuracy; the only possible loss is one additional
complete quotient.
Take the aforementioned procedure and complement it by calculating the Farey
approximation. This seems appropriate if the condition limits the size H of
the denominator, H is not too small relative to the denominator of f, and the
best possible result is needed.
Use the Farey method directly. This is best if H is small compared to the
denominator of f. In addition, the algorithm is particularly simple.


References


Graham R.L., D.E. Knuth, and O. Patashnik. Concrete Mathematics.
Addison-Wesley, 1989.
Hardy G.H. and E.M. Wright. An Introduction to the Theory of Numbers, fifth
edition. Oxford University Press, 1979.
Knuth D.E. The Art of Computer Programming, Vol.2: Seminumerical Algorithms,
Addison-Wesley, 1969.
Rockett A.M. and P. Szusz. Continued Fractions. World Scientific, 1992.
Figure 1: Continued fraction.
Figure 2: Computing the continued fraction coefficients. The left and right
columns express the same calculation in slightly different forms.


































































JAVA Q&A


How Do I Display an Image? 




Cliff Berg and Steve Alexander


Cliff is vice president of technology and Steve president of Digital Focus.
They can be contacted at cliffbdf@digitalfocus.com and
steveadf@digitalfocus.com, respectively.


This is the first installment of a new column designed to answer your
questions about Java. Each month we will select questions submitted to The
Java Developer Web site (http://www.digitalfocus.com/faq/), an independent
forum for the exchange of technical information about Java. We also are
constructing a Java lab that will focus on the use of Java technology within
intranets, the Internet, and embedded environments. This will provide a base
from which to evaluate technology and explore new ways to use Java to solve
real-world problems. We welcome you to this column and hope to answer the
questions that are of most interest to you. If you have specific questions or
contributions, please submit them via our Web site.
--C.B. & S. A.
Graphic image display in Java involves issues such as the types of supported
image file formats, image source (local, remote, or program generated), and
monitoring of the loading process.
Currently, Sun's Java graphics programming library (AWT) supports GIF and JPEG
images. Format considerations include local color palettes, feature
restrictions (compression, color depth, and interlacing, for example), and
dithering.
This month, we'll examine the techniques necessary to load an image from a
local file, remote location, and memory. In the coming months, we'll discuss
issues such as monitoring the loading process.


Loading an Image from a Local File


To load a single image in an application (a Java program run from the command
line), use Image img = myWindow.getToolkit().getImage("orion.jpg"); to load
the JPEG image file orion.jpg (where myWindow is the Window or Frame object in
which you want to display the image).
In a browser applet, use Image img = ((Window)(myApplet.getParent()))
.getToolkit().getImage("orion.jpg");, where myApplet is the applet in which
you plan to display the image. Note that this assumes the applet is
instantiated inside a Frame or Window, which is consistent with the way
Netscape handles applets.
After the Image is loaded, display the image by drawing it in a Graphics
context. (Every AWT component object has a Graphics context.) This is done in
the paint() method of a component, because paint() is called by AWT
automatically when the image is done loading; see Listing One. Image loading
is asynchronous. Loading does not necessarily occur until you attempt to
display the image, via drawImage(). The last parameter to drawImage()
specifies which component to repaint when the image is finally ready. This is
normally the component that calls drawImage() in the first place. The
following sequence of events occurs:
1. When the applet is first loaded, paint() is called for each component; in
Listing One, the component calls drawImage().
2. The applet continues processing without waiting for the image. (The applet
does not even know how big the image is.)
3. AWT finishes fetching the image, if it has not already.
4. When the image is ready to display, AWT calls imageUpdate() for the
component you specified in drawImage(). This is effectively a callback,
although we have achieved it in an object-oriented way--by passing an object
instead of a function.
The default behavior of imageUpdate() is to repaint the component by calling
its paint() method again. Since the image is now available, it will be
redrawn. Its dimensions are now known as well.
Since image loading is asynchronous, you cannot call getImage() followed
immediately by drawImage() and expect to see the image--drawImage() will start
fetching the image, but it cannot display it until the image is available. You
must take advantage of the callback mechanism that calls paint() to ensure
that a final call to drawImage() is made when the image has been loaded.


Loading a Remote Image


Images loaded across a network have the same considerations, except you must
use a version of getImage() that takes a URL object as a parameter. Construct
the URL object in the getImage() call like this: applet.getImage(new
URL("www.somewhere.com/orion.jpg"));.
Here, we have used an applet getImage() member function rather than the
Toolkit member function used earlier. Since the applet getImage() method uses
a URL argument (rather than a filename), it is appropriate for retrieving a
remote file. Specifying a file URL using the file:/// syntax apparently is not
yet fully implemented.


Loading from MemoryImageSource


If an image can be algorithmically generated, it may be more efficient to
create the image on-the-fly rather than load it from a persistent source such
as a disk or server. There is no need, for example, to obtain a simple color
gradient from a bitmap file.
Several objects and interfaces allow you to implement memory-based image
objects in Java. The key class, MemoryImageSource, creates in memory an image
object containing the color model, physical dimensions, offset, and (of
course) actual pixel information. With this class, creating an image is a
two-step process: First, define an int or byte array and populate it with
data; see Listing Two. Next, use that array to create a memory object
(MemoryImageSource) that serves as the source of data for the new image; see
Listing Three.
MemoryImageSource has six constructors that give you a range of options in
array types, color models, and properties. The createImage() method can take
either a width and height parameter or an ImageProducer object such as
MemoryImageSource. For any type of image processing using data arrays, use the
MemoryImageSource implementation.
It is easy to introduce variations in image array processing. The result is a
fast and efficient image-display technique for your Java applet or
application. Listing Four presents a complete working demo.

Listing One
class MyCanvas extends Canvas
{
 Image image;
 public MyCanvas(Image img) { image = img; }
 public void paint(Graphics g)
 {

 // Draw the image on the Graphics context. This is non-blocking.
 g.drawImage(image, 0, 0, this);
 }
}

Listing Two
public Image makeImage(int width, int height)
{
 int index = 0;
 int myImageData[] = new int[ width * height ];
 for(int y = 0; y < height; y++)
 {
 int a = y * (255 / ((height/20)-1));
 for(int x = 0; x < width; x++)
 {
 int b = x * (255 / ((width/20)-1));
 myImageData[index++] = (255 << 24) | (a << 16) | b;
 }
 }

Listing Three
 myImage = createImage(new MemoryImageSource(width, height, 
 myImageData, 0, width));
 return myImage; 
}

Listing Four
import java.awt.*;
import java.applet.*;
public class ImageDemoApplet extends Applet
{
 static Frame frame;
 MyCanvas canvas;
 Image image = null;
 public void init()
 {
 image = getParent().getToolkit().getImage("orion.jpg");
 canvas = new MyCanvas(image);
 add(canvas);
 }
 public static void main(String args[])
 {
 frame = new Frame();
 ImageDemoApplet app = new ImageDemoApplet();
 frame.add("Center", app);
 frame.resize(200, 200);
 app.init();
 app.start();
 frame.show();
 }
}
class MyCanvas extends Canvas
{
 Image image;
 public MyCanvas(Image img)
 {
 image = img;
 setBackground(Color.white);
 resize(100, 100);

 }
 public void paint(Graphics g)
 {
 // Draw the image on the Graphics context. This is non-blocking.
 g.drawImage(image, 0, 0, this);
 }
}
























































UNDOCUMENTED CORNER


Understanding Pentium's 4-MB Page Size Extensions




Robert R. Collins


Robert is a design verification manager at Texas Instruments' Microprocessor
Design Center. He can be reached via e-mail at rcollins@ti.com.


It's been more than three years since Intel first published the Pentium Family
User's Manual. The Manual omitted discussion of some new, advanced programming
features. Intel originally planned to release this information in its manuals,
but instead, put this information in a document commonly referred to as
"Appendix H" (formally known as the Supplement to the Pentium Processor User's
Manual) and required recipients to sign a 15-year nondisclosure agreement
(NDA). This decision has been the focus of a controversy concerning Intel's
right to protect its intellectual property versus the rights of all
programmers to have access to information that will benefit their programs.
Another point of contention is the NDA itself. Intel claims that anybody
needing this information will never be denied it, as long as they sign the
NDA. But several stories have circulated regarding programmers being denied
because Intel claims they don't need the information. This has spawned a
community of programmers dedicated to reverse engineering these features and
publishing their findings on Internet newsgroups and the World Wide Web. But
is all of this necessary?
Intel has promised that the not-yet-released Pentium Pro Processor Family
Developer's Manual will contain information on many of these advanced
features, perhaps even a description of 4-MB paging.
Four-MB paging allows the operating system to access very large data
structures without constantly referencing the Translation Lookaside Buffer
(TLB), which is used by the processor to cache virtual-to-physical address
translations for the most recently used pages of memory. This feature is most
useful to operating-system developers who want a single page of memory
dedicated to the OS kernel or a large data structure, such as a video-frame
buffer. Information about 4-MB paging has been publicly documented by
Intel--but you need to know where to look to find it. In order to get a
complete description of Pentium's 4-MB pages, you need to read both the
Pentium Family User's Manual, Volume 3 (P/N 241430) and the i860TM XP
Microprocessor Data Book (P/N 240874).
In the Pentium manuals, there are at least nine references to 4-MB pages. This
is a good start to reverse engineering 4-MB pages. These references give you
the necessary clues to write software that unlocks the secrets of page-size
extensions (PSE). However, such an effort is unnecessary. The Intel i860 XP
processor documentation claims the i860 XP is page-level compatible with the
Intel 386, Intel 486, and Pentium processors. This compatibility is noteworthy
because the i860 XP also supports 4-MB pages, and its documentation provides a
complete description of the 4-MB paging mechanism (see i860TM XP
Microprocessor Data Book, section 2.4). All that's needed to obtain an
Appendix H description of 4-MB pages are a few references from the Pentium
manuals and the description of 4-MB pages from the i860 XP manual.


A 4-KB Page Backgrounder


When paging is enabled, linear addresses (program-visible addresses) are
mapped to physical addresses (bus addresses). Paging makes it possible to
execute programs much larger than the computer's available physical memory.
When the processor references a page that isn't present in memory, it
generates a page fault, and the operating system responds by swapping the
needed portion of memory between the hard disk and main memory.
Memory is partitioned into contiguous blocks, called "page frames." Each page
frame is 4 KB. The Pentium paging mechanism consists of the following:
A Page Directory Base Register (PDBR). 
A page directory.
At least one page table. 
The PDBR is CR3, and points to the base of the page directory. Each
page-directory entry (PDE) points to a page table covering 4 MB of memory. The
PDE contains control information and the pointers to the page tables. Like the
PDE, each page-table entry (PTE) contains control information, but points to a
4-KB page frame.
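The low-order control bits of these entries follow the standard x86 layout
(their meanings are summarized later in Table 2). As a quick sketch, they
might be modeled in C as follows; the macro names and helper function are
illustrative, not Intel's:

```c
#include <stdint.h>

/* Standard x86 page-entry control bits (the low 12 bits of a PDE/PTE). */
#define PG_P    (1u << 0)   /* Present */
#define PG_W    (1u << 1)   /* Writable */
#define PG_U    (1u << 2)   /* User */
#define PG_PWT  (1u << 3)   /* Page Write Through */
#define PG_PCD  (1u << 4)   /* Page Cache Disable */
#define PG_A    (1u << 5)   /* Accessed */
#define PG_D    (1u << 6)   /* Dirty */
#define PG_PS   (1u << 7)   /* Page Size (meaningful in a PDE) */

/* A PTE combines a 20-bit page-frame base (upper bits) with control bits. */
static uint32_t make_pte(uint32_t frame_base, uint32_t flags)
{
    return (frame_base & 0xFFFFF000u) | (flags & 0xFFFu);
}
```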
Linear addresses are converted to physical addresses by using a 20-bit pointer
in a page table and combining it with the low-order 12 bits of the linear
address to form a 32-bit physical address. For purposes of conversion, the
linear address is broken into three parts: 
The high-order 10 bits form an index into the page directory.
The next 10 bits form an index into the page table.
The remaining 12 bits are an index into a page frame. 
The upper 20 bits of the PTE are then combined with the low-order 12 bits of
the linear address to form the physical address. There is a direct
relationship between the sizes of these three fields and the page size. The
lower 12 bits can address 2^12 bytes, or 4 KB of memory. Hence, each PTE controls 4
KB of memory. The amount of memory controlled by each PDE is determined by the
number of address bits used as an index into the page table, plus the number
of bits used as the page-frame index. The PTE index is 10 bits, and the
page-frame index is 12 bits, making 2^22 bytes, or 4 MB of memory controlled by each
PDE. This association will be important in understanding the 4-MB paging
mechanism. Figure 1 shows how linear addresses are translated to physical
addresses for 4-KB pages.
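The translation just described can also be sketched in C. This is a
simulation that models the page directory and page tables as ordinary arrays,
not code that programs a real MMU; the function name is illustrative:

```c
#include <stdint.h>

/* Simulated 4-KB page translation. Each page table is modeled as an array
 * of 1024 32-bit entries, mirroring the 10-bit index fields. */
static uint32_t translate_4k(uint32_t *const page_dir[], uint32_t linear)
{
    uint32_t dir_idx = linear >> 22;            /* high-order 10 bits: page-directory index */
    uint32_t tbl_idx = (linear >> 12) & 0x3FFu; /* next 10 bits: page-table index */
    uint32_t offset  = linear & 0xFFFu;         /* low-order 12 bits: offset into the frame */

    uint32_t pte = page_dir[dir_idx][tbl_idx];  /* fetch the page-table entry */
    return (pte & 0xFFFFF000u) | offset;        /* upper 20 PTE bits + 12-bit offset */
}
```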


Making the Jump to 4-MB Pages


With an understanding of the 4-KB paging mechanism, it's not difficult to
deduce the 4-MB paging mechanism. Recall that each page-directory entry
controls 4 MB of memory. Now imagine how Figure 1 would look if the page-table
lookup were eliminated. The page-frame index would increase from 12 bits to 22
bits, thus allowing direct control of a 4-MB page size. The 20-bit pointer in
the page directory would be reduced to a 10-bit pointer, pointing directly to
the 4-MB page frame of memory. With the page-table lookup eliminated, the page
directory points directly to a 4-MB page frame. This describes how 4-MB pages
are implemented in the i860 XP (i860 XP Microprocessor Data Book, section
2.4). But the question remains: Are 4-MB i860 XP pages compatible with 4-MB
Pentium pages? To answer that question, we need to compare the i860 and
Pentium manuals.
The i860 manual claims that the i860 4-KB paging mechanism is compatible with
the x86 implementation. A comparison of page-directory format and page-table
format substantiates this claim. The page-size (PS) bit of the i860 page
directory shares the same location as the Pentium's PS bit (see i860 XP
Microprocessor Data Book, Figure 2.13). With this information, you can assume
they are compatible, and look more closely at the Pentium manual for the
mechanics of enabling and using 4-MB pages.
Volume 3 of the Pentium manual describes how CR4.PSE enables PSEs and 4-MB
pages, but refers you to Appendix H for more information. Later in the Pentium
manual, bit 7 of the PDE is identified as the PS bit. Without CR4.PSE=1, the
Pentium will always use Intel 486-compatible (4-KB) paging, regardless of the
setting of the PDE.PS bit. Similarly, when CR4.PSE=1, and PDE.PS=0, Pentium
still uses Intel 486-compatible 4-KB pages. But when CR4.PSE=1, and PDE.PS=1,
Pentium uses an i860 XP-compatible 4-MB paging translation. 
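These rules can be summarized in a few lines of C; a minimal sketch with
illustrative names, assuming the standard bit-7 location of PDE.PS:

```c
#include <stdbool.h>

#define PDE_PS (1u << 7) /* bit 7 of the page-directory entry */

/* Effective page size for one PDE: a 4-MB translation is used only when
 * both CR4.PSE and PDE.PS are set; otherwise 486-compatible 4-KB paging. */
static unsigned page_size_kb(bool cr4_pse, unsigned pde)
{
    if (cr4_pse && (pde & PDE_PS))
        return 4096;  /* 4-MB page */
    return 4;         /* Intel 486-compatible 4-KB paging */
}
```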
The linear address for a 4-MB page is converted to a physical address in much
the same manner as 4-KB pages. However, the access to the page table is
omitted. The high-order 10 bits form an index into the page directory. The
page directory no longer contains a 20-bit pointer to a page table, but
instead contains a 10-bit pointer to the 4-MB page frame of memory. This
convention mandates that all 4-MB pages reside on 4-MB boundaries. The 10-bit
pointer in the page directory then is combined with the low-order 22 bits of
the linear address to form the 32-bit physical address.
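The 4-MB translation can be sketched the same way as the 4-KB case; again,
this is an illustrative array-based model, not real MMU code:

```c
#include <stdint.h>

/* Simulated 4-MB translation: no page-table lookup. The upper 10 bits of
 * the PDE point directly at a 4-MB-aligned page frame. */
static uint32_t translate_4m(const uint32_t page_dir[1024], uint32_t linear)
{
    uint32_t dir_idx = linear >> 22;        /* high-order 10 bits: page-directory index */
    uint32_t offset  = linear & 0x3FFFFFu;  /* low-order 22 bits: offset into 4-MB frame */

    uint32_t pde = page_dir[dir_idx];
    return (pde & 0xFFC00000u) | offset;    /* 10-bit frame base + 22-bit offset */
}
```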
Figure 2 describes the 4-MB and 4-KB paging translation mechanism. Ironically,
Figure 11-16 in Pentium Processor Family Developer's Manual, Volume 3, 1993
edition, contained a virtually identical picture. Intel obviously recognized
the significance of this pictorial representation of 4-MB pages. Subsequent
editions of the Pentium manual were substantially modified to remove the
visual representation of the 4-MB paging mechanism.


Side-effects and Caveats of 4-MB Pages


There are side-effects and caveats to enabling 4-MB pages. Consider the
following excerpt from the Pentium Processor Family Developer's Manual, Volume
3, section 23.2.14.1, which discusses compatibility with previous Intel
processors:
A Page Fault exception occurs when a 1 is detected in any of the reserved bit
positions of a page table entry, page directory entry, or page directory
pointer during address translation by the Pentium processor.
In other words, if any reserved bit in the PDE or PTE is 1, a page fault will
occur. This does not occur when CR4.PSE=0, but does when PSEs are enabled
(CR4.PSE=1). Every bit in CR4 enables a behavioral extension to the Intel 486
processor. In essence, CR4 bits enable/disable incompatibilities with the
Intel 486. Therefore, it is a natural extension of enabling 4-MB pages to
enable more rigorous type checking of the PDE and PTE. Unfortunately, even
then, the passage quoted above isn't completely accurate: setting some
reserved bits generates an exception, while setting others does not,
contradicting the Intel documentation. If the Pentium were originally intended
to behave as documented, the documentation was never updated to reflect the
relaxed reserved-bit checking that was actually implemented. Table 1 shows all
of the Pentium paging
structures. All positions in the PDE and PTE marked as reserved will generate
a page-fault exception when CR4.PSE=1. All positions in CR3, the PDE, and PTE
marked as "0" are reserved, but don't generate a page fault when CR4.PSE=1.
Table 2 describes the meaning of all of the fields listed in Table 1.
It might be tempting to believe that the "page-directory pointer" is another
name for the CR3 register. That assumption would be incorrect; in fact, the
mention of the page-directory pointer is a mistake. It refers to a paging
structure for a new paging feature that was to be implemented in the Pentium.
This new paging feature was allegedly implemented in beta silicon, but removed
before production, and now appears in the Pentium Pro. I'll discuss this in my
next column.
The Intel documentation also doesn't tell the whole story of the error code
generated by page faults. When CR4.PSE=1, and a 1 is detected in a
reserved-bit position of the PDE or PTE, the page-fault error code indicates
that an attempt was made to set a reserved bit in a paging structure. This
indication is reflected in bit 3 of the page-fault error code. If set to 1,
then an attempt was made to set a reserved bit in the PDE or PTE. In Figure
14-7 of the Pentium Processor Family Developer's Manual, Volume 3, 1993
edition, this behavior was correctly documented, but it was removed in
subsequent editions. Table 3 shows an accurate representation of the
page-fault error code, as shown in the 1993 edition of the Pentium manual.
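For completeness, the error-code bits from Table 3 can be modeled in C; the
enum names are mine, not Intel's:

```c
#include <stdbool.h>

/* Page-fault error-code bits as documented in the 1993 edition:
 * bit 0 = present, bit 1 = write, bit 2 = user, bit 3 = reserved-bit fault. */
enum {
    PF_PRESENT = 1u << 0,
    PF_WRITE   = 1u << 1,
    PF_USER    = 1u << 2,
    PF_RSVD    = 1u << 3,
};

/* True when the fault was caused by a 1 in a reserved PDE/PTE bit. */
static bool fault_from_reserved_bit(unsigned error_code)
{
    return (error_code & PF_RSVD) != 0;
}
```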


TLB Translation



According to the 1995 edition of the Pentium user's manual, the Pentium has
one code TLB and two data TLBs (Pentium Processor Family Developer's Manual,
Volume 1, 1995 edition, section 33.2.1.2). The data TLBs consist of a 64-entry
TLB for 4-KB page translations, and an 8-entry TLB for 4-MB page translation.
The code TLB is a single 32-entry TLB which is shared by 4-KB and 4-MB page
translations. The 4-MB code pages are cached in multiples of 4 KB. When the
Pentium caches a 4-MB code page in the TLB, it initially uses only a single
TLB entry. A code access beyond the initial 4 KB of memory associated with
this TLB entry accesses the PDE as if it were a 4-KB page, and is given its
own TLB entry.


TLB Invalidation


You'd assume that enabling and disabling 4-MB pages (CR4.PSE) would invalidate
the TLB, as writing to CR3 does. However, this does not occur. A potentially
dangerous situation arises when a user wants to disable 4-MB pages when a 4-MB
page is still cached in the TLB. Suppose the PDEs are modified to use a
different paging translation and point to a different area of physical memory
than the 4-MB pages (a natural arrangement, consistent with the whole purpose
of paging). Once CR4.PSE is cleared, any 4-MB TLB entries
still cached remain in effect until they are evicted or until the TLB is
invalidated. (Once CR4.PSE=0, TLB entries for 4-MB data pages will never get
evicted, since they have their own dedicated TLB.) Any subsequent memory (or
code) accesses while the old 4-MB TLB still is cached would retrieve incorrect
data. Therefore, before 4-MB paging can be disabled, all 4-MB PDEs must be
modified back to 4-KB PDEs. Once the PDEs are modified, CR4.PSE can be
cleared, or the TLB invalidated (which effectively disables 4-MB paging). Some
could consider this a bug, but Intel's documentation states that it's the
operating-system writer's responsibility to manage the paging mechanism,
including invalidating the TLB (Pentium Processor Family Developer's Manual,
Volume 3, section 11.3.5).


Testing the Hypothesis


Now that we have an understanding of 4-MB paging, it should be easy to write
characterization code that confirms our hypothesis. To detect whether or not
4-MB pages are implemented in Pentium as they are in the i860 XP, you could
follow these steps:
1. Write the software assuming 4-MB compatibility with the i860 XP.
2. Enable paging.
3. Before enabling 4-MB paging, modify the second PDE, which controls memory
from 4 MB to 8 MB, to point to the 0-MB to 4-MB page frame, and mark it as a
4-MB page.
4. Install a PTE in the first entry pointed to by the modified PDE. This PTE
should point back to the first page of memory at 4 MB, which contains a
signature of some sort.
5. Read from the signature in memory. If 4-MB paging works as expected,
instead of getting the signature, you will retrieve the PTE you installed
during the previous step. If 4-MB paging does not work as expected, all is
well, because the PTE is correctly formed, and you will retrieve your memory
signature.
The key to this technique is to read from one location in memory if 4-MB pages
work or another location if they don't (so you don't page fault). This
approach is demonstrated in 4MPAGES.ASM (see Listing One). Auxiliary
subroutines for building paging structures on the Pentium and Pentium Pro
processor are available electronically at http://www.x86.org (and from DDJ;
see "Availability," page 3). 


What to Try Next


You could write more characterization code to prove whether or not any other
functional extensions are enabled by setting CR4.PSE. The listings available
electronically demonstrate the page-faulting behavior of PSE. I've also
included a program that detects the TLB size and associativity. Finally,
another program demonstrates that writing any values to CR4.PSE will not
invalidate the TLB.
Figure 1: Page translation for 4-KB page sizes.
Figure 2: Page translation for 4-MB and 4-KB page sizes.
Table 1: Structures used in Pentium paging translations (*see descriptions in
Table 2).
Table 2: Descriptions of paging extension fields. (Bold fields indicate paging
extensions available on the Pentium.)
Field Description
RSV Reserved. If set (RSV=1) may cause a page fault
 when CR4.PSE=1. Setting this bit only causes a
 page fault during page translation. If the
 referenced page entry is in the TLB, then setting
 this bit, and referencing the page will not cause
 a page fault. If the entry is not in the TLB, or
 gets flushed from the TLB, then the next reference
 to this page will cause a page fault. The page
 fault error code on the stack will have the RSV
 bit set (bit 3).
AVL Available for systems programmer use.
PS* This bit is always set=1. When set=1, then this
 page directory entry points to a 4-MB page.
PS** This bit is always clear=0. When clear, then
 this page directory entry points to a page table.
D Dirty.
A Accessed.
PCD Page Cache Disable.
PWT Page Write Through.
U User.
W Writable.
P Present.
Table 3: Page-fault error code.

Listing One
 page 60,132
;-----------------------------------------------------------------------------
; 4MPAGES.ASM Copyright (c) 1996 Robert Collins
; You have my permission to copy and distribute this software for

; non-commercial purposes. Any commercial use of this software or
; source code is allowed, so long as the appropriate copyright
; attributions (to me) are intact, *AND* my email address is properly
; displayed. Basically, give me credit, where credit is due, and 
; show my email address.
;-----------------------------------------------------------------------------
; Robert R. Collins email: rcollins@metronet.com
; 7201 Avalon Dr.
; Plano, TX 75025
;-----------------------------------------------------------------------------
;-----------------------------------------------------------------------------
; Build instructions:
; Assembled using Microsoft MASM 6.11.
; To compile without the makefile:
; ML /c /DINCLUDEDIR=[YOUR FAVORITE INCLUDE DIRECTORY] 4MPAGES.ASM
; ML /c /DINCLUDEDIR=[YOUR FAVORITE INCLUDE DIRECTORY] PAGEFNS.ASM
; LINK /NON 4MPAGES.OBJ PAGEFNS.OBJ;
;-----------------------------------------------------------------------------
;-----------------------------------------------------------------------------
; Assembler directives
;-----------------------------------------------------------------------------
 .xlist ; disable list file
 .586P
 .ALPHA
;-----------------------------------------------------------------------------
; Include file section
;-----------------------------------------------------------------------------
% Include INCLUDEDIR\\macros.inc ; Include macros
% Include INCLUDEDIR\\struct.inc ; Include structures
;-----------------------------------------------------------------------------
; Public declarations
;-----------------------------------------------------------------------------
 Public GDT_PTR, ext_mem_blocks
;-----------------------------------------------------------------------------
; External declarations
;-----------------------------------------------------------------------------
 Extern Init4M_Pages : Near16
 Extern Check4M_Pages : Near16
 Extern GetLinear4M : Near16
 Extern GetPDBR : Near16
 Extern PDBR : DWord
 .list
;-----------------------------------------------------------------------------
; Dummy segments
;-----------------------------------------------------------------------------
 INTSEG segment at 0
 int0 dd ?
 INTSEG ends
 _DATA segment para public use16 'DATA'
;-----------------------------------------------------------------------------
; Data segment
;-----------------------------------------------------------------------------
 GDT_386 label fword
 GDT_PTR Descriptor <>
 SEL_RMCS equ $-GDT_386
 GDT_RMCS Descriptor <-1,,,9bh,0,> ; DS Descriptor
 SEL_RMDS equ $-GDT_386
 GDT_RMDS Descriptor <-1,,,93h,0,> ; DS Descriptor
 SEL_4G equ $-GDT_386

 GDT_4G Descriptor <-1h,0,0h,93h,8fh,0h> ; 4G Descriptor
 GDT_Len equ ($-GDT_386) - 1
;-----------------------------------------------------------------------------
; All other data
;-----------------------------------------------------------------------------
 Failure1_Msg db "4M page translation didn't work.",CRLF$
 Failure2_Msg db "Unknown page translation (this should never occur).",CRLF$
 Passed_Msg db "4M page translation behaves as expected.",CRLF$
 ext_mem_blocks dw 0
 align
 OrigPTE dd 0
 OrigINT0 dd 0
 OrigSentinal dd 0
 _DATA ENDS
 _TEXT segment para public use16 'CODE'
 ASSUME CS:_TEXT, DS:_DATA, ES:_DATA, SS:STACK
;-----------------------------------------------------------------------------
; Code starts here
;-----------------------------------------------------------------------------
 _4MPAGES proc far
 mov ax,seg STACK ; setup stack segment
 mov ss,ax
 mov sp,sizeof StackPtr
 xor ax,ax ; clear it
 pushf
 push ds ; save far return on stack
 push ax
;-----------------------------------------------------------------------------
; Set segments to normal data segment
;-----------------------------------------------------------------------------
 mov ax,seg _DATA ; get original data segment
 mov ds,ax
 mov es,ax
;-----------------------------------------------------------------------------
; Check that this processor supports 4M pages.
;-----------------------------------------------------------------------------
 call Check4M_Pages ; does this processor support 4M pages?
 jnc @F ; yes, continue
@ErrorExit:
 mov ah,9
 int 21h ; print message
 mov ax,4c01h ; set error code
 int 21h
 iret ; go split, just in case
;-----------------------------------------------------------------------------
; Setup descriptor table
;-----------------------------------------------------------------------------
@@: mov eax,ds ; make pointer to GDT table
 shl eax,4 ; have physical address of segment
 add eax,offset GDT_386 ; now have physical addr of table
 mov GDT_PTR.Seg_limit,GDT_Len ; set length
 mov GDT_PTR.Base_A15_A00,ax
 shr eax,10h ; get other address bits
 mov GDT_PTR.Base_A23_A16,al
 mov GDT_PTR.Access_rights,ah
 mov eax,cs ; get CS
 shl eax,4 ; now have physical address
 mov GDT_RMCS.Base_A15_A00,ax
 shr eax,10h ; get other address bits

 mov GDT_RMCS.Base_A23_A16,al
 mov GDT_RMCS.Base_A31_A24,ah
 mov eax,ds ; get DS
 shl eax,4 ; now have physical address
 mov GDT_RMDS.Base_A15_A00,ax
 shr eax,10h ; get other address bits
 mov GDT_RMDS.Base_A23_A16,al
 mov GDT_RMDS.Base_A31_A24,ah
;-----------------------------------------------------------------------------
; Initialize page mode
;-----------------------------------------------------------------------------
 call GetPDBR ; get address of page directory
 jc @ErrorExit ; oops
 mov PDBR,edx ; save it
;-----------------------------------------------------------------------------
; Read CMOS to determine the amount of extended memory.
;-----------------------------------------------------------------------------
 mov al,18h
 out 70h,al
 IO_Delay
 in al,71h
 mov ah,al
 mov al,17h
 out 70h,al
 IO_Delay
 in al,71h
 shr ax,6
 mov ext_mem_blocks,ax
;-----------------------------------------------------------------------------
; Enter protected mode.
;-----------------------------------------------------------------------------
 cli
 lgdt GDT_386
 mov eax,cr0 ; get control register
 or al,1 ;
 mov cr0,eax
 push cs ; push return selector on stack
 push offset PMRET ; set return offset
 JMPFAR @F,SEL_RMCS
@@: mov ax,SEL_RMDS ; get DS selector
 mov ds,ax
 mov ax,SEL_4G ; get GS selector
 mov gs,ax
;-----------------------------------------------------------------------------
; Enable page mode
;-----------------------------------------------------------------------------
 call Init4M_Pages
 mov ebx,PDBR ; initialize CR3
 mov cr3,ebx
 mov ebx,cr0 ; get 386 control register
 or ebx,80000000h ; set PG bit
 mov cr0,ebx ; now we're in protected mode
 jmp short @F
 Align
;-----------------------------------------------------------------------------
; This is the body of the test.
;-----------------------------------------------------------------------------
; Save a signature in memory so we can see if 4M pages work as expected.
;-----------------------------------------------------------------------------

@@: mov esi,FARCS
 mov edi,gs:Int0[esi] ; get original memory contents
 mov OrigSentinal,edi ; save it
 mov gs:Int0[esi],Signature ; save signature in memory
 mov edi,gs:Int0 ; get original interrupt vector
 mov OrigINT0,edi ; save it
;-----------------------------------------------------------------------------
; Modify the PDE for a 4 MB page.
;-----------------------------------------------------------------------------
 mov edx,SEL_4G ; get selector
 lea eax,Int0[esi] ; get INT0 offset
 call GetLinear4M ; get linear address
 mov dword ptr gs:[edx],87h ; modify to 4M PDE
;-----------------------------------------------------------------------------
; Save original contents of signature location
;-----------------------------------------------------------------------------
 mov edi,gs:[eax] ; get original PTE
 mov OrigPTE,edi ; save it
 mov gs:Int0,edi ; save it
;-----------------------------------------------------------------------------
; Enable 4M paging.
;-----------------------------------------------------------------------------
 mov ecx,cr3 ; get CR3
 mov ebx,cr4 ; get CR4
 or bl,PSE ; enable 4M pages
 mov cr4,ebx
 mov cr3,ecx ; flush TLB
;-----------------------------------------------------------------------------
; The next memory read will read the signature or the PTE depending upon
; whether 4M paging even works.
;-----------------------------------------------------------------------------
 mov ebp,gs:Int0[esi] ; try to read from 4M
;-----------------------------------------------------------------------------
; Get out of paging
;-----------------------------------------------------------------------------
 mov ecx,cr3 ; clear TLB by loading
 mov ebx,cr4 ; get PSE
 and bl,not PSE ; turn off PSE
 mov cr4,ebx
 mov cr3,ecx ; CR3 with any value
;-----------------------------------------------------------------------------
; Restore original value to signature location.
;-----------------------------------------------------------------------------
 mov edi,OrigSentinal
 mov gs:Int0[esi],edi ; restore sentinal
 mov edi,OrigINT0
 mov gs:Int0,edi ; restore original INT0 handler
;-----------------------------------------------------------------------------
; Split from this program
;-----------------------------------------------------------------------------
 mov cx,SEL_RMDS ; get DS selector
 mov gs,cx
 mov ecx,cr3 ; clear TLB by loading
 mov ebx,cr0 ; get 386 control register
 and ebx,not 80000001h ; clear paging bit
 mov cr0,ebx ; and store in CR0
 mov cr3,ecx ; CR3 with any value
 retf
PMRET:

 mov ax,seg _DATA
 mov ds,ax
 mov gs,ax
;-----------------------------------------------------------------------------
; Determine whether or not our test passed.
;-----------------------------------------------------------------------------
 mov dx,offset Failure1_Msg ;
 cmp ebp,Signature ; did we get a bogus signature?
 je @ErrorExit ; yep
 mov dx,offset Failure2_Msg
 cmp ebp,OrigPTE ; was our signature our original PDE?
 jne @ErrorExit ; nope
 mov dx,offset Passed_Msg
 mov ah,9
 int 21h ; print message
 mov ax,4c00h ; set error code
 int 21h
 iret ; go split, just in case
 _4MPAGES endp
_TEXT ends
 STACK segment para public 'STACK'
;-----------------------------------------------------------------------------
; Stack segment
;-----------------------------------------------------------------------------
 StackPtr db 400h dup (?)
 STACK ends
 _ZSEG segment para public 'DATA'
 _ZSEG ends
 end _4MPAGES


































PROGRAMMER'S BOOKSHELF


Graphics Programming




Dean Clark


Dean is a programmer/analyst developing graphics and imaging applications. He
can be contacted at 71160.2426@compuserve.com.


While its applicability to economics may be uncertain, the trickle-down theory
definitely applies to computing power, and where hardware goes, software is
sure to follow. The two books I'll examine here provide a PC perspective on
subjects that, until now, have been captives of high-end workstations.
Radiosity: A Programmer's Perspective, by Ian Ashdown, explores the radiosity
method for creating synthetic imagery, while the special-effects technique of
image compositing is the topic of Tim Wittenburg's Photo-Based 3D Graphics in
C++.


Radiosity: A Programmer's Perspective


Unlike ray tracing, which is most efficient at simulating the contribution of
external light sources on a scene, radiosity attempts to take into account the
contribution of individual parts of a scene on all other parts. This is a
fundamentally harder problem that, until recently, was considered beyond the
power of the typical desktop computer. Ian Ashdown's Radiosity: A Programmer's
Perspective breaks the PC barrier.
The book has three parts. The first is an introduction to the basic concepts
of illumination and radiosity theory. The second discusses the program files
used to describe scenes and a viewing system for displaying them. The last
part--nearly half the book--covers calculating form factors (which form a
matrix defining the radiant contributions of each patch in a scene on all
other patches) and solving the radiosity equation itself.
Radiosity doesn't shirk the deeper theoretical underpinnings--or their
equations. Ashdown assures us more than once that mathematical sophistication
is not a prerequisite: The math is there, but it's explained carefully and
thoroughly. Ashdown excels at explaining technical ideas in a style that is
efficient and orderly, yet relaxed and conversational.
The book includes a disk with a working radiosity rendering program, Helios,
C++ source code, and a few sample scenes. There's also a program to convert
from the AutoCAD .DXF file format to the text format used by Helios. The
program isn't fancy but it works. (An upgrade with an improved user interface
is available from the author.) In several places, Ashdown suggests ways to
optimize his implementation. His disk contains programs representing three
different ways of calculating form factors: hemicube, cubic tetrahedron, and
ray casting. The book also includes numerous ideas for extending the basic
radiosity program, such as adaptive meshing and texture mapping.
Perhaps best of all are the extensive references, including over 200
bibliographical entries. And if that doesn't satisfy you, you can download a
document containing over 700 radiosity references from the author's Web site
at http://www.ledalite.com/library/rrt.html.
Radiosity is a first-class effort, both as a reference work and as a source of
ready-to-wear code. Highly recommended.


Photo-Based 3D Graphics in C++


Image compositing is a deceptively simple idea that's used extensively in the
video and motion picture industries for producing special effects. It's what
lets the Enterprise fly through a star field and makes dinosaur clones rattle
around a kitchen.
Photo-Based 3D Graphics in C++ documents the algorithms and development of a
program called Image Compositor's Toolkit (ICT), a Windows application for
creating scenes composed of multiple cutout images that can be rotated,
scaled, and positioned over a background image. The program combines these
images using alpha blending, which uses a separate alpha-channel image to
control the proportion of background versus cutout in the combined image.
There are provisions for animating each of these effects. The C++ source code
for this application is included on the book's diskette.
Photo is organized in three major sections, plus appendices. The
"Fundamentals" section presents an overview of object-oriented design and C++,
color representations, Windows bitmap images, 3-D coordinates and transforms,
and the classes and file formats used in the book's code.
The next section, "Image Compositing Tools," discusses creating cutouts
(irregular polygonal portions clipped from an image), warping (rotating and
scaling the cutouts), alpha blending, hidden-surface removal, and animating.
Again, the files and classes to implement these are presented.
The final section, "Applications," has two demonstration scenes for the book's
application program, though the data files and images for the demonstrations
aren't included. Appendices include a description of setting up and using the
ICT program (which runs under Windows 3.1 or Windows 95), an index to ICT
functions and classes and the files that contain them, and a bibliography.
Wittenburg tries to cover so much territory that none of it has much depth.
The C++ and OOD sections are barely introductions, as is the section on using
the Borland IDE. The chapter on image warping, which is essentially about
texture mapping, mentions aliasing as a potential problem; but rather than
deal with it in general terms, Wittenburg restricts his discussion to the
blockiness caused by enlarging an image. The morphing technique Wittenburg
describes may be "morphing" in a strict technical sense, but it isn't the
effect that most of us associate with the word.
The book's diskette contains source code and Borland project files for
building the ICT application--there's no executable on the diskette. The
program compiled under BC 4.51 with only a few warnings. ICT uses Borland's
OWL for interface code. Visual C++/MFC source is also supplied, but I didn't
test it.
ICT works pretty much as advertised. Using the ICT, you can display color
images (only 24-bit .BMP images are supported), create cutouts and
alpha-channel images for compositing, and experiment with transformations. To
create composite images you have to create a scene file--a text file that
describes the different images to be pulled together and the transforms to
apply to them--but ICT can't generate scene files; it can only read, preview,
and render them. The diskette contains a second program called "Scene File
Maker" which automates the creation of ICT scene files. Oddly, this program is
written in Visual Basic instead of C++. An executable is included on the
diskette.
Image compositing, particularly applied to animation/video, is a fascinating
and rarely encountered topic that could use a solid, down-to-earth, hands-on
treatment. While it's good as far as it goes, Photo-Based 3D Graphics in C++
should not be considered more than an introduction.
Radiosity: A Programmer's Perspective
Ian Ashdown
John Wiley & Sons, 1994
$59.95
ISBN 0-471-30488-3
Photo-Based 3D Graphics in C++
Tim Wittenburg
John Wiley & Sons, 1995 
$44.95
ISBN 0-471-04972-7







































































SWAINE'S FLAMES


The Madness of King Bill


The following communication appeared in my e-mail inbox shortly after
President Clinton signed into law the Telecommunications Act of 1996,
including the provisions known as the "Communications Decency Act." The
message was signed "Anonymous," and both the signature and the style suggest
that it was written by the anonymous author of Primary Colors, the steamy
roman à clef about Clinton's presidential campaign in 1992. I have my doubts,
but the cynicism sounds right, and that last line does have the ring of truth.
I was just leaving the press conference where I'd been scooping up grounders
from the press corps on the firings at the White House travel office when
somebody pulled on my sleeve. It was spin doctor Hi Bidder, and he was
sweating like an Arkansas pig.
"Conference over at the palace, Corbett," he muttered. "Rush job."
Figuring shrewdly that he wasn't giving me a tip on an employment opportunity
with the Limbaugh organization, I got a move on over to the Oval Office.
The O.O. was lousy with spin doctors when I arrived. Before I had a chance to
bow to the Prez, spinner Steve Gherkin was filling me in on the latest
firestorm.
"It's this Communications Decency Act, Corbett. They're saying that the CDA is
a direct frontal attack on freedom of speech in America. They say the Prez has
sold them out."
"Who's this 'they'?" I probed.
"Mostly the Netniks," Hi put in. "They're mobilized. They're calling for the
Prez's head. Some nerd named Dave Whiner is putting together a letter-writing
campaign, Netnik-style."
"Which is what, exactly?"
"They spend hours designing clever Web pages and then show them to one
another."
We all had a chuckle at that. The Prez hee-hawed until there were tears in his
eyes.
"You had me worried for a minute there, Hi," I said. "No press interest,
then?"
"No, so far, the scorps don't seem to have caught on."
Gherkin harrumphed. "There is some online reporter claiming that the President
must be insane, perhaps the result of early drug use or an STD."
Hi and I glared at Gherkin and I shot a quick look at the Prez, but he was
staring out the window, a beatific smile on his face, and didn't seem to have
heard.
"I said press, not some crackpot Netnik. Is anybody with any real weight
speaking against the CDA?"
Gherkin shrugged. "Speaker Gingivitis has come out, you should excuse the
expression, against it. That enough weight for you?"
"After delivering the House in overwhelming support of it."
"Well, sure." He looked annoyed. "Look, Corbett, the word is out on the
Internet that the CDA will stifle political speech, that it will make the Net
the most censored medium ever. That it's plainly unconstitutional."
I snorted. "If it's unconstitutional, it's not our problem, Gherkin. That's
the Supreme Court's lookout." I glanced at the Prez, but he was busy playing
with the First Cat. "Guys," I told them, "there's no crisis here. This is no
Travelgate. Nobody's honked off except a few propellerheads. The press hasn't
even noticed. And if the Prez did, in fact, sign a bill that effectively
repeals the First Amendment, it just gives us better road position in the
election, and the Court will rule it unconstitutional, so no harm done."
"Yes, the Court has been a stalwart defender of the Bill of Rights lately."
Gherkin's snide remark annoyed me, as everything he says does. Driving home, I
looked the situation over again, but I still couldn't see how it could play
out badly for us. By the time I got home, I was sure nothing could go wrong.
Back at the townhouse, I turned on CNN while heating up dinner. The usual
stuff. Talking heads handicapping the presidential race. Entrail reading over
the latest twitch in the market. Microsoft rumored to be on the verge of
acquiring Switzerland.
And up in Boston somebody was dumping tea into the harbor.
Anonymous's premise, that only a crazy person could sign a bill like this into
law, is unfortunately incorrect. The President and members of the House and
Senate just don't think free speech is very important. The mainstream press
apparently agrees, considering lower cable rates to be the real story in the
Telecommunications Act of 1996, rather than this effort by our elected
representatives to repeal freedom of speech. There is this consolation: When
this act is overturned, there will be no doubt who the defenders of liberty
are, and no doubt about the political power of the medium they tried to
silence.
Michael Swaine, editor-at-large
mswaine@cruzio.com


OF INTEREST
Great Circle from Geodesic Systems is a memory manager for Visual C++ 4.0. The
tool automatically eliminates memory bugs by preventing memory leaks and
premature frees. The memory manager is available for Windows 3.1/95/NT and Sun
OS 4.1x, Solaris 2.x, and other UNIX platforms. In addition to Visual C++ 4.0,
the tool supports most other C/C++ compilers, including Borland C++. It sells
for $495.00.
Geodesic Systems
4745 N. Ravenswood Avenue, Suite 11
Chicago, IL 60640
312-728-7196
info@geodesic.com
Aspect Software Engineering has announced iBasic, a server-side scripting
language that's compatible with HTML and Visual Basic. iBasic executes on Web
servers and provides the capability to respond to multiple browser requests
simultaneously by running VBA/HTML scripts in a multithreaded environment. In
a typical scenario, VBScript executes on the client browser, while iBasic
scripts execute on the server. iBasic is licensed at $99.00 per server.
Aspect Software Engineering
2800 Woodlawn Drive
Honolulu, HI 96822
808-539-3781
http://www.aspectse.com
SourceCraft has announced NetCraft, a browser/server tool that produces
standard Java code, and ObjectCraft, a client/server development environment
that produces standard C++ code using the same visual programming environment.
NetCraft supports an N-tier client/server architecture in which Web browsers replace traditional Windows clients. NetCraft sells for $995.00 per copy.
ObjectCraft, which lets you deploy two-tier and three-tier client/server
applications, generates standard Visual C++ source code, ensuring application
portability, scalability, and full OLE and OCX compliance. ObjectCraft
provides access to information stored in both relational and object-oriented
databases. ObjectCraft includes a methods editor, menu builder, forms builder,
browser, and code generator and debugger for C++. It sells for $1995.00. 
SourceCraft, Inc.
20 Mall Road
Burlington, MA 01803-4129
617-221-5665
http://www.sourcecraft.com
Eschalon Development has released a family of tools for Delphi 2.0. The
Eschalon Power Controls kit is a set of visual components that enable Windows
3.1 to Windows 95 porting. The Eschalon Power Secure toolkit delivers RC4,
DES, and Triple DES encryption to Delphi 2.0 apps. Further security features
include protection of EXEs against tampering, key generation, message digest,
and a true random-number generator. 
Eschalon Development
24-2979 Panorama Drive
Coquitlam, BC 
Canada V3E 2W8 
604-945-3198
http://www.eschalon.com
VisualCoder++ from Centauri Software is a C/C++ code processor for Windows
3.1/95/NT and OS/2 Warp. VisualCoder++ lets you build code by dragging and
dropping onto a control flow line, automatically formatting and indenting
statements in the process. Function names, parameters, local variables, and
member variables are selectable from drop-down lists. The tool sells for
$99.00.
Centauri Software
4140 Oceanside Boulevard, Suite 159-103
Oceanside, CA 92056
619-630-8055
centauris@aol.com
The recently announced Borland C++ Development Suite includes Borland C++ 5.0,
CodeGuard 32/16, PVCS Version Manager, InstallShield Express, and
the AppAccelerator Just-in-Time compiler for Java in one integrated package.
The package includes a native 32-bit-hosted development environment, which
targets Windows 95/NT/3.1 and DOS. You can move between operating systems
using TargetExpert, which lets you change the target platform with the click
of a mouse. Borland C++ 5.0 also includes a multitarget project manager that
lets developers build 32- and 16-bit applications in parallel. 
Borland C++ 5.0 includes ObjectWindows Library (OWL) 5.0, which encapsulates many Windows APIs, including WinSock (TCP/IP), MAPI (e-mail), and
others. OWL 5.0 is portable to other compilers, including Microsoft Visual
C++. The Suite contains a complete implementation of the ANSI/ISO C++ language
draft specification. This includes support for namespaces and C++ keywords
such as bool, explicit, mutable, and typename. Borland C++ 5.0 also contains
the Standard C++ Library, licensed from Rogue Wave. The Standard C++ Library
consists of C++ classes such as string, complex, and numeric limits, and the
Standard Template Library (STL).
For Internet development, Borland C++ 5.0 includes development tools for Java,
including Sun's Java Development Kit (JDK). The JDK is integrated with
Borland's IDE, and lets you develop cross-platform code, which can run on
operating systems from Windows 95 and Sun Solaris to Macintosh and others. The
Borland C++ IDE features complete support for developing Java applets and
applications. This includes project manager support; access to Java compiler
and debugger options through the IDE's multipage dialog boxes; and color
syntax highlighting for Java source code. In addition, Borland C++ 5.0
includes the Borland Debugger for Java and an AppExpert for Java-specific
applications and applets.
In addition to OWL 5.0, Borland C++ 5.0 supports MFC 3.2 and 4.0. With Borland
C++ 5.0, you can build MFC applications without having to manually modify MFC
library source code and move existing MFC applications over to Borland C++.
When MFC compilation support is enabled, the compiler uses the necessary Microsoft C++ language extensions, along with an automated search-and-replace mechanism that reconciles the ANSI/ISO C++ bool keyword, to compile the MFC library source code. Since MFC support is offered directly, it is not
necessary to make individual compiler and linker settings in the IDE.
Finally, Borland C++ 5.0 introduces the native 32-bit ObjectScripting IDE, a
programmable and flexible environment that lets you modify and configure
Borland's IDE. ObjectScripting comprises cScript, an object-oriented scripting language similar to C++, and the IDE Class Library. The IDE Class Library contains 23 classes
with over 600 methods and properties giving you control of the major IDE
subsystems, including the editor, build manager, project manager, and
debugger.
Borland International
100 Borland Way
Scotts Valley, CA 95066
408-431-1000 
http://www.borland.com


EDITORIAL


Tick, Tick, Tick


Gee, it's amazing how time flies when you're having fun. Why, it seems like
just yesterday that Java and visual tools were being touted as virtual Advil
for every programmer's headache...uh, sorry, that really was just yesterday.
In any event, time waits for no programmer, as evidenced by the coming "year
2000" crisis. If your head has been buried in microcomputer silicon, you
probably haven't had to worry about the very real problem looming on the
heavy-iron horizon. At the heart of the problem are the two-digit dates
hardcoded into billions of lines of code and data files. (Rumor has it that
DDJ's own Al Stevens is even responsible for some of this code.)
Although we take disk storage for granted today, programmers used to represent
years in terms of two digits in an effort to save valuable disk space; the
year "1966," for instance, was simply referenced as "66". That was fine in the
'50s and '60s when the year 2000 was as remote as a gigabyte hard disk that
would fit in the palm of your hand. Programmers didn't worry much about
problems 40 or 50 years away (do you today?) because the task at hand was
daunting enough. Nor did they expect that much of the software they were
writing would be in use at the turn of the century. And even if some code was
hanging around, it would be someone else's problem.
Then along came the 1970s, the rise in housing prices, and the advent of
30-year mortgages. What banks discovered was that when scheduling loans into
the year "01" (for "2001"), the software treated the date as "1901." Insurance
companies and other organizations involved in long-term projections began
running into the same problem--dates used in programs were rolling back, not
over, at midnight, December 31, 1999. And since dates are used more than any
other type of data, just thinking about the morning of January 1, 2000 (okay,
January 2, considering the holiday) caused a whole lot of shaking in corporate
accounting departments. 
Identifying the "millennium" or "year 2000" problem is straightforward and the
deadline for rectifying it is fixed. Solving the problem, however, will be a
monumental task that, at its lowest level, amounts to inspecting every line of
code in every program or data file that might be affected. Experts estimate
that for many organizations, that could be up to 100 million lines of code at
a cost of $0.50 to $1.00 per line. According to year-2000 guru Peter de Jager,
we're "talking about a global effort that will cost between 300 and 600
billion U.S. dollars... [and] an effort that will require 2 million skilled
programmers."
As you might expect, an industry has grown up around the year-2000 problem
with the requisite trade shows ("Year 2000 Solutions: Resources and Strategies
for Managing the Year 2000 Date Conversion Problem," which met in March 1996),
publications (the "Tick, Tick, Tick" newsletter, edited by William Goodwin),
and Web sites (http://www.year2000.com). 
Likewise, software companies such as Computer Associates, Micro Focus,
Viasoft, and others have come forth with tools and services that address the
problem. One component of CA's approach, for instance, involves "CA-Impact/2000," a parser-based "impact-analysis" tool applied to source code. In a similar vein, Micro Focus's "Challenge 2000" program also includes
tools for analysis and reengineering. Some companies have opted to rewrite
their systems from scratch using object-oriented tools and techniques. Others
have moved to PC-based client/server systems, away from mainframes altogether.
Still, there is an upside. Cobol programmers can count on continued
employment, and there are some great parties being planned. Times Square
hotels offering Millennium Eve blowouts are already booked up, Great Britain
is putting together a $300 million Year 2000 exposition, the Pope has kicked
off the Jubilee of the Year 2000, Mazda is launching a new Millennia
automobile, Elizabeth Arden has an upcoming line of Millennium skin-care
products, and Farberware will soon start shipping its Millennium pots and
pans. And you can expect at least one forward-looking politician in the coming
election season to promise legislation officially rolling back the clock to
1900.
But once the 1993 vintage Dom Perignon (which will be distributed December 31,
1999) is gone, governments and corporations that haven't planned for year 2000
conversion will wake up to a bigger headache than a post-party hangover. In
many cases, it may already be too late for smoothly implementing the
conversion. Let's see, from the cover date of this issue, there are a little more than 1,300 shopping days left. Tick, tick, tick....
Jonathan Erickson, editor-in-chief


LETTERS


Amiga Fans


Dear DDJ,
Thank you for mentioning the Amiga in your article "Dr. Dobb's Journal
Excellence in Programming Awards" (DDJ, March 1996). When I saw the name of my
computer-of-choice next to the names of those big-named systems, I was beside
myself. Those were ten letters that you didn't have to put in your article (well, 15, if you count the commas, spaces, and the hyphen), but you put them in anyway.
I know it's silly to be happy about something like that, but after the
Commodore bankruptcy, I thought the Amiga might disappear from the face of the
Earth. Hopefully, Amiga Technologies will be able to resurrect this fine
machine.
Maybe some day articles on AmigaDOS message passing will sit next to articles
on Windows message passing in your magazine. 
Michael S. Sage
Amherst, Ohio 
aa5964@freenet.lorain.oberlin.edu 


Quincy 96


Dear DDJ,
Al Stevens' "C Programming" column about Quincy 96 (DDJ, February 1996) sounds
like a worthwhile project and I hope he continues extending it as mentioned to
include some AppWizard/ClassWizard functionality. Example code is always of
great help to developers working on development environments for their tools.
Al mentions that "no one has licensed or ported MFC to GNU C++" (page 126). I
should comment that my company offers a library called WM_MOTIF that allows
developers to use MFC applications on the more popular UNIX environments. Ads
for the library have appeared several times in DDJ over the past year, so you
might have read about it. As part of that product, we provide the necessary
changes to use MFC 2.52 and 3.1 with g++ (4.0 by February 1996). We do not
include the Microsoft source code, only our changes applied automatically via
a patch program. I believe two other companies also offer support for MFC
under g++, with Microsoft's blessing. Information about our WM_MOTIF library
and the WMMPATCH utility is available from our Web site (http://www.uno.com).
When you extend your IDE further, you might consider porting it to Linux using
a "Lite" version of WM_MOTIF. Requests for IDE functionality for gcc/g++
appear frequently on the Linux USENET groups (particularly
comp.os.linux.development.apps).
Jesus Alvarez
jalvarez@uno.com 


Majority Rule


Dear DDJ,
Thomas Nielsen's letter on majority rule (DDJ, February 1996) prompts me to
recommend the book Beyond Numeracy: An Uncommon Dictionary of Mathematics, by
John Allen Paulos (Random House, 1991).
Beyond Numeracy contains a chapter addressing which voting system is the
fairest, giving an example of 5 candidates and 55 voters. It turns out that
there is a voting system that suits each candidate; that is, choose the voting
system and you have chosen the candidate. This is not an academic exercise
because here in Italy the political parties have been fighting for years to
get their own voting system used, each of course wanting the one that gives
them the most points. The other chapters in Paulos' book will be of interest
to programmers, too.
Owen F. Ransen
rans001@IT.net 


The Future of Programming, Visual and Otherwise


Dear DDJ,
I loved your March 1996 issue. Keep up the good work. However, I disagree with
a couple of sentences in Al Stevens' "C Programming" column. I know it is an
opinion column, and he is entitled to his beliefs, but I object to serious
journals, such as DDJ, spreading disinformation. Al writes: 
Will such 3-D visual programming environments eliminate the need for
programmers to understand source code as we know it today? Not likely. C++ and
Visual Basic have not eliminated assembly language. Or machine language, for
that matter. In the future, when all else fails, you will look at an
occasional memory dump, stack pointer, and interrupt vector, just as you do
today.
While I do remember looking at core dumps--back when they were from little
donuts of magnetic material--I have not looked at one in at least 20 years. I
can agree that for some small set of serious gurus, looking at the entrails of
our programs may occasionally be informative. I believe that for most
professionals using modern tools, there is no need.
More importantly, I think journals such as DDJ have a responsibility to
educate developers that hand-coded assembly programming has no place in modern
systems. RISC systems, and RISC-like CPUs such as the P6, rely upon
intelligent optimizing compilers to keep their pipelines full, their registers
renamable, and the performance at the level claimed.
It is simply impossible to optimize more than a handful of assembly
instructions. Yet the PentiumPro (P6) has a 14-stage pipeline and three
execution units. To have all of these parts working at rated speeds takes at
least 17 stages of continuous work. Achieving the advertised performance takes
a continuous stream of hundreds of optimized instructions.
Computers (compilers) are good at repetitive calculations. Humans would go
crazy from boredom before optimizing a meaningful amount of code. The next
generation of CPUs from Intel and HP will use VLIW technology. This will
further strengthen the demand for optimizing compilers. Hand-coded assembly
language is dead for all but embedded systems. DDJ's readers should be getting
this as a consistent message.
By the way, it will be interesting to see how interpreted languages such as
Java, Smalltalk, and Visual Basic work on next-generation CPUs.
Pat Farrell 
Fairfax, Virginia
http://www.isse.gmu.edu/students/pfarrell
Al responds: Core dumps? I haven't looked at a core dump in years either, but
I look at RAM dumps almost daily. I wonder how fortunate I would think myself
to have been spared the rigors of assembly language, memory dumps, register
contents, and such for 20 years. You might view it as a blessing; I would take
it as a hindrance. Most contemporary debuggers allow us to view registers,
memory buffers, and machine-language disassemblies. Those debuggers are in
the toolsets of virtually all programmers, not just for the so-called gurus.
Those features are there because we need them. That need is not likely to go
away any time soon.


Plumbers and Programmers


Dear DDJ,

In your April 1996 issue, Al Stevens talks about getting programmers out of
the closet so they can know when the insulation is not installed because the
wiring's not done because the plumbers have not finished. His suggestion that
programmers work in a virtual environment to make this possible ignores some
more pressing problems in software development. The team must first learn why
the wiring must be done before the insulation.
Having changed employers more frequently than I care to admit, it has become
clear that each business regards its product as unique. The theorists in our
business are studying "patterns" as the newest trend in software development.
The patterns have existed all along; it was EGO that refused to see them.
chermann@cris.com


Hashing it Out


Dear DDJ,
Andrew Binstock's article "Hashing Rehashed" (DDJ, April 1996) was interesting
and timely, as I just finished writing a portable hash-table class for a
project last month. Along with the square-root article and "Swaine's Flames,"
this was quite the issue!
While inspecting the PJW and Elf algorithms to see which one might be the most
appropriate for my class, I kept noticing something interesting. If sizeof(int)==sizeof(long)==4, the algorithms are identical. Just now I ran them both
through UNIX's /usr/dict/web2 and web2a, and indeed they always return the
same results on my 32-bit system.
Now, off to determine whether an expandable closed table would be appropriate
for this class....
Scott Hess 
Burnsville, Minnesota
shess@winternet.com
Dear DDJ,
I read with much interest the article "Hashing Rehashed," by Andrew Binstock
(DDJ, April 1996). Andrew presents a decent article, however, I have an
additional recommendation.
First of all, searching through a linked list can be, at worst, O(n) in
efficiency. This is due to the fact that it is possible, with an inefficient
algorithm, to produce a "hit" on the same hash value numerous times. Instead
of using a linked list to store the data off of the hash table, I suggest
using a binary tree. That way, even if the hash function is poor, the binary
tree will still return O(log2 n) access to the stored data. This is relatively
easy to accomplish. Instead of the array or table pointing to the head of the
linked list, change the table to point to the root of the tree. This, in addition to compensating for an imperfect hash function, will produce speedier worst-case results.
Brigham W. Thorp
West Simsbury, Connecticut 
Brigg999@aol.com


Parsing Parsers


Dear DDJ,
Bravo for Thor Mirchandani's article "Building Parsers with Leopurd" (DDJ,
March 1996). I have seen too many software projects in which the use of LEX
and YACC resulted in size, speed, and compatibility problems. Top-down parsers
of the type that Thor describes are very simple to program and maintain. They
can easily be an order of magnitude smaller and faster, because they don't
have to wander around in all those big tables, and they don't have to build
parse trees.
I believe there is one mental block that prevents many programmers from
programming and using top-down parsers. It is the idea of one-token
look-ahead. I think look-ahead is counter intuitive because it means you scan
in the next token immediately after accepting the current one, rather than
immediately before. Many algorithms can be simplified by using look-ahead, not
just parsers.
Mike Dunlavey
Needham, Massachusetts 
miked@mga.com


Managing Dynamic Objects in C++


The Localized Ownership pattern language makes it possible to manage dynamic object lifetimes explicitly




Tom Cargill


Tom is an independent consultant, the author of C++ Programming Style
(Addison-Wesley, 1992), and a columnist for C++ Report. This article is based
on his PLoP '95 paper published in Pattern Languages of Program Design 2
(Addison-Wesley, 1996). Tom can be contacted at cargill@csn.org.


The lifetime of a dynamic object in C++--one allocated from heap memory (the
free-store)--is managed explicitly by the program. The program must guarantee
that each dynamic object is deleted when it is no longer needed, and certainly
before it becomes garbage. (There is no garbage collection in standard C++,
and few programs can afford to produce garbage.) For each dynamic allocation,
a policy that determines the object's lifetime must be found and implemented.
Localized Ownership is a pattern language that tackles the management of
dynamic object lifetimes in C++. It forms a sequence of patterns that address
a range of design constraints. Early patterns in the sequence emphasize
simplicity and locality of the responsible code. Later patterns offer greater
flexibility at the expense of a more complex implementation that is dispersed
more widely.
The simplest policies for managing a dynamic object's lifetime are those that
localize the work within a single component, such as a function, object, or
class. A simple solution that suffices is ideal. Unfortunately, attempts to
localize lifetime policies are confounded by rich object architectures that
require their objects to play multiple roles in cooperation with many
collaborators. What ownership policies are applicable in these more complex
contexts?


Terminology


Lifetime. The lifetime of an object in C++ is the interval of time it exists
by occupying memory. An object is created from one of three "storage classes":
automatic (local), static, and dynamic. The lifetimes of automatic and static
objects are controlled implicitly by the execution of the program. The
lifetime of an automatic object ends when execution exits its block. The
lifetime of a static object continues until the end of the execution of the
entire program. However, the lifetime of a dynamic object continues until the
program explicitly destroys that object.
Creator. A dynamic object is created by the execution of a new expression. The
entity that executes the new expression is the dynamic object's creator. The
creator may be a (member) function, object, or class. The creator cannot be
inferred from the source code alone. In general, a new expression that creates
a dynamic object is executed by a member function; that member function
executes on behalf of an object; and that object is an instance of a class.
Therefore, the creator of the dynamic object might be deemed to be the
function, object, or class. In the following code, the creator could be the
member function f, the object executing A::f, or the class A.
void A::f()
{
    // ...
    new Resource;
    // ...
}
Although the creator is determined by the intent of the programmer, the
language constrains the choice. If the new expression is executed by a static
member function, the creator is either the function or its class. (There is no
object.) If the new expression is executed by a nonmember function, only the
function may be the creator. (There is neither object nor class.)
Owner. The owner of a dynamic object is the function, object, or class that
bears the responsibility for ending the lifetime of the dynamic object. Upon
creation of a dynamic object, its creator becomes the owner. A dynamic object
may acquire an owner other than its creator. A dynamic object may have more
than one owner at a time.
Ownership does not imply exclusive access to the dynamic object. Other parts
of the program may legitimately access the dynamic object, but that access
must be consistent with the owner's policy. In particular, other parts of the
program must guarantee not to attempt any access after the owner has deleted
the object.


Localized Ownership


The Localized Ownership pattern language contains three primary patterns, of
which the first is divided into three sub-cases. The three patterns are as
follows:
1. Creator As Sole Owner (Function As Sole Owner, Object As Sole Owner, and Class As Sole Owner).
2. Sequence of Owners.
3. Shared Ownership.
The patterns are ordered by decreasing localization of the ownership
responsibility. Earlier patterns localize the responsibility narrowly; later
patterns distribute the responsibility more widely. Localized responsibility
generally simplifies design and implementation. Therefore, use the lowest
numbered pattern that suffices.


Resources Other than Dynamic Objects


Although Localized Ownership patterns might be used to manage other resources
(such as files or database locks), they are presented specifically in terms of
dynamic objects. There are two reasons for this. First, the focus on dynamic
objects makes the discussion more concrete. Second, for most C++ programmers,
the effort expended on managing dynamic objects dominates the effort expended
on managing other resources.


Pattern 1: Creator As Sole Owner


Context. The creator of a dynamic object is in a position to fully determine
the object's lifetime. A narrow, specific purpose for the dynamic object
suggests that its creator may be able to control its lifetime. A narrow
purpose does not imply that the object is short-lived or that the creator has
exclusive access to the object.
Solution. Upon creation of the dynamic object, its creator becomes the owner.
Ownership responsibility is never transferred; the creator must eventually
dispose of the dynamic object. Other parts of the system must access the
dynamic object according to the owner's policy. When the identity of the
dynamic object passes through an interface (such as a pointer or reference as
a function parameter or return value), constraints on the object's lifetime
must be part of that interface.
The three types of creator result in the following three specializations of
this pattern.



Pattern 1.1: Function As Sole Owner


Context. An object with access restricted to the function that creates it, and
to other functions whose execution is nested within that function's execution,
is a candidate for sole ownership by the function in which it is created. The
lifetime of the object can be established without considering the lifetimes of
any other objects.
Solution. When the sole owner is a function, that function must eventually
delete the object. The delete expression is coded explicitly in the function.
If the dynamic object is required for only the current invocation of the
function, the delete occurs before the function returns to its caller. If the
dynamic object's lifetime spans many invocations, the function may retain the
object from call to call using a local static pointer. Eventually, the
function itself executes the delete when the object is no longer needed.
Examples. The creator function creates and deletes during a single invocation;
see Listing One.
A function that holds a dynamic buffer from one call to the next is shown in
Listing Two. On each call, the function determines if the current buffer is
large enough, allocating a larger buffer when needed.


Pattern 1.2: Object As Sole Owner


Context. The lifetime of the dynamic object cannot be tied to a single
function, but its lifetime can be bounded by the lifetime of the object that
creates it. Creation occurs in a nonstatic member function.
Solution. If the sole owner is an object, then the owner acquired the dynamic
object by executing one of its member functions and must delete the dynamic
object before the end of its own lifetime. The last opportunity the owner
object has to delete the owned object is during the execution of the owner's
destructor. 
Examples. Stroustrup's "Resource Acquisition Is Initialization" is an example of Object As Sole Owner (see The C++ Programming Language, Second Edition, Addison-Wesley, 1991).
The "Cheshire Cat" technique (a term coined by John Carolan in "Constructing
Bullet-Proof Classes," Proceedings C++ At Work, SIGS Publications, 1989)
reduces unwanted, indirect header-file dependencies by moving the private
members of a class into an auxiliary object, to which the class delegates its
operations. Typically, the auxiliary structure is dynamically allocated in the
constructor and deallocated in the destructor; see Listing Three.
If a communications Connection object encounters an error, it opens a Window
to display pertinent information. The Connection object lazily defers creation
of the Window until an error occurs. If instructed to clearErrors(), the
Connection disposes of the Window if it has one. If the Connection still holds
a Window at the end of its own lifetime, the destructor performs the delete;
see Listing Four.


Pattern 1.3: Class As Sole Owner


Context. Even though there is no single function or object to serve as the
owner, the ownership responsibility can be localized to a single class. The
function that creates the object is a static or nonstatic member function of
the class.
Solution. All parts of the class collaborate to determine the lifetime of the
owned object. The implementation of the ownership responsibility is
distributed across the member functions as they execute on behalf of all
objects of that class. Static member functions may also participate.
Example. A Query object provides end-user database services. Query objects are
created and destroyed arbitrarily. All Query objects share a single DataBase
server. To conserve resources, the DataBase object should exist only when
there is at least one Query object. Collectively, the constructor and
destructor of Query use static data members to implement this policy on behalf
of the class; see Listing Five.


Pattern 2: Sequence of Owners


Context. No suitable creator that can assume on-going ownership responsibility
can be found. For example, a factory creates objects on behalf of others. Once
the object has been successfully created, the factory's ownership
responsibility should end, but the created object's lifetime must not.
Solution. Ownership may be transferred from the creator to another owner.
Ownership may then be further transferred any number of times. At each point
of transfer, the outgoing owner relinquishes its ownership obligation,
whereupon the incoming owner assumes it. The transfer of ownership must be
considered as an explicit part of the interface between the two. Although the
owner may change many times, at any particular time, ownership rests with
exactly one owner.
Examples. The code in Listing Six shows ownership transfer from a factory to a
function. The transfer could as easily be to an object or a class as the
incoming owner.
The function in Listing Seven transfers ownership to a local auto_ptr object.
The auto_ptr template is part of the draft C++ standard library (ANSI Document
X3J16/95-0091, "Working Paper for Draft Proposed International Standard for
Information Systems Programming Language C++," April 1995). The only purpose
of an auto_ptr object is to assume ownership of a dynamic object. The auto_ptr
destructor deletes the object. The advantage of transferring ownership to an
automatic object is that its destructor will execute under almost all
control-flow scenarios that leave the block.


Pattern 3: Shared Ownership


Context. When a dynamic object is shared widely by diverse components, it may
be impossible to identify a single owner or even a sequence of owners. An
object that defies simple characterization and management of its lifetime is
likely to be long-lived. 
Solution. Allow arbitrary owners to declare unilateral interest in a dynamic
object. The lifetime of the dynamic object continues until all of the owners
have relinquished their interest in it. Constraints on the owners and the
owned object vary depending on the style of implementation.
Examples. A conventional reference-counted implementation of a string class
(see The C++ Programming Language) uses Shared Ownership of the representation
object, which in turn is Object As Sole Owner of the underlying array of
characters. Usually, the representation class is specifically designed to
carry a reference count to support its shared ownership; that is, the
implementation of the ownership intrudes into the class of the owned object.
Ownership is intentionally restricted to the string objects.
Reference counting "smart pointers" (see Advanced C++ Programming Styles and
Idioms, by James Coplien, Addison-Wesley, 1992) also use Shared Ownership. As
with a reference-counted string class, the mechanism usually intrudes into the
owned object and restricts ownership to the smart pointer objects.
The following example demonstrates that the intrusion on the owned object and
constraints on the owners are not intrinsically part of the pattern.
Shared ownership information with respect to arbitrary objects is recorded in
a global instance of the class Shared. Shared::adopt() records an arbitrary
owner declaring an interest in an arbitrary object. Shared::disown() records
an owner relinquishing that interest. A true result, from disown(), informs
the departing owner that no other owners remain, and the dynamic object should
be deleted. Effectively, Shared provides reference counting that constrains
neither owner nor owned object; see Listing Eight.
A Journal object logs messages to a Window. The current Window is established
in its constructor, but may be changed by the bind member function. A Journal
object assumes shared ownership of its Window by registering its ownership
with clerk and deleting any unowned Window. The Window class does not
participate in the implementation of this ownership mechanism. The other
owners of Window may be arbitrary functions, objects, and classes. Journal
requires only that they observe the protocol with respect to clerk; see
Listing Nine.


Related Topics


Dangling Pointers. Any discussion of garbage avoidance often leads to the
separate, but related, topic of dangling pointers. A pointer dangles when use
of its value becomes invalid at the end of the lifetime of the object that it
addresses. Use of a dangling pointer is undefined behavior, which can result
in a program crash or worse. A pointer that refers to a dynamic object dangles
if the dynamic object is deleted (by its owner). However, dangling pointers
can also arise without the use of dynamic allocation: A pointer to an
automatic object dangles if the pointer outlives the object. A dangling
pointer must never be dereferenced.
Reference Counting versus Garbage Collection. Reference counting (mentioned
with respect to Shared Ownership) is easily confused with garbage collection.
However, the two mechanisms are not equivalent. There are circumstances under
which reference counting reclaims memory that garbage collection misses, and
vice versa. By tolerating a dangling pointer, reference counting that depends
on an explicit decrement operation can recover a dynamic object that garbage
collection must retain. On the other hand, a cycle of mutual references can
cause reference counting to overlook a set of objects that garbage collection
would reclaim.
Exception Handling. An exception handling (EH) mechanism for C++ (using
termination semantics) has been proposed in the draft standard, and
implementations are becoming available. Clearly, a nonlocal goto mechanism
introduces further analysis in determining that a program is coded correctly,
including any use of dynamic memory. Unfortunately, to date, the C++ community
has little experience using EH. It appears that object state and resource
management (including dynamic memory) are the two key problems to address when
using EH and that resource management is the easier of the two.


Acknowledgment



My thanks to Paul Jensen, David Leserman, Doug Schmidt, John Vlissides, and
the members of my PLoP '95 workshop for their comments.

Listing One
void some::SoleOwner()
{
 . . .
 Resource *res = new Resource;
 . . .
 delete res;
 . . .
}

Listing Two
void
Transport::xmit(const Packet *p)
{
 static char *buffer = 0;
 static int allocated = 0;
 int sz = formattedSize(p);
 if( sz > allocated ){
 delete [] buffer;
 buffer = new char[sz];
 allocated = sz;
 }
 . . .
}

Listing Three
class Server {
public:
 Server();
 ~Server();
 . . .
private:
 class Auxiliary *aux;
};
Server::Server()
{
 aux = new Auxiliary;
}
Server::~Server()
{
 delete aux;
}

Listing Four
class Connection {
public:
 Connection();
 ~Connection();
 void clearErrors();
 . . .
private:
 Window *errWin;
 void error(Text);
 . . .
};

Connection::Connection()
{
 errWin = 0;
}
Connection::~Connection()
{
 clearErrors();
}
void
Connection::error(Text t)
{
 if( errWin == 0 )
 errWin = new Window;
 . . .
 errWin->write(t);
}
void Connection::clearErrors()
{
 delete errWin; // may be null
 errWin = 0;
}

Listing Five
class Query {
public:
 Query();
 ~Query();
 . . .
private:
 static DataBase *db;
 static int population;
 . . .
};
DataBase *Query::db = 0;
int Query::population = 0;
Query::Query()
{
 if( population == 0 )
 db = new DataBase;
 population += 1;
}
Query::~Query()
{
 population -= 1;
 if( population == 0 ){
 delete db;
 db = 0;
 }
}

Listing Six
Resource *Creator::factory()
{
 Resource *res = new Resource;
 if( res->valid() )
 return res; // transfer
 // retain
 delete res;
 return 0;
}
void Client::usesFactory()
{
 Resource *res =
 Creator::factory();
 if( res != 0 ){
 // transferred
 . . .
 delete res; // eventually
 }
}

Listing Seven
void Use_auto_ptr()
{
 Resource *res = new Resource;
 auto_ptr<Resource> owner(res);
 . . .
 // use res
 . . .
} // owner deletes

Listing Eight
template<class R>
class Shared {
public:
 void adopt(R*);
 bool disown(R*);
private:
 map<void*,int> owners;
};
template<class R>
void Shared<R>::adopt(R *r)
{
 if( owners.find(r) == owners.end() )
 owners[r] = 1;
 else
 owners[r] += 1;
}
template<class R>
bool Shared<R>::disown(R *r)
{
 assert( owners.find(r) != owners.end() );
 owners[r] -= 1;
 if( owners[r] > 0 )
 return false;
 owners.erase(r);
 return true;
}

Listing Nine
extern Shared<Window> clerk; // global ownership registry (Listing Eight)
class Journal {
public:
 void bind(Window*);
 void log(Message);
 Journal(Window*);
 ~Journal();
 . . .
private:

 Window *current;
 . . .
};
void Journal::bind(Window *w)
{
 clerk.adopt(w);
 if( clerk.disown(current) )
 delete current;
 current = w;
}
void Journal::log(Message m)
{
 current->display(render(m));
}
Journal::Journal(Window *initial)
{
 current = initial;
 clerk.adopt(current);
}
Journal::~Journal()
{
 if( clerk.disown(current) )
 delete current;
}







































STL Iterators


Start with iterators to use STL effectively




Dan Zigmond


Dan is a senior software engineer with Wink Communications. He can be reached
at dan.Zigmond@wink.com.


The Standard Template Library (STL) developed by Alexander Stepanov and Meng
Lee has been widely discussed since it became part of the proposed ANSI/ISO
C++ standard. With compiler vendors lagging behind standards committees,
however, many programmers are only now beginning to use STL in day-to-day
programming tasks.
In this article, I'll explore STL iterators. In future articles, I'll examine
STL generic algorithms and go inside the standard STL implementation from HP,
explaining some of the details of how generic algorithms and data structures
can be specialized using iterator tags and adapters. 


Iterators


Before discussing iterators, I'll take a step back and look at C pointers,
which form the basis of the iterator abstraction. 
When passing an array to a function in C, you usually pass a pointer. Using
this pointer (and some information about the size), you can do anything you
want with the array: Traverse it in order, pick out an arbitrary element in
the middle, copy it, and even destroy it. Example 1(a), for instance, is a
function that finds the first occurrence of a given character in an array of
characters using only pointers to the beginning and end of the array (assuming
you don't have strchr handy). 
Pointers are used as the interface to the data structure in find_char, which
in this case is an array of characters. C++ lets you generalize this function
to work with almost any sort of array by turning it into a template function.
All you have to do is substitute a template parameter for all occurrences of
the hard-coded type char; see Example 1(b). This generalized version of
find_char works on virtually any array because it depends only on your ability
to traverse an array using a pointer and compare the contents of the array
with the inequality operator. In other words, if you have an array of objects
of type T, find_in_array depends on your ability to traverse that array using
pointers of type T*. C guarantees this for C arrays.
The difficulty comes when you decide that arrays are not the perfect internal
representation for whatever you're trying to do, and you decide to move to,
say, a linked-list structure. A typical implementation might look like Example
1(c), which bears a resemblance to the original find_in_array, but the two are
obviously incompatible. For simplicity, list_nodes are implemented as a simple
struct, rather than a more complicated (and well encapsulated) class; other
sorts of implementations would require some minor changes to the new
find_in_list, but the structure would remain the same. 
There's something unsatisfying about find_in_array and find_in_list. At some
level, they are the same function; if you were to describe the algorithms in
plain English, you would probably use the same words for both. The only
difference is in the interface to the underlying data structure. Simple
pointers are the interface to C arrays, while pointers to list_node structures
are the interface to the linked lists. Pointers are incremented with the ++
operator, while the list nodes are incremented using the idiom node =
node->next. Pointers are dereferenced using the * operator, while the list
nodes are dereferenced by extracting the item member. All this seems natural
(almost all C and C++ linked lists work this way), yet completely arbitrary.
Why can't we access lists and arrays the same way?
Example 2 is a slightly more complicated way of defining list nodes. Using
this implementation, list_nodes act very much like C pointers, so you can now
rewrite find_in_list. It's interesting that this find_in_list is virtually
identical to the generalized find_in_array. The only difference is in the type
of the pointer-like objects used to do the iteration through the data
structure. In find_in_array, I use real pointers, while in find_in_list, I use
the pointer-like list_node class.
These type-only differences are exactly what templates are designed for. Just
as I generalized find_char to find_in_array by substituting the template
parameter T for the hard-coded type char, you can generalize both
find_in_array and find_in_list to a truly generic find function by
substituting a template parameter for the list_node class and T* pointers; see
Example 3. This raises the question: What is this new parameter (called I in
the example)? In STL, classes that meet the requirements of this parameter are
called "iterators"--objects that function more or less like pointers. Like
conventional pointers, iterators are used as the interface to a data structure
so that you can define functions like find that will work with a variety of
different data representations. Because the pointer semantics used by find
(forward iteration using ++, dereference using *, and comparison with !=) can
be implemented efficiently for both arrays and linked lists, the generalized
find function is as efficient as the more specialized find_in_array and
find_in_list; once you've defined the pointer-like list_node template class,
making a single find algorithm work for both arrays and lists costs nothing.
Functions like find that use iterators to encapsulate efficient algorithms
independent of the data representation are called "generic algorithms."


Using Iterators


Iterators make a lot more sense when taken in context. If you haven't examined
sample STL programs, I've included two that use iterators (and other parts of
the standard libraries) to solve a basic problem: counting the number of words
from standard input. The first program, wc (see Listing One), simply stores all
the words in a temporary vector, then counts them. (The program isn't very
smart. A better implementation would count them on the fly and not store them
at all.) The second program, uwc (see Listing Two), is a bit more
sophisticated; it stores all the words in a sorted set before counting them so
that duplicate words are discarded. 
The two programs look almost identical, even though they perform substantially
different functions and use completely distinct internal data representations.
This is because the interface to the copy algorithm is defined entirely with
iterators. In fact, copy can be defined in a very similar way to the generic
find; see Example 4. The only significant difference between copy and find is
that copy deals conceptually with two types of iterators--one for the sequence
it is copying from and the other for the sequence it is copying to. These two
sequences need not be the same; they can be, but that is just a special case.
To allow users to copy from, say, a linked list to a C array, you need to
allow the two types of iterators to be distinct.
In wc, you are copying from an istream_iterator to a vector. In uwc, you are
copying from an istream_iterator to a set. The two have very different
results: Copying into a vector in this way preserves the original word order,
while copying into a set removes duplicates and sorts them alphabetically. In
both of these cases, the call to copy looks the same because its interface is
defined entirely in terms of iterators. (The output line is also identical
because both vectors and sets define a size member function.) The ability to
abstract our algorithms from our data structures is the real power of the STL.
The C++ library recognizes five major types of iterators. The reason to
separate iterators into these types is that different algorithms ask different
things of their data structures. The simple find algorithm, for example, makes
only three demands of the class used as the iterator parameter: It must be
incremented using ++, dereferenced using *, and compared using !=. Values in
the underlying data structure are dereferenced only once, and the result of
dereferencing is never used as an lvalue. In other words, you make a single
pass through the structure, and read from it, but never write to it.
The copy algorithm is slightly more complicated, with its two distinct types
of iterators. While SourceIterator need only perform the same operations as
the iterator used in find, DestinationIterator uses the * operator to
establish somewhere to write to, not to read from. But it also makes a single
pass through the structure, dereferencing each location exactly once.


Input and Output Iterators


The C++ standard library calls the first of these iterators (find-style) an
"input iterator." Technically, to be an input iterator, a type X must meet the
following requirements:
X must have a constructor and destructor.
X must have a copy constructor.
It must be possible to compare two objects of type X using the == operator and
the != operator. The result of these comparisons must be convertible to bool.
It must be possible to dereference an object of type X using the * operator,
but the resulting type need not be an lvalue type.
For objects x of type X, ++x should return an object of type X& as long as x
is dereferenceable. The result must be either dereferenceable or a value
signifying past-the-end.
For objects x of type X, x++ should return an object convertible to the type
const X&. The expression should be equivalent to { X tmp = x; ++x; return tmp;
}. 
For objects x of type X, it must be possible to dereference the object using
*x++ if x is dereferenceable. 
Any type that meets these criteria is an input iterator as far as the C++
library is concerned, regardless of how it meets the criteria. 
Iterators like DestinationIterator of the copy algorithm are called "output
iterators." A type X must satisfy a different set of requirements to be
considered an output iterator:
X must have a constructor and destructor.
X must have a copy constructor. 
For objects x of type X and objects t of type T where X is an iterator on a
data structure containing objects of type T, it must be possible to alter the
data structure using *x = t if x is dereferenceable and not past-the-end.
The ++ operator must be usable in both prefix and postfix fashion with objects
of type X, as described for input iterators.

For objects x of type X and objects t of type T where X is an iterator on a
data structure containing objects of type T, it must be possible to alter the
data structure using *x++ = t if x is dereferenceable and not past-the-end.
For both input and output iterators, the assumption is that only one pass will
be made through the underlying structure. This makes it possible to create
iterators for I/O streams. It allows you to write programs like wc and uwc
that work with standard input and output without having to write any special
code. You can copy to or from one of these streams just by creating a special
kind of input or output iterator (called an istream_iterator or an
ostream_iterator) and using the copy algorithm in the usual way.
I/O streams demonstrate perfectly what input and output iterators are really
for. An input iterator is for any sequence that, like an input stream, is
intended to be read only once, in order. An output iterator is for any
sequence that, like an output stream, is intended to be written to once and in
consecutive order. The reason the C++ library defines different sorts of
iterators is that different data structures can support only certain types of
iterator operations efficiently and different algorithms rely on different
sets of these operations. If a structure (like an input stream) can support an
input iterator, you can use the find algorithm to search it. If a structure
(like an output stream) can support an output iterator, you can use the copy
algorithm to fill it.


Forward Iterators


If you think of the five types of iterators as forming a hierarchy,
input/output iterators represent the bottom, the simplest of all the
iterators. On the next level up are objects that effectively combine the
properties of input and output iterators and remove the single-pass
requirement. These are called "forward iterators." More formally, a type X is
a forward iterator if it meets all of the requirements for both input and
output iterators, plus:
It must be possible to assign one object of type X to the value of another
using the = operator, with the result having the type X&. 
For all objects x1 and x2 of type X, x1 == x2 implies *x1 == *x2 and ++x1 ==
++x2.
Together, these requirements allow multipass algorithms to work as expected
with forward iterators. A good example of a data structure that supports a
forward iterator well is a singly linked list. You can pass through a list as
many times as you want without seeing different results (unlike some streams),
and you can both read to and write from most lists. A good example of an
algorithm that requires forward iterators is remove, which searches for all
instances of a given object in a sequence and takes them out. Example 5(a),
taken from HP's original implementation of STL, presents this algorithm. The
algorithm depends on find and remove_copy (which is like remove except that it
stores the resulting sequence in another container, rather than overwriting
the original). In essence, all remove does is search the sequence described by
first and last for value. If there are no instances of value in the sequence,
you're done. If there is at least one occurrence, you call remove_copy to take
them out, putting the resulting value-free sequence on top of the existing
sequence. Example 5(b) is the code for remove_copy.
remove_copy requires only input and output iterators, because it is always
reading from one place and writing to another. remove itself requires a
forward iterator, because it is essentially the special case of remove_copy in
which both the source and destination are the same.


Bidirectional Iterators


The next step for iterators is to provide decrementing as well as
incrementing, allowing for multipass, bidirectional algorithms. A type X is
considered a bidirectional iterator if it meets all the requirements of a
forward iterator as well as the following stipulations regarding
the -- operator:
For objects x of type X, --x should return an object of type X& as long as x is
dereferenceable. The result must be either dereferenceable or a value
signifying past-the-end.
For objects x of type X, x-- should return an object convertible to the type
const X&. The expression should be equivalent to { X tmp = x; --x; return tmp;
}. 
For objects x of type X, the expression *x-- must return an object of type T&
(that is, an lvalue) if x is dereferenceable. 
An example of a data structure that supports bidirectional iterators (but
nothing more) is a doubly linked list. There aren't many standard algorithms
that require bidirectional iterators, but one is reverse_copy, which
functions like copy except that the elements are copied from the source to the
destination in reverse order; see Example 6.


Random-Access Iterators


At the top of the iterator hierarchy are random-access iterators. These add to
bidirectional iterators the ability to jump to arbitrary locations with a
structure in constant (not linear) time by introducing the following
requirements:
For all objects x of type X, you can add an arbitrary distance d to x using x
+= d, or subtract it from x using x -= d.
For all objects x of type X and a distance d, you can create a new, temporary
object of type X by adding the distance using x + d, or subtracting the
distance using x - d.
You can compute the distance between two objects x1 and x2 of type X using x1
- x2. 
You can compare two objects x1 and x2 of type X using <, >, >=, and <=.
The classic example of a random-access iterator is a C pointer into an array.
One major advantage of arrays is that they support random access. The various
sorting algorithms included in the C++ standard libraries (sort, stable_sort,
partial_sort, and partial_sort_copy) all require random-access iterators.
Other data structures (such as the STL list class) that do not provide random
access but may need to be sorted require their own special-purpose sorting
routines.


Other Distinctions


Iterators can have other properties. One of the most basic distinctions among
various types of iterators is whether one is dereferenceable or past-the-end.
A dereferenceable iterator is one that can be dereferenced using the unary *
operator. A past-the-end iterator, on the other hand, cannot be assumed to be
dereferenceable. Rather than describing an element in some container, a
past-the-end iterator is intended to bound the container itself by pointing
just past the last value in it.
Among dereferenceable iterators, there are both mutable and constant
iterators. A mutable iterator can be assigned, as in *i = a. Constant
iterators cannot. In other words, the result of *i is an lvalue for a mutable
iterator, but not for a constant iterator.


Specialized Iterators


The basic sorts of iterators are useful, but there are times when more
variations are needed. The C++ standard library provides several types of
predefined, specialized iterators for other purposes.


Iterators for Streams


I have alluded to the two types of iterators used on C++ streams. The
istream_iterator template class is used to read from input streams. When the
iterator is constructed and every time the ++ operator is used, a new value is
read from the stream using the >> operator. The stream to read from is passed as
an argument to the constructor of the iterator. Once the end of the stream is
reached, the iterator becomes equal to a special, end-of-stream value. At this
point, the results of dereferencing the iterator are undefined. An
istream_iterator constructed with the default constructor is also set to this
end-of-stream value, making it easy to test an istream_iterator before trying
to dereference it. An istream_iterator is an input iterator.
An ostream_iterator acts similarly, except that it writes values using the
<< operator each time it is assigned. Like all output iterators, an
ostream_iterator cannot be read from. If you construct the iterator with both
an ostream and char*, the char* will be written to the ostream after each new
value is written. This is useful in delimiting output with spaces or new
lines. For example, to copy a series of words from standard input to standard
output and convert multiple spaces between them to a single space, you could
use copy(istream_iterator<string>(cin), istream_iterator<string>(),
ostream_iterator<string>(cout, " "));.


Insert Iterators



When making a call like copy(first, last, result), you normally expect the
sequence starting with result and continuing to result + (last - first) to be
overwritten. Sometimes, however, you want to insert a new sequence into
another sequence, without having to do a series of copies into a new
structure. Insert iterators facilitate this.
When you increment an insert iterator, a new element is inserted in the
sequence. The iterator is set to this new element, rather than the element
that would have been next in the sequence. There are three types of insert
iterators, and each one inserts the new element in a different place:
front_insert_iterator adds it to the beginning of the container;
back_insert_iterator adds it to the end of the container; and insert_iterator
adds the new element at the position a given iterator into the container
points to.
Rather than having to create instances of these classes explicitly using
constructors, the library provides template functions to create them. The
functions front_inserter and back_inserter take the container and return a
front_insert_iterator and a back_insert_iterator, respectively, on that
container. For example, given a container x, front_inserter(x) returns an
iterator that will insert new elements at the beginning of x.
For inserting in the middle of a structure, inserter takes two arguments: the
container itself, and an iterator into the container describing where to
insert. For example, copy(front, back, inserter(x,result)) will take the
sequence from front to back and insert it into container x starting at result.


Reverse Iterators


Sometimes you want to move backwards through a container. You can always do
this explicitly using the -- operator (as long as you're using a bidirectional
or random access iterator), but this requires changing all of the code that
traverses the container in order to change directions. An easier way is simply
to create a "reverse iterator"--one that goes "backwards" with each ++ and
"forwards" with each --. The C++ library provides two template classes of
reverse iterators, reverse_bidirectional_iterator and reverse_iterator. The
latter assumes that you are reversing a random-access iterator.
The STL containers that can support reverse iterators have special member
functions to create them. rbegin, for example, returns a reverse iterator
pointing to the end of a sequence (which is the beginning if you want to
traverse the sequence in reverse); rend returns a "past-the-end" reverse
iterator (which actually points just before the first element). To create a
for loop that goes all the way through some vector v in reverse order, you can
use for (r = v.rbegin(); r != v.rend(); ++r) {}.
In the end, all these iterators are only useful as the interface for
manipulating our data structures with generic algorithms. I'll explore these
algorithms further in the next article in this series.
Example 1: (a) Function that finds the first occurrence of a given character
in an array of characters using only pointers to the beginning and end of the
array; (b) generalizing this function to work with almost any sort of array by
turning it into a template function; (c) this resembles, yet is incompatible
with, (a).
(a)
char* find_char(char* first, char* last, const char& value)
{
 while (first != last && *first != value) ++first;
 return first;
}

(b)
template< class T >
T* find_in_array(T* first, T* last, const T& value)
{
 while (first != last && *first != value) ++first;
 return first;
}

(c)
template <class T>
struct list_node
{
 list_node* next;
 T item;
};
template <class T>
T find_in_list(list_node<T>* first, list_node<T>* last, const T& value)
{
 while (first != last && first->item != value) first = first->next;
 return first;
}
Example 2: Defining list nodes.
template <class T>
class list_node {
public:
 list_node() : next(0) {}
 list_node(list_node< T >& ln) : next(ln.next), item(ln.item) {}
 ~list_node() { delete next; }
 list_node<T>& operator=(const list_node<T>& ln) {
 if (&ln != this) {
 delete next;
 next = ln.next;
 item = ln.item;
 }
 return *this;
 }
 bool operator==(list_node<T> ln) { return (item==ln.item &&
 next==ln.next); }
 bool operator!=(list_node<T> ln) { return (item!=ln.item ||
 next!=ln.next); }
 T& operator*() { return item; }
 list_node<T> operator++() { return *(next = next->next); }
 list_node<T> operator++(int) { list_node<T> tmp = *this;
 ++*this; return tmp; }
private:
 list_node<T>* next;
 T item;
};
template< class T >
list_node< T > find_in_list(list_node< T > first, list_node< T > last,
 const T& value)
{
 while (first != last && *first != value) ++first;
 return first;
}
Example 3: Substituting a template parameter for the list_node class and T*
pointers.
template< class I, class T >
I find(I first, I last, const T& value)
{
 while (first != last && *first != value) ++first;
 return first;
}
Example 4: copy can be defined in a way very similar to the generic find.
template <class SourceIterator, class DestinationIterator>
DestinationIterator copy(SourceIterator first, SourceIterator last,
 DestinationIterator result) {
 while (first != last) *result++ = *first++;
 return result;
}
Example 5: (a) The HP remove algorithm; (b) remove_copy.
(a)
template <class ForwardIterator, class T>
ForwardIterator remove(ForwardIterator first, ForwardIterator last, const T&
value) {
 first = find(first, last, value);
 ForwardIterator next = first;
 return first == last ? first : remove_copy(++next, last, first, value);
}

(b)
template <class InputIterator, class OutputIterator, class T>
OutputIterator remove_copy(InputIterator first, InputIterator
last,OutputIterator result, const T& value) {
 while (first != last) {
 if (*first != value) *result++ = *first;
 ++first;
 }
 return result;
}
Example 6: reverse_copy.
template <class BidirectionalIterator, class OutputIterator>
OutputIterator reverse_copy(BidirectionalIterator first,
 BidirectionalIterator last,
 OutputIterator result) {
 while (first != last) *result++ = *--last;
 return result;
}

Listing One
#include<string.h>

#include<vector.h>
#include<iostream.h>
int main(int argc, char* argv[])
{
 if (argc != 1) throw("usage: wc\n");
 vector< string > words;
 copy(istream_iterator< string >(cin),
 istream_iterator< string >(),
 inserter(words, words.end()));
 cout << "Number of words: " << words.size() << endl;
}

Listing Two
#include<string.h>
#include<set.h>
#include<iostream.h>
int main(int argc, char* argv[])
{
 if (argc != 1) throw("usage: uwc\n");
 set< string, less< string > > words;
 copy(istream_iterator< string >(cin),
 istream_iterator< string >(),
 inserter(words, words.end()));
 cout << "Number of unique words: " << words.size() << endl;
}






































Reusable Binary Associations in C++


A cookie-cutter approach for representing abstract relationships




Terris Linenbach


Terris is a software engineer for Sagent Technology Inc. and can be contacted
at terris@rahul.net.


Understanding relationships between objects is crucial to the success of any
object-oriented software project. Relationships express real-world facts about
objects without delving into computer terminology. For example, "companies"
and "employees" can be related by the statement "an employee works for a
company." Programmers have a much better chance of implementing a design
correctly if relationships are documented in a clear and concise manner, free
of any implementation details.


Association Semantics


Relationships are known as "associations" in the Object Modeling Technique
(OMT). Associations that relate two classes are known as "binary
associations." Three, four, and even more classes can participate in
associations. However, binary associations are the easiest to conceptualize,
and they tend to be the most common. Associations have additional
semantics--multiplicity, bidirectionality, properties, and association
objects. (For more on associations, see Object-Oriented Modeling and Design,
by James Rumbaugh et al., Prentice Hall, 1991.)
Associations can express one-to-one, one-to-many, and many-to-many
relationships between classes; this is known as "multiplicity." Consider a
simple one-to-one relationship (like Figure 1) between a company and its board
of directors, where the board has performed its duties for a particular
duration. A company and a board of directors participate in a simple
association called "has," as in "a company has a board of directors." In this
example, a path exists from a particular company to its board of directors and
from the board of directors to the company. This type of association is called
"bidirectional" since both participants can locate each other.
"Properties" describe aspects of the association. Without the association,
properties are meaningless. In the aforementioned example, "duration" is an
association property--it describes the amount of time that the board has
performed its duties.
Associations can be modeled and implemented as separate objects. An
association "object" in this example would contain a duration field, a pointer
to a company object, and another pointer to a board of directors object.
Associations allow projects to leverage the power of various classes without
intermixing their implementations and creating unplanned dependencies.
Associations thus promote modularity. Instead of building one monolithic class
that does everything under the sun, classes can enlist other classes to store
specific information and carry out specific tasks.


Implementing Associations


The degree of effort required to implement associations depends on the target
language. SQL, for example, requires a properly prepared join between tables.
In C++, programmers typically implement associations using pointers.
A pointer-based implementation of Figure 1 might look like Example 1. There is
a flaw in this implementation, however. Although a board object can locate its
company, the company object cannot locate its board. This is a common
oversight. The fact may well be that the existing code base does not need to
find a board of directors object given a company object. In many cases, this
is just a coincidence and is not a true reflection of the project's needs.
Some programmers take the point of view "this will get us by--we can add the
other pointer later when we need it." However, adding the pointer later means
that the class implementation will change, and depending on the code base,
client code may need to be changed. You should not allow existing code to bias
class implementations. Large systems should implement associations correctly
up front; otherwise, the hasty decision to implement half of the association
may result in extra effort later. 
In a one-to-one association, both objects must point to each other. Pointers
do not enforce this constraint. At run time, this rule is easy to circumvent,
and doing so can lead to code with hidden bugs. Pointer management is even
more important in one-to-many associations.
Dangling pointers can crop up easily at run time. Pointers can point to
deleted objects. Sometimes, pointers should be NULL when they aren't (these
bugs are difficult to track down). Properties should be shared by both the
forward and backward pointers in an association.
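A minimal sketch of what hand-maintained forward and backward pointers look like (the class and function names here are mine, not part of any library); the invariant holds only as long as every caller remembers to update both sides:

```cpp
#include <cassert>
#include <cstddef>

class Board;

class Company {
public:
    Company() : pBoard(NULL) {}
    Board* pBoard;      // forward pointer; nothing enforces consistency
};

class Board {
public:
    Board() : pCompany(NULL) {}
    Company* pCompany;  // backward pointer
};

// The discipline the text describes: always update both sides together.
void link(Company& c, Board& b)
{
    c.pBoard = &b;
    b.pCompany = &c;
}

void unlink(Company& c, Board& b)
{
    c.pBoard = NULL;
    b.pCompany = NULL;
}
```

Any code that assigns pBoard or pCompany directly, rather than going through link() and unlink(), can silently break the one-to-one constraint.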
Some methodologies include support for "unidirectional" relationships, which
map directly to pointers in C++. However, if your design includes
bidirectional associations, C++ programmers will face challenges. The extra
effort required to fully implement associations must be justified, so
implementers should use their best judgment. In short, don't do it unless you
really have to.


Association Cookie Cutter in C++


In this article, I'll present a design for implementing binary associations.
The design consists of both OMT object-model and dynamic-model diagrams. I'll
then implement the design in C++ (using templates to promote code reuse) and
provide examples that use it. All of the code associated with this article is
available electronically (see "Availability," page 3) and at ftp.rahul.net, in
the pub/terris directory. The source code is freely distributable.
I tried to create a design that would do the following:
Automatically maintain forward and backward pointers.
Support one-to-zero-or-one, one-to-many, and many-to-many relationships.
Allow one class to be associated with many classes. For example, class A can
participate in a one-to-one association with class B, and participate in a
one-to-many association with class C.
Support ordered one-to-many and many-to-many associations, where the order of
the objects on the "many" side is an important design detail.
Support binary association objects.
Support association properties. A property class is specified by the user via
a template. An association can "own" one (and only one) property instance.
When an association is broken, the property instance is deleted (this is
optional behavior).
Be able to specify a container class for one-to-many and many-to-many
relationships, without reliance on a particular container implementation.
Provide type safety.
Provide compatibility with object-oriented databases.
Keep in mind that there are other approaches that could achieve these same
goals. In his article "Automating Association Implementation in C++" (DDJ,
October 1995), David Papurt describes a template-based approach that uses
inheritance to express associations between objects. In his design, a parent
class provides functionality for linking to another object, locating a linked
object, and removing a link; see Example 2(a). The class CompanyBase provides
all company-oriented functionality, without any association details; see
Example 2(b). Finally, the Company class mixes CompanyBase with association
functionality, as in Example 2(c). This approach works well for simple object
models. However, since inheritance is used, this approach is cumbersome when
one class participates in associations with two or more classes. In our
company example, the board of directors participates in a one-to-one
relationship with a company but also participates in a one-to-many
relationship with company executives. Papurt does not directly address
one-to-many or many-to-many relationships (he mentions that the approach can
be used to implement such associations, but doesn't demonstrate how), nor does
he cover association properties and association objects.
Another approach is described in chapter 15 of Object-Oriented Modeling and
Design by Rumbaugh et al. Private methods, such as add_item() and
remove_item(), are defined on the classes that participate in an association.
These functions maintain the backward and forward pointers. However, code
reuse is not optimal and the chapter glosses over the details.


My Solution



The approach I propose is illustrated in Figure 2 through Figure 5. Figure 2
and Figure 3 illustrate the object models, while Figure 4 and Figure 5 show
the dynamic models. Table 1 describes the classes and methods of the
association objects ASToOne, ASToOneProp, ASToMany, and ASToManyProp.
DetachMe() is called on destruction to minimize "dangling pointers."
Notice that the ASToOneBase and ASToManyBase classes are indirectly
associated. ASToOneBase has a one-to-zero-or-one relationship with ASAssn, and
ASToManyBase has a one-to-many relationship with ASAssn. This means that:
One ASToOneBase object can have a one-to-zero-or-one relationship with an
ASToOneBase object; this is how one-to-one relationships are implemented.
One ASToOneBase object can have a one-to-zero-or-one relationship with an
ASToManyBase object; this is how one-to-many relationships are implemented.
One ASToManyBase object can have a one-to-many relationship with ASToOneBase
objects; this is how many-to-one relationships are implemented.
One ASToManyBase object can have a one-to-many relationship with ASToManyBase
objects; this is how many-to-many relationships are implemented.
The design consists of two dynamic models, one for one-to-one relationships
and another for one-to-many relationships. The dynamic models show the
behavior of the association objects, and demonstrate how the "forward" and
"backward" pointers are maintained.
Figure 4 shows the behavior for the one-to-one case. First, the object (call
it "A") starts in the "unattached" state. Association objects are linked by
sending one of the objects an Attach event. Sending the object an Attach event
has two results: First, B is sent an iAttach event; then, A points to B. This
ensures that the two objects point to each other.
When A receives a DetachMe event, B is sent an iDetach message. This ensures
that A and B are no longer linked.
Finally, consider the case where A is attached to B, and A receives another
Attach event. Since A is a one-to-one association object, A's left-hand side
(LHS) object cannot be linked to more than one object. So, A sends B an
iDetach message.
The one-to-many dynamic model (Figure 5) is similar to Figure 4. The Attach
and Detach events have the same "message forwarding" behavior. Attach sends an
iAttach event and Detach sends an iDetach event.
In the following discussion, I'll use the terms "left-hand-side" (LHS) and
"right-hand-side" (RHS) to identify the objects in an association. The
left-hand-side is the point of reference; there is only one object on the
left-hand-side of an association. The right-hand-side of the association is
what the left-hand-side object "sees" as it "looks out" through the
association, as if it were a sailor looking through a periscope. In a
one-to-many relationship, the left-hand-side object sees many objects through
the periscope. In a one-to-one relationship, the left-hand-side sees only one
object.
Also note that I'll actually be implementing pseudoassociation objects,
because one association object in this design only tells half of the story;
two pseudoassociation objects are needed to express one association. In other
words, I use the term "association object" when I mean "pseudoassociation
object."


One To One


In Figure 1, both sides of the association are implemented with ASToOneProp
data members. Figure 6 is an instance diagram of the participants. The
Corporation class has its own ASToOneProp object that contains:
A pointer to the corporation, as the LHS pointer.
A pointer to the board of directors' association object, as the RHS pointer.
A pointer to the association properties.
The BoardOfDirectors class has its own ASToOneProp object that contains:
A pointer to the board, as the LHS pointer.
A pointer to the corporation's association object, as the RHS pointer.
A pointer to the association properties.
The RHS pointers point to Association objects, not to "real" objects. This
allows both sides of an association to be updated in one Attach or Detach
operation. Two association objects are needed because Association objects are
not shared on both sides of the relationship--their classes may differ (as in
the case of one-to-many relationships).
Note that there is only one instance that holds the association properties.
This ensures consistency and may conserve memory.


Sample Code


One-to-zero-or-one relationships are fairly easy to implement using the
template classes. Two template classes exist for this purpose, ASToOne and
ASToOneProp. ASToOne accepts two parameters, which are the classes that are
involved in the association. ASToOneProp accepts an additional parameter,
which is the class that contains the association properties; see Listing One.
Listing Two links a company to a board of directors. By default, the
properties object is deleted when the link is broken. This can be overridden.
A board can locate its company using board.CompanyBoardAssn.GetRhsObject() and
a company can locate its board using company.CompanyBoardAssn.GetRhsObject().
The properties for the association can be located starting from the company
object using company.CompanyBoardAssn.GetProperties().m_duration and also
starting from the board of directors object using
board.CompanyBoardAssn.GetProperties().m_duration.


Violating Encapsulation


Notice that the Attach() method accepts an ASAssn object. The code that calls
Attach() accesses a data member in board. Clearly, this violates
encapsulation. All other association implementations have this encapsulation
problem. Some object-oriented practitioners even argue against associations
entirely because of the encapsulation problem. Creating association object
"getters" on company and board is a trivial task, but this is not a complete
solution. The objects are tightly bound by a member name in either case.
Other proposed solutions refer only to entire objects, as in Link( objectA,
objectB ), so they don't need to access data members; however, this violates
one of our design goals, which is to allow one class to participate in many
associations with other classes.
The encapsulation problem is not as detrimental as it seems. First,
associations can be implemented without adding data members to a class,
because ASAssn objects can be declared anywhere in a program. The following
section shows how. However, this approach carries with it the problem of
cataloging association objects and finding them when you need them.
Secondly, subclassing can be used to isolate classes. First, a stand-alone
class is built, free from any associations. Then, a subclass is created that
has association objects as data members. Any class that is reusable without a
particular association should be implemented in this manner. Listing Three
demonstrates this approach.


Association Objects


Instead of embedding data members in the Company and BoardOfDirectors classes,
association objects can be declared as local or global variables; see Example
3(a). Example 3(b) declares local variables for the company and board objects,
while Example 3(c) declares an association object for the company called
company_link. Finally, Example 3(d) declares an association object for the
board called board_link. The company and board are then linked using Example
3(e).
Is this approach best, or should data members be used? I'm not recommending
one method or the other. Both schemes are valid and have their own strengths
and weaknesses.


One-To-Many


Consider the simple one-to-many relationship between a car and its accessories
in Figure 7. The "car" side of the implementation uses an association of type
ASToManyProp, and the "accessory" side of the implementation uses an object of
type ASToOneProp. In Figure 8, the Car object has its own ASToManyProp object
that contains:
A pointer to the car, as the LHS pointer.

A list of objects, each of which contains a pointer to the accessory's
association object and a pointer to the properties for the association.
The Accessory object has its own ASToOneProp object that contains the
following:
A pointer to the accessory, as the LHS pointer.
A pointer to the car's association object, as the RHS pointer.
A pointer to the association properties.


Sample Code


Implementing one-to-many associations requires more work than implementing
one-to-one associations. First, list and iterator classes must be implemented.
The list class inherits from ASListAdapter and the iterator class inherits
from ASIteratorAdapter. These are abstract template classes. Adapters define
an interface. They have no implementation details. Adapter interfaces are
typically implemented by forwarding messages to other objects. See Design
Patterns: Elements of Reusable Object-Oriented Software, by Erich Gamma et
al., Addison-Wesley 1995.
Lists contain different types of items, depending on whether the association
has properties. In the ASToMany case, the list contains ASAssn objects. In the
ASToManyProp case, the list contains ASAssnPropTuple objects. Each
ASAssnPropTuple (Figure 3) object contains a pointer to an ASAssn object and a
pointer to an association properties object. The same list and iterator
implementations can store both types of objects.
The sample code (available electronically) implements two classes, ListTempl
and IteratorTempl. Listing Four is an example of the car and accessory
classes. In Example 4(a), one car is declared along with three accessories.
The car and accessories are then linked together in Example 4(b). Given a car
object, Example 4(c) steps through all of its accessories. Iterate() is a
factory method and returns a pointer to an object that must be deleted. The
code acc1.m_installedIn.GetProperties().m_installationCost demonstrates how to
access the properties, given an accessory object. Example 5 shows code that
iterates through a list of associations depending upon whether associations
have properties. Example 5(a) is code without properties, while 5(b) is code
with properties. Note that the call to GetLhsObject is missing in Example
5(a).
Ordered associations can be implemented in two ways. First, the list itself
can be ordered: Items can be traversed in the order that they were added to
the list; or, a list can be built to sort items based on a key generated from
RHS objects. Secondly, traversal order can be specified explicitly when items
are attached; the iOrder parameter sent to the Attach() method can specify
relative order.


Many-to-Many


Many-to-many relationships can also be implemented. The difference is that
ASToMany (or ASToManyProp) objects appear as data members in both classes. 


Conclusion


Implementing associations is not a trivial task. Programmers should take the
time to design a standard approach for implementing associations; otherwise,
individual programmers will code unique solutions that require their own
debugging and documentation. In this article, I've described an approach for
implementing associations via OMT diagrams and working C++ code. Combining OMT
diagrams with real code can be an effective approach for documenting designs.
Unfortunately, the design does not solve the encapsulation violation problem,
but it does improve your chances of producing reusable code.
Example 1: Pointer-based implementation of Figure 1. 
class Company
{
 public:
 string name;
};
class Board
{
 public:
 Company *pCorporation;
};
Example 2: (a) Parent class provides functionality for linking to another
object, locating linked object, and removing link; (b) CompanyBase provides
company-oriented functionality, without association details; (c) Company class
mixes CompanyBase with association functionality.
(a)
template <class parentClass, class linkClass>
class Association : public parentClass
{
 linkClass *pLinkedObject;
 public:
 void link( linkClass * );
 linkClass *getAssociatedObject();
};

(b)
class CompanyBase
{
 public:
 string name;
};

(c)
class Company : public Association< CompanyBase, BoardOfDirectors >{};
Example 3: (a) Declaring association objects as local or global variables; (b)
declaring local variables; (c) declaring an association object for
company_link; (d) declaring an association object for board_link; (e) linking
company and board.
(a)
class Company
{
 public:
 string name;
};
class BoardOfDirectors
{
};

(b)
Company company;
BoardOfDirectors board;

(c)
ASToOneProp< Company, BoardOfDirectors, CompanyBoardProperties >
company_link(company);

(d)
ASToOneProp< BoardOfDirectors, Company, CompanyBoardProperties
>board_link(board);


(e)
company_link.Attach(board_link, *new CompanyBoardProperties(1));
Example 4: (a) One car is declared along with three accessories; (b) car and
accessories are linked; (c) stepping through all of its accessories.
(a)
Car car;
Accessory acc1;
Accessory acc2;
Accessory acc3;

(b)
car.m_hasAccessories.Attach( acc1.m_installedIn,
 *new CarAccessoryProperties( 500 ) );
acc2.m_installedIn.Attach( car.m_hasAccessories,
 *new CarAccessoryProperties( 1000 ) );
acc3.m_installedIn.Attach( car.m_hasAccessories,
 *new CarAccessoryProperties( 7000 ) );

(c)
typedef AS_ITER_ADAPTER_PROP( Car, Accessory, CarAccessoryProperties )
 Iterator;
Iterator *pIterator = car.m_hasAccessories.GetRhsObjects().Iterate();
for ( ; !pIterator->AtEnd(); pIterator->Next() )
{
 printf( "Cost %d\n",
 pIterator->GetCurrent().GetProperties().m_installationCost );
}
delete pIterator;
Example 5: Code that iterates through a list of associations. (a) Without
properties; (b) with properties.
(a)
for ( ; !pIterator->AtEnd(); pIterator->Next() )
{
 printf( "Many link: %d\n", pIterator->GetCurrent().i );
}

(b)
for ( ; !pIterator->AtEnd(); pIterator->Next() )
{
 printf( "Many link: %d\n", pIterator->GetCurrent().GetLhsObject().i );
}
Figure 1: Simple one-to-one relationship.
Figure 2: Object model #1.
Figure 3: Object model #2.
Figure 4: Dynamic model, one-to-one.
Figure 5: Dynamic model, one-to-many.
Figure 6: Instance diagram of the association implementation. This type of
diagram should not be specified in a real design. Lines with arrows represent
pointers.
Figure 7: Simple one-to-many relationship. 
Figure 8: Instance diagram of the association implementation. This type of
diagram should not be specified in a real design. Lines with arrows represent
pointers.
Table 1: Classes and methods of association objects.
Method           ASToOne/ASToOneProp                  ASToMany/ASToManyProp
Attach(object)   Attaches the LHS object to the       Attaches the LHS object to
                 passed-in object. If either object   the passed-in object.
                 is already attached, the previous
                 attachments are revoked.
DetachMe()       Detaches the LHS and RHS objects.    Detaches the LHS object from
                                                      all of the objects it is
                                                      attached to.
Detach(object)   <<Not present>>                      Detaches the LHS object from
                                                      a specific RHS object.
GetLhsObject()   Returns the LHS object.              Returns the LHS object.
GetRhsObject()   Returns the object that is           <<Not present>>
                 associated with the LHS object.

Listing One
// Forward
class Company;
class BoardOfDirectors;
class CompanyBoardProperties;
class Company
{
 public:
 Company() : CompanyBoardAssn ( *this )
 {
 }
 string name;
 ASToOneProp< Company, BoardOfDirectors, CompanyBoardProperties > 
 CompanyBoardAssn;
};
class BoardOfDirectors
{
 public:
 BoardOfDirectors() : CompanyBoardAssn ( *this )
 {}
 ASToOneProp<BoardOfDirectors, Company, CompanyBoardProperties> 
 CompanyBoardAssn;
};
class CompanyBoardProperties
{
 public:
 CompanyBoardProperties( const long duration )
 : m_duration( duration )
 {}
 long m_duration;
};

Listing Two
Company company;
BoardOfDirectors board;
// Attach the board and the company - the board has served for 5 months
company.CompanyBoardAssn.Attach( board.CompanyBoardAssn, 
 *new CompanyBoardProperties( 5 ) );

Listing Three
class CompanyPure
{
 public:
 string name;
};
class BoardOfDirectorsPure
{
};
class Company : public CompanyPure
{
 public:
 Company() : CompanyBoardAssn ( *this )
 {
 }
 ASToOneProp< Company, BoardOfDirectors, CompanyBoardProperties > 
 CompanyBoardAssn; // You can use any name you want
};
class BoardOfDirectors : public BoardOfDirectorsPure
{
 public:
 BoardOfDirectors() : CompanyBoardAssn ( *this )
 {}
 ASToOneProp<BoardOfDirectors, Company, CompanyBoardProperties> 
 CompanyBoardAssn; // You can use any name you want
};

Listing Four
// Forward
class Car;
class Accessory;
class CarAccessoryProperties;
typedef AS_LIST_PROP( ListTempl, 
 Car, Accessory, CarAccessoryProperties ) List;
class Car
{

 public:
 Car() : m_hasAccessories( *this, *new List )
 {
 }
 ASToManyProp< Car, Accessory, CarAccessoryProperties >
 m_hasAccessories;
};
class Accessory
{
 public:
 Accessory() : m_installedIn( *this )
 {}
 ASToOneProp<Accessory, Car, CarAccessoryProperties> m_installedIn;
};
class CarAccessoryProperties
{
 public:
 CarAccessoryProperties( long cost )
 : m_installationCost( cost )
 {}
 long m_installationCost;
};









































An Asynchronous Design Pattern


A pattern for managing concurrency--written in Java!




Allan Vermeulen


Allan is chief technical officer for Rogue Wave Software. He can be contacted
at alv@roguewave.com.


Synchronous functions communicate with their callers by returning an object.
The IOU design pattern I present here generalizes this to the asynchronous
case--instead of a function returning an object, it returns an object IOU. The
IOU is an agreement from a supplier (the function you called) that it will
eventually close the IOU by providing the promised object. Once the IOU is
closed, you can redeem the IOU for the object that the supplier provided. In
the meantime, your code can continue to do useful things.
Two things make the IOU concept useful. First, it's simple. Calling a function
that returns an IOU is just like calling any other function. Second, it is
completely independent of the asynchronous mechanism (if any) used by the
supplier. For example, code using IOUs looks exactly the same regardless of
whether the supplier closes the IOU by launching a separate thread, talking to
another process, making a remote procedure call, or waiting for a human to
type a response into a dialog box.
All code in this article was written in Java. You can see it in action by
pointing a Java-enabled browser at http://www.roguewave.com/~alv/iou. Of
course, if the code was written in a different language, it would look a
little different. For example, in C++, you would probably make IOU a template
class, IOU<T>, so that it could be redeemed for a strongly typed object.
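As a hedged illustration of that C++ remark (this sketch is mine, not the article's library, and it leans on the standard promise/future machinery rather than the author's Escrow classes; fetch_async is a hypothetical supplier):

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// A minimal IOU<T>: the supplier closes it via the promise side,
// the client tests closed() and redeem()s the promised object.
template <class T>
class IOU {
public:
    explicit IOU(std::future<T> f) : fut(std::move(f)) {}
    // True if redeem() would return immediately.
    bool closed() {
        return fut.wait_for(std::chrono::seconds(0)) ==
               std::future_status::ready;
    }
    void standBy() { fut.wait(); }    // block until the supplier closes it
    T redeem() { return fut.get(); }  // block, then hand over the object
private:
    std::future<T> fut;
};

// A supplier that closes the IOU from a separate thread.
IOU<int> fetch_async(int n)
{
    std::promise<int> p;
    IOU<int> iou(p.get_future());
    std::thread worker([n](std::promise<int> pr) { pr.set_value(n * 2); },
                       std::move(p));
    worker.detach();
    return iou;
}
```

As the article notes, the client code is indifferent to the mechanism: the same IOU<T> interface would work whether the supplier closed the escrow from a thread, a remote call, or a dialog box.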


Using the IOU Class


The IOU class has four basic operations. The closed() function returns True if
the IOU can be immediately redeemed. The standBy() function waits until the
IOU is redeemable. The redeem() function returns the owed object. If the IOU
is not closed, redeem() waits until the IOU is closed. Finally, the
addCallback(f) function arranges for the function f to be called when the IOU
is closed. The function has signature f(obj), where obj is the object referred
to by the IOU.
The simplest way to demonstrate IOUs is with an example. Example 1 sends four
Messenger objects to fetch four Dossier objects, then computes the size of all
the dossiers put together.
Each call to getDossier() doesn't start until the previous one has completed,
so the code takes about four times as long as fetching a single message. Since
fetching a dossier takes a few seconds (the messenger might have to go across
town), this is not good.
Example 2 shows how you could program this using IOUs. This time, all four
messengers go out at once. The program takes only as long as the slowest
messenger.
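The launch-all-then-redeem idea can be sketched in standard C++ (the names fetch_dossier_size and total_size are mine, and std::future stands in for the article's IOU):

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Stand-in for the article's getDossier(): pretend each fetch is slow
// and just report a size derived from the id.
int fetch_dossier_size(int id) { return 100 + id; }

// Launch every fetch before redeeming any result, so total latency is
// bounded by the slowest fetch rather than the sum of all of them.
int total_size(const std::vector<int>& ids)
{
    std::vector<std::future<int> > ious;
    for (std::size_t i = 0; i < ids.size(); ++i)
        ious.push_back(std::async(std::launch::async,
                                  fetch_dossier_size, ids[i]));
    int total = 0;
    for (std::size_t i = 0; i < ious.size(); ++i)
        total += ious[i].get();  // "redeem" each outstanding IOU
    return total;
}
```

Moving the get() call into the first loop would serialize the fetches again, reproducing the four-times-slower behavior of Example 1.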


Adding Callbacks


Often, you'll want to do something as soon as an IOU is redeemable. For
example, you might want to display the cover of the dossier as soon as it's
available. Example 3 shows how to do this without IOUs.
The simplest way to do this with IOUs would be to get all of them and then
redeem each one, displaying the image right after calling redeem. The problem
is that while you're waiting to redeem John's IOU, Ringo's may become
redeemable. Since you are still waiting for John, Ringo's image won't be put
up as soon as possible.
IOU callbacks provide a way around this problem. Each IOU has an associated
list of callbacks that will be executed as soon as the IOU is redeemable.
Example 4 displays the covers as soon as the IOU can be redeemed. Listing One
defines the interface for IOU callback objects.


Supplying IOUs


Figure 1 shows the key classes in the IOU pattern. The key abstraction from
the IOU client's point of view is the IOU class shown in Listing Two. The IOU
is simply a lightweight handle to the supplier's key abstraction--the Escrow
object. It manages the object promised by the IOU, implements the supplier's
synchronization policy, and serves as the interface between the supplier and
the client. The Escrow class shown in Listing Three is abstract: The
synchronization policy is determined by how the particular subclass you
instantiate implements the abstract functions standBy and redeem. 
A SynchEscrow manages the synchronization by providing no synchronization. You
must close escrow, by supplying the promised object, before building an IOU
that refers to the escrow. The IOU is immediately redeemable. This escrow
synchronization strategy is useful mainly as a placeholder for synchronous
code that you may want to make asynchronous later. It is also useful for
debugging.
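No listing is given for SynchEscrow, so here is a minimal stand-alone sketch of the behavior just described. The class and member names below are illustrative assumptions, and for brevity the sketch does not extend the Escrow of Listing Three:

```java
// Sketch only: escrow is "closed" at construction time by supplying
// the promised object, so every IOU handed out afterward is
// immediately redeemable and nothing ever blocks.
class SynchEscrowSketch {
    private final Object value_;
    public SynchEscrowSketch(Object value) { value_ = value; }
    public boolean closed() { return true; }   // always redeemable
    public void standBy() {}                   // nothing to wait for
    public Object redeem() { return value_; }
}
```

Because escrow closes before any IOU exists, closed() is always true and redeem() returns at once, which is exactly what makes this variant a drop-in placeholder for later asynchronous code.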
When you construct a ThreadEscrow, you pass it a handle to the supplier code
(in C++, this could be a pointer to a function; in Java, it is an object that
satisfies a particular interface). The ThreadEscrow starts a new thread and
runs the supplier code in the new thread. While the supplier code is running,
you build an IOU that refers to the escrow and return the main thread of
control to the client. Once the supplier code finishes executing, the result
of the computation is stored in the escrow, the escrow is closed, and the
thread's execution is complete. The thread will be joined back to the main
thread when standBy or redeem is called. ThreadEscrow is given in Listings
Four and Five.
When a DialogEscrow is used, the asynchronous behavior is provided by a human
filling in a dialog box. The escrow is associated at construction time with a
nonmodal dialog. Once the escrow is created, IOUs can be generated, and the
flow of control can be returned to the IOU client. Pressing OK on the dialog
generates an event which, using the standard GUI mechanisms, returns control
to the escrow. The escrow is then closed, and any outstanding IOUs can be
redeemed.


IOU Groups


Sometimes you'll want to deal with many IOUs (with an IOU group) at once
instead of one IOU at a time. Using an IOU group, you can wait for all IOUs in
the group to be enabled, or wait for some number of IOUs in the group to be
enabled.
A key design constraint for the IOU pattern was that IOU groups could be built
using only the public IOU interface--an IOU group should not need to know
which type of Escrow the IOUs in the group are using. This was the initial
motivation for the callback functionality in IOU. By using callbacks, the IOU
group can arrange to be signaled when there is a change in any of the IOUs in
the group.
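To make this concrete, here is a hypothetical IOUGroup sketch built only on the public callback interface. IOUGroup does not appear in the article's listings, so its name and the minimal IOU-like interface stubbed in below (so the fragment stands alone) are assumptions:

```java
import java.util.Vector;

// Minimal stand-ins for the article's interfaces (assumptions).
interface GroupCallback { void run(Object o); }
interface GroupIOU { void addCallback(GroupCallback cb); }

// An IOU group built purely on the public callback surface: it never
// needs to know which kind of Escrow backs each IOU.
class IOUGroup {
    private int pending = 0;              // IOUs not yet closed
    private final Vector results = new Vector();

    public synchronized void add(GroupIOU iou) {
        pending++;
        iou.addCallback(new GroupCallback() {
            public void run(Object o) { oneClosed(o); }
        });
    }
    private synchronized void oneClosed(Object o) {
        results.addElement(o);
        pending--;
        notifyAll();                      // wake waiters in standByAll()
    }
    // Block until every IOU added so far is closed; return the values.
    public synchronized Vector standByAll() throws InterruptedException {
        while (pending > 0) wait();
        return results;
    }
}
```

A "wait for the first k of n" variant would differ only in the loop condition inside standByAll().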


Streaming Models


An IOU is either redeemable in full or not yet ready; there is no
middle ground. In some asynchronous applications, this is not a valid
assumption. For example, when receiving images you may be able to do useful
things to the image before it has been completely received. In this case the
IOU pattern does not apply. Instead, a streaming pattern (such as that
described by Gabe B. Beged-Dov et al., in an OOPSLA '95 workshop paper
entitled "The Pipeline Design Pattern") might be more appropriate.



Summary


The IOU pattern is a simple, intuitive way to add asynchronous behavior to
applications. By adding interfaces that deliver IOUs, instead of simply
supplying objects, you can achieve concurrency without explicitly dealing with
concurrency primitives. Since the IOU pattern is designed independently of any
particular concurrency mechanism, it is applicable to many different
concurrency methods, from event-driven systems to distributed systems to
threaded systems.
Figure 1: Classes in the IOU pattern.
Example 1: Fetching dossiers synchronously.
// john, paul, george, ringo are Messenger objects
Dossier d1 = john.getDossier();
Dossier d2 = paul.getDossier();
Dossier d3 = george.getDossier();
Dossier d4 = ringo.getDossier();
int l = d1.length()+d2.length()+d3.length()+d4.length();
Example 2: Fetching dossiers using IOUs.
IOU iou1 = john.getDossierIOU(); // First get IOUs
IOU iou2 = paul.getDossierIOU();
IOU iou3 = george.getDossierIOU();
IOU iou4 = ringo.getDossierIOU();
Dossier d1 = (Dossier)iou1.redeem(); // Redeem the IOUs
Dossier d2 = (Dossier)iou2.redeem();
Dossier d3 = (Dossier)iou3.redeem();
Dossier d4 = (Dossier)iou4.redeem();
int l = d1.length()+d2.length()+d3.length()+d4.length();
Example 3: Showing the covers without IOUs.
// display1 et al are screen display panels
Dossier d1 = john.getDossier();
display1.show(d1.cover);
Dossier d2 = paul.getDossier();
display2.show(d2.cover);
Dossier d3 = george.getDossier();
display3.show(d3.cover);
Dossier d4 = ringo.getDossier();
display4.show(d4.cover);
Example 4: Using callbacks to redeem IOUs.
// display1 et al satisfy IOUCallback interface,
// so display1.run(dossier) draws the Dossier on
// the screen
IOU iou1 = john.getDossierIOU();
iou1.addCallback(display1);
IOU iou2 = paul.getDossierIOU();
iou2.addCallback(display2);
IOU iou3 = george.getDossierIOU();
iou3.addCallback(display3);
IOU iou4 = ringo.getDossierIOU();
iou4.addCallback(display4);
iou4.addCallback(display4);

Listing One
/* File: IOUCallback.java */
/**
 * An object to be run when an IOU is closed. Each Escrow has a list
 * of these objects. They get fired when the IOU's escrow is closed.
 * @see IOU, Escrow
 * @author: alvis, 1/26/96
 */
package iou;
public interface IOUCallback {
 public void run(Object o);
};


Listing Two
/* File: IOU.java */
package iou;
/**
 * An IOU is a token that you can later redeem for an actual object.
 * IOUs are typically returned from functions that build their results
 * asynchronously.
 * An IOU refers to an Escrow object. The Escrow is responsible for
 * obtaining and holding on to the object referred to by the IOU.
 * @author Alvis, 2/7/96
 */
public class IOU extends Object {
 private Escrow escrow_;
 /**
 * Check if the IOU can be immediately redeemed. If this returns
 * true, then none of the IOU's member functions will block. If it
 * returns false, then standBy and redeem will likely block.
 */
 public boolean closed() {return escrow_.closed();}
 /**
 * Wait until the object is ready. This will cause this thread of
 * execution to block.
 * @exception InterruptedException When the thread is interrupted before
 * the IOU is enabled.
 */
 public void standBy() throws InterruptedException {escrow_.standBy();}
 /**
 * Get the object promised by the IOU.
 * This will block if the IOU has not been enabled yet.
 * @exception InterruptedException When the thread is interrupted before
 * the IOU is enabled.
 */
 public Object redeem() throws InterruptedException {return escrow_.redeem();}
 /**
 * The run(Object) function of the IOUCallback interface
 * will be invoked when the IOU is enabled. The object passed
 * as a parameter to the callback is the IOU's value.
 * The callback is normally run in the same thread in which the
 * escrow is enabled.
 * If the IOU is already enabled, the callback is run immediately.
 */
 public void addCallback(IOUCallback callback)
 {
 escrow_.addCallback(callback);
 }
 /**
 * Remove a callback that was previously added. If the callback
 * is not on the callback list it is ignored.
 */
 public void removeCallback(IOUCallback callback)
 {
 escrow_.removeCallback(callback);
 }
 /**
 * Normally, you receive IOUs as return values from functions; you
 * don't build them yourself. If you are giving out IOUs, you should
 * obtain them from Escrow.iou.
 */
 public IOU(Escrow e) {escrow_ = e;} 

}

Listing Three
/* File: Escrow.java */
package iou;
/**
 * Escrow mediates the transaction between someone with an IOU, and
 * the party that gave out the IOU. Before you can give out an
 * IOU, you must create an Escrow. The IOU will refer to the Escrow
 * for all of its functionality. You must (eventually) enable the 
 * Escrow before any IOUs that refer to this Escrow can be redeemed.
 * This is an abstract base class. The mechanism used to wait until
 * the Escrow is closed is implemented by overriding the functions
 * standBy() and redeem(). The promised object is obtained through
 * redeemAfterClose(), which subclasses implement to say where the
 * object is kept.
 * @see IOU
 * @author Alvis, 1/25/96 and 3/19/96
 */
public abstract class Escrow extends Object {
 private boolean closed_; //can IOUs be redeemed without blocking?
 private java.util.Vector callbackList;//fire when ready
 /**
 * Constructor.
 */
 public Escrow()
 {
 closed_ = false;
 callbackList = new java.util.Vector();
 }
 /**
 * Return an IOU which refers to this Escrow.
 * The normal method of using an Escrow is to create the Escrow, and
 * then use this method to give out IOUs.
 */
 public IOU iou() {return new IOU(this);}
 /**
 * Close the IOU. Once this is called, then redeem() and standBy()
 * must return without blocking.
 * Closing the IOU will run any callbacks which have been queued up.
 * Closing escrow more than once has no effect.
 * There can only be one first time.
 */
 synchronized protected void close()
 {
 closed_ = true;
 Object value = redeemAfterClose();
 // Fire the callbacks in the order they were provided.
 // The callback list has no elements if we have already closed escrow.
 for(int i=0; i<callbackList.size(); i++) {
 ((IOUCallback)callbackList.elementAt(i)).run(value);
 }
 callbackList.removeAllElements();
 }
 /**
 * Return true if escrow has been closed.
 */
 synchronized public boolean closed() {return closed_;}
 /**
 * Return the promised object; valid only once escrow is closed.
 * Subclasses implement this to say where the object is kept.
 */
 protected abstract Object redeemAfterClose();
 /**
 * Wait until escrow is closed.
 */
 public abstract void standBy() throws InterruptedException;
 /**
 * Return the promised object, waiting until escrow is closed
 * if necessary.
 */
 public abstract Object redeem() throws InterruptedException;
 /**
 * Add a callback to run when escrow closes. If escrow is already
 * closed, the callback runs immediately.
 */
 synchronized public void addCallback(IOUCallback callback)
 {
 if (closed_) callback.run(redeemAfterClose());
 else callbackList.addElement(callback);
 }
 /**
 * Remove a callback that was previously added. If the callback
 * is not on the callback list it is ignored.
 */
 synchronized public void removeCallback(IOUCallback callback)
 {
 callbackList.removeElement(callback);
 }
}

Listing Four
/* File: ThreadEscrow.java */
package iou;
/**
 * A framework for building IOUs by putting the supplier in its own thread.
 * A ThreadEscrow computes its value in a thread that it controls.
 * You pass the escrow an ObjectMaker when you construct it.

 * ObjectMaker is a simple interface; it requires a run function which
 * builds up the object.
 * The Escrow builds a thread and runs the ObjectMaker in that thread.
 * When the ObjectMaker returns, its value is put in Escrow and the
 * thread exits.
 * The thread uses a ThreadEscrowWorker class as its Runnable object.
 * The ThreadEscrowWorker knows about the Escrow, and can therefore
 * stuff the object produced by the ObjectMaker into the Escrow.
 * The standBy() function needs to wait for the thread that was launched
 * in the constructor. This should be possible using Thread.join(),
 * but unfortunately Netscape has a bug where Thread.join() blocks
 * forever (it has been reported, see
 * http://www.roguewave.com/~alv/javaThreadBug.html for an example).
 * To get around this, the ThreadEscrowWorker has a waitForRunToFinish()
 * function.
 * @author: Alvis, 1/26/96
 */
public class ThreadEscrow extends Escrow {
 private Thread thread_; // started in the constructor
 private ThreadEscrowWorker worker_; // the Runnable action in the thread
 Object value_; // at closing, the value is set here
 /**
 * Start up a new thread to supply the object. Once the
 * object is supplied, the new thread will set value_ and
 * close escrow.
 */
 public ThreadEscrow(ObjectMaker maker)
 {
 worker_ = new ThreadEscrowWorker(this,maker);
 thread_ = new Thread(worker_);
 thread_.start();
 }
 /**
 * Waits until the worker thread has completed. When the thread 
 * has completed it will set value_.
 */
 public void standBy() throws InterruptedException
 {
 // thread_.join(); - doesn't work due to Netscape bug - see top of file
 worker_.waitForRunToFinish();
 if (!worker_.isFinished()) {
 throw new InterruptedException();
 }
 }
 /**
 * Redeem the object. If necessary, this will wait for the thread to
 * finish computing the object.
 */
 public Object redeem() throws InterruptedException
 {
 standBy(); // if value_ is already set, this comes back fast
 return value_;
 }
 /**
 * Supply the promised object to Escrow's close() machinery
 * (Escrow.close() calls redeemAfterClose()).
 */
 protected Object redeemAfterClose() {return value_;}
}
/**
 * Do the work of supplying an object to a ThreadEscrow.
 * This class manages a thread which runs the function which supplies
 * the object to the ThreadEscrow. It should only be created by
 * a ThreadEscrow.

 */
class ThreadEscrowWorker extends Thread {
 ThreadEscrow escrow_; // my creator
 ObjectMaker maker_; // what I'm supposed to do
 boolean runFinished_; // has run finished executing?
 ThreadEscrowWorker(ThreadEscrow e, ObjectMaker m)
 {
 maker_ = m;
 escrow_ = e;
 runFinished_ = false;
 }
 public boolean isFinished() {return runFinished_;}
 
 public synchronized void run()
 {
 Object value = maker_.run();
 escrow_.value_ = value;
 runFinished_ = true;
 escrow_.close();
 notify(); // threads may be waiting in waitForRunToFinish() 
 }
 synchronized void waitForRunToFinish()
 {
 while (isFinished()==false) {
 try {
 wait(); // run() will call notify
 }
 catch(InterruptedException e) {}
 }
 notify(); // There may be other threads waiting in waitForRunToFinish()
 }
} 
 

Listing Five
/* File: ObjectMaker.java */
/**
 * A simple interface for objects that produce an object.
 * @author: Alvis, 1/26/96
 */
package iou;
public interface ObjectMaker {
 public Object run();
}



















VHDL for Hardware Design


The softening of hardware design




Phil Tomson


Phil is a senior software engineer with Cypress Semiconductor. He can be
reached at ptkwt@teleport.com.


While software engineers have been experimenting with visual-programming
methods, hardware engineers have been moving away from graphical schematics,
toward language-based methodologies that resemble the traditional
software-design approach. This shift was sparked by the development of
logic-synthesis tools (see the accompanying text box entitled "Logic
Optimization and Synthesis") and hardware-description languages (HDLs) such as
VHDL and Verilog.
HDLs allow hardware designs to be described at high levels of abstraction,
freeing the designer from details that were inherent in the schematic-based
method. The move from the schematic-based approach to HDLs in hardware design
is comparable to the move from assembly languages to high-level languages in
the software domain. Just as adding two floating-point numbers in a high-level
language is trivial compared to performing the same operation in assembly
language, using a "+" sign in an HDL is much easier than, say, drawing all the
logic gates involved in an adder. This freedom from low-level details has
generally made hardware engineers more productive.


A History of Acronyms


The Department of Defense (DoD) developed VHDL, short for "VHSIC Hardware
Description Language" (VHSIC is short for "Very High Speed Integrated
Circuit"), in the early 1980s in cooperation with IBM, Texas Instruments, and
Intermetrics. The DoD wanted a standard HDL that all of its vendors could use
when submitting designs for defense contracts. This would allow designs to be
portable and easy to reuse, as well as provide a standard form of
documentation. The language became an IEEE standard in 1987 known as "IEEE Std
1076-1987," and was updated in 1993 to "IEEE Std 1076-1993."
VHDL was derived from Ada and is similar in syntax and structure. Support for
design hierarchy is provided by the LIBRARY and USE statements used in
conjunction with Ada's PACKAGE concept.


VHDL Design Example


The best way to understand VHDL is through a practical design example that
illustrates the language's basic concepts. My example is an interface
between the PC's parallel printer port and a high-speed
data-acquisition system that can generate several million 8-bit data samples
per second. It is actually part of a larger design I intend to use in a
PC-hosted, 8-channel logic analyzer. 
On the PC side, the printer interface is actually composed of three separate
ports, each with its own I/O address. The output data port is 8 bits wide and
accessed by writing data to I/O address 378H. The status port is 5 bits wide
and read only. It is accessed by reading I/O address 379H. The 4-bit control
port is an output port accessed by writing to address 37AH.
The problem to overcome in this example design is the sample-rate limitation
of the PC's parallel port. Several million 8-bit samples per second would
certainly overwhelm it, especially since each sample requires two reads of
the status port (only 5 bits of input are available in this interface).
The solution is to use an external, high-speed, dual-port static random-access
memory (SRAM) to buffer the sampled data; see the block diagram in Figure 1.
An SRAM with 20-ns cycle times can acquire up to 50 million samples per
second. After the acquired data is stored in the SRAM, it can be read at a
leisurely pace through the PC's printer interface, which is capable of
reading, at best, a few hundred-thousand bytes per second.
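As a quick arithmetic check on that figure (an aside using the numbers above, not part of the article):

```java
// One 8-bit sample per SRAM cycle; 1 second = 1,000,000,000 ns.
class SampleRate {
    static long samplesPerSecond(long cycleNs) {
        return 1_000_000_000L / cycleNs;
    }
}
```

For the 20-ns part, samplesPerSecond(20) gives 50,000,000, matching the 50 million samples per second quoted above.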
As Figure 1 illustrates, the design will need the following components to
interface to the SRAM:
A loadable address counter. This will allow reading the SRAM data starting at
any address. A 16-bit counter will address 64 KB of data, which should be
sufficient.
An 8- to 4-bit multiplexer. Data comes from the SRAM in 8-bit chunks, but only
5 bits are available in the printer interface for input. The multiplexer
allows the 8-bit data from the SRAM to be read in two 4-bit increments. This
leaves one bit of input available; it will be used to determine if the
data-acquisition system is finished acquiring data.
Tristate-able output buffers on the data bus to the SRAM. This allows data to
be written to the SRAM and creates a bidirectional data bus. This feature is
added for the sake of flexibility; it allows some communication to the
data-acquisition system. The user could specify a location or block of
locations in the SRAM that the data-acquisition system reads and acts upon
before taking samples.
The SRAM interface has been defined, but the design still needs some mechanism
for indicating to the data-acquisition module that it is free to begin taking
samples and writing data into the SRAM. To do this, the arbitration logic
block (see Figure 1) will generate a signal called free. 
There are four outputs available from the control port of the printer
interface used to perform various functions in the design. The address counter
can either be loaded with a value from the output port or incremented. Two
signals, strobe and load, indicate which function to perform. When load is
active (1), the address counter is loaded with data on the rising edge (the 0
to 1 transition) of strobe; otherwise, the counter increments on the rising
edge of strobe. The address counter has 16 bits, but only 8 bits can be
written from the printer interface; a third signal from the control port,
upper/lower, determines which part of the address counter gets loaded. One
control port signal remains, which will be used as a read/write signal to the
SRAM: When writing to the SRAM, this signal will allow data out of the
tristate buffers; otherwise these buffers will be in the high-impedance state
(Z), thus releasing the bus.
All four outputs of the control port have been used, but some method of
switching between the upper and lower four bits of the eight-bit sample data
from the SRAM is still needed. The upper/lower signal can be used for this
without conflicting with its use in loading the address counter.
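The clocking rules just described can be sketched as a plain-software model. This is illustrative Java only, not part of the hardware design flow; the names mirror the signals above:

```java
// Behavioral model of the loadable address counter: on each rising
// edge of strobe, load the upper or lower byte when load=1 (the
// upper/lower signal selects which half), otherwise increment.
class AddressCounter {
    int addr;                      // 16-bit counter, kept in an int
    private boolean lastStrobe;    // to detect the 0-to-1 transition

    void clock(boolean strobe, boolean load, boolean upper, int wrData) {
        boolean rising = strobe && !lastStrobe;
        lastStrobe = strobe;
        if (!rising) return;       // nothing happens except on a rising edge
        if (load) {
            if (upper) addr = (addr & 0x00FF) | ((wrData & 0xFF) << 8);
            else       addr = (addr & 0xFF00) | (wrData & 0xFF);
        } else {
            addr = (addr + 1) & 0xFFFF;  // wrap at 16 bits
        }
    }
}
```

Loading 34H into the lower byte and 12H into the upper byte over two strobe cycles leaves the counter at 1234H; a third strobe with load inactive advances it to 1235H.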


VHDL Design Implementation


Listing One is the VHDL implementation of this design. The LIBRARY ieee
declaration tells the VHDL compiler about a library called ieee and makes it
visible within the description that follows. Most VHDL tools (whether logic
synthesizers or simulators) have a built-in, IEEE-defined library. The USE
declaration works like its counterpart in Ada or like #include in C; it tells
the VHDL tool where to look for packages that contain functions or types that
may be needed in the following code. The std_logic_1164 package called for in
the first USE statement contains the std_logic type our design uses. Type
std_logic represents a nine-valued logic system. Signals (which represent
wires in a circuit) of type std_logic can take on any of the nine values
defined in the std_logic type. In my example, only three of the nine std_logic
values are used: 0 (logical 0), 1 (logical 1), and Z (the tristate value).
Next comes the ENTITY declaration. This is where the design's external
interface is defined. Input and output signals that go to, or are generated
by, the design are declared in the PORT list. Each pin declared in the port
list has a direction and type. When the design is mapped into a device, these
signals will represent the actual I/O pins. The first pin declared in the PORT
list is r_wn. The direction of r_wn is IN, meaning that this is an input pin,
and is of type std_logic. 
Further down in the PORT list, we find a declaration for data that has a
direction of INOUT and type std_logic_vector. Vectors represent collections of
signals that are somehow related, such as a data bus. The std_logic_vector
type is defined in the std_logic_1164 package as an array of type std_logic.
In this case, you are declaring an 8-bit data bus. The INOUT indicates that
this is a bidirectional bus used to pass data to, or read data from, the
external SRAM. The other buses in the design are wr_data, 8-bit data from the
printer interface's output data port; rd_data, 4-bit data that goes back to
the printer interface's status port; and addr_out, 16-bit address bus to the
SRAM.
The signal free has a direction of BUFFER. When a port output signal needs to
be referenced internally, the BUFFER direction is used. This allows the value
of free to be visible inside of the design, and also allows its value to
appear on the external free pin.
After the ENTITY declaration comes the ARCHITECTURE declaration. This is where
the entity's internal behavior is described. Notice that there are some signal
declarations before the BEGIN statement. These are signals internal to the
architecture and will not appear on any external pins of the device. In this
case, you are declaring the 16-bit address counter as an internal vector of
signals. Notice, also, that there are two ALIAS declarations here. These allow
you to refer to the upper eight bits of the address counter as
addr_counter_hi, and the lower eight bits of the address counter as
addr_counter_lo; this makes the code more readable.
Statements between the architecture's BEGIN and END statements are handled
very differently from the way they would be in typical programming languages.
Since you are modeling a piece of hardware in which there can be a large
degree of parallelism, all statements in this section are executed
concurrently ("executed" in this context refers to the way a VHDL simulator or
synthesis tool would interpret them, since VHDL descriptions are not compiled
into stand-alone executables). This important point separates HDLs from normal
sequential programming languages. A hardware design consists of a collection
of logic gates (AND, OR, XOR, and the like), each of which processes data from
its inputs and passes on results to its outputs. It is best to describe such a
system concurrently as opposed to sequentially; hence, the reason for
concurrent statements in HDLs.
The first concurrent signal assignment, rdn_out <= NOT r_wn, means that the
inverted value of r_wn gets assigned to rdn_out, which is the active low read
signal to the SRAM. The next statement is a conditional signal assignment to
rd_data that implements a multiplexer where either the upper or lower four
bits of data from the SRAM are selected, depending on the value of the signal
u_ln. A conditional signal assignment is also used for the data bus, but in
this case, it is being used to implement a tristate-able bus instead of a
multiplexer. In the data bus, when the signal r_wn is 1, indicating that we
are reading data from the SRAM, the tristate value, Z, is assigned to data,
which releases the bus and allows the SRAM to output data without conflict.
(The OTHERS clause is used to assign a specified value to each signal in a
vector of signals.) Since these signal assignments are concurrent statements,
they could have been placed in any order in the body of the design
description.
There is a case where statements are executed sequentially in VHDL just as
they would be in most programming languages. This occurs within a PROCESS. The
example design has two processes: address_counter and arbitrate. The
statements within a process are executed when signals within the sensitivity
list change value. The sensitivity list comes after the PROCESS statement and
is a list of signals between parentheses. In the address_counter process, the
sensitivity list contains the signals strobe and reset. Whenever the value on
strobe or reset changes, the address_counter process will execute. Processes
are usually used to model memory elements--such as flip-flops or latches--or
state machines.
In our example, the address_counter process implements the address counter for our
design. If the reset signal is 1, all outputs of the internal address counter,
represented by the vector signal addr_counter, are assigned the value 0. The
statement ELSIF (strobe'event AND strobe='1') means that the rest of the
actions that take place within the address_counter process can only happen if
there is a change in the value of the strobe signal and that change makes the
value of strobe '1'. (When the 'event attribute is applied to a signal, the
Boolean value True is returned if the signal has changed value.) This defines
strobe as the rising-edge clock for a synchronous, loadable counter with an
asynchronous reset.
The arbitrate process has the same sensitivity list as the address_counter
process. When reset is 1, the free signal is assigned the value 0. When free
is 1, it tells the data-acquisition equipment that it can now take samples and
write to the SRAM. We reset the value of free to 0 so that our device
initially has control of the SRAM. Since there are no remaining, unused
control-port signals, we need to come up with another method of controlling
the free signal. We do this in the arbitrate process by checking the values of
the addr_cntr, wr_data, load, and r_wn signals on the rising edge of the
strobe signal. If load is 1 (indicating that we are loading data into the
address counter), r_wn is 1 (indicating that the bidirectional data bus is
tristated), and all the individual signals of the wr_data and addr_counter
buses are 1, then the value of free is toggled. Since free starts out in the
inactive state after a reset, we have to first load FFH into addr_counter_lo
and addr_counter_hi and then (while keeping load 1 and wr_data FFH) toggle
strobe from 0 to 1 while r_wn is high in order to get free to become active
(1). free will then remain active until strobe is again toggled from 0 to 1.
During the time that free is active, the PC program that controls the
interface will need to monitor the busy signal from the data-acquisition
module. When the module is no longer busy, free can easily be toggled back to
its initial inactive state (0) by making the strobe signal go from 0 to 1
again. Though this method for controlling the free signal might seem somewhat
arbitrary, it does not cause any loss of access to any SRAM locations; if you
want to read data from address FFFFH, it is still possible if the load signal
is inactive (0).
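The free-toggle rule just described can be sketched as a small software model (illustrative Java only, with names mirroring Listing One's signals):

```java
// Behavioral model of the arbitrate process: on a rising edge of
// strobe, free toggles only when load and r_wn are both 1 and
// wr_data and the address counter are all ones (FFH and FFFFH).
class Arbitrate {
    boolean free;                  // reset to 0: the PC owns the SRAM
    private boolean lastStrobe;    // to detect the 0-to-1 transition

    void clock(boolean strobe, boolean load, boolean rWn,
               int wrData, int addrCounter) {
        boolean rising = strobe && !lastStrobe;
        lastStrobe = strobe;
        if (rising && load && rWn
                && (wrData & 0xFF) == 0xFF
                && (addrCounter & 0xFFFF) == 0xFFFF) {
            free = !free;          // hand the SRAM back and forth
        }
    }
}
```

One qualifying strobe edge raises free, granting the SRAM to the data-acquisition module; a second identical edge lowers it again.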


Simulation



When the VHDL-design specification is complete, simulation gives you a way of
ensuring that it is functionally correct. Use a VHDL simulator that will allow
you to specify input values to the model and check the outputs for correct
behavior.
The waveforms in Figure 2 show the results of simulating the example design.
(In this case, I used the V-System VHDL simulator from Model Technology of
Beaverton, OR.) After the power-on reset is finished, the address counter is
loaded with FFH. In the next strobe cycle, the free signal becomes active
until the fifth strobe cycle toggles free back to 0. strobe cycles 6 through 9
simulate the reading of data from the SRAM. The values 70H, 71H, 72H, and 73H
are read from locations 0000 through 0003. The signal u_ln is toggled with
strobe during these cycles so that the value on rd_data is multiplexed between
the upper and lower nibble of data from the SRAM for each location being read.
In the last two cycles, the values 45H and 46H are written to locations 0004
and 0005, respectively.
After the design is validated by simulation, it can then be taken through the
logic synthesis process. After synthesis, the design is simulated again to
check timing and to ensure that functionality is still correct.


Conclusion


While the design example presented here certainly does not contain all
possible VHDL constructs, it illustrates the major concepts involved in
designing hardware with the language.
The use of HDLs in conjunction with simulation and logic synthesis provides a
powerful design methodology that is becoming popular in the
hardware-engineering community. In many ways, the HDL design methodology blurs
the distinction between hardware and software design; software engineers are
often employed to create HDL models, because it is a methodology they can
easily relate to.


References


Bhasker, Jayaram. A VHDL Primer, Englewood Cliffs, NJ: Prentice Hall, 1992.
Gadre, Dhananjay. "Parallel-printer Adapters Find New Use as Instrument
Interfaces." EDN (June 1995).
Logic Optimization and Synthesis
Since logic gates take up real estate in devices, and since there is a direct
correlation between chip size and cost, there has long been the need to ensure
that logic equations are minimal. Originally, the process of Boolean
minimization was done "manually" with "hand tools" such as Karnaugh maps and
state tables. As digital designs grew, this manual logic-minimization process
became a headache. This led to the development of automated logic-optimization
software tools such as Espresso, from the University of California, Berkeley.
These tools use many different types of algorithms and heuristics to minimize
Boolean logic functions. 
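As a toy illustration of the kind of identity such tools exploit (not from the article): the two-term function (a AND b) OR (a AND NOT b) reduces by absorption to just a, and the equivalence can be verified exhaustively over the truth table:

```java
class MinimizeDemo {
    // original two-term equation: f = (a AND b) OR (a AND NOT b)
    static boolean original(boolean a, boolean b) { return (a && b) || (a && !b); }
    // after minimization (absorption): f = a
    static boolean minimized(boolean a, boolean b) { return a; }
    // exhaustive truth-table check that the two forms agree
    static boolean equivalent() {
        boolean[] vals = { false, true };
        for (boolean a : vals)
            for (boolean b : vals)
                if (original(a, b) != minimized(a, b)) return false;
        return true;
    }
}
```

Real optimizers apply such reductions over functions of dozens of variables, where exhaustive checks give way to the algorithms and heuristics mentioned above.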
The term "logic synthesis" is applied to the whole process of converting a
design specification into a form that can ultimately be realized as an actual
piece of hardware--the target device. While the steps of this process and the
terminology involved vary from one vendor to another, they usually follow this
flow:
1. Design compilation. Taking the user's source code and converting it into an
internal format that describes the Boolean equations that represent the design
intent. Syntax is also checked in this step.
2. Logic optimization. Minimizing the equations from step 1.
3. Device fitting or technology mapping. Minimized equations from step 2 are
mapped into the target device's architecture.
The third step (technology mapping) allows the synthesis tool to efficiently
target designs to many different types of devices. A gate array may have what
is known as a "sea-of-gates" architecture (a two-dimensional array or grid of
logic gates) in which the basic cell is a two-input NAND gate, while a PLD's
(programmable logic device) basic cell might be quite complex and would
include a flip-flop and perhaps several multiplexers or logic gates. What
might be a minimal realization of the design in a fine-grained gate-array
architecture may not be minimal in a PLD and vice versa. The technology mapper
(or "fitter") is supplied with information about the target device's internal
structure and is able to determine an efficient implementation based on this
information.
After technology mapping, more processing may be needed before the design can
be realized in an actual hardware device. For instance, if the target device
is a gate array, logic elements will need to be placed on a two-dimensional
grid, and signal wires will need to be routed between them. This is generally
not considered part of the logic synthesis process, however, and it is not
needed for PLDs.
--P.T.
Figure 1: Block diagram for the example design.
Figure 2: Waveform results of simulating the example design.

Listing One
--parallel port bidirectional interface
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE work.std_arith.all ; -- overloaded '+' is from this package
ENTITY port_con IS
 PORT ( --pins defined here
 r_wn : IN std_logic ; --read when 1/write when 0
 u_ln : IN std_logic ; --upper when 1/lower when 0
 strobe : IN std_logic ; --strobe
 load : IN std_logic ; --load 
 busy : IN std_logic ; --data acq device controls bus when active
 reset : IN std_logic ;
 wr_data : IN std_logic_vector(7 DOWNTO 0);--data from PC 
 data : INOUT std_logic_vector(7 DOWNTO 0);--data in/out of RAM
 rdn_out,wrn_out: OUT std_logic ;--read/write signals out to RAM
 free : BUFFER std_logic ;--tells acq device that it can write data
 addr_out : OUT std_logic_vector(15 DOWNTO 0);--address to RAM
 rd_data : OUT std_logic_vector(3 DOWNTO 0)--4bits data to status port
 );
END port_con;
ARCHITECTURE behave OF port_con IS
SIGNAL addr_counter : std_logic_vector(15 DOWNTO 0);
ALIAS addr_counter_lo : std_logic_vector(7 DOWNTO 0) 
 IS addr_counter(7 DOWNTO 0);
ALIAS addr_counter_hi : std_logic_vector(7 DOWNTO 0) 
 IS addr_counter(15 DOWNTO 8);
BEGIN
 ------------concurrent statements-------------

 rdn_out <= NOT r_wn ;
 rd_data <= data(3 DOWNTO 0) WHEN (u_ln='0') ELSE
 data(7 DOWNTO 4);--implements 8:4 bit mux on data bus
 wrn_out <= r_wn ;
 data <= wr_data WHEN (r_wn = '0') ELSE
 (others => 'Z') ; --implements tristate buffer
 addr_out <= addr_counter; 
 -------------------------------------------
 address_counter : PROCESS (strobe, reset)
 BEGIN
 ---sequentially executed statements here----
 IF (reset = '1') THEN
 addr_counter <= (OTHERS=>'0');
 ELSIF (strobe'event AND strobe='1') THEN
 IF (load='1' AND u_ln ='0') THEN --load lower 8bits of addr count
 addr_counter_lo <= wr_data; 
 ELSIF (load='1' AND u_ln='1') THEN--load upper 8bits of addr count
 addr_counter_hi <= wr_data;
 ELSE
 addr_counter <= addr_counter+1;--increment address counter
 END IF;
 END IF;
 END PROCESS address_counter;
 arbitrate : PROCESS (strobe, reset)
 BEGIN
 IF (reset = '1' ) THEN
 free <= '0' ;
 ELSIF(strobe'event AND strobe='1') THEN
 IF(load = '1' AND wr_data = "11111111" AND 
 addr_counter = "1111111111111111" AND 
 r_wn = '1' ) THEN --toggle free
 free <= NOT free ;--this is why free direction is BUFFER
 END IF; 
 END IF;
 END PROCESS arbitrate ;
END behave ;


Removing Blocking Network I/O From Windows Programs


Save your code when moving to Windows 




George F. Frazier and Derek Yenzer


George, a senior software development engineer at Farallon Computing, can be
reached at georgef@farallon.com. Derek is a software development engineer at
Farallon and can be contacted at dereky@farallon.com.


Deciding whether to use blocking or nonblocking network I/O is an essential
step in the design of Windows programs that use BSD-style sockets, such as
those provided by the Winsock interface. A socket function that takes an
indeterminate amount of time to complete is said to "block." 
Sockets were designed for UNIX, where the operating system can, if needed,
preempt a blocked task and begin running another program. But 16-bit Windows
3.x can't preempt a task, so calling a blocking function puts all other
programs on hold until the call returns. In 16-bit Windows, it is better to
use nonblocking socket function calls that return immediately. When these
nonblocking functions complete, a notification message is posted to the
application's message queue.
Unfortunately, in the rush to bring Internet applications to Windows, many
UNIX programs were directly ported. Insufficient consideration was given to
the inherent differences between Windows and UNIX. Blocking I/O crept into
other programs as a result of poor design decisions. Besides causing
performance degradation and locking out other applications, blocking I/O in
16-bit Windows can lead to serious reentrancy issues. Sprinkling PeekMessage
loops throughout blocking code won't adequately fix most problems arising from
the use of blocking calls. To keep messages flowing, modeless dialogs and
keyboard accelerators require special handling in every place you insert
PeekMessage loops. In addition, your application cannot handle the special
processing that other applications require. Fixing the problems that arise
from blocking network I/O is nontrivial.
Porting to Win32 is one solution. Since Windows NT and Windows 95 can preempt
a suspended task, blocking I/O no longer locks out other applications. But
only true 32-bit programs benefit from preemptive multitasking; 16-bit
programs running in Windows 95 cannot be preempted. Also, to really take
advantage of Win32's network enhancements, multiple threads should be used in
all but the simplest programs. In a multithreaded application, if one of your
program's threads is preempted, another can proceed. If you merely recompile
your 16-bit program for Win32, it will run in a single thread, and the entire
application will be suspended when a network I/O function call blocks. Using
multiple threads is great if your program only needs to run in Windows NT or
Windows 95 (and you are willing to spend the time to redesign it for this
much-different paradigm), but if your application also has to run in Windows
3.x, you have only one real choice--remove the blocking network I/O function
calls. In this article, we'll present a method for doing this without
rewriting the application from scratch.


Network I/O in 16-Bit Windows


Although our focus is on reading from and writing to TCP/IP sockets, the
concepts apply to other transports that provide blocking and nonblocking
versions of network I/O to a reliable stream. To use sockets in Windows, it is
necessary to make calls to a driver that implements the TCP/IP protocols. The
most commonly used API for 16-bit Windows is Winsock 1.1. The advantage of
writing to the Winsock specification is that your program will be able to
dynamically link with any WINSOCK.DLL provided by a TCP/IP vendor.
To communicate reliably between two processes, a connection-oriented stream
provided by a TCP/IP socket is established between the processes. Then data is
sent across the connection as a stream of bytes. The traditional BSD socket
operations send() and recv() are used to send data to and receive data from a
host. Whether or not calls to these functions block is determined by the mode
in which the socket was created. Winsock extends the BSD standard to include a
special set of nonblocking functions that begin with the prefix WSAAsync. By
creating a nonblocking socket and using the WSAAsyncSelect() function call,
an application will receive messages informing it when it can send or receive
data.
Because most of the overhead of using Winsock has been abstracted in the
Microsoft Foundation Classes, we've used the CAsyncSocket and CSocket classes
in our example program. CSocket derives from CAsyncSocket and simply pumps
messages until the operation completes or an error occurs. We've derived a
TAsyncSocket class from CAsyncSocket and a TSyncSocket class from CSocket. All
the code is written in C++.
The TAsyncSocket class abstracts the message-based nature of Winsock into an
interface that notifies the caller when a particular I/O operation completes.
To perform a nonblocking read, you call the TAsyncSocket::Read member function
and tell it how many bytes to read, where to store the data, which function to
call when the operation completes, and two reference constant (refcon) values.
A refcon is a user-defined value that is passed back to the completion routine
when the I/O operation finishes (so appropriate actions can be taken after the
call completes). The completion function must have three parameters: the two
refcon values and the result value for the I/O operation. The
TAsyncSocket::Write operation functions in a similar manner.
As implemented, the TAsyncSocket class permits only one outstanding read and
one outstanding write, but could easily be extended to allow multiple
read/write requests.


Watch Me Draw


Our sample application, Watch Me Draw, is designed to provide an example of
how blocking I/O can be used in a client/server scenario. Once a connection is
established, the side that initiated the connection becomes the "drawing"
side. The other end of the connection is the "watching" side. The user on the
drawing side can draw points and lines on the dialog. Those graphic objects
are then transmitted to the watching side, which duplicates the drawing
commands on its dialog. To keep the application as simple as possible, there
is no code to handle repainting the dialog or to compensate for different
dialog sizes in various video modes. In addition, error handling is minimal.
The protocol consists of three possible wire messages sent from the drawing
side to the watching side: done, pixel, and line (see Figure 1). The done
message signifies that the drawing side is closing the connection and, thus,
the watching side should close its end. The pixel message tells the watching
side that a PIXEL_DATA structure will follow. Finally, the line message tells
the watching side that a LINE_DATA structure will follow.


Implementation Using Blocking I/O


The drawing side responds to the user's mouse button clicks and performs the
appropriate drawing on its dialog box. Once a transmit structure is filled
out, the appropriate wire message is sent over the connection, followed by the
structure containing the data. Listing One presents three synchronous
functions that transmit the information.
Once a connection is established, the watching side issues a synchronous read
for a wire message. Once received, if it is a message for a pixel or line,
another read is issued to fill in the appropriate structure. The pixel or line
is drawn on the watching side's dialog box to duplicate the activity on the
drawing side. Then, another read for a wire message is performed and the
process begins again. The series of synchronous reads terminates when the done
message is received. Listing Two contains the basic structure of the watching
side's implementation.


Removing the Blocking Calls


Because nonblocking network I/O calls return immediately, code that depends on
the results of a blocking call will not work in a nonblocking paradigm. In
Example 1, Recv() is a blocking call. Since the if statement depends on a
result obtained by the blocking call (namely, the value of i), making the
Recv() call nonblocking will break the code.
If recursion and loops are ignored, we can think about the execution paths
through functions as trees with the first statement executed at the root and
the last statement executed at the leaf. Figure 2 illustrates the execution
path tree of Example 1.
When analyzing a blocking program to convert, look for blocking calls that
occur at nonleaf positions in the execution path of a function. In our
example, the Recv() call is at the root of the tree, not at a leaf position.
To convert a blocking program to a nonblocking one, you must restructure your
code to ensure that all nonblocking calls appear as leaves in your
execution-path trees and that no path containing a nonblocking call passes
through a loop; any loop in the old program that contained a blocking call
must be eliminated.
After you've changed your socket initialization so that your network I/O
operations won't block, you have to solve three main problems. The first is to
guarantee that all nonblocking calls appear at leaf positions in any
function's execution path. In Figure 3(a), the function func1 contains a
single blocking call potentially preceded and followed by other groups of
statements (represented as Processing Code Blocks 1 and 2). To convert func1,
the blocking call becomes a nonblocking call that passes func2 (the callback
function that will be called when the operation completes); see Figure 3(b).
The code in Processing Code Block 2 has been moved to the body of func2. Thus,
if Processing Block 2 depends on a result obtained by the nonblocking call,
that result will be available after the call completes.
But how does func2 get access to local variables or parameters from func1?
This is the second issue that needs to be resolved during the conversion.
Figure 4(a) adds a few details to our generic func1: parameters x and y, and
local variables ch and Arr. Since we are splitting up code that once was in a
single function (and had access to the parameters and local variables of that
function), we need to either pass the necessary variables to the nonblocking
calls (so that they can be sent in as parameters to the callback function that
calls the rest of the code) or provide another method for this data to be
accessed. Passing a potentially lengthy list of arguments to the nonblocking
functions makes the new nonblocking code unnecessarily complex and increases
the chances you'll overlook something. A better solution is to define a local
variable structure for each nonblocking function call containing parameters
and local variables needed for code that will be executed by the nonblocking
operation's callback function. Figure 4(b) shows the local variable struct for
func1.
Figures 5(a) and 5(b) show how the local variable structure is used in the
final nonblocking code. The last problem, removing loops that contain blocking
calls in the original program, also is solved here. Again func1 is shown with
a blocking call in a For loop. The nonblocking version of this code contains
two new functions, func2 and func3. In the new func1, the local variable i has
been encapsulated into a structure accessible to all three functions. In
func1, it is simply initialized and func2 is called. In func2, the termination
condition of the loop is tested and the initial code is executed (Processing
Block 1) before the nonblocking call is made. When that call completes, func3
is called. This increments the index variable, executes the code that followed
the original blocking call (Processing Block 2), and calls func2 again,
simulating the loop in the original code.
We've used this general approach to remove the blocking calls from Watch Me
Draw. Programs that implement the converted asynchronous transmit and receive
portions of the application are available electronically; see "Availability,"
page 3.


Conclusion



Although removing blocking calls from shipping code may seem a daunting task
at first, we've found that the investment more than pays for itself in
increased reliability, robustness, and performance. Also, we've found that
adhering to a method such as this is much better than simply diving into the
code and haphazardly attempting to remove the blocking calls. As in all
reengineering efforts, a small amount of planning will save headaches down the
road.
Example 1: Recv() is a blocking call.
char i;
Recv(s, &i, sizeof(char), NULL);
if (i != '0')
{
 ErrorHandler();
}
else
{
 ComputeResult(i);
}
Figure 1: Wire messages used in the Watch Me Draw application.
done (char)

pixel (char)
x (int)
y (int)
color (COLORREF)

line (char)
x1 (int)
y1 (int)
x2 (int)
y2 (int)
color (COLORREF)
nPenWidth (int)
Figure 2: Execution path tree for Example 1.
Figure 3: Forcing all nonblocking calls to leaf positions in a function's
execution tree. (a) Blocking version; (b) nonblocking version.
(a)
void func1(void) 
{ 
// Processing Code Block 1 
BlockingCall(); 
// Processing Code Block 2
}

(b)
void func1(void) 
{ 
 // Processing Code Block 1
 NonBlockingCall(..., func2);
} 
void func2(void) 
{ 
 // Processing Code Block 2
 }
Figure 4: A struct for storing parameters and local variables for a function
with blocking calls.
(a)
void func1(int x, float y) 
{ 
 char ch; 
 int Arr[10]; 
 // Processing Block 1
 BlockingCall(x, y, ch, Arr);
 // Processing Block 2
}


(b)
typedef struct func1 
{ 
 int param_x; 
 float param_y; 
 char local_ch; 
 int local_Arr[10];
} FUNC1STRUCT; 
Figure 5: Removing loops with blocking calls. (a) Blocking version; (b)
nonblocking version.
(a)
void func1(void)
{
 int i;
 for (i = 0; i < 10; i++)
 {
 // Processing Block 1
 BlockingCall();
 // Processing Block 2
 }
}

(b)
void func1(void) 
{ 
 func1struct.local_i = 0; 
 func2(); 
} 
void func2(void) 
{ 
 if (func1struct.local_i < 10)
 { 
 // Processing Block 1 
 NonBlockingCall(..., func3); 
 } 
} 
void func3(void) 
{ 
 // Processing Block 2
 func1struct.local_i++; 
 func2(); 
}

Listing One
BOOL TMainDialog::TransmitDoneMessage()
{
 HEADER_DATA toTransmit;
 
 toTransmit.message = kchDoneMessage;
 if(m_CommunicationSocket.Send(&toTransmit, 
 sizeof(toTransmit)) != sizeof(toTransmit))
 {
 return(FALSE);
 }
 return(TRUE);
} // TMainDialog::TransmitDoneMessage
BOOL TMainDialog::TransmitPixelMessage(LPPIXEL_DATA lpData)
{
 HEADER_DATA toTransmit;

 
 toTransmit.message = kchPixelMessage;
 if(m_CommunicationSocket.Send(&toTransmit, 
 sizeof(toTransmit)) != sizeof(toTransmit))
 {
 return(FALSE);
 }
 if(m_CommunicationSocket.Send(lpData, 
 sizeof(PIXEL_DATA)) != sizeof(PIXEL_DATA))
 {
 return(FALSE);
 }
 return(TRUE);
} // TMainDialog::TransmitPixelMessage
BOOL TMainDialog::TransmitLineMessage(LPLINE_DATA lpData)
{
 HEADER_DATA toTransmit;
 
 toTransmit.message = kchLineMessage;
 if(m_CommunicationSocket.Send(&toTransmit, 
 sizeof(toTransmit)) != sizeof(toTransmit))
 {
 return(FALSE);
 }
 if(m_CommunicationSocket.Send(lpData, 
 sizeof(LINE_DATA)) != sizeof(LINE_DATA))
 {
 return(FALSE);
 }
 return(TRUE);
} // TMainDialog::TransmitLineMessage

Listing Two
void TMainDialog::ReadSocketData()
{
 HEADER_DATA toRead;
 
 if(m_CommunicationSocket.Receive(&toRead,sizeof(toRead)) != sizeof(toRead))
 {
 // ... handle error ...
 return;
 }
 while(toRead.message != kchDoneMessage)
 {
 if(!ProcessMessage(toRead.message))
 {
 // ... handle error ...
 return;
 }
 if(m_CommunicationSocket.Receive(&toRead,
 sizeof(toRead)) != sizeof(toRead))
 {
 // ... handle error ...
 return;
 }
 }
 // ... normal close ...
} // TMainDialog::ReadSocketData

BOOL TMainDialog::ProcessMessage(char chMessage)
{
 BOOL bConnectionOK = FALSE;
 switch(chMessage)
 {
 case(kchPixelMessage):
 bConnectionOK = ProcessPixelMessage();
 break;
 case(kchLineMessage):
 bConnectionOK = ProcessLineMessage();
 break;
 }
 return(bConnectionOK);
} // TMainDialog::ProcessMessage
BOOL TMainDialog::ProcessPixelMessage()
{
 PIXEL_DATA toRead;
 
 if(m_CommunicationSocket.Receive(&toRead,sizeof(toRead)) != sizeof(toRead))
 {
 return(FALSE);
 }
 // ... draw pixel ...
 return(TRUE);
} // TMainDialog::ProcessPixelMessage
BOOL TMainDialog::ProcessLineMessage()
{
 LINE_DATA toRead;
 
 if(m_CommunicationSocket.Receive(&toRead,sizeof(toRead)) != sizeof(toRead))
 {
 return(FALSE);
 }
 // ... draw line ...
 return(TRUE);
} // TMainDialog::ProcessLineMessage



Examining Borland Delphi 2.0


Support for client/server databases, multithreaded apps, and more




Ted Faison


Ted is president of Faison Computing, a firm that develops Delphi, C++
applications, class libraries, and software components for Windows. Ted can be
reached at 76350.1013@compuserve.com.


When compared to Version 1.0, Borland's recently released Delphi 2.0 is a big
step forward. Delphi 2.0 includes enhanced support for OLE automation, MAPI
applications, and Windows 95 controls. One of the biggest differences,
however, is that Delphi 2.0 generates 32-bit applications, delivering a
substantial performance increase over 16-bit Delphi apps. But because Delphi
2.0 is a 32-bit environment, it doesn't run under Windows 3.1, only Windows 95
and Windows NT. (Borland says it will continue to support 16-bit Delphi 1.0
until sometime next year.) Delphi 2.0 comes in three configurations: 
Delphi Desktop, the basic system.
Delphi Developer, which provides an object repository (for sharing reusable
forms and data modules), scalable data dictionary (for defining/reusing
extended field attributes), and ODBC support. 
Delphi Client/Server Suite, which includes SQL Explorer (for editing server
metadata), SQL Monitor (for tuning performance), and the SQL version of the
ReportSmith report writer.
The Client/Server Suite, which I'll focus on in this article, also comes
integrated with the PVCS version-control system, allowing team programmers to
work on the same files without undoing each other's work. Other additions to
Delphi include C-like double slashes to begin single-line comments and the
Visual Component Library (a collection of 100 drag-and-drop components,
including tree views, sliders, progress bars, rich-text editing, list views,
and status-bar controls). The Client/Server Suite includes source code for the
library.


The Integrated Development Environment


Application development programs are expected to integrate the various phases
of code editing, compiling, and linking. The Delphi IDE (see Figure 1) does
this, and goes one step further. Because Delphi supports user-interface design
with parallel code generation, it also has a Forms Designer that works in
tandem with the Code Editor window. There is a button on the toolbar that lets
you flip back and forth between a form's graphical view and its source code.
If you add controls to a form, the code window is updated immediately and
automatically.
Delphi projects usually entail multiple forms and code units, and Delphi makes
it easy to manage the code units by keeping them all on a single tabbed
window. Changing units is a one-click operation, and the names of all open
units are always visible on the tab bar. Most other programs employ the
older MDI approach, where you switch between files using the Window menu.
The Delphi IDE is good but not perfect. With so many overlapping, independent
windows sharing the screen--an Object Inspector, Code Editor, Forms Designer,
main toolbar, and so on--the IDE can be a nuisance. I find it much easier to
manage a single main window subdivided into moveable subpanes, as in Microsoft
Visual C++ or even Borland C++ ClassExpert. The Object Inspector is nice for
small projects, but inconvenient to use to get around in a larger project.
(What might be nice is some kind of object-navigation outline pane, where you
could see a complete hierarchy of all the objects in a project.)
The tabbed toolbar is nice, given the number of tools provided by Delphi.
Because you can install your own tools, it is possible to run out of space on
the toolbar, so Borland provided a set of scroll bars. You can also create new
tabs on the tool bar, so you can organize your tools any way you want. 
The integrated debugger also is easy to use. The toolbar has buttons to
start, pause, step into, and step over code. There is no step-out button,
which you would use when single-stepping inside a function to return to the
calling function. Most IDEs have a step-out command, including Borland C++;
why the Delphi team left it out is a mystery. 
Something else I miss is an interactive tooltip for object inspecting at debug
time. When you put a breakpoint in your code, you often want to inspect the
value of a variable. Visual C++ pioneered the concept of tooltip inspection,
where you move the mouse over a variable. If the variable is scalar and in
scope, a tooltip window automatically appears, showing the variable's value.
In Delphi, you have to right-click the variable, then choose the
Evaluate/Modify command from the popup menu.


Delphi Reusability


Delphi was designed to make components reusable. For example, assume you
create a form that has special tools to control image effects when
manipulating TIFF images. In Delphi, you can place the form in a special
Component Repository (stored components organized into pages), making it
available for subsequent reuse. To do so, first you create the form, then use
the Project Add to Repository command. You select a page for your form, a
name, description, and author; then the form is stored. The repository is
displayed in response to a File New... command, appearing as a tabbed dialog
box.
You can add your own pages to the repository, effectively turning it into a
software-component library. Once a component is in the repository, it can be
reused in three ways: 
Copying the source code into your project.
Using it as a base class for a new component.
Referencing its code--without copying it--in your project.
Almost anything can be reused through the repository. The only essential
requirement is that the object be a Visual Component Library (VCL) component,
meaning that it must be derived from the Delphi VCL class TComponent. You can
even reuse an entire application--Delphi's way of creating new
projects--rather than using some kind of Project Expert or Wizard. 
There is no limit to the complexity of forms in the Delphi repository. For
instance, say that you need a form like the About box in the repository, but
you need to add a Help button to it. In C++ you're out of luck--you can't
inherit and make changes to dialog templates. You have to create a new
template by copying the original form, being careful to match all the child
IDs of the reused controls, so you can reuse the C++ code attached to the
dialog.
It's much simpler in Delphi. You derive a form from the About Box form, and
simply add the Help button graphically to it. The resulting derived form
inherits the contents of the base form. Changing the base class form affects
all forms derived from it, just like with ordinary classes. It gets even
better. When I added the Help button to my derived form, I adjusted the
position of the inherited OK button, so the buttons would be centered in the
form. Delphi inherited the original OK button and overrode its position. 


Built-in Multithreading Support 


Many programmers shy away from threads because of the complexities of thread
synchronization and maintenance. Delphi makes thread programming simple for
many applications, hiding the details of thread start-up, synchronization, and
termination. 
Threads often are the solution to user-interface problems. Say you need to
perform some lengthy operations, such as obtaining records from a database or
printing a file. If you build the code straight into the app, users have to
wait until the operation terminates to continue with the program. If you want
to let users abort the operation by clicking a Cancel button, you need to add
a loop that periodically checks the button. With multithreaded code, you
create a worker thread to handle the lengthy operation, and you monitor the
Cancel button in the main thread. If the user hits the Cancel button, the main
thread terminates the worker thread. 
For example, assume you have a form called TMyForm with a Start button to kick
off some operation and an Abort button to terminate it. The operation itself
will be performed by a class derived from the VCL class TThread, declared in
Listing One. The Execute method, where your code gets invoked, overrides the
virtual function by the same name in the base class. To use the new thread,
you add an instance of it to the form or class that will control the thread;
see Listing Two. To start the lengthy operation, you add a handler for the
start button; see Listing Three. To abort the operation, you just call the
thread's Terminate method in the handler for the form's Abort button; see
Listing Four.


Programming at a High Level


Delphi uses VCL components to completely encapsulate the Windows API,
providing a high-level programming interface. You might say that Delphi is to
C++ what C++ is to C. The result is that you can get the job done much easier
and with less code. Consider the process of drawing something on the screen.
All Delphi components that are displayable inherit a method called Paint. You
override this method and use a special Delphi Canvas object to draw on. Canvas
is a high-level abstraction of a Windows device context. It handles all the
drudgery of creating, selecting, deselecting, and destroying GDI objects for
you. To draw a circle, for instance, you add a Paint method to your form or
component and draw on the Canvas; see Listing Five.

You can change the attributes of the GDI objects, using simple assignment
statements on their properties. To create a navy-blue pen, you use the
statement Pen.Color := clNavy;. You don't even have to create the pen, because
Delphi does it for you automatically. Because Color is a property of TPen,
changing its value invokes an accessor function that takes care of perfunctory
Windows details automatically.
Even though Delphi encapsulates the Windows API, you can still get under the
hood to handle low-level or unsupported Windows features. If you want to hook
the message dispatcher, a simple WndProc handler will take care of it. Say you
want to intercept mouse clicks and typed characters for a form. The WndProc
might look something like Listing Six. Calling the inherited method lets the
base class Delphi handler process other events in the default way. Using a
WndProc is considered a low-level message-handling scheme. For all Windows and
internal Delphi messages, the proper way to install a message handler is
through the message keyword. To add a handler for the WM_LBUTTONDOWN message,
you add a class method like procedure WMLButtonDblClk(var Message:
TWMLButtonDown); message WM_LBUTTONDBLCLK;.
Before adding a message handler to a class, make sure Delphi hasn't already
provided an event-handler hook. For most commonly used messages and events,
Delphi provides easy entry points, using the Events page of the Object
Inspector. The Events page is usually the best way to add message handlers to
a VCL component. To provide a handler that is invoked when your button is
clicked, enter a handler name in the OnClick field. The handler can have any
name you want. By double clicking the empty field, Delphi will create a
default name, based on the event type and name of the control. Delphi creates
an empty handler with correctly formatted arguments in the parameter list.
Listing Seven is the default handler for the OnClick event. All event handlers
are passed a pointer to the object that sent the message.


Database Support


For database programming, Delphi includes a complete set of VCL components to
handle everything from simple table viewers to multiple-join queries to
triggers and stored procedures. The heart of the architecture is the Borland
Database Engine (BDE), which serves a role similar to the Microsoft ODBC
Driver Manager. When developing database applications, you typically populate
forms with a series of database-aware controls, like editboxes, listboxes, and
grids. The TTable and TQuery components encapsulate the details of connecting
with a database. A TDataSource component manages the mapping of database
fields to data-aware components. Using a VCL navigator tool (which appears as
a set of VCR buttons), you can move the cursor in the result set. Figure 2
shows the relationship between an application and database components.
All database access goes through the BDE, including access to ODBC data
sources. Accessing ODBC data sources this way incurs a slight performance hit
compared to direct ODBC access, because commands pass from the BDE DLL through
to the ODBC Driver Manager DLL. For operations that produce multiple records,
the overhead is negligible; if your app makes lots of single-record
operations, some degradation may be perceptible.
Access to all client/server SQL databases is through SQL Links, a Borland DLL
that uses vendor-specific drivers to talk to the underlying databases. I
tested Delphi with Oracle 7.2, the current version, using Personal Oracle.
(InterBase is Borland's SQL database. It is a full 32-bit client/server
database that competes against Oracle and Sybase SQL Server.) After getting
Oracle running with Delphi, everything worked, although a real application
will need to provide custom handlers for the multitude of database exceptions
that may arise (otherwise, Delphi displays opaque message boxes).
Accessing Oracle data was just a matter of adding a TTable or TQuery
component, sprinkling in a TDataSource, and putting database controls on a
form. Oracle log-on was supported automatically. Delphi 2.0 is configured to
work with Oracle 7.1, but I had to use the BDE Configuration Utility to change
a number of low-level settings to get the BDE to work with Oracle 7.2.
While the Delphi side was easy, it took time to get the BDE to connect to an
Oracle 7.2 database, mainly for lack of documentation of exactly what changes
to make and where. I kept getting the message "Vendor Initialization failed:
ORANT71.DLL". A call to Borland revealed that I needed to use the BDE Config
tool to blank out the Server Name field, change the Net Protocol to the value
"2," and alter the vendor init entry to ORA72.DLL. Yeah, I know, it should
have been obvious....


User-Interface Development


Developing user interfaces for database apps requires laying out controls that
are tied to database columns. With grid controls, picture controls, and
graphic elements, it's difficult to lay everything out correctly unless you
can see how the actual database data looks on the form. Borland implements the
concept of "live data" with Delphi, which enables data-aware controls to
connect to the underlying database right at design time. You don't have to
compile or run any code to see the results in a form. 
The steps for creating live-data forms are straightforward. Create a TTable or
TQuery component on the form, set its DatabaseName property to reference
your database, set TableName to indicate which table in the database
you want to use, and set the Active property to True. That's it. Anything
attached to the TTable or TQuery after that will get data from the database
table at design time. To access the data, you create a TDataSource component
and set its DataSet property to the name of the TTable or TQuery component to
use. 
You can scroll through the database data with the grid scrollbar to see all
the records located in a table or returned by a query. Seeing your data is not
only convenient for layout purposes, it lets you verify immediately whether
the form is retrieving the data you really want. All the database controls
support live data.
By default, connecting a grid to a data source makes the grid create columns
for all the fields returned by the associated TTable or TQuery. The Object
Inspector provides a Columns property that gives you access to a full Column
editor, allowing you to eliminate columns you don't need. The editor also lets
you change the column titles, widths, read-only status, color, and more. You
can also make the grid control display check boxes or drop-down comboboxes in
a grid field. 


Report Writing


Many Delphi applications use simple forms with data-aware controls and grids
to show database data, but medium and large applications nearly always need
reports to present information. Delphi has a built-in report generator, called
ReportSmith, which enables you to create formatted reports in a completely
graphical environment. ReportSmith lets you create reports that have formatted
text fields, pictures, and graphical elements such as boxes and lines. The
program has a Wizard that knows how to create columnar, crosstab, form, and
label types. Starting from a default report created by the Wizard, you can
often create reports without programming. For specialized reports, there is an
integrated SQL editor that lets you enter your own SQL Select statements.
ReportSmith produces fine printed reports. You can also export the report data
in a number of file formats, including Excel, Lotus 1-2-3, and Quattro Pro. I
was disappointed that rich-text format (RTF) wasn't supported because I often
include parts of reports in word-processing documents. Nor can you
cut-and-paste to select parts of a report, because ReportSmith doesn't allow
you to copy parts of a report to the clipboard. On the other hand, ReportSmith
does produce labels, something many high-end report writers don't do.


Writing a Multithreaded Program


To give you a taste of Delphi programming, I'll present a sample multithreaded
program (available electronically; see "Availability," page 3). With C or C++,
you'd expect to deal with lots of low-level Windows API calls to write a
multithreaded app. Of course you can get down and dirty in Delphi, but in most
cases, only if there's no alternative. 
My multithreaded program, called "Bounce," has three objects bouncing around
inside separate regions of a window. Each object is a black square with a
colored body. As it moves, it leaves a black wake, eventually painting over
the entire bounce region. Figure 3 shows how Bounce looks after running
awhile.
Each object starts off in a random position, moving downward and to the right.
When it hits the boundary of the enclosing region, it bounces off and
continues. The Start and Stop buttons control the three threads. The Stop
button suspends them, the Start button makes them resume. Each object is
animated by a separate thread, implemented as a class derived from the VCL
class TThread; see Listing Eight. The constructor for class
TThreadBouncingObject randomizes the initial position for each bouncing
object, and saves a pointer to the containment bounce region. The thread
execution is carried out by the Execute method, which runs an infinite loop
moving the object. At the end is a Sleep call, which suspends the thread
briefly, to slow down the bouncing objects: On my 100-MHz Pentium, the objects
were bouncing around so fast I couldn't tell what they were doing.
The main form creates three side-by-side regions of type TPanel, which are
used as the containment boxes for the bouncing objects. The thread objects are
created inside the FormCreate member, and destroyed in the FormDestroy member.
These two functions are similar to regular constructors and destructors,
except they are called as handlers by Delphi. The code for the main form is
shown in Listing Nine.
Objects are created in two phases in Delphi. Unlike in C++, the declaration
itself isn't enough; you must explicitly call an object's constructor to
initialize it. The arrangement is similar to the way GDI objects are handled
in Microsoft Visual C++, where you declare things like CPens and CBrushes, and
subsequently must invoke their Create members before use.


Conclusion


The support for database exceptions, live data, grids, and BLOBs make Delphi
2.0 nearly an ideal database-development environment. Delphi makes it possible
to develop local database applications that can easily be converted into full
client/server apps.


For More Information


Borland International 
100 Borland Way
Scotts Valley, CA 95066
800-331-0877
http://www.borland.com
Figure 1: The Delphi IDE.
Figure 2: The architecture of a Delphi database application.
Figure 3: Running the Bounce program.

Listing One

TMyThread = class(TThread)
protected
 procedure Execute; override;
public
 constructor Create( {your parameters} );
 end;

Listing Two
type
 TMyForm = class(TForm)
private
 LengthyOperation: TMyThread;
 end;

Listing Three
procedure TMyForm.StartButtonClick(Sender: TObject);
begin
 LengthyOperation := TMyThread.Create( {your parameter list} );
end;

Listing Four
procedure TMyForm.AbortButtonClick(Sender: TObject);
begin
 LengthyOperation.Terminate;
end;

Listing Five
procedure TShape.Paint;
begin
 with Canvas do
 Rectangle(100, 100, 200, 200);
end;

Listing Six
procedure TMyForm.WndProc(var Message: TMessage);
begin
 case Message.Msg of
 WM_LBUTTONDOWN, WM_CHAR:
 begin
 {do something}
 Exit;
 end;
 end;
 inherited;
end;

Listing Seven
procedure TForm1.Button1Click(Sender: TObject);
begin
end;

Listing Eight
unit ThreadTest;
interface
uses
 Windows, Messages, SysUtils, Classes, Graphics, 
 Controls, Forms, Dialogs, ExtCtrls;
type
 TThreadBouncingObject = class(TThread)
 private
 ContainmentBox: TPaintBox;

 BoundsCheckBox: TRect;
 P: TPoint;
 XIncrement, YIncrement: Integer;
 procedure UpdateScreen;
 protected
 procedure Execute; override;
 public
 constructor Create(Box: TPaintBox);
 end;
const
 WIDTH = 10;
 HEIGHT = 10;
implementation
{$R *.DFM}
constructor TThreadBouncingObject.Create(Box: TPaintBox);
begin
 inherited Create(True);
 ContainmentBox := Box;
 BoundsCheckBox := Rect(0, 0, Box.Width - WIDTH, Box.Height - HEIGHT);
 P.X := Random(ContainmentBox.Width - WIDTH);
 P.Y := Random(ContainmentBox.Height - HEIGHT);
 XIncrement := 1;
 YIncrement := 1;
end;
procedure TThreadBouncingObject.Execute;
begin
 repeat
 begin
 {update the X position}
 P.X := P.X + XIncrement;
 if (XIncrement > 0) and (P.X >= BoundsCheckBox.Right) or
 (XIncrement < 0) and (P.X <= BoundsCheckBox.Left) then
 XIncrement := -XIncrement;
 
 {update the Y position}
 P.Y := P.Y + YIncrement;
 if (YIncrement > 0) and (P.Y >= BoundsCheckBox.Bottom) or
 (YIncrement < 0) and (P.Y <= BoundsCheckBox.Top) then
 YIncrement := -YIncrement;
 UpdateScreen;
 Sleep(10); {pause briefly}
 end
 until Terminated;
end;
procedure TThreadBouncingObject.UpdateScreen;
begin
 ContainmentBox.Canvas.Rectangle(P.X, P.Y, P.X + WIDTH, P.Y + HEIGHT);
end;
end.

Listing Nine
unit TestForm;
interface
uses
 Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
 ExtCtrls, ThreadTest, StdCtrls;
type
 TForm3Threads = class(TForm)
 Panel1: TPanel;

 PaintBox1: TPaintBox;
 PaintBox2: TPaintBox;
 PaintBox3: TPaintBox;
 ButtonStart: TButton;
 ButtonStop: TButton;
 procedure FormDestroy(Sender: TObject);
 procedure ButtonStartClick(Sender: TObject);
 procedure FormCreate(Sender: TObject);
 procedure ButtonStopClick(Sender: TObject);
 private
 Object1: TThreadBouncingObject;
 Object2: TThreadBouncingObject;
 Object3: TThreadBouncingObject;
 public
 { Public declarations }
 end;
var
 Form3Threads: TForm3Threads;
implementation
{$R *.DFM}
procedure TForm3Threads.FormDestroy(Sender: TObject);
begin
 Object1.Terminate;
 Object1.Free;
 Object2.Terminate;
 Object2.Free;
 Object3.Terminate;
 Object3.Free;
end;
procedure TForm3Threads.ButtonStartClick(Sender: TObject);
begin
 Object1.Resume;
 Object2.Resume;
 Object3.Resume;
end;
procedure TForm3Threads.FormCreate(Sender: TObject);
begin
 Randomize; {for the object initial positions}
 Object1 := TThreadBouncingObject.Create(PaintBox1);
 Object2 := TThreadBouncingObject.Create(PaintBox2);
 Object3 := TThreadBouncingObject.Create(PaintBox3);
end;
procedure TForm3Threads.ButtonStopClick(Sender: TObject);
begin
 Object1.Suspend;
 Object2.Suspend;
 Object3.Suspend;
end;
end.






Instantiating Code Patterns


Patterns applied to software development




Fred Wild


Fred is principal of Advantage Software Technologies and can be contacted at
72774.657@compuserve.com.


Although we often use patterns, we sometimes do so unconsciously--not by
design. Whenever we follow a pattern, we are applying experience-based
learning. Whenever we need to accomplish something that we've seen others do,
we tend to mimic the strategies and actions that made that thing successful in
the past.
Experienced software developers use patterns heavily in the design-level
aspects of their work. Design patterns make us more productive, more likely to
achieve success, and generally more valuable as project participants. 
We might think that less-experienced engineers are disadvantaged in this
respect, but rather than accepting the risks that lack of experience brings
with it, effective organizations take an early and active role in sharing
important domain knowledge with them. They gather the collective design wisdom
of the group, and document those best practices in a descriptive language
(manifest vocabulary) of what the important design patterns are, and when and
how to apply them. A good example of this can be found in Design Patterns:
Elements of Reusable Object-Oriented Software, by Erich Gamma et al.
(Addison-Wesley, 1995). In my experience, developers of all ranks drink up
such design patterns and quickly begin to apply them effectively.
With respect to patterns of code, we can provide much more direct help and
support than simply documenting best practices. We can build the use of those
code patterns into the tools we use in the software-development process
itself. SNIP is a commercially available tool I designed for that purpose. In
this article, I'll discuss the use of patterns in the coding aspect of
software development, and describe how to use SNIP to define and instantiate
code patterns.


Design Patterns and Code Patterns


Design patterns are logical in nature. They embody approaches and strategies
that are relevant to a number of possible implementations. A design pattern is
something you can visualize independent of a particular programming language.
For example, given that you need to manage the dynamic allocation and
deallocation of objects, you can specify a logical strategy for minimizing
memory utilization by keeping reference counts on objects and copying those
objects only when an operation is performed to modify them. For a detailed C++
example of this strategy see "Item 29" within More Effective C++: 35 New Ways
to Improve Your Programs and Designs, by Scott Meyers (Addison-Wesley, 1996).
Having defined the strategy and given it a name, you can treat it as a design
idiom. During design you can say, "XYZ will be a reference-counted object,"
and experienced programmers will understand what that implies.
In contrast, code patterns are physical in nature. They focus on how a
particular structure or sequence of action is accomplished using the specific
mechanisms of a programming language. With C++, we've been trained to create
code-level abstractions to support certain design idioms (for example, using
collection classes and templates), but this practice has limitations. Classes
and templates are not adequate mechanisms to implement many of the code
patterns that are interesting to us. Many important code patterns either
describe how a class should deal with its parts (a parts-based pattern), or
describe how a number of cooperating classes are created to implement a single
design idea (a multiclass pattern). Developers currently use parts-based and
multiclass patterns, but they do so via hand coding. You either know the
pattern by heart, or you copy a code sample that already follows the pattern,
and modify it to work in the specific case.


Using Code Patterns


Most production software environments operate under time and quality
pressures. As a result, code patterns that should be staples are
all-too-seldom used. This is not because the patterns are unknown. It is not
because developers do not recognize cases in which they would be good
strategies to apply. It is because hand-crafting patterns adds just enough
time and risk to a schedule to cause them to be shunned. For example, a
correct implementation of a reference-counted object in C++ requires careful
attention to the use of a copy-on-write mechanism that each non-const function
plays into. This adds unwelcome detail and complexity to a handcoding-based
process.
To make use of nontrivial code patterns, we just need an easy way to
characterize objects during design and then have those characteristics
influence how code is produced.


Using SNIP for Pattern Instantiation


SNIP is a tool for instantiating patterns of code that can be derived from
object models. First, it allows you to define code patterns that are
important, based on your own local standards and classifications of object
characteristics. SNIP then provides the capability to instantiate those code
patterns for any set of specific objects. Given a properly thought-out set of
rules for creating code for objects and their parts, based on specific
characteristics, you can realize gains in standardization, quality, and
time-to-implementation. 
SNIP lets you define code patterns that use an object model as an
instantiation context; see Figure 1. It applies the rules called out in an
executable
template to the objects and their parts, and produces code files based on
those rules. 
To use SNIP, you first need a well-defined implementation strategy. You need
to call out the characteristics of the objects (and their parts) that
participate in that strategy, and decide how code should be created for each
of them. Given that strategy, you create a set of code-creation rules, placing
them in a SNIP template file. A number of template files that serve as good
starting points for this come with the tool. For example, a template for
handling frequent C++ class constructs that can be easily extended to include
special rules is provided. 
You can develop the template files iteratively using SNIP's user interface.
Once you are satisfied with the code pattern that is created by the template,
you can add the command form of SNIP into makefiles for use in non-interactive
builds. 
As an example of how SNIP is used, I'll use the common C++ programming task of
properly managing dynamically allocated objects. Whenever you have a contained
pointer, it is important for the code in the class containing the pointer to
manage the dynamically allocated object correctly. (Although you might
otherwise create a special container class or smart-pointer template, I'll use
this example because it is familiar and brief.) 
First, I'll define what it means to manage an object via a contained pointer.
(Feel free to disagree with the definition--it just means your template would
be different from the one I present.) I define the code-pattern requirements
as follows:
All pointers to objects, both owned and shared, are initially set to NULL.
A set operation keeps the pointer passed into it in both the owned and shared
pointer cases.
A clear operation deletes the object if it is owned, otherwise the pointer is
set to NULL.
The destructor deletes the object if it is owned.
A copy constructor or assignment operation calls makeCopy() on an owned object
and uses the returned pointer. Pointers are simply copied in shared object
cases. 
Listing Two shows how these rules are expressed in the SNIP-template file
format. Before inspecting the listing, however, note that this template is
pared down for illustration, so it won't be hard to find a case it doesn't
handle (the production template that this example was taken from is much more
sophisticated). Secondly, there are only a few constructs used in SNIP
templates. Once you know them, readability isn't an issue.
SNIP's template statements draw information from an object model and generate
code based on that information. Before you can make sense of template
statements, you need to see what they are referring to. Listing Two is a
simple object model I'll use as input to SNIP in this example. It has just one
contained pointer of each type in it--one to an object that is owned
(PetOwner::Pet), and another to an object that is shared (PetOwner::Vet). When
you apply the template file in Listing One to this object model, your rules
will be used to generate the code in Listings Three (the resulting header
file) and Four (the resulting body file). 


Reading Template Statements


SNIP template files contain a mainline of executable statements and any number
of modules that contain executable statements. Executable statements come in
three forms: simple statements, blocks, and iterators. Simple statements have
the form left-hand-side ':' right-hand-side. To the left of the colon is a
selection expression. If the expression evaluates to True, the right side is
expanded, and the resulting text is emitted into the active output file. 
The module emit_class_decls in Listing Two contains the segment of statements
in Example 1. The first five statements have no variables on the left, which
means they are selected in all cases, so each of their right sides are
expanded and emitted into the active output file (variables being substituted
where indicated). A backslash on the end of a statement continues the line--no
line feed is emitted when the right side's text is emitted. 

The next line has a single variable on the left, called <obj.has_parent>,
which takes on the value True if the object has at least one parent object
identified in the input model. When it is True, the statement is selected. It
emits a colon and space, and continues the line. 
The next statement is an iterator, prefaced by .each_parent. It iterates over
each parent identified for the current object in the model. The body of the
iterator continues until its corresponding .end is reached. Within this parent
iterator, each parent class name is emitted, followed by a comma, except with
the last one. The variable <xxx.is_nth> is used to indicate the last member of
an iterated list. The bang character (!) inverts the sense of the Boolean
variable. 
This segment of the template emits code that properly creates the preamble of
a class' declaration and takes into account cases where the class is a
subclass of one or more parent classes. 


Code for Handling Contained Pointers


Listing Two contains the template module emit_class_copy_ctor that will emit
the copy constructor for each object in the model. The pattern that the copy
constructor follows demonstrates the need to treat contained pointers to owned
objects differently from those that are shared. 
The module emit_class_copy_ctor starts by emitting the copy constructor's
signature and initialization list, including calls to copy constructors higher
up the inheritance paths, if there are any. After emitting the opening brace
of the constructor's body, the attributes that have been defined for the
object are iterated, and three possibilities are used as selection criteria
for producing code statements. The attribute may be simple, such as Name. In
this case, the template assumes that the assignment operator is defined for
the attribute type, and adds a statement to copy its associated value. Second,
the attribute may be a pointer that is shared. In that case, the pointer value
is copied. In the last case, you have a pointer that is owned. A call is
issued to make a copy of the object in that case, and the pointer of the newly
allocated object is retained. This module is also responsible for emitting the
body of the makeCopy function, which appears at its end. When this module is
applied to PetOwner, you get the code in the class PetOwner shown in Listing
Four. 
The destructor pattern (see the module emit_class_destructor in Listing Two)
iterates similarly over the object's attributes. This time the iteration is
done to find only the owned pointers. These are the only members that are
interesting to the destructor (which is responsible for deleting them). The
destructor for PetOwner has only one line, which deletes the Pet object, its
sole owned pointer. Example 2 (excerpted from Listing Four) is the code
created for PetOwner's destructor.


Conclusion


SNIP allows control of code generation, and is flexible enough that you can
define code patterns that relate to how objects fit into your local
frameworks. Where a consistent strategy and pattern of producing code can be
identified, quality and productivity increase in proportion to the
amount of code that follows that strategy. The ratio of lines of code to lines
of input model is typically 20:1. This is a great productivity advantage.
Admittedly, SNIP does not provide a revolutionary new way to produce code.
Instead, it offers the opportunity to automate what you are already doing to
follow code patterns by hand. 


For More Information


Advantage Software Technologies
260 Abbott Run Valley Rd.
Cumberland, RI 02864 
401-334-4807 
72774.657@compuserve.com


SNIP In Action


Mo Bjornestad


Mo, vice president of marketing for Mark V Systems, can be contacted at
mob@markv.com or http://www.markv.com.
CASE tools have traditionally focused on providing a means of modeling
analysis and design problems using the notations of prescribed methodologies,
such as those offered by Booch and Rumbaugh. As such, CASE tools have too
often been used as a special-purpose document-capture facility. This is an
unenlightened use. Modern CASE tools are now used more appropriately as rapid
application development (RAD) tools that generate applications in whole or in
part. That is what Mark V's ObjectMaker is designed to do.
ObjectMaker supplies a meta-CASE capability, which is another way to say that
it offers a highly extensible system that provides a breadth of customization.
ObjectMaker users simply build graphical models of their requirements and
design. The graphical diagrams are transformed into a unified semantic model.
The semantic model is then transformed to SNIP's macro language, which is sent
to SNIP for template expansion into the targeted programming language(s). The
template can be edited for form and content to conform to internal standards.
(Templates are currently available for Ada 83, Ada 95, C, C++, and Smalltalk,
with IDL and Java templates under development.) In short, users can define
their own variants of traditional methods, or add completely new methods. They
can then use the information captured in the repository to forward-engineer
code. That's where SNIP comes in.
SNIP allows ObjectMaker to remain customizable. ObjectMaker's front end is
integrated with SNIP's capability to instantiate patterns of code from object
models. SNIP integration is accomplished by using Mark V's meta rule language
to traverse and transform the ObjectMaker semantic model contents into the
SNIP macro language. The macro is handed off to a macro expander, which
populates the templates, then exports the file as fully expanded code. We feed
the object model information into SNIP from our repository, and let it
generate code files according to user-modifiable code patterns delivered with
ObjectMaker. SNIP is the right tool for this because it is largely model and
language independent.
Using SNIP has exposed many opportunities for applying patterns in
creating code from design-level information. As people's ideas
about using code patterns become more focused, we see a renewed and growing
interest in CASE as a front end for driving pattern-based approaches to
creating software. SNIP provides a timely capability in this area.
Figure 1: SNIP's operating model.
Example 1: Reading template statements.
 ://
 :// Class <obj.name> -------------
 ://
 :
 :class <obj.name> \
<obj.has_parent> :: \
.each_parent
 :public <parent.name>\
<!parent.is_nth> :, \
<parent.is_nth> :
.end
 :{
Example 2: Code created for PetOwner's destructor.
PetOwner::~PetOwner()
{
 delete m_Pet ;
}

Listing One
object PetOwner 
 Name : string ;
 Pet : ptr_to Pet [is_owner] ;
 Vet : ptr_to Veterinarian [is_shared] ;

end ;
object Veterinarian 
 Name : string ;
 StreetAddress : string ;
 Town : string ;
 Phone : string ;
end ;
object Pet 
 Name : string ;
 KindOfPet : string ;
end ;
foreign string [use_ref] ;

Listing Two
# Create the header file ...
#
 :$$NEWFILE <dsm.name>.hxx
 :
 :#ifndef <dsm.name>_H
 :#define <dsm.name>_H
 :
 :// Forward references for each class
 :
.each_obj
 :class <obj.name> ;
.end
 :
 :$$EXECMODULE emit_class_decls
 :
 :#endif
 :
# Create the body file ...
#
 :$$NEWFILE <dsm.name>.cxx
 :
 :#include "<dsm.name>.hxx"
 :
 :$$EXECMODULE emit_class_bodies
 :
###################################################################
# Module to emit class declarations
#
.module emit_class_decls
.each_obj
 :
 ://
 :// Class <obj.name> --------------------------------
 ://
 :
 :class <obj.name> \
<obj.has_parent> :: \
.each_parent
 :public <parent.name>\
<!parent.is_nth> :, \
<parent.is_nth> : 
.end
 :{
 :$$EXECMODULE emit_member_variables
 :

 : public:
 :
 : <obj.name>();
 : <obj.name>(const <obj.name>& obj);
 : <obj.name> *makeCopy() const ;
 : virtual ~<obj.name>();
 :
 : <obj.name> &operator=(const <obj.name> &rhs) ;
 :
 :$$EXECMODULE emit_attr_access_ftn_decls
 :
 :} ;
.end
.end
###################################################################
# Modules to emit class member variable declarations
#
.module emit_member_variables
.each_attr
<attr.simple> : <attr.kind> <30> m_<attr.name> ;
<attr.ptr_to> : <attr.kind> <30> *m_<attr.name> ;
.end
.end
###################################################################
# Module to emit attribute access function declarations
#
.module emit_attr_access_ftn_decls
.each_attr
 :$$EXECMODULE emit_get_decl
 :$$EXECMODULE emit_set_decl
 :
.end
.end
###################################################################
# Module to emit get functions for an attribute
#
.module emit_get_decl
<attr.simple><!attr.use_ref> : <attr.kind> <35>Get<attr.name> () const ;
<attr.simple><attr.use_ref> : const <attr.kind> & <35>Get<attr.name> () const;
<attr.ptr_to> : const <attr.kind> * <35>Get<attr.name> () const;
.end
###################################################################
# Module to emit set functions for an attribute
#
.module emit_set_decl
<attr.simple><!attr.use_ref> : void <35>Set<attr.name>(const <attr.kind> val);
<attr.simple><attr.use_ref> : void <35>Set<attr.name>(const <attr.kind> & val);
<attr.ptr_to> : void <35>Set<attr.name>(<attr.kind> *val) ; 
<attr.ptr_to> : void <35>Clear<attr.name> () ; 
.end
###################################################################
# Module to emit the bodies of member functions for each class
#
.module emit_class_bodies
.each_obj
 :
 ://////////////////////////////////////////////////////////
 :// class <obj.name>
 :$$EXECMODULE emit_class_default_ctor

 :$$EXECMODULE emit_class_copy_ctor
 :$$EXECMODULE emit_class_destructor
 :$$EXECMODULE emit_assignment_op_body
 :$$EXECMODULE emit_get_and_set_ftn_bodies
.end
.end
###################################################################
# Module to create a class' default constructor
#
.module emit_class_default_ctor
 :
 :<obj.name>::<obj.name>()
 :{
.each_attr
<attr.ptr_to><!attr.has_init> : m_<attr.name> = NULL;
<attr.has_init> : m_<attr.name> = <attr.init_val> ;
.end
 :}
.end
###################################################################
# Module to create a class' copy ctor and 'makeCopy' function
#
.module emit_class_copy_ctor
 :
 :<obj.name>::<obj.name>(const <obj.name>& obj)
<obj.has_parent> : : \
.each_parent
 :<parent.name>(obj)\
<!parent.is_nth> :,
<!parent.is_nth> : \
<parent.is_nth> :
.end
 :{
.each_attr
<attr.simple> : m_<attr.name> = obj.m_<attr.name>;
<attr.ptr_to><attr.is_shared> : m_<attr.name> = obj.m_<attr.name>;
<attr.ptr_to><attr.is_owner> : m_<attr.name> = obj.m_<attr.name>->makeCopy();
.end
 :}
 :
 :<obj.name> *<obj.name>::makeCopy() const 
 :{
 : return new <obj.name>(*this);
 :}
.end
###################################################################
# Module to create a class' destructor
#
.module emit_class_destructor
 :
 :<obj.name>::~<obj.name>()
 :{
.each_attr
<attr.ptr_to><attr.is_owner> : delete m_<attr.name> ;
.end
 :}
.end
###################################################################
# Module to create a class' assignment operator
#
.module emit_assignment_op_body
 :
 :<obj.name> &
 :<obj.name>::operator=(const <obj.name> &rhs)
 :{
 : if (this == &rhs) return *this ;
 : // No changes if assignment to self 
 :
.each_parent
 : ((<parent.name> &) (*this)) = rhs ; 
 : // Assign base class members of <parent.name>
.end
.each_attr
<attr.simple> : m_<attr.name> = rhs.m_<attr.name>;
<attr.ptr_to><attr.is_shared> : m_<attr.name> = rhs.m_<attr.name>;
<attr.ptr_to><attr.is_owner> : m_<attr.name> = rhs.m_<attr.name>->makeCopy();
.end
 :
 : return *this ;
 :}
.end
###################################################################
# Module to create class member functions that access single-valued member vars
#
.module emit_get_and_set_ftn_bodies
.each_attr
<attr.getset> :$$EXECMODULE emit_get_ftn_body
<attr.getset> :$$EXECMODULE emit_set_and_clear_ftn_bodies
.end
.end
###################################################################
# Module to create a 'Get' function body for accessing an attribute
#
.module emit_get_ftn_body
<attr.simple><!attr.use_ref> :<attr.kind> Get<attr.name> () const
<attr.simple><attr.use_ref> :const <attr.kind> & Get<attr.name> () const 
<attr.ptr_to> :const <attr.kind> * Get<attr.name> () const 
 :{
 : return m_<attr.name> ;
 :}
.end
###################################################################
# Module to create a 'Set' and 'Clear' function bodies
#
.module emit_set_and_clear_ftn_bodies
<attr.simple><!attr.use_ref> :void Set<attr.name> (const <attr.kind> val)
<attr.simple><attr.use_ref> :void Set<attr.name> (const <attr.kind> & val)
<attr.ptr_to> :void Set<attr.name> (<attr.kind> *val)
 :{
<attr.simple> : m_<attr.name> = val ;
<attr.ptr_to><attr.is_owner> : if (m_<attr.name> != NULL) delete m_<attr.name>;
<attr.ptr_to> : m_<attr.name> = val ;
 :}
<attr.ptr_to> :
<attr.ptr_to> :
<attr.ptr_to> :void Clear<attr.name> ()
<attr.ptr_to> :{
<attr.ptr_to><attr.is_owner> : if (m_<attr.name> != NULL) delete m_<attr.name> ;
<attr.ptr_to> : m_<attr.name> = NULL ;
<attr.ptr_to> :}
.end
.end

Listing Three
#ifndef pets_H
#define pets_H
// Forward references for each class
class PetOwner ;
class Veterinarian ;
class Pet ;
// Class PetOwner --------------------------------
class PetOwner {
 string m_Name ;
 Pet *m_Pet ;
 Veterinarian *m_Vet ;
 public:
 PetOwner();
 PetOwner(const PetOwner& obj);
 PetOwner *makeCopy() const ;
 virtual ~PetOwner();
 PetOwner &operator=(const PetOwner &rhs) ;
 const string & GetName () const ;
 void SetName (const string & val) ;
 const Pet * GetPet () const ;
 void SetPet (Pet *val) ; 
 void ClearPet () ; 
 const Veterinarian * GetVet () const ;
 void SetVet (Veterinarian *val) ; 
 void ClearVet () ; 
} ;
// Class Veterinarian --------------------------------
class Veterinarian {
 string m_Name ;
 string m_StreetAddress ;
 string m_Town ;
 string m_Phone ;
 public:
 Veterinarian();
 Veterinarian(const Veterinarian& obj);
 Veterinarian *makeCopy() const ;
 virtual ~Veterinarian();
 Veterinarian &operator=(const Veterinarian &rhs) ;
 const string & GetName () const ;
 void SetName (const string & val) ;
 const string & GetStreetAddress () const ;
 void SetStreetAddress (const string & val) ;
 const string & GetTown () const ;
 void SetTown (const string & val) ;
 const string & GetPhone () const ;
 void SetPhone (const string & val) ;
} ;
// Class Pet --------------------------------
class Pet {
 string m_Name ;
 string m_KindOfPet ;
 public:
 Pet();
 Pet(const Pet& obj);
 Pet *makeCopy() const ;
 virtual ~Pet();
 Pet &operator=(const Pet &rhs) ;
 const string & GetName () const ;
 void SetName (const string & val) ;
 const string & GetKindOfPet () const ;
 void SetKindOfPet (const string & val) ;
} ;
#endif

Listing Four
#include "pets.hxx"
//////////////////////////////////////////////////////////
// class PetOwner
PetOwner::PetOwner()
{
 m_Pet = NULL;
 m_Vet = NULL;
}
PetOwner::PetOwner(const PetOwner& obj)
{
 m_Name = obj.m_Name;
 m_Pet = obj.m_Pet->makeCopy();
 m_Vet = obj.m_Vet;
}
PetOwner *PetOwner::makeCopy() const 
{
 return new PetOwner(*this);
}
PetOwner::~PetOwner()
{
 delete m_Pet ;
}
PetOwner &
PetOwner::operator=(const PetOwner &rhs)
{
 if (this == &rhs) return *this ;
 // No changes if assignment to self 
 m_Name = rhs.m_Name;
 m_Pet = rhs.m_Pet->makeCopy();
 m_Vet = rhs.m_Vet;
 return *this ;
}
//////////////////////////////////////////////////////////
// class Veterinarian
Veterinarian::Veterinarian()
{
}
Veterinarian::Veterinarian(const Veterinarian& obj)
{
 m_Name = obj.m_Name;
 m_StreetAddress = obj.m_StreetAddress;
 m_Town = obj.m_Town;
 m_Phone = obj.m_Phone;
}
Veterinarian *Veterinarian::makeCopy() const 
{
 return new Veterinarian(*this);
}
Veterinarian::~Veterinarian()
{
}
Veterinarian &
Veterinarian::operator=(const Veterinarian &rhs)
{
 if (this == &rhs) return *this ;
 // No changes if assignment to self 
 m_Name = rhs.m_Name;
 m_StreetAddress = rhs.m_StreetAddress;
 m_Town = rhs.m_Town;
 m_Phone = rhs.m_Phone;
 return *this ;
}
//////////////////////////////////////////////////////////
// class Pet
Pet::Pet()
{
}
Pet::Pet(const Pet& obj)
{
 m_Name = obj.m_Name;
 m_KindOfPet = obj.m_KindOfPet;
}
Pet *Pet::makeCopy() const 
{
 return new Pet(*this);
}
Pet::~Pet()
{
}
Pet &
Pet::operator=(const Pet &rhs)
{
 if (this == &rhs) return *this ;
 // No changes if assignment to self 
 m_Name = rhs.m_Name;
 m_KindOfPet = rhs.m_KindOfPet;
 return *this ;
}






















Applying Design Patterns to PowerBuilder


The Observer pattern provides a window communication mechanism




Mark Nielsen and Nick Abdo


Mark and Nick are members of the research and development staff at MetaSolv
Software in Dallas, Texas. Mark can be reached at mnielsen@metasolv.com; Nick,
at nabdo@metasolv.com.


As the applications being developed with PowerBuilder become increasingly
complex, PowerBuilder developers often struggle to apply sound design
techniques to their applications. With support for most object-oriented
programming features, PowerBuilder provides a great environment for
implementing many of the recently published object-oriented design patterns.
Like many other programming languages, however, PowerBuilder lets you break
all the rules of good object-oriented design. Design patterns provide a road
map to greater reusability and maintainability in
applications. Many of us know there are usually more questions than answers
when it comes to designing software systems, and anything describing solutions
tends to get our attention. While design patterns have the potential to help
PowerBuilder developers, most examples and related information on patterns are
presented in languages other than PowerBuilder. This leaves the translation
into PowerBuilder up to the developer and can limit the exposure of patterns
to the PowerBuilder community. 
In this article, we'll examine how common design patterns can help
PowerBuilder developers make good design decisions. In particular, we'll apply
one of the patterns to design a window communication mechanism. We'll then
implement this design in a simple PowerBuilder application. Once you've seen
one of the patterns implemented in PowerBuilder, you should be able to find
other design patterns useful as well.


The Problem


Most applications developed using PowerBuilder contain multiple windows that
display different views of the database. In many instances, multiple windows
are displayed at the same time, reflecting different views of the same data. A
common dilemma facing many PowerBuilder developers is deciding how and when
information will be passed between windows to keep the information in the
windows synchronized.
For example, Figure 1 shows a simple inventory application containing three
windows. The Item Picture and Item Chart windows display different graphical
representations of the item selected on the Inventory List window.
When you select an item from the Inventory List, the Item Picture and Item
Chart windows' contents must change to reflect the selected inventory item.
Decisions must be made about how the windows talk to one another. To
synchronize displayed windows, it is common for PowerBuilder developers to
simply add the necessary code to the Inventory List window to directly
reference the other windows and their controls.
While this approach may solve the synchronization problem, it introduces
others. First, because users may choose to open the windows in any order (or
not open them at all), the Inventory List window becomes burdened with not
only having to keep track of which specific windows are depending on it, but
also what those windows are doing with its information.
As this type of tightly coupled design is spread throughout the system,
maintenance of the application becomes difficult. Developers working on the
display windows may not be aware that other windows are making direct
references to its controls. As new features are added to the system, old
features can begin to break. Also, if a new display window is needed, the
other windows need to be updated to handle its synchronization. This tight
coupling also has a negative effect on reuse. The Inventory List window would
not function well in another context without modifications.


The Solution


The desired design would allow enhancements and additions to the windows
without changing existing windows in the system. To achieve these goals,
windows must be loosely coupled--that is, they must minimize direct references
to one another. This is where design patterns are extremely useful. In their
book Design Patterns: Elements of Reusable Object-Oriented Software
(Addison-Wesley, 1995) Erich Gamma, Richard Helm, Ralph Johnson, and John
Vlissides provide a catalog of general-purpose object-oriented design
patterns. These design patterns can be used to solve many common
object-oriented design problems. In reviewing the pattern catalog, we
discovered a pattern referred to as the "Observer." This pattern, which was
also presented in the article "Observations on Observer," by Erich Gamma and
Richard Helm (Dr. Dobb's Sourcebook, September/October 1995), describes a
register-and-notify mechanism that provides indirect communication among
objects. 


The Observer Pattern


According to Design Patterns, the Observer pattern "Define[s] a one-to-many
dependency between objects so that when one object changes state, all its
dependents are notified and updated automatically." In other words, the
Observer pattern is designed so that one or more independent objects
(Observers) watch or observe another object (the Subject) and perform some
action based on changes in, or actions taken by, the subject.
The key to the Observer pattern is the registration and notification concept.
An object that needs to observe another object registers its interest with
that object. When the observed object is subsequently modified, it sends a
message to each of its registered objects notifying them that a change has
occurred. Each notified object then queries the subject object for the
modification and performs the appropriate action to synchronize itself with
the new state of the subject.
In Figure 2, a class diagram depicts the structure of the Observer pattern
using OMT notation. As you can see, the Observer pattern consists of four
classes: 
The Subject class, which contains operations to Attach() or Register
observers, to Detach() or Unregister observers, and to Notify() observers to
update themselves by means of the Observer class' Update() operation.
The Observer class, which contains a default Update() operation defining the
update interface for the Subject class. 
The ConcreteSubject class, which is inherited from the Subject class and
contains attributes (represented by subjectState) to hold the current state of
the object and operations (represented by GetState()) to return the object's
state to any interested object.
The ConcreteObserver class, which is inherited from the Observer class and
contains attributes (represented by observerState) to hold the current state
of the object and an Update() operation that overrides the default Update()
operation from the Observer class. The ConcreteObserver's Update() operation
gets the state of the ConcreteSubject class (the observed object) with which
it is registered.
In the basic Observer pattern structure, a Subject may reference zero-to-many
Observers, while a ConcreteObserver may only reference a single
ConcreteSubject.
At run time, Observers register with a Subject, indicating their interest in
the Subject. As the Subject changes, it invokes its Notify() operation
(defined in the superclass), which invokes the Update() operation on all
registered Observers. Polymorphism is used here, allowing the Subject to only
need to know about an abstract Observer class. This is the key to achieving
the loose coupling of windows. Upon receiving the Update() message, each
Observer interrogates the Subject for its new state, and then, maintaining
encapsulation, updates itself accordingly.
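As a language-neutral illustration of the four-class structure and the register-and-notify flow just described (this is a C++ sketch for comparison, not code from the article, which implements the pattern in PowerBuilder), the pattern might look like this:

```cpp
#include <algorithm>
#include <vector>

// Abstract Observer: defines the update interface the Subject calls.
class Observer {
public:
    virtual ~Observer() {}
    virtual void Update() {}   // default no-op, overridden by ConcreteObserver
};

// Subject: maintains the list of registered observers.
class Subject {
    std::vector<Observer*> observers_;
public:
    virtual ~Subject() {}
    void Attach(Observer* o) { observers_.push_back(o); }
    void Detach(Observer* o) {
        observers_.erase(std::remove(observers_.begin(), observers_.end(), o),
                         observers_.end());
    }
    void Notify() {            // polymorphic: knows only the abstract Observer
        for (Observer* o : observers_) o->Update();
    }
};

// ConcreteSubject: holds the state observers are interested in.
class ConcreteSubject : public Subject {
    int state_ = 0;
public:
    int  GetState() const { return state_; }
    void SetState(int s) { state_ = s; Notify(); }  // change triggers notification
};

// ConcreteObserver: registers itself, then queries the subject on each Update().
class ConcreteObserver : public Observer {
    ConcreteSubject* subject_;
    int observerState_ = 0;
public:
    explicit ConcreteObserver(ConcreteSubject* s) : subject_(s) { s->Attach(this); }
    void Update() override { observerState_ = subject_->GetState(); }
    int  State() const { return observerState_; }
};
```

Note that `Notify()` never mentions a concrete observer type; that indirection is exactly the loose coupling the article seeks between windows.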


PowerBuilder Implementation


Since many PowerBuilder applications are built with some type of class library
or framework, we'll assume all our windows are inherited from a common
ancestor window. This being the case, we can implement the Subject and
Observer classes within this common ancestor window, which we'll call
w_master. By implementing the pattern this way, all windows inherited from
w_master can act as a subject, an observer, or both. The result is reflected
in Figure 3. Referring to Figure 2, the association between the Subject class
and the Observer class can be implemented as an instance variable in w_master,
declared as an array of itself. This variable (iw_observers[]) will be used to
hold references to each registered object. The operations of both the Subject
and Observer classes are implemented as window functions of w_master. The
f_Attach() and f_Detach() functions are responsible for maintaining the array
of registered objects, while f_Notify() is responsible for looping through
the array and invoking the Update() operation on each element. The Update()
operations on w_master and all observing windows were renamed to f_Refresh()
to avoid confusion with the Update() function of the datawindow. In the
concrete windows, the associations between the ConcreteObservers and the
ConcreteSubject are implemented as instance variables in each of the
ConcreteObservers: w_picture, w_piechart, and w_bargraph (see Listing One).
The menu is used to open the observing windows and uses the f_SetSubject()
function to set their subject to w_inventory (see Listing Six). Within
f_SetSubject, the iw_subject variable is set to hold a reference to
w_inventory, then the window registers its interest in w_inventory by calling
the f_Attach() function, using itself as an argument (see Listings Three,
Four, and Five).
The notification process needs to begin when the row focus changes in the
w_inventory window's datawindow. The script in this event simply calls
w_inventory's f_Notify() function (inherited from w_master).
Finally, each observing window implements its own f_Refresh() window function
to process the change. This function's purpose is to retrieve the current
state of w_inventory (for instance, what are the quantity values for the
selected item?), then refresh the window's state to reflect that of
w_inventory. Encapsulation is achieved by the creation and use of the access
functions provided by w_inventory (see Listing Two).
As a test of the design's loose coupling, the w_bargraph window was actually
added as an afterthought. No existing code was changed. The w_bargraph window
was created by inheriting from w_master. The appropriate script was added to
have it register with the subject and its f_Refresh() function was scripted to
reflect the subject's state.



Other Uses and Implementations


Several implementation variations are worth mentioning. The observing windows
could also perform updates to the subject window. In this case, the subject
would again simply call its f_Notify() function to alert all registered
observers of any change. You also have options for establishing the reference
to the subject from the observing windows. We chose to implement the
f_SetSubject() function, but this could be performed in other ways, depending
on whether it's appropriate for the observing windows to initiate
relationships to the subject. The detaching of observers from the subject is
also specific to your application. We chose to close all observers when the
subject closes.
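The first variation above, observers that also write to the subject, fits the same structure: the observer changes state only through the subject's public interface, and the subject's setter re-runs the notification so every other registered observer catches up. A compact C++ sketch of that round trip (again an illustration under assumed names, not the article's PowerBuilder code):

```cpp
#include <vector>

// Abstract view interface the subject notifies.
class View {
public:
    virtual ~View() {}
    virtual void Update() = 0;
};

// Subject whose setter notifies every registered view,
// including the view that initiated the change.
class Counter {
    std::vector<View*> views_;
    int value_ = 0;
public:
    void Attach(View* v) { views_.push_back(v); }
    int  Get() const { return value_; }
    void Set(int v) {
        value_ = v;
        for (View* view : views_) view->Update();  // re-notify everyone
    }
};

// A view that both displays and edits the subject.
class EditableView : public View {
    Counter* subject_;
    int shown_ = 0;
public:
    explicit EditableView(Counter* c) : subject_(c) { c->Attach(this); }
    void Update() override { shown_ = subject_->Get(); }
    void Edit(int v) { subject_->Set(v); }  // write back; Set() notifies all
    int  Shown() const { return shown_; }
};
```

Because the editing view goes through Set() rather than touching its own copy, it needs no special-case code: it is refreshed by the same Update() path as every other view.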


Conclusion


Design patterns can take much of the guesswork out of making design decisions.
They can help validate your design ideas and can provide a starting point for
weighing trade-offs and contradictions presented during object-oriented system
development. Putting design patterns to work in PowerBuilder involves mapping
the generic object-oriented constructs present in the patterns to
PowerBuilder-specific constructs. In this example, we implemented a variation
of the Observer pattern in PowerBuilder (available electronically; see
"Availability," page 3). In making the translation, the pattern was modified
slightly to fit within common PowerBuilder development practices. The
important thing to remember while working with patterns is that they only
present a generic solution to a design problem. In many cases, the actual
implementation varies significantly from the basic pattern. If you don't take
everything literally, and experiment a little, design patterns can be
invaluable when developing PowerBuilder applications.


References


Gamma, E., R. Helm, R. Johnson, J. Vlissides. Design Patterns: Elements of
Reusable Object-Oriented Software, Reading, MA: Addison-Wesley, 1995.
Coplien, J. and D. Schmidt, eds. Pattern Languages of Program Design, Reading,
MA: Addison-Wesley, 1995.


For More Information:


Powersoft
561 Virginia Road
Concord, MA 01742-2732
508-287-1500
http://www.powersoft.com
MetaSolv Software
14900 Landmark, Suite 530
Dallas, TX 75240
214-239-0623 
http://www.metasolv.com
Figure 1: Application requiring communication between windows.
Figure 2: Basic structure of the Observer pattern.
Figure 3: Class diagram depicting implementation of the Observer pattern.

Listing One
w_master -- Ancestor window for subjects and observers
instance variables
// holds references to attached observers
w_master iw_observers[]
window functions:
public function boolean f_attach (w_master aw_observer)
public function boolean f_detach (w_master aw_observer)
public function boolean f_notify ()
public function boolean f_refresh ()
public function boolean f_attach (w_master aw_observer);
// purpose: attach the passed object as an observer
// add the passed object to next slot in the array
iw_observers[upperbound(iw_observers) + 1] = aw_observer
// tell new observer to refresh itself based on current state of subject
aw_observer.f_refresh()
return true
public function boolean f_detach (w_master aw_observer);
// purpose: detach the passed object, removing it from list of known observers
// declare tmp array to hold observers still attached
w_master lw_tmp_observers[]
integer li_total_observers, li_index, li_tmp_index = 1

// get total number of observers
li_total_observers = upperbound(iw_observers)
// loop through observers and transfer them to the tmp
// array except for the passed observer
for li_index = 1 to li_total_observers
 if iw_observers[li_index] <> aw_observer then
 lw_tmp_observers[li_tmp_index] = iw_observers[li_index]
 li_tmp_index = li_tmp_index + 1
 end if
next
iw_observers = lw_tmp_observers
return true
public function boolean f_notify ();
// purpose: call the f_refresh() function on all attached observers. 
// this function should be called whenever changes are made to this subject
integer li_index, li_totalobservers
// get the total number of attached observers
li_totalobservers = upperbound(iw_observers)
// tell all attached observers to refresh
for li_index = 1 to li_totalobservers
 iw_observers[li_index].f_refresh()
next
return true
public function boolean f_refresh ();
// purpose: resync to new state of the subject this function should be 
// implemented in descendants to refresh based on changes to subject object
return true

Listing Two
w_inventory -- Inventory list window as the subject
controls:
dw_items from datawindow within w_inventory
window functions:
public function string f_getitemname ()
public function long f_getonhand ()
public function long f_getbackordered ()
public function long f_getonorder ()
public function long f_getreserved ()
public function string f_getitemname ();
// purpose: return the name of the selected item
// returns: string - name of selected item
return dw_items.getitemstring(dw_items.getrow(),"itemname")
public function long f_getonhand ();
// purpose: return onhand value of the selected item
// returns: long - onhand value
return dw_items.getitemnumber(dw_items.getrow(),"onhand")
public function long f_getbackordered ();
// purpose: return backorder value of the selected item
// returns: long - backorder value
return dw_items.getitemnumber(dw_items.getrow(),"backordered")
public function long f_getonorder ();
// purpose: return onorder value of the selected item
// returns: long - onorder value
return dw_items.getitemnumber(dw_items.getrow(),"onorder")
public function long f_getreserved ();
// purpose: return reserved value of the selected item
// returns: long - reserved value
return dw_items.getitemnumber(dw_items.getrow(),"reserved")
window events:

on open
// initialize the datawindow to the first row
dw_items.setrow(1)
dw_items.postevent(rowfocuschanged!)
end on
on close
// close all observer windows
do while upperbound(iw_observers) > 0
 close(iw_observers[1])
loop
end on
control events:
on dw_items.rowfocuschanged
// purpose: highlight the selected row and notify observers
// highlight the chosen row 
this.selectrow(0, false)
this.selectrow(getrow(), true)
// tell the window to notify all observers of the change
parent.f_notify()
end on

Listing Three
w_picture -- Item picture window as an observer 
controls:
p_item from picture within w_picture
instance variables:
private:
// holds a reference to the subject object
w_inventory iw_subject
window functions:
public function boolean f_refresh ()
public function boolean f_setsubject (ref w_master aw_subject)
public function boolean f_refresh ();
// purpose: update ourself to reflect the current state of the subject
// get the item name from the subject and set the bitmap on picture control.
if isvalid(iw_subject) then
 p_item.picturename = iw_subject.f_getitemname() + ".bmp"
end if
return true
public function boolean f_setsubject (ref w_master aw_subject);
// purpose: set the subject object
// set the instance variable used to hold the subject
iw_subject = aw_subject
// attach ourself with the subject
if isvalid(iw_subject) then
 iw_subject.f_attach(this)
end if
return true
window events:
on close
// purpose: detach with the subject
// detach with the subject
if isvalid(iw_subject) then
 iw_subject.f_detach(this)
end if
end on

Listing Four
w_piechart -- Item pie chart window as an observer 

controls:
dw_pie from datawindow within w_piechart
instance variables
private:
// holds a reference to the subject object
w_inventory iw_subject
window functions:
public function boolean f_refresh ()
public function boolean f_setsubject (ref w_master aw_subject)
public function boolean f_refresh ();
// purpose: update ourself to reflect the current state of the subject
// return if not attached to a subject
if not isvalid(iw_subject) then
 return false
end if
// set the piechart data elements based on the subject.
dw_pie.setitem(1, "quantity", iw_subject.f_getonhand())
dw_pie.setitem(2, "quantity", iw_subject.f_getbackordered())
dw_pie.setitem(3, "quantity", iw_subject.f_getonorder())
dw_pie.setitem(4, "quantity", iw_subject.f_getreserved())
return true
public function boolean f_setsubject (ref w_master aw_subject);
// purpose: set the reference to the subject and attach
// set the instance variable used to hold the subject
iw_subject = aw_subject
// attach ourself with the subject
if isvalid(iw_subject) then
 iw_subject.f_attach(this)
end if
return true
window events:
on close
// detach with the subject
if isvalid(iw_subject) then
 iw_subject.f_detach(this)
end if
end on

Listing Five
w_bargraph -- Item Bar chart window as an observer
controls:
dw_bar from datawindow within w_bargraph
instance variables:
private:
// holds a reference to the subject object
w_inventory iw_subject
window functions:
public function boolean f_setsubject (ref w_master aw_subject)
public function boolean f_refresh ()
public function boolean f_setsubject (ref w_master aw_subject);
// purpose: set the reference to the subject and attach
// set the instance variable used to hold the subject
iw_subject = aw_subject
// attach ourself with the subject
if isvalid(iw_subject) then
 iw_subject.f_attach(this)
end if
return true
public function boolean f_refresh ();

// purpose: update ourself to reflect the current state of the subject
// return if not attached to a subject
if not isvalid(iw_subject) then
 return false
end if
// set the barchart data elements based on the subject
dw_bar.setitem(1, "quantity", iw_subject.f_getonhand())
dw_bar.setitem(2, "quantity", iw_subject.f_getbackordered())
dw_bar.setitem(3, "quantity", iw_subject.f_getonorder())
dw_bar.setitem(4, "quantity", iw_subject.f_getreserved())
return true
window events:
on close
// detach with the subject
if isvalid(iw_subject) then
 iw_subject.f_detach(this)
end if
end on

Listing Six
m_frame -- Main Frame menu
type m_inventory from menu within m_view
type m_picture from menu within m_view
type m_piechart from menu within m_view
type m_barchart from menu within m_view
m_inventory from menu within m_view
on clicked
// purpose: open w_inventory window, which becomes the subject in this example
// open inventory window as subject
opensheet(w_inventory, w_frame, 0, Original!)
end on
m_picture from menu within m_view
on clicked
// purpose: open w_picture window as an observer and set subject to w_inventory
// open the w_picture
OpenSheet(w_picture, w_frame, 0, Original!)
// set its subject to w_inventory
w_picture.f_setsubject(w_inventory)
end on
m_piechart from menu within m_view
on clicked
// purpose: open w_piechart window as observer and set subject to w_inventory
// open the w_piechart window
OpenSheet(w_piechart, w_frame, 0, Original!)
// set its subject to w_inventory
w_piechart.f_setsubject(w_inventory)
end on
m_barchart from menu within m_view
on clicked
// purpose: open w_bargraph window as observer and set subject to w_inventory
// open the w_bargraph window
OpenSheet(w_bargraph, w_frame, 0, Original!)
// set its subject to w_inventory
w_bargraph.f_setsubject(w_inventory)
end on





































































PROGRAMMING PARADIGMS


Apple Fritters While ROM Burns




Michael Swaine


There's a lot of Apple-related stuff in the column this month, but not as much
as you might conclude from a cursory look. The bulk of it is about a book on
software design. Overall Macishness rating: 6. I thought you'd like to know.
In How to Drive Your Competition Crazy: Creating Disruption for Fun and Profit
(Hyperion, 1995), Guy Kawasaki itemized the steps Apple took to meet the
challenge of IBM's entry into the PC market in the early '80s: "First, we
created an advantage called the 'Macintosh user interface....' Second, we
fostered innovative applications like the desktop publishing program called
'Pagemaker....' Third, we incited customers to evangelize Macintosh and turn
it into a cause...."
Apple frittered away the first two advantages and is in the process of losing
the third. Although Apple's financial health is much better than reports would
suggest, corporations, unlike Mark Twain, can be killed by bad PR. And
although Apple has always been the kind of company that would consider it good
PR to make the cover of Rolling Stone, the recent cover story titled "The Fall
of Apple" just can't be what they had in mind.


Magic Recap


Everybody has a theory about the source of Apple's troubles, but at least you
can't blame those members of the Mac team who went off to start General Magic.
When I recently wrote about Alan Cooper's About Face: The Essentials of User
Interface Design (IDG Books, 1995), I quoted from Alan's critique of General
Magic's Magic Cap interface. The folks at General Magic had a different
perspective, and expressed it in our April 1996 "Letters" page. This is as it
should be. The truth is, the Magic Cap interface wasn't designed for me, or
for Alan, or for anyone who has a problem with cute bunnies. (Darn, I wasn't
going to do that.) It's a very interesting and innovative interface for an
interesting and innovative new platform (and I should probably get some help
for my bunny problem). If you want to make up your own mind about this
original and deep interface, you should read the book. 
The book to which I refer is Barry Boone's Magic Cap Programming Cookbook
(Addison-Wesley, 1995). The book is a fine tutorial on Magic Cap development.
You write Magic Cap programs on a Mac using a special C development
environment, a version of which is supplied on the included CD. Also in the
environment is a Magic Cap simulator, which seems to work very nicely. When
you want to test your code on a real Magic Cap device, the book explains how
to do that, and the CD has all the software you need. Programming for
communications is way easier for the Magic Cap environment than for the
Newton, because most of the work is already done. Although this is an
introductory book, it actually provides some useful communications program
examples. What puzzles me is why the book's cover artist decided to illustrate
the Magic Cap Programming Cookbook with a pot boiling on a stove, above which,
in the rising steam, floats the ghost of the Magic Cap bunny.


Mikhail Spindler


If I have underpraised the Magic Cap interface, I may have overpraised The
Power Mac Book. No sooner did this book by Ron Pronk come out, decorated by
flowery praise from me, than two people who know a lot about microprocessors
in general and programming for the PowerPC in particular called to tell me
there were a few mistakes in the book. There's now a new edition (The Power
Mac Book, Second Edition, Coriolis Group Books, 1996) that, Ron acknowledges,
fixes a number of flaws and adds a lot of new material.
This book tries to do a lot: Present the history and corporate politics behind
the PowerPC chip, second-guess Apple corporate strategy, and present the basic
facts on Apple's product lines in a way that a general reader can understand.
It's more successful in some of these attempts than in others. The discussion
of RISC versus CISC, in particular, will probably still strike my more
technical friends as oversimplified. Pronk analyzes Apple's strategic blunders
and blinders at length. Whether or not you see 'em as he calls 'em, it's nice
to see someone play the popular game under the constraint of having to be
consistent for a whole chapter. Those of us who play the game in monthly
magazine editorials and columns have it easier: We can lambaste Apple for
laying off employees one month, for not licensing clones early enough the next
month, and for giving away its interface to Microsoft the next--without having
to worry about what relationship, if any, there may be among these strategic
decisions (or nondecisions). I do like Pronk's analogy between Michael
Spindler and Mikhail Gorbachev, although it's more than a little generous to
one of those Mikes.
Although the book is directed at a non-technical or not-very-technical
audience, it does serve as a handy collection of facts about Power Macintoshes
that could be useful to anyone who supports or writes software intended to
serve the entire line, from the first NuBus Power Macs to the clones to the
new 604 machines.
Okay, I just have to get this off my chest. The World Wide Web was created,
more or less, so that the wheel did not have to be reinvented daily, but could
instead be invented once and linked to. One consequence of the Web, though, is
that everyone who writes a book feels compelled to include a chapter on the
origin of the Internet or the World Wide Web, and
this book is no exception. For the benefit of future book authors, here's the
URL at which you can find all the documents pertaining to the origin of the
Web: http://www.cern.ch. Unless you know the story better than Tim
Berners-Lee, just give the link, okay? I feel better now.


Tog Story


The General Magic founders are not the only Mac magicians who have taken their
magic elsewhere. Bruce Tognazzini actually is an amateur magician, a hobby
that he claims has helped him a lot in his true calling as a software
designer/critic. Apple kept him up its Macintosh sleeve for years. Then, in
1992, perhaps after seeing his shadow, Tog sloughed off that winter's garment
and went out to play in the Sun. He is currently a Distinguished Engineer in
the Office of Strategic Technology at Sun Microsystems. Tog on Software Design
(Addison-Wesley, 1996) is, sort of, his Sun book, as Tog on Interface
(Addison-Wesley, 1992) was his Apple book. The centerpiece of the book is a
video script produced at Sun to prototype a vision of computing in the near
future. The goal, one assumes, was to help the Office of Strategic Technology
determine which technologies would be strategic. Tog presents the script,
complete with outtakes, early in the book, and then discusses software design
in the context of the technologies demonstrated in the video, which is called
"Starfire." (Well, not demonstrated. Faked, actually. This video is in the
Apple tradition of whizzy demos. The big difference is that all the
technologies in the Starfire video are supposed to be at least close to real.)
There are, then, two reasons to read Tog on Software Design: as a book on
software design principles and advice, and as a white paper on Starfire, Sun's
next-generation computing project.


Tog in Review


Tog's 1992 Tog on Interface was not as designed as Tog on Software Design. The
older book was a collection of usually useful, generally well-researched, and
often very funny essays offering advice on user interface issues, mostly
specific to the Macintosh user interface. Tog on Software Design has its
chapters of useful advice on specific software design issues, like
transparency (it's not a helpful concept), consistency (don't confuse it with
uniformity), and ease of use (it fixes the upper limit of users'
productivity).
But Tog on Software Design puts its advice in a larger context, a perspective
on where computers are going in the next decade. Anybody can spin out trends
and make predictions, and Tog spends a significant chunk of this book in
futurist mode. What makes this of more than casual interest is that the value
you'll attach to at least some of his advice hangs on how seriously you take
his extrapolations. But then, Sun is taking these extrapolations seriously
enough to fund Tog and the Starfire video team.
Another difference between this book and the earlier one is that the advice is
less platform specific. It's not that Tog doesn't deal with specifics of
software design. He does. He does it in terms of specific products for
specific platforms, as when he praises details of the design of several
Mac-based products from HSC Software, where innovative software designer Kai
Krause routinely breaks the rules to create great software. (In fact, most of
his concrete examples are Mac programs.) He does make comparisons across
platforms, although they are often broad generalizations:
The original Macintosh was so sensitive to the needs of the new users that
anyone using it for more than around a month ended up hopelessly frustrated.
UNIX, on the other hand, threw every single possible option in the new user's
face simultaneously, then demanded that the new user understand each option
well enough to decide whether it is needed.
A lot of us might agree with his assessment of the Mac, but I think we could
present a good argument that, in certain contexts, UNIX does just the opposite
of what he says. He does follow these generalizations with sound advice:
Never present an expert-user option in such a way that normal users must learn
all about it in order to know they don't need to use it.
And he follows this with a concrete example--the advanced paste capability in
Microsoft Word--that shows clearly how to do it right. Compared to About Face,
Tog on Software Design has fewer concrete examples, less practical advice on
specific problems of software design, and less software design advice overall.
But Tog on Software Design has its unique virtues. Tog's advice is more
grounded in research. Does this actually matter? Isn't it better to get the
advice of someone who's down in the trenches, like Alan Cooper? Actually, both
perspectives have their merit. Toward the end of the book, Tog presents the
results of some solid research that shows conclusively that research is
absolutely crucial to software design. If that doesn't convince you, I don't
know what will.


The Starfire Project


The other thing that distinguishes Tog on Software Design from About Face (and
from most books on software design and user interface design) is its heavy
emphasis on trying to predict the near future. Tog predicts at the level of
markets: "Keep your eyes peeled for successful enclaves. They will lead you to
the future." And he charts the future of hardware and software and the
Internet.

But the real reason Tog and the OST engineers at Sun want to make these
predictions has nothing to do with writing books on software design. Starfire
is Sun's exploration of where the computer industry and Sun should be heading
if the next generation is to be something better than the current generation.
Tog and the Starfire team apparently take these predictions pretty seriously,
and have formulated a vision based on them. That's the video. In case you're
interested, Sun will sell you Starfire: The Director's Cut for something like
fourteen bucks.


Unsung Heroes of the Computer Revolution


Long before the first "... for Dummies" book explained computers to the Great
Unwashed, there was Russ Walter's The Secret Guide to Computers (you can order
the self-published book by calling Russ at 617-666-2666). Russ may not be so
unsung to programmers in the Boston area, where his profile is higher, but he
deserves much more fame than he currently has. Ironically, more fame is
probably just what he doesn't need, for reasons that I will explain
momentarily. So maybe you could just make obeisance to the east and then keep
this under your hat.
One reviewer called The Secret Guide to Computers "the greatest single
self-teaching book on the subject of computers that I've ever read." Here's
Russ on the origin of C, which at least hints at his style:
In 1963 at England's Cambridge University and the University of London,
researchers developed a "practical" version of ALGOL and called it the
Combined Programming Language (CPL). In 1967 at Cambridge University, Martin
Richards invented a simpler, stripped-down version of CPL and called it Basic
CPL (BCPL). In 1970 at Bell Labs, Ken Thompson developed a version that was
even more stripped-down and simpler; since it included just the most critical
part of BCPL, he called it B. Ken had stripped down the language too much. It
no longer contained enough commands to do practical programming. In 1972, his
colleague Dennis Ritchie added a few commands to B, to form a more extensive
language. Since that language came after B, it was called C. So C is a
souped-up version of B, which is a stripped-down version of BCPL, which is a
stripped-down version of CPL, which is a "practical" version of ALGOL.
From his perspective, Russ sees his coverage of C programming as somewhat
outdated and plans to revise it in future editions. The 21st edition will be
out about the time you read this column; I have in my Quaint Old Computer
Books library the 10th edition, which was published in 1980. It has an
excellent comparison of the IBM Selectric golf ball print mechanism versus the
Teletype model 33 cylinder, daisy wheels, and thimble print mechanisms. Russ
is famous for publishing his phone number in the book and inviting readers to
call him with their questions. He gets about 60 calls a day, the last I heard,
at all hours. That's why I'm not so sure he needs more fame. In addition to
calling Russ directly, you can find out more about the $15.00 The Secret Guide
to Computers (discounts available when you buy multiple copies) by e-mailing
ordway@ionet.net.


Scripting the Internet


Okay, I'm still using HyperCard and HyperTalk after all these years. I do find
myself integrating other scripting languages with HyperTalk--dropping an
AppleScript script into a HyperCard button here, hooking into some Frontier
tool there. And I'm finding that my HyperTalk scripting is having a rebirth as
I develop tools for my own use in browsing the Net and exploring Web server
technology.
The latest tool I've been playing with is Marionet, from Allegiant
(http://www.allegiant.com), the company that now owns SuperCard. It's an
inspiring story: Big company buys product, lets it languish, then finally sets
it free. Loyal users have been getting support from loyal creators of product
all this time. Loyal creators scrape together nickels and dimes, acquire
rights to product again, start a new company to enhance and support it. In a
few months, they're putting out new versions and working hard on a Windows
port.
And now they're doing new products, too.
Marionet is not an add-on to SuperCard or HyperCard, but a separate product.
It's basically a faceless background application that knows about all sorts of
Internet protocols and is callable via a HyperCard or SuperCard external
command, AppleScript, Frontier, or raw AppleEvents from C or other languages.
It does most of the grunt work of the interfacing, letting you put things
together very quickly. What you give up, obviously, is detailed control, but
sometimes that's an acceptable price. I'm playing around with it as a way to
hook up databases to my Web site without much programming. Marionet thinks in
terms of sessions, so you script it with session commands as in Example 1. 
The point? I'm finding that I am able to put together solutions to common
problems in very short order. Marionet may not be your magic lantern for Web
work, especially if you're working in Windows or UNIX, but something like it
may well be. The Web is such a fast-moving target, scripting tools have got to
be a big part of your arsenal.
Example 1: Scripting Marionet sessions.
Marionet "BeginSession"
Marionet "SendMail", sessionID, serverName, from, toList, messageText {,errorVar}
Marionet "EndSession"






































C PROGRAMMING


In the Doghouse Again




Al Stevens


I am not exactly an old dog. New tricks are within my grasp if they aren't too
extreme. But sometimes they come to me reluctantly and don't want to stick.
Maybe I'm a middle-aged dog. (Last year I bought a mid-life Chrysler.) Here's
an example: Several years ago I devoted a full summer to learning Rhapsody in
Blue (RIB) on, of course, the piano. This was no treat for my condo
neighbors, mostly retired, who had to listen to tedious, all-day practice
sessions for three months. By the end of that summer, and just as my neighbors
were about to form an elderly but organized lynch mob, I had RIB almost to
performance standards. I could rip the whole thing out end to end, including
that bothersome staccato passage (which I am convinced Gershwin included
solely to show off his pianistic technique).
I've since moved to the country: Those neighbors enthusiastically offered to
throw me a going-away party and help me move--especially the piano. Today,
having neglected RIB for a few years, I couldn't get past the first five
measures, and I can't imagine how anyone can play that stupid staccato
passage. See what I mean? If I had memorized it when I was a teenager, RIB
would be with me forever. But, since I waited until well after the decline of
my formative years, RIB didn't stick to my, er, ribs. Which, remarkably,
brings me to encapsulation.
At about the same time that I took on RIB, object-oriented programming (OOP)
was seeping into the mainstream. I credit the 386-class PC for bringing C++ to
the masses, and Windows programming for making it downright necessary. At
night, while my neighbors gratefully slept, I studied the few books available
on C++ and OOP and experimented with early MS-DOS C++ compilers. By the end of
that same summer, I had a good understanding of the syntax, if not the
implications, of C++, and I thought I understood OOP, particularly
encapsulation. The power of the C++ class was clear; by allowing us to add
data types to the programming language, the class mechanism makes C++ a truly
extensible language. Encapsulation, which is one part of class design, is a
more rigorous form of that old chestnut--information hiding--and everyone knows
how good that is.
I relearned that lesson every time I wrote a program. When you encapsulate the
implementation and interface of a class, the class is more rugged and the
program is more reliable. But that lesson, although learned by example and
practice, does not persist willingly, at least not in my doghouse. I know the
lesson, I teach it, I consult for clients and recommend that they apply it,
and sometimes I forget to use it in my own programs.
Case in point. Download and study the source code for the first version of
Quincy 96, and you will find a conspicuous dearth of encapsulation.
Quincy 96 is a Windows 95 IDE front end to the gnu-win32 port of the GNU C and
C++ compilers. As such, Quincy 96 includes an editor, project manager,
compiler driver, and debugger. I've discussed these features and their
implementations in recent columns. During the development of Quincy 96, this
dog learned a lot of new tricks, mostly about Windows 95 programming and MFC.
But a close examination of the source code reveals that, while I used the
application/document/view class architecture of MFC, I eschewed the use of
encapsulation in my own design unless it was handed to me on a platter. Of the
four components, only the editor and project manager enjoy encapsulated
implementations because they derive from MFC's CEditView class for editing and
CListView to list the source-code files in a project. The other two
components, the compiler and debugger, are intertwined in the application
class in exactly the fashion that you would expect to see employed by an old C
programmer. The data members are, essentially, static and global because all
the application's functions can see them and because the member objects
persist for the life of the application, needing individual initialization for
each new compile or debug session.
And the program does not work all the time. And I could not figure out why.
Something should have perched on my shoulder and yelled "Encapsulate, you
dog!" in my ear. Nothing did.
The compiler and debugger components have similar operations up front. They
both launch other programs and they both watch those programs execute. I used
the application class's OnIdle function to watch those executions. In response
to the column that explained how I implemented the compiler and debugger,
NuMega's John Robbins, who is a debugger expert in his own right, wrote to
suggest that I use a Win32 thread and some events to operate and manage the
debugged program from inside the IDE. He kindly provided some pseudocode,
which sent me scrambling to the dreaded MFC docs to figure out how threads and
events work.
Another reader suggested that I use the WaitForSingleObject function to
determine that a compiler program has completed. I implemented both
suggestions, and the program worked better, but there were still several
predictable ways to crash not only Quincy 96, but Windows 95, too, during a
debug session. Some of the crashes involved the user doing unexpected things
to console windows--such as closing them using the X button on the title bar
while the debugger thought the program was still running. Other crashes
occurred following a set sequence of operations with the debugger in a set
sequence of debugging sessions.
Something landed on my shoulder and whispered in my ear, "One of those many
data members in that complex and tightly coupled design is not getting
properly initialized between sessions." From my other shoulder, finally, came
the yell, "And guess why? You dog!"
Every old dog programmer knows that you can make some bugs go away simply by
terminating the program and starting with a clean copy--like a new flea
collar. The program will run at least one iteration without a problem. The bug
comes later in subsequent iterations of the same execution, when the flea
collar starts to show some wear. This is one problem that encapsulation
specifically attacks. An object, properly constructed when instantiated and
properly destroyed when it exits scope, exhibits the same behavior as a
program that runs only one iteration. Objects of a reusable class do not have
to be reexecutable. The solution to my problem became obvious:
Encapsulate those independent components of the IDE into classes, instantiate
objects of those classes only while using the components, and the problems
stemming from component reexecution will disappear. It's as if the dog gets a
new flea collar for each new flea.
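The flea-collar principle can be sketched in a few lines of C++. This is a hypothetical illustration, not Quincy 96's actual code; the names DebugSession, breakpointCount, and targetRunning are invented for the example. Application-lifetime data members depend on a reset routine remembering every field, while the encapsulated class gets a clean state from its constructor on every instantiation.

```cpp
#include <cassert>

// Hypothetical sketch, not Quincy 96's actual code. State that lives for
// the life of the application must be re-initialized by hand before every
// session, and a reset routine can silently miss a member.
namespace appwide {
    int  breakpointCount = 0;
    bool targetRunning   = false;
    void ResetForNewSession() {
        breakpointCount = 0;
        // Oops: targetRunning is never reset. The first session works;
        // later sessions inherit stale state, like a worn flea collar.
    }
}

// Encapsulated alternative: instantiate a fresh object per session, and
// the constructor guarantees every member starts from a known state.
class DebugSession {
public:
    DebugSession() : breakpointCount(0), targetRunning(false) {}
    int  breakpointCount;
    bool targetRunning;
};
```

Destroying the object when the session ends completes the symmetry: construction and destruction stand in for program startup and termination.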
The new version of Quincy 96 encapsulates the debugger operations into a
Debugger class, and the problems are mostly cleared up. I have not
encapsulated the compiler operations because they work, and I hesitate to mess
with something that works. But once again, I have relearned the encapsulation
lesson, one that should never have been forgotten.
Now let's ask why some of us keep writing code the old-fashioned way. Having
given it a great deal of thought, and being strongly motivated to attribute
the trait to something more acceptable than an inability to learn new tricks,
I have reached the following conclusion: We always underestimate the size and
complexity of an unwritten program. In our minds, the code is going to be
trivial because we have a clear picture of the process. This is particularly
true of today's interactive GUI programs that are mostly user interface
modules using canned component objects. Now, because we view the program as
being small, we want to get it all down in as few source code files as
possible. Seems reasonable. Each new class is a bother because it means a new
header file, a new implementation file, and all those pesky interface
functions. First we have to create a couple of new source-code files. Then we
have to keep loading them into the editor to add functionality and to see how
the classes work in the rest of the program. Since the program is going to be
small, why not keep all that stuff together? Only later, when the program
becomes large and unwieldy, do we rue that decision. I'm not saying that it
happens every time. I'm not saying that it happens to everyone. I'm not saying
that it happens to me every time. I'm just saying that it happens; therefore I
am not an old dog. Woof.


Herman: An Odyssey


Herman is a new "C Programming" column project. Herman is a small document
viewer program written in Visual C++ with MFC. I call it "Herman" because I
doubt that anyone would want that name for a commercial software product
title, and I am, therefore, unlikely to be treading on some obscure trademark.
(That is precisely the reason I did not call it "BookWorm," my first choice.
Someone is already using that moniker for a multimedia viewer program.) If, as
can happen, I turn out to be mistaken, please let me know, and I will, as
usual, wimp out and change the name to something else.
Herman's purpose is to display the contents of three books that I wrote. You
will recall that Quincy 96 supports a C/C++ programming tutorial CD-ROM that I
am developing to be published by Dr. Dobb's Journal. The lessons and exercises
are taken from those books, and one of the features of the CD-ROM is that
students will be able to read the full text of the books onscreen. The
tutorial program, written in Asymetrix OpenScript, calls Herman with DDE
protocols. It tells Herman to open a particular chapter in one of the books,
and to page to and display a specified topic. That way, students can navigate
easily between the interactive tutorial and the books. Once in the Herman text
viewer, students can use the menus and tables of contents to read anything
that interests them.
Herman is a general-purpose viewer, not hard-coded to the three books that I
am using it for. When it starts up, the program scans the current subdirectory
for files with a .toc extension. The names of those files, which use long
filenames with spaces, are the same as the titles of the books that they
represent. Herman uses the names of these files for two purposes: to enumerate
the books on a menu and to derive the name of a subdirectory that contains the
book's text files.
In addition to providing the book's name, the .toc file contains a list of
files that represent the chapter text, and an indented table of contents for
the entire book. Figure 1 is an abbreviated version of the file named "Al
Stevens Teaches C.toc," which, for this example, has only a preface and two
chapters. From Figure 1 you can deduce that there are three text files (named
preface.rtf, chapter1.rtf, and chapter2.rtf) and that those files are in a
subdirectory named "Al Stevens Teaches C."
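As a sketch of how a .toc file in the Figure 1 format might be parsed (hypothetical code with invented names, not Herman's actual parser), the file reduces to two sections separated by the <end> sentinel:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch of a .toc parser (invented names, not Herman's code).
// Chapter filenames come first, then an "<end>" sentinel, then the indented
// table of contents.
struct TocFile {
    std::vector<std::string> chapterFiles;  // e.g. preface.rtf, chapter1.rtf
    std::vector<std::string> contents;      // indented table-of-contents lines
};

TocFile ParseToc(std::istream& in) {
    TocFile toc;
    std::string line;
    bool inContents = false;
    while (std::getline(in, line)) {
        if (line == "<end>") { inContents = true; continue; }
        if (line.empty())    continue;
        if (inContents)
            toc.contents.push_back(line);
        else
            toc.chapterFiles.push_back(line);
    }
    return toc;
}
```

The book's title and its text subdirectory come from the .toc filename itself, so neither appears inside the file.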
Herman's text files use the Rich Text Format (RTF). This was an obvious choice for
three reasons: First, I already have the text files in Word for Windows
format, with graphics, paragraph styles for text, several levels of headings,
italics, and boldface already in place. Second, Word for Windows can export a
document to RTF. Third, MFC includes a custom edit control that supports RTF.
Finding a particular paragraph in a book is a straightforward process. Herman
accepts the book title, chapter name, and paragraph heading as arguments. The
program opens the book's .toc file if necessary, then associates the chapter
name with its file name (based on the position of the text among the other
chapter names, and the position of the filename in the filename list). The
program opens the chapter file and searches the text for a match on the
paragraph heading. The matching text must consist of a single line in the
heading text color. This method assumes that paragraph headings are unique
within the chapter. If time permits, I'll add hypertext links and a search
engine. Time, however, draws short due to problems I am about to describe.
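The position-based association just described might look like this hypothetical sketch (invented names, not Herman's actual code): the i-th chapter name in the table of contents corresponds to the i-th entry in the filename list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch (not Herman's actual code): map a chapter to its
// file purely by position--the i-th chapter name selects the i-th file.
std::string FileForChapter(const std::vector<std::string>& chapterNames,
                           const std::vector<std::string>& chapterFiles,
                           const std::string& chapter) {
    for (std::size_t i = 0;
         i < chapterNames.size() && i < chapterFiles.size(); ++i)
        if (chapterNames[i] == chapter)
            return chapterFiles[i];
    return "";  // chapter not listed
}
```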
I hope that Herman evinces all the sound design virtues extolled in my
discussions on Quincy 96. The real story, however, is found not so much in how
the final program works, as in what I went through to get it to work. The
Odyssey begins.
Figure 2 displays Herman with an open chapter. Refer to that figure as you
read the story of how it came to be.
Herman is not a typical MFC document/view program. Nor is it an OLE server or
container. You don't open files or edit the text. The chapters are
self-contained, so there are no embedded objects. There is no Herman document, per
se, that can be embedded in other applications. For those reasons, I decided
not to use the MFC document/view architecture. But I wanted the MFC
application architecture to handle menus, window titles, and so on.
Consequently, I used the Visual C++ Developer Studio to build an SDI program
with no bells and whistles. Then I removed the default-derived CDocument and
CView classes, took the command line stuff out of the overloaded InitInstance
function, instantiated an object of the CMainFrame class, and put a call to
the Create function into its constructor. Now I had a skeleton application
with nothing in it.
Next I derived a class from CTreeCtrl to manage the table of contents and a
class from CRichEditCtrl to be the text viewer. I put instances of these
classes into the CMainFrame class and allowed them to share space in the
CMainFrame object's client window. Observe Figure 2 again. Herman uses a
so-called "splitter" window. You can slide the splitter back and forth to
change the ratio of the two views in the main frame window. I was unable to
use the MFC CSplitterWnd class to implement the splitter in the first version of
the program because that class depends on an orderly document/view
architecture, which I had eliminated. No problem. I instantiated two bordered
windows as children of CMainFrame to represent the panes. Then I implemented
my own splitter window by intercepting mouse drags and resizing those two
panes. That worked great, although the confluence of the two window borders is
not as slick-looking as MFC's splitter bar (shown in Figure 2). 
Onward to implementing the text viewer.... I tried to use MFC's CRichEditView
class, but ran into the first wall. Objects of CRichEditView insist on being
views of document classes. So I created a document class from CDocument and
found that CRichEditView insists on being a view of a CRichEditDoc class.
CRichEditDoc is buried in the MFC class hierarchy way down under all the OLE
stuff, which I didn't want. I subsequently decided to throw out the
CRichEditView code and use only a CRichEditCtrl object for the text viewer.
That choice involved a day of fooling around with that class's StreamIn
function and all its stupid callback nonsense. The MFC docs are wrong in their
discussion of the return value from the callback function. Eventually, I tried
enough different combinations, and the text viewer worked.
Next came the table of contents. Following my earlier success with using a
control as a child of the main frame window, I simply used the CTreeCtrl class
as the base class for a table of contents class. Everything worked like a
charm. I added the DDE stuff to allow the tutorial to call the program
requesting a particular paragraph in a particular chapter of a particular
book. I added the navigation code that chooses the right text when the user
expands and selects items in the table of contents, and all was well.
Until the other shoe dropped. 
I had been testing with small files of text from the first draft of a
manuscript. Just to be sure everything worked, I built a full chapter that
included figures and loaded the chapter into Herman. Guess what? Everything
didn't work. Amazingly, the CRichEditCtrl class totally ignores graphics. The
graphics metafile data is in the .RTF file, but apparently the control
bypasses it when loading data. As an experiment, I built a quick-and-dirty MFC
application with all the OLE stuff and the CRichEditView and CRichEditDoc
classes, loaded the .RTF file, and the graphics displayed just fine.
I rushed to the MFC forum on CompuServe and posted an urgent message asking
anyone to tell me how to get CRichEditCtrl to display graphics. That was two
weeks ago. I'm still waiting for at least an acknowledgment, not to mention an
answer. Forget CompuServe if you need help with MFC.
Much time and effort was burned as I tried many different combinations. It was
clear that I had to use CRichEditView and CRichEditDoc if I wanted graphics in
my displays. I built a single-document version of the application without the
table of contents and got it working. Then I added the CSplitterWnd object to the
main frame window and put CreateStatic and CreateView calls into the main
frame's OnCreateClient function, the way the docs tell you to do. Now I needed
to add the table of contents again. It seemed obvious that I should use the
CTreeView class since the editor needed document/view support. I'll jump over
all the false starts and tell you this: If you have a single-document
application and one of the views uses CRichEditView, forget using any other
canned view. As near as I can tell, CRichEditDoc doesn't care to host any view
that is not derived from CRichEditView.
Just as I was about to surrender, Addison-Wesley sent me a copy of a new book
called MFC Internals written by fellow DDJ columnists Scot Wingo and George
Shepherd. At first glance, it looks very good. I looked into what they said
about the CSplitterWnd class and read that a pane can host a window class other
than one derived from CView. That information got me pointed in the right
direction at last.
I tried using my CTreeCtrl-derived table of contents class as the table of
contents pane, which worked in the text-only version of Herman. Everything
compiled and ran, but the table of contents did not display anything and
appeared to be empty. As an experiment, I instantiated a second object of the
class, superimposed it over the dynamically created one, and the second tree
displayed, but the CSplitterWnd object would not allow this second object to
receive any mouse clicks. This problem drove me to extremes; I used the
Developer Studio debugger to trace through MFC code and observe the
differences between how the two objects were treated. I saw that the
CSplitterWnd class's dynamic creation of the derived CTreeCtrl class causes the new object
to lose its identity in MFC's kludgy pre-RTTI type-identification
implementation. Without its identity, the object refused to accept and process
the message that was supposed to add text to the control's tree-data
representation.
By this time, I was slavering and babbling and running around outside and
beating my head against the fender of my mid-life Chrysler. Judy came running
out and shook me and slapped me severely several times to make me regain my
composure and come to my senses. She seemed to enjoy it. If memory serves, she
was smiling. I did not say, "Thanks, I needed that."
Today, calmed and with the restraints removed, I reached the final solution,
and it seems simple now that I know it. Example 1 shows the main frame's
OnCreateClient function, which associates the table of contents and editor
views with the main frame's splitter window. The CTocWnd class is derived from
CWnd and has an embedded CTreeCtrl object to implement the table of contents.
To permit that object to receive keystrokes and mouse clicks, I had to make
the CTreeCtrl window a child window of the CSplitterWnd window. To allow the
CTreeCtrl object to communicate with the text display, I had to let the main
frame window receive the WM_NOTIFY messages that the CTreeCtrl object sends to
report its activity. That was all there was to it. Today was a good day.


Howland OWL


You may recall that a while ago I got embroiled in controversy when I stated
my preference for MFC over OWL. The OWL mavens came from everywhere to defend
their favorite framework. Watch what happens now. Some smart OWL programmer is
bound to write and tell me that not only are all of Herman's features possible
by writing about three lines of OWL code, but that the OWL documentation
includes comprehensive explanations and at least 12 different examples of
precisely what I was trying to do. Make me howl and bark.



About: Much Ado About Nothing


When you use the Visual C++ Developer Studio to construct the skeleton of a
new application, the menu always includes a Help popdown with an About
command. The source-code file for the implementation of the derived CWinApp
class includes about 50 lines of generated source code to declare a CAboutDlg
class complete with an empty message map, a do-nothing DoDataExchange
function, lots of comments, and an OnAppAbout command function to instantiate
and execute an object of the class when the user chooses the About command.
All that code clutters the program because it is unnecessary. Example 2 shows
the minimum code needed to achieve precisely the same effect.
The implementation in Example 2, which I installed in Quincy 96 and Herman,
supports a dumb About dialog box--one that simply displays itself and waits
for the user to close the box, which is exactly the kind of default About
dialog box that the Developer's Studio generates. If you want a more
intelligent About dialog box, you should start with what the Developer's
Studio generates and build upon that code instead of using the minimum
implementation that Example 2 shows.


Source Code


The source code files for Quincy 96 and Herman are free. You can download them
from the DDJ Forum on CompuServe and on the Internet by anonymous ftp; see
"Availability," page 3. To run Quincy, you'll need the GNU Win32 executables
from the Cygnus port. They can be found on ftp.cygnus.com in pub/sac. Get
Quincy 96 first and check its README file to see which version of gnu-win32
you need. Every time they release a new beta, I have to make significant
changes to Quincy 96. As I write this, the latest beta is Version 13 and
Quincy 96 works with Version 10.
If you cannot get to one of the online sources, send a 3.5-inch high-density
diskette and a self-addressed, stamped mailer to me at Dr. Dobb's Journal, 411
Borel Avenue, San Mateo, CA 94402, and I'll send you the Quincy and Herman
source code (not the GNU stuff, however--it's too big). Make sure that you
include a note that says which programs you want. The code is free, but if you
care to support my Careware charity, include a dollar for the Brevard
Community Food Bank. 
Figure 1: Abbreviated version of "Al Stevens Teaches C.toc."
preface.rtf
chapter1.rtf
chapter2.rtf
<end>
Table of Contents
 Preface
 Chapter 1. An Introduction to Programming in C
 A Brief History of C
 C's Continuing Role
 Chapter 2. Writing Simple C Programs
 Your First Program
 Identifiers
 Keywords
 Variables
 Characters
 Integers
 Constants
 Expressions
 Assignments
 Summary
Figure 2: The Herman application.
Example 1: CMainFrame::OnCreateClient function.
BOOL CMainFrame::OnCreateClient(LPCREATESTRUCT,CCreateContext* pCtxt)
{
 m_wndSplitter.CreateStatic(this, 1, 2);
 m_wndSplitter.CreateView(0,0,RUNTIME_CLASS(CTocWnd),
 CSize(130, 50),0);
 m_wndSplitter.CreateView(0,1,RUNTIME_CLASS(CViewView),
 CSize(0,0),pCtxt);
 m_pTocWnd = (CTocWnd*)m_wndSplitter.GetPane(0,0);
 m_pViewer = (CViewView*)m_wndSplitter.GetPane(0,1);
 return TRUE;
}
Example 2: Minimum OnAppAbout implementation.
void CYourApp::OnAppAbout()
{
 CDialog aboutDlg(IDD_ABOUTBOX);
 aboutDlg.DoModal();
}



JAVA Q&A


How Do I Write a Chat Program?




Cliff Berg


Cliff is vice president of technology at Digital Focus. He can be contacted at
cliffbdf@digitalfocus.com. Submit your questions to the Java Developer FAQ Web
site at http://www.digitalfocus.com/faq/.


Chat programs allow people to communicate simultaneously across a network.
Everything you type into a chat window is broadcast to the screens of all
other users currently logged into the program. Chat programs originated with
simple text-only capabilities (like the UNIX talk program) and have evolved
into elaborate graphical environments. Unfortunately, everyone who wants to
"chat" has had to obtain and install compatible chat programs. Java solves
this problem by making it possible for chat programs to be hosted on a Web
site and downloaded for execution with the click of a button--no installation,
no purchase. Just click and start chatting!
Chat programs are genuinely useful. Many chat programs allow discussions to be
saved in a log file, as a record of points made or decisions reached (or jokes
told). In short, chat programs are fun, but they can also be used to do real
work. We've used a chat program to coordinate the activities of software
developers separated by continents, and without adding a cent to our phone
bill! (One relative of mine even uses a chat program to communicate with
family in Rangoon.)
Real chat programs have features such as private chat "rooms," special
identities, and so on. I'll ignore these embellishments to demonstrate the
fundamental techniques.
The two basic chat architectures are peer-to-peer and client/server. A
peer-to-peer design requires no server program, but is far more complex to
implement. A client/server design requires a chat "server" program that serves
as a switchboard or data-routing center for all ongoing chat sessions.
In this case, the server program runs on a UNIX Web server, since this is the
typical configuration. The Java chat program can then be hosted on a Web page
on the server machine, with the chat program server running on the same
machine. In essence, to make this work, you need:
A chat server program.
A Java chat client program.
An HTML page that loads the Java chat client program.
When someone goes to the HTML page that has the Java chat client on it, the
Java client is automatically fetched and run on the user's machine. The client
program will then set up a communication session with the chat server program,
which must be running all the time on the Web server. Once the session is
established, users can then join any ongoing chat session.
Figure 1 illustrates the basic architecture of the system. All data
(keystrokes) entered by any client is sent to the server, which then
rebroadcasts the data to all clients. It is conceptually simple, and thanks to
Java's streamlined communications-programming classes, is surprisingly
straightforward to implement.
There is a hitch: Most Web servers still do not have Java available on them,
and Java is needed to run the server component if it, too, is in Java.
Therefore, we'll write the server in C++, and the client in Java. You can then
put the client on a Web page: The Web server does not need to understand Java
to host Web pages with Java on them. The only thing that is required is that
the client platform (typically Netscape) be able to run Java.


The Java Client


The chat client has three main classes: ServerChat, UserChat, and Chat, the
last of which is an applet class that instantiates the other two. ServerChat,
a thread class, is capable of running asynchronously in its own thread. This
is good because its main job is to monitor the connection with the chat server
program across the network and handle asynchronous incoming data from the
server. UserChat, on the other hand, is the component that handles
asynchronous data (input characters as they are typed) from the user. UserChat
forwards these characters to the server program over the communication line.
The most complex part of the program is the ServerChat class. Yet Java makes
even this component easy to write compared to an equivalent program in C. You
don't have to worry about things like network byte order or any details
usually required for TCP/IP programming.
ServerChat has a constructor and two primary methods: readln() and run().
ServerChat extends the TextArea class, so it inherits all of the behavior and
features of a Java TextArea, which is a scrollable text window. Data retrieved
from the server is displayed in this window. ServerChat adds an important
feature to TextArea--a thread. The constructor for ServerChat creates a new
thread for itself to run in; later on, when the applet initializes itself, it
will call the thread's run() method (actually it calls start(), which then
calls run()), which starts the thread running.
ServerChat's run() method loops indefinitely, blocking on a read on the
communication link. Whenever a character is received, the thread resumes and
adds the incoming data to its text area by calling insertText(), automatically
scheduling a repaint of the text area. Thus, incoming data is asynchronously
displayed on the screen.
Retrieving data from the server link is achieved by the readln() method, which
performs a read on the socket's input stream and tests for the case in which
there is no data to read. It returns any retrieved data as a Java string
value.
The UserChat class also extends TextArea. Thus, user input is entered
into this text area (one window), but data received from the server is
displayed in ServerChat's text area (another window). If you prefer a single
window, you can refrain from making ServerChat extend TextArea, and modify
ServerChat to do its insertText() on the UserChat window instead.
UserChat has one primary method, write(). It converts a Java string to an
equivalent stream of ASCII bytes, sending them to the server on the socket's
output stream.
The Chat class has a constructor and three primary methods in addition to a
main method for running this applet as a stand-alone program from the command
line. The three methods are init(), makeConnection(), and handleEvent().
init() establishes a connection with the chat server program on the Web server
by calling makeConnection(). If it succeeds, it then starts the ServerChat
thread.
Java has built-in support for TCP/IP. This is implemented in the Socket and
related classes. To establish a TCP/IP connection with a remote server, you
create a Socket object. To send and receive data across that socket, you get
the input and output stream handles of the socket. This is what
makeConnection() does, along with providing a little user-friendly echoing of
its progress along the way, including the remote and local IP addresses (proof
that a connection was made).
Finally, handleEvent() traps all GUI-related events that occur within the
applet, then checks for a key-press event. When this occurs, the keystroke is
inserted into the UserChat text area, and the keystroke is forwarded to the
chat server program for rebroadcast to all other listening chat clients.
handleEvent() must have some special code to work around a bug in the Java
toolkit. The period character (".") does not generate a KEY_PRESS event when
entered in a TextArea, like other characters do, so you have to watch for its
KEY_RELEASE event instead. (Using KEY_PRESS is better for other characters,
because characters seem to be converted to uppercase by the time KEY_RELEASE
is generated.) A practical chat program will also handle backspacing (this one
does not) and multiple-character events (the key held down).
An alternative implementation would be to encapsulate the keystroke handling
in the UserChat class. However, this has a disadvantage: The UserChat text
area would always have to have focus to receive the input (an annoyance to
users). If the applet processes the keystroke events instead, then it does not
matter which subcomponent has focus, since the events will always bubble up to
the applet that owns them.


The Server Program


Because Java has yet to be supported on most Web servers, we will provide a
C++ implementation of a chat server program. (We used the gcc GNU C++
compiler.) The program must be running to be accessible; your Webmaster can
explain options for having the program run all the time. Alternatively, you
can simply log on with Telnet and start it by hand at any time. If you are
lucky enough to have a dedicated Internet connection, you can install the
server program on your own machine.
It is interesting to compare the C++ program with the Java program. Although
the two serve different roles, the functions they perform are broadly similar.
The Java program, however, is easier to read and understand.


The Applet HTML


To use the chat program, all you need to do is go to a Web page that hosts the
client on a server running the chat server. Example 1 is the HTML code the Web
page should contain. The [Text Chat Applet Would Run Here] is for those
forlorn surfers who do not have a Java-capable browser. They will see this
message, and dream about all the applets that they are missing. If you can run
Java, you will see the chat applet with its two text area windows. For the
browser to find the chat client's Java classes, the classes merely need be in
the same directory on the server as the HTML page. Otherwise, you will have to
include a codebase parameter in the applet tag specifying the directory or URL
where the classes are to be found.
The Java client and C++ server source code is available electronically; see
"Availability," page 3. Source code for a more advanced chat program,
"Whiteboard," is available at http://www.digitalfocus.com/ddj/code/. It uses
the same principles to implement a graphical whiteboard that many people
across the Internet can draw on simultaneously.
Figure 1: Basic architecture of the chat system.

Example 1: HTML code for a web page hosting the chat client.
<html><head><title>Chat Client</title></head><body bgcolor="#ffffff">
<applet code="Chat.class" width=600 height=400>
<h2>[Text Chat Applet Would Run Here]</h2>
</applet>
</body></html>



ALGORITHM ALLEY


Building Decision Trees with the ID3 Algorithm




Andrew Colin


Andrew is a mathematician and proprietary trader in the Treasury, Commonwealth
Bank of Australia, Sydney. He can be reached at 100036.1647@compuserve.com.


From network routers to games, decision making is a crucial part of many
programs. In this column, Andrew deals with decision making at two levels.
Most obviously, he is interested in constructing binary decision trees.
Perhaps more interestingly, the algorithm Andrew discusses uses a metric to
decide how to build the tree. This particular metric is grounded in the
mathematics of information theory, but other decision algorithms use more ad
hoc measurements, such as those used to evaluate moves in game-playing
programs. If you'd like to write an article about an interesting metric or
other decision method, contact me at Dr. Dobb's Journal.
--Tim Kientzle
A decision tree is a series of nested If/Then rules that users apply to data
to elicit a result. For instance, a decision tree to assist in the diagnosis
of automobile engine trouble might start out with general questions ("Does the
engine start?") and use the outcome to determine which subsequent questions
are asked: 
1. Did the engine turn over? No.
2. Is the battery charged? Yes.
3. Is there a loose battery connection? Yes. 
A decision tree is usually a straightforward tool to use. However, inducing
rules from sets of past decisions to build such decision trees is a difficult
problem, especially when we want to reach a useful answer in as few questions
as possible.
In this article, I'll present one way of building such decision trees using
the ID3 algorithm, originally developed by J. Ross Quinlan of the University
of Sydney. The algorithm was presented in Quinlan's 1986 paper in Machine
Learning (vol. 1, no. 1) and has proven to be so powerful that it has found its
way into a number of commercial rule-induction packages. However, the
underlying concepts are simple enough for ID3 to be implemented in only a few
pages of code.
How does ID3 work? Suppose you have the animals ostrich, raven, albatross, and
koala. For each one, you know the following attribute values: warm blooded,
has feathers, has fur, can swim underwater, lays eggs. Based on these
attributes, you are asked to find a set of rules to determine whether the
animal lays eggs. In this case, a classifying attribute is readily apparent:
If the animal has feathers, then it lays eggs. Consequently, the decision tree
has just one question: "Does the animal have feathers?" If you add dolphin and
crocodile to the list, this rule is incorrect, since the crocodile does not
have feathers, but does lay eggs. By inspecting Table 1, you can amend this to
the rules in Figure 1(a).
However, there are a number of ways to elicit the correct answer from a
dataset. In Figure 1(b), the first question in the tree ("Does the animal have
fur?") is poorly chosen, since it gives you little information about the
answer. Clearly, some questions are more useful in this context than others.
For more realistic datasets--where there are tens of attributes and thousands
of samples--a well-chosen set of questions is of critical importance to the
effectiveness of the tree.
The ID3 algorithm searches through the attributes of a dataset for the one
that conveys the most information about the desired target. If the attribute
classifies the target perfectly, then you stop. Otherwise, the process is
repeated recursively on the two subsets generated by setting this "best"
attribute to its on and off states.
Information theory provides one way to measure the amount of information
conveyed by a particular attribute. The best attribute is the one with the
lowest "negentropy," as defined in Figure 2. The ID3 algorithm picks the
attribute with the lowest total negentropy.
To illustrate, I'll calculate the negentropies generated by Table 1. For
example, there are five animals in the sample that are warm blooded, of which
three lay eggs. The negentropy of the "on" (true) attribute is NE_on = -(3/5)
log2 (3/5) - (2/5) log2 (2/5) = 0.970951. There is one animal in the sample
that is not warm blooded but does lay eggs. The negentropy of the "off"
(false) attribute is NE_off = -(1/1) log2(1) - 0 = 0. The combined negentropy
is the weighted sum (5 * NE_on + 1 * NE_off) / (5 + 1) = 0.809125. Table 2
shows the negentropies for all of the attributes.
None of the attributes has a negentropy of zero, so none classifies all cases
correctly. However, the "feathers" attribute has the lowest negentropy and
hence conveys the most information. This is what we use as the root attribute
for our decision tree.
The process is now repeated recursively. All animals that have feathers lay
eggs, so the negentropy for this subset is zero and there are no further
questions to ask. When the animal does not have feathers, you compute the
negentropies using only those animals to produce Table 3. The attribute to use
for the subtree is therefore "warm blooded," which has the lowest negentropy
in this sample. The zero value indicates that all samples are now classified
correctly.
The algorithm can also be used for real-valued, rather than binary-valued,
attributes. At the outset of processing, you scan all of the variables to
select an attribute i and a value r so that the condition Attribute(i) >= r
yields the lowest negentropy possible. The "yes/no" question is now of the
form "Is attribute greater than or equal to r?" If the negentropy arising from
this partition is zero, then the data was classified perfectly, and the
calculation ends. Otherwise, the dataset is partitioned into two sets--one for
which the first condition is true, and the other for false.
The process is repeated recursively over the two subsets until all
negentropies are zero (perfect classification) or until no more splitting is
possible and the negentropy is nonzero. In this case, classification is not
possible and you must solicit more data from the user. Such a situation could
occur when all attributes in two records have the same values, but the
outcomes differ. More data must then be supplied to the program to allow it to
discriminate between the two cases.


Tree Pruning


Most real-world datasets do not have the convenient properties shown here.
Instead, noisy data with measurement errors or incorrect classification for
some examples can lead to very bushy trees in which the rule tree has many
special cases to classify small numbers of uninteresting samples.
One way to address this problem is to use "rule-tree pruning." Instead of
stopping when the negentropy reaches zero, you stop when it reaches some
sufficiently small value, indicating that you are near the end of a branch.
This pruning leaves a small number of examples incorrectly classified, but the
overall structure of the decision tree will be preserved. Finding the exact,
nonzero cutoff value will be a matter of experimentation.


Implementing Decision Trees


In implementing the decision tree described here, I've represented it within
the program (see Listing One) as a binary tree, constructed of NODE structures
pointing to other NODEs, or to NULLs for terminal nodes.
Rather than copying the data table for each partition, I pass the partially
formed data tree to the routine that calculates negentropy, allowing the
program to exclude records that are not relevant for that part of the tree.
Negentropy of a partition is calculated in routine negentropy (see Listing
Two), which is called for all attribute/threshold combinations by routine ID3
(Listing Three).
The ability to use real-valued as well as binary-valued attributes comes at a
price. To ensure the correct value of r, we scan through all attribute values
in the dataset--a process that can be quite computationally intensive for
large datasets.
No claims are made for the efficiency of this implementation. For cases where
many sample attribute values are the same, or where a mixture of real-valued
and binary-valued attributes is to be considered, the user is probably better
advised to sort the attributes into a list and to eliminate repeated values.
I've also not considered the case where a question can have more than two
outcomes.
Two illustrative datasets are available electronically; see "Availability,"
page 3. The first is a set of sample data from a botanical classification
problem, in which a type of flower, an iris, is to be classified into one of
two subgenera (Virgin, Setosa) according to the dimensions of sample pistils
and stamens. The data is taken from M. James' book Classification Algorithms
(Collins, 1985). Figure 3 shows the resulting decision tree.
The second dataset is a torture test for the algorithm. Given the attributes
{John, Paul, George, Ringo}, which are random numbers between 0 and 1, and a
target attribute that takes random values from {0, 1}, the program returns a
large and complex tree that classifies 100 examples. On a 486/75, the
calculation took about 220 seconds to run to completion. Most real-world
datasets will produce much simpler decision trees than Listing Four.
The code (available electronically) includes the files ID3.C, ID3.H, PROTO.H.
To build this program with Borland's Turbo C compiler, enter tcc id3.c at the
DOS prompt. The code has also been compiled under Watcom C and Microsoft
Visual C, so it should be fairly portable. To run the program, two datasets
are required. The first contains the sample data in ASCII form, with values
for the target attribute in the last column (0 or 1). The second dataset
contains the names of the attributes, again in ASCII. For the iris dataset,
these files have the names IRIS.DAT and IRIS.TAG. Enter id3 iris at the
command prompt to run the program.


Conclusion


ID3 is a conceptually simple but powerful classification algorithm. Used in
conjunction with other statistical and machine-learning techniques, this
algorithm will form a valuable addition to your armory of data exploration
tools.
Table 1: Various animal attributes.

Animal Warm blooded Feathers Fur Swims Lays eggs
Ostrich Yes Yes No No Yes
Crocodile No No No Yes Yes
Raven Yes Yes No No Yes
Albatross Yes Yes No No Yes
Dolphin Yes No No Yes No
Koala Yes No Yes No No
Table 2: Negentropies for each attribute from Table 1.
Attribute On_ctr Hits Off_ctr Hits Negentropy
Warmblooded 5 3 1 1 0.809125
Feathers 3 3 3 1 0.459148
Fur 1 0 5 4 0.601607
Swims 2 1 4 3 0.874185
Table 3: Negentropies of each attribute for animals that do not have feathers.
Attribute On_ctr Hits Off_ctr Hits Negentropy
Warmblooded 2 0 1 1 0.0
Feathers 0 0 3 1 0.918296
Fur 1 0 2 1 0.666666
Swims 2 1 1 0 0.666666
Figure 1: Two different decision trees to answer the question: Does the animal
lay eggs?
(a) Does animal have feathers?
 Yes: Lays eggs (raven, albatross, ostrich)
 No : Is animal warmblooded?
 Yes: Does not lay eggs (koala, dolphin)
 No: Lays eggs (crocodile)
(b) Does animal have fur?
 Yes: Does not lay eggs (koala)
 No: Does animal have feathers?
 Yes: Lays eggs (raven, albatross, ostrich)
 No: Is animal warmblooded?
 Yes: Does not lay eggs (dolphin)
 No: Lays eggs (crocodile)
Figure 2: Definition of negentropy. p(ON) and p(OFF) are the measured
probabilities of an answer being true or false.
If p(ON) and p(OFF) are both nonzero:
 -p(ON) log2 p(ON) - p(OFF) log2 p(OFF)
Otherwise:
 0
Figure 3: Decision tree for irises.
Is the pistil width >= 1.80?
 Yes: class Setosa
 No : Is the pistil length >= 5.60?
 Yes: class Setosa
 No : class Virgin

Listing One
typedef struct node {
 UINT idx; /* ID code for attribute */
 REAL threshold; /* Numerical threshold for attribute test */
 struct node *on; /* Address of 'on' node */
 struct node *off; /* Address of 'off' node */
 struct node *parent; /* Address of parent node */
} NODE;

Listing Two
NEGENTROPY negentropy ( REAL **data, UINT n_samples, NODE *local, UINT target)
{
 /* Calculates the entropy of classification of an attribute, given a table 
 * of attributes already used, the attribute on which splitting is to be
 * taken, and the target attribute. Entropy is calculated in bits, so logs
 * are taken to base 2 by dividing by LN_2.
 * The returned value always lies in the (closed) range [0, 1].
 */
 NEGENTROPY ret_val;
 NODE *_node, *_parent;
 UINT on_ctr, off_ctr, p1, p2, i, _match;
 REAL p_on, p_off, negentropy_on, negentropy_off;
 on_ctr = off_ctr = p1 = p2 = 0;
 /* Scan through all supplied data samples */
 for (i=0; i<n_samples; i++) {
 /* If pattern matches the current position in the decision tree, then use
 * this vector. The match is determined by passing up the decision tree
 * and checking whether 'data[idx] >= threshold' matches at each step, 
 * where idx and threshold are taken from each node in turn.
 */
 _match = 1;
 _node = local;
 while (_node->parent != NULL) { /* If at root node, all entries match */
 _parent = _node->parent;
 if (_node == _parent->on) { /* if parent node is ON */
 if (data[i][_parent->idx] < _parent->threshold)
 _match = 0;
 }
 else
 if (_node == _parent->off) { /* if parent node is OFF */
 if (data[i][_parent->idx] >= _parent->threshold)
 _match = 0;
 }
 _node = _parent;
 }
 if (_match) {
 if (data[i][local->idx] >= local->threshold) {
 on_ctr++;
 if (data[i][target] >= 0.5)
 p1++;
 }
 else {
 off_ctr++;
 if (data[i][target] >= 0.5)
 p2++;
 }
 }
 } /* for (i=0; i<n_samples; i++) */
 /* 1: Entropy of subtable with activation ON */
 /* We now have the numbers of samples that match this part of the decision
 * tree, & the number of samples for which the supplied condition are true.
 * From these quantities we can find the negentropy of this partition.
 */
 if (on_ctr > 0)
 {
 p_on = (REAL) p1 / (REAL) on_ctr;
 p_off = 1 - p_on;
 negentropy_on = -entropy (p_on) - entropy (p_off);
 }
 else
 negentropy_on = 0.0;
 /* 2: Entropy of subtable with activation OFF */
 if (off_ctr > 0)
 {
 p_on = (REAL) p2 / (REAL) off_ctr;
 p_off = 1 - p_on;
 negentropy_off = -entropy (p_on) - entropy (p_off);
 }
 else
 negentropy_off = 0.0;
 ret_val.ne = (negentropy_on * on_ctr + negentropy_off * off_ctr);
 ret_val.ne /= (on_ctr + off_ctr);
 /* If all values examined were the same, set 'ret_val.status' to
 * the target value since this will be an end-of-branch node
 */
 if ((p1 == on_ctr) && (p2 == off_ctr))
 ret_val.status = ON;
 else if ((p1 == 0) && (p2 == 0))
 ret_val.status = OFF;
 else
 ret_val.status = INACTIVE;
 return ret_val;
}

Listing Three
NODE* ID3 ( MATRIX *matrix, NODE* parent, UINT target, UINT state)
/* Routine to build a decision tree, based on Quinlan's ID3 algorithm. */
{
 NEGENTROPY negentropy_struct;
 NODE *node;
 UINT n_vars = matrix->width, n_samples = matrix->height, i, j, split;
 REAL **data = matrix->data;
 REAL best_threshold, min_negentropy, _negentropy;
 /* Allocate memory for this node */
 node = (NODE*) malloc (sizeof(NODE));
 if (!node)
 err_exit (__FILE__, __LINE__);
 /* Set up links in decision tree */
 node->parent = parent; /* Set address of parent node */
 if (parent != NULL) /* parent to child; not relevant for root node */
 {
 /* Pass address of this node to the parent node */
 if (state == ON)
 parent->on = node;
 else
 if (state == OFF)
 parent->off = node;
 }
 /* Select attribute with lowest negentropy for splitting. Scan through
 * ALL attributes (except target) and ALL data samples. This is inefficient
 * for data sets with repeated values, but will do for illustrative purposes
 */
 min_negentropy = 1.0;
 for (i=0; i<n_vars; i++) {
 for (j=0; j<n_samples; j++) {
 if (i != target) {
 /* Set trial values for this node... */
 node->idx = i;
 node->threshold = data[j][i];
 /* ...and calculate the negentropy of this partition */
 negentropy_struct = negentropy (data, n_samples, node, target);
 _negentropy = negentropy_struct.ne;
 /* If this negentropy is lower than any other, retain the
  index and threshold for future use */
 if (_negentropy < min_negentropy) {
 min_negentropy = _negentropy;
 split = i;
 best_threshold = data[j][i];
 }
 } /*if (i != target)*/
 } /*for (j=0; j<n_samples; j++)*/
 } /*for (i=0; i<n_vars; i++)*/
 /* Save the combination of best attribute and threshold value */
 node->idx = split;
 node->threshold = best_threshold;
 /* If the negentropy routine found itself at an end-of-branch
 * for the decision tree, the 'status' flag in 'negentropy_struct'
 * is set to ON or OFF and the node labelled accordingly. Otherwise,
 * ID3 continues to call itself until all end-of-branch nodes are found.
 */
 if (negentropy_struct.status != INACTIVE) {
 node->on = node->off = NULL;
 node->idx = negentropy_struct.status;
 }
 else
 {
 node->on = ID3 (matrix, node, target, ON);
 node->off = ID3 (matrix, node, target, OFF);
 }
 return node;
}

Listing Four
Welcome to ID3
Last compiled on Dec 28 1995, 09:15:09
if { Ringo >= 0.33 then if { George >= 0.63 then if { Paul >= 0.58 then 
if { George >= 0.99 then OFF else ON } else if { Ringo >= 0.77 then 
if { Paul >= 0.13 then ON else OFF } else if { John >= 0.52 then 
if { Ringo >= 0.57 then ON else if { John >= 0.79 then ON 
else OFF } } else OFF } } } else ON } else if { George >= 0.34 
then if { John >= 0.52 then if { Paul >= 0.43 then if { George >= 0.76 
then OFF else if { John >= 0.65 then ON else if { John >= 0.52 
then if { Paul >= 0.79 then ON else OFF } else ON } } } else OFF } 
else OFF } else if { John >= 0.49 
then ON else if { Ringo >= 0.08 then if { John >= 0.01 
then ON else OFF } else OFF } } } } 



UNDOCUMENTED CORNER


Customizing MFC




Scot Wingo and George Shepherd


Scot is a cofounder of Stingray Software, an MFC extension company. He can be
contacted at ScotWi@aol.com. George is a senior computer scientist with
DevelopMentor, where he develops and delivers courseware for developers using
MFC and OLE. George can be contacted at 70023.1000@compuserve.com. They are
the coauthors of MFC Internals (Addison-Wesley, 1996).


Customization is one of the best uses of undocumented MFC information. In our
April column, for instance, we examined the undocumented MFC print-preview
class CPreviewView. While CPreviewView's default print-preview support is a
good start for most applications, you'll have to customize it for enhanced
support. This month, we show you how to customize the MFC print-preview and
print-status dialog. Along the way we'll point out some techniques for working
with undocumented MFC classes that weren't really designed to be customized.
To its credit, Microsoft does outline print-preview customization in a tech
note; as you'll see, however, it's not the simple three-step task Microsoft
describes.


A Generic CPreviewView Derivative


By customizing the appearance of CPreviewView, you can create and plug in a
generic CPreviewView derivative that serves as a jumping-off point for
print-preview customization. Since CPreviewView is an undocumented class,
however, you have to create a derivative by hand instead of using a Wizard or
Expert. Our CPreviewView derivative, called CDDJPrintPreview (see Listings One
and Two), is a plain-vanilla derivative that overrides CPreviewView::OnDraw()
with CDDJPrintPreview::OnDraw(). In the overriding CDDJPrintPreview::OnDraw(),
there's a TRACE statement and a call to the overridden CPreviewView::OnDraw().
The TRACE statement lets you know that your application is using the new
print-preview class.


Plugging in CDDJPrintPreview


Print previewing starts when the user selects File/Print Preview, thus
invoking the CView member function OnFilePrintPreview(), which is in MFC\SRC\
VIEWPREV.CPP. OnFilePrintPreview() does some error checking, then calls
DoPrintPreview(), passing along CRuntimeClass information for the
print-preview class.
To make MFC use a view other than CPreviewView, override OnFilePrintPreview()
in your application's view. Next, copy MFC's CView::OnFilePrintPreview() code
into your OnFilePrintPreview() and change the DoPrintPreview() call to use the
CPreviewView derivative; see Examples 1(a) and 1(b).
Finally, change the message map in your application's view to call your view's
OnFilePrintPreview() instead of CView::OnFilePrintPreview(). After this step,
you can compile and run your application and the new print-preview view will
be displayed instead of CPreviewView (double check with your debugger, or by
looking for TRACE output).
In any CPreviewView derivative, you will need to customize the way the view
looks by changing how OnDraw() works. The best way to do this is to copy the
CPreviewView::OnDraw() code from MFC into your CPreviewView derivative's
OnDraw() routine.
Unfortunately there's a snag: OnDraw() calls CView::OnPrint(), which is a
protected CView member function, through a CView pointer, m_pPrintView. Your
CPreviewView derivative cannot call this protected CView member function, but
CPreviewView can call CView::OnPrint(), since it is a "friend" of CView. There
are several ways around the problem. You could add your CPreviewView
derivative to the list of MFC's CView friends by changing the MFC source code
and rebuilding your MFC library. (We discourage tweaking MFC source code since
there are numerous version problems, build issues, and the like.) 
To get around this particular problem, we recommend the following steps:
1. Override CView::OnPrint() in your application's view. Instead of making it
protected, make it a public member function. (You could also keep it protected
and make CDDJPrintPreview a friend, but we discourage friends because they
usually indicate a class-design problem. If one friend class needs to reach a
protected or private member, others most likely will too, so making the member
public is more flexible.)
2. Copy the code from CView::OnPrint() into the override (this code is in
MFC\SRC\VIEWCORE.CPP).
3. Make sure that the new OnPrint() is called instead of CView::OnPrint().
OnDraw() calls OnPrint() through a CView pointer:
m_pPrintView->OnPrint(m_pPreviewDC, m_pPreviewInfo);
This will always call CView::OnPrint() instead of your application view's
OnPrint(). To fix the problem, you have to typecast (or "downcast") the CView
pointer into a pointer to your application's view class. Whenever you do this,
it's wise to ASSERT that the typecast is safe; see Example 2.
With this final OnDraw() adjustment, you have a fully customizable
CPreviewView derivative. Depending on how much you want to customize your
application's print preview, you will probably have to override or substitute
some other CPreviewView member functions.
For example, if you want to change the number of pages displayed by print
preview from two to four, you'll need to change the CalcPageDisplaySize()
member function, since it makes calculations that are hard coded for a
two-page display. Since CalcPageDisplaySize() is not a virtual function, you
can't just override it. Instead, you have to override its caller,
PositionPage(), which is virtual, then copy the MFC code into your
PositionPage() override, and change the function to call your own version of
CalcPageDisplaySize() instead of CPreviewView::CalcPageDisplaySize().


The Notepad Example 


We thought it would be interesting to customize our generic print preview to
display something other than a white piece of paper. What if your application
prints checks, labels, or forms that don't look like the default print
preview? We decided to simulate three-hole-punched, lined notebook paper.
The first step is to create red and blue pens for drawing the vertical and
horizontal lines; see Example 3(a). Since OnDraw() already creates a black pen
to draw the three-dimensional page outlines (called rectPen), there's no need
to create a pen for drawing the three holes. The best point in the existing
OnDraw() for the creation of the new pens is at the top, where the other pens
are created. Listing Three implements the complete notebook paper OnDraw()
routine.
Once the pens are created, the next step is to draw the lines and holes.
OnDraw() has a for loop that iterates through each page and draws it. New
per-page drawing logic should be added to this loop after the FillRect call
that paints the page white. 
In the page-drawing for loop, the variable pRect contains the current
print-preview page's rectangle. You can use this (with some small offsets so
that you don't draw over the page's border) in your own drawing. Example 3(b)
is code for drawing the red line.
After selecting the pen, we set the x-coordinate of the red line to 1/8 of the
page size on the left side of the page. Once x is calculated, draw the red
line from the top of the page to the bottom of the page.
The algorithm for drawing the blue lines is trickier, since there are 30 lines
per page. Example 4 is what we came up with. nBlueCurrentY starts at 1/8 of
the page from the top. After the initial line, each subsequent line is 1/30 of
the remaining 7/8 page away from the previous blue line (that's the
nBlueYIncrement calculation). Once the starting y and y increment are
calculated, a for loop draws the lines down the page.
After the blue lines, it's time to draw the holes. Since there are only three,
you can take the brute-force approach and draw each hole individually instead
of using a loop; see Listing Three. After selecting a black pen with a black
brush, we calculate the left- and right-hole rectangle, which will be the same
for each of the three holes. We reused the already-calculated nBlueYIncrement
as the hole diameter in our print-preview algorithm. The first hole is drawn
1/8 of the way down the page; the second, at the 1/2-page mark; and the last,
1/8 of the page from the bottom.
Figure 1 shows this algorithm in action. Listing Three presents the complete
customized OnDraw() routine. Code we added is marked with //DDJ -- New and
//DDJ -- End new.


Undocumented Printing-Status Dialog


When you print a view through MFC, a standard printing-status dialog displays
the name of the document, printer name, and current page being printed. Figure
2 shows the default MFC printing-status dialog. Admittedly, this is a dull
dialog. Let's look at how it is implemented in MFC and see how to customize
the printing-status dialog.

The printing-status dialog is implemented by the undocumented class
CPrintingDialog. The definition of CPrintingDialog is embedded in the MFC
source file VIEWPRNT.CPP.
Listing Four includes the class declaration. From this listing, you can see
that CPrintingDialog is a CDialog derivative that uses the resource ID
AFX_IDD_PRINTDLG. You can view this resource by opening MFC\INCLUDE\MFCDLL.RC.
The implementation of CPrintingDialog doesn't do much. The constructor calls
CDialog::Create() to create the modeless dialog. CView does all of the
CPrintingDialog creation and updating in OnFilePrint(). Example 5(a) creates
the CPrintingDialog.
After creating a local CPrintingDialog variable on the stack called
dlgPrintStatus, OnFilePrint() sets the document, printer, and port name. The
AfxFormatString1() call is an internal MFC convenience function that performs
a sprintf-like operation. After setting the text in the dialog, OnFilePrint()
shows and displays the dialog.
Next, CView::OnFilePrint() has a loop that iterates through the pages of a
document. Example 5(b) contains the logic that updates the CPrintingDialog
page number. Finally, when MFC is done printing, it cleans up by destroying
the CPrintingDialog using dlgPrintStatus.DestroyWindow();.


Customizing the MFC Printing-Status Dialog


You can now customize the MFC printing-status dialog. For example, we'll add a
progress bar that shows the user how far the print job has progressed. There
are five steps to creating a custom MFC print-status dialog:
1. Create a dialog resource with a progress bar in it. We copied the
CPrintingDialog resource and added a progress control with IDC_PROGRESS.
2. Create a CDialog derivative that uses the resource created in step 1.
3. Add an OnFilePrint() override to your application's CView derivative and
copy MFC's CView::OnFilePrint() code into your override. Delete all references
to CPrintingDialog so you do not have to copy the class out of MFC.
(CView::OnFilePrint() is in VIEWPRNT.CPP.)
Several complications are caused by copying this code out of its usual MFC
source home. First, OnFilePrint() calls AfxGetFileTitle() and other
implementation-specific MFC routines. To get over this hurdle, include the MFC
implementation file MFC\SRC\AFXIMPL.H. That file normally drags in several
other internal MFC OLE headers, so wrap the include in #define/#undef
directives to suppress them. For example:
#define _AFX_NO_OLE_SUPPORT
#include "..\src\afximpl.h"
#undef _AFX_NO_OLE_SUPPORT
The second problem is that OnFilePrint() uses the undocumented callback
function _AfxAbortProc() to cancel a print job. The easiest solution would be to
declare the function extern and call the version of the function that lives in
MFC. Unfortunately, _AfxAbortProc() is not exported in the MFC DLLs, so you
have to be more creative. Another potential solution would be to copy the
routine from MFC into your code. However, this doesn't work because
_AfxAbortProc() accesses a private MFC global (also not exported in the DLL),
called _afxWinState, to set and check a Boolean value that is set to TRUE when
the user presses the CPrintingDialog cancel button.
To get around this, you need to get rid of the _afxWinState dependence:
a. In your view source file, duplicate the _AfxAbortProc() callback function
and give it a new name. We renamed it _DDJAbortProc().
b. Create a global BOOL variable to act as the replacement for the
_afxWinState variable.
c. Replace all instances of _afxWinState.m_bUserAbort with your new global
variable.
d. In your printing-dialog constructor, be sure to initialize this variable to
FALSE.
e. Override CDialog::OnCancel() to set this variable to TRUE. 
f. Change all of the _AfxAbortProc() references in OnFilePrint() to use the
new abort procedure. 
Now that all of the OnFilePrint() issues are resolved, we can continue the
customization.
4. Change your application's view message map to use the overridden
OnFilePrint() instead of CView::OnFilePrint().
5. Change OnFilePrint() to use the CDialog derivative created in step 2
instead of CPrintingDialog. Example 6(a) illustrates how to initialize the
dialog with the progress bar, while Example 6(b) shows how to update the
progress bar dialog along with the page number.


Conclusion


That's it. A few easy steps (well, step 3 was tricky) and you have created
your own customized printing-status dialog! The customized printing-status
dialog (see Figure 3) is also part of the sample available electronically (see
"Availability," page 3). In our next column, we'll take a look at some
undocumented areas of MFC's OLE support.


Learn from Microsoft's Mistakes


There's an important lesson on MFC class design to be learned by customizing
MFC's print preview. When designing an MFC class, be sure to make virtual any
member functions that could ever conceivably be customized. With this
approach, you dramatically increase the possibilities for your class users.
Assume that fellow developers will try to override behavior you never thought
they'd be interested in. 
Also, break long segments of implementation-specific code into fine-grained
member functions. For example, if CPreviewView::OnDraw() made several function
calls like: PrepareView(), DrawOutline(), DrawPages(), CleanView(), and the
like, it would be easy to change the way a page is drawn by overriding
DrawPages(), instead of having to override all of OnDraw() and copy the code
into your derivative.
Another common problem in MFC involves embedded classes. Embedding a class
makes it hard to customize (see Example 5). CDocument has a good solution to
this problem. In older versions of MFC, CDocument had an embedded CFile
object, so making CDocument use a CFile derivative instead of CFile was very
difficult. In MFC 4.0, Microsoft changed CDocument so that it calls the
virtual function GetFile(). The default implementation of this creates a CFile
(really CMirrorFile, as we exposed in the last article) and returns it. To
customize the CFile used by CDocument, all the CDocument user has to do is
override GetFile() to return a CFile derivative. Wouldn't it be nice if CView
had a GetPrintingDialog() routine that let you customize the printing dialog?
--S.W. & G.S.
Figure 1: Customized notebook paper example.
Figure 2: Default MFC printing-status dialog.
Figure 3: Customized printing-status dialog.
Example 1: (a) Copying MFC's CView::OnFilePrintPreview() into your
OnFilePrintPreview(); (b) change the DoPrintPreview() call to use the
CPreviewView derivative.
(a)
DoPrintPreview(AFX_IDD_PREVIEW_TOOLBAR, this,
 RUNTIME_CLASS(CPreviewView), pState);

(b)
DoPrintPreview(AFX_IDD_PREVIEW_TOOLBAR, this,
 RUNTIME_CLASS(CDDJPrintPreview), pState);
Example 2: Using ASSERT for a safe typecast.
CDdjsampView * pMyView;
ASSERT(m_pPrintView->IsKindOf(RUNTIME_CLASS(CDdjsampView)));
pMyView = (CDdjsampView *)m_pPrintView;
pMyView->OnPrint(m_pPreviewDC,m_pPreviewInfo);
Example 3: (a) Creating red and blue pens for drawing the vertical and
horizontal lines; (b) drawing the red line.
(a)
CPen redPen, bluePen;
redPen.CreatePen(PS_SOLID,2,RGB(255,0,0));  //thickness 2 pixels
bluePen.CreatePen(PS_SOLID,1,RGB(0,0,255)); //thickness 1 pixel


(b)
pDC->SelectObject(&redPen);
int nRedX = pRect->left + pRect->Width()/8;
pDC->MoveTo(nRedX,pRect->top + 2);
pDC->LineTo(nRedX,pRect->bottom - 2);
Example 4: Algorithm for drawing the blue lines.
int nBlueCurrentY = pRect->top + pRect->Height()/8;
int nBlueYIncrement = (pRect->bottom - nBlueCurrentY)/30;
pDC->SelectObject(&bluePen);
for (int nBlueCount = 0; nBlueCount < 30; nBlueCount++){
 pDC->MoveTo(pRect->left + 3,nBlueCurrentY);
 pDC->LineTo(pRect->right - 2,nBlueCurrentY);
 nBlueCurrentY += nBlueYIncrement;
}//end blue line loop.
Example 5: (a) Creating CPrintingDialog; (b) logic that updates the
CPrintingDialog page number.
(a)
CPrintingDialog dlgPrintStatus(this);
CString strTemp;
dlgPrintStatus.SetDlgItemText(AFX_IDC_PRINT_DOCNAME, strTitle);
dlgPrintStatus.SetDlgItemText(AFX_IDC_PRINT_PRINTERNAME,
 printInfo.m_pPD->GetDeviceName());
AfxFormatString1(strTemp, nFormatID, strPortName);
dlgPrintStatus.SetDlgItemText(AFX_IDC_PRINT_PORTNAME, strTemp);
dlgPrintStatus.ShowWindow(SW_SHOW);
dlgPrintStatus.UpdateWindow();

(b)
// write current page
TCHAR szBuf[80];
wsprintf(szBuf, strTemp, printInfo.m_nCurPage);
dlgPrintStatus.SetDlgItemText(AFX_IDC_PRINT_PAGENUM, szBuf);
Example 6: (a) Initializing custom dialog with progress bar; (b) updating the
progress bar.
(a)
CDDJPrintDialog ddjPrintDlg(this);
CString strTemp;
//Now initialize the progress control
CProgressCtrl * pProgress =
 (CProgressCtrl*)ddjPrintDlg.GetDlgItem(IDC_PROGRESS);
ASSERT(pProgress != NULL);
pProgress->SetPos(0);                          //start at 0
pProgress->SetRange(0,printInfo.GetMaxPage()); //stop at max
pProgress->SetStep(1);                         //increment by 1 (not 10)

(b)
TCHAR szBuf[80];
wsprintf(szBuf, strTemp, printInfo.m_nCurPage);
ddjPrintDlg.SetDlgItemText(AFX_IDC_PRINT_PAGENUM,szBuf);
pProgress->StepIt();

Listing One
class CDDJPrintPreview : public CPreviewView
{
// Construction
protected:
 // protected constructor used by dynamic creation
 CDDJPrintPreview(); 
 DECLARE_DYNCREATE(CDDJPrintPreview)
// Attributes - none
// Operations - none
// Overrides
protected:
 virtual void OnDraw(CDC* pDC); // overridden to draw this view
// Implementation
protected:
 virtual ~CDDJPrintPreview();
#ifdef _DEBUG
 virtual void AssertValid() const;
 virtual void Dump(CDumpContext& dc) const;
#endif
protected:
 //Message map in case you want to handle any user
 //interaction.
 DECLARE_MESSAGE_MAP()
};

Listing Two
IMPLEMENT_DYNCREATE(CDDJPrintPreview, CPreviewView)
CDDJPrintPreview::CDDJPrintPreview()
{
//Do any custom print preview initialization here. 
}
CDDJPrintPreview::~CDDJPrintPreview()
{ //Destroy any custom print preview stuff here.

}
BEGIN_MESSAGE_MAP(CDDJPrintPreview, CPreviewView)
//Insert any custom print preview message handlers here.
END_MESSAGE_MAP()
//OnDraw override just calls TRACE and overridden OnDraw().
void CDDJPrintPreview::OnDraw(CDC* pDC)
{
 TRACE0("Called CDDJPrintPreview::OnDraw()!!");
 CPreviewView::OnDraw(pDC);
}
#ifdef _DEBUG
void CDDJPrintPreview::AssertValid() const
{
//Add any checks for customized print preview members here.
 CView::AssertValid();
}
void CDDJPrintPreview::Dump(CDumpContext& dc) const
{
//Add debug output for any customized print preview members here.
 CView::Dump(dc);
}
#endif //_DEBUG

Listing Three
void CDDJPrintPreview::OnDraw(CDC* pDC)
{
 // don't do anything if not fully initialized
 if (m_pPrintView == NULL || m_dcPrint.m_hDC == NULL)
 return;
 //DDJ - New
 CDdjsampView * pMyView; 
 //DDJ - End New
 CPoint ViewportOrg = pDC->GetViewportOrg();
 CPen rectPen;
 rectPen.CreatePen(PS_SOLID, 2, GetSysColor(COLOR_WINDOWFRAME));
 CPen shadowPen;
 shadowPen.CreatePen(PS_SOLID, 3, GetSysColor(COLOR_BTNSHADOW));
 //DDJ - New 
 CPen redPen;
 redPen.CreatePen(PS_SOLID,2,RGB(255,0,0));
 CPen bluePen;
 bluePen.CreatePen(PS_SOLID,1,RGB(0,0,255));
 //DDJ - End New
 m_pPreviewInfo->m_bContinuePrinting = TRUE; 
 for (UINT nPage = 0; nPage < m_nPages; nPage++)
 {
 int nSavedState = m_dcPrint.SaveDC(); 
 // Use paint DC for print preview output
 m_pPreviewDC->SetOutputDC(pDC->GetSafeHdc());
 m_pPreviewInfo->m_nCurPage = m_nCurrentPage + nPage;
 // Only call PrepareDC if within page range, otherwise use default
 // rect to draw page rectangle
 if (m_nCurrentPage + nPage <= m_pPreviewInfo->GetMaxPage())
 m_pPrintView->OnPrepareDC(m_pPreviewDC, m_pPreviewInfo);
 m_pPreviewInfo->m_rectDraw.SetRect(0, 0,
 m_pPreviewDC->GetDeviceCaps(HORZRES),
 m_pPreviewDC->GetDeviceCaps(VERTRES));
 m_pPreviewDC->DPtoLP(&m_pPreviewInfo->m_rectDraw);
 // Draw empty page on screen

 pDC->SaveDC(); // save the output dc state
 CSize* pRatio = &m_pPageInfo[nPage].sizeScaleRatio;
 CRect* pRect = &m_pPageInfo[nPage].rectScreen;
 if (pRatio->cx == 0)
 { // page position has not been determined
 PositionPage(nPage); // compute page position
 if (m_nZoomState != ZOOM_OUT)
 ViewportOrg = -GetDeviceScrollPosition();
 }
 //This section draws the page and outline and 3-d shadow 
 pDC->SetMapMode(MM_TEXT);// Page Rectangle is in screen 
 pDC->SetViewportOrg(ViewportOrg);
 pDC->SetWindowOrg(0, 0);
 pDC->SelectStockObject(HOLLOW_BRUSH);
 pDC->SelectObject(&rectPen); 
 pDC->Rectangle(pRect);
 pDC->SelectObject(&shadowPen);
 pDC->MoveTo(pRect->right + 1, pRect->top + 3);
 pDC->LineTo(pRect->right + 1, pRect->bottom + 1);
 pDC->MoveTo(pRect->left + 3, pRect->bottom + 1);
 pDC->LineTo(pRect->right + 1, pRect->bottom + 1);
 // erase background to white (most paper is white)
 CRect rectFill = *pRect;
 rectFill.left += 1;
 rectFill.top += 1;
 rectFill.right -= 2;
 rectFill.bottom -= 2;
 ::FillRect(pDC->m_hDC, rectFill, (HBRUSH)GetStockObject(WHITE_BRUSH));
 //DDJ - New 
 //Now that the page is white we can draw our notebook paper!
 //If you want yellow legal paper, change the FillRect above.
 //Draw the red line
 pDC->SelectObject(&redPen);
 int nRedX = pRect->left + pRect->Width()/8; 
 pDC->MoveTo(nRedX,pRect->top + 2);
 pDC->LineTo(nRedX,pRect->bottom - 2);
 // Draw 30 blue lines - start 1/8th page from top
 int nBlueCurrentY = pRect->top + pRect->Height()/8;
 int nBlueYIncrement = (pRect->bottom - nBlueCurrentY)/30;
 pDC->SelectObject(&bluePen);
 
 for (int nBlueCount = 0; nBlueCount < 30; nBlueCount++){
 pDC->MoveTo(pRect->left + 3,nBlueCurrentY);
 pDC->LineTo(pRect->right - 2,nBlueCurrentY);
 nBlueCurrentY += nBlueYIncrement;
 }//end blue line loop.
 //Now let's do some three-hole-punching!
 //Draw one every 1/4 page except for last one.
 //Make it the size of a blue increment which looks good.
 pDC->SelectObject(&rectPen);
 pDC->SelectStockObject(BLACK_BRUSH);
 //All holes have same left/right, different top/bottom
 CRect rectHole; 
 rectHole.left = pRect->left + pRect->Width()/16;
 rectHole.right = rectHole.left + nBlueYIncrement;
 //Hole 1 
 rectHole.top = pRect->Height()/8;
 rectHole.bottom = rectHole.top + nBlueYIncrement;
 pDC->Ellipse(rectHole);

 //Hole 2
 rectHole.top = pRect->Height()/2;
 rectHole.bottom = rectHole.top + nBlueYIncrement;
 pDC->Ellipse(rectHole);
 //Hole 3
 rectHole.top = pRect->Height()/8 * 7;
 rectHole.bottom = rectHole.top + nBlueYIncrement;
 pDC->Ellipse(rectHole);
 //DDJ - End New
 pDC->RestoreDC(-1); // restore to synchronized state
 // Some old OnDraw() code removed for brevity.
 } //end of the page for loop.
 rectPen.DeleteObject();
 shadowPen.DeleteObject();
 //DDJ - New Nuke our pens..
 bluePen.DeleteObject();
 redPen.DeleteObject();
 //DDJ - end new
}

Listing Four
class CPrintingDialog : public CDialog
{
public:
 enum { IDD = AFX_IDD_PRINTDLG };
 CPrintingDialog::CPrintingDialog(CWnd* pParent)
 {
 Create(CPrintingDialog::IDD, pParent);
 _afxWinState->m_bUserAbort = FALSE;
 }
 virtual ~CPrintingDialog() { }
 virtual BOOL OnInitDialog();
 virtual void OnCancel();
protected:
};




























PROGRAMMER'S BOOKSHELF


Power to the Programmer




Tom Saltsman


Tom Saltsman is a freelance writer and software developer working on an
enterprise-wide PowerBuilder project for Square D company. He can be reached
through the DDJ offices.


Until last year, my PowerBuilder library consisted of magazine articles,
vendor-training material, dog-eared photocopies, and the only book
available--Bill Hatfield's Developing PowerBuilder 3 Applications. As
PowerBuilder has grown in popularity, however, new books appear all the time.
A title search on "PowerBuilder" in the Computer Literacy Bookstore Web site
(http://www.clbooks.com) will return nearly 40 books. 
What's interesting about the books that come up on the title search, however,
is that a half dozen or so target PowerBuilder 5 which, at this writing, still
isn't shipping. With that in mind, I'll examine three books billed as
"complete guides" to PowerBuilder 4. Since book publishers commonly slap minor
revisions, updated version numbers, and new screen shots on existing books
just to get to the shelf first, my comments about these PowerBuilder 4 books
will likely apply to their "soon-to-be-released" Version 5 editions. (Indeed,
one of the books examined here already has a Version 5 edition announced.)


PowerBuilder 4: A Developer's Guide


David McClanahan's PowerBuilder 4: A Developer's Guide opens with an overview
of the PowerBuilder environment. He explains its object-oriented features,
where and how to get help, and presents an overview of the painters. (A
"painter" is PowerBuilder's development environment for creating specific
objects--windows, datawindows, and menus. Hence, there is a window painter, a
menu painter, and so on.) Next, McClanahan illustrates the Powerscript
language, followed by the application and library painters. He defines every
menu item and its usage. In doing so, he builds a simple .EXE, displaying
various response windows based upon your PB.INI files. 
In the heart of the book, McClanahan probes into window objects. Window
events, functions, and attributes are delineated, and he gives an account of
all window painter and menu items. You create another application, this time
demonstrating the principal window types, except MDI. Next, window controls
are revealed, and for each control (Command button, SLE, DDLB, and the like),
he discusses attributes, events, and functions. This is followed by embedded
SQL, leading to the main portion of the book--DataWindows. 
The examples here are not hands on, only described. In the "Extending
PowerBuilder" chapter, examples of DDE clients and DDE servers, DLLs,
integrating C++, and OLE are presented, along with code. McClanahan finishes
his book with an elegant chapter on inheritance, along with separate examples
for windows, user objects, and menu inheritance. There is plenty of material
for further study here. 
McClanahan is a widely read columnist, educator, and author of articles about
client/server technology. His book reads just like his articles--concise, with
no straying from the subject. McClanahan is at his best explaining OOP
concepts--polymorphism, inheritance, and encapsulation, with concrete code
examples. The accompanying diskette--with examples of user objects, DDE, and
OLE--is excellent. Another nice feature is the section explaining how Windows
processes the application message queue. In his preface to PowerBuilder 4: A
Developer's Guide, McClanahan states his intent to get you up to speed on
PowerBuilder as quickly as possible. One drawback I see, however, is that it
takes 350 pages to get to the DataWindows topic, the backbone of PowerBuilder
software. Another minor drawback is that every single attribute and event for
all controls is defined, duplicating the PowerBuilder manuals. 


Developing PowerBuilder 4 Applications


A quick examination of Developing PowerBuilder 4 Applications, Third Edition,
lets you know that this is not "Son of Developing PowerBuilder 3
Applications." Following an introduction to client/server computing, Bill
Hatfield plunges right into a PowerBuilder application--a Customer Address
window. Working in the library, application, database, DataWindow, and window
painters, you complete the application, and create an .EXE within the first 50
pages. In two hours, you have a firm grasp of the PowerBuilder environment,
which is particularly useful for those new to OOP. 
Hatfield proceeds to probe deeper into window controls, menus, Powerscript,
and debugging in the middle of this book. Section titles like "Talking to Your
Parents" and "Joins and Other Cozy Subjects" border on being corny, but keep
your interest. Others, such as "Protecting Your Privates," suggest paying
closer attention. I was disappointed in not seeing a "Problem Child Windows"
chapter, a subject every PowerBuilder developer can relate to. Powertips,
Powergotchas, Powernotes, and Poweralerts are sidebar-type items interspersed
throughout, allowing you to gain additional knowledge. In the final third of
the book, Hatfield discusses user objects, graphs, and OOP. Inheritance,
polymorphism, and encapsulation are analyzed at arm's length without specific
examples--the one downside to the book.
Hatfield is the manager of education for a consulting firm and has been
teaching PowerBuilder for three years. His upbeat writing style makes learning
a fun and painless experience. This is clearly a book for beginners and a
worthy complement to his earlier work.


Special Edition: Using PowerBuilder 4


The cover states that this book is "geared toward the accomplished to expert
level," but Special Edition is also adequate for a PowerBuilder novice
possessing experience in the Windows environment. The best feature of this
book is that you build an Inventory Tracking System, not a handful of
unrelated windows. You use DDLBs, checkboxes, DDDWs, radio buttons, LBs,
command buttons, and the like. This provides a comprehensive example of a
half-dozen tables and windows.
There are seven parts to the book, including an introduction to the system
with E-R diagrams, helpful for those not familiar with the relational model.
The Powerscript language is discussed, followed by an investigation of
DataWindows. In the fourth section, you put it all together, delivering the
final software, debugging it, and creating the .EXE. Next, the authors delve
into dwModify, inheritance, and the data pipeline, with separate chapters
devoted to each topic; then you are guided through a dwShare example. The last
part is composed of a function-reference section, organized by object, and an
attribute and event quick lookup. The appendix contains a section about Watcom
C++ (which comes with the PowerBuilder Enterprise Edition). 
The accompanying CD-ROM contains over 100 MB of third-party tools and demos,
including scaled-down versions of InfoModeler, ObjectStart, GUI Guidelines,
PowerClass, PowerFrame, PowerTool, RoboHelp, and S-Designer--plenty of
material for further study. 


Conclusion


Each book emphasizes different parts of PowerBuilder and has its own
strengths. For an experienced Windows developer, I'd recommend Special Edition
or A Developer's Guide. If you're visually oriented, or feel you would absorb
more by following one application throughout, Special Edition is your best
bet. If you thirst for a deeper understanding of DDE, OLE, inheritance, or
user objects, A Developer's Guide would be a better choice. For the novice,
Developing PowerBuilder 4 is your best option.
PowerBuilder 4: A Developer's Guide
David McClanahan
Henry Holt, 1995
873 pp., $44.95
ISBN 1-558-51417-1
Developing PowerBuilder 4 Applications, Third Edition
Bill Hatfield
SAMS Publishing, 1995
910 pp., $45.00 
ISBN 0-672-30695-6

Special Edition: Using PowerBuilder 4
Charles Wood et al.
Que Inc., 1995
682 pp., $49.99
ISBN 0-7897-0059-X


























































SWAINE'S FLAMES


Free Quip Art


I've ornamented this month's allocation of random ramblings with a random
selection of subheads from my extensive collection of clever subheads. You
should consider these a form of clip art. Feel free to use them anywhere they
work for you: in your e-mail sig file, in comments in your code, or in memos
to the boss. Naw, don't thank me. That's just the kind of big-hearted guy I
am.


Protocol Waiting


Now there's a nice subtitle. It almost suggests a whole essay on the need for
consensus on some as-yet unstandardized standard. Unfortunately, I don't have
such an essay handy, but as Voltaire said, "Si Pangloss n'avait pas été pendu,
dit Candide, il nous donnerait un bon conseil dans cette extrémité, car
c'était un grand philosophe. À son défaut, consultons la vieille."
Regarding my recent essay on speleology and spelunking, technology and
technunking, John F. Goyvaerts writes to ask, "According to an IBM ad (DDJ,
December 1995, page 63) 'Kevin invented automatic 16/32-bit thunking.' What
does IBM have to do with theology?" Sorry, John, I refuse to rise to such
obvious bait. Make your own joke.


The Exon Files


Like one in seven Webmasters, I turned my Web site black on February 8 to
protest the Communications Decency Act. I also hung a blue protest ribbon on
the site and felt righteous. Then I read what the citizens of Billings,
Montana, did not long ago. It was at a time when, according to Christopher
Hitchens, writing in the April 1 Nation, "Aryan supremacists and Christian
Identity types tried to move in.... After the stoning of a Jewish home...the
local paper printed a cut-out menorah and asked readers to take the
yellow-star example to heart by placing one in every window. The response was
enormous." Mere symbols are cheap, but it takes real courage to paint a target
on your chest.


Hunt the Wumpus


This subhead, a reference to an early microcomputer game, is a nice metaphor
for any pointless, convoluted quest. It also sounds like a definition of
searching the Internet. Speaking of pointless quests and searching the
Internet, a number of readers solved April's computer history quiz with a
quick Web search. Others did just as well by relying, so they claim, only on
their memories. The quiz asked for the correct chronological order for the
arrival of several computers. The correct order, with dates, is:
F 1938 Zuse Z1
1 1941 Atanasoff-Berry Computer
3 1943 Colossus Mark I
9 1945 ENIAC
D 1959 Sperry Rand UNIVAC 1103
8 1961 DEC PDP-1
2 1964 CDC 6600
A 1965 IBM 360/50
6 1969 Data General Nova
C 1975 MITS Altair 8800
5 1976 Cray-1
7 1978 DEC VAX 11/780
4 1980 Commodore VIC-20
E 1981 Xerox Star 8010
B 1982 IBM PC/AT
Or, in the short form I requested, F139D82A6C574EB.
The first correct answer (and the first answer, for that matter) came from
Michael Passer. A Dr. Dobb's t-shirt is winging its way to him now. Several
entrants objected to the small-print clause stating that by entering the
contest they were ceding their immortal souls to Miller Freeman, Inc., so
we've dropped that condition. After examining some of the souls, it was an
easy call.


The Court Reporters of Chaos


This subhead only makes sense if you've read Roger Zelazny's Amber series. It
has nothing whatever to do with, for example, how England's problem with mad
cow disease produced a bull market for beef this year. Mad cows? At least now
we know what made all those crop circles a few years back.
Michael Swaine
editor-at-large
mswaine@cruzio.com


















OF INTEREST
Apple Computer has announced a developer's release of Apple Game Sprockets, an
SDK for creating multimedia and Internet-enabled games for Mac OS-based
computers. Using the SDK, all Mac OS-compatible games can feature real-time 3-D
graphics, 3-D sound, Internet support, speech recognition, and input
device/monitor control. Available royalty-free to developers, Game Sprockets
includes the final release of QuickDraw 3D RAVE (Rendering Acceleration
Virtual Engine), a multiplatform technology that enables you to incorporate
plug-and-play 3-D acceleration hardware.
Apple Game Sprockets is a set of APIs designed to work with other Apple
multimedia technologies such as QuickTime, QuickTime VR, QuickTime
Conferencing, and QuickDraw 3D. You can mix-and-match individual sprockets to
best enhance and complement the existing features of your title. 
The current Apple Game Sprockets SDK includes: NetSprocket, an Internet
connectivity and multiplayer gaming API; SoundSprocket, a 3-D sound and Sound
Manager API; SpeechSprocket speech-recognition API; InputSprocket,
digital-joystick control and input-device API; DrawSprocket, multiple
buffering/display control API; and QuickDraw 3D RAVE, a multiplatform 3-D
graphics API.
Apple Computer
1 Infinite Loop
Cupertino, CA 95014
408-996-1010
http://www.dev.apple.com/games
AstraTek is previewing its Visual LAN Probe (VLP) for Sybase, a design and
debugging tool for Sybase-based client/server network development. Using
frame-recognition technology, VLP decodes network frames into Sybase SQL API
calls, so you can view the data stream in the Sybase SQL language instead of
in strings of cryptic hex code. 
VLP analyzes the network stack from the datalink layer to the application
layer, showing SQL API calls in ODBC, DBLib, or CTLib. VLP runs on any Windows
NT workstation or server in a Sybase environment. VLP starts at $995.00.
AstraTek Inc. 
130 Liberty Street MS2088 
New York, NY 10006 
888-227-8728
http://www.astratek.com
NobleNet has announced its EZ-API family of Component APIs. EZ-APIs supposedly
simplify the integration of industry-standard and custom APIs and enable you
to enhance API functions. With the tool, multiple APIs can share client,
server, and network resources. Component API technology automatically
multiplexes API calls over a single network connection between client and
server and presents them to the server as a single-user process. Tightly
integrating multiple APIs on the server increases application efficiency and
lets servers take on multiple API personalities. Additionally, EZ-APIs
encapsulate each API function call in a modifiable source-code wrapper on both
the client and server, allowing you to add enhancements to standard APIs such
as security, auditing, compression, naming, and caching. This is achieved
without modifying API behavior. 
The first member of the EZ-API family, the OneDriver ODBC SDK, relocates the
ODBC API from the Windows client to NT or to a variety of UNIX platforms. The
SDK includes all necessary communications software to connect Microsoft
Windows 3.1/95/NT clients to any standard TCP/IP server platform. OneDriver
ODBC works with any ODBC-compliant database or front-end development tool
including Visual Basic, PowerBuilder, Delphi, and Microsoft Access. The
NobleNet OneDriver ODBC SDK is priced at $2500.00 per server for an unlimited
number of users. Supported client platforms include Windows 3.1/95/NT, and
server platforms include NT, Sun Solaris, HP/UX, and IBM AIX.
NobleNet Inc.
337 Turnpike Road
Southboro, MA 01772
508-460-8222 
http://www.noblenet.com
Willows Software and Borland International have announced a strategic
technology alliance to provide Borland's OWL as a cross-platform development
solution. 
Implemented on top of Willows' TWIN APIW, OWL will provide the application
framework necessary to easily build cross-platform applications. OWL 5.0
provides splash screens, dockable toolbars, splitter windows, and several
gadgets to speed up the development process. Additionally, OWL provides
synchronization objects for multithreaded development and eases the transition
to 32-bit development by supporting both 32- and 16-bit development, including
16-bit emulation of most Windows 95-based common controls.
TWIN APIW and OWL will be distributed by Willows on its TWIN APIW CD for an
annual subscription fee of $250.00.
Willows Software Inc.
12950 Saratoga Avenue, Suite A
Saratoga, CA 95070-4670
408-777-1820
http://www.willows.com

ILOG and SunSoft have announced an alliance to build a bridge between Java and
C++. The TwinPeaks project will pave the way for Java-ready C++ components.
The technology will support both standard C++ and ANSI C components. The
bridge is a gateway program that allows C++ components to be recognized and
used by Java through an automatically generated interface. A C++ component has
two elements: an interface or header file and the object code. The TwinPeaks
bridge will read and analyze the header file and produce both a Java interface
file and a thin layer of Java/C++ bridge code, which translates API calls and
data formats between Java and the object code of the existing component.
Through this translation, Java will be able to recognize and use existing C++
components. 
ILOG Inc. 
2005 Landings Drive
Mountain View, CA 94043
800-367-4564
http://www.ilog.com 

NuMega Technologies has announced BoundsChecker 4.0 and SoftICE for Windows
NT. BoundsChecker 4.0 is integrated into the Microsoft Visual C++ Developer
Studio debugger. This allows BoundsChecker to detect hidden errors as you step
through the code using the Visual C++ debugger without having to leave the
Developer Studio to locate, evaluate, or correct errors. BoundsChecker's
OLECheck features detect OLE Interface leaks and invalid parameters and return
codes for over 70 OLE interfaces. BoundsChecker ensures that all interfaces
are correct, execution is error free, and the software is dependable.
BoundsChecker 4.0 also offers support for Delphi 2.0 and ApiGen, a utility for
the validation of user-extensible API calls. 
BoundsChecker 4.0 is available in Professional and Standard Editions for
Windows 95/NT. BoundsChecker Professional Edition sells for $999.00, while the
Standard Edition lists for $299.00. 
NuMega Technologies Inc.
P.O. Box 7780 
Nashua, NH 03060-7780
603-889-2386 
http://www.numega.com
ClassAction, a collection of Windows API tools for Visual Basic, has been
released by the Crescent Division of Progress Software. With ClassAction, you
can access Windows functionality without resorting to C. ClassAction also
gives you an extensive library of error codes and conditions. The tool is
available for $139.00. 
Crescent Software
14 Oak Park
Bedford, MA 01730
617-280-3000 
http://www.progress.com/crescent
Softimage, a subsidiary of Microsoft, has announced its Softimage Software
Developers Connection, a program that features technical and marketing support
and a new Softimage SDK. The SISDK, which supports both Windows NT and Silicon
Graphics platforms, provides you with direct access to animation and modeling
(SAAPHIRE toolkit), rendering (mental ray toolkit) and motion-capture controls
(Channels toolkit) in Softimage 3D. Applications can run as stand-alone
programs or be fully integrated into Softimage 3D for Windows NT and Silicon
Graphics IRIS systems. 
Softimage Inc.
3510 Boulevard St. Laurent, Suite 400
Montréal, Québec, Canada H2X 2V2
514-845-1636

http://www.softimage.com/
Microsoft has announced the release of Visual C++, Version 4.1. This release
includes support for the Microsoft Internet Information Server (IIS),
third-party ActiveX Controls (formerly known as OLE Controls), and performance
enhancements for managing large projects. Visual C++ 4.1 delivers reusable
components through Custom AppWizards, ActiveX Controls, and MFC 4.1. This
update includes support for Windows NT and Windows 95. 
Features new to VC++ 4.1 include: MFC for ActiveX Servers for creating
interactive Web applications using the Microsoft Internet Server API (ISAPI);
an ISAPI Extension Wizard for creating Internet server extensions and filters;
VRML support from Template Graphics Software through a Custom AppWizard and
ActiveX Controls; and Developer Studio Web Favorites, which lets you access
your favorite World Wide Web sites from within Developer Studio. 
Current VC++ subscribers will automatically receive Visual C++ 4.1. For new
users, the VC++ Subscription Edition is available for approximately $499.00. 
Microsoft Corp.
One Microsoft Way
Redmond, WA 98052
206-882-8080
http://www.microsoft.com/devonly
NetManage has announced a set of Internet ActiveX Controls called the
"Internet Control Pack." Jointly developed by NetManage and Microsoft, these
controls let you integrate Internet functionality into your development
projects. The Internet Control Pack will work with all Microsoft enhanced
ActiveX Controls and Microsoft development tools such as Microsoft Visual
Basic, Visual C++, Visual FoxPro, and Access, as well as any development
environment that supports ActiveX Control tools like Borland's Delphi. The
Internet Control Pack covers a wide range of Internet protocols including
Winsock, HTML, HTTP, FTP, NNTP, and SMTP/POP3. 
NetManage Inc. 
10725 N. De Anza Boulevard
Cupertino, CA 95014
408-973-7171
http://www.netmanage.com 
Mystic River Software has announced a scripting tool to support Microsoft's
ActiveX interface. The tool is based on Mystic River's SBL, an embedded
language for Visual Basic. The company also said it is extending its
technology to provide cross compatibility between Visual Basic Scripting
Edition (VBScript) and Java. VBScript, a subset of Visual Basic for
Applications, will be integrated into Web browsers and is designed to allow
ActiveX controls, applets, and other objects embedded in HTML documents to
interact. SBL is currently available for Windows 3.1/95/NT, the Win32s
API, NLM, OS/2, Mac OS, the PowerPC, and UNIX. 
Mystic River Software
125 Cambridge Park Drive
Cambridge, MA 02140 
800-298-350
http://www.mysticriver.com 
Centigram Communications has announced that its TruVoice Version 5.0 English
SDK is now available. The TruVoice SDK supports the Microsoft Speech API used
in Windows 95/NT. TruVoice converts any text into spoken English and includes
an intelligent preprocessor for reading e-mail messages, fax headers, and
other messages. An extensive set of context-sensitive rules compensates for
poorly crafted text commonly found in these message forms. This "text
scrubber" also makes interactive voice-response (IVR) systems more cost
effective to implement, as there is no need for a system or database
administrator to make time-consuming corrections to database information
before the text is sent to the converter. The TruVoice SDK for Windows sells
for $295.00. 
Centigram Communications Corp.
91 East Tasman Drive
San Jose, CA 95134
408-944-0250
http://www.centigram.com
Scientific Software Tools has announced LabOBJX Real-Time Chart, a Visual
Basic custom control for charting real-time data. LabOBJX Real-Time Chart
maximizes visual display rates for creating realistic and fast oscilloscope
and strip chart displays. The control works with all versions of Visual Basic,
Delphi, and C++ compilers that support Visual Basic control technology.
LabOBJX Real-Time Chart's flexible input data format eliminates the need to
preprocess data before plotting. Users can input data as signed and unsigned
8-, 16-, and 32-bit integers, 32- and 64-bit floating-point values; scalars,
xy-pairs, x-vectors, y-vectors, xy-pair vectors, column- and row-major
xy-arrays. The control also allows charting of selected rows or columns from
arrays as well as charting every nth sample.
LabOBJX Real-Time Chart supports up to 32,767 channels and viewports and
16,384 points/channel screen updating. It also features adjustable priority. A
library of Visual Basic, Delphi, and C++ source-code example programs is
included, as well as a context-sensitive on-line help system. LabOBJX
Real-Time Chart sells for $199.00. 
Scientific Software Tools
19 East Central Avenue
Paoli, PA 19301 
610-889-1354
http://www.sstnet.com
ZyLAB has added a 32-bit Windows 95/NT-compatible toolkit to its family of
full-text search and retrieval software. With the 32-bit toolkit, you can
develop applications in Visual C 4.x and Visual Basic 4.x. The 32-bit toolkit
makes use of multiple threads in environments that facilitate multithreading. 
ZyLAB Corp. 
19650 Club House Road, Suite 106 
Gaithersburg, MD 20879 
301-590-0900 
http://www.zylab.com






















EDITORIAL


There's Nothing Dumb About Smart Debugging


Compilers seem to be getting smarter all the time. If you fire up Borland's
Delphi development environment or Visual C++ and start writing code, you'll
notice a feature called "syntax-directed highlighting," which color codes
program keywords, comments, and the like. In Visual Basic, when I enter a line
of code in my program editor, the syntax is immediately checked for mistakes.
If I forget to include a matching parenthesis, I'm immediately notified of my
error. These features started me thinking about the debugging process and led
me to ask the question, "Why haven't compiler vendors done more to automate
the debugging process in the same manner as syntax checking?" Much like a
spellchecker in a word processor, you'd think a debugger would be able to run
as a background task, evaluating and instrumenting your source code during
development.
Currently, the typical debugging tool in the programmer's arsenal is a
run-time debugger, usually built into the development environment of your
compiler. Okay. So you can set breakpoints, step through code, and see
variables instantiated. This is only a step away from including printf
statements in your program. Sure, there are other tools that let you check
array bounds, look for memory corruption, and detect orphaned pointers. The
problem is that a tool is only as good as the programmer using it. Too often,
debugging is an afterthought that is pursued only after a large portion of the
project has been developed.
Clearly, there are many different types of bugs, and no single tool can catch
them all. The process of detecting memory errors, however, can easily be
automated. One popular approach links in a library of functions that are
called as your program executes. The library reports on memory leaks and
corruption. An interesting twist on this idea is used in Great Circle, an
automatic memory manager from Geodesic Systems. This tool provides a library
that performs the equivalent of automatic garbage collection. Instead of
calling malloc or new, you call one of Great Circle's memory-management
routines. The library monitors pointer variables and automatically frees the
memory they reference once it is no longer used. 
Tools such as StratosWare's MemCheck for Windows read object code generated by
the compiler and look for processor instructions that access memory.
Memory-access instructions are then modified to check for corruption.
Source-code instrumentation tools such as Parasoft's Insure++ place test and
analysis functions around every statement in your program. At compile time,
these functions collect information about data structures, pointers, and
memory and store it in a database. At run time, values and memory references
are checked against this database. This approach allows the tool to catch
errors in static and dynamic memory, as well as stack memory. 
So, how can these tools be better integrated with compiler environments? For
tools that work at the source-code level, the key is in the compiler's parser.
Syntax-directed highlighting and syntax checking are possible because the
editor has implicit knowledge of the compiler's parser and access to
information in the parse tree. Currently, third-party vendors can create a
programmer's editor with similar features, but must write their own parser to
accomplish the task, thus duplicating a feature already built into the
compiler. The same problem exists for vendors creating debugging tools that
rely on source-code instrumentation. Again, these developers must reinvent the
wheel.
What if compiler developers provided a standard interface to the compiler's
parser? This is the question the people at Parasoft are currently asking.
Third-party developers would no longer be forced to write parsers for each
language or variant they wish to support. Instead, they could use a piece of
technology already built into the compiler. The compiler company would benefit
from additional third-party support for their product. Ultimately, all of us
will benefit from a class of new tools seamlessly integrated into the
development environment. 
But defining a standard interface to compiler-parser technology all starts
with Microsoft, Borland, Symantec, and other compiler vendors. I urge them to
contact the people at Parasoft, or me directly. 
Michael Floyd
executive editor














































LETTERS


Quibbles with "Rocket Science Made Simple"


Dear Dr. Dobb's,
I think Hal Hardenbergh's heart was in the right place when he wrote his
article "Rocket Science Made Simple" (Dr. Dobb's Sourcebook, September/October
1995), but his thesis is about eight years too late and now is more wrong than
right.
His article discusses the difference between "computer architects" and "chip
designers." The first place where I differ with his viewpoint is on page 53:
The people who designed the Pentium and the P6 and who are currently designing
the P7 are not computer architects. But they're pretty good engineers, based
on the results I've seen. I call them "chip designers."
There are several things wrong with this paragraph. First, even according to
Hal's own definition that only one who creates a new instruction-set
architecture from a blank sheet is worthy of the name "architect," the people
who designed these machines are architects, whether or not they demonstrated
that ability in the machines mentioned. For example, of the three senior
architects on the P6, one was a senior architect on the Intel 960 processor,
and the other two were senior architects on the Multiflow TRACE VLIW
processors. Both of those had new instruction sets.
But that's semantic quibbling. More importantly, the act of creating a new
instruction-set architecture is not the most difficult task in designing a
computer. It's not easy, but it's easier than realizing that ISA in
competitive silicon. At Intel, we call the people who figure out how to do
that "microarchitects," because they are responsible for the
microarchitecture. I claim that microarchitecture is where the action really
is: superscalar, superpipelining, out-of-order execution, speculative
execution, dataflow cores--the choices these people make and the balances they
strike among the possibilities have a much larger effect on performance than
most ISA choices. (Otherwise the x86 performance would have been left behind
long ago, and instead it has gradually caught up to the RISC designs.)
Further, these microarchitects cannot possibly do their jobs well without a
complete understanding of overall computer-system design and the software that
will run in that environment. In other words, they must know at least as much
as the "architects" that Hal acknowledges. By putting too much emphasis on
clean-sheet design, Hal distracts the reader from the real battleground:
implementations and microarchitectures. That's where the real wizardry is
being displayed nowadays.
I have no idea where Hal got the number "30" in conjunction with the P6. P6's
Reservation Station (RS) has 20 slots, and the Reorder Buffer has 40. Was he
trying to blur the two, and therefore averaged between them to characterize
the resulting storage capacity? I don't think that abstraction will work;
those two structures have much different functions.
For example, he says "A scoreboard keeps track of everything that's going on."
It most certainly does not. There is no scoreboard; that's part of the beauty
of the design. The micro-ops themselves carry the control flow of the program
through the machine. It's fundamental to the way the machine works and
facilitates the dataflow mechanism built into the RS. Presumably Hal meant the
Reorder Buffer, which collects the speculative state from the dataflow engine
and reimposes the original program order onto that state. But that's nothing
like scoreboarding, which attempts to track the execution state of a machine.
In P6, there is no corresponding execution state; everything is speculative
until it is retired.
Suggesting that we got register renaming from a paper on register allocation
is very far-fetched. I can state with absolute certainty, having read
Chaitin's paper, that there was no connection between that work and the P6
register-renaming facility. The basic idea of doing reg renaming is as old as
vector machines, meaning at least 25 years old.
Where did Hal get the idea that P6 only speculatively executes up to five
branches? Not true at all. P6 doesn't treat branches any differently from
other kinds of micro-ops, meaning it could (in principle) fill the Reorder Buffer
with currently resolving jumps. The Reorder Buffer has 40 slots. Therefore, P6
can speculate to that depth. (Granted, beyond 8 or 10, there is little
performance advantage to this, but it was cleaner to implement the generalized
solution that we did.)
"P6 self-optimizes all shrink-wrapped code"? Not hardly. We designed it to run
all code well, but good code can always get better results than poor code, and
that's still true with Pentium Pro.
The philosophical design differences underlying the 486, Pentium, and P6
generations have nothing whatever to do with computer architecture and
everything to do with chip design.
I submit this is an obsolete view of what CPU designers do nowadays. It has
little to do with instruction-set design, but it very much depends on a deep
understanding of the entire computer system, and those who perform these tasks
deserve the appellation "computer architect" just as much as instruction-set
designers.
Bob Colwell 
chief architect, Pentium Pro
Intel Corp. 
bcolwell@ichips.intel.com
Hal responds: Let's clear the small beer. Along with their C compilers, most
readers of this magazine generate code with a conditional branch every seven
instructions. With "40 slots," that's five branches, which is what I wrote.
People who read publications with "Transactions" in the title might be
interested in 40 consecutive conditional branches.
About optimizing code: I did not mean to imply that the P6 could convert a
bubble sort to Quicksort (or to my favorite, Knuth's insertion sort). In that
sense you're right, the P6 can't fix bad code. What I meant was that the P6
automatically executes small groups of code in the correct order demanded by
register and data dependencies, and that the well-known Pentium optimizations
of breaking complex instructions into simple ones are done automatically by
the P6.
"[The P6 has] nothing like scoreboarding, which attempts to track the
execution state of a machine. In P6, there is no corresponding execution
state...." Bob, I assumed that information is maintained within the P6 to let
the P6 logic know when register conflicts (and such) have been resolved so
that speculatively executed instructions can be discarded or completed. I
chose to call this information a "scoreboard," since that term has been used
previously with respect to maintaining the internal state of a CPU. You're the
authority, so if you say the P6 doesn't maintain this information, then the
reader should accept your assertion. I don't believe it.
Since writing the DTACK Grounded newsletter in the early '80s, I've asserted
that there are no new solutions in the microprocessor world. All the problems,
and their solutions, were discovered a dozen or more years ago in the
mainframe world, and then again in the minicomputer world eight or so years
ago. Now you tell me no, some of the solutions are "at least 25 years old."
Thanks, Bob! In the future, I'll quote you often to support a point I thought
I was making.
The point of my article "CPU Performance: Where Are We Headed?" (Dr. Dobb's
Journal, January 1994) was that the mainframe world never solved the problem
of the limit to parallelism in scalar code, so we should not expect the
problem to be solved in the microprocessor world either. That article included
a graph showing that micros, as exemplified by x86s, were indeed gaining
ground on RISCs, just as you state in your letter. Are we having a violent
agreement?
I am bemused by your implication that I think the job of computer architect is
harder than that of chip designer. I ain't that dumb. When the 8086 ISA was
originally developed in 1976 or so, less than a man-year went into that
effort. Developing the P6 logic probably required 100 man-years, and that
doesn't count the efforts of software folk (for emulation) and process folk.
Now we come to the main point of your letter: Are CPU chip designers "computer
architects"? It's well known that garbage collectors prefer to be called
"surplus material relocation technicians." I have no problem with this,
because the job description of garbage collectors is identical to that of
SMRTs.
It's also well known in hardware circles that (CPU) chip designers would
rather have a business card describing them as a "computer architect" than an
$8000 raise and a business card that says "chip designer."
While you tap-dance all around this issue, I'm pleased that you do not flatly
assert that chip designers are computer architects. The job description of a
computer architect is quite different from that of a chip designer. That's as
true today as it was eight years ago.
A country doctor who spends a few weekends building a gazebo is a carpenter on
those weekends. This does not mean doctors are carpenters. A very few computer
architects may switch to chip designing (in a desperate search for employment;
there are precious few new instruction-set architectures being developed these
days) but that doesn't mean, as you seem to imply, that chip designers are
therefore computer architects.
But it harms no one to call designers "architects." If it makes the
hard-working chip designers happy and saves the company $8000 a head, then by
all means give all your chip designers business cards with the title "computer
architect." Has Intel done this?
Your use of the word "worthy" is interesting. If some occupations are worthy,
then others are unworthy. I thought Thorstein Veblen and his "invidious
distinctions" had gone out of style.
All this aside, I'd like to congratulate you and your P6 chip designers for
developing a chip that'll run 32-bit x86 code really fast. It's high time we
all climbed out of that 640K ghetto (but I wish there were a 32-bit DOS).
Since I occasionally do math modeling and such, I'm especially pleased with
the P6's FPU performance. I hope--and expect--to see more of the same in the
future.
























SQL Access Group's Call-Level Interface


An independent interface for database development




Roger Sippl


Roger is chairman of the SQL Access Group and founder of Visigenic Software.
He can be contacted at Visigenic Software, 951 Mariner's Island Blvd., Suite
460, San Mateo, CA 94404, (415) 286-1900.


The SQL Access Group (SAG) was formed in 1989 to define and promote standards
for database portability and interoperability. Initially, the membership
roster included database heavyweights Oracle, Informix, and Ingres (now
Computer Associates), as well as hardware vendors Digital, Hewlett-Packard,
Tandem, and Sun. Table 1 lists the current membership roster. The group's
initial projects involved developing the draft ISO/ANSI standard for SQL
(including the embedded-SQL interface specifications) and a specification for
remote data access (RDA). 
In 1990, SAG took the lead in developing an SQL-based Call Level Interface
(CLI). The SAG CLI is an API for database access, offering an alternative
invocation technique to embedded SQL that provides essentially equivalent
operations. SAG envisioned an interface that would enable client/server
applications to access data stored in heterogeneous relational and
nonrelational databases. The interface would be platform, vendor, database,
and language neutral. SAG and X/Open published the CLI Snapshot Specification
in 1992 as a "work in progress," and it was adopted for use in commercial
software products.
Microsoft helped define the X/Open CLI specification and became the first
company to commercialize the CLI specification by shipping Open Database
Connectivity (ODBC) 1.0 for Windows in 1992. To create ODBC, Microsoft
extended the CLI specification and created a three-layer specification in
which the "core" layer corresponds to the SAG CLI. Over the next two years,
the CLI specification underwent several transformations, reemerging in 1994 as
an X/Open Preliminary Specification. Also in 1994, Microsoft released ODBC
2.0, whose core functionality was still aligned with the SAG CLI. Earlier this
year, Microsoft announced that ODBC 3.0 (to be released in 1996) will be fully
aligned with both ISO's CLI standard and SAG's CLI Specification. 
In March 1995, the CLI was finalized and published as an X/Open Common
Application Environment (CAE) specification. CAE specifications are adopted by
consensus and are the basis against which suppliers brand their products. The
adoption and publication of the CLI as an X/Open CAE specification represents
the culmination of five years of cooperative development work within the SQL
Access Group. 
In the meantime, ODBC has gained broad support in the database industry. All
major software vendors--Oracle, Sybase, Informix, Computer Associates, IBM,
Gupta, Powersoft, and Borland, as well as more than 140 application-software
developers and VARs--have added ODBC support. Many of these vendors also offer
proprietary APIs; nonetheless, they see the X/Open CLI specification and ODBC
as important to their strategy. In 1994, Microsoft granted an exclusive
source-code license to Visigenic Software for porting and licensing the ODBC
SDK to non-Windows platforms. As a result, the ODBC SDK is now also available
on all major UNIX platforms, as well as OS/2 and Macintosh. 


A Closer Look at the SAG CLI 


The X/Open CLI specification is a standard API for database access that is
vendor, platform, and database neutral. It defines a set of functions that a
program can call directly using normal function-call facilities. The
specification is language independent and includes header files for both C and
Cobol. 
The CLI specification defines 57 functions that support a rich set of
database-access operations sufficient for creating robust database
applications, including: 
Allocating and deallocating handles (eight calls). 
Getting and setting attributes (ten calls). 
Opening and closing database connections (two calls). 
Accessing descriptors (six calls). 
Executing SQL statements (nine calls). 
Retrieving results (eight calls). 
Accessing schema metadata (four calls). 
Performing introspection (four calls). 
Controlling transactions (two calls). 
Accessing diagnostic information (three calls). 
Canceling functions (one call). 
A database application calls these functions for all interactions with a
database. The CLI enables applications to establish multiple database
connections simultaneously and to process multiple statements simultaneously,
depending on the capabilities of the database servers being accessed. Figure 1
and Figure 2 show the basic control flow for using the CLI functions. 
The X/Open CLI specification was developed with client/server architectures in
mind. In fact, the CLI is ideal for this environment, in which the developer
often knows little (if anything) about the database at the time the
application is written. The X/Open CLI specification defines a set of
introspection functions that enable an application to discover the
characteristics and capabilities of a particular CLI implementation and of any
database server accessed through that implementation. For example,
SQLGetTypeInfo lets you find out what data types are supported by a particular
server, and SQLDataSources returns a list of available database servers and
descriptions. Introspection functions facilitate a technique known as
"adaptive programming," whereby an application adapts its behavior at run time
to take advantage of the capabilities of a particular database environment. 
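In spirit, adaptive programming looks like the following self-contained C sketch. The capability codes and the get_capability function are hypothetical stand-ins for a CLI introspection call such as SQLGetInfo; a real application would query the driver or server instead.

```c
#include <string.h>

/* Hypothetical capability codes, standing in for SQLGetInfo info types. */
enum { CAP_TXN_DDL, CAP_OUTER_JOIN };

/* Mock introspection: a real application would call SQLGetInfo here. */
static int get_capability(const char *server, int cap)
{
    if (cap == CAP_TXN_DDL)
        return strcmp(server, "legacy") != 0; /* pretend only "legacy" lacks DDL txns */
    return 1;
}

/* Adapt behavior at run time: commit DDL only if the server supports it. */
const char *plan_ddl(const char *server)
{
    return get_capability(server, CAP_TXN_DDL)
        ? "execute; commit"
        : "execute; skip commit";
}
```

The same shape appears in Listing Two, where SQLGetInfo's SQL_TXN_CAPABLE answer decides whether to call SQLEndTran.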


Processing SQL Statements


The X/Open CLI specification, including sample programs and header files, is
available directly from X/Open. Listings One, Two, and Three illustrate how
the CLI works. Listing One is from a typical database application. Listings
Two and Three are the significant portions of functions called by Listing One.

The application allocates memory for an environment handle and a connection
handle; both are required to establish a database connection. The SQLConnect
call establishes the database connection, specifying the server name
(server_name), user id (uid), and password (pwd). The application then
allocates memory for a statement handle and calls SQLExecDirect, which both
prepares and executes an SQL statement. The SQLGetDiagField call takes
advantage of the CLI's ability to interrogate the database server. In this
case, it returns a code describing the nature of the SQL statement just
executed. With this information in hand, the application calls the
user-defined DoStatement() function to process the results of the SQL statement.
Finally, the program frees the statement handle, disconnects from the
database, and frees the memory previously allocated for the connection and
environment handles. 
The body of the DoStatement() function in Listing Two is built around a switch
statement that processes the results of an SQL statement based on the return
value from the SQLGetDiagField call in Listing One. In the case of a SELECT
statement, which requires its own complex processing, the function calls
another user-defined function, DoSelect(); see Listing Three. In the case of
an UPDATE, DELETE, or INSERT statement, the DoStatement() function calls the
CLI diagnostic function SQLGetDiagField to find out how many rows were
affected by the statement, then calls SQLEndTran to commit the transaction.
The function prints one message indicating whether the commit was successful
and another giving the number of affected rows.
In the case of any Data Definition Language (DDL) statement, the DoStatement()
function first calls the CLI introspection function SQLGetInfo to find out if
the CLI implementation being used supports transaction processing for DDL
statements. If so, the function calls SQLEndTran to commit the transaction,
then prints a message indicating whether the commit was successful. 
The DoSelect() function in Listing Three processes the results of an SQL
SELECT statement. It is called by the DoStatement() function. First,
DoSelect() calls SQLNumResultCols to determine how many columns are in the
result set. Then, for each column in the result set, the function calls
SQLDescribeCol to get descriptive information about the column (that is, its
length, scale, and data type), prints an appropriate row of column headings to
the standard output, allocates memory to bind the results, and calls
SQLBindCol to establish the bindings. Next, DoSelect() calls SQLFetch to fetch
rows from the result set until none are left. For each row, the DoSelect()
function prints the column values, followed by a new line. The CLI can provide
various types of diagnostic information, such as whether a value is truncated
or null; DoSelect() tests for these two conditions and, when they occur, calls
the user-defined BuildMessage() function (not shown) to generate appropriate
error messages. At the end, the function prints a list of any such error
messages. Finally, the application closes the cursor for the statement handle
and frees the data buffers. 


What's Next?


The SQL Access Group formally merged with X/Open in 1995. The charters of SAG
and X/Open's Data Management Working Group were essentially the same, and
X/Open had always edited and published the work of the SQL Access Group. The
merger made it possible to eliminate duplicate efforts, reduce costs, and
unify development efforts. The X/Open Data Management Technical Committee
disbanded, and the X/Open SQL Access Group, now functioning within the X/Open
Technical Program, has assumed all of its responsibilities. 
SAG is working in close cooperation with ISO on its upcoming SQL CLI
specification, which is intended to mirror the X/Open specification. SAG is
also actively pursuing the next logical step for the CLI--the development of
an X/Open test suite for CLI conformance. Such a test suite will enable
developers to verify conformance to the CLI Specification. Development of the
test suite should be well underway by the time this article is published. 
SAG has already begun work on the next version of the CLI specification. Our
mission is to extend and refine the CLI for even more successful
interoperability and to define standards that incorporate newer database
technologies, including XA transaction processing, stored procedures, BLOBs,
triggers, and asynchronous calls. 

The efforts of the major standards organizations, including ISO, ANSI, and
X/Open, as well as the strategies of all the major players in the database
industry, now incorporate the X/Open CLI Specification. Vendors and standards
organizations are moving rapidly in the same direction, a direction the
marketplace has already validated.
Figure 1: Basic CLI control flow.
Figure 2: CLI control flow for processing SQL statements.
Table 1: SQL Access Group members.
AT&T
Borland International
Computer Associates
Fulcrum Technologies
Hitachi
IBM
Information Builders
Informix Software
INTERSOLV
Microsoft
Oracle
Progress Software
Sybase
Visigenic Software

Listing One
/* allocate an environment handle */
SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv);
/* allocate a connection handle */
SQLAllocHandle(SQL_HANDLE_DBC, henv, &hdbc);
/* connect to database */
if (SQLConnect(hdbc, server_name, SQL_NTS, uid, SQL_NTS, pwd, SQL_NTS) !=
 SQL_SUCCESS)
 return(PrintErr(SQL_HANDLE_DBC, hdbc));
/* allocate a statement handle */
SQLAllocHandle(SQL_HANDLE_STMT, hdbc, &hstmt);
/* execute the SQL statement */
if (SQLExecDirect(hstmt, sqlstr, SQL_NTS) != SQL_SUCCESS)
 return(PrintErr(SQL_HANDLE_STMT, hstmt));
/* see what kind of statement it was */
SQLGetDiagField(SQL_HANDLE_STMT, hstmt, 0, SQL_DIAG_DYNAMIC_FUNCTION_CODE, 
 (SQLPOINTER)&stmttype, 0, (SQLSMALLINT *)NULL);
/* process the SQL statement */
DoStatement(stmttype);
/* free statement handle */
SQLFreeHandle(SQL_HANDLE_STMT, hstmt);
/* disconnect from database */
SQLDisconnect(hdbc);
/* free connection handle */
SQLFreeHandle(SQL_HANDLE_DBC, hdbc);
/* free environment handle */
SQLFreeHandle(SQL_HANDLE_ENV, henv);

Listing Two
switch(stmttype) {
 /* SELECT statement */
 case SQL_DIAG_SELECT_CURSOR:
 DoSelect();
 break;
 /* searched UPDATE, searched DELETE, or INSERT statement */
 case SQL_DIAG_UPDATE_WHERE:
 case SQL_DIAG_DELETE_WHERE:
 case SQL_DIAG_INSERT:
 /* get row count */
 SQLGetDiagField(SQL_HANDLE_STMT, hstmt, 0, SQL_DIAG_ROW_COUNT, 
  (SQLPOINTER)&rowcount, 0, (SQLSMALLINT *)NULL);
 if (SQLEndTran(SQL_HANDLE_ENV, henv, SQL_COMMIT) == SQL_SUCCESS)
 printf("Operation successful\n");
 else 
 printf("Operation failed\n");
 printf("%ld rows affected\n", rowcount);
 break;
 /* other statements */ 
 case SQL_DIAG_ALTER_TABLE:
 case SQL_DIAG_CREATE_INDEX:
 case SQL_DIAG_CREATE_TABLE:
 case SQL_DIAG_CREATE_VIEW:
 case SQL_DIAG_DROP_INDEX:
 case SQL_DIAG_DROP_TABLE:
 case SQL_DIAG_DROP_VIEW:
 case SQL_DIAG_GRANT:
 case SQL_DIAG_REVOKE:
 SQLGetInfo(hdbc, SQL_TXN_CAPABLE, &txn_type, 0, 0);
 if(txn_type == SQL_TC_ALL) {
 if (SQLEndTran(SQL_HANDLE_ENV, henv, SQL_COMMIT) == 
 SQL_SUCCESS) 
 printf("Operation successful\n");
 else 
 printf("Operation failed\n");
 }
 break;
 /* other implementation-defined statements */
 default:
 printf("Statement type=%ld\n", stmttype);
 break;
} 

Listing Three
/* determine number of result columns */
SQLNumResultCols(hstmt, &nresultcols);
/* display column names */
for (i=0; i<nresultcols; i++) {
 SQLDescribeCol(hstmt, i+1, colname, sizeof(colname), &colnamelen, 
  &coltype, &collen[i], &scale, &nullable);
 /* user-defined function to get the display length for the data type */
 collen[i] = DisplayLength(coltype, collen[i], colname);
 printf("%*.*s", collen[i], collen[i], colname);
 /* allocate memory to bind column */
 data[i] = (SQLCHAR *) malloc(collen[i]+1);
 /* bind columns to program vars, converting all types to CHAR */
 SQLBindCol(hstmt, i+1, SQL_CHAR, data[i], collen[i]+1, &outlen[i]);
}
printf("\n");
/* display result rows */
while (SQL_SUCCEEDED(rc=SQLFetch(hstmt))) {
 errmsg[0] = '\0';
 for (i=0; i<nresultcols; i++) {
 if (outlen[i] == SQL_NULL_DATA || outlen[i] >= collen[i])
 /* set data text to "NULL" or add to errmsg */
 BuildMessage(errmsg, (SQLPOINTER *)&data[i], collen[i],
 &outlen[i], i);
 printf("%*.*s ", outlen[i], outlen[i], data[i]);
 } /* for all columns in this row */
 /* print any accumulated error messages and new line */
 printf("%s\n", errmsg);
} /* while rows to fetch */
SQLCloseCursor(hstmt);
/* free data buffers */
for (i=0; i<nresultcols; i++) {
 free(data[i]);
}
End Listings























































Performance Testing, ODBC, and Native SQL APIs


When query-execution time is critical




Ken North


Ken, a consultant and developer, is author of Windows Multi-DBMS Programming
(John Wiley & Sons, 1995). He can be contacted at 71301.1306@compuserve.com.


Performance is a major factor in the selection and tuning of almost all
client/server application components--the network, server, database design,
and even the SQL API. Application performance can be affected by network
latency, programming techniques, and client libraries, among other factors. To
objectively evaluate an SQL API, you must differentiate between application
performance and API performance. You must also use a vehicle for performance
testing that isolates the API as a variable while holding constant other
factors that affect performance. APIBench, the benchmark software described in
this article, lets you vary the API while executing identical queries using
the same data, network libraries, and other C/S components. 
Most SQL engines and servers let developers use more than one API. SQL DBMSs
traditionally include a native or proprietary interface that is either a
call-level interface (CLI) or embedded SQL (ESQL). More recently, developers have
had the option of writing to the Open Database Connectivity (ODBC) API. ODBC
provides a single interface for SQL queries to access a variety of relational
and nonrelational databases. A typical client/server database architecture
puts the database engine and the data on a server machine that is physically
separate from and networked to its clients. ODBC operates in this two-tier
architecture by putting the Driver Manager and one or more database drivers on
the machine with the client application. ODBC drivers are DLLs for Windows and
OS/2, and shared libraries for UNIX and the Macintosh. The ODBC Driver Manager
fields an application's ODBC function calls and routes them, when necessary,
to the appropriate driver.


C/S Benchmark Architecture


APIBench is a Windows client that uses an executable program for its user
interface and five DLLs to interface to different database APIs. The benchmark
software's extensible architecture lets you plug in other APIs by developing
DLLs and updating configuration files. Figure 1 illustrates the architecture
of APIBench. The DLLs that contain the native and ODBC code have a common
skeleton, so each type of test has comparable functions. For example, all of
the DLLs include a doInsert function to perform the INSERT test, printTime to
update the log, and so on.
My focus with APIBench has been client/server performance testing. The network
layer for these tests has used different network libraries because the
software works with a variety of protocols and networking components. The
latter are components that operate over lower-level network protocols, such as
TCP/IP, IPX/SPX, and NetBIOS. Examples include Oracle's SQL*Net and
Transparent Network Substrate (TNS), and SQL Server's Tabular Data Stream
(TDS). The layered architecture means that TNS and TDS can operate with
various network protocols and a client application can treat the network layer
as an abstraction. Changes in the network layer may affect performance and, by
extension, the results of the SQL API benchmarks. (Such changes don't affect
the native or ODBC code in these programs.) In other words, the Oracle
benchmark DLL will run without modification on Windows clients using
combinations such as SQL*Net with NetWare and IPX/SPX or Windows/NT with
TCP/IP. The network libraries are mostly transparent to the client
application. With the exception of changing TDS packet size for SQL Server,
there is no logic in the SQL API that varies based on the underlying network
libraries. 
The test suite includes typical SQL operations using an employee table and a
department table. When operating against them, the program uses an index on
the employee ID and department ID, respectively. The tests include an INSERT
test to populate both tables, two UPDATE and two DELETE tests, and five query
(SELECT) tests. The SQL statements for the five SELECT tests use joins,
aggregate functions, WHERE filters, and individual row selections. The test
suite uses a mix of prepare and execute queries and direct-execution queries.
A prepared query, sometimes called a "compiled query," is typically faster for
repetitive operations, while direct-execution queries are preferable for
one-time execution of an SQL statement. 
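The trade-off can be sketched with a toy cost model. The numbers below are invented purely for illustration: preparing pays a one-time optimization cost, after which each execution is cheap, while direct execution pays the full cost every time.

```c
/* Toy cost model (made-up units) for prepared vs. direct execution.
   Preparing costs 10 up front, then 1 per execution; direct execution
   costs 4 every time because the query is reoptimized on each run. */
static double prepared_cost(int runs) { return 10.0 + 1.0 * runs; }
static double direct_cost(int runs)   { return 4.0 * runs; }

/* Prepared execution wins once the statement repeats often enough. */
int prefer_prepare(int runs)
{
    return prepared_cost(runs) < direct_cost(runs);
}
```

With these particular numbers the crossover comes after only a handful of repetitions, which is why the INSERT test (2500 executions) uses prepare-and-execute.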
The benchmark software creates two log files with individual test information
and an array of execution times. The individual ODBC and native DLLs execute
one or more of the SQL queries defined in SQLSTMTS.H, measure the time to
execute each statement, and record information about the test in BENCH.OUT;
see Listing One. The programs also write the execution time for each test to
RESULT.OUT. Before relying on the accuracy of the numbers in RESULT.OUT, one
should check BENCH.OUT for anomalies such as execution errors, status
messages, or incorrect row counts. 


SQL Programming and Embedded SQL


ANSI and ISO recently approved a new standard for an SQL CLI, but the
benchmark DLLs described here used older SQL APIs. The Sybase System 10 and
Oracle 7 DLLs use a CLI to access their data while the CA Open-Ingres and
Informix 5 DLLs use ESQL. The CLIs support run-time binding to data, while
ESQL represents compile-time binding because it requires preprocessing and
compilation. SQL products that support ESQL typically provide preprocessors
for common languages such as C, Cobol, and Fortran. The preprocessor scans the
source code and replaces embedded SQL statements with calls to modules in a
vendor-supplied object library. A programmer compiles and links the
preprocessor output and must repeat the process when there are revisions to
the source or database design. The Oracle Pro*C preprocessor supports ESQL for
C programs. It lets you create a native DLL for Oracle that uses ESQL to
complement the benchmark DLL that uses the Oracle Call Interface (OCI).
The SQL standard defines two data structures--the SQL Communications Area
(SQLCA) and SQL Descriptor Area (SQLDA)--that ESQL programs use for feedback
from the DBMS. The SQLCA provides error and status information reported by the
DBMS, so error handlers in ESQL programs use it for run-time error checking.
The SQLDA is a dynamic host-language structure used to pass information, such
as parameters to EXECUTE statements, and metadata, such as column type,
length, or whether a column contains a null value. The SQLDA is the area where
this metadata or descriptor data resides after a program executes a DESCRIBE
statement. The SQLDA format varied with SQL products until it was defined as
part of the SQL-92 standard. ODBC 3.0 will be the first release of ODBC that
uses this type of descriptor. 
Embedded statements in these benchmark programs don't include hard-coded
column names. Instead, they use ESQL's DESCRIBE statement, dynamic SQL, and
the SQLDA. 
Typical SQL software includes a query optimizer that processes SQL statements
to determine the optimum access path or the execution plan for a query. Many
SQL tools permit you to prepare or compile statements that will be used
repetitively by saving the query's execution plan. PREPARE checks the validity
of a statement including dynamic parameters and optimizes its execution. An
application submits the prepare request once and may then use multiple
execution requests. This speeds up processing for repetitive queries. For
single-statement execution, ESQL provides EXECUTE IMMEDIATE to avoid the
overhead of preparing a statement. 
The logic to execute dynamic SELECT statements illustrates the use of these
features. ESQL uses host variables from the host programming language in SQL
statements. ESQL programmers use BEGIN DECLARE SECTION and END DECLARE SECTION
to frame the declaration of host variables. The routine described here
declares a host variable for a dynamically defined SQL string and uses it when
preparing the SELECT statement EXEC SQL PREPARE sel_stmt FROM :SQLstring;. The
dynamic SELECT routine declares host variables, writes the SQL statement to
the log file, and then starts the timer for the benchmark test. It allocates
an SQLDA, declares a cursor for the query, and prepares the SELECT statement.
Next, the routine DESCRIBEs the statement into a query descriptor and executes
a loop that sets variables (data type, length) based on the column
descriptions. The loop also aggregates the cursor size using the length of
each column. After stepping through all columns, the routine allocates a data
buffer based on the cursor size, adjusts for null indicators, and opens the
query cursor. Then it executes a FETCH loop using the query descriptor and
closes the cursor. Finally, it stops the benchmark timer, updates benchmark
log files, and frees the data buffers.


Native CLIs and ODBC


The Sybase Open Client (CT-Library) and Oracle Call Interface (OCI) do not use
source-code preprocessing because they support run-time calls to a library of
data-access functions. The CT-Library is new with System 10. It is a successor
to DB-Library and includes support for System-10 features such as cursors. It
is also intended to operate with servers other than SQL Server so it includes
functions such as ct_capability to define the features available with a
specific connection. OCI includes connection functions, binding functions,
statement processing, transaction control, and results processing that
includes array fetches. It supports cursors and batched execution of SQL
statements. To process a typical SQL query when using a CLI, you create a
connection to the database, bind the columns in the query to buffers or memory
variables in the application, submit the SQL command for processing, and use a
loop to fetch the results from the query. CLIs such as ODBC and CT-Library
include functions that provide a counterpart to ESQL's immediate execution and
prepared execution of queries.
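The canonical CLI pattern (bind a buffer, execute, then fetch until no data remains) can be sketched in self-contained C. The mock_fetch function below is a hypothetical stand-in for SQLFetch, ct_fetch, or an OCI fetch call; a real program would be pulling rows from a server cursor.

```c
#include <string.h>

/* Mock result set standing in for a database cursor. */
static const char *rows[] = { "Smith", "Jones", "Nguyen" };
static int cursor = 0;

/* Stand-in for SQLFetch/ct_fetch: returns 1 if a row was fetched, 0 when done. */
static int mock_fetch(char *buf, size_t len)
{
    if (cursor >= (int)(sizeof rows / sizeof rows[0]))
        return 0;
    strncpy(buf, rows[cursor++], len - 1);
    buf[len - 1] = '\0';
    return 1;
}

/* The CLI pattern: a bound buffer, then a loop until fetch reports no data. */
int count_rows(void)
{
    char bound_col[32]; /* plays the role of a column bound with SQLBindCol */
    int n = 0;
    while (mock_fetch(bound_col, sizeof bound_col))
        n++;
    return n;
}
```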
Several features of ODBC programming are relevant to this benchmark program.
Because ODBC is a CLI, there is no source-code preprocessing: ODBC includes
functions to connect to databases, prepare and submit SQL statements, and so
on. Unlike other CLIs, however, ODBC can operate on multiple databases by
loading the appropriate database driver at run time. ODBC uses handles to
track resources such as memory and errors associated with the application
environment, connections, and SQL statements. It includes functions that
return information about the API and SQL dialect that a driver implements.
ODBC operates on data using SQL statements, so it includes functions for both
direct and prepared query execution. PREPARE and EXECUTE queries permit the
DBMS to optimize the query, store its access plan, and use the plan
repetitively without repeating the optimization process. ODBC's SQLExecDirect
is the counterpart to ESQL's EXECUTE IMMEDIATE, while SQLPrepare and
SQLExecute are the counterparts to PREPARE and EXECUTE, respectively. ODBC
drivers sometimes emulate prepared queries when a DBMS does not natively
support them. For example, the driver for Microsoft SQL Server will create
temporary stored procedures to use when an ODBC application uses prepared
queries.
A program using ODBC must allocate the environment handle before other
handles, so BENCHODB, the DLL for ODBC testing, allocates an environment
handle in its LibMain routine. A description of the ODBC equivalent to the
ESQL dynamic SELECT routine may provide a better understanding of CLI
programming. BENCHODB's doSelect routine processes SELECT queries using
prepare and execute logic. It writes the statement to the log file, allocates
a statement handle, and starts the benchmark timer. The program prepares the
query using SQLPrepare and then calls SQLNumResultCols to get the number of
columns in the result-set buffer. Next, it uses a loop to describe each column
with SQLDescribeCol so that it can allocate storage for database columns. To
bind columns to memory variables, doSelect calls SQLBindCol and passes
arguments such as the column type and length. After binding columns, it calls
SQLExecute to execute the statement and then executes a fetch loop that steps
through the result set by calling SQLFetch. Each call to SQLFetch populates
the bound columns with new data from the database. Once the program completes
the fetch loop, it stops the benchmark timer and updates the benchmark logs.
Some of the benchmark code for ODBC logic reflects the fact that ODBC supports
both escape clauses and parameter markers in SQL statements. The
SQLBindParameter function binds a buffer to a parameter marker. ODBC drivers
support the use of escape clauses to delineate SQL extensions such as outer
joins, LIKE predicates, and procedure calls. Using these escape clauses
simplifies the writing of interoperable SQL, but bypassing the scan for escape
clauses may increase performance in some circumstances. Some of the SQL
statements used with APIBench are in two forms.
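As a rough illustration of the scan that a NativeSQL setting bypasses, the hypothetical routine below rewrites an ODBC outer-join escape clause, {oj ...}, into bare join text. Real drivers perform a full scan and translate to each DBMS's native syntax; this sketch handles only a single clause.

```c
#include <stdio.h>
#include <string.h>

/* Simplistic illustration only: rewrite "{oj <join>}" to the bare join
   text. A real ODBC driver scans the whole statement and emits the
   target DBMS's own outer-join syntax. */
void strip_oj_escape(const char *in, char *out, size_t outlen)
{
    const char *p = strstr(in, "{oj ");
    const char *end = p ? strchr(p, '}') : NULL;

    if (!p || !end) {                 /* no escape clause: copy through */
        snprintf(out, outlen, "%s", in);
        return;
    }
    /* prefix, then the join text inside the braces, then the suffix */
    snprintf(out, outlen, "%.*s%.*s%s",
             (int)(p - in), in,
             (int)(end - (p + 4)), p + 4,
             end + 1);
}
```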


ESQL and CLI Differences


To provide a realistic basis for comparison, the APIBench software uses
techniques representative of typical ESQL and CLI coding. This guarantees that
you will see differences as you examine the source code for each DLL. Whether
using ESQL or a native CLI, developing native code is different from ODBC
programming in several respects. The native programmer targets just one DBMS,
the features, function calls, and data types of which are known. ODBC
programmers write multi-DBMS code to get information about the database and
the driver so that an application can adapt to the available features and
types at run time. Error-handling differences arise in part because CLIs don't
use an SQLCA for run-time error checking. ESQL programmers sometimes use a
WHENEVER clause to define error-handling logic that will be executed whenever
certain errors occur, but developers using a CLI such as ODBC usually test the
return code from each function for errors. To retrieve error information after
calling ODBC functions, developers use a loop that calls SQLError to retrieve
all of the error codes and messages for a specific function. CLIs bind columns
instead of using ESQL host variables. SQLBindCol is the ODBC equivalent of
ESQL's host variables, while CT-Library uses ct_bind. 
ODBC differs from ESQL and other CLIs in that it is multidatabase and
therefore includes functions to get run-time information about the database
and driver it is using. APIBench includes a few examples of the use of
adaptive techniques in ODBC programs. The source code for BENCHODB uses
SQLGetInfo, an ODBC information function that returns the driver's version
number. With version 1.0 drivers, the program uses SQLSetParam calls to
specify values to replace parameter markers. SQLSetParam is a function
replaced by 2.0's SQLBindParameter, so the program calls SQLBindParameter to
bind parameters if the driver is a version 2 driver. 
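That adaptive choice reduces to parsing the driver's version string. The helper names below are hypothetical; only the version-string format ("02.10", as reported through SQLGetInfo) comes from ODBC.

```c
#include <stdio.h>
#include <string.h>

/* ODBC drivers report their version as a string such as "02.10". */
int odbc_major_version(const char *ver)   /* e.g. "02.10" -> 2 */
{
    int major = 0, minor = 0;
    if (sscanf(ver, "%d.%d", &major, &minor) < 1)
        return 0;
    return major;
}

/* Name of the parameter-binding call the application should use:
   SQLSetParam for 1.x drivers, SQLBindParameter for 2.x drivers. */
const char *param_binding_call(const char *ver)
{
    return odbc_major_version(ver) >= 2 ? "SQLBindParameter" : "SQLSetParam";
}
```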
Details such as the handling of buffers for column data and the functions to
describe those columns vary with APIs. The ODBC code binds columns in the
database to memory variables by calling SQLBindCol, gets type information from
SQLGetTypeInfo, and uses SQLDescribeCol to describe columns in result sets. An
ESQL application uses known data types and a DESCRIBE statement to get
descriptor data that includes null indicators, length, and type. ODBC 3.0
implements additional function calls to get descriptors similar to SQL-92
descriptors, so the logic of the ODBC benchmark DLL testing may change in the
future.


Performance-Testing Considerations



When creating the benchmark DLLs, it is important to properly frame the
individual tests with function calls to get timing information. To add new
DLLs, you can use the placement of _ftime (&begin) and _ftime (&end) calls in
an existing native DLL as a model. To verify the accuracy of the times
reported in RESULT.OUT, examine the details of each test recorded in
BENCH.OUT. The results of individual tests should be as expected and
consistent with tests using other native APIs and ODBC. For example, if you
specify a row count of 2500, verify that the log shows that the program
populated the tables with 2500 rows and that subsequent tests use 2500 row
tables. A simple oversight such as running the populate test twice in
sequence, without a deletion test, will populate the table with double the
expected number of rows and skew the data from subsequent tests. Other factors
may influence results. By setting NativeSQL=Yes in BENCHMRK.INI, you can
enable logic in the ODBC benchmark DLL to use native SQL by foregoing the use
of ODBC escape clauses. 
Although the goal of these benchmark programs is to provide the best
"apples-to-apples" comparison between ODBC and other APIs, in some areas,
comparisons are inherently problematic. For instance, Microsoft SQL Server and
Sybase use a Tabular Data Stream (TDS) protocol to send data to and from the server.
TDS uses a variable packet size with a default of 512 bytes. ODBC defines a
connection option (set with SQLSetConnectOption) to vary the packet size for
Microsoft's SQL Server driver. Intersolv lets you modify the packet size by using a switch
in ODBC.INI. To test with different packet sizes, revise the Sybase CT-Library
benchmark code to negotiate packet size with the server. 
When running these tests using Intersolv DataDirect drivers, ODBC execution
times were often less than native times, sometimes dramatically so. Two of the
reasons appear to be the skill level of the driver developer and the types of
performance optimizations--enabled by a switch in the ODBC.INI file--available
with each new generation of ODBC drivers.
My testing of one Ingres query produced results that provided an excellent
example of the effect of the .INI settings. If I used recommended driver
settings, the ODBC code executed in about half of the time required for my
native code. The driver was a prerelease version from Intersolv, so I
contacted the company for information about the optimizations available
through various ODBC.INI settings. The driver developer described settings
that affected cache size and other optimizations. With that explanation, I
could disable those optimizations and produce ODBC test times virtually
identical to my native tests. However, most people are likely to read the
documentation and use the recommended switch settings, so the fairest
apples-to-apples comparison is the one that enables the driver's performance features.


Building and Operating APIBench


To implement new native API tests, you create a DLL containing the native code
and update the benchmark program's initialization file (see BENCHMRK.INI,
Listing Two). To update BENCHMRK.INI, add the new API to the list of DLLs
(API_DLLS) and create a section for it that names the new DLL. 
The C source code for the executable and DLLs (available electronically; see
"Availability" on page 3) has been compiled and tested using Microsoft Visual
C++. The programs that do not use ESQL are not preprocessed. They contain
conditionals, so you can build them using Borland C++. For example, the source
uses an #ifdef block because Borland uses a timeb structure, whereas the
Microsoft compiler uses a _timeb structure. I prefer C++ over C, but C still
enjoys an advantage when porting to multiple platforms. ODBC is available with
Windows, Windows NT, Windows 95, OS/2, Macintosh System 7, and various flavors
of UNIX such as AIX and Solaris. The MAKE files and IDE files that accompany
this software are for the Windows version only.
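The #ifdef mentioned above might look like the following sketch. The non-Microsoft branch substitutes a minimal stand-in structure so the example is self-contained; actual Borland or UNIX code would include <sys/timeb.h> and call ftime.

```c
/* Portability sketch: Microsoft's CRT uses _timeb/_ftime, while Borland
   uses timeb/ftime. The stand-in struct in the second branch is for
   illustration only, so the example compiles anywhere. */
#ifdef _MSC_VER
#include <sys/timeb.h>
typedef struct _timeb bench_time;
#else
typedef struct { long time; unsigned short millitm; } bench_time;
#endif

/* Elapsed milliseconds between the begin/end timestamps that frame a test. */
long elapsed_ms(const bench_time *begin, const bench_time *end)
{
    return (long)(end->time - begin->time) * 1000L
         + (end->millitm - begin->millitm);
}
```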
The benchmark application uses a simple form interface, as in Figure 2. To run
the benchmarks, select the individual tests with a check box, an API push
button (native or ODBC), a data source, and finally, the Run command button.
The application also presents buttons to create tables, clear prior
selections, and exit the application. Because network latency and database
checkpointing can affect the execution time for an individual run, you will
get more reliable data by executing multiple runs and calculating mean
execution times. The benchmark form includes a control that permits you to
enter the number of times to run the selected tests. My APIBench tests with
four native APIs have shown that there is no performance degradation when
using ODBC. 
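Computing the mean over repeated runs is simple; a minimal helper might look like this (the names are illustrative, not from APIBench itself):

```c
/* Average of several run times: network latency and checkpointing perturb
   any single run, so report the mean of the whole series. */
double mean_time(const double *runs, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += runs[i];
    return n > 0 ? sum / n : 0.0;
}
```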
Figure 1: SQL API benchmark program (APIBench) and dynamic link libraries.
Figure 2: The APIBench form to select SQL APIs and benchmark.

Listing One
Native Sybase
Date and time: 7-6-1995 15:19:52.21
 TDS Packet Size: 512
INSERT INTO EMPLOYEE
 (FIRST_NAME, MID_NAME, LAST_NAME, EMP_ID, DEPT_ID, SALARY, HIRE_DATE)
 VALUES (?,?,?,?,?,?,?)
Prepare & (Execute x 2500) 98.15
CT_Library: row select (prepare and execute) 
SELECT *
 FROM EMPLOYEE
 WHERE EMP_ID = ?
<doRowSelect> ct_dynamic success following ct_param
<doRowSelect> ct_send success following ct_param
Prepare & (Execute * 1) 153.46
==============================
ODBC System10 07
Date and time: 7-6-1995 15:31:09.93 Driver ODBC Version: 02.10 
 Native SQL: Yes
SELECT *
 FROM EMPLOYEE
 WHERE EMP_ID = ?
Prepare & (Execute * 2500) 264.53
==============================

Listing Two
[API]
API_DLLS = INFORMIX,ORACLE7,Sybase,INGRES
[INFORMIX]
Description="Informix"
DLL="BENCHINF.DLL"
LOGIN="test"
PWD="test"
RowCount=2500
[ODBC]
RowCount=2500
[ORACLE7]
Description="Oracle 7"
DLL="BENCHOR7.DLL"
LOGIN="test1@x:training"
PWD="test"
RowCount=2500
[INGRES]
Description="Ingres"

DLL="BENCHING.DLL"
RowCount=2500
[Sybase]
Description="Sybase"
DLL="BENCHSYB.DLL"
RowCount=2500
End Listings























































Partitioning Applications in Smalltalk


Separating client and server objects




Jay Almarode


Jay has been programming in Smalltalk since 1986 and is currently a senior
software engineer at GemStone Systems. He can be contacted at
almarode@slc.com.


Smalltalk has historically been used as a client-side-only development
language because its initial commercial implementations were single-user
systems primarily known for their GUI-building technology. For the most part,
multiuser, distributed Smalltalk systems have only been commercially available
for a couple of years. Such systems operate on server-class machines and take
advantage of shared memory, asynchronous I/O, and transaction logging to
provide the throughput required for multiuser, enterprise-wide database
applications.
When Smalltalk is used as a client-only language, the application and
business-logic objects, as well as the presentation-logic objects, must all
reside on the client machine. To share these objects and make them persistent,
the typical client Smalltalk application connects to a legacy or relational
data server, which stores the state of the objects. This architecture has both
technical and business drawbacks. The data servers typically do not have the
capability to execute complex business logic. They may provide some query
capability or stored procedures, but they do not provide an object model with
a computationally complete, extensible language such as Smalltalk.
Consequently, much data must be transferred to the Smalltalk client to execute
the application logic. As the number of client workstations in an
enterprise-wide application increases, the network becomes overloaded. As
applications require more data to execute complex business logic, the client
machine needs more memory and processing power. 
The business logic is a shared resource and a strategic asset to the company.
Duplicating business logic across thousands of clients poses security risks,
makes maintenance expensive, and discourages frequent updates to the
application. It should be under centralized control.


Server-Based Smalltalk 


Two new technologies are emerging to address these needs: three-tier
architecture and server-based Smalltalk. Three-tier architecture evolved from
the client/server architecture. It defines a middle tier (called an
"application server") between multiple client machines and a data server. The
application server is where shared business logic is executed. The three-tier
architecture reduces the amount of data transferred between the client and the
application server, since business logic can be executed on the application
server rather than by the client. The application server also provides a
central point of control to update business logic, implement security
mechanisms, and provide fault tolerance of key data. 
Server-based Smalltalk provides the implementation technology to build object
application servers. Server-based Smalltalk is a multiuser environment with an
execution engine tuned for disk access to handle many large-sized objects. In
addition, server-based Smalltalk provides the following:
A model of transactions.
Concurrency control. Smalltalk can lock objects or collections of objects, and
can provide a semantic-based concurrency control for higher throughput.
Fault tolerance (backup and recovery).
Security mechanisms.
With a Smalltalk environment optimized for multiuser execution, you can
implement shared business objects with the same object-oriented language used
to build client applications. This makes partitioning the application easier,
since a developer can build an application entirely on the client workstation,
then move portions of it to the server as needed. Because the same code can
execute on either the client or the server, it is easier to change
partitioning decisions to tune the application. With a common object model on
the client and the application server, objects do not need to be transformed
from one form to another. Relational data is mapped into objects less
frequently in the object application server (where only a single network
connection to the relational database is required) than on the client.
When partitioning an application, you must determine whether objects should
reside on the client or the server. Candidates for server objects include
security-sensitive objects, large collections of objects requiring optimized
query capability, objects requiring shared access or fault tolerance, business
objects, and gateway objects (ones that provide a view of raw data on the data
server). Client-side objects include window or GUI, application-specific, and
view objects that provide a view of a server object. 


An Example Application 


To illustrate partitioning, consider an application for a lending institution
that must determine risk of loan default. For simplicity, the application will
evaluate only loan applicants who have previously borrowed from the lender. In
this case, the lender considers the applicant's loan history and other
customers with similar assets and liabilities.
In an unpartitioned application, the Smalltalk client relies upon a
relational-database server to store the shared customer records. Once all the
applicant's information is entered, an operation is invoked to determine if
the applicant qualifies for the loan. The Smalltalk code implements the
business logic as follows: 
1. Receive the applicant's social-security number, name, and requested loan
amount as input. 
2. Compose an SQL query string to determine if the applicant is a previous
customer, then transmit the query to the relational server. 
3. Receive the query result back as nested arrays of basic data types (a
tabular representation of data in Smalltalk). Map the raw data into a new
instance of class Applicant, with additional information from the query
result, such as address, phone number, and so on. 
4. If the applicant is a previous customer, compose another SQL query string
that retrieves the applicant's previous records. (A clever SQL programmer can
get this information bundled with the first SQL query.)
5. Receive the query result back as nested arrays of basic data types, and map
the raw data into instances of class LoanHistory.
6. Compose an SQL query string that retrieves other customers with similar
assets, liabilities, and loan amounts. 
7. Receive the query results back and map to instances of class Customer. 
8. Invoke the analysis code that determines if this applicant is a bad risk.
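The mapping in steps 3, 5, and 7--raw nested arrays into client objects--looks roughly like this Python sketch; the class, field names, and column order are illustrative stand-ins, not the article's actual schema:

```python
class Applicant:
    """Client-side object built from one row of a relational query result.
    Field names and column order are illustrative, not the real schema."""
    def __init__(self, ssn, name, address, phone):
        self.ssn = ssn
        self.name = name
        self.address = address
        self.phone = phone

def map_rows_to_applicants(rows):
    # The query result arrives as nested arrays of basic data types
    # (a tabular representation); each inner array becomes one object.
    return [Applicant(*row) for row in rows]

rows = [[123456789, "Pat Doe", "12 Elm St", "555-0100"]]
applicants = map_rows_to_applicants(rows)
```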
This architecture has a number of drawbacks. Every time the application is
run, tabular relational data must be transformed into objects. If the
relational schema is modified, every client Smalltalk application must be
updated with new transformation code. To execute the analysis algorithm, all
customer data must be transmitted to the client. If many client workstations
transmit large amounts of data for loan applicants, the network may become
overloaded. Populating the client with a large number of customers to execute
the analysis algorithm can stress the memory and CPU capacity of the client
machine. Moving the customer data to the client to perform the analysis poses
a security risk for sensitive data. Finally, the analysis algorithm is
duplicated on every client machine, requiring all clients to be updated if the
algorithm changes. 
To overcome these drawbacks, I redesigned the application for a three-tier
architecture and partitioned the application. I wrote the server code (see
Listing One) in GemStone Smalltalk. The client code is written for either
VisualWorks, Visual Smalltalk, or VisualAge using the GemStone-Smalltalk
Interface; see Listing Two.
The input data (the applicant's name, social-security number, and requested
loan amount) belong on the client, as well as the window, form, and widget
objects used to prompt and display this data. Since large collections of
objects, or objects requiring security or fault tolerance, belong on the
server, the set of customers should reside there also. Likewise, business
objects and objects requiring shared access belong on the server; therefore,
the object(s) that implement the risk-analysis algorithm belong there.
Finally, objects that present a view of server objects belong on the client
machine. The applicant object is a view of a server object, because not all of
the applicant's state is needed in the client (for example, the applicant's
loan history remains on the server, but his address is displayed in a client
window). 


Partitioning Mechanisms 


Once the application is partitioned, the client can reference and manipulate
server objects in Smalltalk using either forwarders or replicates. 
A forwarder can be thought of as a cover for a server object masquerading as a
client object. A forwarder does not contain any state of the server object,
and when a message is sent to a forwarder, its behavior is executed on the
server. The Smalltalk message-sending mechanism allows forwarding of messages
automatically (by special handling of the doesNotUnderstand: error), so no
special code is required to check for the presence of forwarder objects. When
a message is sent to a forwarder, arguments are transformed automatically for
execution on the server. There are a number of ways you can get a forwarder to
a server object. By default, the return value of a message sent
to a server object is a replicate. In GemStone Smalltalk, you can specify that
a forwarder be returned instead by prepending the message with fw. Another way
is to send the message beForwarder to a replicate. If you want all instances
of a particular client class to be forwarders, you can implement the class
method instancesAreForwarders to return true. 
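Outside Smalltalk, the same trick can be approximated. In this Python sketch, `__getattr__` plays the role GemStone assigns to doesNotUnderstand:, shipping any message to the server; the transport callable is purely hypothetical, not a GemStone API:

```python
class Forwarder:
    """Stateless stand-in for a server object: every message sent to it is
    executed on the server. The 'send_to_server' transport is hypothetical."""
    def __init__(self, server_oid, send_to_server):
        self._oid = server_oid
        self._send = send_to_server

    def __getattr__(self, selector):
        # Any unknown attribute is treated as a message selector and
        # forwarded, much as doesNotUnderstand: forwards in Smalltalk.
        def remote_call(*args):
            return self._send(self._oid, selector, args)
        return remote_call

# Toy "server" that just records what was invoked:
log = []
def fake_server(oid, selector, args):
    log.append((oid, selector, args))
    return "ok"

fw = Forwarder(42, fake_server)
result = fw.findCustomerWithSSN(123456789)
```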
A replicate is a copy of a server object that resides on the client. When a
message is sent to a replicate, its behavior is executed locally. The
GemStone-Smalltalk Interface transparency mechanism keeps the state of the two
objects in sync so that the replicate always accurately reflects the state of
the server object (based upon the current transaction's viewpoint) and vice
versa. To enable this feature in either VisualWorks or VisualAge, you can send
the message makeGSTransparent to the class of the replicate. When the client
application modifies a replicate, it is automatically marked "dirty" and
changes are flushed to the server object at an appropriate time (before server
behavior is executed, for example, or when the transaction is committed). When
other users modify and commit changes to the server object, the replicate in a
client is not updated until the client's own transaction is committed or
aborted; this update is called "faulting." Ordinarily, the replicate will not be faulted until it
is next accessed. However, it is possible to configure it to be faulted
immediately when the transaction begins by implementing a faultPolicy method
for the class of the replicate. It is also possible to execute additional
application code before or after the replicate is faulted by implementing a
preFault or postFault method. Listings One and Two illustrate how to set an
immediate fault policy and to trigger the Smalltalk-dependency mechanism after
a replicate is faulted. This might be useful if the application wanted a
window displaying the replicate to be updated immediately when a new
transaction began. 
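The dirty-marking half of this transparency mechanism can be sketched in a few lines of Python; the flush transport is hypothetical, and faulting fresh state in from the server is omitted:

```python
class Replicate:
    """Local copy of a server object that marks itself dirty when modified,
    so changes can be flushed before server behavior runs or at commit time.
    The 'send' transport is hypothetical, not the GemStone interface."""
    def __init__(self, **state):
        object.__setattr__(self, "_dirty", False)
        object.__setattr__(self, "_state", dict(state))

    def __getattr__(self, name):
        state = object.__getattribute__(self, "_state")
        if name in state:
            return state[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Any modification marks the replicate dirty.
        self._state[name] = value
        object.__setattr__(self, "_dirty", True)

    def flush_to_server(self, send):
        # Called before server execution or when the transaction commits.
        if self._dirty:
            send(dict(self._state))
            object.__setattr__(self, "_dirty", False)

r = Replicate(name="Pat", ssn=123)
r.name = "Pat Doe"          # replicate is now dirty
sent = []
r.flush_to_server(sent.append)
```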
An important consideration when programming with replicates is controlling the
replication of "composite objects," or objects with nested subobjects. This is
useful because a client application may only need a portion of the state of a
server object for a particular application. An application needs to control
which instance variables are retrieved and how they are assigned (with
replicates or forwarders). If an instance variable is to be assigned a
replicate, the application may also want to specify how many levels deep to
replicate. To exercise this control in GemStone Smalltalk, you implement the
method replicationSpec for the class of the replicate in the client. This
method returns nested arrays, each of which indicates the name of the instance
variable and how it is to be replicated. Again, Listing Two provides examples
of this method, where the name and social-security number of an applicant are
always faulted in, the address is faulted in to a minimum of two levels, and
the employer is always faulted in as a forwarder (the employer object remains
on the server). 

When not all of a deeply nested object is faulted into the client, a
placeholder object must take its place. This object, called a "stub,"
maintains knowledge of its corresponding object on the server so that it may
replicate the object if necessary. Thus, when a stub is sent a message, it
retrieves the object from the server, replaces all references to the stub with
the retrieved object, then resends the message. The application code does not
have to test for the presence of a stub object--in GemStone Smalltalk, this
all happens transparently. Conversely, sometimes it is desirable to turn a
replicate into a stub to free up the space used by the replicate and its
subobjects. You can do this by sending the message stubYourself to a
replicate. 
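A stub is essentially a lazy proxy, and its fetch-once-then-delegate behavior can be sketched as follows. The loader callable is hypothetical, and unlike a real GemStone stub this sketch does not replace references to itself with the retrieved object:

```python
class Stub:
    """Placeholder for an unfaulted server object: the first message
    retrieves the real object, and every message then delegates to it.
    'fetch' is a hypothetical loader."""
    def __init__(self, server_oid, fetch):
        self._oid = server_oid
        self._fetch = fetch
        self._real = None

    def __getattr__(self, name):
        if self._real is None:            # first message: fault the object in
            self._real = self._fetch(self._oid)
        return getattr(self._real, name)  # then resend the message

class Address:
    def __init__(self):
        self.city = "Columbus"

calls = []
def fetch(oid):
    calls.append(oid)
    return Address()

stub = Stub(7, fetch)
city = stub.city   # triggers the fetch, then delegates
```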


Conclusion


These mechanisms can be used in a number of ways to partition and then
fine-tune an application for maximum performance in a client/server
environment. The advent of server-based Smalltalk allows this partitioning and
provides new solutions to building high-performance applications in Smalltalk.

Listing One
*******************************************************
*******************************************************
* On the GemStone (server) side 
*******************************************************
*******************************************************
!--------------------------------------------------------------
! This module consists of the class definitions for the object
! application server implemented in GemStone Smalltalk.
!--------------------------------------------------------------
! begin by defining the classes
run
Object subclass: 'Address'
 instVarNames: #(street city zip)
 classVars: #()
 poolDictionaries: #()
 inDictionary: UserGlobals
 constraints: #[ #[#street, String], #[#city, String], #[#zip, Integer] ]
 isInvariant: false.
%
run
Object subclass: 'Company'
 instVarNames: #(name address)
 classVars: #()
 poolDictionaries: #()
 inDictionary: UserGlobals
 constraints: #[ #[#name, String], #[#address, Address] ]
 isInvariant: false.
%
run
Object subclass: 'LoanHistory'
 instVarNames: #(amount interestRate date status)
 classVars: #()
 poolDictionaries: #()
 inDictionary: UserGlobals
 constraints: #[
 #[#amount, Integer],
 #[#interestRate, Float],
 #[#date, DateTime],
 #[#status, Symbol] ]
 isInvariant: false.
%
run
Set subclass: 'LoanHistorySet'
 instVarNames: #()
 classVars: #()
 poolDictionaries: #()
 inDictionary: UserGlobals
 constraints: LoanHistory
 isInvariant: false.
%

run
Object subclass: 'Customer'
 instVarNames: #(name ssn address employer loanHistory)
 classVars: #()
 poolDictionaries: #()
 inDictionary: UserGlobals
 constraints: #[
 #[#name, String],
 #[#ssn, Integer],
 #[#address, Address],
 #[#employer, Company],
 #[#loanHistory, LoanHistorySet] ]
 isInvariant: false.
%
run
Set subclass: 'CustomerSet'
 instVarNames: #()
 classVars: #()
 poolDictionaries: #()
 inDictionary: UserGlobals
 constraints: Customer
 isInvariant: false.
%
! automatically generate methods to access the instance variables
run
#[ Address, Company, LoanHistory, Customer ] do: [ :aClass |
 aClass compileAccessingMethodsFor: aClass.instVarNames
]
%
! implement various methods that execute on the server
category: 'Accessing'
method: Customer
getLoanHistory
" Return the receiver's loan history set. If one does not exist,
create it. "
loanHistory isNil
 ifTrue: [ loanHistory := LoanHistorySet new ].
^ loanHistory
%
category: 'Updating'
method: Customer
addLoanHistory: aLoanHistory
" Add the given loan history object to the reciever's loan history set. "
self getLoanHistory add: aLoanHistory
%
category: 'Qualification'
method: CustomerSet
currentInterestRate
" Return the current interest rate for loans. "
" A fixed rate for purposes of this example "
^ 0.12
%
category: 'Searching'
method: CustomerSet
findCustomerWithSSN: ssn
" Query the receiver to find a customer with the given social security
number. "
" Note: this query syntax allows the use of indexes and fast
lookup mechanisms for large collections "

^ self detect: { :cust | cust.ssn = ssn } ifNone: [ nil ]
%
category: 'Qualification'
method: CustomerSet
qualifyBasedOnPastHistory: customer amount: anAmount
" A previous customer is applying for a new loan. Determine whether
he/she is a bad risk based upon their past loan history and other
previous customers with similar characteristics. Return a symbol
indicating the status of the loan request. "
" (This algorithm is left as an exercise for the motivated reader) "
 | history |
history := LoanHistory new
 amount: anAmount;
 interestRate: self currentInterestRate;
 date: DateTime now;
 status: #accepted.
customer addLoanHistory: history.
^ #accepted
%
category: 'Qualification'
method: CustomerSet
qualifyNewApplicant: applicant amount: inputLoanRequest
" A new customer is applying for a loan. Determine whether
he/she is a bad risk based upon assets and liabilities.
Return a symbol indicating the status of the loan request. "
" (This algorithm is left as an exercise for the motivated reader) "
 | history |
history := LoanHistory new
 amount: inputLoanRequest;
 interestRate: self currentInterestRate;
 date: DateTime now;
 status: #accepted.
applicant addLoanHistory: history.
^ #accepted
%
! Initialize the set of all customers
run
UserGlobals at: #AllCustomers put: CustomerSet new
%
commit

Listing Two
*******************************************************
*******************************************************
* On the GemStone-Smalltalk Interface (client) side 
*******************************************************
*******************************************************
!--------------------------------------------------------------
! This module consists of the class definitions for the client
! application implemented in VisualWorks
!--------------------------------------------------------------
!
Object subclass: #Applicant
 instanceVariableNames: 'name ssn address employer requestedAmount status '
 classVariableNames: ''
 poolDictionaries: ''
 category: 'nil'!
!Applicant methodsFor: 'replication'!
faultPolicy

 "Cause a replicate to be refreshed immediately 
 when a new transaction begins."
 ^#immediate!
postFault
 "The receiver has just been faulted in from 
 GemStone. Inform any dependents."
 ^self changed: #faulted!
replicationSpec
 "Return a specification of how instance variables
 should be faulted in."
 ^ super replicationSpec , 
 #(
 (name replicate)
 (ssn replicate)
 (address min 2)
 (employer forwarder)
 )! !
"-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- "!
Applicant class
 instanceVariableNames: ''!
!Applicant class methodsFor: 'initialization'!
initialize
" Connect this class with a GemStone class named 'Customer'. "
GSI addGlobalConnector: 
 (GSClassConnector stName: #Applicant gsName: #Customer)
! !
!Applicant class methodsFor: 'analysis'!
qualify: inputName ssn: inputSSN amount: inputLoanRequest
" Create a new applicant for a loan and qualify them. Return the new
applicant. "
 | allCustomers applicant status |
" get a forwarder to the large collection of all customers that
 resides on the server "
allCustomers := (GSI fwat: #AllCustomers).
" send a message to find if the applicant is a customer "
" applicant will be instantiated in the client according to the
 replication spec "
applicant := allCustomers findCustomerWithSSN: inputSSN.
applicant isNil
 ifTrue: [ " applicant is a new customer "
 applicant := self new ssn: inputSSN; name: inputName.
 " execute method on server to qualify a first-time applicant "
 status := allCustomers qualifyNewApplicant: applicant amount: 
 inputLoanRequest.
 " if new applicant was accepted, add them to the set of all customers "
 status = #accepted
 ifTrue: [ allCustomers fwadd: applicant ].
 ]
 ifFalse: [ " applicant has borrowed money from us before "
 " execute method on server to qualify an existing customer "
 status := allCustomers qualifyBasedOnPastHistory: applicant amount: 
 inputLoanRequest.
 ].
applicant requestedAmount: inputLoanRequest; status: status.
^ applicant! !
End Listings



































































A Client/Server DBMS for Managing Clinical Data


Moving from a relational to a network data model




Richard A. Gams


Richard, an oncologist on the faculty of the Ohio State University, is
director of developmental therapeutics for the OSU Comprehensive Cancer
Center. He can be reached at gams.1@osu.edu.


As a physician who cares for cancer patients, I have been conducting clinical
trials of anticancer drugs for almost 30 years. During these trials, a group
of investigators must agree on a uniform therapy strategy for a large number
of patients. For example, if a patient's blood count is lowered as a side
effect of the treatment, some investigators might want to delay the next
treatment for a week or two; others might want to treat as scheduled, but with
a lower dose. We must agree to do one or the other, and this agreement, along
with all the other details of the treatment, is formalized in a document
called a "protocol."
Reading a complex protocol is time consuming and often confusing. Achieving
protocol compliance is a major obstacle to a successful clinical trial. One
approach we have taken has been to develop rule-based expert systems that
incorporate the rules of the protocol. These systems are written in Prolog
and/or C++ and run under various Microsoft Windows platforms. When a patient
is being considered for a course of therapy, the system is "consulted,"
resulting in proper adherence to protocol.


Summary of Clinical-Data Problem


The results of all therapy are recorded. This record serves as the basis for
further treatment decisions as well as for analyzing the results of the
therapy for both the individual patient and the entire study population. We
can compare the results to determine which treatment has the greatest efficacy
with the least toxicity (and of course, cost).
This data is naturally hierarchical. At each level, one data item may be
viewed as owning or being related to several subsidiary data items. For
example, a clinical study has a number of patients enrolled, and each patient
might receive multiple cycles or courses of treatment. In each treatment
cycle, drugs may be administered, vital signs measured, laboratory tests
obtained, and toxicities experienced. Of the universe of possible
measurements, we must create an adequate subset that shows the therapy to be
safe and effective.
Some observations lend themselves to logical grouping. For instance, a
complete blood count (CBC) consisting of hemoglobin, hematocrit, white-blood
count, and the like, would be one record type. Other data such as the
patient's age, sex, and visits (number, date) would require a separate record
type (table). There are many patients involved in large studies. One study
included 1500 patients treated according to two treatment plans at over 50
separate hospitals worldwide. We must find efficient ways to handle this
copious and diverse information.


Relational Data Model


Many (if not most) large database systems employ the relational model, in
which the data are viewed as a series of tables, much like individual
spreadsheets. Each column of the table represents an attribute (such as the
patient's age), and each row represents one record with a value (or NULL) for
each attribute. Relationships are represented by shared columns in two or more
tables. Often, one table's value is considered the primary key, while the
identical value in another table is the foreign key. When data from both
tables are requested, the tables are "joined" through these keys, as shown in
Figure 1. The SQL specification used to generate this database is shown in
Listing One. However, there is far more to the relational model than these
simple concepts.
The primary and foreign keys in each table need to be indexed for acceptable
join performance. For example, to see the visit-three hemoglobin for each
patient in the study, we need to join the patient, visit, and CBC tables
through the primary and foreign keys. If these keys were not indexed, the
query engine would have to walk through entire tables over and over again to
see whether a visit record was associated with a particular patient and
whether a CBC record "belonged" to that visit. Indexes expedite the process of
identifying related records.
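A toy version of this three-table join can be run against SQLite; the table and column names below are illustrative, not those of Listing One:

```python
import sqlite3

# Miniature patient/visit/CBC schema; names are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE patient (pat_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visit   (visit_id INTEGER PRIMARY KEY, pat_id INTEGER,
                      visit_no INTEGER);
CREATE TABLE cbc     (cbc_id INTEGER PRIMARY KEY, visit_id INTEGER,
                      hemoglobin REAL);
-- Index the foreign keys so the joins need not scan entire tables.
CREATE INDEX visit_pat ON visit(pat_id);
CREATE INDEX cbc_visit ON cbc(visit_id);
""")
con.execute("INSERT INTO patient VALUES (1, 'Doe')")
con.executemany("INSERT INTO visit VALUES (?,?,?)",
                [(10, 1, 1), (11, 1, 2), (12, 1, 3)])
con.executemany("INSERT INTO cbc VALUES (?,?,?)",
                [(100, 10, 13.1), (101, 11, 12.4), (102, 12, 11.9)])

# Visit-three hemoglobin for each patient, joined through the keys:
rows = con.execute("""
    SELECT p.name, c.hemoglobin
    FROM patient p
    JOIN visit v ON v.pat_id = p.pat_id
    JOIN cbc   c ON c.visit_id = v.visit_id
    WHERE v.visit_no = 3
""").fetchall()
```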
For some time, we've felt that the relational model is inappropriate for large
volumes of data of many different types (clinical information, for example).
Such data require multiple tables (each representing a different data group)
and indexes for all primary and foreign keys. Performance is hampered by the
two-stage lookup for relational links: First, we look up the index to find a
key value, then we navigate to the actual record in the data table and read
the information. Furthermore, primary and foreign keys are repeated in each
table and index in which they are referenced, creating data redundancy.


Network/Hierarchical Data Model


An alternative is the network model. In this model, records still represent
each data type or group, but related records are connected by direct pointers
(database addresses). One record is considered the owner, and the other
record, the member. Each record may own or be owned by many types of records,
providing a natural way to express one-to-many (hierarchical) and many-to-many
(network) relationships. 
Indexes are optional in this model and are often needed only for rapid
navigation to the primary-key value in a particular record type. For example,
we may wish to find an individual patient from among all patients in a study.
Once the patient is located, other data may be found by direct navigation
through sets. Each set consists of an owner and one or more member record
types. Such a model results in data being "hashed" into logical groups. Linear
traversal through such a group is quite efficient. The sample database
organized in such a model is shown in Figure 2. Listing Two is the data
definition language (DDL) file for this schema.
We have found network/hierarchical databases especially suitable for managing
clinical-trial data. There is little need for indexes, since navigation is
accomplished through direct links from an owner to members in sets.
Performance is enhanced by direct pointer lookup, resulting in improved access
speed. Such pointers allow speedy navigation from owner to members and vice
versa. Furthermore, data is stored only once: There is no need for redundant
storage as primary/foreign keys or duplicating data in indexes.
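In an in-memory sketch, the network model's direct pointers become ordinary object references; the record classes below are illustrative, not RDM's DDL:

```python
class Patient:
    """Owner record: members are reached through direct references (the
    in-memory analogue of database addresses), so no key lookup or index
    is needed to navigate a set."""
    def __init__(self, name):
        self.name = name
        self.visits = []      # owner -> member pointers

class Visit:
    def __init__(self, owner, visit_no):
        self.visit_no = visit_no
        self.owner = owner    # member -> owner back-pointer
        owner.visits.append(self)
        self.cbcs = []

class CBC:
    def __init__(self, visit, hemoglobin):
        self.hemoglobin = hemoglobin
        self.visit = visit
        visit.cbcs.append(self)

p = Patient("Doe")
for n, h in [(1, 13.1), (2, 12.4), (3, 11.9)]:
    CBC(Visit(p, n), h)

# Direct navigation: locate the patient once, then follow pointers.
hgb = [c.hemoglobin for v in p.visits if v.visit_no == 3 for c in v.cbcs]
```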


Raima Data Manager 


We use the Raima Data Manager (RDM) to implement the network model. RDM is a
database engine for C programmers that includes a manager, SQL-based query
system, and database restructure program. The tool provides relational B-tree
indexing, network-database model, multiple-database access, built-in
referential integrity, record- and file-locking functions, automatic recovery,
and a relational query and report writer. Although RDM has been reliable and
efficient, we recognize a number of drawbacks. While available on multiple
platforms, RDM is implemented solely as a peer-to-peer system. Furthermore, it
isn't scalable in the sense of providing tools for very large data sets or
distributed databases.
On the benefit side, RDM has a very usable API with a C/C++ interface. We
often require record-by-record data access, and RDM provides this somewhat
more readily than do SQL cursors. Furthermore, the RDM API has been
encapsulated in the Raima Object Manager (ROM), a C++ interface that provides
object persistence and object-relationship management. ROM lets you
encapsulate object storage and database navigation into C++ class definitions.
Unfortunately, RDM provides limited query tools. Raima has put a "relational"
face on its network database with db_Query, a tool that provides SQL access to
RDM. However, it is only a limited SQL subset and doesn't interface with
standards such as ODBC.


Velocis


We're also using Raima's Velocis, a database engine for building client/server
applications for DOS, Windows, Windows NT, OS/2, NetWare, and UNIX. The engine
is compliant with the 1989 ANSI SQL standard and provides an ODBC Level 1
interface with several Level 2 features. It has a number of unique features
including the ability to extend the database with C/C++ "extension modules"
that permit portions of the program to reside on the server rather than on the
client. This is important to us, since it is essential that everyone work from
the same protocol rules. Protocols are frequently revised, and, although some
rules may be expressed as database record values, others must be coded. With
extension modules, we can keep the protocol logic in one place rather than
having to upgrade all the clients. (In the usual database context, this
permits business rules to reside in one place rather than be distributed
throughout the organization.) We have implemented Velocis on a Windows NT
server using 16-bit Windows for Workgroups clients, but we will soon move to
32-bit Windows 95/NT clients. In addition, Velocis is available as a
stand-alone Windows 3.x system. 
Velocis uses the same data-file structure as RDM, so we can continue using
existing RDM data files. Additionally, Velocis may be accessed through ROM (or
the lower-level RDM API), so we can use existing RDM programs with a simple
recompile. Velocis can also be accessed through an ODBC interface, so we can
use a variety of query tools and report generators. We may modify the database
using ROM and query it using an ODBC-compliant tool.
An interesting feature of Velocis is the implementation of virtual foreign
keys. Instead of joining tables through traditional relational links using
primary and foreign keys, joins can optionally be implemented as pointers (as
in the network model). This would eliminate the need for indexes on foreign
keys, which would be accessed directly through pointers. Data would not be
stored for foreign keys, but instead retrieved from the primary table through
pointers. This could yield the space and speed advantage of RDM while
providing an ANSI-compliant SQL interface. Listing Three presents the modified
SQL specification employing virtual foreign keys. The CREATE INDEX statements
for the visit, CBC, and CHEM tables have been replaced by CREATE JOIN
statements.
To test this feature, I generated two databases with Velocis--the traditional,
relational model in Listing One and the model using virtual foreign keys in
Listing Three. Using ODBC, I then inserted 10,000 patient records into each
database. The patient numbers were taken sequentially from an array of 10,000
integers that had been shuffled to produce a pseudorandom distribution of
values. I then inserted ten visits into every hundredth patient, again in
random sequence. Each visit was associated with a CBC and a CHEM record.
Finally, I performed the query shown in Example 1(a), which joined all the
tables for each of the patients who had associated visit, CBC, and CHEM
records. The question mark at the end of the query is a placeholder for a
value supplied at run time, permitting the program to loop through the
patients and perform the query on each. All of these tests were performed
using the stand-alone version of Velocis. Programs were compiled as QuickWin
applications in Microsoft Visual C++ Version 1.52 on a Windows NT 3.51
workstation. Table 1 shows the results of these tests.
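The shape of this benchmark can be reproduced at small scale in Python, with sqlite3 standing in for the Velocis ODBC connection (the schema is abbreviated, the row counts reduced, and timings will of course differ):

```python
# A scaled-down sketch of the benchmark using sqlite3 in place of the
# Velocis ODBC connection: 1,000 patients instead of 10,000, shuffled for
# a pseudorandom insertion order, with the join query using a "?"
# placeholder bound at run time as in Example 1(a).
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE patient (pnum INTEGER PRIMARY KEY);
    CREATE TABLE visit   (pnum INTEGER, vdate INTEGER);
    CREATE TABLE cbc     (pnum INTEGER, vdate INTEGER, hgb  REAL);
    CREATE TABLE chem    (pnum INTEGER, vdate INTEGER, sgot REAL);
""")

pnums = list(range(1, 1001))
random.shuffle(pnums)                      # pseudorandom insertion order
con.executemany("INSERT INTO patient VALUES (?)", [(p,) for p in pnums])

for p in range(100, 1001, 100):            # ten visits per hundredth patient
    for v in range(10):
        con.execute("INSERT INTO visit VALUES (?, ?)", (p, v))
        con.execute("INSERT INTO cbc   VALUES (?, ?, ?)", (p, v, 14.0))
        con.execute("INSERT INTO chem  VALUES (?, ?, ?)", (p, v, 30.0))

query = """SELECT patient.pnum, visit.vdate, hgb, sgot
           FROM patient, visit, cbc, chem
           WHERE patient.pnum = visit.pnum AND visit.pnum = cbc.pnum
             AND visit.vdate = cbc.vdate  AND visit.pnum = chem.pnum
             AND visit.vdate = chem.vdate AND patient.pnum = ?"""
rows = con.execute(query, (200,)).fetchall()
print(len(rows))                           # 10
```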
The time to insert 10,000 patient records into each of the databases was
approximately the same. The default for Velocis is to place each data record
and each key in its own file. The file containing the patient records was
larger in the database set up to have virtual foreign keys, since space had to
be allotted for the direct links to the foreign tables. I was disappointed
that the visit, CBC, and CHEM record inserts took 25 percent longer (5.5
versus 4.5 minutes) for the virtual-foreign-key insertion. In addition, the
total file size was more than 11 percent larger. I assume these differences
are due to the bookkeeping necessary to maintain the direct links. 

Virtual foreign keys performed much better during queries. The selection of
data from the joined tables took 13 seconds versus 2.5 minutes in the
traditional model. To be certain that the problem was not with the ODBC
interface, I used Raima's interactive query tool to perform the query in
Example 1(b). The query took 52 seconds using the traditional model and six
seconds using virtual foreign keys.


Conclusion


Velocis provides an efficient implementation of the relational model in a
client/server database. A unique feature is that it is based on a network
data structure, permitting pointer access to related records. While this
doesn't reduce the size of the data store and may ultimately hurt bulk
insertion speed, it enables extremely rapid retrieval of joined records. For
situations with a very large number of record types, Velocis provides a fully
SQL-compliant database with efficient query capability. Our long-range plan is
to use the ROM object-oriented interface for programmatic interaction with our
data, but employ ODBC tools for query and report generation.


For More Information


Raima Database Manager 4.0
Raima Object Manager 3.0
Velocis 1.3
Raima Corp.
1605 NW Sammamish Road Suite 200
Issaquah, WA 98027
800-275-4724
Figure 1: Relational table joined by primary and foreign keys; (a) patient
table; (b) visit table; (c) CBC table; (d) CHEM table.
Figure 2: Network schema.
Example 1: (a) Query that joins all the tables for patients who have
associated visit, CBC, and CHEM records; (b) testing virtual foreign keys
using the interactive query tool.
(a)
"SELECT PNUM, VDATE, HGB, SGOT FROM PATIENT, VISIT, CBC, CHEM
WHERE
PATIENT.PNUM = VISIT.PNUM AND VISIT.PNUM = CBC.PNUM
AND VISIT.VDATE = CBC.VDATE AND VISIT.PNUM = CHEM.PNUM
AND VISIT.VDATE = CHEM.VDATE AND PNUM = ?"

(b)
SELECT PNUM, VDATE, HGB FROM PATIENT, VISIT, CBC
WHERE
PATIENT.PNUM = VISIT.PNUM AND VISIT.PNUM = CBC.PNUM
AND VISIT.VDATE = CBC.VDATE
AND PNUM IN (260, 445, 526, 534)
ORDER BY PNUM;
Table 1: Results of various operations using indexed versus virtual foreign keys.
                                       Indexed        Virtual
                                       Foreign Keys   Foreign Keys
Size of patient records (bytes)        688,128        819,200
Size of all data (including indexes)   1,146,880      1,277,952
Time to insert patients (min:sec)      11:56          12:31
Time to insert visits                  04:26          05:23
Time to query (ODBC)                   02:21          00:13
Time to query (interactive)            00:52          00:06

Listing One
create database clin1 on clindev1;
create table patient
(
 pnum smallint primary key
 "patient number",
 inits char(4)
 "initials",
 dob long
 "date of birth",
 filler char(40)
 "just to add length to record"
);

create unique index pnum_key on patient(pnum);
create table visit
(
 vdate long not null
 "visit date",
 pnum smallint not null references patient(pnum),
 primary key(pnum, vdate)
);
create unique index visit_key on visit(pnum, vdate);
create table cbc
(
 pnum smallint not null,
 vdate long not null,
 hgb float "hemoglobin",
 hct float "hematocrit",
 wbc float "white blood count",
 filler char(20),
 foreign key(pnum, vdate) references visit(pnum, vdate)
);
create index cbc_key on cbc(pnum, vdate);
create table chem
(
 pnum smallint not null,
 vdate long not null,
 sgot float "liver enzyme",
 sgpt float "liver enzyme",
 filler char(20),
 foreign key(pnum, vdate) references visit(pnum, vdate)
);
create index chem_key on chem(pnum, vdate);

Listing Two
 
database clin {
 data file file000_patient = "clin.000" contains patient;
 key file ndx000_pnum_key = "clin.001" contains pnum_key;
 data file file001_visit = "clin.002" contains visit;
 data file file002_cbc = "clin.003" contains cbc;
 data file file003_chem = "clin.004" contains chem;
 key file ndx000_key00101_visit = "clin.005" contains key00101_visit;
 record patient {
 short pnum;
 char inits[5];
 long dob;
 char filler[41];
 compound unique key pnum_key {
 pnum asc;
 }
 }
 record visit {
 long vdate;
 short pnum;
 compound unique key key00101_visit {
 pnum asc;
 vdate asc;
 }
 }
 record cbc {
 double hgb;
 double hct;
 double wbc;
 char filler[21];
 }
 record chem {
 double sgot;
 double sgpt;
 char filler[21];
 }
 set visit_set {
 order last;
 owner patient;
 member visit;
 }
 set cbc_set {
 order last;
 owner visit;
 member cbc;
 }
 set chem_set {
 order last;
 owner visit;
 member chem;
 }
}

Listing Three
create database clin2 on clindev2;
create table patient2
(
 pnum2 smallint primary key
 "patient number",
 inits2 char(4)
 "initials",
 dob2 long
 "date of birth",
 filler2 char(40)
 "just to add length to record"
);
create unique index pnum2_key on patient2(pnum2);
create table visit2
(
 vdate2 long not null
 "visit2 date",
 pnum2 smallint not null references patient2(pnum2),
 primary key(pnum2, vdate2)
);
create join visits on visit2(pnum2);
create table cbc2
(
 pnum2 smallint not null,
 vdate2 long not null,
 hgb2 float "hemoglobin",
 hct2 float "hematocrit",
 wbc2 float "white blood count",
 filler2 char(20),
 foreign key(pnum2, vdate2) references visit2(pnum2, vdate2)
);
create join cbc on cbc2(pnum2, vdate2);

create table chem2
(
 pnum2 smallint not null,
 vdate2 long not null,
 sgot2 float "liver enzyme",
 sgpt2 float "liver enzyme",
 filler2 char(20),
 foreign key(pnum2, vdate2) references visit2(pnum2, vdate2)
);
create join chems on chem2(pnum2, vdate2);
End Listings




















































Programming with M


Extending Visual Basic for high-performance transaction processing




Dan Shusman


Dan, an engineer with InterSystems, previously worked at the Laboratory of
Computer Science at Massachusetts General Hospital, the birthplace of the M
language.


As the modern-day successor to MUMPS, M is an ANSI-standard
application-development environment incorporating a programming and
data-manipulation language and an integrated database. M has traditionally been used
to complement SQL by providing an extra dimension of data-access control that
can prove especially useful in select, high-volume, transaction-processing
applications. M's direct-access approach to data lets you exert greater
control over performance issues, thereby letting you manage update concurrency
at a finer level of granularity. For example, M database structures can be
designed to support field-level locking and the ability to frame a transaction
(TSTART ... TCOMMIT) to include only those database structures involved in
processing the current transaction. You can also control the physical
proximity of related data, such as grouping header and line-item data together
in processing invoice or order data. 
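The granularity idea can be sketched in Python (a loose analogy, not M syntax; the per-node lock table is a hypothetical stand-in for M's LOCK command and transaction framing):

```python
# A loose Python analogy (not M syntax) for field-level lock granularity:
# one lock per individual data node rather than per table, so transactions
# touching different fields or different records never serialize on a
# single table lock. The node_locks table is a hypothetical stand-in for
# M's LOCK command; the `with` block plays the role of the TSTART ...
# TCOMMIT frame around only the structures the transaction touches.
import threading
from collections import defaultdict

node_locks = defaultdict(threading.Lock)   # one lock per (record, field)
db = {}                                    # stands in for a global array

def update_field(record, field, value):
    with node_locks[(record, field)]:      # lock only the node being changed
        db[(record, field)] = value

update_field("invoice^1001", "balance", 250.00)
update_field("invoice^1001", "status", "open")
print(db[("invoice^1001", "balance")])     # 250.0
```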
Conversely, SQL abstracts data structures from the programmer, so that the
function of database administration (DBA) is more centralized. By optimizing
the physical data model, the database administrator strives to optimize
performance across a breadth of programs. This philosophical center-of-control
difference is one key reason why M is more commonly used by application
developers than by end-user organizations and why M applications exhibit very
good performance, operational, and cost-of-ownership characteristics. Whether
an application is deployed by a developer or an end-user organization, performance is,
to a large degree, a function of direct program control and M system
processes--not DBA activity. DBA functions still exist but are not dominated
by database-tuning activities.


M: Past and Present


Starting in the mid-1970s, MUMPS gained widespread use in health-care
applications. M evolved from the need for mainframe-like
transaction-processing applications for autonomous departmental computing. M
is an ANSI/FIPS/ISO/JIS (Japan Industrial Standard) applications environment
and is accepted by the ANSI SQL Standards committee as a host language for
embedded SQL (ESQL). M integrates a programming language and database into a
single application-development environment. It is common to access the same
data with both SQL and M. Indeed, M and SQL data accesses can even be mixed
within the same programs. As a result of the integrated programming/database
standard, M-based programs are highly portable across operating platforms. M
is most commonly used on UNIX and Windows, but it can also be found on DOS,
VMS, and other platforms. At the same time, M is an environment particularly
suited to high-volume, online transaction-processing applications.
M database structures are implemented as highly optimized B-tree structures on
disk. To the application programmer, the database presents itself as
user-designed, multidimensional arrays ("globals" in M parlance). All database
designs (hierarchical, network, relational, and so on) are supported, and the
design is under the control of the application developer. The ANSI/MDC
X11.1-1990 specification of the M language component describes 21 commands, 17
intrinsic functions, and 7 intrinsic variables and reserves a "namespace" for
vendor-specific extensions to the ANSI specifications. (The next M ANSI
standard specification is now in the formal acceptance stage.) Additionally,
there is a full complement of unary, arithmetic binary, numeric- and
string-relational, pattern-matching, and concatenation operators. The ANSI
standard also describes extrinsic (user-authored) functions and variables. M
data typing is variant. A type is applied based on the context in which the
data is referenced. Both private routine and database structures are
dynamically allocated, supporting both scalar and array structures. All
structures are of variable length. Arrays support numeric and string
subscripting, and allocation of these structures is sparse--only those nodes
defined by application logic exist. 
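The sparse-global model can be sketched in Python, with a dictionary of subscript tuples standing in for a global and a hypothetical order() helper approximating $ORDER:

```python
# A sketch of M globals as sparse, subscripted arrays: a dictionary keyed
# by subscript tuples stands in for a global such as ^people, only nodes
# set by application logic actually exist, and a hypothetical order()
# helper approximates $ORDER by walking sibling subscripts in collating
# sequence.
g = {}
g[("owners", 260)] = "SMITH^19550412^000-00-0000"
g[("owners", 445)] = "JONES^19601101^000-00-0000"
g[("by_name", "JONES", 445)] = ""          # sparse index node, no data stored
g[("by_name", "SMITH", 260)] = ""

def order(prefix, after):
    """Return the next sibling subscript after `after` under `prefix`."""
    subs = sorted(k[len(prefix)] for k in g
                  if len(k) > len(prefix) and k[:len(prefix)] == prefix)
    for s in subs:
        if s > after:
            return s
    return ""                              # $ORDER returns "" when exhausted

print(order(("owners",), 0))               # 260
```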
A complete description of M is beyond the scope of this article. However,
Table 1 presents a summary of ANSI M's operators, commands, and intrinsic
functions and variables. (You can obtain a complete copy of the ANSI
specification by contacting the M Technology Association.) 
M is implemented by a variety of commercial vendors as well as academia. Many
facilities are associated with each implementation of M, including networking
(two- and three-tier client/server and peer-to-peer), system-management
facilities (bulletproof database, concurrent read/write incremental backup),
application libraries and generators, debuggers, relational/SQL
implementations describing M databases (supporting such APIs as the ODBC), and
a host of evolving technologies (object orientation, for instance).


M Benefits


The major benefit of M for the programmer is increased direct control over
data structures. In M, you can define data in terms of multidimensional arrays
that map to an intended transaction process. It is easy to represent data in a
hierarchical, network, or relational model. Because M provides direct control
over logical data structures, a "flat relational structure" can be
implemented. In any case, the M relational data dictionary provides a
superimposed relational view of this same data. In this regard, M is identical
to most relational DBMS products. After describing a developer-designed M data
structure in the Open M relational data dictionary, the application developer
will see in the M environment all characteristics expected of relational DBMS
products.
The relational model is optimized for change. It excels at serving
unanticipated information requirements, such as the ad hoc query. It is ideal
for decision support systems (DSSs), executive information systems (EISs), and
data-warehousing applications. However, the relational model can present
significant challenges to high-volume transaction processing of complex data
structures. M gives you an alternative, one that increases programmer control
over key performance considerations such as concurrency, dependency, and data
proximity. 


Extending Visual Basic


To contrast SQL with M in a complex client/server transaction environment,
consider a banking database containing banking products (checking, loans,
mortgages, safe deposit, savings, credit cards, and investments) and customers
(who individually or jointly may use several of the bank's products). Each
product has multiple accounts. Each account may have multiple owners and many
transactions a month. There are complex relationships between products,
customers, transactions, and accounts. Aggregate transaction rates and data
volumes can be very high, and security and accuracy are critical. 
Representing an integrated banking model as a set of database tables is not
trivial. Achieving acceptable performance can be even more daunting--both for
online processing, where customers conduct transactions at a teller or an ATM,
and for overnight batch runs, where voluminous data must be processed in a
narrow time window. Figure 1 is a metadescription of the tables involved in
the scenarios demonstrating Open M and SQL.
Listing One is a portion of a Visual Basic front-end application. The code
shows the typical method of using SQL statements to retrieve data. Listing Two
is the same program using M. As the user enters characters into a combo box,
customer names starting with those characters populate the combo box. During
the Form_Load event, Listing One retrieves all rows into the combo box. By
setting the Sort property of the combo box to True, Visual Basic automatically
sorts the list of names. SQL provides a perfectly acceptable scripting
mechanism for this, and performance should not be an issue as long as the
number of rows being retrieved is reasonable for the application configuration
(available memory, network bandwidth, server performance, and so on). As the
number of rows grows larger, however, it becomes more difficult to get the
database to load in one shot, and repetitive queries against the server are
required.
Note in Listing Two that having the data structures under the control of the
application developer allows logical partitioning of data to suit the
application. This also permits clustering of tables to any depth by using the
multidimensional-array nature of M globals. The number and performance
of joins are a function of the data structure. The inherent "natural" ASCII
collation of data in Open M globals (for English-speaking markets; other
collation schemes are available for other markets) provides ordering of rows,
and newly inserted data is automatically collated. This also allows for finer
granularity in locking, transaction-processing control, and distribution of
data. This improves database throughput by, for example, allowing data to
serve concurrently as both a primary and index key.
I've taken liberties with the M code in Listing Two to assist those unfamiliar
with M programming. The code is logically and syntactically correct but
deoptimized for comprehensibility. Many commands and programming techniques
that could have been used for this exercise were avoided for this reason.
Table 2 provides a quick description of the M commands used in Listing Two,
while Table 3 describes the data structures (restated in Open M globals). In
this database structure, the numeric and string-collation characteristics of
Open M globals facilitate access to the underlying data through its primary key
or through a foreign key. The structures used here are simple: Each record is
described as a collection of columns whose names are self descriptive. In this
simple structure, data for a record is treated as a string with a caret (^)
delimiting each column. The design of the global structure is analogous to
defining a table in another DBMS environment. Listing Two is actually
implemented in InterSystems' Visual-M, in which the Open M language and
database are directly accessible to Visual Basic. With Visual-M, M and Basic
are colleague languages. VB controls are registered with M and are
programmatically introduced by an underscore character. 
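The caret-delimited record convention can be sketched in Python with a hypothetical piece() helper approximating M's $PIECE:

```python
# A Python stand-in for M's $PIECE, which this article's record convention
# relies on: a record is a single string with "^" delimiting each column,
# and $PIECE extracts the Nth delimited piece (1-indexed, as in M; a piece
# beyond the last delimiter comes back as the empty string).
def piece(s, delim="^", n=1):
    parts = s.split(delim)
    return parts[n - 1] if n <= len(parts) else ""

record = "SMITH^19550412^000-00-0000"      # name^date_of_birth^ssn
print(piece(record, "^", 1))               # SMITH
print(piece(record, "^", 3))               # 000-00-0000
```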


When to Use M


You can access the same data with either M or SQL. M provides both the
"physical layer" and its own "intermediate logical representation layer." M
also efficiently implements SQL tables, which are considered the "logical
layer." Therefore, you are not forced to make a systemic choice between access
alternatives. 
Although, relative to SQL, M provides access that appears closer to the
physical layer, data in M is still accessed at a very high level of
programming--far above the level of disk pages, B-trees, and the like. M data
is accessed using multidimensional arrays that provide associative memory and
flexible, convenient data organization, where memory and disk allocation are
dynamic and automatic.
SQL's strengths clearly lie in data analysis, decision support, and ad hoc
queries. For very complex data structures and demanding transaction-processing
requirements (high volumes, large databases, or critical response times), M's
level of control provides generally better performance. 
Many programming requirements will lie between the two polarities for which
SQL and M are each optimized. In this middle ground, a wide variety of factors
affect the choice.
Programmer preference is one obvious factor--experienced SQL programmers may
prefer to stick to what they know best. On the other hand, data complexity and
performance demands may lead you to choose M over SQL as the data-manipulation
language. Because they are interchangeable, both M and SQL can be tried, and
empirical evidence (user-specific benchmarks) can guide the final decision.
For many organizations, the interoperability and portability of M's
development environment are significant. Standard SQL is available from
numerous vendor sources, many of whom emphasize their own proprietary
client/server program development environment to build SQL systems. In
contrast, client/server architecture is embedded within the ANSI M language
itself, which avoids proprietary programming requirements. Additionally,
certain M vendors have embraced emerging de facto client/server standards,
especially Visual Basic, as key development platforms. M extensions to Visual
Basic can add a high-performance, transaction-processing dimension to this
widely used GUI system-building tool. The growing market for commercially
available Visual Basic custom controls adds acquired component software to the
program-development arsenal.


Conclusion



M's historical focus on transaction processing with complex data in
distributed implementations has led to the development of several key
characteristics. First, the programmer is given more direct control of data
structures in which to optimize performance. Second, data can be organized to
optimally suit program-performance considerations. And, third, requirements
for systems and database administration, a key attribute in distributed
systems, are minimized compared to traditional DBMS systems.


For More Information


M Technology Association
1738 Elton Road, Suite 205
Silver Spring, MD 20903-1725
301-431-4070
MTA1994@aol.com
Table 1: Summary of ANSI M: (a) selected ANSI M commands; (b) selected M
intrinsic functions; (c) selected intrinsic variables; (d) operators.
Command Description
(a)
Break Programmatically interrupt routine execution.
Close Deallocate and release ownership of a device
 (terminal, file, tape, and so on).
Do Initiate execution of a program, subroutine, or
 extrinsic function.
Else Introduce lines of code to execute if IF condition
 is not true.
For Initiate looping.
Goto Generalize transfer of control.
Halt Terminate process.
Hang Suspend for a specified period.
If Introduce condition which, if true, causes rest of
 line to be run.
Job Programmatically spawn another M process.
Kill Remove data structure (scalar or array).
Lock Programmatically interlock M structures among M
 processes.
New Initiate variable scoping.
Open Obtain ownership of a device.
Quit Terminate execution of routines, subroutines, and
 For loops; unscope variables scoped by New, and
 so on.
Read Take input from the current device (terminal, tape,
 file).
Set Assign value to variable.
Use Specify a device as the current device.
View Peek into memory.
Write Direct output to the current device.
Xecute Execute the results of expression evaluation as M
 code.

(b)
$Ascii Return the integer ASCII value of the character
 argument.
$Character Return the ASCII character of the integer argument.
$Data Return integer values denoting the argument's state
 of definition.
$Extract Positionally assign or extract a substring from a
 string.
$Find Locate the position of a substring within a string.
$FNumber Apply format mask to data.
$Get Return NULL or data depending on argument's
 existence.
$Justify Apply justification mask to argument.
$Length Return character length of argument.
$Order Return the next subscript value at the current
 subscript level of the argument.
$Piece Extract or assign a delimited substring from/to a
 string.
$Query Return the next full local- or global-array
 reference.
$Random Return a random number.
$Select Return the evaluated expression of the first of
 many conditions that evaluates to true.
$Text Return the source code of a program at the reference
 that is the argument of the function.
$TRanslate Translate positionally one character for another in
 a string.
$View Peek into memory.

(c)
$Horolog System date and time.
$Io Current device.
$Job Current process identifier.
$Storage Amount of private process memory available to the
 M process.
$Test Contains truth value of last executed condition and
 other operations.
$X & $Y Return the current x- and y-coordinates on a
 cursor-positionable device with some restrictions.

(d)
Arithmetic +, -, /, \ (integer divide), # (modulo), *, **
 (exponentiation)
Unary +, -, '(not)
Relational =, >, <, ] (follows), [ (contains), ]] (sorts after)
Pattern
 Match ?<argument>
Logical & (and), ! (or)
Concatenation _ (underscore)
Table 2: M commands used in Listing Two.
Command Function
Semicolon (;) Introduces a comment.
SET M's assignment command.
$$upper Extrinsic function to convert characters to upper
 case.
$LENGTH M intrinsic function returning string length.
FOR Looping structure.
$ORDER Traverses a specified branch of the B-tree
 presented logically as the M global structure
 until programmatically terminated.
QUIT Terminates the For loop and the DO command.
DO Calls an external routine, executes a subroutine
 or extrinsic function, and executes a
 "structured" block of code introduced by one or
 more periods denoting the nesting level of the
 block.
$PIECE Intrinsic function which assists in extracting or
 assigning data from/to a specified delimited
 substring of the argument string. In this
 example, a caret is used as the delimiter: The
 first ^ piece of ^people("owners",pkey) is
 "name."
$GET Intrinsic function that tests for the existence
 of the argument and returns either the data
 (exists), if true, or NULL (""), if not.
$EXTRACT Extracts characters from a string from position
 N through optional position M.
$ASCII Returns the ASCII value of a character position
 in the argument string.
$CHARACTER Returns the character for the ASCII value.
Table 3: Data structures (restated in Open M globals). (a) People 'Table'; (b)
Account 'Table'; (c) Service_Type 'Table'.
(a)
^people("owners",pkey)=name^date_of_birth^social_security_number
^people("by_name",<uppercase_name>,pkey)=<null> (index)
^people("owners",pkey,"accts",account_number)=<null> (index)

(b)
^accounts(account_number)=service_type^balance^open_date^close_date
^accounts(account_number,"owners",pkey)=<null>

(c)
^srvtypes(type_number)=type_name
Figure 1: Metadescription of the tables used in a typical banking application.
Tables: People
 Fields:
 pkey (number,10,unique)
 name (text,25)
 date_of_birth (date,8)
 social_security_number (text,11)
 Primary Key:
 pkey
 Index: name (uppercase transformation applied)
 Accounts
 Fields:
 account_number (numeric, 10, unique)
 service_type (foreign key -> Service_Types)
 balance (float,10,2)
 open_date (date,8)
 close_date (date,8)
 Primary Key:
 account_number
 Acct_Owners
 Fields:
 account_number (foreign key -> Accounts)
 owner_id (foreign key -> People)
 Primary Key:
 account_number, owner_id
 Index: owner_id
 Service_Types
 Fields:
 type_number (number, 10, unique)
 type_name (text, 25)
 Primary Key:
 type_number

Listing One
<Begin Combo1 KeyPress Event>
' ============================================================
' Bind to database via DSN in data control
' ============================================================
server$="ODBC;DSN=Server;UID=Username;PWD=Password"

Data1.Connect = server$
' ============================================================
' Set up query to find all rows where name 'starts with' character 
' input by user perhaps under the KeyPress event for Combo1
' ============================================================
query$="Select Acct_Owners.Owner_ID, Acct_Owners.Account_Number,"
query$=query$ & " People.Name, Service_Types.Type_Name"
query$=query$ & " From Acct_Owners, People, Accounts, Service_Types"
query$=query$ & " Where People.Name Like '" & ucase$(combo1.text) & "%'"
query$=query$ & " And People.Pkey = Acct_Owners.Owner_Id"
query$=query$ & " And Accounts.Account_Number = Acct_Owners.Account_Number"
query$=query$ & " And Service_Types.Type_Number = Accounts.Service_Type"
query$=query$ & " Order by 3, 2"
' ============================================================
' Go to server to get rows
' ============================================================
Data1.Recordsource=query$
Data1.Refresh
' ============================================================
' Populate Combo Box
' ============================================================
Combo1.Clear
Do while not Data1.Recordset.EOF
 name$=Data1.Recordset.Name
 account=Data1.Recordset.Account
 srvtype$=Data1.RecordSet.Type_Name
 pkey=Data1.Recordset.Primary_Key
 Combo1.Additem name$ & " " & account & " " & srvtype$
 Combo1.ItemData(Combo1.NewIndex)= pkey
 Data1.Recordset.MoveNext
Loop
' ============================================================
' If the text area of the combo box is touched with new input, reexecute query
' and do loop to recover rows from the database which satisfy the new input.
' If the end-user has double-clicked an item; issue new query to recover all 
' elements of row selected for presentation/manipulation by end-user
' ============================================================
<End Combo1 KeyPress Event>
<Begin Combo1 Double-Click Event>
' ============================================================
' detect which item in the combo box was selected
' ============================================================
selected=Combo1.ListIndex
' ============================================================
' get the row id in the people table for the item selected
' ============================================================
pkey=Combo1.ItemData(selected)
' ============================================================
' recover the info in the combo box to reuse it. piece is a user-written Basic
' module which returns the portion of a string (1st arg) found between the 
' instances (specified by the 3rd and 4th args) of the delimiter-string 
' (2nd arg)
' ============================================================
x$=Combo1.List(selected)
name$=piece(x," ",1,1)
account=piece(x," ",2,2)
srvtype$=piece(x," ",3,3)
' ============================================================
' issue a query to recover all information to be displayed on the form
' ============================================================
query$="Select People.Date_of_Birth, People.Social_Security_Number,"
query$=query$ & " Accounts.Open_Date, Accounts.Close_Date, Accounts.Balance"
query$=query$ & " From People, Accounts"
query$=query$ & " Where People.Pkey = " & pkey
query$=query$ & " And Accounts.Account_Number = " & account
Data1.RecordSource=query$
Data1.Refresh
' ============================================================
' assign recovered information to appropriate controls on the form
' ============================================================
Pname.Text=name$
Pdob.Text=Data1.RecordSet.Date_of_Birth
Pssn.Text=Data1.RecordSet.Social_Security_Number
Pacct_num.Text=account
Pacct_odate.Text= Data1.RecordSet.Open_Date
Pacct_cdate.Text= Data1.RecordSet.Close_Date
Pacct_balance.Text=Data1.RecordSet.Balance
Pacct_srv_type.Text=srvtype$
<End Combo1 Double-Click Event>

Listing Two
; ============================================================
; get user input from combo box perhaps under the KeyPress event, transform
; to uppercase and get Length 
; $$upper is an extrinsic function which performs the work of UCASE$
; in Listing 1
; ============================================================
set input=$$upper(_combo1.text)
set inlng=$length(input)
; ============================================================
; initialize scanning variable to start at the string collating just before 
; user input, e.g., if combo1.text="smith", then input="SMITH" and 
; name="SMITG~" after the following code executes. It sets up to perform 
; the function of the LIKE operator in Listing One's SQL SELECT
; ============================================================
set name=input
set name=$extract(name,1,inlng-1)_$character($ascii(name,inlng)-1)_"~"
; ============================================================
; find all names in the data structure which start with characters input. 
; In Direct Database Access, this code actually does the SELECT, WHERE,
; and ORDER BY work described in Listing One by using Open M's ascii 
; collation, easily clustered tables, and $order function
; ============================================================
for set name=$order(^people("by_name",name)) 
 quit:$extract(name,1,inlng)'=input set pkey="" do
. For set pkey=$order(^people("by_name",name,pkey)) quit:pkey="" 
 set account="" do
. . for set account=$order(^people("by_name",name,pkey,"accts",account)) 
 quit:account="" do
. . . set pname=$piece(^people("owners",pkey),"^")
. . . set acctype=$piece($get(^accounts(account)),"^")
. . . set srvtype=^srvtypes(acctype)
. . . ; ============================================================
. . . ; The listbox portion of the combo box will hold people's names, account
. . . ; numbers, and service type. The DO command is used to execute the
. . . ; Additem method to place these rows in the combo box.
. . . ; The innermost FOR loop (above) corresponds in a sense to the DO WHILE
. . . ; loop in Listing One.
. . . ; ============================================================
. . . do _combo1.additem(pname_" "_account_" "_srvtype)
. . . set _combo1.ItemData(_combo1.newindex)=pkey
. . . quit
. . quit
. Quit
quit
; ============================================================
; As a reminder, the above code is placed in the KeyPress event of the combo box.
; If the text area receives new input, the data-fetch and processing loop
; executes again to recover the rows from the database that satisfy the new input.
; This processing logic is that of VB and is independent of the
; scripting/database language: BASIC/SQL vs. M
; ============================================================
<End of the KeyPress Event>
<Begin the Double-Click Event>
; ============================================================
; The following Visual-M code is placed under double-click event of combo box.
; If the end-user has double-clicked an item, a new query is issued to recover
; and display all elements of the row selected.
; First, detect which item in the combo box was selected just as in 
; Listing One but using Visual-M 
; ============================================================
set selected=_combo1.listindex
; ============================================================
; Now, get the row id from ItemData of the row selected. Again, using 
; Visual-M
; ============================================================
set pkey=_Combo1.ItemData(selected)
; ============================================================
; recover the info in the combo box to reuse it. This code uses the 
; $PIECE function emulated in VB by the 'piece' module
; ============================================================
set x=_Combo1.List(selected)
set name=$piece(x," ")
set account=$piece(x," ",2)
set srvtype=$piece(x," ",3)
; ============================================================
; issue M code to recover all information to be displayed on the form and
; display on the form.
; $$date is an extrinsic function to convert internal date format to 
; external format
; $$number is an extrinsic function to convert numbers to a desired 
; external format
; These might be defined in an Open M Include file and would 
; take arguments to define the output format.
; ============================================================
set _Pname.Text=name
; ============================================================
; Open M Direct Database Access is used to recover rows from the database 
; for the people 'table' and the account 'table'. The logic is to recover a 
; row into a private process variable (here 'x') and then assign the data to
; the appropriate VB control. This code snippet and the one below combine the 
; function of the SELECT query in the Double-Click event of the combo box and 
; the assignments of the resulting data to the VB controls.
; ============================================================
set x=$get(^people("owners",pkey))
set _Pdob.Text=$$date($piece(x,"^",2),"dd/mm/yy")
set _Pssn.Text=$piece(x,"^",3)

set _Pacct_num.Text=account
; ============================================================
; recover 'account' row into local M variable and assign to VB controls.
; ============================================================
set x=$get(^accounts(account))
set _Pacct_odate.Text=$$date($piece(x,"^",3),"dd/mm/yy")
set _Pacct_cdate.Text=$$date($piece(x,"^",4),"dd/mm/yy")
set _Pacct_balance.Text=$$number($piece(x,"^",2),"$n.n")
set _Pacct_srvtype.Text=srvtype
quit
<End Combo Box Double-Click Event>
End Listings
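Two M idioms do the heavy lifting in the listing above: $PIECE for pulling delimited fields out of a row, and the decrement-the-last-character-and-append-"~" seed that turns $ORDER into a prefix scan. A minimal Python sketch of both (my own illustration, not part of the listings; the function names are mine):

```python
import bisect

def piece(s, delim, n=1):
    """Emulate M's $PIECE: return the n-th delimited field (1-indexed),
    or "" if there are fewer than n fields."""
    parts = s.split(delim)
    return parts[n - 1] if n <= len(parts) else ""

def prefix_scan(sorted_keys, prefix):
    """Emulate the $ORDER seeding trick: start scanning just before the
    first key that could match `prefix`, and stop when keys no longer
    start with it."""
    # Seed: decrement the last character and append a high-collating "~",
    # exactly as the listing builds name="SMITG~" from input "SMITH".
    seed = prefix[:-1] + chr(ord(prefix[-1]) - 1) + "~"
    i = bisect.bisect_right(sorted_keys, seed)  # first key after the seed
    out = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        out.append(sorted_keys[i])
        i += 1
    return out
```

For example, scanning ["JONES", "SMITH", "SMITHERS", "SMYTHE"] with prefix "SMITH" seeds the scan at "SMITG~" and collects "SMITH" and "SMITHERS", mirroring the SELECT...LIKE behavior of Listing One.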



















































RAMBLINGS IN REAL TIME


Inside Quake: Visible-Surface Determination




Michael Abrash


Michael is the author of Zen of Graphics Programming and Zen of Code
Optimization. He is currently pushing the envelope of real-time 3-D on Quake
at id Software. He can be contacted at mikeab@idsoftware.com.


Years ago, I was working at Video Seven, a now-vanished video-adapter
manufacturer, helping to develop a VGA clone. The fellow designing Video
Seven's VGA chip, Tom Wilson, had worked around the clock for months to make
his VGA chip run as fast as possible and was confident he had pretty much
maxed out its performance. As Tom was putting the finishing touches on his
chip design, however, news came fourth-hand that a competitor, Paradise, had
juiced up the performance of the clone they were developing by putting in a
FIFO.
That was all; there was no information about what sort of FIFO, or how much it
helped, or anything else. Nonetheless, Tom, normally an affable, laid-back
sort, took on the wide-awake, haunted look of a man with too much caffeine in
him and no answers to show for it, as he tried to figure out, from hopelessly
thin information, what Paradise had done. Finally, he concluded that Paradise
must have put a write FIFO between the system bus and the VGA so that when the
CPU wrote to video memory, the write immediately went into the FIFO, allowing
the CPU to keep on processing instead of stalling each time it wrote to
display memory.
Tom couldn't spare the gates or the time to do a full FIFO, but he could
implement a one-deep FIFO, allowing the CPU to get one write ahead of the VGA.
He wasn't sure how well it would work, but it was all he could do, so he put
it in and taped out the chip.
The one-deep FIFO turned out to work astonishingly well; for a time, Video
Seven's VGAs were the fastest around, a testament to Tom's ingenuity and
creativity under pressure. However, the truly remarkable part of this story is
that Paradise's FIFO design turned out to bear not the slightest resemblance
to Tom's, and didn't work as well. Paradise had stuck a read FIFO between
display memory and the video-output stage of the VGA, allowing the video
output to read ahead, so that when the CPU wanted to access display memory,
pixels could come from the FIFO while the CPU was serviced immediately. That
did indeed help performance--but not as much as Tom's write FIFO.
What we have here is as neat a parable about the nature of creative design as
you could hope to find. The news about Paradise's chip contained almost no
actual information, but it forced Tom to push past the limits he had
unconsciously set in his original design. And, in the end, I think that the
single most important element of great design, whether in hardware or software
or any creative endeavor, is precisely what the Paradise news triggered in
Tom: the ability to detect the limits you have built into the way you think
about your design, and transcend those limits.
The problem, of course, is how to go about transcending limits you don't even
know you've imposed. There's no formula for success, but two principles can
stand you in good stead: Simplify, and keep on trying new things.
Generally, if you find your code getting more complex, you're fine-tuning a
frozen design, and it's likely you can get more of a speed-up, with less code,
by rethinking the design. A really good design should bring with it a moment
of immense satisfaction in which everything falls into place and you're amazed
at how little code is needed and how all the boundary cases just work
properly.
As for how to rethink the design, do it by pursuing whatever ideas occur to
you, no matter how off-the-wall they seem. Many of the truly brilliant design
ideas I've heard over the years sounded like nonsense at first, because they
didn't fit my preconceived views. In fact, such ideas are often off-the-wall,
but just as the news about Paradise's chip sparked Tom's imagination,
aggressively pursuing seemingly outlandish ideas can open up new design
possibilities for you.
Case in point: The evolution of Quake's 3-D graphics engine.


The Toughest 3-D Challenge of All


I've spent most of my waking hours for the last seven months working on Quake,
id Software's successor to DOOM, and after spending the next three months in
much the same way, I expect Quake will be out as shareware around the time you
read this.
In terms of graphics, Quake is to DOOM as DOOM was to its predecessor,
Wolfenstein 3D. Quake adds true, arbitrary 3-D (you can look up and down,
lean, and even fall on your side), detailed lighting and shadows, and 3-D
monsters and players in place of DOOM's sprites. Sometime soon, I'll talk
about how all that works, but this month, I want to talk about what is, in my
book, the toughest 3-D problem of all--visible-surface determination and
culling. Visible-surface determination essentially involves drawing the proper
surface at each pixel, while culling involves discarding nonvisible polygons
as quickly as possible to accelerate visible-surface determination. In the
interest of brevity, I'll use "VSD" to mean both visible-surface determination
and culling from now on.
Why do I think VSD is the toughest 3-D challenge? Although rasterization
issues such as texture mapping are fascinating and important, they are tasks
of relatively finite scope and are being moved into hardware as 3-D
accelerators appear. Also, they only scale with increases in screen
resolution, which are relatively modest.
In contrast, VSD is an open-ended problem, and dozens of approaches are
currently in use. Even more significantly, the performance of an
unsophisticated implementation of VSD scales directly with scene complexity,
which tends to increase as a square or cube function, so this very rapidly
becomes the limiting factor in creating realistic worlds. I expect VSD
increasingly to be the dominant issue in real-time PC 3-D over the next few
years, as 3-D worlds become increasingly detailed. Already, a good-sized Quake
level contains on the order of 10,000 polygons, about three times as many
polygons as a comparable DOOM level.


The Structure of Quake Levels


Before diving into VSD, note that each Quake level is stored as a single, huge
3-D BSP tree. This BSP tree, like any BSP, subdivides space, in this case,
along the planes of the polygons. However, unlike the BSP tree I presented
previously, Quake's BSP tree does not store polygons in the tree nodes, as
part of the splitting planes, but rather in the empty (nonsolid) leaves, that
appear unshaded in Figure 1.
Correct drawing order can be obtained by drawing the leaves in front-to-back
or back-to-front BSP order, again as discussed in my previous column. Also,
because BSP leaves are always convex and the polygons are on the boundaries of
the BSP leaves, facing inward, the polygons in a given leaf can never obscure
one another and can be drawn in any order. (This is a general property of
convex polyhedra.)


Culling and Visible Surface Determination


The process of VSD would ideally work as follows: First, you cull all polygons
that are completely outside the view frustum (view pyramid) and clip away the
irrelevant portions of any polygons that are partially outside. Then, you draw
only those pixels of each polygon that are actually visible from the current
viewpoint, as in Figure 2, wasting no time overdrawing pixels multiple times
(note how little of the polygon set in Figure 2 actually needs to be drawn).
Finally, in a perfect world, the tests to figure out what parts of which
polygons are visible are free, and the processing time is the same for all
possible viewpoints, giving the game a smooth visual flow.
It is easy to determine which polygons are outside the frustum or partially
clipped, and it's quite possible to figure out precisely which pixels need to
be drawn. Alas, the world is far from perfect, and those tests are far from
free, so the real trick is to accelerate or skip various tests and still
produce the desired result.
As I discussed last time, given a BSP, it's easy and inexpensive to walk the
world in front-to-back or back-to-front order. The simplest VSD solution,
which I demonstrated last time, is to simply walk the tree back-to-front, clip
each polygon to the frustum, and draw it if it's facing forward and not
entirely clipped (the painter's algorithm). Is that an adequate solution?
For relatively simple worlds, it is perfectly acceptable. It doesn't scale
very well, though. As you add more polygons to the world, more transformations
and tests have to be performed to cull polygons that aren't visible; at some
point, that will bog down performance considerably.
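As a concrete toy illustration of the painter's algorithm over a BSP, here is a minimal Python sketch; the node layout and plane convention are my assumptions, not Quake's actual structures:

```python
# A minimal painter's-algorithm walk over a BSP tree: visit back to front
# relative to the viewpoint, drawing everything, so nearer polygons simply
# overwrite farther ones.

class Node:
    def __init__(self, plane, front, back):
        self.plane = plane            # (normal, d): plane is dot(normal, x) == d
        self.front, self.back = front, back

class Leaf:
    def __init__(self, polygons):
        self.polygons = polygons

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def walk_back_to_front(node, eye, draw):
    if isinstance(node, Leaf):
        for poly in node.polygons:    # a leaf is convex: any order is fine
            draw(poly)
        return
    normal, d = node.plane
    if dot(normal, eye) >= d:         # eye on front side: draw back subtree first
        walk_back_to_front(node.back, eye, draw)
        walk_back_to_front(node.front, eye, draw)
    else:
        walk_back_to_front(node.front, eye, draw)
        walk_back_to_front(node.back, eye, draw)
```

Because nearer polygons are drawn later and overwrite whatever is behind them, the output is correct, but every polygon in the frustum gets rasterized, which is exactly the overdraw problem discussed below.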
Happily, there's a good workaround for this particular problem. Remember that
each leaf of a BSP tree represents a convex subspace, with the nodes that
bound the leaf delimiting the space. Perhaps less obvious is that each node in
a BSP tree also describes a subspace--the subspace composed of all the node's
children; see Figure 3. Another way of thinking of this is that each node
divides the subspace created by the nodes above it in the tree, and the node's
children then further carve that subspace into all the leaves that descend
from the node.
Since a node's subspace is bounded and convex, it is possible to test whether
it is entirely outside the frustum. If it is, all of the node's children are
certain to be fully clipped and can be rejected without additional processing.
Since most of the world is typically outside the frustum, many of the polygons
in the world can be culled almost for free, in huge, node-subspace chunks.
It's relatively expensive to perform a perfect test for subspace clipping.
Therefore, bounding spheres or boxes are often maintained for each node,
specifically for culling tests.
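A conservative bounding-sphere test of that kind might look like the following sketch (the plane convention and data layout are my assumptions):

```python
# Conservative node culling with a bounding sphere. The frustum is a list of
# inward-facing planes (normal, d); a point p is inside a plane when
# dot(normal, p) >= d, and normals are assumed to be unit length.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sphere_outside_frustum(center, radius, planes):
    """True only when the sphere is provably outside the frustum: entirely
    behind some plane. A False result means only 'maybe visible'."""
    for normal, d in planes:
        if dot(normal, center) < d - radius:
            return True
    return False
```

The test is deliberately one-sided: a True result lets an entire node subspace, and every leaf under it, be rejected at the cost of a few dot products, while a False result just falls through to finer-grained work.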
So culling to the frustum isn't a problem, and the BSP can be used to draw
back to front. What is the problem?


Overdraw


The problem John Carmack, the driving technical force behind DOOM and Quake,
faced when he designed Quake was that in a complex world, many scenes have an
awful lot of polygons in the frustum. Most of those polygons are partially or
entirely obscured by other polygons, but the painter's algorithm described
earlier requires that every pixel of every polygon in the frustum be drawn,
often only to be overdrawn. In a 10,000-polygon Quake level, it would be easy
to get a worst-case overdraw level of ten times or more; that is, in some
frames each pixel could be drawn ten times or more, on average. No rasterizer
is fast enough to compensate for an order of magnitude more work than is
actually necessary to show a scene; worse still, the painter's algorithm will
cause a vast difference between best-case and worst-case performance, so the
frame rate can vary wildly as the viewer moves around.

So the problem John faced was how to keep overdraw down to a manageable level,
preferably drawing each pixel exactly once, but certainly no more than two or
three times in the worst case. As with frustum culling, it would be ideal to
eliminate all invisible polygons in the frustum with virtually no work. It
would also be a plus to draw only the visible parts of partially visible
polygons. This balancing act had to be a lower-cost operation than the
overdraw that would otherwise result.
When I arrived at id in March 1995, John already had an engine prototyped and
a plan in mind, and I assumed that our work was a simple matter of finishing
and optimizing that engine. If I had been aware of id's history, however, I
would have known better. John had done not only DOOM, but also the engines for
Wolfenstein 3D and several earlier games, and had actually done several
different versions of each engine in the course of development (once doing
four engines in four weeks), for a total of perhaps 20 distinct engines over a
four-year period. John's tireless pursuit of new and better designs for
Quake's engine, from every angle he could think of, would end only when we
shipped.
Three months after I arrived, only one element of the original VSD design was
anywhere in sight, and John had taken "try new things" further than I'd ever
seen it taken.


The Beam Tree


John's original Quake design was to draw front to back, using a second BSP
tree to keep track of what parts of the screen were already drawn and which
were still empty and therefore drawable by the remaining polygons. Logically,
you can think of this BSP tree as being a 2-D region describing solid and
empty areas of the screen (see Figure 4), but in fact it is a 3-D "beam tree."
A beam tree is a collection of 3-D wedges (beams), bounded by planes,
projecting out from some center point, in this case the viewpoint, as in
Figure 5.
In John's design, the beam tree started out as a single beam describing the
frustum; everything outside that beam was marked solid (so nothing would draw
there), and the inside of the beam was marked empty. As each new polygon was
reached while walking the world BSP tree front to back, that polygon was
converted to a beam by running planes from its edges through the viewpoint,
and any part of the beam that intersected empty beams in the beam tree was
considered drawable and added to the beam tree as a solid beam. This continued
until either there were no more polygons or the beam tree became entirely
solid. Once the beam tree was complete, the visible portions of the polygons
that had contributed to the beam tree were drawn.
The advantage to working with a 3-D beam tree, rather than a 2-D region, is
that determining which side of a beam plane a polygon vertex is on only
involves checking the sign of the dot product of the ray to the vertex and the
plane normal, because all beam planes run through the origin (the viewpoint).
Also, because a beam plane is completely described by a single normal,
generating a beam from a polygon edge requires only a cross product of the
edge and a ray from the edge to the viewpoint. Finally, bounding spheres of
BSP nodes can be used to do the aforementioned bulk culling to the frustum.
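The two primitives just described, one cross product to make a beam plane and one dot product to classify a vertex, can be sketched as follows (my own minimal version, with an explicit viewpoint rather than assuming it sits at the origin):

```python
# Beam-tree primitives: a beam plane through the viewpoint is fully described
# by a single normal, so classifying a vertex is just the sign of a dot product.

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def beam_plane(edge_start, edge_end, viewpoint):
    """Normal of the plane through the viewpoint that contains the edge:
    the cross product of the edge and a ray from the edge to the viewpoint."""
    edge = sub(edge_end, edge_start)
    to_view = sub(viewpoint, edge_start)
    return cross(edge, to_view)

def vertex_side(vertex, viewpoint, normal):
    """Sign of dot(ray-to-vertex, normal). The plane passes through the
    viewpoint, so no plane-distance term is needed."""
    return dot(sub(vertex, viewpoint), normal)
```

With the viewpoint at the origin, as in the article, `sub(vertex, viewpoint)` degenerates to the vertex itself, and the classification really is a single dot product per vertex.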
The early-out feature of the beam tree--stopping when the beam tree becomes
solid--seems appealing, because it appears to cap worst-case performance.
Unfortunately, there are still scenes where it's possible to see all the way
to the sky or the backwall of the world, so in the worst case, all polygons in
the frustum will still have to be tested against the beam tree. Similar
problems can arise from tiny cracks due to numeric precision limitations.
Beam-tree clipping is fairly time consuming, and in scenes with long view
distances, such as views across the top of a level, the total cost of beam
processing slowed Quake's frame rate to a crawl. In the end, the beam-tree
approach suffered from much the same malady as the painter's algorithm: The
worst case was much worse than the average case, and it didn't scale well with
increasing complexity.


3-D Engine du Jour


Once the beam tree was working, John relentlessly worked at speeding up the
3-D engine, always trying to improve the design, rather than tweaking the
implementation. At least once a week, and often every day, he would walk into
my office and say, "Last night I couldn't get to sleep, so I was thinking..."
and I'd know that I was about to get my mind stretched yet again. John tried
many ways to improve the beam tree, with some success, but more interesting
was the profusion of wildly different approaches that he generated, some of
which were merely discussed, others of which were implemented in overnight or
weekend-long bursts of coding, in both cases ultimately discarded or further
evolved when they turned out not to meet the design criteria well enough. Here
are some of those approaches, presented in minimal detail in the hopes that,
like Tom Wilson with the Paradise FIFO, your imagination will be sparked.
Subdividing raycast. Rays are cast in an 8x8 screen-pixel grid; this is a
highly efficient operation because the first intersection with a surface can
be found by simply clipping the ray into the BSP tree, starting at the
viewpoint, until a solid leaf is reached. If adjacent rays don't hit the same
surface, then a ray is cast halfway between, and so on, until all adjacent
rays either hit the same surface or are on adjacent pixels; then the block
around each ray is drawn from the polygon that was hit. This scales very well,
being limited by the number of pixels, with no overdraw. The problem is
dropouts; it's quite possible for small polygons to fall between rays and
vanish.
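A one-dimensional toy version of this subdivision (my own sketch, not id's implementation) shows both the pixel-limited scaling and the dropout problem:

```python
# Subdividing raycast, reduced to one scanline: sample a "which surface does
# this ray hit" function at block endpoints, and bisect only where neighboring
# samples disagree, until hits match or the samples are on adjacent pixels.

def subdividing_cast(hit, x0, x1, out):
    """hit(x) -> surface id at integer pixel x; fills out[x] for x0..x1."""
    s0, s1 = hit(x0), hit(x1)
    out[x0], out[x1] = s0, s1
    if x1 - x0 <= 1:
        return
    if s0 == s1:
        # Assume the whole block shows one surface. This is exactly where
        # small polygons can fall between rays and drop out.
        for x in range(x0 + 1, x1):
            out[x] = s0
        return
    mid = (x0 + x1) // 2
    subdividing_cast(hit, x0, mid, out)
    subdividing_cast(hit, mid, x1, out)
```

When a tiny surface is visible only between two agreeing samples, the block is filled without ever casting a ray at it, which is the dropout failure mode described above.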
Vertex-free surfaces. The world is represented by a set of surface planes. The
polygons are implicit in the plane intersections, and are extracted from the
planes as a final step before drawing. This makes for fast clipping and a very
small data set (planes are far more compact than polygons), but it's
time-consuming to extract polygons from planes.
Draw-buffer. Like a z-buffer, but with one bit per pixel, indicating whether
the pixel has been drawn yet. This eliminates overdraw, but at the cost of an
inner-loop buffer test, extra writes and cache misses, and, worst of all,
considerable complexity. Variations include testing the draw-buffer a byte at
a time, completely skipping fully occluded bytes, or branching off each
draw-buffer byte to one of 256 unrolled inner loops for drawing 0-8 pixels, in
the process possibly taking advantage of the ability of the x86 to do the
perspective floating-point divide in parallel while eight pixels are
processed.
Span-based drawing. Polygons are rasterized into spans, which are added to a
global span list and clipped against that list so that only the nearest span
at each pixel remains. Little sorting is needed with front-to-back walking,
because if there's any overlap, the span already in the list is nearer. This
eliminates overdraw, but at the cost of a lot of span arithmetic; also, every
polygon still has to be turned into spans.
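One scanline of that span clipping can be sketched in Python (the half-open span representation and list structure are my assumptions):

```python
# Front-to-back span clipping on one scanline. Spans are half-open
# [start, end) pixel ranges. Because polygons arrive front to back, any pixel
# already covered belongs to a nearer span, so a new span keeps only the
# fragments that fall into gaps.

def add_span(spans, start, end, poly):
    """spans: sorted, non-overlapping list of (start, end, poly).
    Insert the visible fragments of [start, end) for polygon `poly`."""
    frags = [(start, end)]
    for s, e, _ in spans:
        next_frags = []
        for a, b in frags:
            if b <= s or a >= e:          # no overlap with this existing span
                next_frags.append((a, b))
            else:                          # clip away the covered middle
                if a < s:
                    next_frags.append((a, s))
                if b > e:
                    next_frags.append((e, b))
        frags = next_frags
    spans.extend((a, b, poly) for a, b in frags if a < b)
    spans.sort()
```

Adding "near" at [10, 20), then "mid" at [5, 15), then "far" at [0, 30) leaves exactly one owner per pixel, with "mid" and "far" trimmed to the uncovered gaps, and that per-span arithmetic is the cost the text mentions.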
Portals. The holes where polygons are missing on surfaces are tracked, because
line-of-sight can extend only through such portals. Drawing goes
front-to-back, and when a portal is encountered, polygons and portals behind
it are clipped to its limits, until no polygons or portals remain visible.
Applied recursively, this allows drawing only the visible portions of visible
polygons, but at the cost of a considerable amount of portal clipping.


Breakthrough


In the end, John decided that the beam tree was a sort of second-order
structure, reflecting information already implicitly contained in the world
BSP tree, so he tackled the problem of extracting visibility information
directly from the world BSP tree. He spent a week on this, as a byproduct
devising a perfect DOOM (2-D) visibility architecture, wherein a single,
linear walk of a DOOM BSP tree produces zero-overdraw 2-D visibility. Doing
the same in 3-D turned out to be a much more complex problem, though, and by
the end of the week John was frustrated by the increasing complexity and
persistent glitches in the visibility code. Although the direct-BSP approach
was getting closer to working, it was taking more and more tweaking, and a
simple, clean design didn't seem to be falling out. When I left work one
Friday, John was preparing to try to get the direct-BSP approach working
properly over the weekend.
When I came in on Monday, John had the look of a man who had broken through to
the other side--and also the look of a man who hadn't had much sleep. He had
worked all weekend on the direct-BSP approach, and had it working reasonably
well, with insights into how to finish it off. At 3:30 a.m. Monday morning, as
he lay in bed, thinking about portals, he thought of precalculating and
storing in each leaf a list of all leaves visible from that leaf, and then at
run time, just drawing the visible leaves back-to-front for whatever leaf the
viewpoint happens to be in, ignoring all other leaves entirely.
Size was a concern; initially, a raw, uncompressed potentially visible set
(PVS) was several megabytes in size. However, the PVS could be stored as a bit
vector, with one bit per leaf, a structure that shrinks a great deal with
simple zero-byte compression. Those steps, along with changing the BSP
heuristic to generate fewer leaves (choosing the polygon that splits the
fewest other polygons as the next splitter is clearly the best heuristic,
based on the latest data, contrary to what I said a few months back) and
sealing the outside of the levels so the BSPer can remove the outside
surfaces, which can never be seen, eventually brought the PVS down to about 20
KB for a good-size level.
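The bit-vector-plus-zero-byte-compression idea can be sketched as follows; the exact byte encoding here is my assumption for illustration, not necessarily Quake's on-disk format:

```python
# A leaf-visibility bit vector with simple zero-byte run-length compression.

def make_pvs(num_leaves, visible_leaves):
    """Pack 'leaf i is visible' flags into bytes, one bit per leaf."""
    bits = bytearray((num_leaves + 7) // 8)
    for leaf in visible_leaves:
        bits[leaf >> 3] |= 1 << (leaf & 7)
    return bytes(bits)

def compress(pvs):
    """Copy nonzero bytes through; encode each run of zero bytes as
    a 0 byte followed by the run length."""
    out, i = bytearray(), 0
    while i < len(pvs):
        if pvs[i]:
            out.append(pvs[i])
            i += 1
        else:
            run = 0
            while i < len(pvs) and pvs[i] == 0 and run < 255:
                run += 1
                i += 1
            out += bytes((0, run))
    return bytes(out)

def decompress(data):
    out, i = bytearray(), 0
    while i < len(data):
        if data[i]:
            out.append(data[i])
            i += 1
        else:
            out += bytes(data[i + 1])  # expand the zero run
            i += 2
    return bytes(out)
```

Since a typical leaf sees only a small fraction of the other leaves, the bit vector is mostly zero bytes, and the zero runs collapse to two bytes each; that is what carries a multimegabyte raw PVS down toward tens of kilobytes.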
In exchange for that 20 KB, culling leaves outside the frustum is speeded up
(because only leaves in the PVS are considered), and culling inside the
frustum costs nothing more than a little overdraw (the PVS for a leaf includes
all leaves visible from anywhere in the leaf, so some overdraw, typically on
the order of 50 percent but ranging up to 150 percent, generally occurs).
Better yet, precalculating the PVS results in a leveling of performance; worst
case is no longer much worse than best case, because there's no longer extra
VSD processing--just more polygons and perhaps some extra overdraw--associated
with complex scenes. The first time John showed me his working prototype, I
went to the most complex scene I knew of, a place where the frame rate used to
grind down into the single digits, and spun around smoothly, with no
perceptible slowdown.
John says precalculating the PVS was a logical evolution of the approaches he
had considered, that there was no moment when he said "Eureka!". Nonetheless,
it was clearly a breakthrough to a brand-new, superior design--a design that,
together with a still-in-development, sorted-edge rasterizer that completely
eliminates overdraw, comes remarkably close to meeting the "perfect-world"
specifications we laid out at the start.


Simplify and Keep Trying New Things


What does it all mean? Exactly what I said up front: Simplify, and keep trying
new things. The precalculated PVS is simpler than any of the other schemes
considered (although precalculating the PVS is an interesting task that I'll
discuss another time). In fact, at run time, the precalculated PVS is just a
constrained version of the painter's algorithm. Does that mean it's not
particularly profound?
Not at all. All really great designs seem simple and even obvious--once
they've been designed. But the process of getting there requires incredible
persistence, and a willingness to try many different ideas until the right one
falls into place.
Chris Hecker has a theory that all approaches work out to the same thing in
the end, since they all reflect the same underlying state and functionality.
In terms of underlying theory, I've found that to be true; whether you do
perspective texture mapping with a divide or with incremental hyperbolic
calculations, the numbers do exactly the same thing. When it comes to
implementation, however, my experience is that simply time shifting an
approach, or matching hardware capabilities better, or caching can make an
astonishing difference. Terje Mathisen likes to say that "almost all
programming can be viewed as an exercise in caching," and that's exactly what
John did. No matter how fast he made his VSD calculations, they could never be
as fast as precalculating and looking up the visibility. His most inspired
move was to yank himself out of the "faster-code" mindset and realize that it
was in fact possible to precalculate (in effect, cache) and look up the PVS.
The hardest thing in the world is to step outside a familiar, pretty good
solution to a difficult problem and look for a different, better solution. The
best way I know to do that is to keep trying new, wacky things, and always,
always, always try to simplify. One of John's goals is to have fewer lines of
code in each 3-D game than in the previous game, on the assumption that as he
learns more, he should be able to do things better with less code.
So far, it seems to have worked out pretty well for him.


Learn Now, Pay Forward


There's one other thing I'd like to mention before I close up shop for this
month. As far back as I can remember, Dr. Dobb's Journal has epitomized the
attitude that sharing programming information is a Good Thing. I know a lot of
programmers (me, for one) who were able to leap ahead in their development
because of James Hendrix's Tiny C, or Al Stevens' D-Flat, or simply by
browsing through Dr. Dobb's Journal's annual collections. Most companies
understandably view sharing information in a very different way, as potential
profit lost--but that's what makes Dr. Dobb's Journal so valuable to the
programming community.
It is in that spirit that id Software is allowing me to describe in these
pages how Quake works, even before Quake has shipped. That's also why id has
placed the full source code for Wolfenstein 3D on
ftp.idsoftware.com/idstuff/source; you can't just recompile the code and sell
it, but you can learn how a full-blown, successful game works; check
wolfsrc.txt in the aforementioned directory for details on how the code may be
used.
So remember, when it's legally possible, sharing information benefits us all
in the long run. You can pay forward the debt for the information you gain
here and elsewhere by sharing what you know whenever you can, by writing an
article or book, or posting on the net. None of us learns in a vacuum; we all
stand on the shoulders of giants such as Wirth and Knuth and thousands of
others. Lend your shoulders to building the future!


References


Foley, James D. et al. Computer Graphics: Principles and Practice. Reading,
MA: Addison-Wesley, 1990.
Teller, Seth. "Visibility Computations in Densely Occluded Polyhedral
Environments" (dissertation), available at http://theory.lcs.mit.edu/~seth/
along with several other papers relevant to visibility determination.
------. "Visibility Preprocessing for Interactive Walkthroughs." SIGGRAPH 91
Proceedings.

Figure 1: In Quake, polygons are stored in the empty leaves. Shaded areas are
solid leaves (solid volumes, such as the insides of walls).
Figure 2: An ideal VSD architecture would draw only visible parts of visible
polygons.
Figure 3: Node E describes the shaded subspace, which contains leaves 5, 6,
and 7, and node F.
Figure 4: Quake's beam tree effectively partitioned the screen into 2-D
regions.
Figure 5: Quake's beam tree was composed of 3-D wedges, or beams, projecting
out from the viewpoint to polygon edges.


























































DTACK REVISITED


Gresham Emerges Triumphant




Hal W. Hardenbergh


Hal is a hardware engineer who sometimes programs. He is the former editor of
DTACK Grounded, and can be contacted through the DDJ offices.


Once upon a time, Americans would make major purchases--Conestoga wagons, for
instance--using gold coins for payment. Then silver dollars appeared, and gold
coins vanished from the marketplace. When I was a youngster, retail purchases
were frequently made using silver dollars. But paper money drove them out;
it's rare to see a silver dollar change hands these days.
Those of you who had to endure Economics 101 will recognize Gresham's law: Bad
money drives out the good. Sir Thomas Gresham, the founder of the English
Royal Exchange, first explained this law to Queen Elizabeth I in 1558.


My New Minitower Cases


My PCs used to consist of the original, horizontal-format case with a small
monitor sitting on top. As I moved to larger monitors, it became evident that
the minitower case was better for my purposes. In recent years I've swapped
cases on four PCs, once because of a power-supply cooling-fan failure.
Here in Silicon Valley you can buy a PC case (always with power supply) at
most any clone outlet. I've always checked Micro Times, a local, free tabloid
filled with PC ads, to find the lowest prices. I think my first minitower case
set me back $63.00. Most recently, minitower cases have dropped to $39.00.
Except for the plastic front, PC cases are made from cold rolled steel (CRS).
Until recently, the insides were all alike: The case's innards had deburred
edges, were cadmium-plated (to provide that silvery finish) and were welded to
provide a strong, rigid structure. 
About 18 months ago, I bought a $43.00 case. It was immediately obvious that
the CRS was not cadmium-plated; the sheet metal was an ugly gray. After
installing the hard drive, the floppy drive, and the motherboard, I discovered
both my hands were bleeding. The individual pieces of CRS had been
die-punched, a cheap method of fabrication that leaves the edges ragged
because die-punching actually tears the edges of the metal. To save money, the
metal had not been deburred. The pieces had, however, been welded together.
Recently I bought a $39.00 case. Once again I found unplated, die-punched CRS.
This time the structure had not been welded; instead it had been pop-riveted,
a very cheap and not particularly strong way to assemble a frame. I have to
admit the structure was strong enough to support a couple of disks, a
motherboard, and a few cards--modern computer parts aren't very heavy.
But I didn't want to cut up my hands again. I had an old minitower case whose
power supply had failed and which I was planning to throw away on the next
city spring-cleaning drive. It turned out the new power supply fit into the old case. So
I unscrewed the plastic front panel from the new case and discovered it fit
the old case just fine. Now I had a new front panel and a new power supply in
a deburred, cad-plated, welded case.
That left me with a useless, pop-riveted steel cage. I noticed the sheet steel
looked awfully thin. So I stepped on the cage and it immediately collapsed. I
turned it and stepped on it a couple more times and wound up with a much
smaller mass of smushed metal, which I tossed in my garbage (no waiting for
spring cleaning). Oh, yes: I had to use the new U-shaped cover on my hybrid
minitower case because the fit to the front panel changed.
I can now find the occasional ad in Micro Times that offers hand-cutting-free
PC cases at considerably more than $39.00. Very few are sold.
I still have the cardboard boxes the various minitower cases came in. All have
the same diamond-shaped logo on the side, meaning they came from the same Far
East trading company. But the good original cases came from Taiwan and had a
net weight of 8 kilograms (kg). The unplated, undeburred welded case also came
from Taiwan and netted 7.5 kg. The chintzy pop-riveted version was made in
China and weighed 5.65 kg.
If this trend continues, the next version of the minitower case will come from
Tibet or Chad, will be made of used aluminum foil, and will weigh 3.25 kg,
including front panel, cover, and power supply. But it will cost $29.00
retail, and people will buy it.


The New Gresham's Law


The PC-buying public has clearly decided that personal computers are generic
products. It matters how much "horsepower" the CPU has, and how much DRAM and
hard-disk space the computer has, but there is remarkably little brand
recognition and even less brand loyalty.
This is a smart decision by the public because, in fact, there's precious
little difference between brands of PCs. Some thoughtful consumers are even
making the deliberate choice to purchase systems from low-priced clone shops.
Not only do you get a lower price up front, but the PC will have been
assembled from generic parts and hence can be easily repaired. A major
brand-name supplier like Compaq or IBM may well have used a proprietary
motherboard (or such) that can't be repaired three or four years later because
the part is out of production.
Indeed, the fastest-growing PC-system supplier is Packard Bell, whose
computers closely resemble those sold by the cheap clone outlets. Some reviews
of Packard Bell's PCs have noted how flimsy the case seems to be....


Product Differentiation


The marketing types at the PC-system makers have been flummoxed. The market
absolutely rejects any attempt to differentiate PCs on the excellent grounds
that anything that's different is likely to be incompatible with some of those
warehouses full of shrink-wrapped software. Even refrigerators exist in
greater variety than PCs.
So the blue-suede-shoes types wept with joy at the development of multimedia.
But it turned out that multimedia meant a CD-ROM drive, sound card, and stereo
speakers. After growing 300 percent in 1993 and another 300 percent in 1994,
multimedia sales are now growing at 20 percent--exactly as fast as the overall
market. Multimedia equals encyclopedias and games, and a PC is a horrendously
expensive game machine. And, for that matter, how often do you really need an
encyclopedia at home?
According to the business section of my local Silicon Valley newspaper, system
makers have decided that future PCs will also be TV sets. I wish them luck. I
can buy a 20-inch TV for about $200, which frees up the PC. And The Wall
Street Journal has begun to print articles about contention within the home
for the keyboard of the lone PC. Tie up a $1200-$2500 multimedia PC just to
watch "Jeopardy"? I don't think so.
The point of all this is that excess costs have been thoroughly squeezed out
of the PC market. Which means anybody who expects to charge a premium price
for a different-color case or unnecessary bells or whistles is doomed to
failure.
Have you noticed that Apple Computer has been losing market share lately?
Apple is the company that historically wants to charge more for its computers
because they run a smaller selection of shrink-wrapped software. Uh, did I get
that right?


UNIX on Life Support


In case you missed it a while back, AT&T sold UNIX to Novell. Novell first
tried to figure out what UNIX was good for, then sold it to the Santa Cruz
Operation for a heck of a lot less than it paid AT&T.
You like UNIX? Stick around. In a few years you'll probably be able to afford
it--not a UNIX system, but UNIX itself. The whole ball of wax.
Intergraph, for instance, makes turnkey CAD stations, just the sort of thing
that Fortune 5000 companies like to buy. Well, Intergraph is making the switch
from UNIX (and RISC) to Windows NT (and the CISCy Pentium Pro). That's almost
a billion dollars a year fleeing the UNIX/RISC market for a Wintel safe haven.
Guess why UNIX-system makers have, like Apple, lost market share to Wintel the
past couple of years?



RISC Sinking Fast


So why did Hewlett-Packard cut a deal with Intel to make its future CPUs
compatible with Wintel systems? Why is IBM widely rumored to be developing the
Wintel-compatible 615 CPU? Why have both the SPARC and MIPS camps announced
that their future CPUs will feature hardware support for x86 emulation?
"The war between RISC and CISC is over and RISC won." I've been reading that
for over a decade, and I always get a big kick out of it. You see, I used to
write the newsletter DTACK Grounded; then I wrote for the late Programmer's
Journal, and more recently for Dr. Dobb's Journal (see the January 1994 DDJ,
for instance). This publication trail proves I've consistently asserted that
CISC would emerge triumphant over RISC. Did you notice how Intel has prospered
these past dozen years, years in which RISC folk incessantly forecast Intel's
imminent demise?
I wasn't the lone ranger; a few other sensible techies remained in the CISC
camp. But I'd guess that the RISC camp, at its peak, included 98 percent or
more of the techie crowd who follow such things.
There may be several reasons CISC won. I can tell you my reason for supporting
CISC during the long, cold winter years when I was ridiculed for doing so.
CISC code is twice as dense as RISC code. RISCs require twice as much code (in
bytes) to accomplish a given amount of work. That includes the PowerPC.
Which means RISC needs more hardware: twice as much bus bandwidth and twice as
much L1 and L2 instruction cache, to get the same work done. And while RISC
system makers could and did provide those additional resources, they made a
critical error: They failed to notice that more bandwidth and bigger caches
cost more money for the same performance.
The mass personal-computer marketplace will absolutely not tolerate anything
that costs extra money without providing a corresponding benefit.


High Prices, Yay! UNIX, Hooray!


To the extent that RISC prospered, it prospered in the UNIX arena, where high
prices prevailed and there were no personal computers. But the PC market, with
its economies of scale, is a huge black hole that is sucking in everything
nearby. The RISC folk are straining to keep their heads above the event
horizon, but they're doomed. You don't think Intergraph is the only company
fleeing the high costs of the UNIX/RISC camp, do you?


Thought Experiments


Once upon a time, a supercomputer would support 2000 scientists. Each worker
would submit jobs for batch processing. If the supercomputer wasn't down, the
job would be completed and the results printed in about 24 hours. Now, imagine
the supercomputer as a large black box. What's the most economical black box
that'll do the same job as the supercomputer? Why, a cluster of desktop
workstations! Yesterday, the workstations were RISC based, but tomorrow
they'll be Pentium Pro based.
Did you notice that bankruptcy liquidators just auctioned off the assets of
Seymour Cray's Colorado Springs supercomputer company? And that the major
"asset" was a completed, up-and-running Cray-3 supercomputer that nobody
wanted to buy?
Another black box: It takes a MIPS-based Silicon Graphics workstation about an
hour to produce one computer-generated frame for a movie such as Jurassic
Park. (Workstations get faster with time but the amount of computer-generated
detail demanded by movie patrons does, too, so that hour is pretty much a
constant.)
Movies use 24 frames per second, so it takes 60 days for one workstation
running 24 hours a day to generate one minute of a movie. Given a z-buffer,
doesn't it make more sense to have a roomful of (a black box full of) Pentium
Pro systems (already under $4K/system) than a roomful of more-expensive RISC
workstations?
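That 60-day figure is easy to verify. Here's a throwaway sketch (illustrative C++, not from the article's listings) that just redoes the arithmetic:

```cpp
// Back-of-the-envelope check: one hour of rendering per frame,
// 24 frames per second of film, one workstation running around the clock.
int DaysPerMovieMinute() {
    int framesPerMinute = 24 * 60;          // 1440 frames in a minute of film
    int renderHours     = framesPerMinute;  // at one hour per frame
    return renderHours / 24;                // the machine runs 24 hours a day
}
```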


The Bottom Line


The point is, Wintel computers provide the most performance for the least
dollars. Soon, any computing job that can be performed by a Wintel computer
will be performed by a Wintel computer. This has everything to do with thin,
unplated, pop-riveted, hand-cutting computer cases. All of the excess costs
have been squeezed out. Gresham would have instantly understood the situation.































20/20


Recycling Windows Controls for Delphi




Al Williams


Al is a consultant specializing in software development, training, and
documentation. He is also the author of Steal This Code! (Addison-Wesley,
1995). You can reach Al at 72010.3574@compuserve.com.


Remember Nehru jackets, hula hoops, and mood rings? Fads come and (thankfully)
go. However, sometimes it's hard to tell what's a passing fad and what's here
for the long term. Hey, even telephones, television, and computers were once
considered fads. For that matter, who remembers the Cauzin strip reader?
Bubble memory? The PS/2?
But then, sometimes fads come back. I've recently seen kids wearing bell
bottoms, for instance. Which brings me to Delphi....
Delphi has brought new life to Pascal. Once, Pascal was a major PC language,
but C and C++ have crowded it off most developers' PCs. Now that Delphi is one
of the more exciting visual-development environments around, Pascal
programmers are in higher demand. 
If you are using Delphi now, do you have to give up all the controls you use
in your C programs? Of course not. Delphi is a complete programming
language--you can create controls and link to DLLs as easily as you can with
C. There are a few tricks you need to know, but there's not much to it.
With a bit more effort, you can wrap your existing controls with a VCL
component, then use them as simply as you use any other Delphi component. The
control should reside in a DLL. You can link .OBJ files to your Delphi
project, too. These techniques allow you to benefit from Delphi's component
architecture without rewriting code. In this installment of "20/20," I'll
examine how you build wrappers for existing controls in DLLs or .OBJ files. 


The Control


The existing C control I wanted to use with Delphi is a simple countdown
timer: the TimeCtrl class (the complete source code is available
electronically; see "Availability," page 3). You send the timer a message
(TC_SET) to set the number of seconds before the time-out period expires. Then
you send a TC_GO message with wParam set to 1 to start the timer; a 0 in
wParam suspends the timer. The control sweeps a clock hand around for each
second until it reaches zero. The control sends its parent window a special
message (TC_TICK) for each second that elapses with wParam set to the current
count. When wParam is 0, the countdown is complete. You can also read the
current timer value using TC_GETTIME.
In addition to these four messages, the control's DLL defines the
TControl_Init() function. This call does nothing; however, if you call it from
within your code, you force Windows to load the DLL when your program starts.
If you examine the code for this control you might argue that the countdown
isn't truly accurate. It depends on each timer message received to compute the
number of seconds. Of course, you can't depend on getting timer messages at
exactly the interval you request. However, the intent for this control is to
count down the time you have left to do something (cancel an operation, for
example). Suppose the system load is so high that timer messages are not
getting through to the application. Then the user probably couldn't interact
with the system during that time either. So, although the elapsed time may not
be strictly correct, counting only the ticks that actually arrive is the better
behavior for this purpose.
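Stripped of the Windows message plumbing, the protocol just described amounts to a small state machine. Here is a hypothetical C++ model of it (the class and method names are mine, not the control's; the real control communicates via SendMessage):

```cpp
// Hypothetical model of the TimeCtrl protocol: TC_SET loads a second
// count, TC_GO with wParam 1/0 starts/suspends, and each elapsed second
// produces a TC_TICK carrying the new count (0 means the countdown is done).
class CountdownTimer {
    int seconds_;
    bool running_;
public:
    CountdownTimer() : seconds_(0), running_(false) {}
    void Set(int s)    { seconds_ = s; }      // plays the role of TC_SET
    void Go(bool run)  { running_ = run; }    // TC_GO, wParam = 1 or 0
    int GetTime() const { return seconds_; }  // TC_GETTIME
    // Called once per elapsed second; returns the TC_TICK payload,
    // or -1 if the timer is suspended or already at zero.
    int Tick() {
        if (!running_ || seconds_ == 0) return -1;
        return --seconds_;   // 0 signals that the countdown is complete
    }
};
```

Note that a delayed or dropped Tick() call simply fails to decrement the count, which is exactly the "better to be late than wrong" behavior argued for above.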


The Plan


Delphi controls derive from the class TWinControl, which provides the base
mechanisms for incorporating a window as a Delphi component. Normally, you
derive a class from TWinControl (or one of its subclasses) and implement the
control's functionality in the new class. In our example, you still need to
derive a new class; however, this class will control an existing window--the
TimeCtrl window.
Although creating a wrapper class for a control may seem unusual, it isn't.
Delphi creates TWinControls to encapsulate the standard Windows controls (for
example, TButton). Indeed, the easiest way to start is to examine the existing
code for classes like TButton. To make this work you need to figure out the
following: 
How to send a window a message.
How to handle an incoming message.
Where to tell Delphi what window class to use when it creates the window.


Delphi Message Handling


Sending messages from a Delphi program is no problem--just use SendMessage()
or PostMessage(). Receiving a message seems simple. Any TControl-derived class
can contain a function that uses the Message keyword. This function will
automatically receive the indicated message. Example 1 shows WM_SETFOCUS
messages in a form (TForm derives from TControl).
The message-handling procedure must take a single var argument (usually a
record type from the Messages unit). By using a specific record type that
corresponds to the message, the procedure will parse the message parameters
correctly. For example, the TWMActivate record picks apart the wParam and
lParam parameters into more-meaningful fields for WM_ACTIVATE messages. If the
message has no special record, use the generic TMessage record. You can also
define your own records to handle custom messages.
Why does this work? Messages travel a convoluted route before they wind up in
your TControl-derived class. All messages for the control go to the WndProc
procedure (in \DELPHI\SOURCE\VCL\CONTROLS.PAS in the VCL source code).
This function does some housekeeping and eventually calls Dispatch to route
the message to the appropriate message-handling function. If there is no
corresponding message function, Dispatch calls DefaultHandler. Since the
WndProc procedure is virtual, derived classes may override it to handle
particular messages. To handle a message yourself, you can override WndProc
or, better still, define a method with the Message attribute.
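The VCL source is Object Pascal, but the routing just described can be approximated in C++. Everything below (class names, message ids, the dispatch-table mechanism) is an illustrative stand-in, not VCL's actual code:

```cpp
#include <map>
#include <string>

// Hypothetical message ids, standing in for Windows messages.
enum { MSG_SETFOCUS = 7, MSG_PAINT = 15 };

class Control {
    // Dispatch table: message id -> member-function handler. Registering
    // an entry plays the role of Delphi's "message WM_SETFOCUS" attribute.
    typedef void (Control::*Handler)();
    std::map<int, Handler> handlers_;
public:
    std::string lastHandled;
    Control() { handlers_[MSG_SETFOCUS] = &Control::OnFocus; }
    virtual ~Control() {}
    // Analog of VCL's virtual WndProc: every message funnels through here,
    // so derived classes could override it to intercept messages early.
    virtual void WndProc(int msg) { Dispatch(msg); }
    void Dispatch(int msg) {
        std::map<int, Handler>::iterator it = handlers_.find(msg);
        if (it != handlers_.end()) (this->*(it->second))();
        else DefaultHandler(msg);   // no specific handler registered
    }
    virtual void DefaultHandler(int) { lastHandled = "default"; }
    void OnFocus() { lastHandled = "focus"; }
};
```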


Inside TWinControl


To use legacy code, you may need a few items inside TWinControl. The first is
the read-only Handle property, which accesses the ordinary window handle for
the control. You'll need this almost every time you make a Windows API call.
The biggest problem is how to make Delphi create a window of a specified
class. To do this, you need to link the external DLL to your Delphi program
and supply the correct class name. If the DLL has a function you call to
initialize (the way the timer control does), you'll want to declare the
function using the external statement. For example: procedure TControl_Init;
far; external 'TCONTROL' name 'TControl_Init';.
To load the DLL dynamically, use the LoadLibrary and FreeLibrary calls, just
as in an ordinary Windows program.
Once you have the DLL linked in, override the CreateParams procedure to make
sure Delphi uses the new class name when it creates the new control. Call the
base-class version and then CreateSubClass, which requires two arguments: a
TCreateParams record (the one passed into CreateParams) and a class name.
You may also want to override the new component's constructor. Set a
reasonable default size using the Width and Height properties. If you don't,
the control will be practically invisible when you first drop it on a form.
You can now write properties and methods to control the window class. You can
also define const message IDs to facilitate sending and receiving messages.



The Wrapper


Listing One presents the final wrapper code for the timer control. Several
const statements define each of the control's messages. Next, the code defines
a TTimeControl class derived from TWinControl. The protected section of this
class contains methods to support the Time property (defined later) and the
override for CreateParams. The public section defines verbs that send messages
to the control (Go and Stop) and overrides the constructor (Create). The class
also defines a Time property.
In the implementation section, the code defines an external reference to the
DLL function TControl_Init. The SetTime, GetTime, Go, and Stop members simply
use the Handle property to send messages to the control. The constructor sets
a default size and also calls TControl_Init. Since this function does no
actual work, it is safe to call it multiple times.


Where's the Message Handler?


You might expect to find a message handler for TC_TICK inside the TTimeControl
class, but this isn't the right place for a handler--the message goes to the
control's parent window, not the control itself. This is contrary to the way
ordinary Delphi controls appear to work.
When you press an ordinary button control, the WM_COMMAND message goes to the
button's parent. A Delphi button allows you to intercept this message by
setting the button's OnClick property. This only works because TForm (the
class that typically receives the WM_COMMAND message) routes the command
message back to the originating control (as a CN_COMMAND message). To get this
same behavior from a custom control, you'd need to modify TForm or one of its
base classes. Alternatively, you could add special code to your program's
TForm-derived class.
If you're adding a function to your TForm-derived class anyway, you might as
well process the TC_TICK message in that function. While this is unusual for a
Delphi program, it closely models the way ordinary controls work.


Building the Control


To build and install the control, select Install Components from the Options
menu. Press the Add button and select the TIMECTL.PAS file. When you close the
dialog, Delphi will rebuild the component library including the new component.
The TCONTROL.DLL file must be in a directory that appears in your path or
Delphi will refuse to load the new component library.
If you make changes to the component, select the Rebuild Library menu item
from the Options menu. If you make a mistake and Delphi refuses to load the
new library, recover with the backup library file (COMPLIB.~DC). The build can
take quite some time. You might want to set the Show Compiler Progress option
in the Environment dialog so you can see the progress of the build.


Using the New Control


Listing Two is an example program called "StartUp" that uses the new control;
Figure 1 is its startup screen. The program manages a list of .BAT files in a
specific directory. Since Windows 95 batch files can execute Windows
applications, this is a useful program to put in your startup group. When
Windows starts, you'll see a list of batch files. You have several seconds to
select one or cancel the program. If you select a batch file or the time
expires, the selected batch file executes.
StartUp uses an .INI file to control the amount of time it waits, the
directory it looks for batch files in, and the default batch file to use. The
program is unremarkable in most respects. It uses a standard file list box and
a few buttons. All of the INI file settings are in the [config] section. The
variables available are:
DIR, the directory StartUp searches for batch files.
TIME, the time in seconds to wait for user input.
DEFAULT, the name of the batch file to execute if there is no user input.
StartUp uses a TimeCtrl to manage the time-out period. Once you install the
TimeCtrl component, you can drop it on a form just like any other control.
Here, the control's name is CountDown. There are only a few places the program
interacts with CountDown:
The form includes the TimeCtl unit in the Uses clause.
During the FormCreate procedure, StartUp reads its INI file, and sets the
CountDown.Time property based on the information in the file.
After setting the Time property, the FormCreate procedure calls CountDown's Go
method to start the countdown.
When there is a change to the file list box (FileBoxClick), the program calls
the CountDown.Stop procedure. This will allow you to select a file and stop
the countdown.
The form incorporates a procedure that uses the Message keyword to intercept
TC_TICK (the Tick procedure). When the parameter passed to Tick is 0, the code
simulates a button push and calls Close to end the program.


Linking to Object Files


You can also use the external keyword to link to an object file written in
another language. Suppose you have an .OBJ file that contains a function named
AboutBox() written in C (ABOUTBOX.C is available electronically). If the
function uses the Pascal calling convention, you can declare it in your Delphi
program this way: function AboutBox(w:HWnd; s:PChar) : Integer; external;.
You also need to link the object file with your program by using the $L
directive; for instance, {$L ABOUTBOX.OBJ}.
Delphi sets strict limits on the object files you can use. In particular, all
code must be in a segment named CODE, CSEG, or a name ending with _TEXT.
Initialized data must reside in a segment named CONST or a name ending with
_DATA. Uninitialized data must appear in a segment named DATA, DSEG, or a name
ending with BSS.
You can link C code into your Delphi program as long as you are careful about
using C library calls. Some C calls (for example, malloc()) require
initialization before their use. In a C program, that happens automatically,
but in a Delphi program, you probably shouldn't use them. You can use Windows
API and Delphi calls freely. If you use a standard C-library function, you'll
get a link error that the symbol is undefined because Delphi doesn't search
the C run-time library automatically.
To bring the C functions in, extract the .OBJ files from the library (usually
CWS.LIB). Those functions may use other functions--it can take a while to get
it right. For example, if Delphi complains that _strcat is undefined, you need
to generate STRCAT.OBJ. Use TLIB to extract the .OBJ file from the standard
library: tlib cws.lib *strcat.obj.
Next, make Delphi link the file: {$L STRCAT.OBJ}. After recompiling, you may
find you also need strcpy, so repeat the process.
Here are a few other tips for mixing C code into a Delphi program:
Use the small model, but declare all pointers far (for example, LPSTR).
Use the PASCAL keyword to set the Pascal calling convention.
Pass any values you need as parameters.
Be aware that you can step through your C code with the integrated debugger,
but you won't be able to inspect variables.
Enable TDW debugging information in both the C program and the Delphi program
to use Turbo Debugger. (This only works if Delphi's linker and your copy of
TDW agree on version numbers.)
Retrieve the window handle for any control (including a form) using the
Handle.
Use the null-terminated string functions and the PChar type for strings.
Thoroughly test all C library functions to ensure they will work apart from a
C program. 
Of course, the example given here doesn't do anything exciting. However, if
you had a large C-language algorithm available, being able to link it directly
into your Delphi program could save a lot of time and frustration.



Recycling Visual Basic


If you have existing code in Visual Basic and you want to move to Delphi,
Conversion Assistant from EarthTrek (Woburn, MA) may be just what you need.
You can save your Visual Basic program as text and pass the MAK file into the
Conversion Assistant.
The conversion is not perfect--it won't handle some things, and it does a
less-than-perfect job on others. Still, it does quite a bit of work to get you
started. Since Delphi can use most Visual Basic controls, programs that rely
heavily on VBXs are not a problem. At $149.00, this tool provides a
cost-effective first pass at translation. 


I Am Not Al Stevens


Contrary to popular rumor, Al Stevens (Dr. Dobb's Journal's C columnist) and I
really are two different people. We even have witnesses who have seen us
together in public. Perhaps we should hold a "Name That Al" contest where you
win prizes for correctly identifying our pictures. Then again, if you've seen
us, it would be too easy--we don't look a thing alike.


About 20/20


This is the first installment of 20/20. In future columns, I'll
explore advanced visual-programming techniques for Visual Basic, Delphi, and
PowerBuilder. Occasionally, I'll look at some tools off the beaten
track--including some for non-Windows platforms.
The complexity of Windows programming, coupled with the pressure to decrease
product-development cycle times, is fostering an explosion of
visual-programming tools. Although current visual-programming environments are
impressive, future releases promise to be even better. (If you don't think
there's room for improvement, check out NeXT's Interface Builder.)
As the visual-programming milieu changes, you'll read about it here. In the
meantime, drop me some e-mail and let me know what you are doing with visual
development and what you would like to talk about in future columns. Oh, and
tell me if you know where I can trade some bell bottoms for some of those
8-track stereo tapes.... 
Figure 1: The StartUp program's startup screen.
Example 1: Automatically receiving an indicated message.
type
  TAWindow = class(TForm)
  private
    { Private declarations }
    procedure OnFocus(var Msg : TMessage);
      message WM_SETFOCUS;
    .
    .
    .

Listing One
unit TimeCtl;
interface
uses Messages, Controls, Classes, WinTypes, WinProcs, DsgnIntf;
const
 TC_SET = WM_USER;
 TC_GO = WM_USER + 1;
 TC_TICK = WM_USER + 2;
 TC_GET = WM_USER + 3;
type
  TTimeControl = class(TWinControl)
  protected
    procedure SetTime(seconds : Integer);
    function GetTime : Integer;
    procedure CreateParams(var Params: TCreateParams); override;
  public
    procedure Go;
    procedure Stop;
    constructor Create(AOwner : TComponent); override;
  published
    property Time : Integer read GetTime write SetTime;
  end;

procedure Register;

implementation

procedure Register;
begin
  RegisterComponents('Samples', [TTimeControl]);
end;

procedure TControl_Init; far; external 'TCONTROL' name 'TControl_Init';

procedure TTimeControl.SetTime(seconds : Integer);
begin
  SendMessage(Handle, TC_SET, seconds, 0);
end;

function TTimeControl.GetTime : Integer;
begin
  result := SendMessage(Handle, TC_GET, 0, 0);
end;

procedure TTimeControl.CreateParams(var Params: TCreateParams);
begin
  inherited CreateParams(Params);
  CreateSubClass(Params, 'TimeCtrl');
end;

procedure TTimeControl.Go;
begin
  SendMessage(Handle, TC_GO, 1, 0);
end;

procedure TTimeControl.Stop;
begin
  SendMessage(Handle, TC_GO, 0, 0);
end;

constructor TTimeControl.Create(AOwner : TComponent);
begin
  inherited Create(AOwner);
  Width := 33;
  Height := 33;
  TControl_Init;
end;

end.

Listing Two
unit Mainunit;
interface
uses
 SysUtils, WinTypes, WinProcs, Messages, Classes, Graphics, Controls,
 Forms, Dialogs, StdCtrls, ExtCtrls, Buttons, FileCtrl, IniFiles, TimeCtl;
type
  TForm1 = class(TForm)
    FileBox: TFileListBox;
    Label1: TLabel;
    ExecuteBtn: TBitBtn;
    BitBtn2: TBitBtn;
    CountDown: TTimeControl;
    procedure ExecuteBtnClick(Sender: TObject);
    procedure FileBoxClick(Sender: TObject);
    procedure BitBtn2Click(Sender: TObject);
    procedure FormCreate(Sender: TObject);
  private
    { Private declarations }
    procedure Tick(var Msg : TMessage); message TC_TICK;
  public
    { Public declarations }
  end;

var
  Form1: TForm1;
  ini : TIniFile;

implementation

{$R *.DFM}

procedure TForm1.ExecuteBtnClick(Sender: TObject);
var
  Buffer : PChar;
  Size : Byte;
  index : Integer;
  ps : String;
begin
  index := FileBox.ItemIndex;
  if index <> -1 then
  begin
    ps := FileBox.Items[index];
    Size := Length(ps);
    Inc(Size);
    GetMem(Buffer, Size);
    StrPCopy(Buffer, ps);
    WinExec(Buffer, SW_MINIMIZE);
    FreeMem(Buffer, Size);
  end;
  Close;
end;

procedure TForm1.FileBoxClick(Sender: TObject);
begin
  Countdown.Stop;
end;

procedure TForm1.BitBtn2Click(Sender: TObject);
begin
  Close;
end;

procedure TForm1.FormCreate(Sender: TObject);
var
  dir : String;
  i : Integer;
begin
  ini := TIniFile.Create('STARTUP.INI');
  dir := ini.ReadString('Config', 'Dir', '');
  if (dir <> '') then
    ChDir(dir);
  FileBox.Directory := dir;
  FileBox.Update;
  dir := ini.ReadString('Config', 'Default', '');
  i := FileBox.Items.IndexOf(dir);
  if i = -1 then
    FileBox.ItemIndex := 0
  else
    FileBox.ItemIndex := i;
  Countdown.Time := ini.ReadInteger('Config', 'Time', 30);
  Countdown.Go;
end;

procedure TForm1.Tick(var Msg : TMessage);
begin
  if Msg.wParam = 0 then
  begin
    ExecuteBtnClick(Self);
    Close;
  end;
end;

end.
End Listings
































































PATTERNS AND SOFTWARE DESIGN


The Courier Pattern




Richard Helm and Erich Gamma


Richard is an architect with the IBM Consulting Group's Object Technology
Practice in Sydney, Australia. Erich is an architect and object technologist
at IFA in Zurich, Switzerland. They are coauthors of Design Patterns: Elements
of Reusable Object-Oriented Software (Addison-Wesley, 1994).


In previous columns, we've stressed the importance of ensuring that objects
that send a request assume only that objects receiving those requests support
a particular interface, and nothing about the receiving objects' concrete
classes (that is, their implementations). This ensures that senders are decoupled
from receivers, and results in a characteristic design where the definition of
operations refers only to abstract--not concrete--classes. Several design
patterns from our book Design Patterns: Elements of Reusable Object-Oriented
Software involve decoupling senders and receivers of requests; the Observer
pattern discussed in our "Patterns and Software Design" column (Dr. Dobb's
Sourcebook, September/October 1995) is one of them. 
As Figure 1 illustrates, the Observer pattern decouples Subjects from their
dependent Observers by defining an interface for signaling changes in the
Subject. When decoupling senders and receivers, a common design issue is the
kind of information passed between them. The simplest solution is to pass no
information at all. Listing One implements a Notify() operation that doesn't
pass any information. Since no information is passed to the Observer and its
Update operation, the Observer has to find out what changed by asking the
Subject for its current state. In other words, it has to "pull" the state from
the subject. When it is expensive for the Observer to determine what changed,
the pull approach can also be expensive. Clearly, there needs to be a way
for the Subject to pass more-specific change information. The challenge, then,
is making a concrete subject pass specific information about what changed to
its concrete observers, given that the interface between subjects and
observers is defined by the abstract classes Subject and Observer. 
One way to solve this is to add a void* parameter along with a tag that
identifies the information passed in the void*. For example, you could change
the Update operation of Observer to Update(long whatChanged, void* info). This
approach is not particularly type safe, however, and a void* in a higher-level
class interface is not particularly elegant.
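For illustration, here is a minimal sketch of the tag-plus-void* approach (the ChangeKind tags, the lastText member, and the class names are our assumptions, not from the listings). It shows why the approach is not type safe: the cast is unchecked.

```cpp
#include <string>

// Illustrative sketch, not the article's recommended design: a tag plus
// an untyped pointer passes change information through a fixed interface.
enum ChangeKind { TEXT_CHANGED = 1, STYLE_CHANGED = 2 };

class Observer {
public:
    virtual ~Observer() {}
    virtual void Update(long whatChanged, void* info) = 0;
};

class TextObserver : public Observer {
public:
    std::string lastText;  // records what the observer decoded
    virtual void Update(long whatChanged, void* info) {
        if (whatChanged == TEXT_CHANGED) {
            // Unchecked cast: the compiler cannot verify that info
            // really points at what the tag promises.
            lastText = static_cast<const char*>(info);
        }
    }
};
```

Nothing prevents a sender from passing a mismatched tag/pointer pair, which is exactly the weakness the Courier pattern avoids.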
Chain of Responsibility is another pattern that decouples the sender from the
receiver. It does this by passing a request along a chain of potential
receivers. As Figure 2 suggests, event handlers are a good use of this
pattern. With the Chain of Responsibility pattern, objects at the front of the
chain try to handle requests (events) generated by some initial sender. If
they cannot, they pass the request along to the next object in the chain,
hoping that some object down the line will eventually be able to handle it.
Chain of Responsibility requires that each object in the chain have an
interface for handling the request. Again, the problem is, how can a sender
pass arbitrary information to its candidate receivers, or how can new types of
events be handled by the chain of objects? Defining a fixed interface for a
fixed set of events such as HandleMouseEvent or HandleTimerEvent is not a
viable solution when the set of events is extensible. For example, if the
system were extended to support drag-and-drop events, you would have to change
existing classes and add a HandleDropEvent operation.
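The inflexibility can be sketched as follows (class and operation names are illustrative, not from the article's listings): a base class that enumerates one operation per event kind closes the set of events.

```cpp
// The inflexible alternative: a fixed interface with one operation per
// event kind. Supporting drag-and-drop would force a new HandleDropEvent
// into this base class and into every existing subclass.
class MouseEvent { };
class TimerEvent { };

class EventHandler {
public:
    virtual ~EventHandler() {}
    // Default: not handled. The set of operations is closed.
    virtual bool HandleMouseEvent(const MouseEvent&) { return false; }
    virtual bool HandleTimerEvent(const TimerEvent&) { return false; }
};

class MouseTracker : public EventHandler {
public:
    // Overrides only the event kind it cares about.
    virtual bool HandleMouseEvent(const MouseEvent&) { return true; }
};
```

A single Handle(Message&) operation, as in the Courier pattern, keeps the event set open without touching existing classes.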
The Mediator pattern also decouples objects by having them refer to each other
indirectly through a Mediator object (see Figure 3). The Mediator object is
responsible for routing requests between Colleague objects. In the Mediator
pattern, Colleague objects do not send requests directly to each other but
instead go through an intermediary Mediator. This nicely decouples Colleagues
from each other, but also means that Colleagues can only communicate with each
other via the fixed Mediator interface. Again, you have the problem of
defining an interface that enables concrete colleagues to pass specific
information to one another.
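A minimal sketch of this routing (Register, Broadcast, and the member names are assumptions for illustration, not part of the pattern's required interface):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Colleagues hold a reference only to the Mediator, never to each other;
// the Mediator routes requests between them.
class Colleague;

class Mediator {
public:
    void Register(Colleague* c) { _colleagues.push_back(c); }
    void Broadcast(Colleague* sender, const std::string& msg);
private:
    std::vector<Colleague*> _colleagues;
};

class Colleague {
public:
    Colleague(Mediator* m) : _mediator(m) { m->Register(this); }
    virtual ~Colleague() {}
    void Send(const std::string& msg) { _mediator->Broadcast(this, msg); }
    virtual void Receive(const std::string& msg) { _last = msg; }
    std::string _last;  // last message seen, for illustration
private:
    Mediator* _mediator;
};

void Mediator::Broadcast(Colleague* sender, const std::string& msg)
{
    // Deliver to every colleague except the sender.
    for (std::size_t i = 0; i < _colleagues.size(); ++i)
        if (_colleagues[i] != sender)
            _colleagues[i]->Receive(msg);
}
```

Note that the Colleagues here can exchange only a fixed payload (a string); passing richer, type-specific information through this fixed interface is precisely the problem Courier addresses.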
To summarize, decoupling objects by defining them in terms of interfaces
defined in abstract classes is often the basis for reusable, object-oriented
designs. But the objects may become so decoupled from each other that their
interfaces do not allow for the passage of specific information. The Courier
pattern is one solution to this problem. 


The Courier Pattern


The Courier pattern allows objects to pass arbitrary requests and information
through a fixed interface. The key to the Courier pattern is to package the
information to be sent between objects as an object itself and pass this
"message object" as an argument to requests. 
Let's begin by exploring the Courier pattern in the context of the Observer
pattern. Assume that a TextSubject class defines an editable text buffer, and
a TextObserver somehow displays the TextSubject (Figure 1). The TextObserver
class receives notifications about changes in its TextSubject. Implementing a
TextObserver using the interface defined for the Observer class in Figure 1
would mean that, whenever the TextSubject changes in any way, the TextObserver
has to retrieve the complete text buffer from the TextSubject. There is no way
for the TextObserver to determine what changed in the TextSubject using just
the Update() interface. Ideally, the TextSubject would be able to tell its
Observers exactly what changed. Using the Courier pattern, we
would package this information into an object and send that to the
TextObserver. 
We first define a message object using the Notification class. In this case,
we define it as an empty class; see Listing Two. Then we change the Notify and
Update interfaces of Subject and Observer; see Listing Three.
A TextSubject that wants to pass additional information to its Observers can
now define a TextChangeNotification subclass that stores additional
information about the changed text range. Assume in Listing Four that there is
already a TextRange class that can be used to specify a range of changed text.
Now the TextSubject can pass a TextChangeNotification to notify its observers;
see Listing Five. Finally, a TextObserver can use this additional change
information to optimize how it updates itself; see Listing Six.
Other kinds of notification requests (text deletion, style changes) could be
easily added and handled in a similar way.
In the Courier pattern, notification requests are represented as objects. The
recipient of the request must analyze the message object and handle it
appropriately. First, the notification object has to be decoded to identify
its concrete class (we did this using the run-time type-identification
mechanism provided by C++). Then the appropriate code to handle this request
must be determined and executed. In some situations, if the recipient class
cannot handle the request, it may have to defer it to its parent classes. This
adds run-time overhead.


Participants


As Figure 4 shows, the key participants in the Courier pattern are:
Message class (Notification in Listing Two), which defines the message passed
to a Recipient. It can represent requests and also define parameters and
return values of the request. The TextChangeNotification represented a
text-change request and also carried the request parameters (the range of text
that changed). The interface to Message must allow clients to determine the
kind of message (this can also be done by a language mechanism). 
Sender, which creates a concrete Message and sends it to a Recipient.
Recipient class (Observer), which defines the interface to Handle messages;
for example, Handle(Message&). The Recipient must decode the Message,
determine which code to execute, and retrieve any parameters from the Message.

ConcreteRecipients (TextObserver), which receive Messages and decode them in
the implementation of their Handle operation.
ConcreteMessages (TextChangeNotification), which define the actual contents
of a particular Message.
Intermediary classes, which forward Messages from the initial Sender on to
their ultimate Recipient.
Note that in Figure 4, only the Sender and the ConcreteRecipient know about
the ConcreteMessage.


Applicability


When should you use the Courier pattern? The first situation is when a fixed
interface defined between abstract classes is insufficient to pass all the
information required by recipients that are concrete subclasses of these
abstract classes. 
A second situation is when the requests to be sent between objects cannot be
anticipated, and you need to provide hooks to allow the interface
to be extended for new requests. This occurs in event handlers implemented
using Chain of Responsibility when you want to extend the kinds of events the
handlers can process.
In the third situation, the class and the interface of the receiver are
unknown to classes sending requests, because the requests are sent via a third
object. In Mediator and Chain of Responsibility, the class and even the
interface of the class of object receiving requests are unknown to the sender.
Rather than having the intervening object modify or adapt the requests
passing through it so they can be received by the ultimate recipient, we
instead define a single Handle(Message&) interface that allows Messages to be
passed to the final Recipient unchanged.
Most uses of Courier decouple classes more loosely than is possible
through interfaces defined in abstract classes. Another situation in which
Courier can be very useful is during initial system prototyping and
development. 
One problem with statically typed languages (such as C++) is that you must
explicitly specify the interfaces to abstract classes in their declaration.
You also have to duplicate much of this interface declaration (certainly for
the virtual functions) in all subclasses inheriting from these abstract
classes. Yet, as systems evolve, their interfaces change. This is especially
true of newer systems, where it usually takes a few design iterations to find
the major abstractions and specify their interfaces correctly. Changing
interfaces to classes in C++ can be a costly operation. Not only do all
subclass declarations and definitions need to be rewritten, but a significant
amount of compile time may be spent recompiling all clients implemented in
terms of this interface.
For systems being prototyped or evolving rapidly, an alternative is to use the
Courier pattern to send requests between classes in a system. Because Courier
allows interfaces of classes to be changed just by modifying class
implementations (or more precisely, the dispatching code in their
Handle(Message&) operations), there is much less cost in changing interfaces.
You simply have to add new Message subclasses and the appropriate dispatching
code.
When the interfaces eventually begin to "harden," the implicit dispatching
provided by the Courier pattern can be replaced by direct requests to
operations defined explicitly by the classes. The bodies of these operations
will be what were previously the bodies of the if statements in the Recipient's
dispatching operations.
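As an illustration of this promotion (UpdateRange and the update counter are assumed names, not from the listings), the body of the dynamic_cast branch in Listing Six might become an explicitly declared operation that senders call directly:

```cpp
// After the interface hardens, the branch body becomes a named operation.
class TextRange {
public:
    TextRange(int b = 0, int e = 0) : begin(b), end(e) {}
    int begin, end;
};

class TextObserver {
public:
    TextObserver() : updates(0) {}
    int updates;  // counts incremental updates, for illustration
    // Formerly the body of "if (changeNotification) { ... }" in the
    // dispatching Update(Notification&) operation.
    void UpdateRange(const TextRange&) { ++updates; }
};
```

The dispatching code and the Notification subclasses can then be retired, and the compiler again checks every call.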



Advantages and Disadvantages


The Courier pattern offers a number of benefits:
Sending Messages to a Recipient class requires few assumptions about the
Recipient's interface. It only has to define an interface to handle messages.
This allows the Sender and Recipient to remain loosely coupled. 
It permits message-sending strategies such as broadcasting messages to all
possible Recipients without caring about the class or interface of the
receivers (only that they implement a Handle operation). Receivers not
interested in a particular message can simply ignore it. This avoids base
classes becoming bloated with an interface that supports all possible
operations. All a base class has to provide is a Handle(Message&) operation. 
Adding a new kind of message does not require changing existing classes.
Simply define a new Message class and implement the code that decodes and
executes the message.
The Courier pattern also has some inherent disadvantages:
The cost of Message dispatching. Decoding the Message to determine its class,
retrieving any parameters, and finding the operation to perform must be done
explicitly with conditional statements defined in the recipient. This is
inelegant and can be less efficient than using the language's dispatching
mechanisms directly. 
There is no guarantee that a Message can be handled by a Recipient. The
compiler will not check that Messages are valid for a Recipient, that the
message is decoded correctly, or that the operation to be performed by the
Recipient is the correct one.
The implementation of the Handle operation determines completely what a
Recipient can do. It is not possible to infer any semantics about a Recipient
from its interface.


Implementation Issues


One of the challenges when implementing the Courier pattern is identifying the
Message classes. To be able to handle a Message, a Recipient must be able to
identify the kind of Message it receives. The Message can define an accessor
function that returns an identifier for the class as a constant or a string.
This requires that the sender and receiver agree on the encoding.
Alternatively, the receiver can use the language-supported, run-time type
information, as we have in the Observer example. In this case, there is no
need for special Message encoding conventions. 
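A minimal sketch of the accessor-based convention (GetKind and the identifier strings are our assumptions; the article's listings use dynamic_cast instead):

```cpp
#include <string>

// Each Message subclass reports a kind identifier, so no run-time type
// information is needed -- but sender and receiver must agree on the
// identifiers, and nothing checks that agreement at compile time.
class Message {
public:
    virtual ~Message() {}
    virtual std::string GetKind() const { return "Message"; }
};

class TextChangeMessage : public Message {
public:
    virtual std::string GetKind() const { return "TextChange"; }
};

// A recipient dispatches by comparing identifiers instead of casting.
bool IsTextChange(const Message& m)
{
    return m.GetKind() == "TextChange";
}
```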
A second issue is acknowledging Message receipt. Senders sometimes need to
know whether a Message was handled or not. For example, when propagating a
message along a chain of responsibility, the sender can stop propagating the
request once it is handled. To provide this kind of information, the
Handle(Message&) operation can return a Boolean.
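A minimal sketch of a Boolean-returning Handle in a chain (all names are illustrative): propagation stops as soon as some receiver reports that it handled the message.

```cpp
#include <string>

class Message {
public:
    Message(const std::string& kind) : _kind(kind) {}
    const std::string& Kind() const { return _kind; }
private:
    std::string _kind;
};

class Handler {
public:
    Handler(Handler* successor = 0) : _successor(successor) {}
    virtual ~Handler() {}
    // Returns true if this handler or one of its successors handled m.
    virtual bool Handle(const Message& m) {
        return _successor ? _successor->Handle(m) : false;
    }
private:
    Handler* _successor;
};

class TimerHandler : public Handler {
public:
    TimerHandler(Handler* s = 0) : Handler(s) {}
    virtual bool Handle(const Message& m) {
        if (m.Kind() == "timer") return true;   // handled; stop here
        return Handler::Handle(m);              // else pass it along
    }
};
```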
Another implementation issue involves handling requests, which requires
identifying the message, then performing the appropriate action. Message
handling is often distributed across a base recipient class and its
subclasses. The base class can handle a more-general message, and the
subclasses can handle more-specific messages. To enable this kind of message
handling, the derived class has to invoke the inherited Handle operation when
it receives a message that it does not handle, as in Listing Seven.
The Recipient's message dispatch code is essentially boilerplate code. Where
possible, the burden of having to create this boilerplate should be removed
from the client. One way is to have a code generator (often called a "wizard")
that generates the code based on some specification.


Self-Dispatching Messages


One potential problem with implementing the Courier pattern is that all the
dispatching code is in the Recipient. An alternative solution is to make
Messages self-dispatching. 
When we perform the dispatching, we have different Recipients and different
Messages. The code that actually gets executed when a message is received
depends on the concrete classes (strictly speaking, the types) of both the
Recipient and Message objects. Some languages (CLOS, for example) support
operations ("multi-methods") that can be dispatched on more than one
parameter type. In such languages, message dispatching would be handled by the
language directly rather than implemented by the programmer. However,
languages such as C++ and Smalltalk dispatch operations on the type of only
one object--the object receiving the request. 
One technique to dispatch operations on multiple classes in such languages is
"double dispatch." We use a variation of this technique in Listing Eight. (The
only implementation we're aware of that uses this technique is the Input
System of the Taligent Frameworks.) 
The Messages dispatch themselves. To do so, the base Message class defines a
Dispatch function that takes a Handler as a parameter. The Handler class
itself has the interface in Listing Eight. The Message base class implements
the Dispatch function by simply calling HandleMessage on the handler passed to
it; see Listing Nine. A Sender uses these classes as in Listing Ten. So far
this may not appear very useful, as we've only introduced an additional level
of dispatching. But the Sender need know nothing about the concrete classes of
the Message or the MessageHandler. 
Next, we define abstract classes that specify an interface for classes that
wish to handle particular kinds of messages. Let's take a TimerMessage class
defined as in Listing Eleven. For a TimerMessage, we define the
TimerMessageHandler class in Listing Twelve. These handler classes for
specific messages are used as mixin classes. A class that wants to handle a
TimerMessage has to inherit from the TimerMessageHandler class and implement
the pure virtual HandleTimerMessage function; see Listing Thirteen. Since the
Sender sees only Messages and MessageHandlers, you may wonder how a particular
Message finds its way to the specific MessageHandler. We first request a
message to dispatch itself to a particular handler, then the message requests
the handler to handle it. Consider the implementation of Dispatch for
TimerMessage. It uses run-time type information to check whether the handler
is a TimerMessageHandler. If it is, the message is delivered as a TimerMessage
by calling HandleTimerMessage; see Listing Fourteen. If it is not, Dispatch
invokes its inherited operation, which delivers the Message as an ordinary
message. The handler can then use the less efficient, but more general,
HandleMessage operation to handle the message.
The Sender's first dispatch shows that the message is in fact a TimerMessage.
The message then works out the type of the handler to call the appropriate
operation for it, in this case HandleTimerMessage.
This technique frees the client code from dispatching messages by delegating
the dispatching logic to the Message itself. However, it does have some
drawbacks:
Each different message kind needs a corresponding Handler class, so the number
of classes is increased by two for each new message kind.
A class that handles many different message kinds has to mix in a
correspondingly long list of Handler classes.
When a class wants to handle an additional message kind, its interface has to
be changed, since an additional handler class has to be mixed in.
Figure 1: Observer pattern decouples Subjects from their dependent Observers.
Figure 2: Using the Chain of Responsibility pattern for event handlers.
Figure 3: The Mediator pattern; (a) with no Mediator; (b) with Mediator.
Figure 4: Key participants in the Courier pattern.

Listing One
class Observer {
public: 
 //...
 virtual void Update();
};
class Subject {
public:
 void Attach(Observer*);
 void Detach(Observer*);
 void Notify();
private:
 List<Observer*> _observers;
};
void Subject::Notify ()
{
 ListIterator<Observer*> i(_observers);
 for (i.First(); !i.IsDone(); i.Next()) {
  i.CurrentItem()->Update();
 }
}

Listing Two
class Notification {
public:
 virtual ~Notification();
protected:
 Notification();
};

Listing Three
class Subject {
public:
 // ...
 void Notify(Notification&);
};
class Observer {
public:
 // ...
 virtual void Update(Notification&);
};

Listing Four
class TextChangeNotification: public Notification {
public:
 TextChangeNotification(TextRange range);
 TextRange GetRange() const;
 //...
private:
 TextRange _range;
};

Listing Five
void TextSubject::ReplaceText (TextRange range)
{
 // Change the Text...
 // ... and tell all Observers
 Notify(TextChangeNotification(range));
}

Listing Six
void TextObserver::Update(Notification& notification)
{
 TextChangeNotification *changeNotification;
 changeNotification = 
 dynamic_cast<TextChangeNotification*>(&notification);
 if (changeNotification) {
 TextRange range = changeNotification->GetRange();
 // do an incremental update using _text->GetText(range);
 } else {
 // do a full update using _text->GetText();
 }
}

Listing Seven
void Recipient::Handle(Message& m)
{
 MyMessage *myMessage = dynamic_cast<MyMessage*>(&m);
 if (myMessage) {
 HandleMyMessage(myMessage);
 return;
 }
 OtherMessage *otherMessage = dynamic_cast<OtherMessage*>(&m);
 if (otherMessage) {
 HandleOtherMessage(otherMessage);
 return;
 }
 // unknown message, pass it to the base class BaseRecipient
 BaseRecipient::Handle(m);
}

Listing Eight 
class MessageHandler {
public:
 virtual void HandleMessage(Message& message);
 //...
};

Listing Nine
class Message {
public:
 virtual void Dispatch(MessageHandler& handler);
 //...
};
void Message::Dispatch(MessageHandler& handler)
{
 handler.HandleMessage(*this);
}

Listing Ten
void Sender::Operation() {
 Message *aMessage;
 MessageHandler *aHandler;
 // .. dispatch the message...
 aMessage->Dispatch(*aHandler);
}

Listing Eleven
class TimerMessage: public Message {
public:
 TimerMessage(long);
 long GetTime() const;
 virtual void Dispatch(MessageHandler& handler);
 //...
};

Listing Twelve
class TimerMessageHandler : public MessageHandler {
public:
 virtual void HandleTimerMessage(TimerMessage& message) = 0;
};

Listing Thirteen
class MyHandler: 
 public SomeBaseClass, 
 public TimerMessageHandler {

public:
 virtual void HandleTimerMessage(TimerMessage& message);
 //...
};

Listing Fourteen
void TimerMessage::Dispatch(MessageHandler& handler)
{
 TimerMessageHandler* timerHandler;
 timerHandler = dynamic_cast<TimerMessageHandler*>(&handler);
 if (timerHandler)
 timerHandler->HandleTimerMessage(*this);
 else
 Message::Dispatch(handler);
}
End Listings















































SOFTWARE AND THE LAW


Determining Software Patent Infringement




Marc E. Brown


Marc is a patent attorney and shareholder of the intellectual-property law
firm of Poms, Smith, Lande & Rose in Los Angeles, CA. Marc specializes in
computer law and can be contacted at 73414.1226@compuserve.com.


Like it or not, software patents are here to stay. Courts are enforcing them,
the Patent Office is liberalizing rules for granting them, and copying usually
does not have to be proven.
Unfortunately, few nonlawyers have any idea how to determine whether a
software patent has been infringed. The patent owner often cries infringement
when the software in question contains what he regards as his invention. Those
accused of infringement, on the other hand, often feel protected if their
system lacks some of the features described in the patent.
Both of these approaches are wrong. This month, I'll explore how to determine
the question of infringement.


Gather the Needed Ingredients


The first thing you'll need is a copy of the patent. Patents can be ordered
from the Commissioner of Patents and Trademarks (Washington, DC 20231) for
$3.00. A faster source is MicroPatent at http://www.micropat.com, where you
can download an entire patent for 25 cents per page or order just its text by
return e-mail for $3.00.
For all but the most superficial analysis, you will also need a copy of the
"file wrapper" and "references cited." The file wrapper is the name the Patent
Office gives the file it maintains on each patent application. After someone
files for a patent, all of the communications between the Patent Office and
the inventor are stored in the file wrapper. A copy of this file can also be
ordered from the Commissioner of Patents and Trademarks for $150.00.
On the face of every patent is a list of references cited. After an
application for patent is filed, the case is assigned to an examiner who is an
expert in the relevant technology. The examiner conducts a search for the
closest technology previously known. The patent applicant may also submit
prior art. Collectively, these documents are known as the "references cited"
and are itemized on the first page of every patent. If the document is itself
a patent, a copy of it can be obtained by following the procedures discussed
earlier. If it is a published article or other item, a copy is usually
contained in the file wrapper. 


Know Where to Look


A patent usually has at least one illustration of the embodiments of the
invention, followed by text. The first section, "Background of the Invention,"
usually describes preexisting related technology and its problems. Next come
the "Summary of the Invention" and "Brief Description of the Drawings." The
last (and usually longest) section is typically captioned "Detailed
Description of the Preferred Embodiments." 
None of these sections delineates the scope of the technology embraced by the
patent. That scope (and what you need to study carefully to determine the
question of infringement) is exclusively set forth at the end of the patent in
one or more separately numbered paragraphs called "claims."
Each claim describes the boundaries of what the patent covers. A good analogy
is a real-estate deed. A deed contains phrases such as "40 feet north of the
oak tree at the northeast most portion of the property, 30 feet due east," and
so on. A patent claim is just like the deed on real property.


Apply the Basic Rule


A "direct" infringement of the patent occurs when each element recited in a
single patent claim is present in the accused party's product or process.
(Some claims recite a process; others a product.) Here are some guidelines:
Each numbered claim has a scope different from every other numbered claim. For
there to be a direct infringement, only the elements recited in a single claim
must be present.
A claim is not directly infringed if a single element in the claim is not
present in the accused party's product or process exactly as it is described
in the claim. Thus, if the claim calls for a "16-bit register," a system
utilizing an 8-bit register does not infringe. (The one exception is the
"doctrine of equivalents," which I'll discuss shortly.)
Extra elements in the product or process in question are irrelevant. Thus, the
product or process will infringe, even if it works better (or worse) than the
product or process described in the patent. If it has every element recited in
a single patent claim, it infringes. End of discussion.
Don't read limitations into the claim that aren't recited. The "Detailed
Description of the Preferred Embodiments" may describe a preferred embodiment
which contains a 16-bit register. But if the patent claim merely recites "a
register," an 8-bit register will infringe because it is "a register," albeit
not the one described in the patent. Reading unwritten limitations into a
claim, simply because they are described in the Detailed Description of the
Preferred Embodiments, is one of the most common mistakes.


Study the Broadest Claims


If only the terms of one claim must be satisfied, you might be asking yourself
why the patent contains multiple claims. The answer is simple: The validity of
a patent claim can be attacked during litigation by proof that it embraces a
system which was known in the prior art. Patent applications therefore usually
contain several claims, each of a different scope. Broad claims embrace the
greatest number of systems, but are the most vulnerable to a validity attack.
Having a spread of claims provides backup. 
It is usually best to analyze infringement of the broadest claim, that is, the
claim with the fewest recited elements.
In many patents, no single claim is broadest. For example, claim 1 might
include the language "a register containing the address of a local
subroutine," while claim 5 might require "a 16-bit register containing the
address of a subroutine." Claim 1 is broader than claim 5 because it embraces
a register with any number of bits. Claim 5 is broader than claim 1 because it
embraces a subroutine which is not local. In this common situation, several
claims must be analyzed before it can safely be concluded that there is no
infringement.
When determining infringement, "dependent" claims can safely be ignored. A
dependent claim refers to another claim; for example: "The system of Claim 1
further including a compression algorithm."
Dependent claims add additional elements and thus are always narrower in scope
than the claim to which they refer. "Independent" claims, which do not refer
to any other claims, are broadest, and should be studied.


Tips on Interpretation



The precise meaning of language in a patent claim is often unclear when it is
first read. It will usually become clearer after the entire patent is studied.
The documents in the file wrapper, particularly the remarks of the patent
applicant, also often clarify the patent claim. During the efforts to procure
a patent, the applicant often distinguishes the invention from the prior art
and, in the process, usually provides clarification of his patent claims.
Even after this study, the exact meaning of certain claim phrases may still be
uncertain. In this case, ask yourself what this phrase would have meant to a
person of ordinary skill in the art at the time the patent application was
filed. If the claim recites a "software module," would this have embraced
firmware embedded in ROM? (Don't ask me for the answer; I'm just a patent
lawyer!)
Consider also whether other language in the patent or its prosecution history
explicitly or implicitly defines the claim phrase in a manner contrary to its
ordinary meaning. Patent-claim language is given its ordinary meaning in the
art, unless a contrary meaning was clearly expressed.
Although language in the claim is interpreted in light of the entire patent
and its prosecution history, I repeat that a patent claim is not restricted to
the illustrative embodiments of the invention discussed in the patent, unless
the patent claim itself recites the features of those illustrative
embodiments. Thus, when the patent claim says "register," it is not restricted
to a 16-bit register, even though a 16-bit register is the only type of
register described.
One type of patent-claim element does not follow this rule: an element written
in "means-plus-function" format. These elements recite a "means" for
performing a specified function. An example is "memory means for storing a
pointer." Under the general rule just outlined, this element would be
interpreted to encompass any type of memory device that stored a pointer. But
the rule for claim elements written in means-plus-function format is
different. These claim elements embrace only the specific means described in
the patent for performing the specified function and its equivalents.
If the only type of memory described in the patent were RAM, the claim
language "memory means for storing a pointer" would be restricted to RAM and
its equivalents.
Determining equivalency for language in means-plus-function format is a mushy
business. Generally, the courts look to see if the element in the accused
party's software is regarded by persons of ordinary skill in the art as
interchangeable with the elements described in the patent. Although some
latitude is afforded, it is usually less than all of the means that could
perform the recited function.


Going Beyond Claims: The Doctrine of Equivalents


There is often an element in the software of the accused party that is similar
to the element in the patent claim, but not accurately described by it. If the
difference is "insubstantial," an infringement will nevertheless be found
under the "doctrine of equivalents." 
Suppose the patent claim requires "RAM to store a pointer," but the software
in question utilizes a shift register to store a pointer. Plainly, a shift
register is not RAM. But in many applications, the difference might be
insubstantial. The "doctrine of equivalents" was created to ensure that such
an insubstantial difference does not avoid infringement. If the accused
party's software contains an equivalent of the missing claim element, it can
nevertheless be an infringement.
The classic test for determining equivalency is whether the element in the
software in question performs "substantially the same function, in
substantially the same way, to achieve substantially the same result" as the
element recited in the patent claim.


File-Wrapper Estoppel


During an inventor's efforts to obtain a patent, he often makes clear that his
invention does not embrace certain types of systems. When he does, the
doctrine of "file-wrapper estoppel" bars him from later attempting to
interpret his patent claim to embrace these systems under the doctrine of
equivalents. He is said to be "estopped" (legal jargon for "stopped") from
interpreting his claim to embrace these systems.
Consider the following example. The inventor applied for a patent and recited
"memory for storing a pointer" as one of his claim elements. This claim was
rejected by the examiner because of a prior patent, which disclosed a shift
register that stored a pointer. In response to this rejection, the inventor
amended his patent claim, replacing "memory for storing a pointer" with "RAM
for storing a pointer." Along with this amendment, he argued that his
invention is RAM for storing the pointer and that RAM is very different from
the shift register shown in the prior art. His claim was then allowed.
Now, the inventor makes an accusation of infringement against software that
has every element of the patent claim, but uses a shift register instead of
the claimed RAM. Had the foregoing amendment and remarks not been made, the
inventor might credibly argue that the accused party's shift register
constitutes an insubstantial variation of the claimed RAM and, accordingly,
that the accused party's software infringes under the doctrine of equivalents.
But because the inventor told the Patent Office that his invention did not
embrace a system using a shift register, he is now barred under the doctrine
of file-wrapper estoppel from claiming otherwise. 


Contributory Infringement


Most software-patent claims recite elements in addition to software, such as
memory, a display, or a microprocessor. In the computer industry, however, it
is common for a company to sell only one of these components.
Can a person be held liable for infringement if he has manufactured or sold
only one component of the infringing system, such as a piece of software? The
answer is yes--as a "contributory infringer." But three legal requirements
must first be met.
First, the component must be recited in the patent claim.
Second, the component must have no substantial use other than in the combination of
elements recited in the claim. A patent claim directed to a data-compression
system might recite RAM as one component. But RAM has many uses other than in
a compression system. Thus, the sale of RAM could not be a contributory
infringement. On the other hand, the manufacturer of a ROM chip containing the
compression algorithm could be liable as a contributory infringer. Such a ROM
chip probably has no substantial use, other than in the combination recited in
the patent claim.
Third, the manufacturer or seller must be aware of the patent and of the possibility
that the system in which his component is installed might infringe. Normally,
one who manufactures, sells, offers for sale, or uses a system which meets all
of the requirements of a single patent claim is liable as a "direct"
infringer, whether he knows about the patent or not. But for there to be a
contributory infringement, the opposite is true. Knowledge of the patent is
required.


Inducement of Infringement


One who induces another to infringe a patent can also be held liable as an
"inducer" of infringement.
Like contributory infringement, inducement requires knowledge of the patent.
It does not, however, require the person to have manufactured or sold any
component, let alone one with no substantial use other than in the combination
of elements recited in the patent claim.
An allegation of inducement of infringement is often brought against officers
or directors of companies alleged to have manufactured or sold a system which
directly infringes a patent claim. The individuals did not manufacture or sell
the product themselves, but they induced their company to do so, sometimes
with knowledge of the patent. Along with the company, these individuals are
often sued to ensure that there will be an adequate source to pay any
judgment, as well as to increase the pressure to settle.
Table 1 summarizes the requirements for the three types of patent
infringement.


Other Considerations


Whether or not the accused party's software meets the requirements of a patent
claim is not the only consideration. Although the Patent Office has decided
that the invention warrants a patent, that decision can be reversed in a
lawsuit for patent infringement. The validity of a patent is most easily
assailed when the infringer locates prior art which is closer to the patented
invention than the prior art of which the Patent Office was aware. (This art
is listed on the face of the patent under References Cited.)
Another popular ground for attack is when the inventor failed to disclose in
the patent the "best mode" of which he was aware for implementing his
invention. Substantial delay in asserting a claim for patent infringement can
also cause a loss of rights. For a more detailed discussion of these other
considerations, refer to my column "Patents: Best Protection for Software
Today?" (Dr. Dobb's Sourcebook, September/October 1995).


Conclusion


The steps for determining infringement have now been set forth. The real
difficulty often lies in predicting how a court will interpret the meaning of
certain patent-claim phrases, whether a court will find equivalency in missing
but similar elements, and whether a court will limit equivalency by the
doctrine of file-wrapper estoppel.
Patent owners should move quickly to enforce their patent, as rights can be
lost by delay. But they should also move carefully. Companies accused of
infringement can sue the patent owner to have the patent declared invalid. A
reckless charge of infringement, therefore, can lead to loss of the patent.
Companies charged with infringement should almost always seek a written
opinion from a patent attorney before proceeding to manufacture or sell the
product after being notified of the patent. Otherwise, they are likely to be
found to be a "willful" infringer if they are sued for infringement and lose.
Once a defendant is found to be a "willful" infringer, the court can treble the
damages and assess attorneys' fees.
Table 1: Infringement requirements.

                          Direct        Contributory  Inducement
                          Infringement  Infringement  Infringement
 All claim elements
   needed?                Yes           No            No
 Knowledge of patent
   required?              No            Yes           Yes
 Non-staple claim
   component needed?      No            Yes           No
 Inducement required?     No            No            Yes




EDITORIAL


Twists of Fate


Faced with a mature market and leveling sales, many companies are looking to
port their applications to new platforms. The problem is that these companies
tend to underestimate the porting effort. According to some estimates, porting
features without the aid of cross-platform tools could consume as much as 85
percent of the overall project effort.
Of course, several companies offer cross-platform solutions. Mainsoft, for
example, licenses the source code to Windows from Microsoft and offers a
cross-platform API called "MainWin" that's based on the Windows API. MainWin
lets you recompile and run Windows code on UNIX platforms by placing a layer
between the application and Xlib. While applications can run native, the
licensing arrangement with Microsoft forces Mainsoft to charge hefty prices
for the development environment, with additional costs for deployment.
Now there is an initiative within the European Computer Manufacturers
Association (ECMA) that offers a less-expensive alternative. In December, the
ECMA Technical Committee (TC) 37 voted on and passed a specification, called
"APIW," that provides a common interface to the Windows API that is both
platform independent and vendor neutral. The next step for the initiative is
to submit the specification to ISO.
The APIW is confined to a select subset of the Windows API that includes the
most commonly used interfaces, maintaining portability across platforms. Because
APIW is a specification, it defines how features within the API should
function. In essence, it defines the interface. Thus, an implementor can
develop a layer that conforms to the interface, but does not require the
source code to Windows. Some of the goals of the APIW are to fully document
those Windows features that are included in the specification, to protect the
investment in current software development, and to foster competition through
multiple APIW implementations that offer a choice of language bindings. The
bottom-line benefits are that developers get a universal API without built-in
royalties or licensing fees.
But in a strange twist of fate, Microsoft is moving to block the proposed
standard that would make the Windows API ubiquitous among software developers.
Prior to the ECMA vote, Microsoft reportedly lobbied hard against the
initiative, making a presentation to the TC 37 General Assembly on December
14. According to Hugh Lunardelli, Microsoft's manager of Open Systems and
Standards, there is no market value or customer demand for APIW. Lunardelli
cited a written statement from senior corporate attorney Daniel Laster that
"Microsoft objects to the publication of the ECMA APIW standard and expressly
reserves its intellectual property rights."
In the past, Microsoft has said that Windows applications will be able to
migrate with Windows NT to other platforms, such as PowerPC. However,
Microsoft has also instituted a licensing program called "Windows Interface
Source Environment" (WISE). In this arrangement, a company can license the
Windows source code to create either an API- or binary-compatible interface
for either UNIX or the Macintosh. (Hmmm... what happened to OS/2?) Strangely,
just four companies have made licensing arrangements with Microsoft. Mainsoft
and Bristol Technology offer Windows-compatible SDKs (called "WISE SDKs"),
while Insignia Solutions and Locus Computing provide "WISE Emulators" that
will run off-the-shelf Windows software directly on the target platform.
It's understandable that Microsoft would want to maintain control over its
API. Standards bodies are cumbersome and could slow innovation. But the
underlying issue is that APIW could potentially unseat Windows as the dominant
operating system. In the near term, APIW could detract from WISE, a program
apparently designed to claim royalties on every Windows application running on
a non-Wintel box. Either way, ECMA is proceeding with its submission to ISO in
the near future. The question now is whether Microsoft can enforce its
intellectual-property rights over an interface, a point on which ECMA has
queried Microsoft over a 16-month period. No doubt, the lawyers will be arguing
this point for years to come.
Michael Floyd
executive editor




Customizing Delphi Applications


The DragCtrl unit does the work for you




Al Williams


Al is a consultant specializing in software development, training, and
documentation. You can reach him at
http://ourworld.compuserve.com/homepages/Al_Williams.


The look of the Windows 95 shell is certainly customizable. You can relocate
the task bar, change colors, even relabel system icons. All this has raised
user expectations. Are you willing to do the work to make your Delphi
applications customizable? Surprisingly, many customizations are
straightforward to program in Delphi.
In this article, I'll present a Delphi program, WININFO, that displays a
simple data form containing fields (edit controls); see Figure 1. WININFO
allows you to do the following:
Move fields to new locations.
Select which fields to display (see Figure 2).
Change field labels.
Adjust the program's colors.
Save the customized layout and colors in an .INI file.
Although this sounds like a lot of work, Delphi provides tools that make this
simpler than you'd expect. For example, after fine tuning, drag and drop can
do most of the field moving. You'll see how WININFO changes the drag-and-drop
behavior, and you'll also learn how to set event handlers at run time, write
enumerated types to an .INI file, and do a few other assorted tricks. To
follow along, you'll need either a 16- or 32-bit version of Delphi running
under any version of Windows.


About Drag and Drop


Delphi components are drag-and-drop aware--sort of. Each component has a
BeginDrag and an EndDrag method. When you call BeginDrag with a True argument,
the cursor changes (to the DragCursor property's value) and the component
begins special processing of mouse events. During dragging, the Dragging
method returns True. As the mouse moves over other ("target") components,
Delphi calls the target's OnDragOver event handler. The component can specify
what action, if any, it will allow with the dragging component.
When you call EndDrag, Delphi calls the target component's OnDragDrop event
handler and the original component's OnEndDrag event. The target component
does all the required work to accept the dropped object. The OnDragDrop
handler receives the dragged object and the drop coordinates. What happens
then is up to you.
If you call BeginDrag with a False argument, everything works as just
described, except dragging does not commence until the mouse moves slightly
(five pixels). In both cases, I've assumed the DragMode property is set to
dmManual. If, however, you set it to dmAutomatic, then Delphi assumes that any
mouse drag should cause a BeginDrag, and it will call EndDrag when you release
the mouse button. This frees you from writing drag code for every component.
However, it makes it impossible to use the mouse for normal operations in, for
example, a TEdit control.
The normal Delphi drag-and-drop mechanism has a few other limitations. The
Delphi designers obviously didn't intend it to work as a positioning method.
There is no notion of a drag "hot spot"--the point at which the mouse cursor is
located inside the dragging component. Delphi drag and drop works best for
operations like dragging a file onto a printer icon or a directory. Suppose
you wrote the code in Example 1, hoping it would reposition dropped controls.
This works, but the X and Y parameters refer to the mouse cursor's position at
the drop. However, the code uses this to set the top-left corner of the
control. Users may be surprised when they begin dragging in the center of the
control, and the control repositions relative to the top-left corner when they
let go of the mouse button.
You could manually store the cursor's offset and use it during repositioning,
but this would require changes to the standard components and cooperation
between the drop target and the dragging component that isn't currently in
place.


Solving the Problems


So far, we have several problems with off-the-rack drag and drop:
There is no cursor hot spot.
Setting DragMode to dmAutomatic prevents normal mouse operations.
Manual dragging requires a lot of mindless code.
The DragMode problem is the easiest to fix. At first, WININFO sets all
DragMode properties to dmManual. When the user selects Move from the Customize
menu, WININFO walks the main form's component list and sets all TEdit and
TLabel components so that their DragMode property is dmAutomatic. It also
checks the menu item to indicate that Move mode is active. This code is in
TForm1.Move1Click (see Listing One).
Working with all components at once has several advantages. First, all the
controls behave the same way. Second, when you add components, they
automatically work properly. The only problem is that the code assumes you are
working with TEdit and TLabel controls because there is no single ancestor
class that has a visible DragMode property. TEdit, for example, derives from
TCustomEdit, which hides all of its properties. This leads to ugly type
identifications and casts, as in Example 2.
Once users are allowed to move objects around, they have a problem keeping
track of the items on the screen when the fields are separated from their
labels. To help combat this, each field and label is described in the Hint
property. It doesn't ordinarily appear, since the ShowHint property is set to
False. However, when the main program sets the drag mode to automatic, it also
sets ShowHint to True. Now, when the cursor floats over a field or label, the
description appears in a balloon. WININFO also uses the description when
selecting which fields are visible--you'll see that a little later.
The only unresolved issue is the cursor offset. Any solution that tracks the
cursor offset requires you to create new edit-control and label components and
use them instead of TEdit and TLabel. Instead of going to that trouble, I used
a slightly different user interface. When the user first drags a field, the
cursor jumps to the top-left corner of the field automatically. Then the user
intuitively knows that the top corner is the drag hot spot. This works well
and is easy to implement.
The trick is to install an OnDragOver handler in each field. This handler
recognizes when the control is dragged over itself (Sender=Source). The first
time the control enters itself, the program converts the top-left-corner
coordinates to screen coordinates and sets the mouse cursor (see CtrlDragOver
in Listing Two). Of course, if you drag over the control again, the cursor
position will reset, but this isn't a problem in practice.
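As a sketch of this idea (using the procedure signature described in the text;
the actual code in Listing Two may differ in detail), the handler might look
like this:

procedure CtrlDragOver(Sender, Source: TObject; X, Y: Integer;
  State: TDragState; var Accept: Boolean);
var
  ctl : TControl;
  pt : TPoint;
begin
  Accept := False;  { fields never accept a drop; drops land on the form }
  if (Sender = Source) and (State = dsDragEnter) then
  begin
    ctl := Sender as TControl;
    { convert the control's top-left corner to screen coordinates... }
    pt := ctl.Parent.ClientToScreen(Point(ctl.Left, ctl.Top));
    SetCursorPos(pt.X, pt.Y);  { ...and jump the cursor to the hot spot }
  end;
end;

The dsDragEnter test ensures the cursor jumps only on first entry, so it does
not fight the user while the control is being dragged over itself.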
To avoid installing any handlers in the fields themselves, I wrote a small
routine to automatically set the OnDragOver property for all TLabel and TEdit
controls. Don't forget, what appear to be events in the Component Browser are
actually just special properties. You can set and read them like any other
property.
Fields cannot be dropped onto themselves--as you drag over a field, the cursor
turns into a "No" sign. This is not a problem unless you want to move a field
slightly. Then, you need to move the control away from the area, drop it, and
drag it back to the new position. This wouldn't be difficult to fix, but you'd
need special processing in each field's OnDragOver and OnDragDrop events.


Putting It Together


Since many of the drag-and-drop routines belong in the form, it is tempting to
write them there. However, this doesn't lend itself to code reuse. Therefore,
I wrote a stand-alone unit (see DRAGCTRL.PAS, in Listing Two) containing
several procedures (see Table 1). To use these in your program, just add
DragCtrl to your uses clause and delegate certain key procedures to those in
the DragCtrl unit. The FormDragOver and FormDragDrop routines should handle
the corresponding events for forms or other window parents (panels, for
example). The CtrlDragOver procedure handles the drag-over logic for controls.
The example program uses a panel to illustrate the use of these routines with
panels and forms.
Since user-interface preferences vary, I made no attempt to manage the user
interface in DragCtrl. Instead, the main form is responsible for setting the
drag mode (or manually initiating the drag sequence), setting the hints on or
off, and handling all other user-interface issues. This allows maximum
flexibility in the look and feel of your program. To display the hints while
moving, modify the control's ShowHint property. To exclude certain controls,
don't set their event properties or enable their autodragging. Using
SetCallbacks will alter every TEdit and TLabel on the form, but you also can
manage the callbacks at design time or handle them at run time with your own
code. 
As the DragCtrl unit stands, SetCallbacks only handles TEdit and TLabel
components (or components derived from these objects). To handle other
components, simply add the appropriate tests and handle them separately. This
isn't very clean, but there is no common ancestor class that exposes the
OnDragOver property, so you have little choice. The other routines only rely
on properties visible for all TControls, so you don't need to worry there.
Much of the code in the MainWin unit (Listing One) also needs to distinguish
between TEdits and TLabels, so expect to make similar changes in your main
form code if you are working with other component types.
The DragCtrl unit handles only control drag and drop. All the other
customization features are much simpler to implement and are built into the
form.



Changing Edit Labels


Changing edit-control labels is simple compared to the drag-and-drop feature.
The Customize menu has a selection for changing labels. This mode is mutually
exclusive with the Move command; only one is active at a time. The chglbl flag
is True when this mode is active. Each label has its OnDblClick handler set to
TForm1.LabelDblClick. The program could automatically set the handler in the
same fashion as the OnDragOver property. However, in this case, I decided to
set this property at design time.
When the LabelDblClick routine executes, it examines the chglbl flag. If this
is False, the routine exits. If the flag is True, the code executes the
LblEditor dialog (for more on dialogs, see the accompanying text box entitled
"Delphi Dialogs"). If the dialog box's modal result is mbOK, the program
assigns the user's string to the label's Caption property. Only one routine
handles all labels, since the Sender parameter allows the code to work with
the correct component.


Chameleon Application


Colors are certainly an area where personal preferences differ wildly.
Luckily, this is the easiest area to customize because of the built-in
TColorDialog component. Example 3 changes the form's background color. The
first line of code sets the current color as the default choice in the
color-selection dialog. (This may not work if the color isn't standard.) Then,
if Execute returns True, the program simply assigns the dialog box's color
property to the form's color property. That's it!
Changing the field color is similar, but you have to walk the component list
again and alter the color property for TEdit controls. This code is in
TForm1.SetFieldColor. This is a separate procedure, because both the form's
menu and the load procedure call it. 


Customizing the View


The most visually interesting customization allows users to pick which fields
and labels are visible (see Figure 2). This dual-listbox dialog looks complex,
but it is easy to create with Delphi's form template. Simply choose New Form
from Delphi's file menu and select Dual List Box from the Template tab. The
result is in the DispList unit (available electronically; see "Availability,"
page 3). Delphi generated all the code in that unit.
The dual-listbox dialog makes customizable field selection simple. You can
find the code in TForm1.Show1Click (see
Listing One). The dialog (DisplayList) contains two listboxes: SrcList and
DstList. The code first clears them just to make certain they are empty. Then,
the program walks the component list (again). For each TControl object it
finds, it checks the Visible property. If Visible is True, the program adds
the control's hint string to the DstList listbox. It also places the TControl
object (which is actually a pointer) in the listbox's Objects array. If
Visible is False, the string and the object go to the SrcList box.
Once the listboxes are ready, a simple call to SetButtons and ShowModal gets
things started. The SetButtons call causes the dialog box to enable and
disable the correct buttons when it starts. The ShowModal call, of course,
starts the dialog box.
If the dialog box's ModalResult property is mrOK, WININFO reverses the
process. It scans the SrcList and DstList listboxes. Each item in the SrcList
gets its Visible property set to False. The DstList entries have Visible set
to True. This is simple, since the listboxes contain the TControl object in
their Objects array. The hint text is merely a convenient way for the user to
select items. The TControl object does all the work.


Saving Customizations


These customizations wouldn't be very useful if you couldn't save them.
Delphi's TIniFile class simplifies using .INI files. Since .INI files are a
common place to store configuration information, I decided to save the
application's state using TIniFile.
The important procedures are Load and Save (both in Listing One). The .INI
file uses three sections: COLORS stores the form and field background colors;
TEDIT stores information about each edit control; and TLABEL stores data
pertaining to labels. The program calls Load during the main form's Create
procedure and Save during the Close processing.
Storing the colors isn't difficult. The only problem is that .INI files want
to store strings and integers, not enumerated types. In particular, TColor is
the type WININFO needs to save. There is a ColorToRGB function that will
convert a TColor value to a number, but there isn't a readily available way to
reverse the process. Once again, WININFO must resort to type casting to get
the job done. Although you could cast the color to a LongInt type, this isn't
a good idea. A TColor type can have a value less than 0. In this case, it
isn't an RGB value, it is a system-color index made negative. ColorToRGB
handles this translation. When you read the RGB value back in, you can simply
cast it to a TColor since you never store system-color indexes in the .INI
file; see Example 4.
Filling in the TEDIT and TLABEL sections takes a bit more work. Each entry in
these sections uses the component name as a key. The string that follows has
the x- and y-coordinates (separated by commas), and a flag to indicate if the
component is visible (T is visible, F is not visible). Labels also have
another comma in the string, followed by their caption.
To generate this string, Save has to walk the component list, build the string
(using Format), and write it to the .INI file. Load reverses the process by
reading the string, parsing it by brute force using Pos and Copy, and filling
in the component's properties. Load only reads strings for components on the
form--it ignores any spurious entries in the .INI file.
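As an illustration of the brute-force parse, a routine to decode one saved
entry (say, "120,45,T") might look like the following. ParseEntry is a
hypothetical helper written for this sketch, not a routine from WININFO:

procedure ParseEntry(const s: String; var X, Y: Integer;
  var Vis: Boolean);
var
  t : String;
  p : Integer;
begin
  t := s;
  p := Pos(',', t);
  X := StrToInt(Copy(t, 1, p - 1));   { x-coordinate }
  t := Copy(t, p + 1, Length(t));
  p := Pos(',', t);
  Y := StrToInt(Copy(t, 1, p - 1));   { y-coordinate }
  t := Copy(t, p + 1, Length(t));
  Vis := (t <> '') and (t[1] = 'T');  { T = visible, F = hidden }
end;

A label's entry would need one more Pos/Copy pass to split off the caption
after the visibility flag.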
For Windows 95, you might consider writing this data to the registry. That
wouldn't be any more difficult; in fact, since the registry can store binary
data, it might even be easier. However, even under Windows 95 or Windows NT,
the .INI file method will work.


Summary 


All the files you need to build WININFO are available electronically. For a
16-bit version of Delphi, use WININF16.DPR as the project; for 32-bit
versions, use WININFO.DPR. The source code is exactly the same, but you need
different project files.
Using these techniques, you shouldn't have any problems writing your own
customizable applications. You can use the DragCtrl unit as-is to do most of
the work. It is not difficult to work with other component types (TMemo, for
example) or exclude certain fields.
Delphi Dialogs and Windows 95
If your experience is in more traditional Windows programming, you may find
Delphi dialogs confusing: Where's the callback routine? Where's the resource
template? In Delphi, a dialog is really just a special case of an ordinary
form. You create a new form by selecting New Form from the File menu and
filling it in with components as you would with any other form. The New Form
dialog has several choices you can use for your initial layout (like the
dual-listbox layout WININFO uses).
When your program starts, the dialog box does not appear. To show the form as
a modeless dialog, call the Show method. For a modal dialog, call ShowModal,
after which the form will run and will not allow the user to interact with
other forms in your program. To close the dialog, set the ModalResult property
to a nonzero value. Buttons also have a ModalResult property; when you press a
button, it automatically sets its parent's ModalResult property to the value
you set. Your program can examine the dialog's ModalResult property to see
what action dismissed the dialog.
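A minimal modal-dialog flow, assuming a form named SettingsDlg whose OK button
has its ModalResult set to mrOK at design time (both names are hypothetical),
looks like this:

procedure TForm1.Options1Click(Sender: TObject);
begin
  SettingsDlg.ShowModal;  { blocks until ModalResult becomes nonzero }
  if SettingsDlg.ModalResult = mrOK then
    ApplySettings;  { hypothetical routine that reads the dialog's fields }
end;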
--A.W.
Figure 1: WININFO.
Figure 2: Customizing WININFO.
Table 1: DragCtrl procedures.
Procedure Description
FormDragOver Handles OnDragOver event for form, panel, and so on.
FormDragDrop Handles OnDragDrop event for form, panel, and so on.
CtrlDragOver Handles OnDragOver event for controls.
SetCallbacks Sets OnDragOver event for all controls in a form.
Example 1: Drag-and-drop code.
procedure TForm1.FormDragDrop(Sender, Source: TObject; X, Y: Integer);
var
  ctl : TControl;
begin
  ctl := Source as TControl;  { type cast }
  if (X < 0) or (Y < 0) then exit;  { don't drop in menu bar, etc. }
  ctl.Left := X;
  ctl.Top := Y;
end;
Example 2: Ugly type identifications and casts.
procedure TForm1.SetMoving(flag : Boolean);
var
  eobj : TEdit;
  lobj : TLabel;
  i : Integer;
begin
  for i := 0 to ComponentCount-1 do
  begin
    eobj := nil;
    lobj := nil;
    if (Components[i] is TEdit) or
       (Components[i] is TLabel) then
    begin
      if Components[i] is TEdit then
        eobj := Components[i] as TEdit
      else
        lobj := Components[i] as TLabel;
      . . .
Example 3: Changing a form's background color.
procedure TForm1.BackgroundColor1Click(Sender: TObject);
begin
  ColorDialog.Color := Color;
  if ColorDialog.Execute then
    Color := ColorDialog.Color;
end;
Example 4: Casting an RGB value to a TColor.
inifile.WriteInteger('Colors','Form',ColorToRGB(Color));
inifile.WriteInteger('Colors','Field',ColorToRGB(fieldColor));
and:
tmpint:=inifile.ReadInteger('Colors','Form',-1);
if tmpint <> -1 then
 Color:=TColor(tmpint);
tmpint:=inifile.ReadInteger('Colors','Field',-1);
if tmpint <> -1 then
 SetFieldColor(TColor(tmpint));

Listing One
{ WININFO Main Unit }
unit Mainwin;
interface
uses
 SysUtils, WinTypes, WinProcs, Messages, Classes, Graphics, Controls,
 Forms, Dialogs, Menus, StdCtrls, IniFiles,
 DragCtrl,EditDlg,DispList,About, ExtCtrls;
type
 TForm1 = class(TForm)
 CXScreen: TEdit;
 MainMenu1: TMainMenu;
 File1: TMenuItem;
 Customize1: TMenuItem;
 Move1: TMenuItem;
 Show1: TMenuItem;
 BackgroundColor1: TMenuItem;
 About1: TMenuItem;
 N1: TMenuItem;
 Exit1: TMenuItem;
 CYScreen: TEdit;
 ChangeLabels: TMenuItem;
 N2: TMenuItem;
 ColorDialog: TColorDialog;
 SMMouse: TEdit;
 FieldColor1: TMenuItem;
 Panel1: TPanel;
 Label1: TLabel;
 Label2: TLabel;
 Label3: TLabel;
 Label4: TLabel;
 procedure Exit1Click(Sender: TObject);
 procedure FormCreate(Sender: TObject);
 procedure Move1Click(Sender: TObject);
 procedure FormDragOver(Sender, Source: TObject; X, Y: Integer;
 State: TDragState; var Accept: Boolean);
 procedure FormDragDrop(Sender, Source: TObject; X, Y: Integer);
 procedure CtrlDragOver(Sender, Source: TObject; X, Y: Integer;
 State: TDragState; var Accept: Boolean);
 procedure LabelDblClick(Sender: TObject);
 procedure ChangeLabelsClick(Sender: TObject);
 procedure BackgroundColor1Click(Sender: TObject);
 procedure Show1Click(Sender: TObject);
 procedure FieldColor1Click(Sender: TObject);
 procedure About1Click(Sender: TObject);
 procedure FormClose(Sender: TObject; var Action: TCloseAction);
 private
 { Private declarations }
 Moving, chglbl : Boolean; { Change label text? }
 inifile : TIniFile; { Our INI file }
 fieldColor : TColor; { Field color (-1=default)}
 procedure SetMoving(flag : Boolean); { Moving?}
 procedure Load;
 procedure Save;
 procedure SetFieldColor(aColor : TColor);
 public
 { Public declarations }
 end;
var
 Form1: TForm1;
implementation
{$R *.DFM}
procedure TForm1.Exit1Click(Sender: TObject);
begin
Close;
end;
procedure TForm1.FormCreate(Sender: TObject);
var
s : String;
begin
{ Reset modes and set defaults }
 Moving:=False;
 chglbl:=False;
 fieldColor:=-1;
{ Set INI file }
 inifile:=TIniFile.Create('WININFO.INI');
{ Set up info fields }
 Str(GetSystemMetrics(SM_CXSCREEN),s);
 CXScreen.Text:=s;
 Str(GetSystemMetrics(SM_CYSCREEN),s);
 CYScreen.Text:=s;
 if GetSystemMetrics(SM_MOUSEPRESENT)=0 then
 SMMouse.Text:='No'
 else
 SMMouse.Text:='Yes';
{ Set up callbacks }
 DragCtrl.SetCallbacks(self,CtrlDragOver);
 Load; { Load custom config }
end;
{ Helper to set moving status on }
procedure TForm1.SetMoving(flag : Boolean);
var
  eobj : TEdit;
  lobj : TLabel;
  i : Integer;
begin
  { Walk component list }
  for i := 0 to ComponentCount-1 do
  begin
    eobj := nil;
    lobj := nil;
    if (Components[i] is TEdit) or
       (Components[i] is TLabel) then
    begin
      if Components[i] is TEdit then
        eobj := Components[i] as TEdit
      else
        lobj := Components[i] as TLabel;
      { Set drag mode and hint flag }
      if not flag then
        if eobj <> nil then
        begin
          eobj.DragMode := dmManual;
          eobj.ShowHint := False;
        end
        else
        begin
          lobj.DragMode := dmManual;
          lobj.ShowHint := False;
        end
      else
        if eobj <> nil then
        begin
          eobj.DragMode := dmAutomatic;
          eobj.ShowHint := True;
        end
        else
        begin
          lobj.DragMode := dmAutomatic;
          lobj.ShowHint := True;
        end;
    end;
  end;
end;
{Move menu }
procedure TForm1.Move1Click(Sender: TObject);
var
menu : TMenuItem;
begin
menu := Sender as TMenuItem;
 if menu.Checked then
 menu.Checked:=False
 else
 menu.Checked:=True;
 SetMoving(menu.Checked);
 Moving:=menu.Checked;
 if (Moving) then {uncheck label change flag }
 begin
 chglbl:=False;
 MainMenu1.Items[0][0][1].Checked:=False;
 end;
end;
{ Pass DragOver to CtrlDrag }
procedure TForm1.FormDragOver(Sender, Source: TObject; X, Y: Integer;
 State: TDragState; var Accept: Boolean);
begin
 DragCtrl.FormDragOver(Sender,Source,X,Y,State,Accept);
end;
{ Pass DragDrop to CtrlDrag }
procedure TForm1.FormDragDrop(Sender, Source: TObject; X, Y: Integer);
begin
 DragCtrl.FormDragDrop(Sender,Source,X,Y);
end;
{ Pass DragOver to CtrlDrag }
procedure TForm1.CtrlDragOver(Sender, Source: TObject; X, Y: Integer;
 State: TDragState; var Accept: Boolean);
begin
 DragCtrl.CtrlDragOver(Sender,Source,X,Y,State,Accept);
end;
{Double click label to edit }
procedure TForm1.LabelDblClick(Sender: TObject);
var
 lbl : TLabel;
begin
 if not chglbl then exit; { ignore if wrong mode }
 lbl:=Sender as TLabel;
 LblEditor.text:=lbl.Caption;
 LblEditor.ShowModal;
 if LblEditor.ModalResult=mrOK then
 lbl.Caption:=LblEditor.text;
end;
{ Toggle label change mode }
procedure TForm1.ChangeLabelsClick(Sender: TObject);
begin
chglbl:=not MainMenu1.Items[0][0][1].Checked;
MainMenu1.Items[0][0][1].Checked:=chglbl;
{need to turn off moving }
Moving:=False;
SetMoving(False);
MainMenu1.Items[0][0][0].Checked:=False;
end;
{ Change form's background color }
procedure TForm1.BackgroundColor1Click(Sender: TObject);
begin
ColorDialog.Color:=Color;

if ColorDialog.Execute then
 Color:=ColorDialog.Color;
end;
{ Show which fields? }
procedure TForm1.Show1Click(Sender: TObject);
var
 i,n: integer;
 obj : TControl;
begin
 DisplayList.SrcList.Clear;
 DisplayList.DstList.Clear;
{ Put all hidden fields in SrcList and
 all visible fields in DstList }
 for i:=0 to ComponentCount-1 do
 begin
 if Components[i] is TControl then
 obj := Components[i] as TControl
 else
 continue;
 if obj.Visible then
 begin
 n:=DisplayList.DstList.Items.Add(obj.hint);
{ Store object too }
 DisplayList.DstList.Items.Objects[n]:=obj;
 end
 else
 begin
 n:=DisplayList.SrcList.Items.Add(obj.hint);
{ Store object too }
 DisplayList.SrcList.Items.Objects[n]:=obj;
 end;
 end;
{GO!}
DisplayList.SetButtons;
DisplayList.ShowModal;
if DisplayList.ModalResult=mrOk then
 begin
{ Show/hide based on which list control is in }
 for i:=0 to DisplayList.SrcList.Items.Count-1 do
 (DisplayList.SrcList.Items.Objects[i] as TControl).Visible:=False;
 for i:=0 to DisplayList.DstList.Items.Count-1 do
 (DisplayList.DstList.Items.Objects[i] as TControl).Visible:=True;
 end;
end;
{ Change field colors }
procedure TForm1.FieldColor1Click(Sender: TObject);
begin
ColorDialog.Color:=fieldColor;
if ColorDialog.Execute then
 SetFieldColor(ColorDialog.Color);
end;
{ Set field color }
procedure TForm1.SetFieldColor(aColor : TColor);
var
 i : Integer;
begin
 fieldColor:=aColor;
 for i:=0 to ComponentCount-1 do
 if Components[i] is TEdit then
 (Components[i] as TEdit).Color:=aColor;
end;
procedure TForm1.About1Click(Sender: TObject);
begin
AboutBox.ShowModal;
end;
procedure TForm1.FormClose(Sender: TObject; var Action: TCloseAction);
begin
Save; { Write custom config to INI file }
end;
procedure TForm1.Save;
var
 tfstring : String;
 i : Integer;
 eobj : TEdit;
 lobj : TLabel;
begin
{ Save colors }
 inifile.WriteInteger('Colors','Form',ColorToRGB(Color));
 if fieldColor <> -1 then
 inifile.WriteInteger('Colors','Field',ColorToRGB(fieldColor));
{ Save components }
 for i:=0 to ComponentCount-1 do
 begin
 if Components[i] is TEdit then
 begin
 eobj:=Components[i] as TEdit;
 if eobj.Visible then
 tfstring:='T'
 else
 tfstring:='F';
 inifile.WriteString('TEdit',eobj.Name,Format('%d,%d,%s',
 [eobj.Left,eobj.Top,tfstring]));
 end;
 if Components[i] is TLabel then
 begin
 lobj:=Components[i] as TLabel;
 if lobj.Visible then
 tfstring:='T'
 else
 tfstring:='F';
 inifile.WriteString('TLabel',lobj.Name,Format('%d,%d,%s,%s',
 [lobj.Left,lobj.Top,tfstring,lobj.Caption]));
 end;
 end;
end;
procedure TForm1.Load;
var
 tmpint : LongInt;
 str : String;
 i,n : Integer;
 eobj : TEdit;
 lobj : TLabel;
begin
{ Read colors }
 tmpint:=inifile.ReadInteger('Colors','Form',-1);
 if tmpint <> -1 then
 Color:=TColor(tmpint);
 tmpint:=inifile.ReadInteger('Colors','Field',-1);
 if tmpint <> -1 then
 SetFieldColor(TColor(tmpint));
{ Read config for each Edit control/label }
 for i:=0 to ComponentCount-1 do
 begin
 if Components[i] is TEdit then
 begin
 eobj:=Components[i] as TEdit;
 str:=inifile.ReadString('TEdit',eobj.Name,'');
 n:=pos(',',str);
 if n<>0 then
 eobj.Left:=StrToInt(Copy(str,1,n-1));
 str:=Copy(str,n+1,255);
 n:=pos(',',str);
 if n<>0 then
 eobj.Top:=StrToInt(Copy(str,1,n-1));
 n:=pos(',',str);
 if (n<>0) and (str[n+1]='F') then
 eobj.Visible:=False
 else
 eobj.Visible:=True;
 end;
 if Components[i] is TLabel then
 begin
 lobj:=Components[i] as TLabel;
 str:=inifile.ReadString('TLabel',lobj.Name,'');
 n:=pos(',',str);
 if n<>0 then
 lobj.Left:=StrToInt(Copy(str,1,n-1));
 str:=Copy(str,n+1,255);
 n:=pos(',',str);
 if n<>0 then
 lobj.Top:=StrToInt(Copy(str,1,n-1));
 n:=pos(',',str);
 if (n<>0) and (str[n+1]='F') then
 lobj.Visible:=False
 else
 lobj.Visible:=True;
 if n <> 0 then
 lobj.Caption:=Copy(str,n+3,255);
 end;
 end;
end;
end.

Listing Two
{ Drag and Drop Control Unit -- Williams }
unit Dragctrl;
interface
uses
 SysUtils, WinTypes, WinProcs, Messages, Classes, Graphics, Controls,
 Forms, Dialogs, Menus, StdCtrls;
{ Stand in for Form (or parent) OnDragOver }
procedure FormDragOver(Sender, Source: TObject; X, Y: Integer;
 State: TDragState; var Accept: Boolean);
{ Stand in for Parent's OnDragDrop }
procedure FormDragDrop(Sender, Source: TObject; X, Y: Integer);
{ Stand in for Control's OnDragOver }
procedure CtrlDragOver(Sender, Source: TObject; X, Y: Integer;
 State: TDragState; var Accept: Boolean);
{ Set Edit and label OnDragOver events }
procedure SetCallbacks(aForm : TForm; cb : TDragOverEvent);
implementation
procedure FormDragOver(Sender, Source: TObject; X, Y: Integer;
 State: TDragState; var Accept: Boolean);
begin
{ Don't accept over non-client area }
Accept:= (X>=0) and (Y>=0);
end;
procedure FormDragDrop(Sender, Source: TObject; X, Y: Integer);
var
ctl, own:TControl;
begin
{ don't drop in title bar, etc.}
if (X < 0) or (Y < 0) then exit;
ctl:=Source as TControl;
own:=Sender as TControl;
{ If not over parent, then reparent control }
 if ctl.Parent <> own then
 ctl.Parent:=own as TWinControl;
 ctl.Left:=X;
 ctl.Top:=Y;
end;
procedure CtrlDragOver(Sender, Source: TObject; X, Y: Integer;
 State: TDragState; var Accept: Boolean);
var
 obj : TControl;
 pt : TPoint;
 aForm : TWinControl;
begin
if (Sender = Source) and (State = dsDragEnter) then
 begin
{ Drag into self, so adjust hot spot }
 obj:=Source as TControl;
 aForm:=obj.Parent;
 pt.x:=obj.Left;
 pt.y:=obj.Top;
 pt:=aForm.ClientToScreen(pt);
 SetCursorPos(pt.x,pt.y);
 end;
{ Don't allow control to accept another control }
Accept:=False;
end;
procedure SetCallbacks(aForm : TForm; cb : TDragOverEvent);
var
i : integer;
eobj : TEdit;
lobj : TLabel;
begin
{ Walk component list }
for i:=0 to aForm.ComponentCount-1 do
 begin
 eobj:=nil;
 lobj:=nil;
{ Set eobj for edits and lobj for labels }
 if (aForm.Components[i] is TEdit) then
 eobj:=aForm.Components[i] as TEdit;
 if (aForm.Components[i] is TLabel) then
 lobj:=aForm.Components[i] as TLabel;
{ Set event as appropriate }
 if (eobj <> nil) then
 eobj.OnDragOver:=cb;
 if (lobj <> nil) then
 lobj.OnDragOver:=cb;
 end;
end;
end.
End Listings


Fast Interrupt Processing in Windows 95


Improving Windows' real-time performance




Victor Webber


Victor is an independent consultant in Silicon Valley. He is presently writing
a book on device-driver development for Windows NT and Windows 95. He can be
contacted at 73553.3533@compuserve.com.


When DOS was king, a vibrant market for real-time 80x86 systems existed. Many
small manufacturers produced single-board PCs that served both as an embedded
system and as a user interface. Even IBM made an "Industrial PC." But they
were all DOS-based machines, employing sophisticated ISRs hooked into the 8254
timer hardware interrupt (IRQ 0). This DOS environment allowed you to easily
create algorithms that fully exploited every real-time machine cycle of the
motherboard microprocessor. The proliferation of the mighty TSR hooked to IRQ
0 was probably due to the demands of this real-time market.
With the introduction of Windows, however, we are saddled with the inability
to do real-time work. For example, a 66-MHz 80486 running DOS can process
timer interrupts at a frequency of 20,000 Hz with a ten-microsecond ISR
latency. The same machine running Windows 95 using the standard virtual device
driver (VxD) calls for manipulating the timer interrupt might be able to
process timer interrupts at a frequency of 500 Hz with a 300-microsecond ISR
latency. Further, the latency gets worse for the Windows machine when the ISR
is in a DLL instead of a VxD.
In this article, I'll discuss why documented methods are inadequate for
increasing the accuracy and responsiveness of the real-time aspects of
Windows. I'll also point out how these artificial limitations can be overcome
and present a utility that modifies Windows 95 internals to reduce the number
of instructions required per interrupt from approximately 3200 to about 660.
Finally, I'll provide a new real-time utility for Windows 95 that narrows the
gap between the real-time power of the new machines and the real-time power
harnessed by Windows 95. This utility installs interrupt-service routines
into the kernel and processes interrupts at frequencies over 40,000
Hz with a ten-microsecond ISR latency on a 66-MHz 80486 machine. The essence
of this algorithm is a documented Windows function used in an undocumented
manner; used this way, it twists the internal architecture of Windows, making
it more real-time sensitive. 


Artificial Limitations


The current generation of PCs provides awesome real-time power, which,
unfortunately, is underutilized by Windows 95. Consider
VTD_Begin_Min_Int_Period(int PeriodLength), the Windows 95 routine that
increases the sensitivity of its scheduler. This call can easily be upgraded
to increase its efficiency by at least a factor of 10. The parameter (int
PeriodLength) decreases the timer interrupt (IRQ 0) period, thus increasing
the timer interrupt frequency. This increases the accuracy and responsiveness
of the scheduler, which is hooked into IRQ 0. To illustrate the inefficiency
of this call, I'll calculate the number of instructions executed during a
minimum interrupt period on an ancient 386 16-MHz machine. I'll then use this
instruction count as a benchmark, and calculate the possible minimum interrupt
period on the new, faster machines. The discrepancy between this actual limit
and the artificial Microsoft limit is surprising.
The calculation in Figure 1 uses dimensional analysis to conclude the number
of instructions executed in a single timer interrupt when the minimum
interrupt period is reduced to its Microsoft limit of one millisecond. (This
limit is derived from the minimum parameter passed to
VTD_Begin_Min_Int_Period().) Figure 1 shows that Microsoft allows you to
interrupt the machine every 3200 instructions. For comparison, the calculation
in Figure 2 shows the minimum interrupt period on a 150-MHz Pentium, assuming
at least 3200 instructions per interrupt period. 
The calculation shows that if a timer interrupt can be accommodated every 3200
instructions, then a 150-MHz Pentium should support a minimum interrupt
period of 0.045 milliseconds; the only reason it cannot is that Windows
imposes an unnecessary limit on the parameter passed to
VTD_Begin_Min_Int_Period(). This illustrates that Microsoft has not upgraded
even one of its most important real-time functions, even though all that is
required is a change in parameter limits.


Limitations of New Structure


Certainly, the real-time internals of Windows 95 can be improved. But, by how
much? Figure 3 calculates the minimum interrupt period that might be achieved
given an entirely new, more-efficient internal interrupt architecture. A
minimum interrupt period of 0.009 milliseconds represents a frequency of
greater than 100,000 Hz. This is too fast even to do I/O work on the slow ISA
bus. We need to be on a PCI or VESA bus to exploit this power.


Windows 95 Internals


Windows supplies several internal components, including the Virtual Machine
Manager (VMM), VxDs, BIOS, and Dynamic Link Libraries (DLLs). The utility I
present here uses the VMM, the programmable interrupt controller virtual
device (VPICD.386), and the timer virtual device (VTD.386). The VMM,
VPICD.386, and the VTD.386 are instantiated in WIN386.EXE. The VMM is the
first component loaded, then VPICD.386, and finally the VTD.386.
The VMM provides a large variety of services to applications and to the
device-specific components. These services fall into more than 15 categories
including memory-management services, virtual machine interrupt and callback
services, primary scheduler services, time-slice scheduler services, event
services, timer services, and processor fault services. Figure 4 shows the
hierarchical relationship between the VMM and other components.
The VPICD.386 virtualizes the interrupt controller and presents the
functionality to the virtual machines running the application software. The
virtualization process involves receiving a hardware interrupt indication and
knowing which virtual machine owns it. It also involves trapping requests for
interrupt controller services and arbitrating the actual state of the
interrupt controller with the state of the interrupt controller as seen by the
requesting application. For example, when the application sends an end of
interrupt (EOI), the VPICD.386 provides a response to the application
indicating it was written to the interrupt controller hardware, whereas, in
most cases, it is not actually written to the controller. 
The VPICD.386 initializes by setting up interrupt-service routines for every
interrupt request line (IRQ) on the master and slave programmable interrupt
controllers. These ISRs are either global or owned. When a global hardware
interrupt is generated, it is reflected into any virtual machine that is
currently running. When an owned hardware interrupt is generated, it is
reflected into the virtual machine that owns it; see Figure 4.
The VTD.386 virtualizes the hardware timer, presenting services to other
virtual devices. These services involve reporting information about the
internals of the VTD.386, such as providing the present minimum timer
interrupt period. They also involve changing the internals of the VTD.386. For
example, the VTD.386 offers a function to change the minimum timer interrupt
period, VTD_Begin_Min_Int_Period(). 


The Loading Process


The VMM is the first component loaded and it creates IDT entries for all the
hardware interrupts on the interrupt controllers. In Figure 5, area 1 shows
the architecture of the system after the VMM creates the initial IDT and adds
the vectors. The VPICD.386 loads next and calls into the VMM to hook all the
interrupts. The VPICD.386 uses the VMM's fault hooking services for this
operation. In a sense, it creates a secondary IDT for all the interrupts from
the interrupt controllers and creates vectors to default ISRs. Furthermore,
the VPICD.386 reads the programmable interrupt controller registers to
determine which are owned and which are global. Thus its own interrupt hooking
services know which interrupts other users can be allowed to hook. Area 2 of
Figure 5 shows the architecture of the system after the VPICD.386 is loaded
and all the hardware interrupts have been provided with default ISRs. 
The VTD.386 loads next, hooking the timer interrupt, IRQ 0, which is connected to
the 8254 timer. It uses the VPICD virtualization functions for this operation.
The VTD.386 obtains exclusive ownership of IRQ 0. In Windows 95, it reprograms
the timer to interrupt every 20 ms. (In Windows 3.1 it uses 50 ms, and in
Windows for Workgroups it uses 20 ms.) It also arbitrates all further requests
to change this minimum interrupt period; see Figure 5, area 3.
The process for handling an interrupt is predictable. The interrupt vectors
from the VMM IDT to the VMM ISR. The VMM ISR saves all the registers, does
some administrative work and then passes control to whoever has hooked the
interrupt with the fault hooking services of the VMM. This almost always is
the VPICD.386 code. The VPICD.386 handles all the administrative work of
virtualizing the interrupts. Its first action is to send out an EOI, then
issue a CLI. Next, it passes control to whoever has virtualized the interrupt
with the VPICD.386 services. In the case of IRQ 0, which has been hooked by
the VTD.386, control passes to the ISR within VTD.386. 


Interrupt Handling


The process for handling a single interrupt is simple, as long as you assume
the microprocessor is in a single monolithic mode. However, the 80386
generation of microprocessors has protected paging circuitry and runs in three
different hardware modes. All of these hardware modes must be handled by the
operating-system software-interrupt structure. Windows 95 creates three
corresponding software modes for this purpose: 
Protected mode (PM).
Virtual 86 mode (V86).
Virtual machine monitor mode (VMM).
Again, these operating-system software modes correspond to the actual hardware
modes of the microprocessor. The different modes of the microprocessor, along
with the different modes of the operating system, are entered by writing to
the microprocessor hardware. PM software mode is active when the
microprocessor is at privilege level 3 and in protected mode. V86 software
mode is active when the microprocessor is at privilege level 3 and in V86
mode. VMM software mode is active when the microprocessor is at privilege
level 0 and in protected mode. When standard Windows applications are
executing, PM mode is active. When kernel code is executing, VMM mode is
active. When a DOS box is executing, V86 mode is active. 
Arbitration between the modes is done in the VPICD.386. The VPICD.386 keeps a
table of vectors for each different mode. When the VPICD.386 loads and
initializes all the vectors, it uses the VMM fault hooking services to hook
each interrupt three times, once for each mode. It uses Hook_PM_Fault(),
Hook_V86_Fault(), and Hook_VMM_Fault() from the VMM fault hooking services. When an
interrupt occurs, it goes through the IDT, which never changes, and the
VPICD.386 directs the vector as a function of the mode of the microprocessor.
Generally the ISRs do similar work in the different modes. Figure 6 shows the
architecture of the system when all the interrupts are hooked in all the
different modes. 


Fast Interrupt Processing


With the technical background out of the way, I'll now turn to the algorithm
which provides for this fast interrupt processing. At this point, the
technique should be rather obvious -- you hook the interrupt service routine
mechanism closer to the hardware so it does not go through all the software
components every time it executes. You do not use the standard virtualization
functions of the VPICD.386 because the ISR would only get control after the
interrupt processing goes through the VMM and the VPICD.386 code. Instead, you
hook the hardware interrupt with an undocumented function, or more correctly,
you hook the hardware interrupt by using a documented function in an
undocumented manner. Quite simply, you hook the hardware interrupt with the
Hook_XX_Fault() routines. Even though the official Microsoft documentation
clearly states that this function cannot be used for hooking hardware
interrupts, it can. Officially, these routines are provided only for hooking
software interrupts.
The VxD in Listing One hooks the IRQ 0 timer interrupt, reprograms the timer
to interrupt at a higher frequency, steals all the interrupts which occur, and
only passes control to the original interrupt handler at the original
frequency. This hooking code is complicated because the interrupt-service
routine no longer uses the VPICD.386 to administer the specifics of handling
the programmable interrupt controller chip, such as sending out the EOI
command. Figure 7 shows the architecture of the system when the timer
interrupt (IRQ 0) is hooked. The VxD code is initialized and uninitialized
when standard VxD system messages are received. Figure 8 provides an
explanation of a few important procedures. 
Listing One demonstrates the viability of this algorithm. However, not all the
demands of the system are satisfied. Some additional features should be added
to make the utility more robust. Specifically, some VTD.386 functions are
unreliable because they are unaware that the timer has been reprogrammed and
these VTD.386 functions receive unexpected values when they read the timer
registers. This problem can be eliminated by overriding
VTD_Begin_Min_Int_Period(), VTD_End_Min_Int_Period(), VTD_Get_Real_Time(),
and VTD_Update_System_Clock(). The new functions must maintain an internal
database for the timer.


Other Enhancements 


Fast interrupts are hardly the only enhancement needed to make Windows 95 an
adequate real-time operating system. Another simple (yet important)
enhancement is ensuring that the system interrupts are not turned off for long
periods of time. This is such an important point that analyzing the entire
system for excessively long periods of cleared interrupts is warranted. One
extremely offensive module is the file system. It turns off system-level
interrupts for periods of several milliseconds, and a sensitive real-time
system can only tolerate periods of cleared interrupts for a few hundred
microseconds. In Windows 3.1, this is a difficult problem because the
offensive file system is hard to replace. In Windows 95, however, you have
installable file systems, and you can easily replace the offensive default
file system with a real-time file system. 
Another important enhancement involves the implementation of a real-time
scheduler. The real-time scheduler needs to be preemptive. That is, if some
event occurs in an ISR which precipitates rescheduling, the rescheduling must
take place at that instant. In addition to being preemptive, the scheduler
should use an algorithm that respects the temporal requirements of the various
tasks. This refers to how the scheduler decides which task runs when several
tasks are ready to be executed. Some popular algorithms are rate monotonic,
least-laxity first, earliest-deadline first, and maximum-urgency first. Also,
the scheduler should support numerous priorities. Most dedicated real-time
operating systems support at least 256 levels of priorities. Windows' lack of
such a real-time scheduler may be the biggest impediment to its becoming a
truly real-time operating system.
If you really wanted to pick nits, you could reprogram the master and slave
8259 programmable interrupt controller so it runs in special fully nested
mode, instead of running in fully nested mode. Of all the different modes of
the 8259 programmable interrupt controller, this is the most real-time
sensitive. It speeds up interrupt servicing by 50 to 120 instructions. 
Even if we add a real-time file system, we still have to deal with the problem
that Windows has no specifications for the maximum duration of disabled
interrupts. Hopefully, Microsoft will publish this in the future.
Are fast interrupts a big deal? Yes. The interrupt overhead in unmodified
Windows is so bad that it precludes involving the system board in any relevant
real-time processing. This is truly a shame because of the awesome power of
the new system-board microprocessors. Not only do they have greater
computational power, but they have greater real-time functionality. (The
80586/Pentium has a 64-bit counter of system clock cycles, which allows
near-picosecond time stamps and never rolls over.) Intel has already cracked
this market with its ProShare video conferencing package. This is the first
product to use system-board microprocessor CODECs instead of peripheral-card
DSP CODECs. Surely, a more-real-time sensitive Windows would cause a flood of
such products to appear. We could do sound-card work without a sound card, and
speech-processing work without a DSP speech card. The new 80x86 PCs can win work
from the peripheral cards in the same way they won work from the mainframes.
Figure 1 Minimum instructions per ISR in a 16-MHz 80386.
Figure 2 Potential minimum interrupt period on a 150-MHz Pentium.
Figure 3 Potential interrupt length on a 150-MHz Pentium using the new ISR
hook.
Figure 4 Subsystem hierarchy.
Figure 5 ISR hierarchy.
Figure 6 Interrupt hierarchy for different modes.
Figure 7 System hierarchy with the new ISR hook included.
Figure 8 Call graph for the major functions in Listing One.

Listing One
;-
; File Name..........: VHKD.ASM
; File Description...: Installs interrupt hooks to demonstrate the 
; undocumented fault hooking process.
; Author.............: Victor Webber
;-
; Build Notes:
;-
; set include=\ddk\include
; masm5 -p -l -w2 hk.asm;
; link386 hk,vhkd.386,,,hk.def
; addhdr vhkd.386
;
; The following segment must be placed into hk.def
;
; LIBRARY VHKD
; DESCRIPTION VHKD VIRTUAL DEVICE (VERSION 1.0)
; EXETYPE DEV386
; SEGMENTS
; _LTEXT PRELOAD NONDISCARDABLE
; _LDATA PRELOAD NONDISCARDABLE
; _ITEXT CLASS ICODE DISCARDABLE
; _IDATA CLASS ICODE DISCARDABLE
; _TEXT CLASS PCODE NONDISCARDABLE
; _DATA CLASS PCODE NONDISCARDABLE
; EXPORTS
; VHKD_DDB @1
;

;-
; Operation Notes:
;-
; 1) Add device=vhkd.386 to the [386Enh] section of the SYSTEM.INI file
; 2) Open this VxD and call functions from a DLL. 
;
;
; A S S E M B L E R D I R E C T I V E S
;
.386p
;
; I N C L U D E F I L E S 
;
INCLUDE VMM.INC
;
; E X T E R N A L L I N K S 
;
;
; E Q U A T E S
;
; The device ID -- get your own ID from vxdid@microsoft.com
VHKD_VXD_ID EQU 28C2h
VERS_MAJ EQU 1
VERS_MIN EQU 0
VERSION EQU ((VERS_MAJ SHL 8) OR VERS_MIN)
;- Equates used in reprogramming the 8254
; Number of reg 0 ticks which equal 1 millisecond
TMR_MTICK EQU 04A9h
; The count down value in Windows 3.1 
TMR_W31_CNT_DWN EQU (TMR_MTICK*50)
; The count down value in Windows for Workgroups
TMR_WWG_CNT_DWN EQU (TMR_MTICK*20)
TMR_OLD_CNT_DWN EQU TMR_WWG_CNT_DWN
; Count down value for a tick every 15 milliseconds
TMR_NEW_CNT_DWN EQU (TMR_MTICK*15)
; 8254 Registers
CTRL_8254 EQU 043h 
RWCNT_8254 EQU 034h 
DATA_8254 EQU 040h 
; 8259 values
VPICD_SP_EOI_0 EQU 60h
VPICD_CMD_M8259 EQU 20h
; Identifies interrupt to hook
TMR_IRQ0 EQU 050h
; Mode id
VMM_MODE_ID EQU 0
PM_MODE_ID EQU 1
V86_MODE_ID EQU 2
CFLAG EQU 1
;
; S T R U C T U R E S
;
;
;
HK_ST STRUC
;
; State of the hook system
wSt dw ?
 ; Known uninitialized state

 ;HK_UNINIT EQU 1
 ; All fault hooks initialized
 ;HK_INIT EQU 2
;
; Frequency counters
dwNewFreq dw ? 
dwOldFreq dw ? 
;
; Old 8254 count down value
dwOldTmrCntDwn dw 0
; New 8254 count down value
dwNewTmrCntDwn dw 0
;
; System Accumulator
dwCntDwnAccum dw 0
;
; System VM Handle
ddSysVM dd 0 
;
; Array to hold old vectors
aVctr dd 3 dup (?)
HK_ST ENDS
 ; Known uninitialized state
 HK_UNINIT EQU 1
 ; All fault hooks initialized
 HK_INIT EQU 2
;
; V I R T U A L D E V I C E D E C L A R A T I O N
;
Declare_Virtual_Device VHKD, VERS_MAJ, VERS_MIN, _VhkdVxDControl, \
 VHKD_VXD_ID, Undefined_Init_Order, \
 _VhkdAPIHandler, _VhkdAPIHandler,
;
; R E A L M O D E I N I T I A L I Z A T I O N
;
VXD_REAL_INIT_SEG
_real_init proc near
 mov ah, 9
 mov dx, offset claim
 int 21h
 xor ax, ax ; set up to tell Windows its okay 
 xor bx, bx ; to keep loading this VxD
 xor si, si
 xor edx, edx
 ret
_real_init endp
claim db 'VHKD.386 - VHKD v. 1.0',0dh,0ah
 db 'Copyright (c) 1995 and 96 Victor Webber. All Rights Reserved.'
 db 0dh,0ah,'$'
VXD_REAL_INIT_ENDS
;
; D A T A S E G M E N T S
;
VxD_DATA_SEG
;-
; State Vars
 
sHK HK_ST < HK_UNINIT, 0, 0, TMR_OLD_CNT_DWN >
;-

; API jump table
VhkdAPICall label dword
 dd offset32 _HkGetNewFreq
 dd offset32 _HkGetOldFreq
VHKD_API_MAX EQU ( $ - VhkdAPICall ) / 4
VxD_DATA_ENDS
;
; L O C K E D C O D E S E G M E N T
;
VxD_LOCKED_CODE_SEG
;
BeginProc _VhkdAPIHandler
; - In: ebp.Client_AX :: function index
;
; - Description:
; - This is the handler which funnels all service calls.
; - Out:
; - Calls indexed function
; - Return:
; - ebp.Client_EFlags :: CFLAG set is error
; - ebp.Client_EFlags :: CFLAG clear is success
; - See individual function
;
 movzx eax, [ebp.Client_AX] ; get callers AX register
 cmp eax, VHKD_API_MAX ; valid function number?
 jae short fail
 and [ebp.Client_EFlags], NOT CFLAG ; clear carry for success
 call VhkdAPICall[eax * 4] ; call through table
 ret
fail:
 or [ebp.Client_EFlags], CFLAG
 ret
EndProc _VhkdAPIHandler
;-
BeginProc _VhkdVxDControl
; - In: void
;-
; - Description:
; - Process system messages
; - Out:
; - Call message handlers
; - Return:
; - void
;-
 Control_Dispatch Sys_Critical_Init, _VhkdCritInitMsg
 Control_Dispatch Sys_Critical_Exit, _VhkdCritExitMsg
 clc 
 ret
EndProc _VhkdVxDControl
;
BeginProc _VhkdCritInitMsg
; In: void
;
; - Description:
; - Process Crit Init message. This message occurs 
; during system initialization.
; - Out:
; - Fills database with control information
; - Init Fault Hooks

; - Init first stage of tmr 
; - Return:
; - cy clr :: success, everything installed
; - cy set :: error, nothing has been installed
;
 VMMcall Get_Sys_VM_Handle
 
 mov sHk.ddSysVM, ebx
 ; Install hooks, but keep hook system uninitialized.
 call _HkInstallHooks
 jc CI_EXIT
 ; Increase frequency of hook system
 call _HkScheduleFreq
 clc 
 CI_EXIT:
 ret
EndProc _VhkdCritInitMsg
 
;
BeginProc _VhkdCritExitMsg
;
; - Description:
; - Process Crit Exit message
; - Out:
; - Uninit Fault Hooks
; - Return:
; - cy clr :: success
;
 ; Reset hook system
 call _HkUnScheduleFreq
 clc
 ret
EndProc _VhkdCritExitMsg
;
BeginProc _HkInstallHooks
; - In: Void
;
; - Description:
; - Hook the Timer faults. This should be the last 
; hook of these faults. No Hook should occur after this.
; - Out:
; - Call Hooking procedures
; - Fill structure with address of old fault hooks
; - Return:
; - cy set :: error, no handlers installed
; - cy clr :: success
;
 
 push esi
 mov eax, TMR_IRQ0
 mov esi, OFFSET32 _HkPmIRQ0FaultHook
 VMMcall Hook_PM_Fault
 jc short HK_ERR
 mov sHK.aVctr[PM_MODE_ID*4], esi
 
 mov esi, OFFSET32 _HkV86IRQ0FaultHook
 VMMcall Hook_V86_Fault
 jc short HK_ERR
 mov sHK.aVctr[V86_MODE_ID*4], esi

 
 mov esi, OFFSET32 _HkVMMIRQ0FaultHook
 VMMcall Hook_VMM_Fault
 jc short HK_ERR
 mov sHK.aVctr[VMM_MODE_ID*4], esi
 HK_ERR:
 pop esi
 ret
EndProc _HkInstallHooks
;
BeginProc _HkScheduleFreq
; - In: void
;
; - Description:
; - Reprograms the 8254 to interrupt at a higher frequency. 
; - Changes database for higher frequency
; - Out:
; - Reprograms 8254
; - Sets minimum interrupt period reload variable
; - Return:
; - void
;
 ; Turn off interrupt
 cli
 ; Initialize hook system variables
 mov sHK.wSt, HK_INIT
 mov sHK.dwCntDwnAccum, TMR_NEW_CNT_DWN
 ; Load new count down into 8254
 cCall _HkReload8254 < TMR_NEW_CNT_DWN >
 ; Write reload values into structure
 mov sHK.dwNewTmrCntDwn, TMR_NEW_CNT_DWN
 ; Turn on interrupts
 sti
 ret
EndProc _HkScheduleFreq
;
; H O O K P R O C S
; - In: void
;
; - Desc:
; - ISRs for the PM mode, V86 mode, and the VMM mode
; - All registers have been saved prior to the ISR being called.
; - Out:
; - Call another function
; - Return:
; - void
; - Uses:
; - eax 
;
BeginProc _HkPmIRQ0FaultHook
 mov eax, PM_MODE_ID
 call _HkGenIRQ0FaultHook
 ret
EndProc _HkPmIRQ0FaultHook
BeginProc _HkV86IRQ0FaultHook
 mov eax, V86_MODE_ID
 call _HkGenIRQ0FaultHook
 ret
EndProc _HkV86IRQ0FaultHook
BeginProc _HkVmmIRQ0FaultHook

 mov eax, VMM_MODE_ID
 call _HkGenIRQ0FaultHook
 ret
EndProc _HkVmmIRQ0FaultHook
;
BeginProc _HkGenIRQ0FaultHook, PUBLIC
; eax = FAULT ID
;
; - Desc:
; - Checks if critical frequency is reached. Increment variables
; - Out:
; - Increment Variables
; - Return:
; - void
; - Uses: 
; - Nothing
;
 ; Check if hook system is initialized, exit if not
 cmp sHK.wSt, HK_UNINIT
 je short GEN_EXIT
 ;Increment the count for new frequency
 inc sHK.dwNewFreq
 ; Check if old frequency reached
 sub sHK.dwCntDwnAccum, TMR_NEW_CNT_DWN
 jc GEN_EXIT
 ; Send EOI to 8259
 mov al, VPICD_SP_EOI_0
 out VPICD_CMD_M8259, al
 ret
 GEN_EXIT:
 ;Increment the count for old frequency
 inc sHK.dwOldFreq
 mov sHK.dwCntDwnAccum, TMR_OLD_CNT_DWN
 call sHK.aVctr[eax*4]
 ret
EndProc _HkGenIRQ0FaultHook, PUBLIC
;
BeginProc _HkUnScheduleFreq
; - In: void
;
; - Description:
; - Resets 8254 to original state
; - Out:
; - Reprograms 8254
; - Resets hook system control variables
; - Return:
; - void
;
 ; Turn off interrupt
 cli
 
 ; Uninitialize the hook system variables
 mov sHK.wSt, HK_UNINIT
 ; Load original count down into 8254
 cCall _HkReload8254 < TMR_OLD_CNT_DWN >
 ; Turn on interrupts
 sti
EndProc _HkUnScheduleFreq
;

BeginProc _HkGetNewFreq
; - In: ebp :: pointer to client register structure
;
; - Desc:
; - Reports frequency count for new 8254 period
; - Out:
; - Read hook system structure
; - Return:
; - Client_AX :: Frequency count
;
 
 mov ax, WORD PTR sHK.dwNewFreq
 mov [ebp.Client_AX], ax
EndProc _HkGetNewFreq
;
BeginProc _HkGetOldFreq
; - In: ebp :: pointer to client register structure
;
; - Desc:
; - Reports frequency count for old 8254 period
; - Out:
; - Read hook system structure
; - Return:
; - Client_AX :: Frequency count
;
 
 mov ax, WORD PTR sHK.dwOldFreq
 mov [ebp.Client_AX], ax
EndProc _HkGetOldFreq
;
BeginProc _HkReload8254, PUBLIC
; - In: Word value to write into 8254
_HkRld STRUC 
 dd ? ;bp
 dd ? ;ret
rl1 dd ? 
_HkRld ENDS 
;
; - Desc:
; - Reload the 8254 with a new IPC
; - Interrupts should be disabled throughout this process. 
; - Out:
; - Write value into 8254
; - Return:
; - void
;
 
 push ebp
 mov ebp, esp
 mov al, RWCNT_8254
 out CTRL_8254, al 
 mov ax, WORD PTR [ebp.rl1]
 out DATA_8254, al
 shr ax, 8 
 out DATA_8254, al 
 pop ebp
 ret
EndProc _HkReload8254
VxD_LOCKED_CODE_ENDS

 END
End Listing



Direct Thunking in Windows 95


Using undocumented calls to bypass the thunk compiler




Matt Pietrek


Matt is a programmer for Nu-Mega Technologies and the author of Windows 95
System Programming Secrets (IDG Books, 1995), from which this article is
adapted. He can be contacted at 71774.362@compuserve.com.


One of the most difficult areas of Windows 95 programming is thunking. Many of
the 32-bit system DLLs in Windows 95 rely heavily on 16-bit code. For example,
when your 32-bit program calls GetMessage, control ultimately ends up in the
GetMessage routine in the 16-bit USER.EXE. Likewise, there are many places
where 16-bit system code makes calls to 32-bit system DLLs. For example, when
creating a new window, the 16-bit USER.EXE thunks up to KERNEL32.DLL to
allocate memory for the window's data from a 32-bit heap. Windows 95 handles
both types of thunks (32- to 16-bit, and 16- to 32-bit) seamlessly.
Unfortunately, if you need to call from 32-bit code to 16-bit code (or vice
versa) you're supposed to write what are called "flat thunks," which require
the use of a special compiler (THUNK.EXE), an assembler, and several hours (or
days) of trial and error. Once you get it working correctly, you end up with
at least two extra DLLs, one 16-bit and the other 32-bit.
Although standard Windows 95 DLLs thunk between 16- and 32-bit code, these
system DLLs don't mess around with the thunk compiler or thunking DLLs.
Instead, the 32-bit system code uses the undocumented function QT_Thunk. You
can use this same routine from your own programs. In fact, if you write
thunking DLLs, you'll see that the code that THUNK.EXE emits contains calls to
the QT_Thunk function. Be careful, though. Using QT_Thunk is trickier than
calling a standard Win32 API function. 
In this article, I'll describe what's necessary to call QT_Thunk on your own.
I'll also present a sample program that uses QT_Thunk to bypass the thunk
compiler. In effect, I'll be thunking in the same manner as the system DLLs.
As part of the example, you'll see how a 32-bit program can obtain the free
system resources. This is a question I see quite often in online forums, so I
can cover two interesting topics at once. However, before jumping into the
example code, it's instructive to first look at QT_Thunk and see how it works.
In this way, we can get a sense of the Windows 95 architecture. Remember that
this function is entirely Windows 95 specific. QT_Thunk (and flat thunks in
general) don't exist in Windows NT.
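Because flat thunks don't exist under Windows NT, any code that reaches for QT_Thunk should verify the platform first. Here is a minimal, testable sketch of that check, operating on the raw GetVersion value rather than calling it (bit 31 clear means Windows NT; the major-version guard against Win32s is my own defensive addition, not something the system requires):

```c
#include <assert.h>

/* Decode a GetVersion()-style DWORD. Bit 31 clear = Windows NT, where
   QT_Thunk does not exist. Bit 31 set covers both Win32s and Windows 95,
   so we also require major version >= 4 (the low byte) to rule out
   Win32s. Pure function so it can be tested on any platform; on a real
   system you would pass it GetVersion()'s return value. */
typedef unsigned long DWORD;

int qt_thunk_available(DWORD version)
{
    return (version & 0x80000000UL) != 0 &&  /* not Windows NT   */
           (version & 0xFFUL) >= 4;          /* not Win32s (3.x) */
}
```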


Examining the QT_Thunk function


Listing One is pseudocode I've created for QT_Thunk. QT_Thunk is located in
KERNEL32.DLL, and is quite obviously coded in assembler. Rather than
presenting the raw assembler code for QT_Thunk, Listing One is a mix of C
pseudocode and assembler, which I believe conveys the intent of this fairly
complex routine. If you really want to see what goes on, by all means, set a
breakpoint on QT_Thunk in your favorite system debugger (such as SoftIce/W)
and step through it. I guarantee that you won't wait long for the breakpoint
to be hit.
Looking at the routine from orbit, the job of QT_Thunk is simple: Take the
16:16 address passed to it in the EDX register and transfer control to that
address. Of course, nothing is ever that simple, and there are several issues
that need to be taken care of. For starters, saving away the address that
execution should return to after the 16-bit code finishes would be very
helpful. Likewise, it's important to switch the stack from a flat 32-bit stack
selector to a 16-bit selector. 
Moving in a bit closer to the routine (a "helicopter view" if you will),
QT_Thunk is divided into five distinct phases. First, in the debug version,
the code calls a routine that logs the call (assuming the right logging flag
is set). This section of code also verifies that the Thread Information Block
selector is the same as the FS register. If not, the routine complains (in the
debug version, that is). 
Phase 2 of QT_Thunk pushes the 16:16 address that's the ultimate target of the
thunk onto the stack. I'll come back to this in phase 5. Phase 2 also handles
the preservation of the return address and the 32-bit register variables. The
32-bit return address that control returns to after the 16-bit code completes
is stored in an area of the stack that won't be touched. The register
variables that are saved away are ESI, EDI, and EBX. These are the commonly
used register variables that Win32 compilers expect will be preserved across
routines. 
Phase 3 of QT_Thunk relates to acquiring the Win16Mutex. Whenever 32-bit code
thunks down to 16-bit code, Windows 95 needs to acquire the Win16Mutex. The
Win16Mutex is just a critical section, although it's unusual in that
it's used by all processes, both 16- and 32-bit. By forcing all Win32 code
that thunks down to 16-bit code to acquire the Win16Mutex, Windows 95 can
guarantee that only one thread at a time is executing through the Win16 system
DLLs (as well as other 16-bit DLLs). This is how Microsoft got around the
unfortunate fact that the 16-bit system DLLs were written without
multithreading in mind. The whole subject of the Win16Mutex has been highly
controversial. Here I'm simply going to say that the QT_Thunk routine is one
of the primary places where Windows 95 acquires the Win16Mutex.
Phase 4 of QT_Thunk is where the routine switches from the flat 32 stack used
by the Win32 code to a 16:16 stack for use by the Win16 code. Since Win32
threads typically have 1-MB stacks, and the ESP at the time of the thunk could
be anywhere within that 1 MB, you can see that switching to a 16:16 stack
could be tricky. It's not sufficient to just allocate a 16-bit stack selector
during the thread's start-up and set its base address at that time. Instead,
during the thunk to 16 bits, the QT_Thunk routine may need to adjust the base
address of the stack selector used by the thread when executing in 16-bit
code. The base address of the 16-bit selector is set so that it points to the
same general linear-address region that the ESP register was using prior to
the thunk. After fiddling with the stack selector as necessary, QT_Thunk
figures out an appropriate 16-bit SS:SP combination and loads those values
into the SS and SP registers.
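The invariant that phase 4 maintains can be sketched in a few lines of C. This is only an illustration of the arithmetic, not KERNEL32's actual code: the 16-bit selector's base plus the 16-bit SP must land on the same linear address the flat ESP was using before the thunk.

```c
#include <assert.h>

typedef unsigned long DWORD;
typedef unsigned short WORD;

/* Illustrative only: derive a 16:16 SS:SP that aliases a flat ESP.
   One simple scheme is to base the 16-bit stack selector on the 64K
   boundary below ESP, so the offset fits in 16 bits and
   base + sp == original linear ESP. */
void flat_esp_to_1616(DWORD esp, DWORD *sel_base, WORD *sp)
{
    *sel_base = esp & 0xFFFF0000UL;     /* selector base: 64K boundary */
    *sp       = (WORD)(esp & 0xFFFFUL); /* offset within that window   */
}
```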
The final phase of QT_Thunk transfers control to the 16:16 address that's the
target of the thunk. As I showed in phase 2, the 16:16 target address was
stored in EDX upon entry to QT_Thunk, and was subsequently pushed on the
stack. QT_Thunk jumps to the 16:16 address via the standard RETF trick. But
before transferring control to that address, the QT_Thunk code zeros out all
the nonessential segment registers (DS, ES, FS, and GS). It wouldn't do to
hand the target 16:16 function a DS register set up with a nice, juicy, flat
32-bit selector to scribble with. It's expected that the 16:16 function will
set up the segment registers however it needs to.
Note that there is no memory context switching in the QT_Thunk code. At the
CPU level, the only changes from one side of the thunk to the other were that
the instruction pointer and the stack switched from using 32-bit segments to
16-bit segments. This is significant if you think about the memory arrangement
of Windows 95. It implies that all 16-bit DLLs are always mapped into the
address space of the current application. Put another way, all 16-bit DLLs
reside in memory that is global between all processes. While this sacrifices
overall robustness, it is key to keeping Windows 95 flat thunks both small and
fast.


Calling QT_Thunk on Your Own


A good example to demonstrate QT_Thunk is to get the free system resources
(FSRs). If you're familiar with Windows 3.x, you might recall that the free
system resources value is simply the percentage of free space in the 16-bit
USER and GDI local heaps. Believe it or not, Windows 95 provides no way for
32-bit applications to get the FSR value. Even when the standard Windows 95
utilities display the FSR in their About box, the FSR value is retrieved by a
call that thunks down to a 16-bit DLL. The 16-bit USER.EXE contains a function
called GetFreeSystemResources. If only you could get to it from 32-bit code
without writing thunk scripts. Since GetFreeSystemResources takes only one
simple parameter, it's a perfect candidate for using QT_Thunk and dispensing
with thunk scripts.
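To make the FSR arithmetic concrete, here is a tiny helper that computes an FSR-style percentage from a heap's free and total sizes. The parameters are hypothetical; only the 16-bit USER.EXE knows the real heap sizes, which is exactly why we have to thunk down to ask it.

```c
#include <assert.h>

/* Free-system-resources style percentage: what fraction of a 16-bit
   local heap is still free. Purely illustrative arithmetic. */
unsigned fsr_percent(unsigned long free_bytes, unsigned long heap_size)
{
    if (heap_size == 0)
        return 0;   /* guard against dividing by zero */
    return (unsigned)(free_bytes * 100UL / heap_size);
}
```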
Calling QT_Thunk in your code requires you to do two things. First, you have
to put the 16:16 address to call into the EDX register. Second, the code from
which you're calling QT_Thunk should have an EBP stack frame set up, and at
least 0x3C bytes of local storage that you're not relying on. This unusual
step is required because QT_Thunk builds its convoluted stack frame for
calling the 16-bit code in the region below where the EBP register points to.
The FSR32.C program in Listing Two provides a sample implementation of a call
to QT_Thunk. Besides the two calls to QT_Thunk, there are a couple of other
things in Listing Two that need describing. For starters, how is FSR32.C
getting the address of the 16-bit GetFreeSystemResources function from 32-bit
code? I had to cheat a bit and use some additional undocumented KERNEL32.DLL
functions. The functions are LoadLibrary16, FreeLibrary16, and
GetProcAddress16. These functions are just like their 16-bit counterparts, but
can be called directly from 32-bit code.
Since LoadLibrary16, FreeLibrary16, and GetProcAddress16 aren't in Windows NT,
you won't find them in the standard KERNEL32.LIB import library. Instead, I
got sneaky and created my own import library for them, using the UNDOCK32.DEF
file in Listing Three. The prototypes for these three undocumented functions
appear at the top of Listing Two. As you might guess, there are more than just
three undocumented functions in KERNEL32.DLL. (See my book Windows 95 System
Programming Secrets for a complete list of the undocumented functions in
KERNEL32.DLL, along with corresponding .DEF and .LIB files.)
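The decorated names in UNDOCK32.DEF follow the __stdcall convention: the function name, an "@", and the total size of the parameters in bytes (four bytes per parameter on a 32-bit stack). A small helper reproducing that convention, purely for illustration and not part of the actual build:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a __stdcall-decorated export name such as "LoadLibrary16@4".
   nparams is the parameter count; each occupies 4 bytes. */
void stdcall_decorate(char *out, size_t outlen,
                      const char *name, unsigned nparams)
{
    snprintf(out, outlen, "%s@%u", name, nparams * 4);
}
```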
To ensure that there are at least 0x3C bytes of unused memory below the EBP
frame so that QT_Thunk can set up its peculiar stack frame, the code declares
a local array of 0x40 characters that it doesn't use for anything. The
QT_Thunk code can bash this memory with impunity. Any variables that are
important to FSR32.C are declared as globals, and can't be trashed by
QT_Thunk. (I learned this lesson the hard way!) FSR32.C makes the actual call
to QT_Thunk using inline assembly code. The reason FSR32.C doesn't make a
regular C call to QT_Thunk is that EDX needs to be set up with the 16:16
address to call beforehand. You could theoretically just load EDX with one
line of inline assembly code before calling QT_Thunk. However, you'd be
relying on the compiler to not trash the EDX register before the CALL
instruction executes.


Final Thoughts


To build FSR32.EXE, use the included BUILDFSR.BAT file (available
electronically; see "Availability," page 3). This batch file first invokes the
Microsoft LIB program to turn the UNDOCK32.DEF file into an import library.
Afterwards, the batch file calls the command-line compiler (CL.EXE) and passes
the file to be compiled. The only additional arguments are the two import
libraries, UNDOCK32.LIB and THUNK32.LIB. THUNK32.LIB comes with the Win32 SDK
and with Visual C++. This means there is no complicated .MAK file. The
resulting program is a console-mode program that you can run directly from the
command line. The output is the percentage of free space in both the USER and
GDI heaps.
Be advised that the FSR32 code doesn't do anything tricky like passing
pointers to 16-bit code. The Win32 API functions that thunk down to 16-bit
code and pass pointers to 16-bit DLLs have additional code for setting up
alias 16:16 selectors, and so forth. The main point here is that if you're
going to do anything at all tricky, I would suggest using the thunk compiler,
which really is the proper way of doing things. The earlier example passes
only one parameter, and that parameter doesn't require any translation to be
used by the 16-bit code. Examples of parameters that would need to be
translated include pointers and window message values. In short, while
QT_Thunk can be handy, think carefully before you decide to bypass the thunk
compiler and use Windows 95 thunks directly. On the other hand, if you can
avoid thunk scripts and creating extra DLLs, go for it!

Listing One
Pseudocode for QT_Thunk 
 // On entry, EDX contains the 16:16 address to transfer control to 
 //
 // Phase 1: logging and sanity checking
 //
 if ( bit 0 not set in FS:[TIBFlags] )
 goto someplace else; // Not interested in that here

 PUSHAD // Save all the registers
 SomeTraceLoggingFunction( "LS", EDX, 0 ); // EDX is 16:16 target
 // Make sure that the FS register agrees with the TIB register stored
 // in the current thread database.
 if ( (ppCurrentThread->TIBSelector != FS)
 && (ppCurrentThread != SomeKERNEL32Variable) )
 {
 _DebugOut( SLE_MINORERROR,
 "32=>16 thunk: thread=%lx, fs=%x, should be %x\n\r",
 ppCurrentThreadId, FS, ppCurrentThread->TibSelector );
 }
 POPAD // restore all the registers
 //
 // Phase 2: saving away the return address and register variable registers
 //
 POP DWORD PTR [EBP-24] // Grab return address off the stack
 // and store it away for later use
 PUSH DWORD PTR [someVariable] // ???
 PUSH EDX // Push 16:16 address on the stack. The RETF
 // at the end will effectively JMP to it.
 MOV DWORD PTR [EBP-04],EBX // Save away the common
 MOV DWORD PTR [EBP-08],ESI // compiler register variables
 MOV DWORD PTR [EBP-0C],EDI
 //
 // Phase 3: Acquiring the Win16Mutex
 //
 PUSHAD, PUSHFD // Save all registers
 _CheckSysLevel( pWin16Mutex )
 POPFD, POPAD // restore all registers
 FS:[Win16MutexCount]++; // If we don't have the mutex already,
 if ( FS:[Win16MutexCount] == 0 )// grab it now.
 GrabMutex( pWin16Mutex );
 PUSHAD, PUSHFD // Save all registers
 _CheckSysLevel( pWin16Mutex )
 POPFD, POPAD // restore all registers
 //
 // Phase 4: Saving off the old SS:ESP and switching to the 16:16 stack
 //
 Calculate the 16:16 stack pointer
 MOV DX,WORD PTR [EDI->currentSS] // Load DX with 16 bit SS
 MOV DI,SS // Save away the flat SS value into DI
 // (The callee is expected to preserve it)
 MOV SS,DX // Load SS:(E)SP with the 16 bit stack ptr
 MOV ESP,ESI
 SUB EBP,EBX // Adjust EBP for the thunk
 MOV SI,FS // Save away FS (TIB ptr) register into SI
 // (The callee is expected to preserve it)
 //
 // Phase 5: Jumping to the 16:16 bit code
 //
 GS = FS = ES = DS = 0; // Zero out the segment registers
 RETF // Effectively does a JMP 16:16 to the address
 // passed in the EDX register

Listing Two
//==================================
// FSR32 - Matt Pietrek 1995
// FILE: FSR32.C
//==================================

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>
#pragma hdrstop
typedef int (CALLBACK *GFSR_PROC)(int);
// Steal some #define's from the 16 bit WINDOWS.H
#define GFSR_GDIRESOURCES 0x0001
#define GFSR_USERRESOURCES 0x0002
// Prototype some undocumented KERNEL32 functions
HINSTANCE WINAPI LoadLibrary16( PSTR );
void WINAPI FreeLibrary16( HINSTANCE );
FARPROC WINAPI GetProcAddress16( HINSTANCE, PSTR );
void __cdecl QT_Thunk(void);
GFSR_PROC pfnFreeSystemResources = 0; // We don't want these as locals in
HINSTANCE hInstUser16; // main(), since QT_THUNK could
WORD user_fsr, gdi_fsr; // trash them...
int main()
{
 char buffer[0x40];
 buffer[0] = 0; // Make sure to use the local variable so that the
 // compiler sets up an EBP frame
 
 hInstUser16 = LoadLibrary16("USER.EXE");
 if ( hInstUser16 < (HINSTANCE)32 )
 {
 printf( "LoadLibrary16() failed!\n" );
 return 1;
 }
 FreeLibrary16( hInstUser16 ); // Decrement the reference count
 pfnFreeSystemResources =
 (GFSR_PROC) GetProcAddress16(hInstUser16, "GetFreeSystemResources");
 if ( !pfnFreeSystemResources )
 {
 printf( "GetProcAddress16() failed!\n" );
 return 1;
 }
 
 __asm {
 push GFSR_USERRESOURCES
 mov edx, [pfnFreeSystemResources]
 call QT_Thunk
 mov [user_fsr], ax
 push GFSR_GDIRESOURCES
 mov edx, [pfnFreeSystemResources]
 call QT_Thunk
 mov [gdi_fsr], ax
 }
 printf( "USER FSR: %u%% GDI FSR: %u%%\n", user_fsr, gdi_fsr );
 return 0;
}

Listing Three
LIBRARY KERNEL32
EXPORTS
 LoadLibrary16@4 @35
 FreeLibrary16@4 @36
 GetProcAddress16@8 @37
End Listings




Building VxDs in Windows 95


An assembly shell for creating dynamically loadable VxDs




John Forrest Brown


John is the author of Embedded Systems Programming in C and Assembly (Van
Nostrand Reinhold, 1994). He can be reached at baysidec@aol.com.


Writing a virtual device driver (VxD) for Windows 95 should not be a massive
task. After all, the operating system is reasonably solid, the device being
interfaced is likely to be a debugged product, and Microsoft provides all the
necessary tools and documentation in the Windows 95 SDK and device driver
development kit (DDK). Neophyte VxD writers, however, often have difficulty
navigating the plethora of Microsoft documentation, filtering out unneeded
information, and learning the OS-to-VxD and VxD-to-application interface
frameworks. To help you through this process, I've built a very simple
assembly shell for creating dynamically loadable VxDs.
Because of problems I encountered in the C materials supplied with the MSDN
Development Platform Device Driver Kit (October '95 release), I do not
recommend using C exclusively for creating VxDs. While the supplied
documentation provides a robust discussion of assembly interfaces, it is
relatively unsophisticated in its treatment of C. After studying the DDK
materials, I found it easier to write a generic VxD shell. The assembly shell,
called VANYDEV.VXD, calls C routines to do the real work and uses assembly
language to take advantage of the DDK-supplied routines and macros. The
working examples provided were written using the MSDN Development Platform
SDK, DDK, and the Visual C++ (VC++) integrated development environment (IDE).
The assembly shell should flatten the VxD learning curve, and should provide a
functioning VxD framework that can be used in most situations. Even if you're
a seasoned VxD developer, the information I present here enhances the
Microsoft DDK documentation. 


Approaching the VxD


Figure 1 presents an overview of the VxD interfaces addressed in this article.
When a Win32 application attempts to load a dynamically loadable VxD, control
is passed to the Windows 95 virtual machine manager (VMM) to supervise the
loading process. The VMM is the heart of Windows 95 and is essentially a
preemptive, multitasking operating system; it supervises virtual machines (VM)
and VxDs. Under Windows 95, each application executes in its own VM. During
the execution life cycle of a dynamically loaded VxD, there are only three
pertinent messages initiated from the VMM to the VxD: Sys_Dynamic
_Device_Init, Sys_Dynamic_Device_Exit, and W32_DeviceIoControl. By convention,
these three messages are fielded from a control dispatch table within the VxD.

The first two messages are used during the VxD load and unload processes. The
third is used when device I/O control (DIOC) requests are made to the device. 
DIOC requests are typically generated by an application using the
Microsoft-provided C function DeviceIoControl. Function arguments include the
VxD handle (which is created when the VxD is loaded), and pointers to input
and output parameters. Input parameters include a desired service number that
is programmer defined. All DIOC information contained in the DDK is clear and
concise, and I've provided an example DIOC call in the test program (see
Listing Four).
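Since the service numbers are programmer defined, the VxD's DIOC handler usually amounts to a simple dispatch on the code. The following C sketch uses invented service numbers and stand-in return values purely for illustration; the real handler is whatever you write in VANYDEVC.C.

```c
#include <assert.h>

/* Hypothetical ANYDEV service numbers -- the article leaves these to
   the programmer, so the values here are invented. */
#define ANYDEV_SVC_GET_SHARED_ADDR  1
#define ANYDEV_SVC_GET_BOARD_ADDR   2

/* Dispatch a DIOC service code; *out receives the result value.
   Returns 0 on success, -1 for an unknown service. */
int anydev_dispatch(int service, unsigned long *out)
{
    switch (service) {
    case ANYDEV_SVC_GET_SHARED_ADDR:
        *out = 0xC1000000UL;  /* stand-in: shared structure address  */
        return 0;
    case ANYDEV_SVC_GET_BOARD_ADDR:
        *out = 0xC1001000UL;  /* stand-in: mapped board page address */
        return 0;
    default:
        return -1;            /* unrecognized service number */
    }
}
```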


Writing the VxD Shell


Listing One presents VANYDEVD.ASM, the main assembly module for a dynamically
loadable VxD for the ANYDEV board. Listing One includes three files from the
DDK: VMM.INC, VWIN32.INC, and VPICD.INC. In addition to providing standardized
constants, these INCLUDE files contain assembly macros that save the
programmer some work. There also are three similarly named header files used
by VANYDEVC.C; see Listing Three.
Listing One first calls the Declare_Virtual_Device macro to transparently
create a device descriptor block (DDB) for the VxD. The DDB is a data block
that serves as the entry point for the VxD. In the call to
Declare_Virtual_Device, VANYDEVD is the name of the VxD, 1 and 0 indicate the
major and minor revision levels of the VxD, VANYDEVD_Control is the address of
the control procedure that fields the three messages from the VMM,
Undefined_Device_ID is the ID used for all devices not registered with
Microsoft, and Undefined_Init_Order specifies the order in which devices are
initialized (in this case, the order is not important).
Working in conjunction with this macro are the VANYDEVD_Control procedure and
the Control_Dispatch macro. VANYDEVD_Control will generate a control dispatch
table. Within the control procedure, the Control_Dispatch macro precedes a
message number and the address of a routine to process that message. The macro
ensures proper building of the dispatch table. Any message sent to the VxD
other than one of the three defined in the control procedure will be ignored,
and a "good" return code will be returned to the caller.
The three message-processing assembly routines specified in the control
procedure should be defined in the same module as the dispatch table. In
addition, their code segments must be of the VxD_LOCKED_CODE_SEG type. As
written in Listing One, these routines call C functions for actual processing.
Primitive C functions can be put into a separate module using VC++, and they
can be compiled using default IDE settings. Typical C routines are contained
in VANYDEVC (see Listings Two and Three).


Expanding the VxD Shell


As an example, assume that an input board exists that can be plugged into a
PC's ISA bus. External to the PC, the board interfaces with a wheelbarrow that
is filled with bits. The board's purpose is to sort the bits into a readable
format and provide 4 KB (one page) of readable data at the adapter board's
base address (under 1 MB). Each time the wheelbarrow is filled with bits, an
interrupt occurs that signals the presence of readable data at the board's
base address. To expand the VxD shell into a driver for this board, an
interrupt-service routine (ISR) must be written to handle the board's
interrupts. Also, a shared memory area would be desirable to speed
communication between the VxD and the application.
Unlike earlier versions of Windows, Windows 95 splits the 4 GB of linear
address space into four distinct areas. Two areas under the 2-GB boundary are
reserved for DOS and for applications. Above the 2-GB boundary, there are two
1-GB areas reserved for global (system-wide) and application sharing,
respectively. System DLLs and memory-mapped files are mapped to the
application-shared area. The global area is used for system page tables, VxDs,
and globally mapped addresses.
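The split just described can be captured in a small classifier. The 2-GB and 3-GB boundaries follow directly from the text; the 4-MB boundary for the DOS-compatibility area is my assumption, since the article only places the two lower regions below 2 GB.

```c
#include <assert.h>

/* Rough map of the Windows 95 linear address layout described above.
   The 4-MB DOS-area boundary is assumed, not stated in the article. */
typedef enum {
    REGION_DOS,     /* DOS/V86 compatibility area          */
    REGION_APP,     /* per-process application area        */
    REGION_SHARED,  /* shared DLLs and memory-mapped files */
    REGION_SYSTEM   /* page tables, VxDs, global mappings  */
} region_t;

region_t win95_region(unsigned long linear)
{
    if (linear < 0x00400000UL) return REGION_DOS;
    if (linear < 0x80000000UL) return REGION_APP;
    if (linear < 0xC0000000UL) return REGION_SHARED;
    return REGION_SYSTEM;
}
```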
Since VANYDEVD's data area is locked and globally mapped, addresses of data
structures defined within the VxD can be passed to an application via a DIOC
request. Using these globally mapped addresses, the application can directly
access data structures contained in the VxD. Access to the board's base
address (under 1 MB) also is required. Using the Microsoft-supplied
MapPhysToLinear function from within the VxD provides a global address usable
by both the application and the VxD. Again, the application can obtain the
requisite global address from the VxD by issuing a DIOC request using the
DeviceIoControl function. To facilitate all of the aforementioned operations,
the assembly module must contain the MapPhysToLinear interface as shown in
Listing One; and the C module must be expanded to handle the device-specific
requests, define the shared area, and store the global addresses needed by the
application; see Listing Three.
In adapting the VxD shell to drive the wheelbarrow board, the IRQ known to the
board is virtualized using the services of the Windows 95 virtual device
driver for the PC's programmable interrupt controller (VPICD). To converse
with VPICD, more routines are added to the assembly module. These routines
constitute a shell interface to VPICD and it is from this shell that VxD calls
are issued to VPICD. The assembly module also includes an IRQ descriptor block
(IDB), a necessary part of the VPICD service protocol. Both the IRQ number and
ISR address are specified within the IDB. In addition, when using a
virtualized IRQ, the ISR must use a specific exit protocol. The reason is that
the VMM actually fields interrupt requests and calls the ISR specified in the
IDB. The ISR must issue the physical end of interrupt (EOI) to VPICD and then
return (not iret) to VMM. The routine required for exiting the ISR is included
in Listing One.
Finally, the ISR might do no more than set flags and status indicators in the
shared data structure. The flags could indicate to the application that a page
of data is available for processing at the board's reserved area below 1 MB.
The application might poll the flags until data is available and then display
it, store it, or whatever. The C module's initialization and uninitialization
routines could be fleshed out to initialize the board and preclude more than
one successful loading of the VxD. These areas are commented accordingly in
the example source code. Listing Three shows a testable C module and indicates
how the C routines might be used to drive the wheelbarrow board. 


Conclusion


TEXT1.CPP (Listing Four) is a test program that can be stepped through using
the debugger in VC++. The complete source code and project files are available
electronically; see "Availability" on page 3. Listing Four loads the VxD and
provides viewable data via the shared VxD-resident structure. Results of
various operations can be determined from the supplied data. Of course, IRQ
selection is environment-specific. Although there is no wheelbarrow board
available, an unused IRQ should be specified in VANYDEV's IDB in order to
avoid conflicts returned by the DeviceInit routine (which calls CVirt_IRQ). As
supplied, the test program looks for the VxD in the C:\xyz directory. The link
batch file that I've supplied (see "Availability") will copy the VxD to
C:\xyz\ after successfully linking the objects.
To build the test program, first ensure that drive C: contains a directory
called "c:\xyz." Next, use the VANYDEVD.BAT file to assemble VANYDEVD.ASM. You
can then compile VANYDEVC.C using VC++. The LVANYDEV.BAT file can be used to
link the two modules (VANYDEVD.DEF must exist). Note that LVANYDEV will copy
the VANYDEVD.VXD to c:\xyz\VANYDEVD.VXD. Finally, use VC++ to compile and link
the test program in Listing Four. You should then be able to step through the
TEXT1 program using the integrated debugger, and check values inserted into
the global structure after the VxD is loaded.
Figure 1: Essential ANYDEV system software-construction overview.

Listing One
;;VANYDEVD.asm
name VANYDEVD
.386p
;; ------------------------------------------------

;; VANYDEVD.asm -- Assembly module, MAIN module for
;; Dynamically loadable VxD for ANYDEV board
;; ------------------------------------------------
;; -----------------------------------
;; INCLUDE files needed by this module
;; -----------------------------------
include <\ddk\inc32\vmm.inc>
include <\ddk\inc32\vwin32.inc>
include <\ddk\inc32\vpicd.inc>
;; ------------------------------------
;; C routines/data used by this module
;; ------------------------------------
extrn _CVANYDEVD_Device_Init:near
extrn _CVANYDEVD_Device_UNInit:near
extrn _CVANYDEVD_Device_IOctrl:near
extrn _ISR_ANYDEV:near
;; -------------------------------------
;; Routines/data called from this module
;; -------------------------------------
public _GET_BD_MEM
public _Virt_IRQ
public _End_ISR
public _Physically_Mask_IRQ
public _Physically_UNMask_IRQ
public _UNVirt_IRQ
public _Get_IRQ_Status
; ===========================================================================
;; Misc VANYDEVD-specific Equates -- NOTE: This is where you change
;; the REV level or put any MS-assigned DEVICE ID (if DEVICE registered with 
;; MS). Also change here the address & length of the below 1mb area used by 
;; ANYDEV board and the IRQ used.
;; ---------------------------------------------------------------------------
VANYDEVD_MajoREV equ 1 ;ANYDEV's MAJOR revision level
VANYDEVD_MinoREV equ 0 ;decimal number of revision
VANYDEVD_DeviceID equ Undefined_Device_ID ;no need for device number assignment
VANYDEVD_IRQ_Used equ 10 ;IRQ used by ANYDEV board
VANYDEVD_1mb_adr equ 0c0000h ;Below 1mb address used by ANYDEV board 
VANYDEVD_1mb_len equ 01000h ;Length of below 1mb area used by ANYDEV board
; ============================================================================
;; ---------------------------------------------
;; Virtual Device Declaration (Required). (Declares this code as virtual 
;; device driver). Also creates the Device Data Block
;; ---------------------------------------------
 
Declare_Virtual_Device VANYDEVD,VANYDEVD_MajoREV,VANYDEVD_MinoREV,\
 VANYDEVD_Control,VANYDEVD_DeviceID,Undefined_Init_Order
VxD_LOCKED_CODE_SEG
;; --------------------------------------------
;; Control Dispatch Table & Proc (Required)
;; Used to dispatch supported messages sent by
;; VMM -- clears carry for unsupported mssgs.
;; --------------------------------------------
;; Only 3 VMM messages are recognized and processed
;; by this routine -- all DIOC interface messages
;; translate to W32_DeviceIoControl mssgs from the VMM.
;; "Control_Dispatch" precedes MSSG NUMBER, PROCEDURE
BeginProc VANYDEVD_Control
 Control_Dispatch Sys_Dynamic_Device_Exit, VANYDEVD_Device_UNInit
 Control_Dispatch Sys_Dynamic_Device_Init, VANYDEVD_Device_Init

 Control_Dispatch W32_DeviceIoControl, VANYDEVD_Device_IOctrl
 xor eax,eax ;;return 0 (required in some instances)
 clc ;;clear carry flg for GOOD indicator 
 ret
EndProc VANYDEVD_Control
;; -------------------------------------------------------------
;; NOTE: "BeginProc & EndProc" are needed in conjunction with
;; the above dispatch table -- below routines facilitate C fcns
;; -------------------------------------------------------------
;; =======================================================================
;; Routines below are VXD interface (load, unload, process) ROUTINES
;; =======================================================================
;; --------------------------------------------
;; Routine to jump to C routine for processing
;; SYS_DYNAMIC_DEVICE_INIT message
;; --------------------------------------------
BeginProc VANYDEVD_Device_Init
 call _CVANYDEVD_Device_Init
 ret
EndProc VANYDEVD_Device_Init
;; --------------------------------------------
;; Routine to jump to C routine for processing
;; SYS_DYNAMIC_DEVICE_EXIT message
;; --------------------------------------------
BeginProc VANYDEVD_Device_UNInit
 call _CVANYDEVD_Device_UNInit
 ret
EndProc VANYDEVD_Device_UNInit
;; --------------------------------------------
;; Routine to jump to C routine for processing
;; W32_DEVICEIOCONTROL messages -- These are
;; VxD requests from the application.
;; At entry, esi points to the DIOC interface
;; structure passed by the application
;; --------------------------------------------
BeginProc VANYDEVD_Device_IOctrl
 push esi
 call _CVANYDEVD_Device_IOctrl
 pop esi
 ret
EndProc VANYDEVD_Device_IOctrl
;; ======================================================
;; Routines below are miscellaneous assembly interfaces
;; ======================================================
;; -------------------
;; GET MEM BELOW 1MB
;; -------------------
BeginProc _GET_BD_MEM
 VMMcall _MapPhysToLinear <VANYDEVD_1mb_adr,VANYDEVD_1mb_len,0>
 ret
EndProc _GET_BD_MEM
;; -------------------
;; Handle EOI for ISR
;; -------------------
BeginProc _End_ISR
 VxDcall VPICD_Phys_EOI
 ret
EndProc _End_ISR
;; =======================================================================
;; Routines below are IRQ specific - for virtualization/unvirtualization
;; =======================================================================
;; -------------------
;; Virtualize IRQ
;; -------------------
BeginProc _Virt_IRQ
 push edi
 mov edi, OFFSET32 _VIRQdat
 VxDcall VPICD_Virtualize_IRQ
 jnc VIRQEXIT
 mov eax, 0ffffffffh ;set ERR if appro
VIRQEXIT:
 pop edi
 ret
EndProc _Virt_IRQ
;; -------------------
;; UN Virtualize IRQ
;; -------------------
BeginProc _UNVirt_IRQ
 VxDcall VPICD_Force_Default_Behavior
 ret
EndProc _UNVirt_IRQ
;; ---------------
;; Get IRQ Status
;; ---------------
BeginProc _Get_IRQ_Status
 push ecx
 xor eax,eax
 mov ax,[_VIRQdat]
 VxDcall VPICD_Get_IRQ_Complete_Status
 jc statusbad
 xor eax,eax
 jmp statusex
statusbad:
 push ecx
 pop eax
statusex:
 pop ecx
 ret
EndProc _Get_IRQ_Status
;; -------------------
;; Physically Mask IRQ
;; -------------------
BeginProc _Physically_Mask_IRQ
 VxDcall VPICD_Physically_Mask
 ret
EndProc _Physically_Mask_IRQ
;; ----------------------
;; Physically UN-Mask IRQ
;; ----------------------
BeginProc _Physically_UNMask_IRQ
 VxDcall VPICD_Physically_Unmask
 ret
EndProc _Physically_UNMask_IRQ
VxD_LOCKED_CODE_ENDS
;; --------------------------------
;; VPICD IRQ Descriptor Block
;; --------------------------------
VxD_LOCKED_DATA_SEG

_VIRQdat dw VANYDEVD_IRQ_Used ;; IRQ# 
 dw 0
 dd OFFSET32 _ISR_ANYDEV ;; ISR ADDRESS
 dd 0
 dd 0
 dd 0
 dd 0
 dd 500
 dd 0
 dd 0
 dd 0
VxD_LOCKED_DATA_ENDS
 END

Listing Two
//VANYDEVD.H
#define BOGUSADDRESS 0xffffffff //Returned by MS routines
// These constants define the services of the VXD required by the application
// They are named from the perspective of the application (originator)
// NOTE: Any code numbers may be used except -1 & 0
#define VANYDEVD_GRAB_ADDRESSES 10 
 //Obtains an IN CONTEXT address of the global areas
#define VANYDEVD_INIT_ADEV_HARDWARE 11 
 //Requests the VXD to INIT the ANYDEV hardware using current parameters
#define VANYDEVD_ENABLE_ANYDEV_HDWR 14 //Requests the VXD to ENABLE ANYDEV 
#define VANYDEVD_DISABLE_ANYDEV_HWR 15 //Requests the VXD to DISABLE ANYDEV 
// This structure is SHARED by the application and the ANYDEV virtual device
// driver. It contains board configuration information, board status, and
// flags used by both software ends. This structure is built by the VXD and
// resides at ring-0.
typedef struct ANYDEVprimary
 {
 DWORD flags; //Status Flags, etc
 DWORD Global_addr_1MB; //Address of board's memory below 1mb
 DWORD IRQhandle; //IRQ handle
 DWORD IRQflags; //User defined
 DWORD IRQcount; //User defined
 DWORD IRQstatus; //Set by Get IRQ Status routine
 //What ever the user desires goes here
 } ADEV, *ADEVPTR;
#define IRQ_VIRT_OK 1 

Listing Three
//// VANYDEVDc.c
// --------------------------------------------
// Dynamically loadable VxD for ANYDEV board
// --------------------------------------------
#define WIN32_LEAN_AND_MEAN // Excludes un-needed parts of windows.h
#include "windows.h"
#include <\ddk\inc32\vmm.h>
#include <\ddk\inc32\vwin32.h>
#include <\ddk\inc32\debug.h>
#include "VANYDEVD.H"
// -------------------------------------
// Externs defined in assembly module 
// -------------------------------------
// These are defined in the assembly module via VMM.INC or VPICD.INC inclusion
extern DWORD GET_BD_MEM(void);
extern DWORD Virt_IRQ(void);
extern DWORD Get_IRQ_Status(void);
extern void UNVirt_IRQ(DWORD IRQhandle);
extern void End_ISR(DWORD IRQhandle);
extern void Physically_Mask_IRQ(DWORD IRQhandle);
extern void Physically_UNMask_IRQ(DWORD IRQhandle);
// ------------------------
// PRAGMA for this DATA 
// ------------------------
// Establish segment
#pragma data_seg ( "_LDATA","_LCODE")
// ------------------------------------
// Data structures MUST be INITIALIZED
// ------------------------------------
ADEV ANYDEVX = {0}; // Main structure for ANYDEV -- shared by app 
// ------------------------
// PRAGMAS for this CODE 
// ------------------------
// Establish segment
#pragma code_seg ( "_LTEXT", "_LCODE" )
//No stack checking for routines in this module
#pragma check_stack(off)
// ----------------
// Disable hardware
// ----------------
void Disable_AnyDev(void)
{
 //This would likely be a port WRITE to DISABLE the board's interrupt 
}
// ----------------
// Enable hardware
// ----------------
void Enable_AnyDev(void)
{
 //This would likely be a port WRITE to ENABLE the board's interrupt 
}
// --------------------------
// ISR Processing for ANYDEV
// --------------------------
void Process_ISR(void)
{
 //Where the user might SET FLAGS and indicators in the ANYDEVX structure
 //in order to notify the application that data is available below 1MB
}
// -----------------
// ISR for ANYDEV
// -----------------
void _declspec(naked) ISR_ANYDEV(void)
{
 // Enable interrupts, then save registers
 _asm sti
 _asm pushad
 _asm pushfd
 //Process the ISR 
 Process_ISR();
 //End ISR
 _asm clc
 End_ISR(ANYDEVX.IRQhandle);
 //Set GOOD return code
 _asm clc
 //Restore saved registers
 _asm popfd
 _asm popad
 _asm ret;
}
// -------------------
// Virtualize the IRQ
// -------------------
DWORD CVirt_IRQ(void)
{
 // If in use by an instance of this program, RETURN with BAD code
 if (ANYDEVX.IRQcount)
 return (BOGUSADDRESS);
 // If in use by another program, RETURN with BAD code
 ANYDEVX.IRQstatus = Get_IRQ_Status();
 if (ANYDEVX.IRQstatus)
 return (BOGUSADDRESS);
 // If IRQ NOT in use this point is reached
 // Set BAD return code
 ANYDEVX.IRQhandle = BOGUSADDRESS;
 // Disable ANYDEV hardware 
 Disable_AnyDev();
 // Get global memory address below 1mb
 ANYDEVX.Global_addr_1MB = GET_BD_MEM();
 if (ANYDEVX.Global_addr_1MB != BOGUSADDRESS)
 {
 // Virtualize the IRQ
 ANYDEVX.IRQhandle = Virt_IRQ(); 
 if (ANYDEVX.IRQhandle != BOGUSADDRESS)
 {
 // unmask the IRQ, set OK flag & increment IRQ count
 Physically_UNMask_IRQ(ANYDEVX.IRQhandle);
 ANYDEVX.IRQflags = IRQ_VIRT_OK;
 ++(ANYDEVX.IRQcount);
 }
 }
 return(ANYDEVX.IRQhandle);
}
// ----------------------
// UN Virtualize the IRQ
// ----------------------
void CUNVirt_IRQ(DWORD IRQhandle)
{
 // if IRQ has been successfully virtualized
 if ((ANYDEVX.IRQhandle != 0)
 && (ANYDEVX.IRQhandle != BOGUSADDRESS))
 {
 // Physically mask the IRQ and unvirtualize it
 Physically_Mask_IRQ(ANYDEVX.IRQhandle);
 UNVirt_IRQ(ANYDEVX.IRQhandle);
 }
// Set UNvirtualized flags and indicators
ANYDEVX.IRQhandle = BOGUSADDRESS;
ANYDEVX.IRQflags &= ~IRQ_VIRT_OK;
 return;
}
// ---------------------------------------
// Set Good Return code for DIOC requests
// ---------------------------------------
void _declspec(naked) GoodReturnDIOC(void)
{
 // Clear eax and carry flag for GOOD return
 _asm xor eax,eax
 _asm clc
 _asm ret;
}
// ---------------------------------------
// Set Bad Return code for DIOC requests
// ---------------------------------------
void _declspec(naked) BadReturnDIOC(void)
{
 // NOTE: 50 is a FCN NOT SUPPORTED code -- ok to use
 // SET carry flag for BAD return
 _asm mov eax,50
 _asm stc
 _asm ret;
}
// ------------------------------
// Routine for ANYDEV Device UNINIT 
// ------------------------------
void CVANYDEVD_Device_UNInit()
{
 // Disable ANYDEV, Unvirtualize IRQ, set GOOD return code
 Disable_AnyDev(); 
 CUNVirt_IRQ(ANYDEVX.IRQhandle); 
 GoodReturnDIOC(); 
 return;
}
// ------------------------------
// Routine for ANYDEV Device INIT 
// ------------------------------
void CVANYDEVD_Device_Init()
{
DWORD retcode;
 // Try to virtualize the IRQ
 retcode = CVirt_IRQ();
 // Set GOOD or BAD return code based on success
 if (retcode == BOGUSADDRESS)
 BadReturnDIOC();
 else
 GoodReturnDIOC();
 return;
}
// --------------------------------
// Routine for ANYDEV Device IO ctrl 
// --------------------------------
void CVANYDEVD_Device_IOctrl(PDIOCPARAMETERS ptr)
{
DWORD *obuf1;
 // Field the DEV IO requests from VMM
 switch(ptr->dwIoControlCode)
 {
 case(VANYDEVD_INIT_ADEV_HARDWARE):
 ANYDEVX.flags = 0;
 //User likely to require other initialization here
 break;
 case(VANYDEVD_GRAB_ADDRESSES):
 // Point to Output buffer
 obuf1 = (DWORD *) ptr->lpvOutBuffer;
 // Return GLOBAL 1MB addr, addr of data structure
 // and return indicators of IRQ virtualization request
 *obuf1 = ANYDEVX.Global_addr_1MB;
 *(obuf1+1) = (DWORD) &ANYDEVX;
 *(obuf1+2) = (DWORD) ANYDEVX.IRQhandle;
 *(obuf1+3) = (DWORD) ANYDEVX.IRQflags;
 //User might want to return other/different values here
 break;
 case(VANYDEVD_ENABLE_ANYDEV_HDWR):
 //Call routine to enable interrupt
 Enable_AnyDev();
 break;
 case(VANYDEVD_DISABLE_ANYDEV_HWR):
 //Call routine to disable interrupt
 Disable_AnyDev();
 break;
 //The below DIOC_GETVERSION is a part of the dynamic load protocol
 //It MUST return a GOOD code (all cases here use GoodReturnDIOC())
 case(DIOC_GETVERSION):
 case(DIOC_CLOSEHANDLE):
 default:
 break;
 }
 GoodReturnDIOC();
 return;
}

Listing Four
// Text1.cpp -- Test routine for VANYDEVD.VXD interface
// A way to test dynamic load of VXD and VERIFY global addresses and data
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <\ddk\inc32\vmm.h>
#include <\ddk\inc32\vwin32.h>
#include <stdio.h>
#include <string.h>
#include <malloc.h>
#include <\msvc20\bin\VANYDEVD\VANYDEVD.H>
//NOTE: "\\\\.\\" indicates to CreateFile that VXD is being loaded dynamically
char VxDpathName[] = "\\\\.\\c:\\xyz\\vanydevd.vxd";
//Init values show something other than 0 during debug sessions
DWORD vxd_outbuff[0x1024]={1,2,3,4,5,6,7,0,0,};
//Main interface test routine for VANYDEVD.VXD
int main(int argc, char *arg[])
{
HANDLE vxd_Handle;
int errx;
DWORD DIOC_count;
DWORD *AddrBelow1MB;
DWORD DataToReview[10];
ADEVPTR SharedMemPtr;
 errx = 0;
 DIOC_count = 0;
 //Load the VXD dynamically
 vxd_Handle=CreateFile(&VxDpathName[0],0,0,NULL,0,
 FILE_FLAG_DELETE_ON_CLOSE,NULL);
 if (vxd_Handle == INVALID_HANDLE_VALUE)
 errx=GetLastError();
 else
 {
 //If VxD LOAD was SUCCESSFUL, GRAB global Addresses and IRQ Status
 DeviceIoControl(vxd_Handle,VANYDEVD_GRAB_ADDRESSES,
 NULL,0,&vxd_outbuff[0],(DWORD) sizeof(vxd_outbuff),
 &DIOC_count,NULL);
 AddrBelow1MB = (DWORD *) vxd_outbuff[0];
 //Walk through here with debugger to verify addresses and data
 if (((DWORD) AddrBelow1MB) != BOGUSADDRESS)
 {
 DataToReview[0] = *AddrBelow1MB;
 SharedMemPtr = (ADEVPTR) vxd_outbuff[1];
 DataToReview[1] = SharedMemPtr->IRQcount;
 DataToReview[2] = SharedMemPtr->IRQhandle;
 DataToReview[3] = SharedMemPtr->IRQstatus;
 }
 else
 printf("\nBOGUS address received for 1MB area");
 }
 printf("\nERR %d \n",errx);
 if (vxd_Handle != INVALID_HANDLE_VALUE)
 CloseHandle(vxd_Handle);
 return(errx);
}
End Listings






































Windows 95 Subclassing and Superclassing 


Less work, less code




Jeffrey Richter and Jonathan Locke


Jeff and Jon are coauthors of Windows 95: A Developer's Guide (M&T Books,
1995). They can be contacted through the DDJ offices.


Windows supplies a number of ready-to-use window classes, including listboxes,
comboboxes, and scrollbars. While these controls are intended to be
feature-laden and general enough for use in any application, there may be
times when you require slightly different behavior from them. Although you could
design your own control from scratch, both window subclassing and
superclassing let you inherit features and styles from one or more windows.
However, the form you use is dictated by your particular requirements. For
example, if you're creating a single control that inherits from a Windows
control, then you'll likely use subclassing to create your new control. If, on
the other hand, you plan to create several controls that inherit similar
features from a common class, you may want to use superclassing. In this case,
superclassing reduces your programming effort while producing smaller code in
your application. 


How Window Subclassing Works


When registering a window class, you fill in the members of the WNDCLASSEX
structure and pass the address of the structure to the RegisterClassEx
function. Of course, before calling RegisterClassEx, you must initialize the
lpfnWndProc member of the WNDCLASSEX structure to point to the address of the
class's window procedure. This procedure processes messages pertaining to all
windows of this class. Before RegisterClassEx returns, the system allocates an
internal block of memory that contains information about the window class. The
class's window procedure address is part of this block.
Whenever a new window is created, the system allocates another internal block
of memory containing information specific to the new window. When the system
allocates and initializes the block of memory for this window, the system
copies the class's window procedure address from the class's internal memory
block into the window's internal memory block. When a message is dispatched to
a window procedure, the system examines the value of the window procedure in
the window's memory block and calls the function whose address is stored
there. The system does not use the window procedure address stored in the
window class's memory block when dispatching a message to a window. The window
procedure address stored in the class's memory block is there only so that it
can be copied when a window is created--the system does not use this address
for any other purpose.
To subclass a window, you change the window procedure address in the window's
memory block to point to a new window procedure. Because the address is
changed in one window's memory block, it does not affect any other windows
created from the same class. If the system did not give each window instance
its own copy of the window procedure address, changing the class's address
would alter the behavior of all windows in the class. If this were the case,
subclassing a single edit control so that it would no longer accept digits
also would cause all edit controls used within a single process to stop
accepting digits. This certainly is not desirable. 
Once you have changed the window procedure address in the window's memory
block, all messages destined for the window are sent to the function at this
new address. This function must look exactly like a standard window procedure.
In other words, it must have the same prototype.
Once the message destined for the original window procedure has been sent to
your procedure, you can:
1. Pass it to the original procedure. This option is used for most messages.
The reason for subclassing is usually to alter the behavior of a window only
slightly. For this reason, most messages are passed to the original procedure
so that the default behavior for this window class can be performed.
2. Stop the message from being passed to the original procedure. For example,
if you want an edit control to stop accepting digits, you examine WM_CHAR
messages, check to see if the character (wParam parameter) is between 0 and 9,
and, if it is, immediately return from the SubClassWndProc function. If the
character is not a digit, you pass the message to the original window
procedure for edit controls.
3. Alter the message before sending it. If you want a combobox to accept only
uppercase characters, examine each WM_CHAR message and convert the key
(contained in wParam) to uppercase (using the CharUpper function) before
passing the message to the original procedure.
In Example 1, which shows how to subclass a window, the only information
required to subclass a window is its window handle. Notice that we have used
the SubclassWindow macro to perform the window subclassing. This macro is
defined in WindowsX.h; see Example 2. You can see that, aside from some
casting, this macro simply calls SetWindowLong, changing the address of the
window's window procedure. Of course, this means that the function address
passed to SubclassWindow must identify a function that is contained in the
process's address space. In Win32, it is not possible to subclass a window
using a window procedure that is contained in another process's address space.
The return value from SetWindowLong is the address of the original window
procedure. This window procedure is saved in a window property that is
associated with the subclassed window.
The function you write to intercept messages is identical in form to a window
procedure. The only difference is that you pass the messages to the original
window procedure instead of calling DefWindowProc. To have the original window
perform its normal operations for a particular message, you must use the
CallWindowProc function. This function is passed the address of the
original window procedure, the window handle (hwnd), the message (uMsg), and
the two standard parameters (wParam and lParam).


Creating New Window Messages


When you subclass a window, you often want to define some new window messages
that can be sent to the window. Let's say that you create an edit window that
accepts only numbers (using the new ES_NUMBER window style). Then, you
subclass this edit window to alter its behavior slightly. For example, you
might want to define a new window message, EM_ADDNUM, that you can send to
your subclassed window to make adding a value to the number shown in the
subclassed edit window very easy. How should you define this EM_ADDNUM
message? Normally, if you were implementing a window class entirely by
yourself, you would define your class-specific window messages starting with
WM_USER. For example, the EM_ADDNUM message might be defined as #define
EM_ADDNUM (WM_USER + 0).
But, when you are subclassing a window that was originally created from a
class that you did not implement yourself, you cannot simply create new window
messages starting at WM_USER. All the new common controls, like trackbars,
have their class-specific messages starting at WM_USER. For example, the
trackbar class's TBM_GETPOS message is defined as #define TBM_GETPOS
(WM_USER).
If you subclass a trackbar window and define your own message with a value of
WM_USER + 0, your subclass procedure intercepts the message and processes the
message your way. Any code that is expecting to send the TBM_GETPOS message in
order to retrieve the trackbar's position will be in for a big surprise!
So, you can start defining your messages at WM_USER + 500. But, this isn't
really a safe approach because you don't know if Microsoft has created any
undocumented (WM_USER + x) messages specific for the control. If you add your
own user message starting at WM_USER + 500, it could conflict with an
undocumented message recognized by the trackbar class. Actually, choosing any
value would be tempting fate.
Fortunately, there are two ways to solve this problem. The first is to use the
RegisterWindowMessage function, which creates a new message number that is
unique as long as the string you pass to it is unique. Whenever you call
RegisterWindowMessage, you know that the integer value returned identifies a
message that is ignored by any and all window procedures that have not also
registered the same message string.
The second way to solve the problem is much easier. Starting with Windows 95,
Microsoft has declared that all third-party window-class procedures must
ignore messages that range from WM_APP to 0xBFFF, inclusive, where WM_APP is
defined in WinUser.H as #define WM_APP 0x8000. 
This means Microsoft guarantees that all system-global window classes and all
new common controls will ignore messages in this range. It also means other
companies that produce controls also should ignore messages in this range. You
should definitely check with the vendor of any controls you use to make sure
that they do not process any messages above WM_APP.
Because these classes ignore messages in this range, you can safely define
your EM_ADDNUM message for your subclassed edit window as #define EM_ADDNUM
(WM_APP + 0). Now, when you send this message to a subclassed edit window, it
will know, beyond a shadow of a doubt, that you want it to add a number to the
value shown in the edit control.
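The two numbering schemes can be sketched in a few lines. The WM_USER and WM_APP values below are the ones from WinUser.H; EM_ADDNUM follows the article's hypothetical message, and the range-check helper is this sketch's own addition.

```c
/* Message-number values as defined in WinUser.H */
#define WM_USER 0x0400
#define WM_APP  0x8000

/* For a class you register yourself, private messages may start at WM_USER.
   For a subclassed system or common control they must not -- the control's
   own messages (e.g., TBM_GETPOS) already live there. Use WM_APP instead.
   EM_ADDNUM is the article's hypothetical "add a number" message. */
#define EM_ADDNUM (WM_APP + 0)

/* Nonzero if uMsg lies in the range (WM_APP through 0xBFFF, inclusive) that
   system-global classes and the new common controls must ignore */
static int IsSafeSubclassMessage(unsigned int uMsg)
{
    return uMsg >= WM_APP && uMsg <= 0xBFFF;
}
```

When a guaranteed-unique, system-wide message number is needed instead, RegisterWindowMessage with a unique string is the alternative described above.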


Associating Additional Data with a Subclassed Window


Another problem with subclassing windows is that associating additional data
with the window is difficult. For example, you might like to associate a valid
numeric range with your subclassed edit window. Because you did not register
the edit window class, you have no way to add additional class and/or window
extra bytes. In addition, because subclassing requires that a window instance
already exists before you can subclass it, increasing the number of class or
window extra bytes is also impossible.
The best way to associate data with a subclassed window is to use window
properties. You also could use the GWL_USERDATA bytes to store an address of a
data structure that contains some additional data, but this is not a good
idea. The GWL_USERDATA bytes are supposed to be available for the application
that creates and manipulates the window. Since your subclass procedure doesn't
create the window, but acts very much like a window procedure, you should not
touch the GWL_USERDATA bytes. Window properties are the only robust solution.
Of course, if you know that the users of your subclassed window will not touch
the GWL_USERDATA bytes, you can use them, but you sacrifice some flexibility
that you may desire in the future.


The NoDigits Application


The NoDigits application (see Listings One through Four) demonstrates how to
subclass two edit control windows. When you invoke the application, the main
application window appears. This window contains three edit controls and two
buttons. The first edit control uses the standard, built-in window procedure.
The other two edit controls are subclassed. The subclass procedure intercepts
all characters being sent to the edit control. However, if the character is a
digit, the subclass window procedure does not call the original window
procedure. Instead, it beeps and simply returns.

When NoDigits starts, it creates a dialog box from a template that contains
three edit controls and two buttons. When the dialog-box procedure receives a
WM_INITDIALOG message, the NoDigits_OnInitDialog function subclasses two of
the edit windows by calling the NoDigitsClass_ConvertEdit function.
The NoDigitsClass_ConvertEdit function is implemented in NoDigCls.c (Listing
Two) but is prototyped in NoDigits.c (Listing One). This function simply calls
the SubclassWindow macro and saves the returned original window procedure as a
property associated with the subclassed window.
NoDigitsClass_WndProc is a subclass window procedure that intercepts only a
single window message, WM_CHAR. All other messages are passed to the original
edit control's window procedure by calling NoDigitsClass_CallOrigWndProc at
the end of the function. NoDigitsClass_CallOrigWndProc has the same prototype
as any other window procedure, but it calls the CallWindowProc function to
forward messages to the original window procedure. CallWindowProc's first
parameter is the address of the window procedure you wish to forward the
message to. It's easy for NoDigitsClass_WndProc to get the address of the edit
control's original window procedure. It does this by calling the GetProp
function, passing the same string that was passed to SetProp inside
NoDigitsClass_ConvertEdit.
We'll now turn to how digits are prevented from being entered into the edit
control after it's subclassed. As mentioned earlier, the subclass function
performs some additional processing for the WM_CHAR message only. Whenever
NoDigitsClass_WndProc receives a WM_CHAR message, it calls the
NoDigitsClass_OnChar function. The first thing this function does is determine
if the character passed to it is a digit. If the character is a digit,
MessageBeep is called so that the user is given some indication that the
window does not accept digits. However, if the character is not a digit, the
WM_CHAR message is forwarded to the edit control's original window procedure
by using the FORWARD_WM_CHAR macro. The last parameter is the address of the
function that you wish to forward the message to,
NoDigitsClass_CallOrigWndProc.


How Window Superclassing Works


Window superclassing is similar to window subclassing. Again, you are
associating a different window procedure address with a window to alter a
single window's behavior slightly. With window superclassing, however, you
create a new window class that has altered behavior. Superclassing is most
useful if you intend to create several windows, all with slightly altered
behavior. (There are some other differences between subclassing and
superclassing that will be discussed later.) Creating a superclass is
accomplished by registering a new window class. For example, let's say that
your application needs to create several edit windows where each of the
windows can accept only letters. You could create all the windows and then
subclass each individually in order to get the desired effect, or you could
use window superclassing.
For window superclassing, you must create a superclass window procedure that
is almost identical to a window subclass procedure. The prototype is the same,
and the way that you intercept messages is the same. In fact, the only
difference is how you call the original window procedure. 
Example 3 shows how to create a superclass. Many more steps are necessary to
create a superclass than to subclass a window. The process of superclassing a
window begins with a call to the GetClassInfoEx function, passing it the name
of the desired base class. This fills a WNDCLASSEX structure with statistics
regarding the base class. This WNDCLASSEX structure serves as a starting point
for the new window class.
Once you have the base class's information, it is very important to save the
value of the lpfnWndProc member. This value is the address of the base class
window procedure. This variable later will be used in the superclass window
procedure as the first parameter to the CallWindowProc function.
The next step is to give the new class a name by setting the lpszClassName
member to the new name for the class. The value of the hInstance member should
be set to the value of hinstExe that was passed to WinMain, or the value of
hinstDll passed to a DLL's DllMain function. This value lets the system know
which module (EXE or DLL file image) in the process's address space is
registering the new window class. Finally, the lpfnWndProc member of the
WNDCLASSEX structure is changed to the address of the superclass window
procedure.
Because a new window class is being registered, you can increase the values of
the cbClsExtra and cbWndExtra members of the WNDCLASSEX structure. These
additional bytes may be used by your superclass function and are a big
advantage of superclassing over subclassing. But be careful when using the
class or window extra bytes for a superclassed window class. The base class
window procedure is written with the assumption that the class extra bytes
(from 0 to cbClsExtra-1) and the window extra bytes (from 0 to cbWndExtra-1)
are for its own use. The superclass window procedure must not access the class
and window extra bytes within these ranges unless it knows exactly how they
are used by the base class.
If the superclass window procedure is going to add class and window extra
bytes, it must save the original values of the cbClsExtra and cbWndExtra
members of the WNDCLASSEX structure, usually in global variables, before
changing the values of those members. When the superclass window procedure
accesses any of the window extra bytes, it must add the original value of
cbWndExtra to the index so that it does not reference the window extra bytes
used by the base class. Example 4 shows how to prepare and access additional
window bytes added to the superclass. Of course, it is possible to associate
data with a superclassed window via window properties. However, it is always
better to store information in window extra bytes because properties require
more data space and take more time to access.
The lpszMenuName member of WNDCLASSEX also may be changed to give the new
class a new menu. If a new menu is used, the IDs for the menu options should
correspond to the IDs in the "standard" menu for the base class. This new menu
is not necessary if the superclass window procedure processes the WM_COMMAND
message in its entirety and does not pass this message to the base class
window procedure.
The remaining members of the WNDCLASSEX structure--style, hIcon, hCursor,
hbrBackground, and hIconSm--may be changed in any way you desire. For example,
if you want your new window class to use a different mouse cursor or a
different icon, you can change the hCursor and hIcon members of the WNDCLASSEX
structure accordingly. Finally, call the RegisterClassEx function to inform
Windows of the new class.
The main difference between subclassing and superclassing is that subclassing
alters the behavior of an existing window, while superclassing creates a new
window class where all windows created by the class have altered behavior. It
is better to use superclassing when you wish to create several windows whose
behavior differs slightly from an existing window class. This is because it is
easier to register a new class, give it a new name, and create windows of this
new class than it is to create all the desired windows and use the
SetWindowLong function or SubclassWindow macro to change the address of each
of their window procedures.
Superclassing can be used to create a dialog box that contains several
superclassed controls. When a dialog box is created, CreateDialogIndirectParam
goes through the dialog-box template and creates a window using the parameters
specified on each CONTROL line in the template. If the template contains
several listbox windows that require altered behavior, it is much easier to
specify "NewListBox" in each CONTROL line of the template. With window
subclassing, the system would have to create all the listbox windows before
you could subclass these windows, one at a time, during the processing of the
WM_INITDIALOG message. This is a tedious process.
Another advantage of superclassing is that the superclass window procedure
performs its own initialization for the window. This is because Windows knows
about the superclass window procedure from the class memory block before a
window is created. When the window is created, the superclass window procedure
receives the WM_NCCREATE and WM_CREATE messages. During the processing of
these messages, the superclass window procedure may initialize its class or
window extra bytes or do any other processing it desires.
Both of these messages should be passed to the base class window procedure,
whether the superclass window procedure processes them or not. Windows must
perform initialization for each window in response to WM_NCCREATE. If this
message wasn't passed to the original window procedure, DefWindowProc would
never be called, and the window would not be initialized properly. By passing
the WM_NCCREATE message to the base class procedure, you ensure that
DefWindowProc is eventually called. Similarly, the WM_CREATE message should
also be passed to the base class window procedure.
Unfortunately, defining new window messages for superclassed windows involves
the same problems that you have for subclassed windows. To define new window
messages, you must either use the RegisterWindowMessage function or define
your messages starting with WM_APP. 
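To make the WM_APP route concrete, here is a minimal sketch (our own
illustration, not from the book; the NDM_* names are hypothetical). WM_APP is
0x8000 in windows.h; the fallback #define only lets the sketch compile outside
of Windows:

```c
/* Private messages for a hypothetical NoDigits superclass. */
#ifndef WM_APP
#define WM_APP 0x8000            /* value used by <windows.h> */
#endif

#define NDM_GETBEEPCOUNT (WM_APP + 0) /* query how many digits were rejected */
#define NDM_RESETBEEPS   (WM_APP + 1) /* reset that counter */

/* The RegisterWindowMessage alternative, for a system-wide unique value:
   UINT g_uMsgNoDigits = RegisterWindowMessage("NoDigitsClassMsg");       */
```

Values from WM_APP through 0xBFFF are reserved for application-defined
messages, so they cannot collide with messages that the base class already
uses internally.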


Conclusion


In our book Windows 95: A Developer's Guide, we present the Arcade
application, which demonstrates how to create an animated button class
(AniBtn) by
superclassing the standard button class. AniBtn works almost identically to
the standard button class except that it takes advantage of the new image list
controls to create buttons that have animated icon images instead of boring,
old text. Due to space constraints, we can't present a discussion here.
However, we have provided the Arcade application electronically (see
"Availability," on page 3) and a complete discussion can be found in our book.
With the techniques presented here, you should be able to create your own
controls that inherit from standard Windows controls.
Example 1: Subclassing a window.
static LPCSTR g_szSubclassWndProcOrig = "SubclassWndProcOrig";
int WINAPI WinMain (HINSTANCE hinstExe, HINSTANCE hinstPrev,
 LPSTR lpszCmdLine, int nCmdShow) {
 WNDPROC pfnWndProcOrig;
 // hWndParent is the parent window, created earlier (not shown).
 HWND hwndEdit = CreateWindowEx(0, "EDIT", "", WS_CHILD, 10, 20,
 100, 16, hWndParent, NULL, hinstExe, 0L);
 // Change the window procedure address associated with this one edit
 // control.
 pfnWndProcOrig = SubclassWindow(hwndEdit, Edit_SubclassWndProc);
 // Associate the original window procedure address with the window.
 SetProp(hwndEdit, g_szSubclassWndProcOrig, (HANDLE) (DWORD)
 pfnWndProcOrig);
 // Messages destined for hwndEdit are now sent to Edit_SubclassWndProc
 // instead.
}
 .
 .
 .
LRESULT WINAPI Edit_SubclassWndProc (HWND hwnd, UINT uMsg, WPARAM wParam,
LPARAM lParam) {
 LRESULT lResult = 0;
 BOOL fCallOrigWndProc = TRUE;
 switch (uMsg) {
 // Some cases set fCallOrigWndProc to FALSE.
 .
 .
 .
 }
 if (fCallOrigWndProc)
 lResult = CallWindowProc(
 (WNDPROC) (DWORD) GetProp(hwnd, g_szSubclassWndProcOrig),
 hwnd, uMsg, wParam, lParam);
 return(lResult);
}
Example 2: SubclassWindow macro as #defined in WindowsX.h.
#define SubclassWindow(hwnd, lpfn) \ 
 ((WNDPROC)SetWindowLong((hwnd), GWL_WNDPROC, \
 (LPARAM)(WNDPROC)(lpfn)))
Example 3: Creating a superclass.
// Store the address of the original window procedure.
WNDPROC g_pfnWndProcOrig;
 .
 .
 .
int WINAPI WinMain (HINSTANCE hinstExe, HINSTANCE hinstPrev, 
 LPSTR lpszCmdLine, int nCmdShow) { 
 WNDCLASSEX wc; 
 adgINITSTRUCT(wc, TRUE); 
 // Get all the information about the original window class 
 GetClassInfoEx(NULL, "EDIT", &wc); 
 // Save the original window procedure address so that the 
 // Edit_SuperClassWndProc can use it. 
 g_pfnWndProcOrig = wc.lpfnWndProc; 
 // Our new class must have a new class name. 
 wc.lpszClassName = "NoDigits"; 
 // Our new class is registered by our module. 
 wc.hInstance = hinstExe; 
 // Our new class has a different window procedure address. 
 wc.lpfnWndProc = NoDigitsClass_SuperClassWndProc; 
 // Register our new window class. 
 RegisterClassEx(&wc); 
 // At this point we can create windows of the NoDigits window class.
 // All messages that go to these windows are sent to the 
 // NoDigitsClass_SuperClassWndProc first where we can decide if we 
 // want to pass them on to g_pfnWndProcOrig. 
 . 
 . 
 .
}
LRESULT WINAPI NoDigitsClass_SuperClassWndProc (HWND hwnd, UINT uMsg, 
 WPARAM wParam, LPARAM lParam) { 
 LRESULT lResult = 0; 
 BOOL fCallOrigWndProc = TRUE; 
 switch (uMsg) { 
 . 
 . 
 . 
 } 
 if (fCallOrigWndProc) 
 lResult = CallWindowProc(g_pfnWndProcOrig,hwnd,uMsg,wParam,lParam);
 return(lResult);
}
Example 4: Preparing and accessing additional window bytes added to the
superclass.
// Global variables to save the number of class extra bytes, the window
// extra bytes, and the window procedure address of the edit base class
int g_cbClsExtraOrig, g_cbWndExtraOrig;
WNDPROC g_pfnWndProcOrig;
// Index into window extra bytes where our edit data can be found.
// These data follow the data required by the base class.
#define GWL_NODIGITSDATA (g_cbWndExtraOrig + 0)

 .
 .
 .
ATOM WINAPI NoDigitsClass_RegisterClass (void) { 
 WNDCLASSEX wc; 
 GetClassInfoEx(NULL, "Edit", &wc); 
 // Save the information we need later in global variables. 
 g_cbClsExtraOrig = wc.cbClsExtra; 
 g_cbWndExtraOrig = wc.cbWndExtra; 
 g_pfnWndProcOrig = wc.lpfnWndProc; 
 // Add four window extra bytes to account for additional edit data.
 wc.cbWndExtra += sizeof(LONG); 
 // Change lpfnWndProc, lpszClassName, and hInstance members, too.
 . 
 . 
 . 
 // Register the new window class. 
 return(RegisterClassEx(&wc));
}
LRESULT WINAPI NoDigitsClass_SuperClassWndProc (HWND hwnd, UINT uMsg, 
 WPARAM wParam, LPARAM lParam) { 
 int nNoDigitsData; 
 . 
 . 
 . 
 // Retrieve our data from the added window extra bytes. 
 nNoDigitsData = GetWindowLong(hwnd, GWL_NODIGITSDATA); 
 . 
 . 
 . 
 // Call base class window procedure for remainder of processing.
return(CallWindowProc(g_pfnWndProcOrig, hwnd, uMsg, wParam, lParam));
}

Listing One
/******************************************************************************
Module name: NoDigits.c
Written by: Jeffrey Richter
Notices: Copyright (c) 1995 Jeffrey Richter
Purpose: Demonstrates how to subclass a window
******************************************************************************/
#include "..\Win95ADG.h" /* See Appendix A for details */
#include <Windows.h>
#include <WindowsX.h>
#pragma warning(disable: 4001) /* Single-line comment */
#include "resource.h"
///////////////////////////////////////////////////////////////////////////////
// Function that converts an edit window to a NoDigitsClass window
// by subclassing the edit window
BOOL WINAPI NoDigitsClass_ConvertEdit (HWND hwnd, BOOL fSubclass);
///////////////////////////////////////////////////////////////////////////////
BOOL NoDigits_OnInitDialog (HWND hwnd, HWND hwndFocus, LPARAM lParam) {
 adgSETDLGICONS(hwnd, IDI_SUBCLASS, IDI_SUBCLASS);
 // Turn the regular edit windows into NoDigits windows by subclassing them.
 NoDigitsClass_ConvertEdit(GetDlgItem(hwnd, IDC_NODIGITS1), TRUE);
 NoDigitsClass_ConvertEdit(GetDlgItem(hwnd, IDC_NODIGITS2), TRUE);
 return(TRUE); // Accepts default focus window.
}
///////////////////////////////////////////////////////////////////////////////
void NoDigits_OnCommand (HWND hwnd, int id, HWND hwndCtl, UINT codeNotify) {

 switch (id) {
 case IDCANCEL: // Allows dialog box to close.
 // Turn the NoDigits windows back into regular Edit windows.
 NoDigitsClass_ConvertEdit(GetDlgItem(hwnd, IDC_NODIGITS1), FALSE);
 NoDigitsClass_ConvertEdit(GetDlgItem(hwnd, IDC_NODIGITS2), FALSE);
 EndDialog(hwnd, id);
 break;
 } 
}
///////////////////////////////////////////////////////////////////////////////
BOOL WINAPI NoDigits_DlgProc (HWND hwnd, UINT uMsg, 
 WPARAM wParam, LPARAM lParam) {
 switch (uMsg) {
 // Standard Windows messages
 adgHANDLE_DLGMSG(hwnd, WM_INITDIALOG, NoDigits_OnInitDialog);
 adgHANDLE_DLGMSG(hwnd, WM_COMMAND, NoDigits_OnCommand);
 }
 return(FALSE); // We didn't process the message.
}
///////////////////////////////////////////////////////////////////////////////
int WINAPI WinMain (HINSTANCE hinstExe, HINSTANCE hinstPrev, 
 LPSTR lpszCmdLine, int nCmdShow) {
 adgWARNIFUNICODEUNDERWIN95();
 adgVERIFY(-1 != DialogBox(hinstExe, MAKEINTRESOURCE(IDD_NODIGITS),
 NULL, NoDigits_DlgProc));
 return(0);
}
//////////////////////////////// End of File //////////////////////////////////

Listing Two
/******************************************************************************
Module name: NoDigCls.c
Written by: Jeffrey Richter
Notices: Copyright (c) 1995 Jeffrey Richter
Purpose: NoDigits subclass child control implementation file
******************************************************************************/
#include "..\Win95ADG.h" /* See Appendix A for details */
#include <Windows.h>
#include <WindowsX.h>
#pragma warning(disable: 4001) /* Single-line comment */
#include "resource.h"
///////////////////////////////////////////////////////////////////////////////
static LPCTSTR g_szNoDigitsClassWndProcOrig = 
 __TEXT("NoDigitsClassWndProcOrig");
///////////////////////////////////////////////////////////////////////////////
// The sole purpose of this function is to call the base class's window 
// procedure. When using the FORWARD_* message crackers, a function like
// this one is necessary because the last parameter to a FORWARD_* message
// cracker is a function with the standard WNDPROC function prototype,
// whereas CallWindowProc requires a fifth parameter - the address of the
// window procedure to call.
LRESULT NoDigitsClass_CallOrigWndProc (HWND hwnd, UINT uMsg, 
 WPARAM wParam, LPARAM lParam) {
 // Call the base class's window procedure. It was saved as a window
 // property by the NoDigitsClass_ConvertEdit function.
 return(CallWindowProc(
 (WNDPROC) (DWORD) GetProp(hwnd, g_szNoDigitsClassWndProcOrig),
 hwnd, uMsg, wParam, lParam));
}

///////////////////////////////////////////////////////////////////////////////
void NoDigitsClass_OnChar (HWND hwnd, TCHAR ch, int cRepeat) {
 if (adgINRANGE(__TEXT('0'), ch, __TEXT('9'))) {
 // Beep when a digit is received.
 MessageBeep(0);
 } else {
 
 // Allow nondigits to pass through to the original window procedure.
 FORWARD_WM_CHAR(hwnd, ch, cRepeat, NoDigitsClass_CallOrigWndProc);
 }
}
///////////////////////////////////////////////////////////////////////////////
// This function processes all messages sent to the NoDigits windows.
LRESULT WINAPI NoDigitsClass_WndProc (HWND hwnd, UINT uMsg,
 WPARAM wParam, LPARAM lParam) {
 switch (uMsg) {
 HANDLE_MSG(hwnd, WM_CHAR, NoDigitsClass_OnChar);
 }
 return(NoDigitsClass_CallOrigWndProc(hwnd, uMsg, wParam, lParam));
}
///////////////////////////////////////////////////////////////////////////////
// Function that converts an edit window to a NoDigitsClass window
// by subclassing the edit window.
BOOL WINAPI NoDigitsClass_ConvertEdit (HWND hwnd, BOOL fSubclass) {
 BOOL fOk = FALSE;
 if (fSubclass) {
 fOk = SetProp(hwnd, g_szNoDigitsClassWndProcOrig,
 (HANDLE) (DWORD) SubclassWindow(hwnd, NoDigitsClass_WndProc));
 } else {
 WNDPROC wp = (WNDPROC) (DWORD) 
 RemoveProp(hwnd, g_szNoDigitsClassWndProcOrig);
 SubclassWindow(hwnd, wp);
 fOk = (wp != NULL); 
 }
 return(fOk);
}
//////////////////////////////// End of File //////////////////////////////////

Listing Three
//Microsoft Visual C++ generated resource script. 
//
#include "resource.h"
#define APSTUDIO_READONLY_SYMBOLS
/////////////////////////////////////////////////////////////////////////////
// Generated from the TEXTINCLUDE 2 resource.
//
#include "windows.h"
/////////////////////////////////////////////////////////////////////////////
#undef APSTUDIO_READONLY_SYMBOLS
/////////////////////////////////////////////////////////////////////////////
// Icon
//
IDI_SUBCLASS ICON DISCARDABLE "NoDigits.Ico"
/////////////////////////////////////////////////////////////////////////////
// Dialog
//
IDD_SUBCLASS DIALOG DISCARDABLE -32768, 5, 192, 79
STYLE WS_MINIMIZEBOX | WS_VISIBLE | WS_CAPTION | WS_SYSMENU
CAPTION "No Digits"

FONT 8, "MS Sans Serif"
BEGIN
 LTEXT "&Edit:",IDC_STATIC,7,8,15,8
 EDITTEXT IDC_EDIT,43,8,140,13,ES_AUTOHSCROLL
 LTEXT "NoDigit &1:",IDC_STATIC,7,24,34,8
 EDITTEXT IDC_NODIGITS1,43,24,140,13,ES_AUTOHSCROLL
 LTEXT "NoDigit &2:",IDC_STATIC,7,40,34,8
 EDITTEXT IDC_NODIGITS2,43,40,140,13,ES_AUTOHSCROLL
 DEFPUSHBUTTON "OK",1,36,60,50,14
 PUSHBUTTON "Cancel",2,104,60,50,14
END
#ifdef APSTUDIO_INVOKED
/////////////////////////////////////////////////////////////////////////////
// TEXTINCLUDE
//
1 TEXTINCLUDE DISCARDABLE 
BEGIN
 "resource.h\0"
END
2 TEXTINCLUDE DISCARDABLE 
BEGIN
 "#include ""windows.h""\r\n"
 "\0"
END
3 TEXTINCLUDE DISCARDABLE 
BEGIN
 "\r\n"
 "\0"
END
/////////////////////////////////////////////////////////////////////////////
#endif // APSTUDIO_INVOKED
#ifndef APSTUDIO_INVOKED
/////////////////////////////////////////////////////////////////////////////
// Generated from the TEXTINCLUDE 3 resource.
//
/////////////////////////////////////////////////////////////////////////////
#endif // not APSTUDIO_INVOKED

Listing Four
//{{NO_DEPENDENCIES}}
// Microsoft Visual C++ generated include file.
// Used by NoDigits.RC
//
#define IDD_NODIGITS 104
#define IDD_SUBCLASS 104
#define IDI_NODIGITS 105
#define IDI_SUBCLASS 105
#define IDC_EDIT 1000
#define IDC_NODIGITS1 1004
#define IDC_NODIGITS2 1005
#define IDC_STATIC -1
// Next default values for new objects
// 
#ifdef APSTUDIO_INVOKED
#ifndef APSTUDIO_READONLY_SYMBOLS
#define _APS_NEXT_RESOURCE_VALUE 101
#define _APS_NEXT_COMMAND_VALUE 40001
#define _APS_NEXT_CONTROL_VALUE 1002
#define _APS_NEXT_SYMED_VALUE 101

#endif
#endif
End Listings




























































RAMBLINGS IN REAL TIME


3-D Clipping and Other Thoughts




Michael Abrash


Michael is the author of Zen of Graphics Programming and Zen of Code
Optimization. He is currently pushing the envelope of real-time 3-D on Quake
at id Software. He can be contacted at mikeab@idsoftware.com.


Our part of the world is changing, and I'm concerned. By way of explanation,
three anecdotes. Anecdote the first: In the introduction to one of his books,
Frank Herbert, author of Dune, told how he had once been approached by a
friend who claimed he (the friend) had a killer idea for a science fiction
story. In return for the idea, Herbert had to agree that if he used it in a
story, he'd split the money from the story with this fellow. Herbert's
response was that ideas were a dime a dozen; he had more story ideas than he
could ever write in a lifetime. The hard part was the writing.
Anecdote the second: I've been programming micros for 15 years, and writing
about them for more than a decade, and until about a year ago, I had
never--not once--had anyone offer to sell me a technical idea. In the last
year, it has happened multiple times, generally via unsolicited e-mail.
This trend toward selling ideas reflects a growing desire among programmers to
think something up without breaking a sweat, then let someone else's hard work
make them money. (Software patents are a manifestation of this.) It stems from
the attitude that "I'm so smart that my ideas alone set me apart." Sorry, it
doesn't work that way in the real world. Ideas are a dime a dozen in
programming, too; I have a lifetime's worth of article and software ideas
written neatly in a notebook, and I know several original thinkers who have
far more. Folks, it's not the ideas; it's design, implementation, and
especially hard work that make the difference.
Almost every idea I've encountered in three-dimensional (3-D) graphics was
invented decades ago. You think you have a clever graphics idea? Sutherland,
Sproull, Schumacker, Catmull, Smith, Blinn, Glassner, Kajiya, Heckbert, or
Teller probably thought of your idea years ago. (I'm serious--spend a few
weeks reading through the literature on 3-D graphics, and you'll be amazed at
what's already been invented and published.) If they thought it was important
enough, they wrote a paper about it, or tried to commercialize it. What they
didn't do was try to charge people for the idea itself.
A closely related point is the astonishing lack of gratitude some programmers
show for the hard work and sense of community that went into building the
knowledge base with which they work. How about this? Anyone who thinks they
have a unique idea that they want to "own" and milk for money can do so--but
first they have to track down and appropriately compensate all the people who
made possible the compilers, algorithms, programming courses, books, hardware,
and so forth that put them in a position to have their brainstorm.
Put that way, it sounds like a silly idea, but the idea behind software
patents is precisely that. Eventually everyone will own parts of our communal
knowledge base, and programming will become mainly a process of properly
identifying and compensating each and every owner of the techniques you use.
All I can say is that if we do go down that path, I guarantee that it will be
a poorer profession for all of us--except for the patent attorneys, I guess.
Anecdote the third: A while ago, I had the good fortune to have lunch down by
Seattle's waterfront with Neal Stephenson, the author of Snow Crash and The
Diamond Age (one of the best science fiction books I've come across in a long
time). As he talked about the nature of networked technology and what he hoped
would emerge, he mentioned that a couple of blocks down the street was the
pawn shop where Jimi Hendrix bought his first guitar. His point was that if a
cheap guitar hadn't been available, Hendrix's unique talent would never have
emerged. Similarly, he views the networking of society as a way to get
affordable creative tools to many people, so as much talent as possible can be
unearthed and developed.
Extend that to programming. A steady flow of information should circulate, so
that everyone can do the best work they're capable of. We don't gain by
intellectually impoverishing each other; as we compete and (intentionally or
otherwise) share ideas, all our products become better, so the market grows
larger and everyone benefits.
That's the way things have worked in programming for a long time. As far as I
can see, it has worked remarkably well, and the recent signs of change make me
concerned about the future of our profession.
Things aren't changing everywhere, though; over the past year, I've circulated
a good bit of info about 3-D graphics in this space, and plan to keep on doing
it as long as I can. This month, we'll take a look at 3-D clipping.


3-D Clipping Basics 


Before I got deeply into 3-D, I kept hearing how difficult 3-D clipping was,
so I was pleasantly surprised when I actually got around to doing it and found
it quite straightforward. At heart, 3-D clipping is nothing more than
evaluating whether and where a line intersects a plane. In this context, the
plane is considered to have an "inside" (a side on which points are to be
kept) and an "outside" (a side on which points are to be removed, or clipped).
We can easily extend this single operation to polygon clipping, working with
the line segments that form the edges of a polygon.
The most common application of 3-D clipping is in hidden-surface removal. In
this application, the four planes that make up the view volume (or view
frustum) are used to clip away parts of polygons that aren't visible.
Sometimes this process includes clipping to near and far planes, as well, to
restrict the depth of the scene. Other applications include clipping to
splitting planes while building BSP trees, and clipping moving objects to
convex sectors such as BSP leaves. The clipping principles I'll cover apply to
any sort of 3-D clipping task, but clipping to the frustum is the specific
context in which I'll discuss clipping here.
In a commercial application, you wouldn't want to clip every single polygon in
the scene database individually. As I mentioned last time, the use of bounding
volumes to cull chunks of the scene database that fall entirely outside the
frustum, without having to consider each polygon separately, is an important
performance aspect of scene rendering. Once that's done, however, you're still
left with a set of polygons that may be entirely inside, or partially or
completely outside, the frustum. Today, I'm going to talk about how to clip
those remaining polygons. I'll focus on the basics of 3-D clipping, the stuff
I wish I'd known when I started doing 3-D; there are plenty of ways to speed
up clipping under various circumstances, some of which I'll mention here, but
the material I'll cover will give you the tools you need to implement
functional 3-D clipping.


Intersecting a Line Segment with a Plane


The fundamental 3-D clipping operation is clipping a line segment to a plane.
There are two parts to this operation: determining if the line is clipped by
(intersects) the plane at all, and, if it is clipped, calculating the point of
intersection.
Before we can intersect a line segment with a plane, we must first define how
we'll represent the line segment and the plane. The segment will be
represented in the obvious way, by the (x,y,z) coordinates of its two
endpoints; this extends well to polygons, where each vertex is an (x,y,z)
point. Planes can be described in many ways, among them three points on the
plane, a point on the plane and a unit normal, or a unit normal and a distance
from the origin along the normal; we'll use the latter definition. Further,
we'll define the normal to point to the inside (unclipped) side of the plane.
The structures for points, lines, polygons, and planes are shown in Listing
One.
Given a line segment and a plane to which to clip the segment, the first
question is whether the segment is entirely on the inside or the outside of
the plane, or intersects the plane. If the segment is on the inside, then the
segment is not clipped by the plane, and we're done. If it's on the outside,
then it's entirely clipped, and we're likewise done. If it intersects the
plane, then we have to remove the clipped portion of the line by replacing the
endpoint on the outside of the plane with the point of intersection between
the line and the plane.
The way to determine the segment's position is to find out which side of the
plane each endpoint is on, and the dot product is the right tool for this job.
As you may recall from my column entitled "Frames of Reference" (Dr. Dobb's
Sourcebook, September/October 1995), dotting any vector with a unit normal
returns the length of the projection of that vector onto the normal.
Therefore, if we take any point and dot it with the plane normal, we'll find
out how far from the origin the point is, as measured along the plane normal.
Another way to think of this is to say that the dot product of a point and the
plane normal returns how far from the origin along the normal the plane would
have to be in order to have the point lie within the plane, as if we slid the
plane along the normal until it touched the point.
Remember that our definition of a plane is a unit normal and a distance along
the normal. That means that we have a distance for the plane as part of the
plane structure, and the distance at which the plane would have to be to touch
the point can be determined from the dot product of the point and the normal;
a simple comparison of the two values suffices to tell us which side of the
plane the point is on. If the dot product of the point and the plane normal is
greater than the plane distance, then the point is in front of the plane
(inside the volume being clipped to); if it's less, the point is outside the
volume and should be clipped.
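In code, that inside/outside test is one dot product and a compare. Here is a
minimal C sketch, using hypothetical point3_t/plane_t structures (the
article's actual structures appear in its Listing One):

```c
typedef struct { double x, y, z; } point3_t;
/* Plane: unit normal pointing to the inside, plus a distance along it. */
typedef struct { point3_t normal; double distance; } plane_t;

/* Dot product; points are treated as vectors from the origin. */
static double dot3(point3_t a, point3_t b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Nonzero if the point is on the inside (kept) side of the plane.
   Points exactly on the plane are counted as inside. */
static int point_inside_plane(point3_t p, plane_t pl) {
    return dot3(p, pl.normal) >= pl.distance;
}
```

For example, for the plane x == 1 with inward-facing normal (-1, 0, 0), the
stored plane distance is -1; the origin dots to 0, which is greater than -1,
so the origin is inside.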
After we do this twice, once for each line endpoint, we know everything
necessary to categorize our line segment. If both endpoints are on the same
side of the plane, there's nothing more to do, because the line is either
completely inside or completely outside. Otherwise, it's on to the next step:
clipping the line to the plane by replacing the outside vertex with the point
of intersection of the line and the plane. Happily, it turns out that we
already have all of the information we need to do this.
From our earlier tests, we already know the length from the plane, measured
along the normal, to the inside endpoint; that's just the distance, along the
normal, of the inside endpoint from the origin (the dot product of the
endpoint with the normal), minus the plane distance, as in Figure 1. We also
know the length of the line segment, again measured as projected onto the
normal; that's the difference between the distances along the normal of the
inside and outside endpoints from the origin. The ratio of these two lengths
is the fraction of the segment that remains after clipping. If we scale the x,
y, and z lengths of the line segment by that fraction, and add the results to
the inside endpoint, we get a new, clipped endpoint at the point of
intersection.
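Here is that intersection calculation as a small C sketch (again with
hypothetical point3_t/plane_t structures; the plane's unit normal points
inside, and we assume the caller has already determined which endpoint is
inside):

```c
typedef struct { double x, y, z; } point3_t;
/* Plane: unit normal pointing to the inside, plus a distance along it. */
typedef struct { point3_t normal; double distance; } plane_t;

static double dot3(point3_t a, point3_t b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Replace the outside endpoint with the segment/plane intersection.
   Assumes 'inside' is in front of the plane and 'outside' is behind it. */
static point3_t clip_to_plane(point3_t inside, point3_t outside, plane_t pl) {
    double dist_in  = dot3(inside,  pl.normal) - pl.distance;  /* >= 0 */
    double dist_out = dot3(outside, pl.normal) - pl.distance;  /* <  0 */
    /* Fraction of the segment, measured from the inside end, that is kept. */
    double fraction = dist_in / (dist_in - dist_out);
    point3_t clipped;
    clipped.x = inside.x + (outside.x - inside.x) * fraction;
    clipped.y = inside.y + (outside.y - inside.y) * fraction;
    clipped.z = inside.z + (outside.z - inside.z) * fraction;
    return clipped;
}
```

For the plane x == 1 (normal (-1,0,0), distance -1) and a segment from
(0,0,0) to (2,0,0), dist_in is 1 and dist_out is -1, so the fraction is 0.5
and the clipped endpoint lands exactly at (1,0,0).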


Polygon Clipping


Line clipping is fine for wireframe rendering, but what we want is polygon
rendering of solid models, which requires polygon clipping. As with line
segments, the clipping process for polygons is to determine if they're inside,
outside, or partially inside the clip volume, lop off any vertices that are
outside the clip volume, and substitute vertices at the intersection between
the polygon and the clip plane, as in Figure 2.
An easy way to clip a polygon is to decompose it into a set of edges, and clip
each edge separately as a line segment. Let's define a polygon as a set of
vertices that wind clockwise around the outside of the polygonal area, as
viewed from the front side of the polygon; the edges are implicitly defined by
the order of the vertices. Thus, an edge is the line segment described by the
two adjacent vertices that form its endpoints. We'll clip a polygon by
clipping each edge individually, emitting vertices for the resulting polygon
as appropriate, depending on the clipping state of the edge. If the start
point of the edge is inside, that point is added to the output polygon. Then,
if the start and end points are in different states (one inside and one
outside), we clip the edge to the plane, as described earlier, and add the
point at which the line intersects the clip plane as the next polygon vertex;
see Figure 3. Listing Two presents a polygon-clipping function.
Believe it or not, this technique, applied in turn to each edge, is all that's
needed to clip a polygon to a plane. Better yet, a polygon can be clipped to
multiple planes by repeating the aforementioned process once for each clip
plane, with each iteration trimming away any part of the polygon that's
clipped by that particular plane.
One particularly useful aspect of 3-D clipping is that if you're drawing
texture-mapped polygons, texture coordinates can be clipped in exactly the
same way as (x,y,z) coordinates. In fact, the same fraction that's used to
advance x, y, and z from the inside point to the point of intersection with
the clip plane can be used to advance the texture coordinates as well, so only
one extra multiply and one extra add are required for each texture coordinate.


Clipping to the Frustum



Given a polygon-clipping function, it's easy to clip to the frustum: Set up
the four planes for the sides of the frustum, with another one or two planes
for near and far clipping, if desired; clip each potentially visible polygon
to each plane in turn; and draw whatever polygons emerge from the clipping
process. Listing Three is the core code for a simple 3-D clipping example that
allows you to move around and look at polygonal models from any angle; full
code for this program is available from ftp.idsoftware.com/mikeab/ddjclip.zip.
There are several interesting points about Listing Three:
Floating-point arithmetic is used throughout the clipping process. It's
possible to use fixed point, but it requires considerable care regarding range
and precision. Floating point is much easier--and, with the Pentium generation
of processors, is generally comparable in speed. In fact, for some operations
such as multiplication in general, and division when the floating-point unit
is in single-precision mode, floating point is much faster; check out Chris
Hecker's column in the February 1996 Game Developer for an interesting
discussion along these lines.
The planes that form the frustum are shifted ever so slightly inward from
their proper positions at the edge of the field of view. This is done for two
reasons. First, tightening the frustum a little prevents floating-point
roundoff and accumulated error from ever producing a screen coordinate that is
off the screen; in effect, a small safety margin is created at the edge of the
screen. Otherwise, we might have to clamp the screen coordinates after
projection. Second, it guarantees, again at no performance cost, that a
visible vertex exactly at the eyepoint is never generated. This averts the
divide-by-zero error that such a vertex would cause when projected.
The orientation of the viewer relative to the world is specified via roll,
pitch, and yaw angles, applied successively, in that order. These angles are
accumulated from frame to frame according to user input, and are used (for
each frame) to rotate the view up, view right, and viewplane normal vectors,
which define the world coordinate system, into the viewspace coordinate
system. Those transformed vectors in turn define the rotation from worldspace
to viewspace. (Again, see my September/October 1995 column for a discussion of
coordinate systems and rotation, and take a look at chapters 5 and 6 of
Computer Graphics: Principles and Practice, by James Foley and Andries van
Dam, Addison-Wesley, 1992.) One attractive aspect of accumulating angular
rotations that are then applied to the coordinate system vectors is that there
is no deterioration of the rotation matrix over time. This is in contrast to
my X-Sharp package, in which I accumulated rotations by keeping a cumulative
matrix of all the rotations ever performed; unfortunately, that approach
caused round-off error to accumulate, so objects began to warp visibly after
many rotations.
Listing Three processes each input polygon into a clipped polygon, one line
segment at a time. It would be more efficient to process all the vertices,
categorizing whether and how they're clipped, and then perform a test such as
the Cohen-Sutherland outcode test to detect trivial acceptance (the polygon is
entirely inside) and sometimes trivial rejection (the polygon is fully
outside) without ever dealing with the edges, and to identify which planes
actually need to be clipped against, as discussed in "Line-Segment Clipping
Revisited," by Victor Duvanenko et al. (Dr. Dobb's Journal, January 1996).
Some clipping approaches also minimize the number of intersection calculations
when a segment is clipped by multiple planes. Further, Listing Three clips a
polygon against each plane in turn, generating a new output polygon for each
plane; it is possible and can be more efficient to generate the final, clipped
polygon without any intermediate representations. For further reading on
advanced clipping techniques, see the discussion starting on page 271 of Foley
and van Dam.
Clipping in Listing Three is performed in worldspace, rather than in
viewspace. The frustum is backtransformed from viewspace (where it is defined,
since it exists relative to the viewer) to worldspace for this purpose.
Worldspace clipping allows us to transform only those vertices that are
visible, rather than transforming all vertices into viewspace, and then
clipping them. However, the decision whether to clip in worldspace or
viewspace is not clear cut, and is affected by several factors.


Advantages of Viewspace Clipping


Although viewspace clipping requires transforming vertices that may not be
drawn, it has potential performance advantages. For example, in worldspace,
near and far clip planes are just additional planes, which have to be tested
and clipped to using dot products. In viewspace, near and far clip planes are
typically planes with constant z-coordinates, so testing whether a vertex is
near or far clipped can be performed with a single z compare, and the
fractional distance along a line segment to a near or far clip intersection
can be calculated with a couple of z subtractions and a divide; no dot
products are needed.
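As a tiny sketch of that viewspace shortcut (assuming viewspace z grows with
depth; z_near is a hypothetical near-plane distance), the intersection
fraction really is just two subtractions and a divide:

```c
/* Fraction along the segment from p0 to p1 (given their viewspace z values)
   at which it crosses the plane z == z_near; no dot product required. */
static double near_clip_fraction(double z0, double z1, double z_near) {
    return (z_near - z0) / (z1 - z0);
}
```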
Similarly, if the field of view is exactly 90 degrees, so the frustum planes
go out at 45-degree angles relative to the viewplane, then x==z and y==z along
the clip planes. This means that the clipping status of a vertex can be
determined with a simple comparison, far more quickly than the standard
dot-product test. This lends itself particularly well to outcode-based
clipping algorithms, since each compare can set one outcode bit.
For a game, 90 degrees is a pretty good field of view, but can we get the same
sort of efficient clipping if we need some other field of view? Sure. All we
have to do is scale the x and y results of the world-to-view transformation to
account for the field of view, so that the coordinates lie in a viewspace
that's normalized such that the frustum planes extend along lines of x==z and
y==z. The resulting visible projected points span the range -1 to 1 (before
scaling up to get pixel coordinates), just as with a 90-degree FOV, so the
rest of the drawing pipeline remains unchanged. Better yet, there is no cost
in performance, because the adjustment can be added to the transformation
matrix.
I didn't implement normalized clipping in Listing Three because I wanted to
illustrate the general 3-D clipping mechanism without additional
complications. Also, for many applications, the dot product (which, after all,
takes only 10-20 cycles on a Pentium) is sufficient. However, the more frustum
clipping you're doing, especially if most of the polygons are trivially
visible, the more attractive the performance advantages of normalized clipping
become.


Further Reading


You now have the basics of 3-D clipping, but because fast clipping is central
to high-performance 3-D, there's a lot more to be learned. One good place for
further reading is Foley and van Dam; another is Procedural Elements for
Computer Graphics, by David F. Rogers (McGraw-Hill, 1985). Read and understand
either of these books, and you'll know everything you need for world-class
clipping.
And, as you read, you might take a moment to consider how wonderful it is that
anyone who's interested can tap into so much expert knowledge for the price of
a book--or, on the Internet, for free--with no strings attached. Our part of
the world is a pretty good place right now, isn't it?
Figure 1: point_clip = point_inside + (point_outside - point_inside) x
((distance_plane - distance_inside) / (distance_outside - distance_inside)).
Figure 2: Clipping a polygon.
Figure 3: Clipping a polygon edge.

Listing One
typedef struct {
 double v[3];
} point_t;
typedef struct {
 double x, y;
} point2D_t;
typedef struct {
 int color;
 int numverts;
 point_t verts[MAX_POLY_VERTS];
} polygon_t;
typedef struct {
 int color;
 int numverts;
 point2D_t verts[MAX_POLY_VERTS];
} polygon2D_t;
typedef struct convexobject_s {
 struct convexobject_s *pnext;
 point_t center;
 double vdist;
 int numpolys;
 polygon_t *ppoly;
} convexobject_t;
typedef struct {
 double distance;
 point_t normal;
} plane_t;

Listing Two
int ClipToPlane(polygon_t *pin, plane_t *pplane, polygon_t *pout)
{
 int i, j, nextvert, curin, nextin;
 double curdot, nextdot, scale;
 point_t *pinvert, *poutvert;
 pinvert = pin->verts;
 poutvert = pout->verts;
 curdot = DotProduct(pinvert, &pplane->normal);
 curin = (curdot >= pplane->distance);
 for (i=0 ; i<pin->numverts ; i++)
 {
 nextvert = (i + 1) % pin->numverts;
 // Keep the current vertex if it's inside the plane
 if (curin)
 *poutvert++ = *pinvert;
 nextdot = DotProduct(&pin->verts[nextvert], &pplane->normal);
 nextin = (nextdot >= pplane->distance);
 // Add a clipped vertex if one end of the current edge is
 // inside the plane and the other is outside
 if (curin != nextin)
 {
 scale = (pplane->distance - curdot) /
 (nextdot - curdot);
 for (j=0 ; j<3 ; j++)
 {
 poutvert->v[j] = pinvert->v[j] +
 ((pin->verts[nextvert].v[j] - pinvert->v[j]) *
 scale);
 }
 poutvert++;
 }
 curdot = nextdot;
 curin = nextin;
 pinvert++;
 }
 pout->numverts = poutvert - pout->verts;
 if (pout->numverts < 3)
 return 0;
 pout->color = pin->color;
 return 1;
}

Listing Three
int DIBWidth, DIBHeight;
int DIBPitch;
double roll, pitch, yaw;
double currentspeed;
point_t currentpos;
double fieldofview, xcenter, ycenter;
double xscreenscale, yscreenscale, maxscale;
int numobjects;
double speedscale = 1.0;
plane_t frustumplanes[NUM_FRUSTUM_PLANES];
double mroll[3][3] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
double mpitch[3][3] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
double myaw[3][3] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
point_t vpn, vright, vup;
point_t xaxis = {1, 0, 0};
point_t zaxis = {0, 0, 1};
convexobject_t objecthead = {NULL, {0,0,0}, -999999.0};

// Project viewspace polygon vertices into screen coordinates. Note that the
// y axis goes up in worldspace and viewspace, but goes down in screenspace.
void ProjectPolygon (polygon_t *ppoly, polygon2D_t *ppoly2D)
{
 int i;
 double zrecip;
 for (i=0 ; i<ppoly->numverts ; i++)
 {
 zrecip = 1.0 / ppoly->verts[i].v[2];
 ppoly2D->verts[i].x =
 ppoly->verts[i].v[0] * zrecip * maxscale + xcenter;
 ppoly2D->verts[i].y = DIBHeight -
 (ppoly->verts[i].v[1] * zrecip * maxscale + ycenter);
 }
 ppoly2D->color = ppoly->color;
 ppoly2D->numverts = ppoly->numverts;
}
// Sort the objects according to distance from the viewpoint.
void ZSortObjects(void)
{
 int i, j;
 double vdist;
 convexobject_t *pobject;
 point_t dist;
 objecthead.pnext = &objecthead;
 for (i=0 ; i<numobjects ; i++)
 {
 for (j=0 ; j<3 ; j++)
 dist.v[j] = objects[i].center.v[j] - currentpos.v[j];
 objects[i].vdist = sqrt(dist.v[0] * dist.v[0] +
 dist.v[1] * dist.v[1] +
 dist.v[2] * dist.v[2]);
 pobject = &objecthead;
 vdist = objects[i].vdist;
 // Viewspace-distance-sort this object into the others.
 // Guaranteed to terminate because of sentinel
 while (vdist < pobject->pnext->vdist)
 pobject = pobject->pnext;
 objects[i].pnext = pobject->pnext;
 pobject->pnext = &objects[i];
 }
}
// Move the view position and set the world->view transform.
void UpdateViewPos()
{
 int i;
 point_t motionvec;
 double s, c, mtemp1[3][3], mtemp2[3][3];
 // Move in the view direction, across the x-y plane, as if walking. This
 // approach moves slower when looking up or down at more of an angle
 motionvec.v[0] = DotProduct(&vpn, &xaxis);
 motionvec.v[1] = 0.0;
 motionvec.v[2] = DotProduct(&vpn, &zaxis);
 for (i=0 ; i<3 ; i++)
 {
 currentpos.v[i] += motionvec.v[i] * currentspeed;
 if (currentpos.v[i] > MAX_COORD)
 currentpos.v[i] = MAX_COORD;
 if (currentpos.v[i] < -MAX_COORD)
 currentpos.v[i] = -MAX_COORD;
 }
 // Set up the world-to-view rotation.
 // Note: much of the work done in concatenating these matrices
 // can be factored out, since it contributes nothing to the
 // final result; multiply the three matrices together on paper
 // to generate a minimum equation for each of the 9 final elements
 s = sin(roll);
 c = cos(roll);
 mroll[0][0] = c;
 mroll[0][1] = s;
 mroll[1][0] = -s;
 mroll[1][1] = c;
 s = sin(pitch);
 c = cos(pitch);
 mpitch[1][1] = c;
 mpitch[1][2] = s;
 mpitch[2][1] = -s;
 mpitch[2][2] = c;
 s = sin(yaw);
 c = cos(yaw);
 myaw[0][0] = c;
 myaw[0][2] = -s;
 myaw[2][0] = s;
 myaw[2][2] = c;
 MConcat(mroll, myaw, mtemp1);
 MConcat(mpitch, mtemp1, mtemp2);
 // Break out the rotation matrix into vright, vup, and vpn.
 // We could work directly with the matrix; breaking it out
 // into three vectors is just to make things clearer
 for (i=0 ; i<3 ; i++)
 {
 vright.v[i] = mtemp2[0][i];
 vup.v[i] = mtemp2[1][i];
 vpn.v[i] = mtemp2[2][i];
 }
 // Simulate crude friction
 if (currentspeed > (MOVEMENT_SPEED * speedscale / 2.0))
 currentspeed -= MOVEMENT_SPEED * speedscale / 2.0;
 else if (currentspeed < -(MOVEMENT_SPEED * speedscale / 2.0))
 currentspeed += MOVEMENT_SPEED * speedscale / 2.0;
 else
 currentspeed = 0.0;
}
// Rotate a vector from viewspace to worldspace.
void BackRotateVector(point_t *pin, point_t *pout)
{
 int i;
 // Rotate into the world orientation
 for (i=0 ; i<3 ; i++)
 pout->v[i] = pin->v[0] * vright.v[i] +
 pin->v[1] * vup.v[i] +
 pin->v[2] * vpn.v[i];
}
// Transform a point from worldspace to viewspace.
void TransformPoint(point_t *pin, point_t *pout)
{
 int i;
 point_t tvert;

 // Translate into a viewpoint-relative coordinate
 for (i=0 ; i<3 ; i++)
 tvert.v[i] = pin->v[i] - currentpos.v[i];
 // Rotate into the view orientation
 pout->v[0] = DotProduct(&tvert, &vright);
 pout->v[1] = DotProduct(&tvert, &vup);
 pout->v[2] = DotProduct(&tvert, &vpn);
}
// Transform a polygon from worldspace to viewspace.
void TransformPolygon(polygon_t *pinpoly, polygon_t *poutpoly)
{
 int i;
 for (i=0 ; i<pinpoly->numverts ; i++)
 TransformPoint(&pinpoly->verts[i], &poutpoly->verts[i]);
 poutpoly->color = pinpoly->color;
 poutpoly->numverts = pinpoly->numverts;
}
// Returns true if polygon faces the viewpoint, assuming a clockwise
// winding of vertices as seen from the front.
int PolyFacesViewer(polygon_t *ppoly)
{
 int i;
 point_t viewvec, edge1, edge2, normal;
 for (i=0 ; i<3 ; i++)
 {
 viewvec.v[i] = ppoly->verts[0].v[i] - currentpos.v[i];
 edge1.v[i] = ppoly->verts[0].v[i] - ppoly->verts[1].v[i];
 edge2.v[i] = ppoly->verts[2].v[i] - ppoly->verts[1].v[i];
 }
 CrossProduct(&edge1, &edge2, &normal);
 if (DotProduct(&viewvec, &normal) > 0)
 return 1;
 else
 return 0;
}
// Set up a clip plane with the specified normal.
void SetWorldspaceClipPlane(point_t *normal, plane_t *plane)
{
 // Rotate the plane normal into worldspace
 BackRotateVector(normal, &plane->normal);
 plane->distance = DotProduct(&currentpos, &plane->normal) + 
 CLIP_PLANE_EPSILON;
}
// Set up the planes of the frustum, in worldspace coordinates.
void SetUpFrustum(void)
{
 double angle, s, c;
 point_t normal;
 angle = atan(2.0 / fieldofview * maxscale / xscreenscale);
 s = sin(angle);
 c = cos(angle);
 // Left clip plane
 normal.v[0] = s;
 normal.v[1] = 0;
 normal.v[2] = c;
 SetWorldspaceClipPlane(&normal, &frustumplanes[0]);
 // Right clip plane
 normal.v[0] = -s;
 SetWorldspaceClipPlane(&normal, &frustumplanes[1]);

 angle = atan(2.0 / fieldofview * maxscale / yscreenscale);
 s = sin(angle);
 c = cos(angle);
 // Bottom clip plane
 normal.v[0] = 0;
 normal.v[1] = s;
 normal.v[2] = c;
 SetWorldspaceClipPlane(&normal, &frustumplanes[2]);
 // Top clip plane
 normal.v[1] = -s;
 SetWorldspaceClipPlane(&normal, &frustumplanes[3]);
}
// Clip a polygon to the frustum.
int ClipToFrustum(polygon_t *pin, polygon_t *pout)
{
 int i, curpoly;
 polygon_t tpoly[2], *ppoly;
 curpoly = 0;
 ppoly = pin;
 for (i=0 ; i<(NUM_FRUSTUM_PLANES-1); i++)
 {
 if (!ClipToPlane(ppoly, &frustumplanes[i], &tpoly[curpoly]))
 return 0;
 ppoly = &tpoly[curpoly];
 curpoly ^= 1;
 }
 return ClipToPlane(ppoly, &frustumplanes[NUM_FRUSTUM_PLANES-1], pout);
}
// Render the current state of the world to the screen.
void UpdateWorld()
{
 HPALETTE holdpal;
 HDC hdcScreen, hdcDIBSection;
 HBITMAP holdbitmap;
 polygon2D_t screenpoly;
 polygon_t *ppoly, tpoly0, tpoly1, tpoly2;
 convexobject_t *pobject;
 int i, j, k;
 UpdateViewPos();
 memset(pDIBBase, 0, DIBWidth*DIBHeight); // clear frame
 SetUpFrustum();
 ZSortObjects();
 // Draw all visible faces in all objects
 pobject = objecthead.pnext;
 while (pobject != &objecthead)
 {
 ppoly = pobject->ppoly;
 for (i=0 ; i<pobject->numpolys ; i++)
 {
 // Move the polygon relative to the object center
 tpoly0.color = ppoly->color;
 tpoly0.numverts = ppoly->numverts;
 for (j=0 ; j<tpoly0.numverts ; j++)
 {
 for (k=0 ; k<3 ; k++)
 tpoly0.verts[j].v[k] = ppoly->verts[j].v[k] +
 pobject->center.v[k];
 }
 if (PolyFacesViewer(&tpoly0))
 {
 if (ClipToFrustum(&tpoly0, &tpoly1))
 {
 TransformPolygon (&tpoly1, &tpoly2);
 ProjectPolygon (&tpoly2, &screenpoly);
 FillPolygon2D (&screenpoly);
 }
 }
 ppoly++;
 }
 pobject = pobject->pnext;
 }
 // We've drawn the frame; copy it to the screen
 hdcScreen = GetDC(hwndOutput);
 holdpal = SelectPalette(hdcScreen, hpalDIB, FALSE);
 RealizePalette(hdcScreen);
 hdcDIBSection = CreateCompatibleDC(hdcScreen);
 holdbitmap = SelectObject(hdcDIBSection, hDIBSection);
 BitBlt(hdcScreen, 0, 0, DIBWidth, DIBHeight, hdcDIBSection, 0, 0, SRCCOPY);
 SelectPalette(hdcScreen, holdpal, FALSE);
 ReleaseDC(hwndOutput, hdcScreen);
 SelectObject(hdcDIBSection, holdbitmap);
 ReleaseDC(hwndOutput, hdcDIBSection);
}
End Listings





































20/20


Lessons to Learn




Al Williams


Al is a consultant specializing in software development, training, and
documentation. Look for his latest book, Steal This Code! (Addison-Wesley,
1995). You can find Al at
http://ourworld.compuserve.com/homepages/Al_Williams.


What single accomplishment made you most proud? For me, it was teaching my
youngest son to read. It was gratifying as a parent, but it also was
interesting as a programmer. Watching a young mind acquire language, you
realize just how far current-technology computers still have to go to emulate
the human brain.
Another thing that always brings home the difference between computers and
brains is the name game. Not the banana-fana name game. This is a game you
play on long car trips. One player thinks of a famous person (say, "Ringo
Starr"). The next player then thinks of another famous person whose first name
starts with the same letter as the first person's last name (in this case,
maybe "Sonny Bono"). Play might continue with "Bob Barker," and so forth. This
can go on for hours, even with only two people. What's amazing is how many
people you can name. When you think of these people, you not only remember
their names, but you recall their pictures, a biography, why you know them,
what they sound like, and more. Imagine the amount of storage it would require
to put this much data in your computer. Then think of how many ways you can
quickly index into this data. Amazing.
Many people tout neural networks as a model for how our brains work. Perhaps,
but I think that model presents, at best, only part of the picture. Of course,
neural nets can be useful for certain kinds of pattern matching and other
tasks where we want the computer to "learn" in a human-like fashion. The
problem with neural networks is they are too much like humans--you have to
train them, and sometimes when they learn something new, they forget what they
used to know.
In this column, I'll show you a simple perceptron (neural network) object
using Visual Basic 4.0's new class facility. I'll use the class to build a
tic-tac-toe game that you teach--not program--to play. The main program and
the training program both share the same class, and the class is reusable in
any number of applications.


About Visual Basic Classes


Visual Basic 4.0 provides classes--reusable chunks of code that support data
abstraction. To create a new class, select Class module from the Insert menu.
This creates a .CLS file and adds it to your project. The code browser for the
class will show two pseudoobjects: Class and (General). The Class section
contains the Initialize and Terminate subroutines. VB calls these when you
create or destroy an instance of your class. All the other variables,
functions, and subroutines that constitute your class reside in the (General)
section.
When you declare a class object in your program, you can use the variable to
refer to an existing object or you can create a new object. Consider the
declarations in Example 1(a), where anObject uses the new keyword to create an
object. The current_object variable, on the other hand, is only a reference to
an object (it doesn't use the new keyword). Until you assign a value to
current_object with Set, it isn't a valid object. After the Set statement in
Example 1(a), anObject and current_object both refer to the same object.
You call functions and subroutines in your object using the period notation.
For example, if the DDJObject has a subroutine named Publish, you could call
it with anObject.Publish (or current_object.Publish). Although you may see
variables in an object, you'll usually use properties instead. Properties are
like super variables. To the program using the class, they appear as
variables. You can assign values to them or use them as values in expressions.
From the class's point of view, properties are code. When someone assigns a
value to the property, a special subroutine executes; when someone reads a
value from the property, VB executes a special function to return the result.
You define these routines with the property keyword. Example 1(b), for
example, defines a silly property that is always 100. If you stop there, you
have a read-only property. If you want to allow programs to set the property,
you need to define a Let property, as in Example 1(c).
You also can declare a Set property to assign properties that refer to other
objects. This is exactly like a Let property, but it assigns an object instead
of a simple variable type.
You can control access to data and procedures in your object. If you use the
Private keyword before the item, it will only be visible inside the class. If
you use the Public keyword (or no keyword at all), you can access the item
from anywhere in your VB program. Good object-oriented design dictates that
you should hide implementation details as private data, functions, and
subroutines.


Perceptron Basics


A perceptron is a simple form of neural network. The perceptron takes a fixed
number of inputs. These inputs are real numbers and represent the input's
level or degree of confidence. The perceptron uses a matrix to transform the
inputs to a fixed number of outputs. The output with the highest value is the
output the perceptron assigns for that input pattern. The transform (or rule)
matrix has as many rows as there are inputs and a column for each possible
output. To calculate an output value, multiply each input value by the
corresponding entry in that output's column and sum the products (see Figure 1).
Repeat this procedure for each output column.
If the matrix contains zeros or random values, you won't get the results you
want from the perceptron. However, you can train the matrix to produce the
results you want. The procedure is simple. Given a set of inputs, compare the
perceptron's output with the desired output. If they are the same, you don't
need to take any action. If they are different, you subtract the inputs from
the column in the matrix that corresponds to the wrong result and add the
inputs to the correct result's column. Eventually, the matrix will converge on
the proper values.
Perceptrons have several limitations, but they are appropriate for many simple
pattern-matching tasks. In particular, the perceptron can't solve problems
unless they are linearly separable. That is, if each input value is a
dimension in an n-dimensional space, you must be able to draw hyperplanes that
separate all the output values. For example, a 2-input binary OR function is
linearly separable; a 2-input binary XOR function is not; see Figure 2.


Writing the Class


The Discriminate class (see Listing One) isn't very complex. It contains two
arrays of unspecified dimensions: in_ary() holds the input pattern and Rules()
holds the perceptron vector. The Setup subroutine takes two parameters: the
number of possible results and the number of inputs. Setup redimensions the
two arrays and sets the maxx and maxy parameters.
The class exposes a Value property that you use to fill the in_ary. This is an
example of an array property--you can pass any number of parameters of any
type to the Property Get routine. Then the Property Let routine takes the same
parameters plus one extra parameter that specifies the value. Since the
arguments can be of any type, you can easily create pseudoarrays that take
noninteger indices. The class also exposes a read-only property, Calc, that
computes the result based on the current input pattern. This routine
implements the simple logic required to multiply the input array by each
column of the rule matrix and score the results. The return value is an
integer (starting with 0) that indicates which column scored highest.
Rounding out the class are the Train, Load, and Save subroutines. Train
compares its single argument to the current Calc property. If the values are
not equal, the routine performs the training algorithm previously described.
The Load and Save routines read and write a disk file that contains the rule
matrix.
By itself, this class isn't particularly useful, but it does have a general
purpose. Because it avoids any specific knowledge of a particular problem, the
discriminator class can be used in a variety of programs. VB's dynamic memory
allocation (the ReDim statement) allows the class to handle any size problem
without waste.


Representing the Game


There are several ways you might represent a tic-tac-toe game using a
perceptron. Usually, when you select a representation, the input value
represents the "strength" of the input. For example, a signal processor might
use values from 0 to 255 to represent signal strength. A medical diagnostic
program might use values from -1000 to 1000 to represent the certainty of a
symptom. In this case, -1000 means the symptom is not present, 1000 means the
symptom is present and values in between signify various levels of
uncertainty. A 0 value means you don't know if the patient has the symptom.
This is important: Multiplying by 0 results in another 0, so you shouldn't use
0 as a meaningful value in the input vector.
For tic-tac-toe, I elected to use 18 input values. The first nine are set to 1
if the corresponding square contains an X. The last nine are set to 1 if the
corresponding square contains an O. The program knows what is in each square,
so each value is always 1 or -1 (not 0, for the reason just given). There are
9 possible outputs--one for each
square on the board. Given a particular input pattern, the discriminator
should select the best move as an output.


The Training Programs



The problem with perceptrons and other neural networks is that you have to
train them. This can be a tedious process. Training for a particular case is
easy, but when you train the next case, it may upset the original training.
Then retraining that case may upset the second one. Eventually, the rule
matrix will converge on stable values, but this can take some time, especially
with multiple training cases.
To make training more manageable, I decided to always allow the computer to go
first. It always moves to the same square. You can change this, but if you do,
the program will need more training. I also assume the opponent will always
make a good move. With this strategy, if the opponent makes crazy random
moves, he can beat the computer. You could train for all of these cases, but
again, it's more time-consuming.
With the complete system (available electronically; see "Availability," page
3), you'll find a program that allows you to interactively train the rule
matrix. While this is fun, you'll probably lose patience before the matrix
converges. Instead, you'll want to use the batch training program (see Listing
Two). This program uses a simple ASCII file that specifies the possible board
positions and the correct response for each. The program checks each case,
calling the discriminator's Train subroutine as needed. The program tries the
cases over and over until it can run through the entire file without any
training. Then it writes the rule matrix out to RULEBASE.DAT.
Notice here that the logic for both of these programs is in the discriminator
class. Each program shares this single class. This simplifies each program and
allows them to use exactly the same logic.


The Main Program


The main program (TICTAC.BAS, Listing Three) also uses the same discriminator
class. The user interface is simplistic (see Figure 3). It uses a grid control
as the tic-tac-toe board. The program first reads the rule matrix from the
RULEBASE.DAT file (if present). It then begins playing. When you click on an
empty square, the program places an O in the space. It then calls the Win
function to see if you won the game. If the game is still going, it uses the
discriminator to compute the next move. In case the training is bad, the
program makes sure that the proposed square is empty. If it isn't empty, the
program just searches for the next available square. When TICTAC finds an
empty square, it places an X in it and checks for a win again.
The only other logic in the TICTAC program is a routine to reset the board.
The game-playing logic is all in the rule matrix. You can certainly improve
the user interface, but that isn't the point. The point is that the
discriminator learns to play tic-tac-toe without any traditional programming.


Training Tips


The RULEBASE.DAT file included with the electronic listings plays a good game
if you make sensible moves. However, you could train the game to play better
or even just differently, if you like. If you use the manual training program,
be sure to have a consistent plan. For any given board configuration, you
should train the same response. Otherwise, the matrix may never converge.
Also, be sure to test as many cases as possible. Remember, just because a case
worked once doesn't mean it will work after further training.
The manual training program is entertaining. You play both X and O in the
trainer. The computer guesses your X moves and displays its forecast. If you
move to a different square (and there is a check in the training button), the
program trains using the move you make. There is no win detection--you'll
press the Reset button when you want to restart the game. When you press the
Save button, the trainer writes the updated rule matrix to a file. It is
interesting to watch the program project your moves, but you may tire of
waiting for the matrix to converge.
If you want to create your own batch input-data file, just place each case on
a single line. The first 18 spaces of the line correspond to the 18 elements
of the pattern input. A space in a slot sets the corresponding pattern input
to 0. Any other character sets the input to 1. The 19th character is a single
digit from 0 to 8 that specifies the desired output for the pattern. Figure 4
shows how the positions relate to the tic-tac-toe board.


Other Uses


You could certainly write a tic-tac-toe program in a more-traditional way.
However, it is exciting to have an easy way to add adaptive learning to any
program. Perceptrons can tackle any linearly separable pattern-matching job.
More sophisticated neural networks are little more than networks of
perceptrons with different training algorithms. You can find more about neural
networks in many past issues of Dr. Dobb's Journal; see the accompanying text
box entitled "A Neural Network Resource Guide".
Simple perceptron discriminators can be very successful at optical character
recognition, noisy pattern matching, and signal processing. The discriminator
class can give you a good head start at adding perceptrons to your VB program.


Summary


Tic-tac-toe games are fun, but the perceptron can do serious work, too. By
taking advantage of the new class capability in VB 4.0, you can encapsulate
complex algorithms and save them for reuse.
A Neural Network Resource Guide
A Practical Guide to Neural Nets, by Marilyn McCord Nelson and W.T.
Illingworth, Addison-Wesley, 1991.
Neural and Intelligent Systems Integration, by Branko Soucek and the IRIS
Group, John Wiley and Sons, 1992.
"Bidirectional Associative Memory Systems in C++," by Adam Blum, Dr. Dobb's
Journal, April 1990.
"A Neural Network Instantiation Environment," by Andrew J. Czuchy, Jr., Dr.
Dobb's Journal, April 1990.
"A Neural-Network Audio Synthesizer," by Mark Thorson, Forrest Warthman, and
Mark Holler, Dr. Dobb's Journal, February 1993.
"Neural Nets Tell Why," by Casimir C. "Casey" Klimasauskas, Dr. Dobb's
Journal, April 1991.
Neural Networks and Natural Intelligence, by Stephen Grossberg, MIT Press,
1988.
Adaptive Pattern Recognition and Neural Networks, by Yoh-Han Pao,
Addison-Wesley, 1988.
Neural Networks: A Comprehensive Foundation, by Simon Haykin, IEEE Computer
Society Press, 1994.
Neural Networks and Fuzzy Logic Applications in C/C++, by Stephen T. Welstead,
IEEE Computer Society Press, 1994.
Artificial Neural Networks: Concepts and Theory, edited by Pankaj Mehra and
Benjamin W. Wah, IEEE Computer Society Press, 1992.
Artificial Neural Networks: Concepts and Control Applications, edited by V.
Rao Vemuri, IEEE Computer Society Press, 1992.
Figure 1: Perceptron basics.
Figure 2: Linear separability: (a) logical OR, which is linearly separable
because you can draw a line between the 1 and 0 outputs; (b) logical XOR,
which is not linearly separable.
Figure 3: TICTAC in action.
Figure 4: Grid layout.
Example 1: Using Visual Basic classes.
(a)
Dim anObject as new DDJObject
Dim current_object as DDJObject
Set current_object=anObject

(b)
Property Get aProp As Integer
 aProp=100
End Property


(c)
Property Let aProp(val As Integer)
 Rem presumably you'll store
 Rem the val parameter here
End Property

Listing One
VERSION 1.0 CLASS
BEGIN
 MultiUse = -1 'True
END
Attribute VB_Name = "Discriminate"
Attribute VB_Creatable = True
Attribute VB_Exposed = True
Private in_ary() As Single
Private Rules() As Single
Private maxx, maxy As Integer
Property Get Calc() As Integer
ReDim Res(maxx) As Single
For x = 0 To maxx
For y = 0 To maxy
 Res(x) = Res(x) + in_ary(y) * Rules(x, y)
Next y
Next x
Max = -3.4E+38
maxi = -1
For x = 0 To maxx
 If Res(x) >= Max Then Max = Res(x): maxi = x
Next x
Calc = maxi
End Property
Sub Setup(x%, y%)
maxx = x - 1
maxy = y - 1
ReDim in_ary(maxy)
ReDim Rules(maxx, maxy)
End Sub
Sub Load(fn$)
On Error Resume Next
Open fn$ For Input As #1
If Err <> 0 Then
 MsgBox "Can't open file", vbOKOnly, "Warning"
 Err.Clear
Else
 Input #1, maxx
 Input #1, maxy
 Setup maxx + 1, maxy + 1
 For y = 0 To maxy
 For x = 0 To maxx
 Input #1, Rules(x, y)
 Next x
 Next y
 Close 1
End If
End Sub
Sub Save(fn$)
Open fn$ For Output As 1
Write #1, maxx
Write #1, maxy
For y = 0 To maxy
For x = 0 To maxx
 Write #1, Rules(x, y)
Next x
Next y
Close 1
End Sub
Sub Train(n%)
r% = Calc()
If r% <> n% Then
 For y% = 0 To maxy
 Rules(r%, y%) = Rules(r%, y%) - in_ary(y%)
 Rules(n%, y%) = Rules(n%, y%) + in_ary(y%)
 Next y%
End If
End Sub
Property Get Value(n%) As Single
 Value = in_ary(n)
End Property
Property Let Value(n%, val As Single)
in_ary(n%) = val
End Property

Listing Two
Attribute VB_Name = "Module1"
Dim Rulebase As New Discriminate
Sub main()
MsgBox "Begin Training"
' Create rulebase
Rulebase.Setup 9, 18
' This flag will equal 0 when all cases
' are successful
Train = 1
Do While Train = 1
 Train = 0
 Open "Train" For Input As 1
 Do While Not EOF(1)
 Input #1, d$
 ' Ignore comments
 If Left$(d$, 1) <> ";" Then
 For i% = 0 To 17
 If Mid$(d$, i% + 1, 1) = " " Then Rulebase.Value(i%) = -1 Else Rulebase.Value(i%) = 1
 Next i%
 n% = val(Mid$(d$, 19, 1))
 n1% = Rulebase.Calc
 ' If not right, train and set Train flag
 If (n% <> n1%) Then Rulebase.Train (n%): Train = 1
 End If
 Loop
Close 1
Loop
Rem training complete!
MsgBox "Training complete"
Rulebase.Save "RULEBASE.DAT"
End Sub

Listing Three

VERSION 4.00
Begin VB.Form Form1 
 Caption = "Play Tic Tac Toe"
 ClientHeight = 3735
 ClientLeft = 240
 ClientTop = 1485
 ClientWidth = 8865
 Height = 4140
 Left = 180
 LinkTopic = "Form1"
 ScaleHeight = 3735
 ScaleWidth = 8865
 Top = 1140
 Width = 8985
 Begin MSGrid.Grid Grid1 
 Height = 3615
 Left = 0
 TabIndex = 0
 Top = 0
 Width = 8775
 _Version = 65536
 _ExtentX = 15478
 _ExtentY = 6376
 _StockProps = 77
 BackColor = 16777215
 BeginProperty Font {0BE35203-8F91-11CE-9DE3-00AA004BB851} 
 name = "Arial"
 charset = 0
 weight = 700
 size = 48
 underline = 0 'False
 italic = 0 'False
 strikethrough = 0 'False
 EndProperty
 Rows = 3
 Cols = 3
 FixedRows = 0
 FixedCols = 0
 ScrollBars = 0
 HighLight = 0 'False
 End
End
Attribute VB_Name = "Form1"
Attribute VB_Creatable = False
Attribute VB_Exposed = False
' Rule matrix
Dim rulebase As New Discriminate
' Cell Types
Dim NextPlay$
Dim XorO$
'Play Counter
Dim pctr As Integer
'Reset board
Sub Reset()
 For y% = 0 To 8
 Grid1.Row = y% Mod 3
 Grid1.Col = Fix(y% / 3)
 Grid1.Text = ""
 Next y%

 pctr = 0
 XorO$ = "X" ' This version always plays X
 Grid1.Row = 0
 Grid1.Col = 0
 Grid1.Text = "X"
 pctr = 1
End Sub
'Check for a win or draw
Function Win()
    Dim c$(2)
    Win = 0
    ' Columns
    For x% = 0 To 2
        Grid1.Col = x%
        For y% = 0 To 2
            Grid1.Row = y%
            c$(y%) = Grid1.Text
        Next y%
        If c$(0) <> "" And c$(0) = c$(1) And c$(1) = c$(2) Then Win = -1: Exit Function
    Next x%
    ' Rows
    For y% = 0 To 2
        Grid1.Row = y%
        For x% = 0 To 2
            Grid1.Col = x%
            c$(x%) = Grid1.Text
        Next x%
        If c$(0) <> "" And c$(0) = c$(1) And c$(1) = c$(2) Then Win = -1: Exit Function
    Next y%
    ' Diagonals: center plus each pair of opposite corners
    Grid1.Row = 1
    Grid1.Col = 1
    c$(0) = Grid1.Text
    Grid1.Row = 0
    Grid1.Col = 0
    c$(1) = Grid1.Text
    Grid1.Row = 2
    Grid1.Col = 2
    c$(2) = Grid1.Text
    If c$(0) <> "" And c$(0) = c$(1) And c$(1) = c$(2) Then Win = -1: Exit Function
    Grid1.Row = 2
    Grid1.Col = 0
    c$(1) = Grid1.Text
    Grid1.Row = 0
    Grid1.Col = 2
    c$(2) = Grid1.Text
    If c$(0) <> "" And c$(0) = c$(1) And c$(1) = c$(2) Then Win = -1: Exit Function
End Function
'Create a new rulebase and read in file
Private Sub Form_Load()
    rulebase.Setup 9, 18
    rulebase.Load "RULEBASE.DAT"
    Reset
End Sub
'Player clicked on a square
Private Sub Grid1_Click()
    Dim i As Integer
    If Grid1.Text = "" Then ' Can't move to an occupied square
        NextPlay$ = "O"
        Grid1.Text = NextPlay$
        Rem check for win
        If Win() Then
            MsgBox "You Win!"
            Reset
            Exit Sub
        End If

        Rem set up state for our move
        For i = 0 To 8
            Grid1.Row = i Mod 3
            Grid1.Col = Fix(i / 3)
            rulebase.Value(i) = -1
            rulebase.Value(i + 9) = -1
            If Grid1.Text = XorO$ Then rulebase.Value(i) = 1
            If Grid1.Text = NextPlay$ Then rulebase.Value(i + 9) = 1
        Next i
        n = rulebase.Calc
        Rem check n for a legal move
        Do
            Grid1.Row = n Mod 3
            Grid1.Col = Fix(n / 3)
            If Grid1.Text <> "" Then
                Rem square taken; try the next one
                n = n + 1
                If n = 9 Then n = 0
            End If
        Loop While Grid1.Text <> ""
        Grid1.Text = XorO$
        pctr = pctr + 2
        Rem check for win
        If Win() Then
            MsgBox "I Win!"
            Reset
            Exit Sub
        End If
        If pctr = 9 Then
            MsgBox "Draw Game!"
            Reset
            Exit Sub
        End If
    End If
End Sub
End Listings




















DTACK REVISITED


Tempus Fugit




Hal W. Hardenbergh


Hal is a hardware engineer who sometimes programs. He is the former editor of
DTACK Grounded, and can be contacted through the DDJ offices.


Once upon a time, before 747s were invented and the world's population headed
north of five billion, people got along with just one name. After all,
everybody was born, lived, and died without ever leaving their village.
Everybody knew Tom, Dick, and Harry.
Then Europeans got an itch for tourism. With the English leading the way,
everybody got to choose a second name whether they wanted one or not. They
were given the choice of colors or occupations: Blue, Cooper, Fletcher.
Patronymics worked, too: MacLaren means "son of Laren." Place names became a favorite, especially as
the practice migrated to the continent.
My name means "Mt. Harden" in German. It's spelled the same way as Charles
Lindbergh (you may have heard of Charlie and his solo Atlantic flight). But
the guardians at the gates of Ellis Island during the big European wave of
immigration translated foreign names into English freely; "bergh" also is
"berg." "Burg" or "burgh" is German for castle or fort. The Pennsylvania
cities of Harrisburg and Pittsburgh grew up around forts.
I've long been resigned to having my name routinely misspelled. When I first
wrote this column (Dr. Dobb's Sourcebook, March/April 1995), it was misspelled
in the table of contents and on my first check.
In the editing of Michael Abrash's book ZEN of Graphics Programming, this was
raised to an art form. My name first appears on page 1 of chapter 17, where
it's misspelled twice, differently. On the first page of chapter 18, my name
is again spelled two ways, once correctly. But the book publisher qualified
for international competition by misspelling my name twice, differently, in
the same caption: Table 18.1 "Hardenburg Circles versus Optimized Hardenburgh
Circles." Sigh.
My name isn't the only thing the book publisher had problems with; in the
index under "Circles" we find "Hardenburgh's circle algorighm" (sic).


A Hardware Guy? Programming?


By now you may be wondering what a hardware guy's name (or variations
thereupon) is doing in a book that has "Programming" in the title. Well, check
my bio on the bottom of this page. I once developed a full-blown 68000-based,
incrementally compiled Basic for the Atari ST, a project that required several
years and some help from others. The language, called "DBASIC," worked fine
but sold only 72 copies (unknown to me at the time, Atari was shipping all its
STs to Germany, where a third-party BASIC from GFA emerged instead).
Now you know why I call myself a hardware person. I've actually made some
money from time to time on hardware.
DBASIC needed a command to draw a circle. When I checked out published
algorithms, the only ones I could find used either a multiply or, in one case,
a divide (!) inside the main loop. So I devised my own algorithm. After
calculating a couple of squared terms, the main loop just used integer adds
and subtracts. After DBASIC had already flopped, I devised a similar algorithm
for ellipses. All these were in 68000 assembly for the ST.
This brings us to 1989, when both Michael Abrash and I were writing for the
late bimonthly Programmer's Journal. When I learned that Michael lived only a
few miles from me, I arranged a meeting. I presented my algorithms and
suggested that we collaborate on an article. You see, I didn't (then) know how
to program using x86 assembly or VGA cards and those subjects were Michael's
specialty.
Michael agreed the algorithms should be published but envisioned several
articles, a lot more work than I had planned. We agreed that Michael should
write the articles alone, mentioning me as the source of the algorithms.
Chapters 17 to 19 of ZEN originally appeared as articles in the first three 1990
issues of PJ. 
I'm resurrecting this old story for two reasons, both of which are related to
Michael's column in the previous issue (Dr. Dobb's Sourcebook,
January/February 1996) of this magazine. Reason one is that in his last three
paragraphs Michael describes how sharing benefits the larger community. Hey, I
believe in that stuff too. Reason two is that the folk at id Software,
Michael's employer, may not be using the optimum method of developing
software.


Egoless Programming


Having turned over my circle-drawing algorithm, which was the fastest I had
ever seen, I soon got a call from Michael. "Hal, you don't need to square
those terms. Your 'finite difference' algorithm works just fine using the
unsquared values if you make this one small change...." Michael was right, and
his simplification enabled the use of 16-bit, rather than 32-bit integers.
(See ZEN, page 1 of chapter 19.)
As Michael has pointed out, it's just about impossible for any one person to
devise an algorithm that somebody else can't improve. I already knew that. In
fact, I had deliberately depended on that in optimizing the inner kernel of
the commercially failed language DBASIC.
A language that's slow performing a fundamental integer operation such as
C%=A%+B% is destined to be slow, period. I wanted DBASIC to be a fast,
incrementally compiled language. Back in 1985, I posted the code then used to
execute C%=A%+B% in DBASIC on the company bulletin board, along with the
number of 68000 clocks needed.
By the next day another programmer, James Shaker, had replaced the code
sequence with a modified form that ran in fewer clocks. I looked at the
change, decided it could be improved, and posted a third version that same
day. Within a week, the code had been sped up by a factor of 5 (!) and we had
arrived at the final form of the innermost kernel of DBASIC. Nobody had ever
put his name on any of those sheets of paper.
According to Michael's column, id Software had a performance problem in an
innermost kernel of their new game, Quake. Hidden surface removal was proving
too slow when there were lots of surfaces. id's honcho, John Carmack, solved
the problem by parking his 1200-HP Ferrari and pulling an all-weekender. This
is great if you have a John Carmack around and he doesn't mind losing sleep.
But for the ordinary mortals who constitute most programming teams, a good way
to solve really tough, important problems is to use egoless programming. Post
the problem and its current implementation and performance on the bulletin
board. Don't sign the posting. Unless the other team members are being
micromanaged, this is an irresistible challenge.
Egoless programming isn't an available option for persons who believe they're
much more competent and innovative than their associates. If that belief is
correct, great! But for the rest of us...
(This is not intended to knock John Carmack, who seems to be a pretty good
programmer and who, as the boss, generously allowed Michael to share some of
the innermost details of Quake with us.)
Try this technique at work. Without revealing any proprietary details, please
let me and/or Dr. Dobb's know how it worked.


Sharing: A Death-Defying Act?


Sharing software is obviously beneficial to the larger community. How about
sharing hardware information? Well, I did some of that way back in '81 and
'83.
In 1979, when Motorola's 68000 microprocessor was approaching completion,
there was a fierce fight within Big M. The chip makers in Austin wanted to
sell the 68K to all comers. But Big M's computer and semiconductor division
headquarters were located in Phoenix, where there also was a computer systems
group. The systems group wanted to limit the 68K's use to high-end
minicomputers. The chip boys had to make their case in memos via snail mail,
while the systems folk lobbied over three-martini lunches. Guess who won?
So it became Big M's official but unannounced policy to keep the 68000 out of
the personal computer marketplace. This was accomplished in two ways:
Application engineers were forbidden to provide any assistance whatsoever to
personal computer types. And the Ap Notes for the 68K exclusively described
incredibly complex systems.
My small company bought the first two 68000s to be sold across Hamilton
Electro Sales' counter, in November 1980. For six months I tried to get
information about how to use the 68K in a simple, PC-related application. I
didn't understand why the application engineers literally would not talk to
me.
Then I got mad. I carefully read the data sheet, and discovered that if pin
10, DTACK, was grounded, then the 68000 became a nice tame device that could
be designed into extremely simple systems just like the 6502 that I'd been
working with up to then. Since nobody else in the PC world seemed to have
access to that info, I started the newsletter DTACK Grounded in July 1981. The
first issue explained exactly how to design the 68000 into a simple system. I
was sharing my information.
By November, I had registered the fictitious name DTACK Grounded and was
selling 68000-based attached processors for the Pet and Apple II. The Apple II
version became a profitable product, and I wound up publishing 45 mostly
monthly issues--1000 pages--of the newsletter.
When I began the DTACK effort in 1981, the 68000 was by far the fastest
available microprocessor. Intel's Pentium runs integer stuff over 200 times
faster now than the 68K did then. FP performance? Don't ask...



Floating the Point


It was obvious from the first that while the 68000 was a really fast integer
processor, it needed an FP math accelerator like the Intel 8087. Big M had in
fact announced that it would ship such a part for the 68K in 1982, but it was
vaporware until mid-1985.
Meanwhile, National Semiconductor developed a line of 68K-like processors, the
32000 series. NS used separate teams to develop the integer CPU and the FPU
chips. The CPUs remained heavily bug-ridden for a long time. But when the
16032 FPU appeared in mid-1983, it worked.
Having learned from my original 68K experience, I carefully studied the 16032
data sheet and decided the FPU would (unlike Intel's 8087) easily interface
with the 68K. So my little company built up a test jig using 3M's neat
prototyping kit and guess what--it worked. But, contrary to what the 16032 data
sheet claimed, the FPU had to be run from a synchronous clock.
I published all this information in newsletter #24 (October 24, 1983),
including a photograph of the prototype board. I contacted Stan Baker about
this, and Stan's popular hardware column in Electronic Engineering Times
reported that the 68K and the National Semi FP chip were compatible and a lot
faster than Intel's x86 + 8087 combo. My next newsletter carried a photo and
complete schematic of our second, simplified prototype. Soon we were selling
production boards, one of which accepted up to three 16032s.
I thought Big M would be happy; FP support for the 68K now was readily
available. Wrong. Big M folk were furious with me; they wanted everybody to
wait for their 68881 to arrive. I thought National Semi would be happy because
here was another market for its FP chip. Wrong. National Semi's minions were
peeved because they didn't want their FPU used to support 68K-based systems.
Intel was understandably unhappy. For a while there I screened my mail for
letter bombs.
(National Semi told everybody that I didn't know what I was talking about; the
16032 was too an asynchronous part. Five months later, the data sheet was
quietly changed to require synchronous operation. Tsk.)


Some Folk Never Learn


In my column in the September/October Dr. Dobb's Sourcebook, I shared more
hardware info: why CPU clock pushing was a really good idea, especially for
Intel's Pentium/75. I suspected the column would not be joyously embraced by
Intel.
Tony Tribelli (adt@netcom.com) is publishing the results of an ongoing
clock-pushing survey. I received a copy after submitting my article to Dr.
Dobb's. Table 1 presents a few of his results. Worthy outcomes? Perhaps not.
Dark clouds gather ominously on the horizon...


Rampant Criminality


On December 12, 1995, an article appeared in my local newspaper's (the San
Jose Mercury News) "Business" section about Intel modifying its Pentium
design. It seems that after Intel decides a CPU's designated clock, it can be
programmed internally to limit its speed. Some partial quotes:
..."overspeed protection" prevents criminals and others from winding up a 75
MHz Pentium, for example.
Some unsavory folks knew it was hard to physically tell the faster chips from
the slower ones. 
The "overclocked" chip would initially test OK. But it probably would burn out
and stop working in a short time. [This assertion is "factually challenged" --
HWH]
...the [clock-pushing] practice has been around for years and was cause for
concern among law enforcement officials.
Hmm. I wonder just how that circuitry works. The easiest method would limit
the phase-lock loop clock multiple. But the P75, P90, and P100s all use a 1.5x
multiple, which still would allow P75s to be pushed to 100 MHz. And I wonder
when Intel's gonna start shipping them new chips that foil us criminals? The
P75s currently on the market assuredly do not have that circuitry activated,
if it's present at all.
My Dr. Dobb's editor helpfully pointed out that I could use the jail pay phone
to verbally deliver my next Sourcebook column (not to mention that he would
spring for a collect call). Yet another reason why sharing hardware info with
the larger community sometimes ain't such a good idea.
Table 1: Sample results from Tony Tribelli's clock-pushing survey.
The Push        Tries  Successes  Ratio
P75 -> P90        31       31     100%
P75 -> P100       21       18      86%
P75 -> P120        9        7      78%
P75 -> P133        3        1      33%
P133 -> P180       4        3      75%


























SOFTWARE AND THE LAW


Liability for Defective Software




Marc E. Brown


Marc is a patent attorney and shareholder in the intellectual-property law
firm of Poms, Smith, Lande, & Rose in Los Angeles, CA. Marc specializes in
computer law and can be contacted at 73414.1226@compuserve.com.


Software sometimes doesn't work the way it is intended or expected to. Users
may lose profits, have their reputation injured, and demand corrective action.
Particularly with custom software, the original software developer is often
targeted as the source of compensation for these injuries. Even if the
customer's complaint is without merit, the cost of defending yourself against
a lawsuit can exceed the profit you hope to make from your software.
However, there are clauses that you can incorporate into development contracts
that can substantially reduce your exposure, and can prevent the
misunderstandings that often lead to these expensive lawsuits.


Performance Specifications


When software crashes and/or damages data, it is obvious that the software is
malfunctioning. But liability can also be imposed from a broad variety of
less-apparent defects.
For example, liability could be imposed if the software fails to manage an
important data field, fails to run on a desired platform, or is incompatible
with other software. Objections over speed, memory usage, and ease of use can
also form the basis of liability.
These objections usually stem from customer expectations about the software.
Clear contract language can control these expectations and limit your legal
liability.
Consequently, you should be sure to incorporate detailed performance
specifications in the contract. This is one of the best ways to reduce the
scope of objections that could form the basis for legal liability. To be most
effective, the contract should state that the performance specifications are
the sole criteria the software must meet.


Specify a Service, Not a Product


With the exception of Louisiana and the District of Columbia, every state in
this country has enacted a variant of the Uniform Commercial Code (U.C.C.).
These laws impose a broad variety of liabilities and significantly restrict
the ability to avoid these liabilities through contract restrictions.
But the U.C.C. does not apply to every commercial transaction--it only applies
to the sale of goods by companies regularly engaged in such activity. Software
support is usually not governed by the U.C.C. because it is a service, not a
product. Mass-produced software, on the other hand, would probably be
categorized as goods. Although the purchaser is usually only granted a
license, the transaction is usually viewed as sufficiently analogous to the
"sale" that the U.C.C. requires. Custom software development has the
complexion of both a product and a service, and it is therefore not entirely
clear whether the U.C.C. applies to it.
It may be possible to ensure that the U.C.C. is not applied to a software
development transaction by stating in the agreement that it is primarily an
agreement for service, not the sale of a product. Of course, the agreement
will indicate that the customer will be receiving software. But the agreement
should indicate that the customer is paying primarily for the service of
designing, writing, testing, and improving the software. Language should also
be included stating that the parties do not consider the agreement to
constitute a "sale of goods" and that it will not be governed by the U.C.C.
A court might ignore such language and still find the transaction to be
governed by the U.C.C. The remainder of this column will therefore include a
focus on the Uniform Commercial Code.


Don't Overstate Anticipated Performance


The most-common legal theory asserted against a software developer is breach
of contract. The customer alleges that the developer breached a contract by
providing defective software.
Any recitation of performance criteria usually will be interpreted as
constituting a warranty that these criteria will be met. This is true even if
the contract does not expressly warrant the recited criteria. The developer
also may be held to performance criteria that were merely discussed during the
course of contract negotiations or that appeared in promotional material. The
moral here is simple: Specify performance criteria, but be careful what you
say.


Disclaim Implied Warranties


When applicable, the U.C.C. implies three additional and significant
warranties, even when no assurances are expressly provided.
The first is the implied warranty of "merchantability" --a warranty that the
software will be fit for the ordinary purposes for which it is intended and
will pass without objection in the trade. Although the concept is imprecise,
it can result in the imposition of very broad performance requirements.
The second is the implied warranty of "fitness for a particular purpose." When
you know the particular purpose for which the customer needs the software (as
is usually the case with custom software), the U.C.C. implies a warranty that
the software will be fit for this purpose. The warranty of merchantability
will not be breached if the software is working perfectly. But if the software
does not fulfill the needs of the customer as they were described to the
developer at the incipiency of the relationship, the implied warranty of
fitness for a particular purpose will be breached.
The third warranty is an implied warranty that the software will not infringe
another's patent or copyright. It's easy to avoid copyright infringement--just
don't copy other software! But software can infringe a patent even if the
software developer created all of his software independently and was never
aware of the patent. This type of liability can be enormous.
One way to avoid express and implied warranties is to disclaim them. The
development agreement should state that no express or implied warranty is
provided. Such a disclaimer will not be effective against the implied
warranties of merchantability and fitness unless these implied warranties are
expressly mentioned by name or the contract states that the customer accepts
the software "as is" or "with all faults."


Fraud Cannot be Disclaimed



When the contract does contain strong disclaimers, their legal effect can
sometimes be overcome by an allegation of fraud. The customer alleges that the
developer misled the customer by erroneously telling the customer that the
software would meet his or her needs.
Fraud will survive the very broadest disclaimer. However, four elements
usually must be proven by clear and convincing evidence:
1. The developer made a false representation.
2. The developer knew that the representation was false at the time he made
it, but nevertheless made it to deceive the customer.
3. The customer was unaware of the falsity of the representation at the time
it was made and proceeded with the transaction in reliance upon it.
4. The customer was damaged as a result of the falsity of the representation.
There is no requirement that the representation have been in writing.


Strict Liability


Another way for users to overcome contractual disclaimers is to allege that
you are "strictly" liable for the defective software. The phrase "strict
liability" means liability without fault. Even though you may have acted with
the utmost care, you can nevertheless be "strictly" liable for any defect in
the software. Like fraud, it cannot be disclaimed.
Most jurisdictions refuse to recognize this doctrine unless the software was
intended for an application that is "inherently dangerous." Software that
controls the explosion of a bomb would be one example. Most business software,
on the other hand, would not satisfy this requirement.
Another typical limit of this doctrine is that it only covers injuries to
persons or property. The more-typical claims for lost profits or injury to
reputation are usually not covered.


Negligence


Another theory sometimes used to overcome contractual disclaimers is
"negligence." To establish a claim of "negligence," it is not sufficient to
merely prove that the software is defective. There must be proof that the
defect is the result of the software developer failing to exercise due care in
its development. That the software is defective can serve as evidence of
negligence, but it is not decisive. Mistakes can be, and often are, made in
the development of software even when the utmost care is exercised.
Negligence also is sometimes alleged in connection with performance
representations. Although the software may be working perfectly, it may not be
fulfilling the customer's expectations. When these expectations are the result
of performance representations made by the developer, the allegation of
"negligent misrepresentation" is sometimes made. As with malfunctioning
software, it is not sufficient to simply prove that the representation turned
out to be false. The customer must prove that the developer failed to exercise
due care when he made it.
Most courts will not recognize a claim of "negligence" when it is being used
to recover purely economic losses, such as lost profits or injury to a
customer's reputation. As with a claim of "strict liability," most courts will
only recognize an allegation of negligence when personal injury or property
damage has resulted. 


Limit Remedies


Many customers are not willing to accept sweeping disclaimers. On the other
hand, they can understand that the profit that the developer is making is not
sufficient to cover exposure to unbounded liability. As a consequence, many
customers are willing to accept contractual limits on the amount of liability,
even though they are not willing to accept disclaimers eliminating all
liability.
One common technique is to limit liability to "direct" damages. These are
damages "directly" caused by the defective software, such as the cost of
replacing or repairing the software. The contract typically provides that the
developer is not liable for "indirect" or "consequential" damages, such as
lost profits or injury to reputation.
Another approach is to limit the customer to a certain amount of dollars for
each day during which the developer fails to repair the damaged software. This
is known as a "liquidated damage" provision and is enforceable in many states,
particularly when it is difficult to accurately predict the amount of damage
the customer would suffer because of a defect at the time of contracting.
Yet another approach is to limit the developer's liability to an arbitrary
amount, such as the amount of money the customer paid for the software
development.


Integration Clauses


As discussed earlier, representations made outside of the contract can form
the basis for a breach of warranty or fraud claim. Exposure to these claims
can be reduced or eliminated by including an "integration" (or "merger")
clause. This is a clause that states that the developer has not made any
representation concerning the software that is not expressly recited in the
contract and that, if he has, the customer is not proceeding forward in
reliance upon it.


Require Prompt Notice


Some customers do not report problems promptly. During the interim, the
developer may devote months of additional effort, only to later find out that
the customer is unwilling to make any payment because of the defect, or that
he must redo much of the code that was written after the defect surfaced. Key
programmers also may become unavailable and memories of how the code works may
fade.
The contract can minimize these problems by stating that the customer waives
all rights he has in connection with any defect, unless that defect is brought
to the attention of the developer within a stated number of days after it is
discovered. Preferably, the contract should require written notice of the
defect to eliminate any later dispute over whether timely notice was provided.


Arbitration


To reduce the expense of a legal dispute, include a clause which requires the
customer to arbitrate any claim. Many people believe that arbitrators
compromise claims, rather than decide them. Unlike courts, arbitrators usually
are not bound to legal principles. But few dispute that arbitration is far
less expensive than court proceedings.


Not all Limits will be Enforced


Getting the customer to agree on liability limits does not always mean that
these limits will be enforced. When the U.C.C. applies, a limit on liability
will not be respected when it is "unconscionable" or causes the contract to
"fail of its essential purpose." Typically, the court looks to see whether the
customer had a real opportunity to negotiate the terms of the contract,
whether the customer was unsophisticated in software, and the unfairness which
enforcement of the restriction would cause.
For example, a contract that imposed no liability upon the developer if he
failed to deliver the software after receiving payment from the customer would
run afoul of these requirements. This would cause the essential purpose of the
contract--the delivery of software--to fail. It also leads to an obviously
unconscionable result.
Limiting the customer to a return of his money, on the other hand, might well
satisfy these requirements. Some courts have already so held.

It also is important that the limitations on liability be imposed at the time
the agreement is signed. Attempts to limit liability afterwards through the
delivery of self-serving notices would probably be ineffective.
Contractual limitations on liability will rarely bar claims for fraud, gross
negligence (in some jurisdictions), and claims for personal injury or property
damage.


Insurance


The typical "Commercial General Liability" policy will protect against claims
for personal injury and property damage when caused by defective software, or
claims alleging that promotional representations were false. The more-common
breach of contract claim, however, is usually not covered. But when a breach
of contract claim is combined in a lawsuit with a claim that is covered, the
insurance company usually will be obligated to defend the entire lawsuit.
Companies also can purchase an "Errors and Omissions" policy, which provides
broad protection against most defective software claims.


Be Flexible


When a problem does arise, strict adherence to restrictions in the contract is
not always the best course. Responding positively and flexibly to the concerns
expressed by the customer--even if not required by the contract--often will
preserve an important business relationship, enhance the developer's
reputation, and avoid a lawsuit that is likely to be far more costly.

















































EDITORIAL


Your Move


Game programmers have long been developing innovative user interfaces, and
their ideas and techniques are finding their way into a lot of software. For
example, the Windows 95 "taskbar" (and the similar appendage HP-VUE has
sported for years) bears a strong resemblance to the cockpit controls of so
many old video games. Many programs now augment their help facilities by
overlaying recorded animation onto computer-generated backgrounds, using
techniques also perfected long ago in the game community.
Two of the most obvious requirements of any good game are quick response and
intuitive behavior. These features are critical to the success not only of
games, but of all software, from spreadsheets to Web sites. Today's hottest
multiplayer, three-dimensional (3-D) game technologies will have a strong
influence on all of tomorrow's software and hardware. Major system vendors,
including Apple (QuickDraw 3D), Microsoft (DirectDraw, DirectSound), and
Intel (MMX), realize this and are eagerly wooing game developers.
One notable example of the far-reaching influence of game programming is the
publicity accorded to the recent chess match between IBM's Deep Blue and Garri
Kasparov, the reigning world champion chess player. Although it was billed as
the ultimate "man against machine" event, interviews with some of the many
contributors to Deep Blue's design gave me the decidedly unromantic impression
that it was more of a "man against industrial research project" event. IBM's
Deep Blue wasn't just a computer--it was teams of chess experts and computer
scientists, databases with tens of thousands of preplayed games, and some of
the fastest computer hardware that money can buy. All just to win a single
game of chess.
Although my attitude toward the event and surrounding publicity was somewhat
cynical, clearly there were many interesting aspects. For one thing, Deep Blue
was constructed primarily from off-the-shelf components, right down to the IBM
POWER2 processors (close relatives of the PowerPC) that managed each node. The
increasing power of cheap microprocessors is an ongoing trend, and is rapidly
reshaping the market for high-end computers.
I also noticed the continued loss of respect for the term "artificial
intelligence." Commentators Maurice Ashley and Yasser Seirawan stressed that
Deep Blue used purely brute-force approaches, relying on a relatively simple
analysis of billions of moves in each three-minute turn. This proved enough
for Deep Blue to win the first game. Unlike Deep Blue, however, Kasparov
learned quickly from his mistakes, and proceeded to two draws and three wins
for a final score of 4-2.
In contrast to the millions spent by IBM, a different kind of chess is played
just down the street from my house. A local used furniture store sets up
tables and chess boards along the sidewalk and invites local kids to stop by
and play for a while. Not world-class competition by any means, but inspiring
nonetheless. The owner of the store started his weekly chess games to give
local kids something to do in the evenings, an alternative to wandering around
the neighborhood getting into trouble. If nothing else, it's picturesque
seeing kids hunch intensely over chessboards, adults chatting, pedestrians
strolling to and from restaurants, all in the dim glow of twilight and street
lamps--and not a phosphor glow to be seen.
Tim Kientzle
technical editor















































Games Programming with DirectPlay


The Game SDK's multiplayer API




Chris Howard


Chris is the president of Genus Microprogramming, a provider of programming
tools and services. He can be reached at 75100.17@compuserve.com.


Games have come a long way since we began playing Snipes over a Novell
network. A few years ago Spectre became popular because of its multiplayer
capability. Recently, games like Doom and Descent have illustrated that there
is a great demand for good, multiplayer games. And as a programmer, I've
wanted a fast and easy way to add multiplayer support to my own games, without
having to delve into the depths of modem communication and various network
APIs. 
That's where DirectPlay comes in. DirectPlay, which comes with the Windows 95
Game SDK, provides access to communication services for your games. In this
article I'll examine DirectPlay and show how you can use it. I'll also
describe my experience with creating a multiplayer game called "NetRoids"
using the DirectPlay interface. The source code and related files for NetRoids
are available electronically; see "Availability" on page 3.


What Is DirectPlay?


DirectPlay is one of the DirectX components of the Microsoft Windows 95 Game
SDK. The SDK includes components for video (DirectDraw), sound (DirectSound),
input (DirectInput), setup (DirectSetup), and networking (DirectPlay), all
designed for creating games. The name "DirectPlay" seems nonintuitive to me,
but I guess "DirectMultiPlayer" was too long and "DirectNet" too limiting.
DirectPlay allows easy access to communication services provided by your
computer, including modems, networks, and on-line systems. DirectPlay service
providers let games ignore connectivity and communication details by presenting
a common interface. The DirectPlay software interface provides a simple way for
games to communicate with each other over a wide range of services. This frees
you from the need to create complex communication implementations and allows
you to concentrate on producing a great game. DirectPlay consists of two API
functions and roughly two dozen member functions.
The DirectPlay APIs initiate communication through the DirectPlay interface.
DirectPlayEnumerate is used to obtain a list of available DirectPlay service
providers, and DirectPlayCreate instantiates a DirectPlay object for a
particular provider. There are service providers for modem communication and
for network interfaces. A service provider for the Internet is reportedly
being worked on. Again, what's important is that it doesn't matter which
provider is selected--it is transparent to the programming of the game.
As Table 1 illustrates, the member functions are divided into four categories:

Session management.
Player management.
Group management.
Message management.
A DirectPlay session is an instance of a game. Session member functions can
open a new session or connect to an existing or saved session. You can also
list the sessions currently available, save sessions, and close a session. You
need to open a session with the Open member function to begin your game.
The player-management functions manage the players in a game session. The most
important functions in this category are CreatePlayer and DestroyPlayer. These
are the only two functions in this category that I used in my game, but other
functions allow you to list the current players, limit the number of players
in a session, and set formal and friendly names.
DirectPlay also lets you create groups of players. The group member functions
available are similar to the player functions. By grouping players, you can
simplify message handling and conserve bandwidth by sending a message to a
group rather than to each individual player. (I didn't use any functions in
this category.)
The final category is what DirectPlay is all about: message management. The
Send and Receive functions let you communicate with other players. Messages
can be any kind of data your game requires. You can send messages to a
particular player, to a group, or to all the players in the session.
Obviously, I used both Send and Receive.


Using DirectPlay


Using DirectPlay in your game usually involves the following steps:
1. Request that the user select a communication method for the game. Use the
DirectPlayEnumerate function to find all of the available service providers
and then list them in a listbox.
2. Once a service provider has been chosen, create a DirectPlay object for
that provider by calling the DirectPlayCreate function. This causes DirectPlay
to load the appropriate service provider library.
3. Optionally, you can request information from the user, such as a friendly
name. Other game-specific information can be stored in the DPSESSIONDESC
structure in the dwUser fields.
4. Ask if the player wants to start a new game or join an existing one. If the
player is joining an existing game, proceed to step 5. Otherwise, create the
game by specifying the DPOPEN_CREATESESSION flag and using the Open member
function. Skip to step 6.
5. Find existing sessions with EnumSessions and present them to the player.
After a selection is made, connect to it by specifying the DPOPEN_OPENSESSION
flag and using the Open member function.
6. Create a player or players using the CreatePlayer member function. You can
also find other players by using EnumPlayers.
7. Use the Send and Receive member functions to communicate with the system
and with other players.
8. When the player exits the game, use DestroyPlayer to delete the player from
the session, and the Close member function to close the communication channel.
Now that you have the APIs and member functions, and an overview of how to use
DirectPlay, all you need is a game to try it out on.


NetRoids


To see what it takes to add multiplayer capabilities to a game, I decided to
add network support to the Roids game, which is supplied with the Game SDK.
While space does not permit the listings to be presented here, I've provided
the source code and project files electronically. The Game SDK includes a
multiplayer game called Duel, but it already contains DirectPlay support and I
wanted something that I could do myself. This approach also allows us to focus
on DirectPlay. Roids already has all of the drawing, sound, and art files--all
it needs is more players. Additionally, it provides a good benchmark for
determining how fast and easy it is to add multiplayer support to an existing
game.
Every game needs a globally unique identifier (GUID). This is a long,
complicated-looking number that the game uses to identify itself over the
communication interface. Fortunately, the GUID can be generated for you by a
program called UUIDGEN.EXE, which is provided with the Microsoft Win32 SDK
(GUIDs are sometimes called UUIDs). I used the GUIDGEN.EXE program that comes
with Visual C++ 4.0. GUIDGEN.EXE presents several ways of formatting the
number, and since it is a Windows program, it has the ability to place the
GUID on the clipboard for you to paste into your program. The GUID for
NetRoids looks like Example 1. You can see why placing this number on the
clipboard would be a handy feature! You create the GUID once while developing
the game and use it from then on. The number is unique, so you don't have to
register it with Microsoft.
For the DEFINE_GUID macro to work properly, you must #define INITGUID before
the WINDOWS.H include file. And you need the DPLAY.H include file for the
DirectPlay interface. Once you have the GUID, follow the outline given
earlier. For NetRoids, I turned to the Duel multiplayer example provided by
the SDK and borrowed its dialog boxes for choosing a provider and a session
(they're not great, but they'll do for this test). The Duel DirectPlay code
provided a starting point for the NetRoids DirectPlay support. Most of the
code is for the support of the dialog boxes, since it only takes two function
calls to establish the DirectPlay session and one to create the player.



Creating the Session


When the player presses Enter on the splash screen, the RemoteCreate function
is called. This, in turn, calls GetProvider, which presents a dialog box
listing the available service providers. You will find the DirectPlayEnumerate
call in the dialog-box initialization section; see Example 1(b). The EnumSP
function receives the service-provider information and adds the name strings
to the contents of the listbox. Once a service provider is chosen or the
dialog box is canceled, you return to the RemoteCreate function.
If a service provider was chosen, the CreateGame function is called, which
presents a dialog box asking if the player wants to create a new game or
connect to an existing one. If creating a new game, you can call the Open
member function with DPOPEN_CREATESESSION; see Example 2. If connecting to an
existing game, the GetGame function is called. This displays a dialog box
listing all of the currently available sessions of NetRoids. In the dialog box
initialization, the EnumSessions member function is called, where the
EnumSession callback (not to be confused with the EnumSessions member!) adds
the sessions found to the listbox; see Example 3. 


Let's Synchronize


DirectPlay provides everything you need, with the exception of guidelines for
game synchronization. The SDK documentation skirts the issue by stating that
"DirectPlay does not attempt to provide a general approach for game
synchronization; to do so would necessarily impose limitations on the
game-playing paradigm." 
Well, an overview of how that might be accomplished would be nice. 
If you were writing a simple chat program, your work would be done. You would
simply send what one player typed to the other player and receive what he or
she typed. But how do you keep several ships and several hundred random and
fast-moving objects synchronized across four different computers? What exactly
do you send back and forth, and how do you keep it straight? For NetRoids, I
selected a client/server approach. The creator of the session becomes the host
and the keeper of the universe. The host becomes ship 0 (I added the ship
number in the upper-right corner of the display for reference), and the game
operates much like it did as a single-player game. In fact, you can still play
the game by yourself.
The IsHost variable is the key to making the game act like two different
programs. The host does everything the single-player version does and more.
But the client doesn't have to do certain tasks, like create the objects for
the level or check for object collisions. Once the host has created a session,
other players can connect to it. These players will be clients, and they need
a way to tell the host that they are there. Our first two messages handle
this. The MSG_HEREIAM message is sent by a client to all players in the
session. The host will receive it, and then try to find a ship for the new
player. If a ship is available, the host sends a MSG_INIT message back to the
client (and only that client) telling the player his or her ship number and
the current level number. The client then joins the game.
The messaging in NetRoids is handled by the SendGameMessage,
ReceiveGameMessages, and EvaluateMessage functions. A structure is defined for
each message packet, which is initialized and then sent with the DirectPlay
Send member. Example 4 shows how to create and send the MSG_HEREIAM message.
The first parameter is the DirectPlay object, followed by the DirectPlay
sender ID, receiver ID, message flags, the communication buffer, and the
number of bytes to send. To send to everyone in the session, you send to ID 0,
which represents the DirectPlay server.
The message is received by ReceiveGameMessages using the Receive member
function, see Example 5(a), and then handled by EvaluateMessage; see Example
5(b). As you can see, only the host answers the MSG_HEREIAM message, and then
immediately tries to initialize that player. All messaging is handled through
the Send and Receive member functions. It really is easy, but you still need a
way to synchronize the game.
Synchronization means the synchronization of data. The original Roids game
managed all of the objects through a doubly linked list with dynamically
allocated members, with the player's ship at the top of the list. You can tell
the other players to add an object, but how do you tell them which object to
delete? To make things simple, I changed the object list into a fixed array.
Now each object is placed at a specific index entry, and is easily identified.
The top of the array still contains the player ships, but now there can be
more than one player. This didn't require many code changes, but it is an
important change.
Three messages control the objects in the game, including the ships: 
MSG_UPDATE is sent by the host to update the position of objects on the client
machines, and the clients send it to update their own ship position. 
MSG_ADDOBJ is sent by the clients when they want to add an object--usually
bullets. The add object message uses the same structure as the update message
(the host ignores update messages because it sends them, so a separate message
type was created). 
MSG_DELOBJ is sent by the host when an object should be deleted from the list.
The last two messages deal with synchronization as well. The MSG_LEVEL message
tells the clients that the host has started a new level, and the MSG_SCORE
message allows the host to update a client's score. The score message actually
makes the most use of direct host-to-client messaging, since most messages
from the host are broadcast to all players in the session. When an object is
created, the DirectPlay ID of the creator is stamped into the structure. When
the host sees that a bullet object has collided with another object, it
credits the creator of the bullet by sending the score message directly to the
bullet creator's ID.
I didn't sit down and decide all at once that these were the messages needed
for NetRoids to become multiplayer. I added them as I went along, in pretty
much the order I presented them here. For instance, I didn't have a
delete-object message at first--I handled that with an update message, because
the object structure contains an enabled flag that determines whether the
object is visible or not. This didn't allow for playing the appropriate sound
when the object was destroyed, though, and the update message was already
overloaded.


Multiplayer Changes


The display list (or object array) was one major change in turning the Roids
game into NetRoids. Many other details that didn't involve DirectPlay also had
to change to make the game multiplayer capable.
For example, any time the logic looked at the player ship, you had to worry
about an array of ships. Global variables for controlling shields now had to
be part of the ship structure. Ships couldn't be initialized at the center of
the screen, since they all would appear on top of one another. And of course,
there needed to be judicious use of the IsHost variable for controlling the
host and client behavior.
A lot of the original code remains intact, however. If you have Visual C++,
run the WinDiff program. Compare ROIDS.C to NETROIDS.C, ROIDS.H to NETROIDS.H,
and ROIDS.RC to NETROIDS.RC. This will give you a good idea of what had to be
changed and what stayed the same.


Performance


In my first attempt, I sent an update message for an object every frame. On my
133-MHz Pentium machine, I was getting 85 frames per second (fps), which I
sent to a 486/66 and got 60 fps. I was able to bring the 486 to its knees
as soon as I got about 15 objects on the screen, and a little math makes it
easy to understand why. I was sending 85*15*120 = 153,000 bytes per second
down the wire. Since I wanted several hundred objects on the screen at once,
this obviously wouldn't do.
My second attempt was to update the objects only once per second, instead of
85 times. The client could move the objects in between the one-second sync
pulses so they would still move smoothly. This cut my bandwidth needs by a
factor of 85!
Add and delete messages would still occur immediately, and ship updates would
still occur every frame. This worked a lot better, but there was a momentary
pause at regular intervals when there were many objects on the screen. 
The pause occurred because the pipeline was being saturated once a
second--like turning the water on full blast, and off again, at regular
intervals. To keep the water flowing through the pipe at a constant rate, my
last attempt was to time-stamp each object as it was created. Objects are
created sporadically as roids are destroyed, which can happen at any point in
a one-second interval. An update message is sent to the clients only when that
object's unique timer has expired, thereby keeping its position in the
pipeline each second. This produced a smooth update on all of the clients.
I'm sure more tweaking could be done. The player ships probably don't need to
be updated 85 times a second--only when their velocity, angle, or shields
change. Stretching the object-update sync pulse to once every two seconds
would halve the bandwidth the game uses without sacrificing anything.


Exercises for the Reader


The dialog boxes could be kept from displaying over the splash screen. They
have a funny green color because of the palette, and the mouse isn't visible
outside the dialog box. For cosmetic reasons, they should probably appear
before the game enters the DirectDraw display mode.
The Roids game is not much of a challenge, since you can fire so many shots at
a time. I left that part alone in NetRoids, because I wanted to see how the
performance kept up. The game would be much more challenging if each player
were limited in the number of shots they could fire at one time and if there
were a minimum amount of elapsed time between shots. Then I would change it
from a cooperative game to a head-to-head game, where the players could shoot
each other.
You could prompt the user for a friendly name, and then display it beside
their ship as it moves around. You could use the GetPlayerName and
SetPlayerName functions for this. Displaying the name is the harder task,
however, since you would need a font and another display surface. Also, I
didn't use any of the group functions, which may improve performance over some
providers. The SDK recommends specifying a notification event when your
application creates a player and then using the Win32 function
WaitForSingleObject to find out whether there is a message pending for that
player. And finally, I ignored all system messages. System messages are sent
when players or groups are added or deleted, or when connections are lost. A
robust DirectPlay game should monitor those messages.


Conclusion


I wanted to keep most of the Roids source intact and include a header file and
a separate source file for the DirectPlay support. Eventually, I had to give
up on that idea. You really have to plan ahead when writing a game for more
than one player, especially in the area of data management. In some ways,
starting with an existing single-player game is more of a hassle than starting
from scratch.
I found the DirectPlay interface to be exactly as I anticipated--quick and
easy. I spent most of my time on synchronization and message management, and I
suppose that will vary with the type of game. A card game would take
considerably less effort, since timing is not critical. I spent about one week
learning the DirectPlay interface and creating the NetRoids game, which isn't
bad at all. If the Internet service provider on the horizon materializes, you'll be
able to create a single game that plays over a modem, a network, and the
Internet. I'm game.
DirectPlay Gotchas
Jim Mischel
Jim, who is coauthor of The New Delphi 2 Programming Explorer (Coriolis Group,
1996), can be contacted at jmischel@cinematronics.com.
Designed and developed by Cinematronics and published by Virgin Interactive
Entertainment, TriTryst was one of the first Windows 95 games to incorporate
multiplayer support using DirectPlay. As the primary programmer on that
project, I was assigned, among other things, the unenviable task of
implementing the multiplayer portions of the game. Working with a new library
on a new project is usually difficult, and DirectPlay proved to be no
exception. In the course of the project, I gained new insight into the
structure of multiplayer games and learned quite a bit about DirectPlay's
strengths, weaknesses, and limitations.
Multiplayer First, Game Second

Multiplayer issues extend into all parts of your game, from the highest levels
all the way down to the base-data structures. If there's any chance your game
will turn into a multiplayer game, design the code around that eventuality.
Don't make the mistake of writing code that works fine for a single-player
game with the intention of revisiting it later to "add" multiplayer support.
Almost always, you'll end up rewriting large amounts of code from the ground
up, and regression testing the result is not pretty. The only alternative is
to special case the single-player game, and that's an even worse mess.
Get the multiplayer communications working before doing anything else. Even if
your game is initially single-player, design and write the code so that the
single-player game sends messages to and receives messages from itself in the
same client/server fashion that you'll eventually use for the multiplayer
game.
Know Your Limitations
The greatest shortcoming of DirectPlay is that it doesn't use reliable
transport. My understanding is that unreliable transport was used in the
interest of performance--reliable network transport is somewhat slower. That
argument doesn't hold water, though. If DirectPlay doesn't let the network
guarantee transport, then your program has to do it. That means packet ACK/NAK
code and a checksum or CRC to ensure data integrity. So, build the packet
verification code into your program at the beginning of the development cycle.
You're going to need it, and it's much easier to design it in up front than it
is to retrofit later.
On a related note, DirectPlay doesn't always tell you when a player leaves the
game. If a player's network connection goes down or the computer crashes, your
game gets no notification from DirectPlay. An added benefit of the packet
verification is that you can determine when a player is no longer responding
and act accordingly.
Also keep in mind that there is a limit on packet size. This isn't a
limitation of DirectPlay, but rather the service provider you're using.
Normally, the maximum packet size will be at least 512 bytes, but you can't
guarantee that. You can determine the maximum packet size for a particular
provider by calling the DirectPlay object's GetCaps() member function, and
examining the dwMaxBufferSize variable in the returned DPCAPS structure.
The DirectPlay object's Send() member function won't always return an error if
you attempt to send a packet that's longer than dwMaxBufferSize. Send()
returns DP_OK, leaving you wondering why your packet didn't get there. If
there's a possibility that your program's packet size will be too long, you'll
have to add code that will split packets or reject an attempt to play the game
using a service provider whose maximum packet size is too small.
Another limitation is that DirectPlayEnumerate() is a "call once" function. I
would never have discovered this if it weren't for the testers at Virgin
Interactive. One of their people decided to see what would happen if he
repeatedly pulled up and then canceled TriTryst's multiplayer game dialog.
This worked six times. The seventh time it would crash, right in the middle of
one of DirectPlay's DLLs.
DirectPlay apparently allocates an internal array that will hold up to 20
service providers. Every time you call DirectPlayEnumerate(), all of the
service providers it finds are added to this array--even if they were added on
a previous call. Since there were three service providers supplied with
DirectPlay, there was enough room for six calls to DirectPlayEnumerate() (3*6
= 18). The seventh call would overflow the array.
This is an interesting limitation. If a user happens to install more than 20
different DirectPlay service providers (assuming that many ever exist),
DirectPlay won't be able to run on that system, and the user is likely to
blame the game or the manufacturer of the last service provider that was
installed.
DirectPlay Messages
You have to poll for DirectPlay messages. The rationale behind this design
decision escapes me. Windows is event-driven from the ground up, and
programmers are encouraged to make use of this architecture. But DirectPlay
doesn't notify your program when a message is received. You have to write a
separate polling loop to ask DirectPlay if there are any messages available.
This "feature" of DirectPlay touches your program at a very basic level. Not
only do you have to add a call to the message polling function in your
PeekMessage() or GetMessage() loop, but you also have to set up a timer to
ensure that the polling function gets called when a dialog box is being
displayed or when some other application is active.
Other Issues
DirectPlay is full of little oddities that make it very frustrating to work
with. For example, many of the member functions have the annoying habit of
returning DPERR_GENERIC when they're unable to execute a particular function.
I'm happy that the function tells me that it didn't complete successfully, but
I'd also like to know why.
Another oddity is that when you call EnumSessions(), your program essentially
locks up for the time period that you specify in the dwTimeout parameter.
Empirical evidence shows that EnumSessions() will take at least two seconds
and possibly longer. During this time, your program will not respond to the
user's input. Two seconds doesn't seem like very long until you have to wait
that long before a program responds to pressing the OK button in a dialog box.
Finally, DirectPlay provides one outstanding feature--transport independence.
With DirectPlay you're able to write and test your game on one system, and be
confident that it will work without modification on other transports. This is
the only hands-down "plus" of DirectPlay. A big one, to be sure, but it
doesn't necessarily outweigh the drawbacks.
Table 1: DirectPlay member functions. (a) Session management; (b) player
management; (c) group management; (d) message management.
Function Description
(a)
Close Closes a communication channel.
EnumSessions Enumerates all of the sessions connected
 to a specific DirectPlay object.
GetCaps Retrieves the capabilities of the
 specified DirectPlay object.
Open Establishes a new gaming session or
 connects to an existing one.
SaveSession Saves the current session in the
 registry. Currently supported only by
 modem service providers.

(b)
CreatePlayer Creates a player for a session.
DestroyPlayer Deletes a player from a session.
EnableNewPlayers Enables or disables the addition of new
 players or groups.
EnumPlayers Enumerates all of the players in a
 specified session.
GetPlayerCaps Retrieves the capabilities of a player's
 connection.
GetPlayerName Retrieves a player's friendly and formal
 names.
SetPlayerName Changes the player's friendly and formal
 names.

(c)
AddPlayerToGroup Adds a player to an existing group.
CreateGroup Creates an empty group of players for a
 session.
DeletePlayerFromGroup Deletes a player from a group.
DestroyGroup Destroys a group of players for a session.
EnumGroupPlayers Enumerates the players in a group.
EnumGroups Enumerates all of the groups associated
 with a specified session.

(d)
GetMessageCount Retrieves the number of messages waiting
 for a player.
Receive Retrieves messages that have been sent to
 a player.

Send Sends messages to other players or all of
 the players in a session.
Example 1: (a) Creating a globally unique identifier; (b) filling a list with
names of DirectPlay service providers.
(a)
// NetRoids GUID: a64e8e40-5750-11cf-a4ac-0000c0ec0b9f
DEFINE_GUID(NETROIDS_GUID, 0xa64e8e40, 0x5750, 0x11cf,
    0xa4, 0xac, 0x00, 0x00, 0xc0, 0xec, 0x0b, 0x9f);

(b)
DirectPlayEnumerate(EnumSP, (LPVOID) hWndCtl);
Example 2: Creating a new game session.
// Create a new game, so we're the host
IsHost = TRUE;
// Initialize session description structure
memset(&dpDesc, 0x00, sizeof(DPSESSIONDESC));
dpDesc.dwSize = sizeof(dpDesc);
dpDesc.dwMaxPlayers = MAXPLAYERS;
dpDesc.dwFlags = DPOPEN_CREATESESSION;
dpDesc.guidSession = pGuid;
strcpy(dpDesc.szSessionName, FullName);
// Try to open the session
if ((hr = lpIDC->lpVtbl->Open(lpIDC, &dpDesc)) != DP_OK)
{
 // We failed
 lpIDC->lpVtbl->Release(lpIDC);
 lpIDC = NULL;
 return(FALSE);
}
Example 3: (a) Opening an existing game; (b) if a session is selected, it is
opened with DPOPEN_OPENSESSION; (c) creating a player.
(a)
// Initialize the session description structure
memset(&dpDesc, 0x00, sizeof(DPSESSIONDESC));
dpDesc.dwSize = sizeof(dpDesc);
dpDesc.guidSession = *g_lpGuid;
// Enum sessions with 5-second timeout
lpIDC->lpVtbl->EnumSessions(lpIDC, &dpDesc, (DWORD)5000, EnumSession,
    (LPVOID)hWndCtl, (DWORD)NULL);

(b)
// Initialize session description struct to open it
memset(&dpDesc, 0x00, sizeof(DPSESSIONDESC));
dpDesc.dwSize = sizeof(dpDesc);
dpDesc.guidSession = *g_lpGuid;
dpDesc.dwFlags = DPOPEN_OPENSESSION;
dpDesc.dwSession = SendMessage((HWND)hWndCtl, LB_GETITEMDATA, iIndex, 0);
hr = lpIDC->lpVtbl->Open(lpIDC, &dpDesc);

(c)
// Either way, we have to create a player
if ((hr = lpIDC->lpVtbl->CreatePlayer(lpIDC, &dcoID, NickName,
        "NetRoids Player", &dphEvent)) != DP_OK)
{
    // We failed
    lpIDC->lpVtbl->Close(lpIDC);
    lpIDC->lpVtbl->Release(lpIDC);
    lpIDC = NULL;
    return(FALSE);
}
Example 4: (a) Creating the MSG_HEREIAM message; (b) sending the message.
(a)
case MSG_HEREIAM:
    // Tell host we are here
    lpHereIAm = (LPHEREIAMMSG)CommBuff;
    lpHereIAm->MsgCode = msg;
    lpHereIAm->ID = (DWORD)dcoID;
    nBytes = sizeof(HEREIAMMSG);
    break;

(b)
// Send the message
lpIDC->lpVtbl->Send(lpIDC, dcoID, send_to, 0,
    (LPSTR)CommBuff, nBytes);
Example 5: (a) Receiving the MSG_HEREIAM message; (b) handling the message.
(a)
status = lpIDC->lpVtbl->Receive(lpIDC, &fromID, &dcoReceiveID, DPRECEIVE_ALL,
CommBuff, &nBytes);

(b)
case MSG_HEREIAM:
    // Someone wants to play
    if (IsHost)
    {
        // I'm the host, so find out who is here
        lpHereIAm = (LPHEREIAMMSG)CommBuff;
        shipID = lpHereIAm->ID;
        // Initialize them
        SendGameMessage(MSG_INIT, shipID, 0);
    }
    break;










































































Designing Isometric Game Environments


Looking at game engines from a new angle




Nate Goudie


Nate is a computer engineering major at Lehigh University. He can be reached
at nsg2@lehigh.edu.


Tile-based games have been around almost as long as two-dimensional (2-D)
arrays. I can remember, for instance, sitting in front of a Commodore 64,
carefully maneuvering a tiny "pixelesque" humanoid from square to square,
thinking, "Hey, this is great!"
As computer games became more sophisticated, however, simple, top-down,
tile-based games gave way to bigger and better things. Why? Because someone
decided to look at things from a slightly different angle. The emergence of
the isometric point of view took tile-based games somewhere they had never
been before--into the third dimension. After playing through a prime example
of this isometric ingenuity (Microprose's award-winning X-COM), I decided to
create my own multilevel, isometric environment. As Figure 1 shows, the
Axoview Engine 1.0 is the result of my efforts. In this article, I'll describe
how I designed the engine, and provide an overview of its inner workings.


Putting Things in Perspective


You need to keep two things in mind when designing an isometric layout: 
 Ease of visibility.
 Ease of putting things together.
Of all the isometric possibilities, the axonometric perspective provides
excellent vision clarity and allows for each cube to be easily broken down
into unique components; see Figure 2(a). The multilevel, tile-based world is
constructed entirely from axonometric cubes called "cell blocks." As Figure
2(b) illustrates, each cell block is composed of separate tiles: 
The base tile, which serves as a floor or ceiling.
X- and y-wall tiles, for walls, fences, and the like.
The prop tile, for stationary objects such as trees and chairs. 


Getting Coordinated


Cell blocks are arranged side-by-side on a 2-D grid, bearing a strong
resemblance to a top-down tile layout. The cell-block grid, however, is
rotated 45 degrees: the origin of the rotated grid lies at the top-most
(farthest) corner, the x-axis increases down and to the right, and the y-axis
increases down and to the left, as in Figure 3. Therefore, the x-wall of each
cell block is the wall players come in contact with as they traverse the grid
in the x-direction; likewise for the y-wall. This Cartesian gridmap of cell
blocks is referred to as "map coordinates."
To locate various positions within each cell block, the map grid is further
broken down into a uniform number of virtual partitions per cell block. For
practical reasons, the number of partitions must be an integral power of 2.
These virtual partitions are referenced using world coordinates. Consequently,
each cell block has a local coordinate system (with the origin in the top
corner of the cell block) created by the world-coordinate partitions. This
"grid within a grid" is referenced using "cell coordinates" (see Figure 3).
Finally, a familiar coordinate system must be defined. The grid of pixels that
makes up the screen display will be referred to as "screen coordinates." 


Base Tile Design


The base tile is most important because it is directly related to the layout
of the cell-block grid. Its design hinges on two crucial choices: edge slope
and edge length. For standard 320x200 VGA displays, a slope of 0.5 pixels
produces the best results. This particular slope creates a series of two-pixel
steps which go down and right in the x-direction, and down and left in the
y-direction. The "length" of each edge is simply the number of two-pixel steps
per edge. This length must correspond to the number of virtual cell-block
partitions chosen earlier.
By connecting the ends of four edges to form a diamond (the base tile), a
graphic representation of the local cell-coordinate grid is formed; see Figure
4(a). Subsequently, by aligning adjacent base tiles so that the edges transfer
smoothly in both the x- and y-directions (no disturbance in edge slope), a
graphical representation of the rotated cell-block (map) grid is created. This
also reveals an extra edge of "padding" along the bottom half of each base
tile; see Figure 4(c). This padding completes the design of the base tile.
The alignment of base tiles also provides several important cell-block
dimension constants defined in Listings One and Two. Figure 4(b) illustrates
these dimensions.


Wall Tile Design


It should be noted that the base tile's edges of padding form slots between
adjacent base tiles. This is a convenient place for some walls. The width of
each wall tile is a function of the base tile width: wall width = (base tile
width/2)+2. The height of the wall must be an even number; choose the height
with user visibility in mind.


Prop Tile Design


Once the base and wall tiles are designed, the prop-tile dimensions are
straightforward. The width of the prop tile is simply the same width as the
base tile. The prop-tile height is equivalent to the height of the connected
base and wall tiles decremented by 1.



Leveling Things Out 


Multiple levels will be dealt with by stacking several identical cell-block
grids directly on top of one another. The base tiles of one level should rest
comfortably on the wall tiles of the level below. It should be noted that the
base tile of a certain cell block will serve as the ceiling tile for the cell
block directly below it. 


Basic Engine Structures: The Tile Maps


With the abstract design of the engine out of the way, how do you represent
the axonometric layout in memory? 
Generally, the axonometric world consists of a fixed number of levels. Each
level consists of a 2-D grid of cell blocks, and each cell block consists of
four unique tile components. This layout can be represented by four 2-D arrays
(one array for each component). The first dimension of each array is a level
index and the second is a cell-block location index (for the 2-D grid map).
These tile maps are arrays of bytes, where each byte holds a unique
tile-identification number. Therefore, every cell block in the world has a
corresponding value in the base_map, xwall_map, ywall_map, and prop_map. See
Listing One for the tile-map declarations.


The Tile Class


Now that a means of storing tile layouts has been established, the tiles must
be capable of organizing themselves. The Tile class organizes tile bitmap data
through four data elements. The most important of these data elements is a
pointer to an array of tile bitmaps. Each tile bitmap is accessed through an
identification number (which works out well because that is exactly what is
stored in the tile maps). The Tile class keeps track of the width and height
of each individual bitmap (all bitmaps for an instance of the Tile class must
have identical dimensions), and also of the size in bytes of each bitmap. The
Tile class is declared in Listing Three.
Not surprisingly, the Axoview Engine declares four instances of the tile
class: base_tiles, xwall_tiles, ywall_tiles, and prop_tiles (see Listing One).
To declare an instance of the Tile class, the class constructor requires three
arguments: the number of tiles, the width of each tile, and the height of each
tile. The constructor calculates the size of each bitmap (width x height), and,
more importantly, allocates memory for each of the tile bitmaps (see Listing
Four).
But how do the bitmaps make it to the big screen? The Tile class' member
function blit() takes care of everything. You just need to be sure to pass it
three things: the tile identification number, screen coordinates (where you
would like the bitmap displayed), and a pointer to the off-screen buffer on
which it should do the blitting. How exactly does the blit() function blit?
The typical way of rendering a tile-based display would be to use the
painter's algorithm. The painter's algorithm draws the tiles back to front, so
that the tiles in the foreground cover the tiles beneath them (much like a
painter would paint a picture). Standard bitmap transparency also would be
applied, where each byte of the bitmap would be checked for transparency
(usually a value of 0). If the byte were "transparent," it would not be drawn
to the off-screen buffer. While this method works, it can be highly
inefficient, drawing several levels of cell-block tiles on top of each other,
even though many of the tiles would not be seen at all. It turns out that
there is a way to draw only what can be seen. I refer to it as the "reverse
painter's algorithm."


The Reverse Painter's Algorithm 


The reverse painter's algorithm (RPA) eliminates the blitting of "covered
over" portions of tile bitmaps--only the pixels that will be seen in the end
are drawn. The algorithm can be broken into several steps. First, fill the
entire off-screen buffer (destination viewport) with a string of transparent
(zero) values as quickly as possible. Second, calculate the order in which the
tiles would be drawn using the normal painter's algorithm and reverse it (draw
tiles in the foreground first). The cell-block tile-display order becomes:
prop, x-wall, y-wall, base. To draw from front to back according to the RPA,
the top-most level of cell blocks should be drawn first. Therefore, draw each
level from left to right across the screen, starting with the bottom-most row,
and ending with the top-most row. A significant problem still remains to be
addressed, however--can the blit() function keep the later-drawn background
tiles from copying over the foreground tiles? 


RPA-Style Drawing with Tile::blit() 


To preserve speed, the blit() function performs all clipping outside of the
main blitting loop. It first clips the edges of the tile bitmap to the
viewport, adjusting the starting and ending coordinates appropriately, and
calculates the necessary blitting offsets. It then loops through the unclipped
portion of the tile bitmap and copies the "permissible" bytes to the
off-screen buffer. 
It is here that the "magic" takes place. Instead of checking each byte of the
bitmap (the source byte) for transparency, the blit() function checks the
corresponding byte in the off-screen buffer (the target byte) for transparency
(recall that the off-screen buffer was initially filled with transparent
values). If the target byte is transparent (nothing has been drawn there yet),
the blit is deemed "permissible" and the function copies the source byte to
the target byte, as in Figure 5. The only drawback to this method is that
transparent source bytes may be needlessly copied over transparent target
bytes. This drawback is negligible, however, when compared to the inefficiency
of copying hundreds of tiles on top of one another. Listing Four presents the
full definition of Tile::blit().


It Was a Time to be Rendered


The next step involves rendering the multilevel cell-block environment. The
engine function draw_tilesRPA() assumes full responsibility for this task. The
function draw_tilesRPA() requires five arguments:
The pair of world coordinates around which the display should be centered.
The level that will serve as the top-most level.
The level around which the display should be centered.
A pointer to the off-screen buffer.
However, before draw_tilesRPA() can begin the process of rendering the layout
to the screen, we need a method for converting world coordinates to screen
coordinates (and vice versa).


Going from World to Screen, and Back Again


To convert from world coordinates to screen coordinates, you need to compare
the local cell coordinates of a base tile with its local screen coordinates;
see Figure 6. This comparison shows that a change of +1 along the
cell-coordinate x-axis is equivalent to the following screen-coordinate
changes: +2 in the x-direction, and +1 in the y-direction. Table 1 lists
similar comparisons.
These comparisons suggest the pair of equations in Example 1(a) for converting
from a change in world (or cell) coordinates to a change in screen
coordinates. Solving this pair of equations for the change in world (or cell)
coordinates yields the pair of equations in Example 1(b), which convert from a
change in screen coordinates to a change world coordinates. With these
conversions at the ready, the rendering process may begin.


Rendering with Function draw_tilesRPA()


The pair of world coordinates (x,y) passed to draw_tilesRPA() corresponds to
the center of the viewport. However, the rendering process does not start in
the center, but instead in the lower-left corner (RPA requirement). Therefore,
the screen-to-world conversion formulas are used to find the world coordinates
for the lower-left corner of the viewport. This is done by plugging the change
in screen coordinates (distances from center of viewport to lower-left corner)
into the formulas, then adding the equivalent change in world coordinates to
the original pair of world coordinates (center of viewport). The resulting
pair of world coordinates correspond to the lower-left corner of the viewport.
This world-coordinate location is called the "world intersection point;" see
Figure 7. 
Because the rendering process starts on the top-most level (not necessarily
the current level), the world intersection point must be adjusted to
compensate for the difference in screen height between levels. This change
along the screen y-axis is equivalent to the distance (in pixels) from the top
of a wall tile to the top of its connected base tile (there is no change along
the x-axis). Using the conversion formulas again, the equivalent level change
in world coordinates is added to the world intersection point for each level
that needs to be drawn above the current level. This centers the display
around the current level.



The Main Rendering Loop


Now for the main looping procedure. Starting with the top-most level, the
function calculates the map coordinates of the cell block that contains the
world-intersection point. This is done by a technique I call "binary masking,"
which uses the bitwise AND operator and a certain binary "mask" to retain (or
clip) a wanted or unwanted power of 2 from a number (in this case, the world
coordinates). Assuming that the coordinates are 16-bit integers, the binary
mask is calculated as Map Coordinate Mask = 0xffff - (# of cell partitions -
1). This technique works because the number of cell partitions was chosen to
be an integral power of 2. The map location (cell block) that contains the
world intersection point is called the "intersection block" (see Figure 7).
Next, the local cell coordinates of the world intersection point are
calculated using the same binary-masking technique. However, in this case, a
different power of 2 needs to be clipped. Therefore, the cell-coordinate mask
is defined as Cell Coordinate Mask = (# of cell partitions - 1). The
world-to-screen conversion formulas are then used to find the screen offsets
of the calculated cell coordinates. These screen offsets, once adjusted to the
viewport, yield the screen coordinates of the intersection block.
The function next finds the map and screen coordinates for the starting block
(the place where the rendering process will finally begin). The starting block
is found by dropping down a certain number of cell blocks from the
intersection block to account for the fact that walls and props of off-screen
base tiles may still be visible. Hence, the higher you make the walls, the
farther down you have to move the starting block; see Figure 7.
The time to render has finally come. Using the starting block as a starting
point, the tiles for each cell block are drawn from left to right (according
to the RPA). Each tile is drawn through a call to the appropriate
Tile::blit(), after getting the tile-identification number from the
appropriate tile map. After each row is drawn, the starting block is moved
alternately up and to the left and up and to the right, as in Figure 7. Once
all the rows that fit into the viewport have been drawn, the process is done
for the current level. The world intersection point is then adjusted for the
next level down, and the main rendering process continues until all the levels
have been drawn. That's all there is to it! The Axoview Engine is ready to
display your worlds!


Conclusion


You can fill the tile maps with level layouts, fill the Tile class bitmaps
with artwork, and create axonometric, multilevel environments (with
applications beyond just game worlds). Moveable characters (and objects) could
be implemented by incorporating an object map (with the same dimensions as the
tile maps) that contains a linked list to all the objects currently in a cell
block. These characters could then be moved around the world using standard,
cell-oriented collision-detection techniques.
Although the Axoview Engine was written specifically for Mode 13h in DOS
(assembly-language files are available electronically; see "Availability,"
page 3), it can easily be ported to any platform--in some cases with just a
few changes to the Tile class' blitting function. Portability aside, however,
a multitude of possible projects stem from the Axoview Engine as it now
stands. Animated tile code could be thrown into the blitting algorithm, map
rotation could be incorporated to allow the user to change the perspective,
and so on. The possibilities are limited only by your imagination.
Figure 1: The Axoview Engine v1.0.
Figure 2: (a) The axonometric cube (cell block); (b) component breakdown of
the cell block.
Figure 3: The axonometric map-coordinate grid.
Figure 4: (a) Building the base tile from equal edges; (b) base-tile
dimensions to remember; (c) the finished base tile.
Figure 5: Reverse painter's algorithm tile-transparency blitting technique.
Figure 6: Base-tile coordinate-conversion analysis.
Figure 7: Main rendering process.
Example 1: (a) Equations for converting from a change in world (or cell)
coordinates to a change in screen coordinates; (b) equations which convert a
change in screen coordinates to a change in world coordinates.
(a)
 dX(screen) = 2*dX(world) - 2*dY(world)
 dY(screen) = dX(world) + dY(world)

(b)
 dX(world) = [ dX(screen) + 2*dY(screen) ] / 4
 dY(world) = [ 2*dY(screen) - dX(screen ) ] / 4
Table 1: Cell-coordinates-to-screen-coordinates conversion.
 Change in Cell     Change in Screen
 Coordinates        Coordinates
 dX    dY           dX    dY
 +1     0           +2    +1
  0    +1           -2    +1
 +1    +1            0    +2
 +1    -1           +4     0
 -1    +1           -4     0
 -1    -1            0    -2
 -1     0           -2    -1
  0    -1           +2    -1

Listing One
/********************************************************************
 AXOVIEW.H
 Header file for the Axoview Engine v1.0
 Contains engine constants, global data structures, and function prototypes.
 Written by Nate Goudie 1996
********************************************************************/
// Defines for Tile Totality Constants
#define BASE_COUNT 7 // Number of base tiles
#define WALL_COUNT 5 // Number of wall tiles
#define PROP_COUNT 3 // Number of prop tiles
// Defines for Tile Bitmap Dimension Constants
#define BASE_WIDTH 30 // Width of base tile
#define BASE_HEIGHT 16 // Height of base tile
#define WALL_WIDTH 17 // Width of wall tile

#define WALL_HEIGHT 28 // Height of wall tile
#define PROP_WIDTH 30 // Width of prop tile
#define PROP_HEIGHT 33 // Height of prop tile
// Defines for Viewport Dimension Constants
#define FULL_SCREEN_SIZE 64000 // Size of a full screen buffer
#define VIEWPORT_X_START 0 // Starting X coord of viewport
#define VIEWPORT_X_END 319 // Ending X coord of viewport
#define VIEWPORT_Y_START 0 // Starting Y coord of viewport
#define VIEWPORT_Y_END 199 // Ending Y coord of viewport
#define VIEWPORT_WIDTH 320 // Horizontal width of viewport
#define VIEWPORT_HEIGHT 200 // Vertical height of viewport
// Defines for Map Dimension Constants
#define LEVEL_MAX 4 // Maximum number of levels
#define XMAP_MAX 50 // Maximum X dimension of map
#define YMAP_MAX 50 // Maximum Y dimension of map
#define SCROLL_STEP 2 // World coordinate scroll step
#define MAX_CELL_UNITS 8 // No. of cell units per map unit
#define CELL_COORD_MASK 0x0007 // Cell coordinate mask
#define MAP_COORD_MASK 0xfff8 // Map coordinate mask
#define X_VPTM_OFFSET 10 // Viewport-to-map X offset
#define Y_VPTM_OFFSET 90 // Viewport-to-map Y offset
// Defines for Cell Block Dimension Constants
#define CELL_START_X 14 // Init X for cell -> screen op
#define CELL_START_Y 0 // Init Y for cell -> screen op
#define CELL_FULL_WIDTH 32 // X of cellA - X of adj cellB
#define CELL_HALF_WIDTH 16 // Half the full width
#define CELL_FULL_HEIGHT 16 // Height of base tile
#define CELL_HALF_HEIGHT 8 // Y of cellA - Y of next cellB
#define XWALL_XOFFSET -1 // X offset for the cell's X-Wall
#define YWALL_XOFFSET +14 // X offset for the cell's Y-Wall
#define WALL_YOFFSET -20 // Y offset for the cell's walls
#define PROP_YOFFSET -18 // Y offset for the prop tile
#define LEVEL_ADJUST 10 // Offset to move up/down a level
// Defines for view rectification of walls
#define START_YS_OFFSET 32 // Initial Y screen offset
#define START_XS_OFFSET 0 // Initial X screen offset
#define START_YM_OFFSET 2 // Initial Y map offset
#define START_XM_OFFSET 2 // Initial X map offset
#define START_XADD 16 // Initial Xadd value
// Declare global data structures
unsigned char palette[256*3]; // array for the color palette
// Declare arrays for the tile maps
unsigned char base_map[LEVEL_MAX][XMAP_MAX*YMAP_MAX];
unsigned char xwall_map[LEVEL_MAX][XMAP_MAX*YMAP_MAX];
unsigned char ywall_map[LEVEL_MAX][XMAP_MAX*YMAP_MAX];
unsigned char prop_map[LEVEL_MAX][XMAP_MAX*YMAP_MAX];
// Declare instances of the Tile class
Tile base_tiles(BASE_COUNT, BASE_WIDTH, BASE_HEIGHT),
 xwall_tiles(WALL_COUNT, WALL_WIDTH, WALL_HEIGHT),
 ywall_tiles(WALL_COUNT, WALL_WIDTH, WALL_HEIGHT),
 prop_tiles(PROP_COUNT, PROP_WIDTH, PROP_HEIGHT);
// Function prototypes
void draw_tilesRPA(int x, int y, int top_level, int current_level,
 unsigned char far *screenbuf);
int load_files();

Listing Two
/********************************************************************
 AXOVIEW.CPP

 The Axoview Engine v1.0. An engine which renders free scrolling tile-based 
 environments in axonometric perspective. Main program and engine functions.
 Written by Nate Goudie 1996
********************************************************************/
#include <stdio.h>
#include <dos.h>
#include <conio.h>
#include <alloc.h>
#include <mem.h>
#include <iostream.h>
#include <process.h>
#include "tile.h"
#include "screen.h"
#include "axoview.h"
void main()
 {
 clrscr();
 // Allocate memory for the offscreen buffer
 unsigned char far *screenbuf;
 screenbuf=new unsigned char[FULL_SCREEN_SIZE];
 // Get old video mode number
 int oldmode=*(int *)MK_FP(0x40,0x49);
 // Create pointer to video memory
 char far *screen=(char far *)MK_FP(0xa000,0);
 // Clear video memory ( set each byte to 0 )
 memset(screen, 0, FULL_SCREEN_SIZE);
 // Call function to load files into memory, exit on error
 if(!load_files()) exit(1);
 // Call assembly routines to set graphics mode and palette
 setgmode(0x13); // Set mode to 13h
 setpalette(palette); // Set VGA palette
 // Initialize starting variables
 int bye=0; // Exit program flag
 int x=0,y=0; // Initial world coord position
 int tlevel=(LEVEL_MAX-1); // Top level to display
 int clevel=0; // Current level position
 // Main rendering loop
 do
 {
 // Clear the offscreen buffer: 1st Step in RPA
 memset(screenbuf, 0, FULL_SCREEN_SIZE);
 // Render the axonometric tile display
 draw_tilesRPA(x, y, tlevel, clevel, screenbuf);
 // Copy offscreen buffer to the screen
 memmove(screen,screenbuf,FULL_SCREEN_SIZE);
 // Check to see if user hit a key, and process if so
 if ( kbhit() )
 {
 int xmap, ymap;
 char ch;
 ch=getch();
 switch (ch)
 {
 case 0:
 ch=getch();
 switch (ch)
 {
 // Left Arrow Key - shift map to the right
 case 'M':

 x+=SCROLL_STEP;
 xmap=int(x & MAP_COORD_MASK)/MAX_CELL_UNITS;
 if (xmap>(XMAP_MAX-1)) x-=SCROLL_STEP;
 break;
 // Right Arrow Key - shift map to the left
 case 'K':
 x-=SCROLL_STEP;
 xmap=int(x & MAP_COORD_MASK)/MAX_CELL_UNITS;
 if (xmap<0) x+=SCROLL_STEP;
 break;
 // Up Arrow Key - shift map downward
 case 'P':
 y+=SCROLL_STEP;
 ymap=int(y & MAP_COORD_MASK)/MAX_CELL_UNITS;
 if (ymap>(YMAP_MAX-1)) y-=SCROLL_STEP;
 break;
 // Down Arrow Key - shift map upward
 case 'H':
 y-=SCROLL_STEP;
 ymap=int(y & MAP_COORD_MASK)/MAX_CELL_UNITS;
 if (ymap<0) y+=SCROLL_STEP;
 break;
 }
 break;
 case 27:
 bye=1;
 break;
 }
 // Clear the keyboard buffer (eliminates backup)
 while(kbhit()) ch=getch();
 }
 }
 while(!bye);
 // Restore the old video mode and clear the screen
 setgmode(oldmode);
 clrscr();
 return;
 }
/*******************************************************************
 Function: draw_tilesRPA();
 Purpose: Renders the axonometric tile display using the
 "Reverse Painter's Algorithm"
 Arguments: x, y - pair of world coordinates on which
 display should be centered
 top_level - topmost level to display
 current_level - level on which display should be
 centered
 screenbuf - pointer to offscreen buffer
 Comments: The destination viewport (screenbuf) must be cleared
 (set to 0 values) before this function is called
*******************************************************************/
void draw_tilesRPA(int x, int y, int top_level, int current_level,
 unsigned char far *screenbuf)
 {
 // Add viewport-to-map offsets corresponding to the distances
 // from the center to the lower left corner of the viewport
 x+=X_VPTM_OFFSET;
 y+=Y_VPTM_OFFSET;
 // Adjust world position to compensate for levels which must

 // be drawn above the current_level position
 for ( int k=current_level; k<top_level; k++ )
 { x+=LEVEL_ADJUST; y+=LEVEL_ADJUST; }
 // Begin rendering process, starting with topmost level
 for ( int level=top_level; level>=0; level-- )
 {
 // Calculate map location (map coordinates)
 int mapx=int(x & MAP_COORD_MASK)/MAX_CELL_UNITS;
 int mapy=int(y & MAP_COORD_MASK)/MAX_CELL_UNITS;
 // Calculate location within cell (cell coordinates)
 int cellx=(x & CELL_COORD_MASK);
 int celly=(y & CELL_COORD_MASK);
 // Set up initial values for conversion from cell
 // coordinates to screen coordinates
 int xpos=CELL_START_X;
 int ypos=CELL_START_Y;
 // Calculate x and y screen coordinates using the
 // cell-to-screen conversion formulas
 xpos+=(2*cellx-2*celly);
 ypos+=cellx+celly;
 // Adjust screen coordinates to the viewport
 xpos=(VIEWPORT_X_START-xpos);
 ypos=(VIEWPORT_Y_END-ypos);
 // Make adjustments to the map and screen coordinates
 // (go below the viewport) to ensure that the walls/objects
 // of offscreen base tiles may still be visible (drawn)
 mapy+=START_YM_OFFSET;
 mapx+=START_XM_OFFSET;
 xpos+=START_XS_OFFSET;
 ypos+=START_YS_OFFSET;
 // Set initial x increment value
 int xadd=START_XADD;
 // Hold on to starting values of map coordinates
 int mxhold=mapx, myhold=mapy;
 // Create temporary holding variables for the screen position
 int tempx=xpos, tempy=ypos;
 // Loop through and draw tiles: left->right, bottom->top
 do
 {
 mapx=mxhold; mapy=myhold; // Set to starting map values
 tempx=xpos; // Get starting x value for run
 do
 {
 // Check if current map position is in bounds, if not, skip
 if ((mapx>=0)&&(mapx<XMAP_MAX)&&(mapy>=0)&&(mapy<YMAP_MAX))
 {
 // Calculate current map coordinate location
 int loc=(XMAP_MAX*mapy)+mapx;
 // Get tile index values from the tile maps
 int prop_index=prop_map[level][loc];
 int xwall_index=xwall_map[level][loc];
 int ywall_index=ywall_map[level][loc];
 int base_index=base_map[level][loc];
 // Blit cell block tiles to screen in "reverse" order
 // if not of index 0 ( indicates nothing there )
 // 1st=Prop, 2nd=Xwall, 3rd=Ywall, 4th=Base
 if(prop_index) prop_tiles.blit(prop_index-1,
 tempx, tempy+PROP_YOFFSET, screenbuf);
 if(xwall_index) xwall_tiles.blit(xwall_index,

 tempx+XWALL_XOFFSET, tempy+WALL_YOFFSET,
 screenbuf);
 if(ywall_index) ywall_tiles.blit(ywall_index,
 tempx+YWALL_XOFFSET, tempy+WALL_YOFFSET,
 screenbuf);
 if((base_index)||(level==0)) base_tiles.blit(base_index,
 tempx, tempy, screenbuf);
 }
 tempx+=CELL_FULL_WIDTH; // Incr x position for next x move
 mapx+=1; mapy-=1; // Incr map values for next x move
 }
 while ( tempx <= (VIEWPORT_X_END+1) );
 tempy-=CELL_HALF_HEIGHT; // Move y position up for next row
 xadd=-xadd; // Flip xadd orientation
 xpos+=xadd; // Add inc to x position
 if(xadd>0) // If xadd is positive,
 myhold-=1; // decrease starting y map pos
 else // If xadd is negative,
 mxhold-=1; // decrease starting x map pos
 }
 while ( tempy >= (VIEWPORT_Y_START-CELL_FULL_HEIGHT) );
 // Adjust world coordinates in order to draw the next level
 x+=-LEVEL_ADJUST; y+=-LEVEL_ADJUST;
 }
 return;
 }
/*******************************************************************
 Function: load_files();
 Purpose: Reads files and loads data for engine into memory
 Arguments: none
 Comments: returns 1 if successful
 returns 0 if error is encountered
*******************************************************************/
int load_files(void)
 {
 int j;
 // Load palette, and load in straight tile bitmaps
 FILE *in;
 if ((in = fopen("pics.wad", "rb")) == NULL)
 {
 fprintf(stderr, "Cannot find file: PICS.WAD \n");
 return(0);
 }
 fread(palette, sizeof(palette), 1, in);
 for (j=0; j<BASE_COUNT; j++)
 fread(base_tiles.image[j], base_tiles.size, 1, in);
 for (j=0; j<WALL_COUNT; j++)
 fread(xwall_tiles.image[j], xwall_tiles.size, 1, in);
 for (j=0; j<WALL_COUNT; j++)
 fread(ywall_tiles.image[j], ywall_tiles.size, 1, in);
 for (j=0; j<PROP_COUNT; j++)
 fread(prop_tiles.image[j], prop_tiles.size, 1, in);
 fclose(in);
 // Load in data for tile map arrays
 if ((in = fopen("map.wad", "rb")) == NULL)
 {
 fprintf(stderr, "Cannot find file: MAP.WAD \n");
 return(0);
 }

 fread(base_map, sizeof(base_map), 1, in);
 fread(xwall_map, sizeof(xwall_map), 1, in);
 fread(ywall_map, sizeof(ywall_map), 1, in);
 fread(prop_map, sizeof(prop_map), 1, in);
 fclose(in);
 return(1);
 }

Listing Three
/********************************************************************
 TILE.H
 Header file for the Tile class of the Axoview Engine v1.0
 Contains the Tile class declaration.
 Written by Nate Goudie 1996
********************************************************************/
class Tile
 {
 // Make everything public
 public:
 char far **image; // Pointer to array of tile bitmaps
 int width,height; // Width and height of tile bitmap
 int size; // size of tile bitmap (in bytes)
 //Constructor for tile class:
 Tile(int num_tiles,int w,int h);
 // Function to draw tile into offscreen buffer
 void blit(int tile_num,int x,int y,
 unsigned char far *screen);
 };

Listing Four
/********************************************************************
 TILE.CPP
 The Tile class for the Axoview Engine v1.0
 - a class to handle the containment and drawing of tiles
 Contains the Tile class functions.
 Written by Nate Goudie 1996
********************************************************************/
#include <stdio.h>
#include <alloc.h>
#include <mem.h>
#include "tile.h"
// Declare constants for the viewport clipping
#define XMIN 0
#define XMAX 319
#define YMIN 0
#define YMAX 199
#define SCREENWIDTH 320
#define SCREENHEIGHT 200
/*******************************************************************
 Function: Tile::Tile();
 Purpose: The Tile class constructor
 Arguments: num_tiles - number of tiles (bitmaps)
 w - width of the tile in pixels (bytes)
 h - height of the tile in pixels (bytes)
 Comments: Will compute size of each tile in bytes, and will
 allocate memory for the tile bitmaps
*******************************************************************/
Tile::Tile(int num_tiles,int w,int h)
 {

 width=w;
 height=h;
 // Calculate size of tile bitmap
 size=w*h;
 // Allocate memory for tile bitmaps
 image=new char far *[num_tiles];
 for (int j=0; j<num_tiles; j++)
 image[j]=new char[size];
 }
/*******************************************************************
 Function: Tile::blit();
 Purpose: Tile class function which blits the indicated tile
 bitmap to an offscreen buffer using the
 "Reverse Painter's Algorithm".
 Handles clipping and tile transparency.
 Arguments: tile_num - index of tile bitmap to display
 x, y - offscreen coordinates where the tile
 should be drawn
 screen - pointer to offscreen buffer
 Comments: The destination buffer (screen) must be 320x200.
*******************************************************************/
void Tile::blit(int tile_num, int x, int y, unsigned char far *screen)
 {
 int txstart=0, txend=width, tystart=0, tyend=height;
 // Perform clipping before main loop
 // Clip tile to viewport and set boundaries to blit through
 if (x<XMIN)
 txstart=XMIN-x;
 if (y<YMIN)
 tystart=YMIN-y;
 if ((x+width-1)>XMAX)
 txend=XMAX-x+1;
 if ((y+height-1)>YMAX)
 tyend=YMAX-y+1;
 // Calculate tile and buffer starting offsets
 int toffset = (tystart*width)+txstart;
 int poffset = (y+tystart)*SCREENWIDTH+(x+txstart);
 // Calculate next row increments
 int toffinc = ((width-txend)+txstart);
 int poffinc = (SCREENWIDTH-(txend)+txstart);
 // Dereference one of the pointers to the tile bitmap for speed
 char far *tileimage = image[tile_num];
 // Now loop through and copy the tile bitmap to the screen buffer
 for(int row=0; row<(tyend-tystart); row++)
 {
 for(int column=0; column<(txend-txstart); column++)
 {
 // Get pixel from the offscreen buffer
 int dest_pixel=screen[poffset];
 // Check if it is transparent (0), if so, copy
 // the tile bitmap's pixel over it
 if (!dest_pixel)
 screen[poffset] = tileimage[toffset];
 poffset++; toffset++;
 }
 // Jump to start of next row
 toffset+=toffinc;
 poffset+=poffinc;
 }

 return;
 }
End Listings



Designing a Distributed Simulation Game


Communications classes and network services are the key




Ron van der Wal


Ron, author of the Tarma Simulation Framework, can be contacted at
tarma@pi.net.


Imagine students peering into computer monitors and arguing over investments,
price settings, and wages--all the time pointing to charts that show how well
(or badly) their company is faring in a simulated economy. The game I'm
describing is a distributed application that my company developed for a Dutch
university. Its object is to teach students the causes and effects of both
micro- and macro-economic decisions. It consists of a set of Windows-hosted
applications, interconnected through a LAN (Novell NetWare, Banyan VINES,
NetBIOS, and a shared-memory network simulation are supported), which form a
simulated national economy, consisting of 30-50 companies, a government, and a
banking sector.
The game is played in rounds that last anywhere between 1 and 30,000 seconds.
A "normal" round lasts for 60-120 seconds. During each round, each company
analyzes its position by means of its balance sheet, profit and loss
statements, and production and inventory data. Spreadsheets and charts allow
analysis of the corresponding historical data. Based on this information, its
board of directors decides on its prices, investments, and so on. At the end
of each round, current individual prices, investment and labor demands are fed
into an economic model (created by the university for which the game was
developed), and sales, labor allocations, and order stocks are distributed
amongst the companies. The process then continues to the next round. A game
normally lasts about two to three hours (or approximately 120 rounds).
In this article, I'll focus on the design of the game software and the network
communications that allow it to run on top of several different protocols. As
a bonus, I'll share a trick for dynamically loading C++ classes from DLLs. 


The Software Design Process


The game software is based on an object-oriented model in which companies, the
national economy, and the game controller are all objects. Each object is
responsible for the part of the economic model that it represents: Companies
keep track of their own inventories, production, balance sheets, and so on,
while the national economy consolidates summary data from individual
companies, and adds its own monetary policies and market-distribution
mechanisms. The game-controller object is responsible for overall control of
the game and the login/logout process. Clearly, communication between objects
is required.
Figure 1 illustrates the basic communication pattern between the
national-economy object and individual companies. Time runs from left to
right. At the end of each round, the economy object uses up-to-date
information from the company objects to perform the first series of
calculations (which involve allocation of sales and labor, among other things)
and relays this information back to the companies. On receipt of this
information, each company finalizes the then-expired period, updates its
balance sheet, inventory, and so forth, and prepares for the next round.
Meanwhile, the economy waits until all companies have finalized the period,
then performs its own finalization.


Publishers and Subscribers


The pattern in Figure 1 is appropriate for objects that live on a single
computer, but less so for a truly distributed system. Network communications
between objects living on different machines (or simply in different address
spaces) require a more elaborate approach. As in many other distributed
applications, communication is managed using a form of "proxies." Each company
object becomes a remote publisher of information, while the economy object
communicates with local subscribers to that information. Figure 2 shows this
modified communication pattern; note that the economy object and the company
subscribers live on one computer, while the company publishers live on several
other computers.
For efficiency, company subscribers cache the most important information from
their publishers, which is updated as necessary by those publishers. This
means that information pertaining to the current round is frequently updated
as players make decisions regarding their company's policy, while at the end
of a round, the information is frozen and added to the historical data that
both publishers and subscribers maintain.
Publishers and subscribers are not restricted to a one-to-one relation; in
general, a publisher can have any number of subscribers. Figure 3 shows a
publisher's publication channel, to which subscribers may connect. The channel
has a bus-like structure that allows the publisher to send updates to its
subscribers by means of a multicast. This is not the only communication
pattern, however. Individual subscribers may also request specific information
from their publisher (as in a classic client/server relationship), and
conversely, a publisher may request information from one of its subscribers.
In this situation, the subscriber under consideration acts as an "author" of
information. In the game software, this is primarily used when a previous game
is being reopened from the central game repository: The company subscribers
(acting as authors) located on the economy's computer are used to initialize
their publishers to the state where the game left off. This approach is
feasible for other applications: for example, a "blackboard" architecture in
which several parties contribute to a set of common knowledge that is
maintained by the publisher.
The publisher/subscriber approach is useful for other purposes, too. Each
participating computer also displays a summary of the national economic data
and, of course, the state of the game at large (current round, progress in the
current round, and so on). Similar to the companies' information, economic and
game information is distributed by the publisher/subscriber mechanism. The
national economy object is turned into an economy publisher and all other
computers are equipped with economy subscribers. The same applies to the game
controller, whose subscribers display the simulated game date, time, and news
bulletin messages broadcast by the "government." To keep the information from
different sources separated, there are as many different publishing channels
as there are publishers; in this game, this amounts to n+2, where n is the
number of companies and the two additional channels are for the economy and
the game controller.
Figure 4 illustrates a hypothetical situation involving four computers. The
top computer is operated by the instructor. It contains publishers for game
and economic information, and subscribers for all companies. Below it are two
computers of teams that participate in the game; each has one subscriber for
the game information and one for the economic data. In addition, they both
house a publisher for their respective companies. Finally, at the bottom of
the figure is a computer used by an outside observer, which subscribes to the
game, the economic data, and all companies, but publishes nothing itself.


Publisher and Subscriber Classes


To capture the commonality of the publisher/subscriber pattern, I have
developed a small hierarchy of classes; see Figure 5. The top-level class is
called Actor and captures the common aspects of both publishers and
subscribers, including the ability to communicate across the network through
the use of a Port--an object that represents an endpoint in a network
connection, similar to like-named entities in network protocols (sometimes
called "sockets"). 
Derived from the Actor base class are the Publisher and Subscriber classes.
Their purposes are reflected in their interfaces: A Subscriber object is aware
of the network address of its Publisher, while the converse is not true. A
representative part of the C++ declarations of these classes is shown in
Listing One. Class cActor contains a cPort object that takes care of the
actual network communications, a number of functions relating to this port and
the associated communication channel, and some functions for datagram
management. These functions are, in fact, central to the purpose of the cActor
hierarchy, which is to communicate across the network. At this level,
communication is performed by means of a datagram abstraction (not shown
here). A datagram contains fields to indicate the message type (for example,
information update, shutdown notification), its class (request/response,
broadcast, acknowledgment, and so on), some routing information, and the
message contents. Internally, datagrams are kept in a preallocated pool, which
is why cActors and their derivations must explicitly acquire and release the
cDatagram instances.
Dispatch of incoming datagrams in the cActor hierarchy is done through message
maps similar to those in Borland's OWL or Microsoft's MFC frameworks. The
cActor class contains the dispatching mechanism proper, as well as a default
message map that deals with a few predefined message types (such as
diagnostics). Derived classes can add their own message handling or override
existing mappings by including message maps in their own class declarations.
Messages that do not appear in any map are handled by
cActor::UnknownMessage(); by default, this function simply releases the
datagram (in the debug version, it also produces a trace message to that
effect). A similar function, cActor::IgnoreMessage(), can be used to explicitly
indicate that the message type is known, but needs no further handling. It,
too, will release the datagram to its pool.
The derived classes cPublisher and cSubscriber specialize the default behavior
by adding support for the maintenance of a publication channel, and the
subscription to it, respectively. In particular, functions for the orderly
shutdown of a channel are implemented for both parties. Only the cSubscriber
class needs a message handler to cover this possibility.
Finally, you may have noticed that the actual allocation of publication
channels is not covered by one of the cActor-derived classes; instead, this
responsibility is delegated to a helper class of the game controller which
oversees channel management in general and is not part of the cActor
hierarchy.


Combining Actor Classes with Domain Classes


The cActor hierarchy knows nothing about companies, the economy, or any
domain-specific information. To create specific publishers and subscribers,
you must combine the cActor functionality with domain-specific classes. I'll
use the company classes as an example; the other domain classes are dealt with
in much the same way.
There are two basic company classes, cCompany and cCompanyPlayer; see Figure
6. Class cCompany represents the basic structure of a company, including its
historical data. Class cCompanyPlayer extends this functionality by adding
detailed information about the machine inventory and the company's policy,
both of which are only available to the companies themselves, not to outside
observers. Class cCompany is derived from a document-type base class provided
by the application framework that we used to interface to the Windows
environment; I used Borland's OWL, but other frameworks with similar
document/view architectures would work.
To this basic company functionality, the publisher and subscriber
functionality was added using the cPublisher and cSubscriber classes,
respectively, as mix-in classes in a multiple-inheritance setting. Since there
was no chance that both of them would appear as base classes of a single
further-derived class, their common ancestor, cActor, didn't need to be a
virtual base class. The
resulting classes are cCompanyPublisher (a player that also is active as a
publisher), and cCompanySubscriber (an observer that tracks updates from its
publisher). Each derived class has its own message map, which takes care of
properly dispatching initialization and update broadcasts (in the case of the
subscriber), and information requests (in the case of the publisher). In all,
14 different message types are currently defined for communication across a
company publication channel. A similar number applies to both the economy and
the game-control channels.


Network Communication Classes



As far as actors are concerned, a Port object is all they need to know about
the network. Behind the scenes, however, a lot goes on to make this
abstraction work. For one thing, the game has to run on top of several
different LANs (Novell NetWare, Banyan VINES, generic NetBIOS, and
occasionally AppleTalk) and several different operating systems. For the time
being, I'll restrict my discussion to 16-bit Windows, although most of the
network layer's implementation carries over to the other platforms.
Furthermore, the network protocol is freely selectable (given the presence of
the corresponding network) and new protocols may even be added at run time.
We were able to accomplish this through abstraction and encapsulation in a
number of classes. In effect, we designed our own transport layer that
operates in terms of an abstract network protocol. For each of the network
protocols that must be supported, we provided concrete implementations of the
abstract protocol in terms of the API of the protocol under consideration--we
used IPX for Novell NetWare, IPC for Banyan VINES, and datagrams for NetBIOS
and AppleTalk. When we implement a TCP/IP version, we'll use UDP. Figure 7
provides an overview of the relationships among the classes. Listing Two
presents the corresponding class declarations.
Class cTransportManager takes responsibility for overall communication
management. It exposes its services to the actors by means of the intermediate
cPort objects that we first encountered in the cActor hierarchy. As shown in
Listing Two, the cPort class offers its clients the ability to send datagram
messages in several ways. Conversely, when a datagram is received, the cPort
object will call back its cActor object and let it dispatch the datagram as
dictated by the actor's message maps. Class cTransportManager does not create
or destroy cPort objects, since they are normally assumed to be part of other
objects, but does provide the means to connect them to and disconnect them
from the network as appropriate. Furthermore, the cPort objects can use the
cTransportManager::Send...() functions to forward the datagrams that are
submitted by their own clients.
On the network side, cTransportManager uses the abstract interface of class
cNetProtocol to get the datagrams from and to the port objects across the
physical network. Class cNetProtocol is responsible for implementing some
basic network services present in all network protocols considered. In the
concrete derivations of the abstract cNetProtocol class, functions such as
cNetProtocol::SendBroadcast() and cNetProtocol::SendMessage() map almost
directly to the corresponding protocol API functions indicated earlier. We
have also implemented a shared-memory network simulation, which allows us to
test the network classes on a single computer. Originally, we did this for
testing only, but this pseudonetwork protocol turned out to be quite useful
for stand-alone demonstrations of the simulation game, and is now a standard
part of the software distribution.
The final two classes in Figure 7 are cDatagram and cDatagramPool. Class
cDatagram represents the actual datagram, as mentioned earlier; class
cDatagramPool assists cTransportManager in the maintenance of a pool of these
objects. There are several reasons for this pool. Datagram buffers must
normally be present during (network) interrupts, since several of the network
protocols use some kind of event-service routine on receiving or transmitting
a datagram. In the 16-bit Windows environment, this implies that those buffers
must be page locked. Since we need them often and without delay, it makes
sense to preallocate an ample number of them and let them be managed by a
separate class. Instead of new and delete, we use an acquire/release protocol
to manipulate datagram buffers. (We could have overloaded operators new and
delete, but they were already overloaded to allocate page-locked memory
chunks, and we also didn't want frequent calls to constructors and
destructors.)


Implementation of Network Services


The trio cPort/cTransportManager/cNetProtocol offers the following types of
datagram transmission:
Multicast to all ports connected to a given channel. Publishers use this to
announce updates or send other information to all their subscribers at once.
Point-to-point request and reply with guaranteed delivery, used by subscribers
and publishers in a client/server fashion (where the publisher itself
sometimes assumes the role of a client to an authoring subscriber).
Point-to-point informational message without reply. This is used in particular
during the shutdown of a node to announce its demise to the game-control
publisher, and for acknowledgment messages.
The network protocol class cNetProtocol only needs to provide two services:
multicast (or broadcast) and point-to-point, both of which may be unreliable.
Class cTransportManager improves upon this basic quality of service by
maintaining queues of pending requests (to retransmit the request if no reply
is received) and of recent replies (to respond to re-requests whose replies
were accidentally lost). A third queue holds pending transmissions in general,
since some network protocols cannot handle more than a few (perhaps ten)
datagram submissions at a time. In a heavily loaded game, there may be bursts
of a few hundred transmissions within a few seconds. The pending transmission
queue allows the cTransportManager class to adjust its outgoing pace to the
capabilities of the underlying protocol.
The request/reply protocol is a straightforward implementation of a
"request/reply with acknowledgment" algorithm with retransmission after a
time-out, described in detail in books such as Distributed Systems: Concepts
and Design, Second Edition, by G. Coulouris et al. (Addison-Wesley, 1994). The
idea is to attach a unique identifier and an expiration field to each request,
keep transmitted requests around until the corresponding reply is received,
and retransmit the request if a time-out period expires without a reply. This
may be repeated for several expiration periods, after which the other party is
deemed unreachable and an error indication (instead of a reply) is returned to
the original submitter of the request. If a reply is received, however, the
request is satisfied, and an acknowledgment is sent to the replying party,
which allows it to release any resources it might hold for retransmissions of
the reply. In practice, the length of the time-out period, the maximum number
of retries, and the expiration time of replies (in case acknowledgments are
lost) are subject to the quality of the underlying network, the overall
network load, the desired response times, and the risk one is willing to take
of falsely declaring a node unreachable. In our implementation, these are all
parameters that may be preset and that, to some extent, will adapt dynamically
to the network conditions.


Dynamically Loading New Classes


To load new network protocol classes at run time without linking them into the
application, place the class code in a DLL. The trick exploits the way virtual
functions are called: through a vtable, which is a glorified jump table.
Suppose we knew the address of that jump table, and knew that index #3 pointed
to one function, #4 to another, and so on. If we implement class
cTransportManager in terms of the (virtual) interface of class cNetProtocol,
put both in the application's executable, and at run time obtain a pointer to
a cNetProtocol-derived object in a DLL, we have:
The pointer to the jump table (the vtable; its address can be found at some
offset from where the object's this pointer points).
The indices of the various functions in that table, since the C++ compiler
courteously translates calls to virtual functions to look-up operations in
that same jump table.
This works like a charm, and there's no need for you to export anything from
the protocol's DLL. Remember, though, you must make the member functions
themselves exportable, and the class must be compiled as huge to get full-size
vtable pointers and contents, even if you don't actually export them in the
DLL's export table. They will still be called in a situation where DS!=SS and
that sort of thing, even if you didn't link to them or load their address in
any obvious way. To get that pointer to the cNetProtocol-derived object in the
first place, we do need a conventionally exported function--but only one. For
the simulation game, we called that function CreateProtocol() and demanded
that it have no parameters and return a pointer to cNetProtocol (but at run
time, it should return an object of a derived class), and that's it. When we
load a DLL for a network protocol, we call GetProcAddress() for the
CreateProtocol() function, call CreateProtocol(), and if it returns a nonzero
value, we have our network protocol. Thanks to the protocol's virtual
destructor, we don't even need any further assistance to get rid of it when
we're done. Finally, by placing a list of protocol descriptions, with the
names of the corresponding DLLs, in the application's .INI file, you can add
and remove protocols at run time.


Conclusion


In this article, I've covered a lot of material in a short space. Still, I
hope that I've shed some light on yet another distributed computing design,
and perhaps also shown some useful patterns and implementation techniques for
immediate application.
Figure 1: Communication between national economy and company objects.
Figure 2: Communication between economy and companies in a distributed
environment.
Figure 3: Communication channel between a publisher and its subscribers.
Figure 4: Distribution of publishers and subscribers over participating
computers.
Figure 5: Hierarchy of publisher and subscriber classes.
Figure 6: cCompany class hierarchy.
Figure 7: Class diagram for network communications.

Listing One
// Assume declarations of the following classes:
class cPort; // Network port abstraction
class cNetAddress; // Generic network address
class cDatagram; // Datagram message
// Abstract base class for Publisher & Subscriber
class cActor {
public:
 // Public virtual destructor; anyone can delete an Actor.
 virtual ~cActor();
 // Functions to obtain network information
 uint16 ChannelNo() const;
 void NetAddress(cNetAddress &adr) const;
 // Functions to interrogate & change connection state
 void DisconnectPort();
 bool IsConnected() const;
 virtual void Shutdown() = 0;
 virtual bool IsPublisher() const = 0;
 protected:
 // Constructor for use by derived classes
 cActor();
 // Access to the port object for derived classes
 cPort & Port();
 // Functions relating to datagram management
 cDatagram * AcquireDatagram();
 void ReleaseDatagram(cDatagram *);
 virtual void IgnoreMessage(cDatagram *);
 virtual void UnknownMessage(cDatagram *);
 // Implementation of message dispatcher
 bool DispatchMessage(cDatagram *);
 // Default message table
 DECLARE_MESSAGE_TABLE(cActor);
 private:
 // Actors own ports for their network communications.
 cPort mPort;
 };
 // Publisher class
 class cPublisher: public cActor {
 public:
 cPublisher();
 
 // Functions to manage the publication channel
 void BroadcastChannelDown();
 virtual void Shutdown();
 // Implementations of other cActor functions
 virtual bool IsPublisher() const { return true; }
 // Signature of the publisher
 uint16 Signature() const;
 };
 // Subscriber class
 class cSubscriber: public cActor {
 public:
 cSubscriber();
 
 // Functions that set the server address of the client.
 const cNetAddress &PublisherNode() const;
 void SetPublisherNode(const cNetAddress &);
 // Implementations of other cActor functions
 virtual bool IsPublisher() const { return false; }
 virtual void Shutdown();
 protected:
 // Default message responders
 void OnChannelDown(cDatagram *);
 virtual void ChannelDownAction() {}
 // Subscriber message table
 DECLARE_MESSAGE_TABLE(cSubscriber);
 private:
 // We keep the node address of our publisher
 cNetAddress mPublisherNode;
 };

Listing Two
// Network endpoint abstraction
class cPort {
public:
 cPort(cActor *);
 ~cPort();
 // Access to information regarding this port
 cTransportManager *Manager() const;
 void NetAddress(cNetAddress &) const;
 uint16 ChannelNo() const;
 // Function to check the connection state of the port
 bool IsConnected() const;
 void Disconnect();
 // Functions to send messages to our peers in other nodes.
 void SendBroadcast(cDatagram *);
 void SendInfo(cDatagram *, const cNetAddress &);
 void SendRequest(cDatagram *, const cNetAddress &);
 void SendReply(cDatagram *);
 private:
 friend class cTransportManager;
 
 // cPort instances are managed by the transport manager,
 // organized by channel number.
 cTransportManager *mManager;
 uint16 mChannelNo;
 // Pointer to the actor to be called back by the port.
 cActor * mActor;
 // Function to handle incoming datagrams
 void ReceiveDatagram(cDatagram *);
 };
 // Transport manager
 class cTransportManager {
 public:
 cTransportManager();
 ~cTransportManager();
 
 // Interface to start the network connection.
 int StartGroupAdmin(const char *);
 int StartGroupMember(const cGroupInfo &);
 // Stopping the network occurs in two phases
 void StartShutdown();
 void FinishShutdown();
 // Function to enumerate the active groups.
 int EnumGroups(cGroupList &);
 int LookupGroup(cGroupInfo &);
 // Node-level information functions.
 bool IsActive() const;
 bool IsAdmin() const;
 const char * GroupName() const;
 // Low-level protocol information functions.
 cNetProtocol *Protocol();
 const char * ProtocolName() const;
 void NetAddress(cNetAddress &);
 const cNetAddress *GroupAdmin() const;
 // Interface to attach and detach ports.
 void ConnectPortAt(cPort *, uint16);
 void ConnectNextPort(cPort *);
 void DisconnectPort(cPort *);
 uint16 NextChannelNo();
 // Functions for datagram buffer maintenance.
 cDatagram * AcquireDatagram();
 void ReleaseDatagram(cDatagram *);
 // Functions to send datagrams.
 void SendBroadcast(cDatagram *);
 void SendRequest(cDatagram *, const cNetAddress &);
 void SendInfo(cDatagram *, const cNetAddress &);
 void SendReply(cDatagram *);
 void SendAck(cDatagram *);
 // On reception of a datagram, ReceiveDatagram() is called.
 void ReceiveDatagram(cDatagram *);
 void DispatchDatagram(cDatagram *);
 private:
 // Current network protocol
 cNetProtocol *mProtocol;
 // List of connected ports and next available channel
 cPtrArray<cPort> mPortList;
 uint16 mNextChannel;
 // A pool of datagrams is maintained by a subobject.
 cDatagramPool mPoolMgr;
 // Queues for reply and request transactions
 int16 mNodeID; // Unique node ID
 int16 mNextTid; // Transaction counter
 cDataQueue mRequestQ; // Request queue
 cDataQueue mReplyQ; // Reply queue
 cDataQueue mSendQ; // Pending send queue
 // Protocol maintenance
 int OpenProtocol();
 bool CloseProtocol();
 bool IsProtocolOpen() const;
 // Internal port maintenance
 void DisconnectAllPorts();
 // Function to maintain send and receive queues
 void PostSends();
 void CheckTransactions();
};
// Abstract base class for network protocols
class cNetProtocol {
public:
 // Virtual destructor to cater for derivation
 virtual ~cNetProtocol();
 // Functions to initialize and terminate the protocol
 virtual int InitProtocol() = 0;
 virtual int TermProtocol() = 0;
 virtual bool IsInited() const = 0;
 virtual const char *ProtocolName() const = 0;
 // Functions to open and close the network connection.
 virtual int OpenConnection(const cNetAddress * = 0) = 0;
 virtual int CloseConnection() = 0;
 virtual bool IsConnected() const = 0;
 virtual void NetAddress(cNetAddress &) = 0;
 // A node must be able to advertise its address.
 virtual int StartAdvertising(const char *) = 0;
 virtual int StopAdvertising(const char *) = 0;
 // Function to enumerate the active groups.
 virtual int EnumGroup(cGroupList &) = 0;
 virtual int LookupGroup(const char *, cNetAddress &) = 0;
 // Functions to send datagram messages.
 virtual bool SendBroadcast(cDatagram *) = 0;
 virtual bool SendMessage(cDatagram*, const cNetAddress&)=0;
protected:
 // Back pointer to transport manager
 cTransportManager *mManager;
 // Constructor for derived classes
 cNetProtocol(cTransportManager *);
};




Your Own Two-Dimensional Gaming Engine


Double buffering is the key, a bitmap compiler is the bonus




Mark Seminatore


Mark is coauthor of Tricks of the Game Programming Gurus (SAMS Publishing,
1994). He can be reached at 72040.145@compuserve.com.


The center of gravity in computer games is rapidly moving to
three-dimensional, first-person games with light-shaded, texture-mapped
polygons. However, the basic programming concepts and skills behind most games
generally remain the same. To illustrate some of these concepts, I'll present
a two-dimensional (2-D) game engine (Figure 1 is a typical game based on this
game engine), borrowing ideas from classics such as Space Invaders and
Galaxians. With some work, however, it could be made to resemble one of the
more-recent games, like Raptor. 
The game engine consists of a number of modules, including those that handle
VGA and keyboard routines, animation, and file management. Table 1 lists the
code modules (available electronically; see "Availability," page 3) that make
up the game engine. All code was developed and tested using Borland C++ 3.1,
but I made every effort to keep the code compiler neutral. The code should
compile under 32-bit Extended DOS (with the Watcom compiler) with some minor
changes.


The Keyboard


The system BIOS includes an interrupt handler to manage keyboard input. The
BIOS responds to a keyboard interrupt by reading make/break (press/release)
codes and storing them in a modest 16-character first-in-first-out (FIFO)
buffer. Most applications retrieve the key codes sequentially from this buffer
by calling the BIOS routines via the operating system. As an added bonus, most
flavors of BIOS include a typematic feature
whereby, if a key is held down for more than half a second, the make codes are
repeated automatically.
At first glance, this would appear to be a workable method for managing
keyboard input. However, consider what happens when a key is held down for
several seconds: The BIOS receives a stream of key codes that quickly
overflows the 16-character buffer. The BIOS then begins to sound rather
annoying beeps on the system speaker. This is hardly a feature you'd like to
include in a game.
Your first task, then, is to write a routine that disables the BIOS keyboard
interrupt handler. This simply requires a new interrupt handler that avoids
key-code buffer overruns. Instead of storing actual key codes in a buffer,
I'll use a 128-byte array to store the current status of each key. Each key is
either currently pressed or released.
Example 1 shows what the new interrupt handler looks like. Notice that I've
declared a function pointer, OldInt9, that saves a pointer to the original
BIOS interrupt handler. This way you can restore the interrupt upon exiting
the game.


Video Display


The game engine is designed around the standard VGA video mode 13h (a good
video mode for games due to its simple memory organization). This mode is
supported by all VGA-compatible video adapters. Admittedly, the 320x200
resolution is a bit low given today's Super-VGA hardware, but this drawback is
more than offset by the ability to display 256 unique colors.
In mode 13h, a pixel is drawn by writing an 8-bit color value to a 64-KB
memory window at physical-memory location A000:0000. Each byte represents a
single pixel. Drawing pixels is simple and fast--you merely set a far pointer
to this window and dereference the pointer as if it were an array.
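The addressing arithmetic is worth spelling out. Here is a minimal sketch (the names are hypothetical; the engine's actual routines live in vga.c), with an ordinary buffer standing in for the 64-KB window at A000:0000:

```c
#include <assert.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* In real mode this would be a far pointer to A000:0000; here an
   ordinary array stands in for the 64-KB video-memory window. */
static unsigned char screen[SCREEN_W * SCREEN_H];

/* Mode 13h is linear: one byte per pixel, row-major, so drawing a
   pixel is one multiply, one add, and one store. */
static void PutPixel(int x, int y, unsigned char color)
{
    screen[y * SCREEN_W + x] = color;
}

static unsigned char GetPixel(int x, int y)
{
    return screen[y * SCREEN_W + x];
}
```

Pixel (10, 5), for instance, lands at byte offset 5*320 + 10 = 1610.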
The video BIOS provides many useful routines. The engine includes a module,
vga.c, that encapsulates the BIOS calls you'll be using. In addition, the
module includes routines for tasks such as reading a 256-color palette from
disk.


PCX Files


Next, you need routines for reading bitmapped images. These bitmaps will be
used as sprites by the game engine. PCX is supported by most paint programs,
and source code is readily available for reading images stored in this format.
For these reasons, all the sprites used by the engine are stored on disk as
256-color PCX files.


Sprites


At this point, you need to develop data structures for the sprites to keep
track of the visual elements of the game. There are two basic classes of
sprites in the game engine--actors and missiles. Actors are the bad guys and
can have complex behavior such as dive-bombing players or shooting missiles.
Missiles are quite simple, marching along a fixed path until either they hit a
target or disappear off the screen.
You'll also want to keep track of some additional information for each visual
element: current location, current direction, and type. The actor type will
need a few additional fields (to be discussed in a moment). Example 2 shows
the player, sprite, and missile data structures. Sample sprites (in PCX
format) are available electronically.


Linked Lists


You may have noticed the field next in the actor and missile data structures.
This field is needed so that you can build linked lists of these data types.
The drawback of using linked lists is that they are more difficult to
implement than, say, an array of data structures. But for this game, you'll
need the ability to create and remove list items, namely actors and missiles,
on the fly. While this can be accomplished with an array, it is handled much
more efficiently with a linked list.
The sprite list-management code is in the module sprite.c (available
electronically). This is a relatively straightforward implementation of a
linked-list data structure using dummy nodes for the head and tail. All of the
basic list operations are supported, including insertion, deletion, and list
iteration.
The engine implements list iteration by means of the ForEachActor() and
ForEachMissile() functions, which take, as a parameter, a pointer to a user
function to apply to each list item. The user function itself takes, as a
parameter, a pointer to a list item. Also present are FirstActor() and
NextActor(), which allow for more complex processing.


Some Animation Basics



The engine uses several basic animation techniques, the first of which is
double buffering the video screen. To implement double-buffering, you allocate
memory for a 64-KB buffer during program startup. Next, you design all of the
custom video routines to draw to this off-screen buffer. After image
composition is complete, the buffer is copied to the screen in one large chunk
during vertical retrace.
The main goal of double buffering is to minimize the number of times you
access video memory. Because writes to video memory are generally slower than
writes to system memory, this usually results in faster animation. An added
benefit is that double buffering can be used to minimize image flicker, since
all image composition takes place off-screen.
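The allocate/compose/copy cycle can be sketched as follows. This is a hypothetical outline rather than the engine's actual code: video memory is simulated with an ordinary array, and the engine's real UpdateScreen() also waits for vertical retrace before copying.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SCREEN_SIZE (320 * 200)

static unsigned char *doubleBuffer;          /* off-screen composition buffer */
static unsigned char screenMem[SCREEN_SIZE]; /* stands in for video memory */

/* Allocate the 64-KB off-screen buffer at program startup. */
static int InitDoubleBuffer(void)
{
    doubleBuffer = malloc(SCREEN_SIZE);
    return doubleBuffer != NULL;
}

/* All drawing goes to the off-screen buffer... */
static void ClearBuffer(unsigned char color)
{
    memset(doubleBuffer, color, SCREEN_SIZE);
}

/* ...and one block copy per frame moves the finished image to the
   screen, minimizing accesses to (slow) video memory. */
static void UpdateScreen(void)
{
    memcpy(screenMem, doubleBuffer, SCREEN_SIZE);
}
```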
The second animation technique is masked image copying, sometimes known as
"AND-XOR animation." This technique makes it possible to draw the sprites with
transparent colors. Why do you need transparent colors? Consider a typical
sprite bitmap. The bitmap shape is almost certainly not a rectangle, but
bitmaps are stored in rectangular blocks of memory. If we simply copied the
sprite bitmap to the display, the black edges of our bitmap would overwrite
and obscure any image underneath the new sprite.
To implement a masked bitblt, you need both a bitmap image and image mask.
When drawing the bitmap, you first apply the image mask to the double buffer,
performing a bitwise AND between the double-buffer contents and the mask. The
bitmap itself is then drawn to the double buffer using a bitwise XOR
operation. The result is that background images can be seen through the
transparent regions of the sprite.
Which leads to the obvious question: What is an image mask? An image mask is a
copy of the bitmap image in which the transparent color is replaced with its
bitwise complement. All other colors are replaced with the transparent color.
For sprites, I reserve color 0 as the transparent color. By common convention,
color 0 is defined as black and color 255 as white, but this is not required.
An image mask usually looks like a black and white negative of the bitmap.
Figure 2 is an example of a bitmap image and its corresponding mask. The
engine includes the CreateBitmapMask() routine, which creates a mask image
from a bitmap. Also included are PutBitmapAnd() and PutBitmapXor(), which can
be used for AND-XOR bitblts to the double buffer.
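A minimal one-dimensional sketch of the AND-XOR technique follows. The engine's routines operate on full 2-D sprites; these hypothetical helpers just show the per-pixel arithmetic.

```c
#include <assert.h>

/* Build a mask: transparent pixels (color 0) become 0xFF, the
   bitwise complement of the transparent color; all others become 0. */
static void CreateMask(const unsigned char *bmp, unsigned char *mask, int n)
{
    int i;
    for (i = 0; i < n; i++)
        mask[i] = bmp[i] ? 0x00 : 0xFF;
}

/* Step 1: AND the mask into the buffer -- this zeroes the pixels the
   sprite will cover while leaving the transparent regions intact. */
static void BlitAnd(unsigned char *buf, const unsigned char *mask, int n)
{
    int i;
    for (i = 0; i < n; i++)
        buf[i] &= mask[i];
}

/* Step 2: XOR the bitmap in -- covered pixels are now 0, so XOR
   deposits the sprite color; transparent pixels XOR with 0. */
static void BlitXor(unsigned char *buf, const unsigned char *bmp, int n)
{
    int i;
    for (i = 0; i < n; i++)
        buf[i] ^= bmp[i];
}
```

Against a background of {1, 2, 3, 4}, a bitmap {0, 9, 0, 5} leaves the background showing through where the sprite is transparent and deposits the sprite colors everywhere else.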
Using masked images is just one of several ways of handling bitmap
transparency. Unfortunately, the technique is not exactly optimal in the sense
that you perform additional processing for each pixel drawn. Worse yet, this
processing involves no less than six memory accesses per pixel. Nonetheless,
for reasonable numbers of relatively small sprites, this may be acceptable.
One alternative method is to encode the bitmap data such that runs of
transparent pixels can be skipped over. This allows the bitmap display
routines to skip over transparent pixels and draw runs of colored pixels using
block-copy routines. The cost of this method is the addition of a conditional
branch to the inner loop, but it eliminates the extra memory accesses. The
only other disadvantage to this method is the need to store bitmap data in a
nonstandard format.
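One possible encoding (hypothetical; the article leaves the exact format open) stores each sprite scanline as alternating skip/draw run lengths followed by the opaque pixel data:

```c
#include <assert.h>

/* One encoded scanline: pairs of (transparent-run length, opaque-run
   length) followed by that many opaque pixels; a 0,0 pair ends the
   line.  This is the kind of nonstandard format the text mentions. */
static const unsigned char run[] = {
    3, 2, 9, 9,   /* skip 3 transparent pixels, draw 2 pixels of color 9 */
    1, 1, 5,      /* skip 1 more, draw 1 pixel of color 5 */
    0, 0          /* end of line */
};

/* Draw a run-encoded line: transparent runs are skipped outright, so
   no per-pixel masking is needed -- one branch per run instead. */
static void BlitRunLine(unsigned char *dst, const unsigned char *src)
{
    for (;;) {
        unsigned char skip  = *src++;
        unsigned char count = *src++;
        if (skip == 0 && count == 0)
            break;
        dst += skip;             /* hop over the transparent run */
        while (count--)
            *dst++ = *src++;     /* copy the colored run */
    }
}
```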
Yet another method of handling transparency is to use compiled bitmaps. By far
the greatest benefit of this method is the tremendous performance. In fact,
I'll venture to say that there is no faster way to draw a bitmap with
transparency. A compiled bitmap is bitblt optimization taken to the
extreme--the bitmap data is preprocessed into machine code. This is the
minimal code required to draw only the nontransparent pixels which, if you
think about it, amounts to a totally unrolled inner loop. The key to
performance with compiled bitmaps is the complete elimination of all loops and
conditionals.
Example 3 is a typical compiled bitmap. One potential limitation of compiled
bitmaps is that significantly greater memory space is required. The data
expansion is on the order of six bytes for every nontransparent pixel, plus
some overhead. This results directly from the fact that the opcode of each mov
instruction takes two bytes plus one byte for the segment override. The
operand data includes two bytes for the offset and one byte for the actual
pixel data. If you use mov [si+off16], which addresses through the default DS
segment and needs no override byte, you can save one byte per pixel at the
cost of some additional overhead to set up the DS register upon entry and
restore it before returning.
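The arithmetic is easy to check. A sketch using the byte counts above, plus the 25-byte prologue/epilogue overhead that Listing One's compiler allows for:

```c
#include <assert.h>

/* Size estimate for a compiled bitmap, following the article's byte
   accounting.  With "mov byte ptr [es:di+off16],imm8" each pixel costs
   6 bytes (2 opcode + 1 segment override + 2 offset + 1 pixel data). */
static unsigned CompiledSizeEsDi(unsigned opaquePixels)
{
    return opaquePixels * 6 + 25;
}

/* The [si+off16] form drops the override byte, costing 5 bytes per
   pixel -- the figure Listing One's CompileBitmap() actually uses. */
static unsigned CompiledSizeDsSi(unsigned opaquePixels)
{
    return opaquePixels * 5 + 25;
}
```

A fully opaque 16x16 sprite (256 pixels) thus compiles to 256*5 + 25 = 1305 bytes in the [si+off16] form, versus 256 bytes of raw bitmap data, roughly a fivefold expansion.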
The increased storage requirements of compiled bitmaps are significant enough
that, for large sprites, memory space may be a problem. Then again, low
memory is a common and well understood problem in games, and there are a
number of workable solutions. These solutions include bitmap caching, virtual
memory, and EMS/XMS.
Another drawback of compiled bitmaps is that clipping to the viewport becomes
very difficult, if not impossible. You can accept that limitation in the
engine since sprites are not allowed outside the viewport. I designed the game
logic so that sprites change direction when they reach the borders of the
screen. Another approach might be to reduce the viewport size and only copy a
portion of the double buffer to the display. This would give the appearance of
sprite clipping without having to test each sprite.
To illustrate the level of performance improvements possible through the use
of compiled bitmaps, I have included a bitmap compiler routine with the
engine. To use the compiler, pass a bitmap pointer, along with its width and
height, to CompileBitmap(), which returns a far pointer to the compiled
bitmap. Listing One
presents the source for the bitmap compiler.
Calling the compiled bitmap is as easy as making a far function call through
the pointer. You must make a far call because the game engine uses the compact
memory model and the return value is really a far pointer to data, rather than
a near code pointer, as you might expect. Note that this violates some of the
rules of protected-mode programming, where attempting to execute data will
generate a fault. If necessary, this can be handled by creating an alias
selector for the data. In flat-model, protected-mode environments, such as the
Tenberry DOS-Extender, this is usually not a problem because the code and data
selectors overlap and point to the same physical memory.
The compiled bitmap function is designed to be passed a far pointer to the
double buffer, and the x- and y-coordinates for the upper-left corner of the
sprite. You must be careful to pass only valid pointers to the compiled bitmap
function. Passing an invalid pointer to the compiled bitmap will almost
certainly crash the system.


Setup and Cleanup


Before getting to the heart of the engine, you need to attend to some
housekeeping details. The game engine requires some initialization routines to
set up the environment. Likewise, you'll need some routines to free memory
allocations and properly restore the environment.
When the game engine starts, it makes a call to StartUp(), which in turn calls
routines to set up the video mode, load bitmap sprites, install the new
interrupt handlers, and so on. All game variables and data structures,
including sprite lists, are initialized at this point. Upon a successful
return from StartUp(), the engine calls the event loop. When the current game
is over, ShutDown() is called to restore the environment before exiting.


The Event Loop


The core of nearly every game is an event loop--and this is where the game
spends the majority of its time. The event loop is, in essence, nothing more
than a while loop. This loop manages the various game tasks, in order, calling
helper routines as needed.
The helper routines handle tasks such as processing user input, drawing
sprites, and updating the screen. Many of these tasks require the application
of a helper routine to an entire list of items. An example of this is sprite
movement; each sprite chooses a course of action independently. In these
cases, you use the list-iteration routines, as in ForEachActor(MoveActor), to
apply a user function to each sprite.
The engine includes a function, EventLoop(), that is called after the start-up
routines and returns only when the game is over or the player hits Escape.
Figure 3 describes the event loop; Listing Two presents the complete source
code for it.
If you look closely at the outline in Figure 3, you may wonder why the aliens
are not drawn at the same time that they are moved. As you will see in a
moment, it is often convenient in games to subdivide tasks in this way.


Timer Functions


At this point, you have bitmap animation routines, sprite-list management, and
an event loop. What comes next? Well, you could place all of the action
routines inside the event loop and call it a day. The problem with this
approach is that the speed of the game animation is directly tied to hardware
speed. When run on a Pentium/133, it may well be that game play is
unacceptably fast relative to a 486/25.
Therefore, you must find some way to regulate the timing of the various game
activities. In the game engine, this is handled using the system timer. The
game engine manages the system timer in two ways. First, the 8254 timer IC is
reprogrammed to generate interrupts at a rate of 144 Hz. This allows for a
timer accuracy of about 7 milliseconds.
Second, the engine installs its own timer hardware-interrupt handler. This
interrupt handler manages several one-shot count-down timers. Each time the
interrupt handler receives control, it decrements every nonzero count-down
variable. Meanwhile, at each pass through the event loop, you check to see
if one of these counters has reached zero. If so, you call a helper routine
and reset the timer.
The game engine sets up unique timers for the game events: 
Missile movement.
Alien decision making.
Alien movement.
Alien animation.
Palette animation. 
Each of these events is scheduled to happen at a different rate. For example,
the actor-movement timer is set up to update the location of each actor about
every 13 ms, while the actor bitmap changes once every quarter second. By
subdividing the various event-loop tasks, you can fine-tune the rates of these
tasks by assigning them to different timers.
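The tick-and-check mechanism can be sketched as follows (hypothetical names; the engine's actual timer code is in TIMER.C):

```c
#include <assert.h>

/* Hypothetical sketch of the engine's one-shot count-down timers.
   The 8254 is reprogrammed by loading a divisor into the chip: the
   PIT's 1193182-Hz input clock divided by 144 Hz gives a divisor of
   about 8286, so TimerTick() would be called from the hardware
   interrupt 144 times a second (about every 7 ms). */
#define NUM_TIMERS 5

static volatile unsigned timers[NUM_TIMERS];

/* Called from the timer interrupt: decrement every running timer. */
static void TimerTick(void)
{
    int i;
    for (i = 0; i < NUM_TIMERS; i++)
        if (timers[i])
            timers[i]--;
}

/* Called from the event loop: if a timer has expired, re-arm it with
   its own period in ticks and tell the caller to perform the task. */
static int TimerExpired(int which, unsigned reloadTicks)
{
    if (timers[which] == 0) {
        timers[which] = reloadTicks;
        return 1;
    }
    return 0;
}
```

Giving each game event its own timer and reload value is what lets, say, actor movement run every couple of ticks while actor animation runs only four times a second.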


Palette Animation


One technique used frequently in games is palette animation. This is a really
clever way to produce simple animation. In designing the sprites for the game,
I reserved the last 16 colors for palette animation. During the game, the RGB
values of these colors are periodically cycled to all black and then back to
the original values. This is done by writing new values to the DAC palette
registers.
This is not the same thing as changing the individual pixel colors. You don't
need to calculate the memory addresses of the changing pixels (this alone is
reason enough to use palette animation). Moreover, by changing the RGB value
of a palette register, you instantly change the color of every pixel on the
screen drawn in that color.
The engine has a routine, AnimatePalette(), which makes calls to
AnimatePaletteUp() and AnimatePaletteDown() as required. These two routines
write directly to the VGA palette registers (rather than through the video
BIOS) for performance reasons. The routine AnimatePalette(), in turn, is
called from EventLoop(). Like many of the other game functions,
AnimatePalette() is controlled by a count-down timer.
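A sketch of one fade step (the helper names here are hypothetical; the engine's AnimatePaletteUp()/AnimatePaletteDown() write the results straight to the DAC registers):

```c
#include <assert.h>

/* VGA DAC components are 6-bit values (0..63); these sketches just
   step the reserved animation colors toward black and back. */
typedef struct { unsigned char r, g, b; } RGB;

/* One step toward black: decrement each nonzero component. */
static void FadeDown(RGB *pal, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (pal[i].r) pal[i].r--;
        if (pal[i].g) pal[i].g--;
        if (pal[i].b) pal[i].b--;
    }
}

/* One step back toward the original palette values. */
static void FadeUp(RGB *pal, const RGB *orig, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (pal[i].r < orig[i].r) pal[i].r++;
        if (pal[i].g < orig[i].g) pal[i].g++;
        if (pal[i].b < orig[i].b) pal[i].b++;
    }
}
```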


Enemy AI



To this point, the actors do not have very interesting behavior. In fact,
without additional code, they do not have any behavior at all. It would
certainly be nice to give the actors some actions that challenge the player
and make the game interesting.
The technique I used for giving the actors some savvy is a simple finite-state
machine. A finite-state machine is an abstraction or model of behavioral
patterns: the premise being that an actor's current behavior depends upon its
previous behavior and upon the status of one or more external variables.
To implement the finite-state machine in the game engine, each actor is given
a variable that holds its current state. You then create a state table that
lists the new states an actor can attain, given its current state and another
variable. The engine chooses the new state randomly from one of five possible
choices. At each state change, you have the option of calling some action
routines. This is handled by the call ForEachActor(UpdateActorState) inside
the event loop.
Example 4 shows the state table used by the engine. The rows of the table
represent the chances or probability levels, while the columns represent the
current state. As you can see, if an actor is currently in the shooting state,
it has no choice but to return to the hover state. However, while in the hover
state, the actor may continue to hover (60 percent chance), shoot a missile
(20 percent chance), or dive at the player (20 percent).
Using the finite-state machine, you can build up quite complex behavior
patterns. If necessary, you could expand the state table, defining new states
and/or adding additional probability levels. You could even develop more
complex behaviors by layering or nesting state tables.
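Putting the pieces together, a state transition amounts to one random table lookup. This sketch reuses the table from Example 4; the surrounding helper is hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* State names mirror Example 4. */
enum { HOVER_STATE, SHOOT_STATE, DIVE_STATE, LAST_STATE };
#define CHANCES 5

static const unsigned char StateTable[CHANCES][LAST_STATE] = {
    /* col = current state, row = chance */
    /*  HOVER        SHOOT        DIVE       */
    { HOVER_STATE, HOVER_STATE, DIVE_STATE },
    { HOVER_STATE, HOVER_STATE, DIVE_STATE },
    { HOVER_STATE, HOVER_STATE, DIVE_STATE },
    { SHOOT_STATE, HOVER_STATE, DIVE_STATE },
    { DIVE_STATE,  HOVER_STATE, SHOOT_STATE }
};

/* Pick the next state: a random row gives each outcome a probability
   in fifths (3/5 hover, 1/5 shoot, 1/5 dive from the hover column). */
static int NextState(int current)
{
    return StateTable[rand() % CHANCES][current];
}
```

Note that the SHOOT column is all HOVER_STATE, matching the rule that a shooting actor must return to hovering.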


Game Play


Playing a game is simple. You are the hero racing through space in your
interceptor. Your ship can move in any direction, with movement controlled by
the arrow keys. The alien attack is unpredictable and sometimes confusing as
they attempt to destroy you. You have three lives, but you may eject from your
spaceship at any time by pressing the Escape key.


Conclusion


There are certainly a number of ways that the game engine could be extended
and improved. For instance, the star background could be replaced with tiled
scenery. Also, the state table could be expanded, or you could add additional
state tables defining entirely new classes of actors.
The count-down timer settings currently used by the engine provide a
reasonable balance between action and playability, but feel free to change
these values as you experiment with your own ideas.
I encourage you to use the code presented here as a basis for building your
own games. If you do develop a game using some of the techniques presented in
this article, I would love to hear about it.
Figure 1: Screen from a typical game.
Figure 2: A Sprite and its mask.
Figure 3: The event loop.
1. React to player inputs--move player, fire missiles
2. Draw the background
3. Move any missiles
4. Move the aliens
5. Draw the aliens
6. Draw our player
7. Update the video display
8. Palette animation
9. Alien AI
10. Loop to 1 until game over
Table 1: Game-engine code modules.
Module Description
VGA.C Low-level VGA routines.
KEYBOARD.C Low-level keyboard routines.
PCX.C PCX file management.
DRAW.C Sprite animation.
PALANIM.C Palette animation.
DATA.C All global data.
INIT.C Program setup/shutdown.
VIDEO.C Low-level video routines.
SPRITE.C Sprite list management.
TIMER.C 8254 Timer IC routines.
COMPILE.C Bitmap sprite compiler.
MAIN.C Where it all begins, event loop.
Example 1: New keyboard interrupt handler.
unsigned char kbKeyboard[128];
unsigned char kbScanCode;
// This is the new Int 09h handler
void static _interrupt NewInt9(void)
{
 register unsigned char acode;
 // read key code from keyboard
 kbScanCode=inp(0x60);
 acode=inp(0x61);
 outp(0x61,(acode | 0x80)); // acknowledge key: toggle bit 7
 outp(0x61,acode);

 // send End-Of-Interrupt to 8259 PIC
 outp(0x20,0x20);
 // record a keypress
 kbKeyboard[kbScanCode & 127]=1;
 if(kbScanCode & 128)
 // clear a keypress
 kbKeyboard[kbScanCode & 127]=0;
}
Example 2: Sprite data structures.
typedef struct tagPlayer
{
 int x,y;
 unsigned Score,Level,Lives;
} Player;
typedef struct tagActor
{
 int type;
 int x,y;
 int dx,dy;
 int State,LastState,Momentum;
 unsigned MissileDelay;
 struct tagActor *next;
} Actor;
typedef struct tagMissile
{
 int type;
 int x,y;
 int dx,dy;
 struct tagMissile *next;
} Missile;
Example 3: (a) Portion of a compiled bitmap; (b) a more-efficient alternative.
(a)
mov byte ptr [es:di+10],0FFh
mov byte ptr [es:di+11],0F0h
mov byte ptr [es:di+12],0F0h
mov byte ptr [es:di+13],0FFh
retf
(b)
push ds
...
mov byte ptr [si+10],0FFh
mov byte ptr [si+11],0F0h
mov byte ptr [si+12],0F0h
mov byte ptr [si+13],0FFh
pop ds
retf
Example 4: Finite-state table.
 uchar StateTable[CHANCES][LAST_STATE]=
 {
 // col = current state, row = chance
 // HOVER SHOOT DIVE
 {HOVER_STATE, HOVER_STATE, DIVE_STATE},
 {HOVER_STATE, HOVER_STATE, DIVE_STATE},
 {HOVER_STATE, HOVER_STATE, DIVE_STATE},
 {SHOOT_STATE, HOVER_STATE, DIVE_STATE},
 {DIVE_STATE , HOVER_STATE, SHOOT_STATE}
 };

Listing One

// compiled.c - This module implements a bitmap compiler.
// Copyright (c) 1996 by Mark Seminatore, all rights reserved.
 #include <stdio.h>
 #include <stdlib.h>
 #include "compat32.h"
 #include "vga.h"
 #include "pcx.h"
 #include "sprite.h"
 #include "globals.h"
// #define UNIT_TEST
//
// []------------------------------------------------------------[]
// 
// 
// []------------------------------------------------------------[]
//
 CompiledSprite CompileBitmap(uchar *pBitmap, int width, int height)
 {
 uchar *pCompiled, *pTemp;
 unsigned column, row, TotalBytes=0;
 // calc bytes required
 pTemp=pBitmap;
 for(row=0; row < height; row++)
 for(column=0; column < width ; column++)
 if(*pTemp++)
 TotalBytes++;
 // far function-call overhead
 TotalBytes *= 5;
 TotalBytes += 25;
 pTemp = pCompiled = malloc(TotalBytes);
 if(!pTemp) return (CompiledSprite)pTemp;
 *pTemp++ = 0x55; // push bp
 *pTemp++ = 0x8b; // mov bp,
 *pTemp++ = 0xec; // ...sp
 *pTemp++ = 0x56; // push si
 *pTemp++ = 0x1e; // push ds
 *pTemp++ = 0xc5; // lds si,[bp+06] (pBitmap parm)
 *pTemp++ = 0x76;
 *pTemp++ = 0x06;
 *pTemp++ = 0x8b; // mov ax,[bp+12] (y parm)
 *pTemp++ = 0x46;
 *pTemp++ = 0x0c;
 *pTemp++ = 0xbb; // mov bx,320 (screen width)
 *pTemp++ = 0x40;
 *pTemp++ = 0x01;
 *pTemp++ = 0xf7; // mul bx (calc offset)
 *pTemp++ = 0xe3;
 *pTemp++ = 0x03; // add ax,[bp+10] (x parm)
 *pTemp++ = 0x46;
 *pTemp++ = 0x0a;
 *pTemp++ = 0x03; // add si,ax
 *pTemp++ = 0xf0;
 for(row=0; row < height; row++)
 for(column=0; column < width ; column++)
 {
 if(*pBitmap)
 {
 *pTemp++ = 0xc6; // mov [si+off16],xx
 *pTemp++ = 0x84;

 // 16-bit offset
 *(((unsigned*)pTemp)++) = row * 320 + column;
 // pixel data
 *pTemp++ = *pBitmap;
 }
 pBitmap++;
 }
 *pTemp++ = 0x1f; // pop ds
 *pTemp++ = 0x5e; // pop si
 *pTemp++ = 0x5d; // pop bp
 *pTemp = 0xcb; // retf
 return (CompiledSprite)pCompiled;
 }
#ifdef UNIT_TEST
 #include <conio.h>
 uchar *VideoMem;
 void main(void)
 {
 PcxImage pcx;
 CompiledSprite pfnSprite;
 int result,i;
 result=PcxLoadImage("alien1.pcx",&pcx);
 pfnSprite=(CompiledSprite)CompileBitmap(pcx.pBitmap, pcx.Width, pcx.Height);
 getch();
 VgaSetMode(0x13);
 for(i=0;i < 199-16; i+=16)
 pfnSprite(VideoMem,i,i);
 getch();
 VgaSetMode(3);
 }
#endif

Listing Two
// Main.c: This module contains the event loop code
// Copyright (c) 1996 by Mark Seminatore, all rights reserved.
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <dos.h>
 #include "compat32.h"
 #include "keys.h"
 #include "init.h"
 #include "vga.h"
 #include "pcx.h"
 #include "sprite.h"
 #include "draw.h"
 #include "palanim.h"
 #include "globals.h"
// []------------------------------------------------------------[]
// This is the main event loop 
// []------------------------------------------------------------[]
 void EventLoop(void)
 {
 // ESC quits the game
 while(!kbKeyboard[ESC_KEY])
 {
 // fire a player missile
 if(kbKeyboard[SPACE_BAR])
 {

 // check if re-loading
 if(!PlayerMissileDelay)
 {
 AddMissile(PLAYER_MISSILE, player.x, player.y-MISSILE_H, NONE, UP);
 // delay for missile re-load
 PlayerMissileDelay=HALF_SECOND;
 }
 }
 // move player up
 if(kbKeyboard[UP_ARROW])
 {
 player.y--;
 if(player.y < 0)
 player.y=0;
 }
 // move player to the left
 if(kbKeyboard[LEFT_ARROW])
 {
 player.x--;
 if(player.x < 0)
 player.x=0;
 }
 // move player to the right
 if(kbKeyboard[RIGHT_ARROW])
 {
 player.x++;
 if(player.x + PLAYER_W > (SCREEN_WIDTH-1))
 player.x=(SCREEN_WIDTH-1) - PLAYER_W;
 }
 // move player down
 if(kbKeyboard[DOWN_ARROW])
 {
 player.y++;
 if(player.y + PLAYER_H > VIEW_HEIGHT)
 player.y= VIEW_HEIGHT - PLAYER_H;
 }
 // draw the starfield
 DrawBackground();
 // allow missile movement
 ForEachMissile(MoveMissile);
 // time to move?
 if(!AlienMovementDelay)
 {
 // allow actor movement
 ForEachActor(MoveActor);
 AlienMovementDelay=TWO_TICKS;
 }
 // draw the aliens
 ForEachActor(DrawActor);
 // show our hero
 DrawPlayer();
 // update double-buffer
 UpdateScreen();
 // check if time to animate palette
 if(PaletteToggle)
 {
 AnimatePalette();
 PaletteToggle=0;
 }

 // aliens don't think too quickly!
 if(!AlienStateDelay)
 {
 ForEachActor(UpdateActorState);
 AlienStateDelay=TWO_SECONDS;
 }
 // animate aliens
 if(!AlienAnimationDelay)
 {
 AlienBitmap=(AlienBitmap == Alien1Bitmap) ? Alien2Bitmap : Alien1Bitmap;
#ifdef PUTBM
 AlienMask=(AlienMask == Alien1Mask) ? Alien2Mask : Alien1Mask;
#endif
 AlienAnimationDelay=QTR_SECOND;
 }
 // you won...this round
 if(nActors ==0)
 {
 player.Level++;
 UpdateScore();
 InitAliens();
 }
 } // end while()
 } // end EventLoop()
// []------------------------------------------------------------[]
// This is the main program entry-point 
// []------------------------------------------------------------[]
 #pragma argsused
 void main(int argc,char *argv[])
 {
 // Call the various initialization routines
 StartUp();
 // Display player info
 UpdateScore();
 // Jump into the main program loop
 EventLoop();
 // Call the various cleanup routines
 ShutDown();
 }
End Listings























A MIDI Class in C++


Using VisualAge C++ collection classes




George Wright


George has been a member of the information systems/decision sciences
department at Loyola College in Baltimore since 1987. He can be reached at
geo@loyola.edu.


The musical instrument digital interface (MIDI) specification spells out a
compact, digital representation of a piece of music. The MIDI file format was
established under the auspices of the International MIDI Association to
standardize the way keyboards and synthesizers send and receive musical data.
By the mid-1980s, PCs were being used to control synthesizers, and MIDI
programming began to evolve. Most of us who have programmed for MIDI have
built on the work of Jim Conger and Michael Czeiszperger, authors of numerous
books and articles on the subject. Their work was the inspiration for this
article.
In connection with one MIDI project, I needed to convert some old MIDI
routines in C to C++. Because the hierarchical structure of a MIDI file lends
itself to the object paradigm, implementing MIDI object classes seemed to be
the ideal approach. The abstraction of the object classes greatly simplifies
the programming (once the classes are written, of course).
Given MIDI classes with basic functionality, application programs for
MIDI-file manipulation are greatly simplified. Quick-and-dirty programs of
three or four lines can read in a MIDI file, display it as a human-readable
musical score, add a missing key signature, extract a particular track to
another MIDI file, or transpose the key of the file (with appropriate
key-signature changes for scoring programs). In this article, I'll present
MIDI classes and several example programs.


MIDI Files and Events


A MIDI file consists of a collection of one or more tracks. Format 0 MIDI
files have all events in one track; format 1 MIDI files (the most common) have
multiple tracks. The "conductor track" in a format 1 file contains all
necessary information on tempo, key signature, title, copyright notice, and
the like. The conductor track is usually track 1. Other tracks contain musical
information, usually one track per instrument represented in the piece of
music. Each instrumental or channel track consists of many events. Channel
events produce, change, or stop a musical note. Such events include a note-on
event, a note-off event, an event signaling action of a pedal or lever, an
event modifying the sustain or volume of a note, or a pitch-changing event.
Along with the usual channel events, there are other events called
"metaevents," which include the information events in the conductor's track,
instrument names, lyrics, and cue points. System-exclusive events signal the
beginning or continuation of a series of arbitrary bytes, of meaning only to
the particular device receiving them. No matter what the type, each MIDI event
consists of a time, type signature, and series of data bytes. 
Listing One presents the event header file, EVENT.H. Note that the MIDI event
class has three private data members. The m_DeltaTime member is the elapsed
time since the last event in the same track. The time is measured in arbitrary
ticks. Actual elapsed time depends on the tempo established in the conductor's
track. Member m_EventType is one of an enumeration of types: 
Channel is an event on a certain channel, such as a note. 
Meta is a metaevent, such as text.
SysEx is an event exclusive to a certain type of MIDI equipment.
SysExCont is a continued system exclusive.
Undefined is an unknown type of event.
Error indicates an error condition. 
Member m_Data is an array of bytes of type CByteArray. The class CByteArray
takes advantage of the collection classes that are available with IBM's
VisualAge C++ for OS/2. The statement typedef ISequence <char> CByteArray
lets you take advantage of the ISequence template. This gives
you access to an efficient implementation of an abstract class with a
complete, systematic combination of basic properties; see Figure 1. Using the
ISequence template gives you a byte array with a full set of methods for
adding, locating, testing, and removing the data bytes of a MIDI event.
Listing One shows the usual complement of methods for an abstract class:
constructors, destructors, operations, and member get and set functions. The
workhorse functions--WriteData() and ReadData()--handle MIDI event I/O to and
from a stream. There's also a Printf() function that outputs a formatted,
readable MIDI event dump to cout. The implementation of the event functions
appears in Listing Two. ReadData() is the most elaborate, because it has to
cope with the wide variety of MIDI events, plus some other wrinkles. For one
thing, different channel events have different numbers of data bytes involved.
This is easily handled (after Czeiszperger) with a static table of additional
bytes needed.
Another wrinkle with reading (and writing) MIDI events is the concept of
running status. This concept is used to cut down on the number of bytes that
must be transmitted in the MIDI stream. If running status is set, the byte
indicating the type of event is skipped. It's assumed that the current event
type is the same as the last. Since most of the events are four bytes long,
this usually yields a 25 percent savings. Status bytes always have their high
bit set; when the first byte of an event has the high bit clear, it is a data
byte, signaling that running status is in effect.
A third wrinkle is apparent in the private methods of the event class. The
GetVar, PutVar, ReadVar, and WriteVar functions are necessary to deal with
variable-length numbers. In the interest of efficiency, the MIDI specification
requires that the number of bytes used to represent a number be minimal. Only
the number of bytes needed to represent a number is used. In a variable-length
number, the last byte has the most-significant bit (MSB) cleared; all other
bytes have it set. As a variable-length number is read, bytes are read until
one with the MSB cleared is found. The 7-bit groups are then reassembled into
the complete value.
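The decoding loop is short. This is a C sketch of the algorithm just described, not the class's actual GetVar/ReadVar code:

```c
#include <assert.h>

/* Decode a MIDI variable-length number: each byte contributes its low
   7 bits, high-order groups first; the last byte has its MSB clear. */
static unsigned long ReadVarLen(const unsigned char *buf, int *bytesUsed)
{
    unsigned long value = 0;
    int n = 0;
    unsigned char c;
    do {
        c = buf[n++];
        value = (value << 7) | (c & 0x7F); /* append the next 7 bits */
    } while (c & 0x80);                    /* MSB set: more bytes follow */
    *bytesUsed = n;
    return value;
}
```

The value 200, for instance, needs two bytes (0x81 0x48), while anything up to 127 fits in one.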
Also apparent in Listing Two is the heavy use of the ISequence collection
class methods as member functions of the m_Data byte array. Functions such as
numberOfElements(), elementAtPosition(), and addAsLast() all come from the
ISequence template in Figure 1. 
The last line of code in Listing One, typedef ISequence <CEvent> CEventArray;
again makes use of the ISequence template to generate a new class,
CEventArray. The CEventArray class, to be used as a member of each MIDI track,
has all the functionality declared in Figure 1.


MIDI Tracks


A MIDI track consists of a header, track length, and number of events. The
track class defined in track.h (available electronically; see "Availability,"
page 3) represents a MIDI track. Every MIDI track has a track signature, MTrk.
Since all tracks have the same signature, you can represent the header as a
static member. I make use of the String class, also included with VisualAge
C++, to store the track header. The String class is similar to string-handling
routines commonly found in C++ textbooks. Track length is stored as an
unsigned 32-bit integer. The declaration uses UINT32, typedefed as an unsigned
long. The events that make up the rest of the track are handled compactly by
the CEventArray type, typedefed to an ISequence of CEvents.
track.h also shows the usual constructors, destructors, operators, get and set
functions, and utility functions. These are implemented in the MIDI track
class implementation; see track.cpp (available electronically). Implementation
of the track class is much shorter than the implementation of the event class,
because you're able to build on the event code as well as on the ISequence
code.
ReadTrack, the track class member function for reading a MIDI track, simply
clears an event buffer, calls ReadData for the event buffer, adds the event to
the CEventArray member, and loops until no more events remain. The WriteTrack
function steps through the CEventArray, calling the event WriteData function
for each. Both the ReadTrack and WriteTrack functions make use of utility
functions for reading and writing 32-bit unsigned integers. These functions,
adapted from David Charlap (see "The BMP File Format, Part I," Dr. Dobb's
Journal, March 1995), are available electronically. They let you handle 16-
and 32-bit quantities without worrying about the underlying byte-ordering of
the host platform. The track implementation closes with another invocation of
the ISequence template. This gives all the functionality of Figure 1 for a
collection of MIDI tracks.
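The byte-ordering technique can be sketched as follows. ReadU32 and WriteU32 are hypothetical names, but the idea matches the Charlap approach: compose the bytes explicitly in big-endian order (as the MIDI file format requires) rather than casting through host integers.

```cpp
#include <cassert>

// Read a big-endian 32-bit value from a 4-byte buffer, independent of
// the host platform's byte order. Illustrative helper.
unsigned long ReadU32(const unsigned char* p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

// Write a 32-bit value into a 4-byte buffer in big-endian order.
void WriteU32(unsigned char* p, unsigned long v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)(v);
}
```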


MIDI Files


The file midi.cpp (available electronically) shows the definition of the MIDI
file object. The file midi.h (also available electronically) contains the
private members of the object. Each MIDI file begins with the header MThd. The
m_MThd member is stored as a static String object, since you only need one
copy no matter how many MIDI files you have. The header length m_HeaderLength
can also be stored as static, because it never varies from file to file.
The MIDI file specification defines three formats, called simply "0," "1," and
"2." A piece of music can be stored in a MIDI file of any of the three formats
and still sound the same when the file is played. It's just a matter of how
the data is stored internally. In type 0 format, all information is stored in
a single, multichannel track. Type 1 format, the most common, stores several
simultaneous tracks with the same tempo and time signature. Type 2 allows for
multiple tracks, each with its own tempo and time signature. Only types 0 and
1 are supported by this code.
The number of tracks in the file is stored in m_TrackCount. The member
m_Division contains information about timing. If the most significant of the
16 bits is cleared, the remaining 15 bits give the resolution in ticks per
quarter note. If the MSB is set, the rest of the high byte
is the Society of Motion Picture and Television Engineers (SMPTE) frame rate,
and the low byte is the resolution of a frame in ticks per frame. SMPTE timing
is a standardized time code used in the analog world of tape and video
recording. In such applications, SMPTE timing can synchronize MIDI events to
external events, such as actions on an accompanying video tape track. SMPTE
timing is not supported in this code. The last member of the MIDI object is
m_TrackArray, the collection of one or more tracks. The data type is the
CTrackArray class, generated from the ISequence template in track.cpp, the
track implementation.
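Decoding of the division word can be sketched like this. The helper names are illustrative, and the SMPTE branch is shown only for explanation, since SMPTE timing is unsupported by the article's code:

```cpp
#include <cassert>

// MSB set means SMPTE timing; MSB clear means metrical timing.
bool IsSmpteTiming(unsigned short division) { return (division & 0x8000) != 0; }

// Metrical timing: the low 15 bits are ticks per quarter note.
// Valid only when the MSB is clear.
int TicksPerQuarterNote(unsigned short division)
{
    return division & 0x7FFF;
}

// SMPTE timing: the high byte holds the frame rate as a negative
// two's-complement value (-24, -25, -29, or -30); negate to recover it.
int SmpteFramesPerSecond(unsigned short division)
{
    return -(signed char)((division >> 8) & 0xFF);
}
```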
The MIDI object implementation (midi.cpp) mimics the track implementation in
that most operations call corresponding methods for constituent tracks. The
MIDI object is the only one that has methods to read from and write to disk.
Export simply writes the header items and then steps through the CTrackArray
in m_TrackArray, calling the WriteTrack function for each track. The Import
function uses m_TrackCount to loop through clearing each track buffer, calling
ReadTrack, and calling ISequence::addAsLast to install the track into
m_TrackArray.


Using the Classes



Once all the up-front work of class definition is done, the MIDI file becomes
an abstract data type (ADT). This leads to very short application programs.
The musical fragment in Figure 2 shows the power of the MIDI ADT. This is the
file Czeiszperger treats in detail (see "References") as an example. (For the
sake of convenience, I've changed Czeiszperger's type 0 file to a type 1 MIDI
file.) Figure 2 shows no key signature, suggesting that the fragment is in the
key of C major or the key of A minor. A closer look at the second measure
shows a D-minor chord. It's probably either a fragment of a piece in A minor
with an unresolved subdominant D-minor chord or a piece in D minor with a
missing key signature.
Might there be any text metaevents in the MIDI file that would provide you
with a clue as to the proper key signature? Example 1 will print the contents
of the MIDI file named "cz.mid," the file that appears in score in Figure 2.
Figure 3 is a partial listing of the result, showing only track 1, where you
would expect to find a key signature. The track contains a time signature, a
tempo, and a title, but there is no key signature. This isn't a surprise. Many
MIDI files, never intended for scoring as in Figure 2, lack the proper key
signature. This isn't a problem during playback, of course, since each note
has its proper pitch value. It is a problem, however, when the file is scored.
In the absence of the right key signature, the scoring program simply places
an accidental for every sharp and flat note.
Assume the Czeiszperger piece is in D minor with a missing key signature, to
wit, one flat. We'd like to add the key signature to the MIDI file so that,
when scored, it would follow common notational conventions. Example 2 will
read a file named "cz.mid," add the one-flat key signature, set the minor
flag, and write the result to a file named "czkey.mid." The key signature
addition is accomplished by the call to CMidi::AddKeySig() with the
number of sharps (a positive integer) or flats (a negative integer) and the
minor flag (0 for major key, 1 for minor key) as arguments. The file midi.cpp
shows that the MIDI-level function for adding a key signature always adds the
key signature to track 1. The MIDI-level function then passes the argument to
the track-level function.
The add is done in CTrack::AddKeySig(). As can be seen in track.cpp
(available electronically), the track-level key signature function builds a
4-byte array consisting of the proper event type, proper data length, number
of sharps or flats, and minor flag. The byte array is placed into an event
constructed to add the key signature at 0 delta time. The new event is then
placed at the beginning of the track, and the track length is adjusted
accordingly. Figure 4 shows the result. Any decent scoring package will pick
up the one-flat key signature. It should also add the cancel sign for the
treble-clef B natural accidental in the first measure. (I used Voyetra's
Orchestrator Plus to prepare the figures, and it did both.)
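The four-byte payload can be sketched as a standalone helper. The function name is hypothetical; the article builds the same bytes inside the track-level function described above:

```cpp
#include <cassert>
#include <vector>

// Build the data bytes of a key-signature meta event: meta type 0x59,
// data length 2, then the sharps/flats count (+n sharps, -n flats) and
// the major/minor flag (0 = major, 1 = minor). Illustrative helper.
std::vector<unsigned char> BuildKeySignature(int sharpsOrFlats, int minor)
{
    std::vector<unsigned char> data;
    data.push_back(0x59);                           // key signature meta type
    data.push_back(0x02);                           // two data bytes follow
    data.push_back((unsigned char)sharpsOrFlats);   // signed count of sharps/flats
    data.push_back((unsigned char)(minor ? 1 : 0)); // mode flag
    return data;
}
```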
A more common and more difficult requirement is transposition. This problem
frequently arises when orchestrating a MIDI piece for play by live performers.
Perhaps the key is wrong for a singer's vocal range. Perhaps no bassoon is
available, so the bassoon part must be transposed for baritone sax.
Transpositions, along with appropriate key signature changes, can be
accomplished easily with the functionality of the MIDI ADT.
To continue with our example, the code of Example 3 transposes the MIDI file
czkey.mid down one half step. Once again, follow the strategy of calling a
function only at the MIDI file level. In midi.cpp, CMidi::Transpose() defines
a cursor (part of the template code of Figure 1) to iterate over each track in
the MIDI file. For each track, the track level transpose function is called.
In track.cpp, CTrack::Transpose() behaves similarly. It defines a cursor to
iterate over each event, calling the event-level function.
The work of transposition is accomplished in Listing Two's
CEvent::Transpose(). This function begins with key signature data in a static
array. Standard key signatures are used. (Some composers write in nonstandard
keys--such as Db minor instead of C# minor--which require more accidentals
than the standard keys. This code doesn't support nonstandard keys.)
Only two types of events are affected by transposition, the key signature
metaevent and the note on/off channel events. Each event is checked to see if
it is one of these two types. If the event is a key signature, the current
signature is found in the table, and then the new signature is found at the
requested transposition offset. The key signature doesn't change, of course,
if the transposition is one or more octaves, a multiple of 12 half steps. If
the event is a note-on or a note-off, the transposition offset is applied to
the note value. Since note values can't be lower than 0 (C five octaves below
middle C) or higher than 127 (G six octaves above middle C), transpositions
beyond these ranges are set to the highest or lowest possible octave. The
transposition of Figure 4 by the code of Example 3 appears in Figure 5. The
four sharps of C# minor appear, and the appropriate accidental has been added
in the treble clef, first measure.
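The octave-clamping rule for notes can be sketched in isolation (TransposeNote is an illustrative helper, not part of the article's classes):

```cpp
#include <cassert>

// Apply a transposition offset to a MIDI note number (0-127). Notes
// pushed out of range are pulled back by whole octaves, preserving the
// pitch class, as described in the text. Illustrative helper.
int TransposeNote(int note, int steps)
{
    note += steps;
    while (note < 0)   note += 12;   // raise into range by octaves
    while (note > 127) note -= 12;   // lower into range by octaves
    return note;
}
```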


Conclusion


In developing the event, track, and MIDI file classes, collection class
templates are an enormous help. The collection classes supplied with IBM's
VisualAge C++ provided efficient, function-rich code for handling the
file/track/event hierarchies in MIDI files. The code I present here is
sufficiently complete to allow you to expand the functionality to other MIDI
file processing, such as normalization of all note velocities (volumes).
The catch is efficiency. While the template code is sparse and easy to code at
high levels of abstraction, the overhead of the underlying code is concealed.
Large MIDI files bring correspondingly long run times. VisualAge C++'s trace
facility showed that a lot of time was spent allocating memory for event
arrays and deallocating it when destructors were called. This may be partially
because I used VisualAge's Equality Sequence template, which has the richest
set of functionality available. Now that I have a better idea what functions I
need for the MIDI ADT, it may be possible to choose a simpler and more
efficient template.
The next step is to try the MIDI ADTs on other platforms. A prototype port to
Visual C++ 1.5 and Microsoft Foundation Classes (MFC) under Windows was
successful in principle, but memory was exhausted by MIDI files of more than
2000 bytes. Another drawback was that VC++ 1.5 does not support templates.
However, my next project is to try Visual C++ 4.0 under Windows NT. The MFC
distributed with Version 4.0 includes template support and isn't limited to
the medium memory model. 


References


Charlap, D. "The BMP File Format, Part 1." Dr. Dobb's Journal, March 1995.
Conger, J. C Programming for MIDI. New York, NY: M&T Books, 1989.
------. "MIDI Programming in C, Part One: MIDI Input and Output." Electronic
Musician, September 1989.
------. "MIDI Programming in C, Part Three: Patch Librarian Basics."
Electronic Musician, November 1989.
------. "MIDI Programming in C, Part Two: MIDI Data Debugger." Electronic
Musician, October 1989.
------. MIDI Sequencing in C. New York, NY: M&T Books, 1989.
Czeiszperger, M.S. "Introducing Standard MIDI Files," Electronic Musician,
April 1989.
Murray, R.B. C++ Strategy and Tactics. Reading, MA: Addison-Wesley, 1993.
Pohl, I. C++ for C Programmers, Second Edition. Redwood City, CA:
Benjamin/Cummings, 1994.
Wyatt, D. Standard MIDI Files 0.06. Available at
http://www.id.ethz.ch:80/~parish/midi/midi_file_format.txt, March 1988.
Figure 1: IBM's Sequence declarations.
template < class Element >
class ISequence {
public:
 class Cursor : ICursor {
 Element& element ();
 void setToLast ();
 void setToPrevious ();
 Boolean operator== (Cursor const& cursor);
 Boolean operator!= (Cursor const& cursor);
 };
 ISequence (INumber numberOfElements = 100);
 ISequence (ISequence < Element > const&);
ISequence < Element >&
 operator = (ISequence < Element > const&);
 ~ISequence ();
Boolean add (Element const&);
Boolean add (Element const&, ICursor&);
void addAllFrom (ISequence < Element > const&);
Element const& elementAt (ICursor const&) const;
Element& elementAt (ICursor const&);
Element const& anyElement () const;
void removeAt (ICursor const&);
INumber removeAll (Boolean (*property)
 (Element const&, void*),
 void* additionalArgument = 0);

void replaceAt (ICursor const&, Element const&);
void removeAll ();
Boolean isBounded () const;
INumber maxNumberOfElements () const;
INumber numberOfElements () const;
Boolean isEmpty () const;
Boolean isFull () const;
ICursor* newCursor () const;
Boolean setToFirst (ICursor&) const;
Boolean setToNext (ICursor&) const;
Boolean allElementsDo (Boolean (*function) (Element&, void*),
 void* additionalArgument = 0);
Boolean allElementsDo (IIterator <Element>&);
Boolean allElementsDo (Boolean (*function)
 (Element const&, void*),
 void* additionalArgument = 0) const;
Boolean allElementsDo (IConstantIterator <Element>&) const;
Boolean isConsistent () const;
void removeFirst ();
void removeLast ();
void removeAtPosition (IPosition);
Element const& firstElement () const;
Element const& lastElement () const;
Element const& elementAtPosition (IPosition) const;
Boolean setToLast (ICursor&) const;
Boolean setToPrevious (ICursor&) const;
void setToPosition (IPosition, ICursor&) const;
Boolean isFirst (ICursor const&) const;
Boolean isLast (ICursor const&) const;
long compare (ISequence < Element > const&,
 long (*comparisonFunction)
 (Element const&,
 Element const&)) const;
void addAsFirst (Element const&);
void addAsFirst (Element const&, ICursor&);
void addAsLast (Element const&);
void addAsLast (Element const&, ICursor&);
void addAsNext (Element const&, ICursor&);
void addAsPrevious (Element const&, ICursor&);
void addAtPosition (IPosition, Element const&);
void addAtPosition (IPosition, Element const&, ICursor&);
void sort (long (*comparisonFunction)
 (Element const&,
 Element const&));
};
Figure 2: Czeiszperger's MIDI example.
Figure 3: Listing of Track 1 in readable form.
--------------- Track 1 --------------
Header Signature: MTrk
 Track Length: 53
Delta Time: 0
 Data Size: 6
 Type: time signature, (type 58).
 Data: 58 4 3 2 18 8
Delta Time: 0
 Data Size: 5
 Type: set tempo, (type 51).
 Data: 51 3 7 a1 20
Delta Time: 0

 Data Size: 32
 Type: text, (type 1).
 Data: TYPE 1 MIDI FILE
Delta Time: 0
 Data Size: 2
 Type: end of track, (type 2f).
 Data: 2f 0
Figure 4: MIDI example with D minor key signature added.
Figure 5: MIDI example transposed to C# minor.
Example 1: Code to print a MIDI file in readable form.
#include "midi.h"
void main(void)
{
 CMidi tempmidi;
 tempmidi.Import("cz.mid");
 tempmidi.Printf(); // method to print in readable form
}
Example 2: Code to add a D minor key signature.
#include "midi.h"
void main(void)
{
 CMidi tempmidi;
 tempmidi.Import("cz.mid");
 tempmidi.AddKeySig(-1, 1); // 1 flat, minor = 1 => d minor
 tempmidi.Export("czkey.mid");
}
Example 3: Code to transpose down one-half step.
#include "midi.h"
void main(void)
{
 CMidi tempmidi;
 tempmidi.Import("czkey.mid");
 tempmidi.Transpose(-1);
 tempmidi.Export("czcsmin.mid");
}

Listing One
// event.h---declares event class
#ifndef __C_EVENT_H__
#define __C_EVENT_H__
#include <iglobals.h>
#include <iseq.h>
#include <fstream.h>
const int TRUE = 1;
typedef ISequence <char> CByteArray;
enum EventType {Channel, Meta, SysEx, SysExCont, Undefined, Error};
class CEvent
{ 
public:
 CEvent();
 CEvent(const unsigned long& time, const EventType& TypeVal,
 const CByteArray& Data);
 CEvent(const CEvent& Event);
 ~CEvent();
 CEvent& operator = (const CEvent& Event);
 int operator == (const CEvent& Event) const;
 int operator != (const CEvent& Event) const;
 void Clear(void);
 int GetDataLength(void) const; 

 const unsigned long& GetDeltaTime(void) const;
 const EventType GetEventType(void) const;
 void Printf(void) const;
 int ReadData(ifstream&);
 CEvent& SetDeltaTime(const unsigned long& time);
 CEvent& SetEventType(const EventType& TypeVal);
 void Transpose(const int);
 void WriteData(ofstream&) const;
 
private:
 unsigned long m_DeltaTime;
 EventType m_EventType;
 CByteArray m_Data;
 
 int GetVar(const CByteArray&, unsigned long *) const;
 int PutVar(CByteArray&, const int,
 const unsigned long);
 int ReadVar(ifstream& ins, unsigned long *);
 void WriteVar(ofstream&, unsigned long) const;
};
typedef ISequence <CEvent> CEventArray;
#endif

Listing Two
// event.cpp---implements event class. Based on code from Michael Czeiszperger.
#include "event.h"
#include <iomanip.h>
CEvent::CEvent() : m_DeltaTime(0l), m_EventType(Undefined), m_Data() {}
CEvent::CEvent(const unsigned long& DeltaTime,
 const EventType& TypeVal, const CByteArray& Data) :
 m_DeltaTime(DeltaTime), m_EventType(TypeVal),
 m_Data(Data) {}
CEvent::CEvent(const CEvent& Event)
{ 
 if (this != &Event)
 { 
 int ElementCount = Event.m_Data.numberOfElements();
 m_DeltaTime = Event.m_DeltaTime;
 m_EventType = Event.m_EventType;
 for (int i = 1; i <= ElementCount; i++)
 m_Data.addAsLast(Event.m_Data.elementAtPosition(i));
 }
}
CEvent::~CEvent()
{ } // m_Data's destructor runs automatically
 
CEvent& CEvent::operator=(const CEvent& Event)
{ 
 if(this != &Event)
 { 
 int ElementCount = Event.m_Data.numberOfElements();
 if (m_Data.numberOfElements() > 0) // Old data?
 m_Data.removeAll();
 m_DeltaTime = Event.m_DeltaTime;
 m_EventType = Event.m_EventType;
 for (int i = 1; i <= ElementCount; i++)
 m_Data.addAsLast(Event.m_Data.elementAtPosition(i));
 }
 return *this;

}
int CEvent::operator==(const CEvent& Event) const
{
 int ByteCount;
 if (this == &Event)
 return 1;
 if (m_DeltaTime != Event.m_DeltaTime ||
 m_EventType != Event.m_EventType)
 return 0;
 ByteCount = m_Data.numberOfElements();
 if (ByteCount != Event.m_Data.numberOfElements())
 return 0;
 for (int i = 1; i <= ByteCount; i++)
 {
 if (m_Data.elementAtPosition(i) != 
 Event.m_Data.elementAtPosition(i))
 return 0;
 }
 return 1;
}
int CEvent::operator!=(const CEvent& Event) const
{ return !(*this == Event); }
void CEvent::Clear(void)
{ m_DeltaTime = 0l; m_EventType = Undefined; m_Data.removeAll(); }
const unsigned long& CEvent::GetDeltaTime() const
{ return m_DeltaTime; }
const EventType CEvent::GetEventType() const
{ return m_EventType; }
void CEvent::Printf() const 
{ 
 int byte, chan, i, size;
 const int LINELENGTH = 60;
 unsigned long DataLength;
 
 cout << endl
 << "Delta Time: " << dec << m_DeltaTime;
 size = m_Data.numberOfElements();
 cout << endl
 << " Data Size: " << dec << size << endl;
 if (!size)
 return;
 cout << " Type: ";
 byte = m_Data.elementAtPosition(1);
 switch ((int)m_EventType)
 {
 case (int)Channel: //--- channel event ----------------
 chan = (byte & 0xf) + 1;
 switch (byte & 0xf0)
 {
 case 0x80: cout << "Note off, channel " << chan << ", ";
 break;
 case 0x90: cout << "Note on, channel " << chan << ", ";
 break;
 case 0xa0: cout << "Pressure, channel " << chan << ", ";
 break;
 case 0xb0: cout << "Parameter, channel " << chan << ", ";
 break;
 case 0xe0: cout << "Pitchbend, channel " << chan << ", ";
 break;

 case 0xc0: cout << "Program, channel " << chan << ", ";
 break;
 case 0xd0: cout << "Channel pressure, channel " << chan <<
 ", "; break;
 default: cout << "Unknown event, "; break;
 }
 cout << "(type " << hex << byte << ")." << endl;
 if (size > 1)
 { 
 cout << " Data: "; 
 CByteArray::Cursor cursor(m_Data);
 forCursor(cursor)
 cout << hex << (int)m_Data.elementAt(cursor) << " ";
 cout << endl;
 }
 break;
 case (int)Meta: //--- meta event -------------------
 switch (byte)
 { 
 case 0x0: cout << "sequence number, "; break;
 case 0x1: cout << "text, "; break;
 case 0x2: cout << "copyright, "; break;
 case 0x3: cout << "sequence/track name, "; break;
 case 0x4: cout << "instrument name, "; break;
 case 0x5: cout << "lyric, "; break;
 case 0x6: cout << "marker, "; break;
 case 0x7: cout << "cue point, "; break;
 case 0x20: cout << "MIDI channel prefix, "; break;
 case 0x2f: cout << "end of track, "; break;
 case 0x51: cout << "set tempo, "; break; 
 case 0x54: cout << "SMPTE offset, "; break;
 case 0x58: cout << "time signature, "; break;
 case 0x59: cout << "key signature, "; break;
 case 0x7f: cout << "sequencer-specific, "; break;
 default: cout << "Unknown meta event, "; break;
 }
 cout << "(type " << hex << byte << ")." << endl;
 if (size > 1)
 { 
 cout << " Data: "; 
 if (0x0 <= byte && byte <= 0x5) // if text
 for (i = 3; i <= size; i++) // print char bytes
 cout << (char)m_Data.elementAtPosition(i);
 else for (i = 1; i <= size; i++) // dump hex bytes
 cout << hex << (int)m_Data.elementAtPosition(i) << " ";
 cout << endl;
 }
 break;
 case (int)SysEx: //--- sysex event ------------------
 case (int)SysExCont:
 if (size > 1)
 {
 i = GetVar(m_Data, &DataLength);
 cout << " Length: " << DataLength << endl;
 cout << "SysEx Data: "; 
 for (i = 1; i <= size; i++)
 {
 cout << hex << m_Data.elementAtPosition(i) << " ";
 if (((i * 2) % LINELENGTH) == 0)

 cout << "\n ";
 }
 }
 break;
 default:
 cerr << "Unexpected event type: " << (int)m_EventType
 << ". Aborting." << endl;
 exit (1);
 break;
 }
} 
int CEvent::ReadData(ifstream& ins)
{ 
 unsigned char c, c1;
 static int ChanType[] = {0,0,0,0,0,0,0,0,2,2,2,2,1,1,2,0};
 static int EventLength = 0;
 int i;
 unsigned long Length;
 int LengthLength = 0;
 int Needed;
 static int NoMerge = 0; 
 static int Running = 0;
 static int Status = 0;
 static int SysExContinue = 0; 
 EventLength = ReadVar(ins, &m_DeltaTime); 
 c = (char)ins.get(); 
 EventLength++;
 if (SysExContinue && c != 0xf7)
 {
 cerr << "Didn't find expected continuation of sysex. Aborting." 
 << endl;
 exit (1);
 }
 if ((c & 0x80) == 0)
 {
 if (Status == 0) 
 {
 cerr << "Unexpected running status---" << endl;
 cerr << "Status = " << hex << Status << ", Running = " << dec
 << Running << ". Aborting." << endl;
 exit(1);
 }
 Running = 1;
 } 
 else 
 {
 Running = 0;
 Status = c;
 }
 Needed = ChanType[(Status >> 4) & 0xf]; 
 if ( Needed ) // channel event?
 {
 m_EventType = Channel;
 m_Data.addAsLast(Status);
 if ( Running )
 c1 = c;
 else
 {
 c1 = ins.get(); 

 EventLength++;
 }
 m_Data.addAsLast(c1);
 if (Needed > 1)
 {
 m_Data.addAsLast(ins.get()); 
 EventLength++;
 }
 }
 else
 {
 switch (c)
 { 
 case 0xff: // meta event
 m_EventType = Meta;
 m_Data.addAsLast(ins.get()); 
 EventLength++;
 LengthLength += (int)ReadVar(ins, &Length);
 PutVar(m_Data, EventLength+1, Length);
 EventLength += LengthLength;
 for (i = 1; i <= (int)Length; i++)
 {
 m_Data.addAsLast(ins.get());
 EventLength++;
 }
 break;
 case 0xf0: // start of system exclusive 
 case 0xf7: // system exclusive continuation
 if (c == 0xf0)
 {
 m_EventType = SysEx;
 SysExContinue = (NoMerge == 0) ? 0 : 1;
 }
 else
 m_EventType = SysExCont;
 m_Data.addAsLast(Status);
 EventLength += ReadVar(ins, &Length);
 for (i = 1; i <= (int)Length; i++) 
 m_Data.addAsLast(ins.get());
 break;
 default:
 cerr << "Unexpected event type " << hex << (int)c 
 << ". Aborting." << endl;
 exit (1);
 break;
 }
 }
 return EventLength;
}
 
CEvent& CEvent::SetDeltaTime(const unsigned long& Time)
{ m_DeltaTime = Time; return *this; }
CEvent& CEvent::SetEventType(const EventType& TypeVal)
{ m_EventType = TypeVal; return *this; }
void CEvent::Transpose(const int steps)
{
 // major: C Db D Eb E F F# G Ab A Bb B
 // minor: A Bb B C C# D Eb E F F# G G#
 // sharps: 2 4 6 1 3 5

 // flats: 5 3 1 4 2
 static int keysig[] = {0, -5, 2, -3, 4, -1, 6, 1, -4, 3, -2, 5};
 // key signature == ff 59 02 sf mi: +sf == count sharps,
 // -sf == count flats, mi == 0 => major, mi == 1 => minor
 if (m_EventType == Meta)
 {
 if (m_Data.elementAtPosition(1) == 0x59)
 {
 int i, keyshift, sharpflat;
 CByteArray::Cursor cursor(m_Data);
 keyshift = ((steps % 12) + 12) % 12; // octaves leave key unchanged
 m_Data.setToPosition(3, cursor); // position at sf
 sharpflat = (signed char)m_Data.elementAtPosition(3); // sf is signed
 if (sharpflat == -6) // 6 flats (Gb) == 6 sharps (F#)
 sharpflat = 6;
 for (i = 0; i < 12; i++)
 if (keysig[i] == sharpflat)
 break;
 if (i == 12)
 {
 cerr << "Unexpected key signature " << sharpflat
 << ". Aborting." << endl;
 exit (1);
 }
 m_Data.elementAt(cursor) = keysig[(i+keyshift)%12];
 }
 }
 else if (m_EventType == Channel)
 {
 int byte = m_Data.elementAtPosition(1) & 0xf0;
 if (byte == 0x80 || byte == 0x90) // note off or on
 {
 CByteArray::Cursor cursor(m_Data);
 m_Data.setToPosition(2, cursor); // position at note
 m_Data.elementAt(cursor) += steps;
 if (m_Data.elementAt(cursor) < 0) // lower than C-5?
 m_Data.elementAt(cursor) += 12; // then raise an octave
 else if (m_Data.elementAt(cursor) > 0x7f) // higher than G6?
 m_Data.elementAt(cursor) -= 12; // then lower an octave
 }
 }
}
void CEvent::WriteData(ofstream& outs) const
{ 
 static int CurrentChanType = 0; 
 static int Running = 0;
 
 WriteVar(outs, m_DeltaTime);
 if (m_EventType != Channel)
 {
 Running = CurrentChanType = 0;
 if (m_EventType == Meta)
 outs.put((char)0xff);
 }
 else
 {
 if (CurrentChanType == m_Data.elementAtPosition(1))
 Running = 1;
 else

 {
 Running = 0;
 CurrentChanType = m_Data.elementAtPosition(1);
 }
 } 
 for (int i = Running+1; i <= m_Data.numberOfElements(); i++)
 outs.put(m_Data.elementAtPosition(i));
}
int CEvent::GetDataLength(void) const
{ return m_Data.numberOfElements(); }
int CEvent::GetVar(const CByteArray& Data, unsigned long *Value) const
{
 int c;
 int count = 0;
 
 c = (char)Data.elementAtPosition(++count); // positions are 1-based
 *Value = (long)c;
 if (c & 0x80)
 {
 *Value &= 0x7f;
 do
 {
 c = Data.elementAtPosition(++count);
 *Value = (*Value << 7) + (c & 0x7f);
 } while (c & 0x80);
 } 
 return count;
}
int CEvent::PutVar(CByteArray& Data, const int Position,
 const unsigned long Value)
{
 unsigned long Buffer, TempVal;
 int i = 0;
 
 Buffer = Value & 0x7f;
 TempVal = Value;
 while ((TempVal >>= 7) > 0)
 {
 Buffer <<= 8;
 Buffer |= 0x80;
 Buffer += (TempVal & 0x7f);
 } 
 while(TRUE)
 { 
 Data.addAsLast((char)Buffer);
 i++;
 if (Buffer & 0x80)
 Buffer >>= 8;
 else
 break;
 }
 return i;
}
int CEvent::ReadVar(ifstream& ins, unsigned long *Value)
{ 
 int c;
 int count = 0;
 
 c = ins.get();

 count++;
 *Value = (long)c;
 if (c & 0x80)
 {
 *Value &= 0x7f;
 do
 {
 c = ins.get();
 count++;
 *Value = (*Value << 7) + (c & 0x7f);
 } while (c & 0x80);
 } 
 return count;
}
 
void CEvent::WriteVar(ofstream& ofs, unsigned long Value) const
{
 unsigned long buffer;
 
 buffer = Value & 0x7f;
 while ((Value >>= 7) > 0)
 {
 buffer <<= 8;
 buffer |= 0x80;
 buffer += (Value & 0x7f);
 } 
 while(TRUE)
 {
 ofs.put((char)buffer);
 if (buffer & 0x80)
 buffer >>= 8;
 else
 break;
 }
}
End Listings


























Programming with OpenGL Primitives


Build your own three-dimensional widget collection




Ron Fosner


Ron runs Data Visualization, a software consulting group specializing in data
exploration and visualizing techniques. He is the author of the forthcoming
book OpenGL for Windows 95 and Windows NT (Addison-Wesley). He can be
contacted at ron@txtiac.net.


OpenGL is a graphics API that provides portable, hardware-assisted, 3-D
rendering. OpenGL includes the basic building blocks for creating objects that
are limited only by your imagination (and your programming ability). It's a
powerful graphics interface for creating effects such as hidden-surface
removal, shading, lighting, texture mapping, and so on. To support as many
hardware graphics accelerators as possible, however, it has a limited set of
drawing primitives. By limiting you to ten primitives, OpenGL forces you to
write your own routines (or use those in the auxiliary library). In this
article, I'll examine these primitives and show how you can put them to work
in your own programs.


OpenGL Primitives


The ten primitives supported in OpenGL can be divided into three categories:
points, lines, and polygons. All other functionality (lighting, texture
mapping, and the like) is provided by the OpenGL API. OpenGL contains just a
single point primitive, but the primitive gives you a great deal of
flexibility and control over how the point is displayed. You can control the
size and the antialiasing of the point. The default rendering of a point is
simply a pixel, but you can control the size to make points any number of
pixels in diameter, even fractional pixels. While you're not really displaying
half a pixel, OpenGL will "blur" the point across the pixels (antialiasing the
pixel), which gives the point the visual effect of being located fractionally.
For instance, Example 1(a) draws four points in a diamond pattern about the
origin. As with all OpenGL primitives, you must wrap the vertex commands
glBegin() and glEnd() statements. The glBegin() statement takes a single
argument representing one of the ten enum values for the primitive you're
creating; Table 1 and Table 2 list these values. The actual pixels drawn on
the screen depend upon a number of factors. Normally, the location of a point
is reduced to a single pixel on the screen. If antialiasing is turned on, you
might instead get a group of pixels of varying intensity whose combined
effect represents a single point that falls across multiple pixel
boundaries. If you change the default pixel size with the glPointSize()
command, the pixels drawn will correspond to that size. If antialiasing also
is turned on, the edges of the point might be slightly "fuzzed," because
OpenGL attempts to represent a point that doesn't exactly fall on pixel
boundaries. You can query for the maximum and minimum size using the
glGetFloatv() function with the GL_POINT_SIZE_RANGE argument. 


OpenGL Lines


OpenGL provides a bit more latitude in manipulating lines. You not only
control line width, but you can specify stipple patterns. Lines are the first
primitive we've seen that are really affected by lighting calculations. Unlike
the point primitive, the order in which vertexes are specified (for both lines
and polygons) is important. When you construct your primitives, make sure that
the order in which vertexes are specified is correct. There are three
different line primitives that can be created; see Table 1.
To draw the diamond pattern used for the points in Example 1(a), simply change
the primitive specified in the glBegin() call, as in Example 1(b). This will
draw a parallelogram. Just as with points, the default size is one pixel, but
you can control the line size using glLineWidth() and the line smoothness with
antialiasing. Again, just as with points, you can get the maximum and minimum
line size by using the glGetFloatv() call with the appropriate arguments.
You can also specify stippled (patterned) lines. These patterns are
essentially a series of 0s and 1s specified in a 16-bit variable. For every 1
appearing in the pattern, drawing is turned on, and for every 0, drawing is
turned off. You can also stretch the pattern by specifying a multiplying
factor, which repeats each bit in the pattern that many times, making the
pattern appear larger. When the full 16 bits have been used, the pattern is
restarted. For example, a pattern of 0xFFFF renders a solid line, while
0xFF00 renders a dashed line with drawn and undrawn parts of equal length.
If you are rendering a series of connected lines (that is, they are all in the
same glBegin()-glEnd() sequence), the pattern continues across the connecting
vertices. This is useful if you're plotting data on a graph and want the
pattern to continue along the entire length of the line, or if you're plotting
a curved shape and want the pattern to flow along the curve. You can also
control the width of stippled lines just as you would solid lines. The width
is independent of the pattern, so you have complete control over the line. If
you need a line that contains a pattern in two or more colors, then you can
create this effect by creating patterns that only draw in the appropriate
locations, then using these patterns to create multiple, overlapping lines. As
long as you use the exact same vertices, you should get the effect you want,
with only the occasional overlapping pixel being drawn in both colors (use
opaque colors and you'll never notice).
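Here's a quick sanity check for that two-color trick, again illustrative C rather than OpenGL: a pair of 16-bit patterns is safe if the two never both draw a pixel and together cover the whole line.

```c
/* Two stipple patterns can safely overlap in different colors if they
   are disjoint (no pixel drawn twice) and together cover every pixel. */
int patterns_complementary(unsigned short a, unsigned short b)
{
    return ((a & b) == 0) && ((a | b) == 0xFFFF);
}
```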


OpenGL Polygons


OpenGL polygons are constrained in certain ways from the more-comprehensive
mathematical definition of a polygon. A polygon that OpenGL can render
correctly is a simple, convex polygon of three or more unique points that all
lie on the same plane. A polygon is simple if its edges don't intersect; that
is, two edges can't cross without forming a vertex. A polygon is convex if
it's never dimpled inwards--that is, for any two vertices of the polygon, a
line drawn between them remains in the interior of the polygon. Finally, all
of the points must lie on the same plane. (By "plane," I mean any arbitrary
plane, not just the three coordinate planes defined by pairs of the x, y, and
z axes.)
As examples of planar and nonplanar polygons, consider a square of cloth lying
on a table with all four corners pinned to the table. This meets our
definition of a simple polygon with four vertices, lying in a single plane.
Now, if you take one corner and pull it up, you have a curved surface and a
nonplanar polygon. Be aware that OpenGL is perfectly willing to accept
nonsimple and nonplanar polygons, and it's up to you to ensure OpenGL
receives accurate information. There'll be no warning if you enter an invalid
polygon--but what happens when you try to render it is undefined. Frequently,
OpenGL will do a creditable job of rendering polygons that are only slightly
out of true. However, at certain angles the polygon will look inaccurate, and
it can be agonizing to figure out what's wrong. This is why triangles are so
popular. With only three points, they must lie on a plane. That's why so many
routines in OpenGL decompose objects into groups of triangles, since
triangles meet all the requirements of being simple, convex, planar polygons.
Example 1(c) shows how to construct a filled parallelogram on the x-y plane.
The order of the vertices is important, since the order tells OpenGL which
side of the polygon is the "front." This affects objects that are completely
enclosed, or those visible from just one side. By judiciously constructing
objects so that you only need to render the front faces and not the back, you
can significantly increase performance. The face of a polygon that's rendered
to the screen and that has a counterclockwise vertex order around the
perimeter is (by default) considered the front face of the polygon, while a
clockwise order denotes the back face.


Front Faces, Back Faces, and Rendering Modes 


By default, both faces of a polygon are rendered. However, you can select
which faces get drawn, as well as how front faces are differentiated from back
faces. This is another of those subtle points that can bite you if you aren't
paying attention. If you mix the winding order of your polygons' vertices,
rendering becomes problematic, since you frequently don't want the back-facing
polygons to be displayed. For example, if you're constructing an astounding
3-D texture-mapped game in tribute to Star Trek, you probably don't want the
interior of the Romulan ships drawn. In fact, you probably don't want OpenGL
to bother with any of the faces of the objects that aren't visible, so you
specify that the back faces can be ignored in order to boost performance. So,
how do you tell if the polygon will be clockwise or counterclockwise in screen
coordinates? This can be very complicated. In practical terms, just make sure
that the object has an "outside" or a "front" and that this is the side with a
counterclockwise vertex order with respect to the screen.
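One practical test, sketched here in plain C, is the classic shoelace formula applied to the projected (screen-space) vertices; with the y axis pointing up, a positive signed area means the vertices wind counterclockwise:

```c
/* Twice the signed area of a 2D polygon, via the shoelace formula.
   Positive => counterclockwise vertex order (with the y axis up);
   negative => clockwise. v is an array of n (x, y) pairs. */
double signed_area2(double (*v)[2], int n)
{
    double a = 0.0;
    for (int i = 0; i < n; i++) {
        int j = (i + 1) % n;                       /* next vertex, wrapping */
        a += v[i][0] * v[j][1] - v[j][0] * v[i][1];
    }
    return a;
}
```

Note that in a window system where y grows downward, the sign is flipped, which is one reason this question gets complicated in practice.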
You can control culling using the glCullFace() command with a value
indicating front faces, back faces, or both. Toggle culling itself using
glEnable() and glDisable() with the GL_CULL_FACE argument. And you can swap
the clockwise and counterclockwise designations of the front face with the
glFrontFace() command. Note that you can easily control the culling for an
object by prefacing its modeling commands with the appropriate commands.
Don't forget to restore any attributes that you'll need later.
It's also possible to individually control how the front face and back face
are rendered. The glPolygonMode() command takes two parameters. The first
parameter selects which faces to affect (front, back, or both). The second
parameter is the rendering mode, which can indicate that the polygon should
be rendered as only vertex points, as lines connecting the vertices (as in a
wireframe), or as a filled polygon. This is useful when the user is inside a
model, and you want to give them a hint that they are inside. If the mode
were filled polygons, they'd just see a screen full of color. If it were
points (or back-face culling were enabled), they would probably see nothing
recognizable. If wireframe mode were on, they'd probably quickly get the idea
that they were inside the model. You can control
which edges in a polygon are rendered and thus eliminate lines in the
wireframe. Finally, OpenGL can construct six different types of polygon
primitives, each optimized to assist in the construction of a particular type
of surface. Table 2 describes the polygon types you can construct.


Patterned Polygons


Just as you could stipple an OpenGL line, you can also stipple a polygon.
Instead of the default filled polygon style, you specify a 32x32-bit pattern
that is used to turn rendering off and on. The pattern is window aligned, so
that touching polygons will appear to continue the pattern. This also means
that if polygons move and the viewpoint remains the same, the pattern will
appear to be moving over the polygon!
The pattern can be placed in any consecutive 1024 bits of memory, with a 4x32
GLubyte array being a convenient format. The actual storage format of the
bytes can be controlled so that you can share bitmaps between machines with
different byte orders, but by default, the storage format is that of the
particular machine your program is running on. The first byte controls
drawing in the lower-left corner, with the bottom line being drawn first;
thus, the last byte controls the upper-right corner. 
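Building a pattern is just filling those 128 bytes. Here's an illustrative sketch (plain unsigned char standing in for GLubyte, and no GL calls) that builds a one-pixel checkerboard, bottom row first as described above:

```c
/* Fill a 32x32-bit polygon stipple (128 bytes, 4 bytes per row, bottom
   row first) with a one-pixel checkerboard. Alternating rows use
   complementary byte patterns so adjacent pixels differ both
   horizontally and vertically. */
void make_checkerboard(unsigned char pattern[128])
{
    for (int row = 0; row < 32; row++)
        for (int byte = 0; byte < 4; byte++)
            pattern[row * 4 + byte] = (row & 1) ? 0xAA : 0x55;
}
```

You'd pass the result to glPolygonStipple() after enabling GL_POLYGON_STIPPLE.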


Rendering Primitives



Now that you've been exposed to the basic primitives, let's see how you can
use some of them in your own code. In addition to specifying the type of
primitive and the vertices that make up the primitive, you'll need to specify
colors for the vertices. If you're interested in lighting for your model,
you'll also have to specify additional information with each vertex. 
OpenGL is a state machine, and this is never more obvious than when setting
the color of a vertex. Being a state machine means that, unless something
explicitly changes a state, it remains in that state. Once a color is
selected, all rendering will be done in that color. The glColor*() function is
used to set the current rendering color. This family of functions takes three
or four arguments: three RGB values and an (optional) alpha value. The
glColor3f() function takes three floating-point values for the red, green,
and blue components. A value of 0.0 means no intensity, while a value of 1.0
is full intensity, and any intermediate value is partial intensity. Note that
if you're using a 256-color driver, you'll probably get a dithered color.
Color is selected on a per-vertex basis, but it's also affected by the
currently selected shading model. If flat shading is selected, only one
vertex is used to define the shaded color for the entire polygon. (Which
vertex defines the shaded color depends upon the primitive type.) If,
however, you have smooth shading enabled, each vertex can have a unique
shaded color, which depends upon the unshaded color you assigned that
particular vertex and the current light falling on the vertex. Between
vertices of different shaded colors, the intermediate pixels' colors are
interpolated between the shaded colors of the vertices. Thus, OpenGL will
smoothly blend from the shaded color at one vertex to the shaded color of
other vertices, all with no intervention from you.
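The interpolation itself is nothing exotic; conceptually, the rasterizer does something like this for each color component along an edge or span (a sketch, not OpenGL API):

```c
/* Linearly interpolate between two RGB colors: t = 0 gives a, t = 1
   gives b. This is what smooth (Gouraud) shading does for each pixel
   between vertices of different shaded colors. */
void lerp_color(const float a[3], const float b[3], float t, float out[3])
{
    for (int i = 0; i < 3; i++)
        out[i] = a[i] + t * (b[i] - a[i]);
}
```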


Calculating Normal Vectors


A normal vector is a vector that is perpendicular to a line or a polygon.
Normal vectors are used to calculate the amount of light hitting the surface
of the polygon. If the normal vector is pointing directly at the light source,
the full value of the light is hitting the surface. As the angle between the
light source and the normal vector increases, the amount of light striking the
surface decreases. Normals are usually defined along with the vertices of a
model.
You must define a normal for each vertex of each polygon for which you want to
show the effects of incident lighting. If lighting is disabled, the rendered
color of each vertex is simply the color specified for that vertex. If
lighting is enabled, the rendered color of each vertex is computed from the
specified color and the effects of lighting upon that color. If you use the
smooth shading model, the colors across the surface of the polygon are
interpolated from each of the vertices of the polygon. If flat shading is
selected, then only one normal, from a specific vertex, is used. If you use
only flat shading, your rendering time will be significantly faster, not only
because flat shading is a faster shading model, but because you only have to
calculate one normal for each polygon. For a single polygon, the first vertex
is used to specify the color. For all the other primitives, it's the last
vertex in each polygon or line segment.
Calculating normals is relatively easy, especially if you're restricted to
simple polygons as you are when using OpenGL primitives. Technically, you can
have a normal for either side of a polygon. By convention, however, normals
specify the front face. Since you need at least three unique points to specify
a planar surface, you can use three vertices of a simple polygon to calculate
a normal. You take the three points and generate two edge vectors from them.
You then calculate the cross product of these two vectors, which is
perpendicular to both. The last step is to normalize the result, which simply
means scaling it to unit length. OpenGL will normalize all normal vectors for
you if you tell it to, but you can also use a routine like the one in Listing
One to provide only unit normals in the first place.
Listing One takes three vertices specified in counterclockwise order and
calculates a unit normal vector based upon these points. If you use this
approach, you'll get reasonably good results, with lighting effects that look
good. However, you'll get an artifact called "faceting"--the shading on
adjacent polygons will be discontinuous, in some cases clearly showing the
individual polygons that make up the surface. If this is unacceptable, then
you can switch to smooth shading. However, smooth shading, while interpolating
between the vertices of a polygon, does nothing to make sure the interpolygon
shading is smooth. You'll need to use a larger number of polygons to define
your surface, or modify the normals to simulate a smooth surface.


Analytic Surfaces


An analytic surface is a surface defined by one or more equations. The easiest
way of getting surface normals for such surfaces is to take the derivative of
the equation(s). If the surface is being generated from a sampling function (a
function that provides interpolated values taken from a database), you'll have
to estimate the curvature of the surface and get the normal from this
estimate. There is an appendix in the OpenGL Programming Guide
(Addison-Wesley, 1993) that gives an overview of this approach. An alternative
is to take the current information and interpolate your own surface normals.
To reduce faceting where several polygons touch at a point, you'll need to
force the coincident vertices to have the same normal. The simplest method is
to average all of the normals at that point and use the averaged value for
each. For n polygons that share a common point, there are n coincident
vertices and n individual normals. You sum all the normal values, N1 + N2 +
... + Nn, then normalize this vector and replace each of the original normals
with the result. 


Summary


Creating primitives is at the heart of three-dimensional programming in
OpenGL. Whether it's a car, a widget, or a velociraptor, the really
interesting things usually consist of one complicated object made up of less
complicated parts, themselves made up of still simpler parts. Once you get a
feel for creating primitives, you're on your way to creating your own toolbox
of primitives that you can reuse in a number of different ways.
Example 1: OpenGL statements to draw (a) four points in a diamond shape around
the origin; (b) a parallelogram; (c) a simple polygon.
(a) glBegin( GL_POINTS );
 glVertex2f( 0.0f, 2.0f ); // note 2D form
 glVertex2f( 1.0f, 0.0f );
 glVertex2f( 0.0f,-2.0f );
 glVertex2f(-1.0f, 0.0f );
 glEnd();
 
(b) glBegin( GL_LINE_LOOP ); // make it a connected line segment
 glVertex2f( 0.0f, 2.0f ); // note 2D form
 glVertex2f( 1.0f, 0.0f );
 glVertex2f( 0.0f,-2.0f );
 glVertex2f(-1.0f, 0.0f );
 glEnd();
 
(c) glBegin( GL_POLYGON );
 glVertex2f( 0.0f, 2.0f ); // note 2D form, counterclockwise order
 glVertex2f(-1.0f, 0.0f );
 glVertex2f( 0.0f,-2.0f );
 glVertex2f( 1.0f, 0.0f );
 glEnd();
Table 1: The point and line primitive types that you can specify in the
glBegin() statement.
Type Description
GL_POINTS Draws a point at each vertex, for as many vertices
 as are specified.
GL_LINES Draws a line segment for each pair of vertices.
 Vertices v0 and v1 define the first line; v2 and
 v3, the next, and so on. If an odd number of
 vertices is given, the last one is ignored.
GL_LINE_STRIP Draws a connected group of line segments from
 vertex v0 to vn, connecting a
 line between each vertex and the next in the
 order given.

GL_LINE_LOOP Draws a connected group of line segments from
 vertex v0 to vn, connecting each
 vertex to the next with a line, in the order
 given, then closing the line by drawing a
 connecting line from vn to v0,
 defining a loop.
Table 2: Polygon primitive types.
Type Description
GL_POLYGON Draws a polygon from vertex v0 to
 vn-1. n
 must be at least 3, and the vertices must
 specify a simple, convex polygon. These
 restrictions are not enforced by OpenGL.
 In other words, if you don't specify the
 polygon according to the rules for the
 primitive, the results are undetermined.
 Unfortunately, OpenGL will be happy to
 attempt to render an ill-defined polygon
 without notifying you, so construct all
 polygon primitives carefully.
GL_QUADS Draws a series of separate four-sided
 polygons. The first quad is drawn using
 vertices v0, v1, v2, and v3. The next is
 drawn using v4, v5, v6, v7, and each
 following quad, using the next four vertices
 specified. If n isn't a
 multiple of four, then the extra vertices
 are ignored.
GL_TRIANGLES Draws a series of separate three-sided
 polygons. The first triangle is drawn using
 vertices v0, v1, and v2. Each set of three
 vertices is used to draw a triangle.
GL_QUAD_STRIP Draws a strip of connected quadrilaterals.
 The first quad is drawn using vertices v0,
 v1, v3, and v2. The next quad reuses the
 last two vertices and adds the next two,
 v2, v3, v4, and v5. Each of the following
 quads uses the last two vertices from the
 previous quad. n must be
 at least 4 and a multiple of 2.
GL_TRIANGLE_STRIP Draws a series of connected triangles. The
 first triangle is drawn using vertices v0,
 v1, and v2, the next uses v2, v1, and v3,
 the next v2, v3, and v4. Note that the order
 ensures that they all are oriented alike.
GL_TRIANGLE_FAN Draws a series of triangles connected about
 a common origin, vertex v0. The first
 triangle is drawn using vertices v0, v1,
 and v2, the next uses v0, v2, and v3, the
 next v0, v3, and v4.

Listing One
#include <math.h>      // for sqrt()
#include <GL/gl.h>     // for GLdouble

// Pass in three points (in counterclockwise order) and a vector to be filled
void NormalVector(GLdouble p1[3],GLdouble p2[3],GLdouble p3[3],GLdouble n[3])
{
 GLdouble v1[3], v2[3], d;
 // calculate two edge vectors from the three points 
 v1[0] = p2[0] - p1[0];
 v1[1] = p2[1] - p1[1];
 v1[2] = p2[2] - p1[2];
 v2[0] = p3[0] - p2[0];
 v2[1] = p3[1] - p2[1];
 v2[2] = p3[2] - p2[2];
 
 // calculate the cross product of the two vectors
 n[0] = v1[1]*v2[2] - v2[1]*v1[2];
 n[1] = v1[2]*v2[0] - v2[2]*v1[0];
 n[2] = v1[0]*v2[1] - v2[0]*v1[1];
 
 // normalize the vector
 d = ( n[0]*n[0] + n[1]*n[1] + n[2]*n[2] );
 // try to catch very small vectors
 if ( d < (GLdouble)0.00000001 )
 {
  // error, near-zero-length vector
  // do our best to recover
  d = (GLdouble)100000000.0;
 }
 else // take the square root
 {
  // multiplication is faster than division,
  // so use the reciprocal of the length
  d = (GLdouble)1.0 / sqrt( d );
 }
 n[0] *= d;
 n[1] *= d;
 n[2] *= d;
}
End Listing



RAMBLINGS IN REAL TIME


Quake's Hidden-Surface Removal




Michael Abrash


Michael is the author of Zen of Graphics Programming, Second Edition, and Zen
of Code Optimization. He is currently pushing the envelope of real-time 3-D on
Quake at id Software. He can be contacted at mikeab@idsoftware.com.


Okay, I admit it. I'm sick and tired of classic rock. Admittedly, it's been a
while--about 20 years--since I was excited to hear anything by the Cars or
Boston, and I was never particularly excited about Bob Seger or Queen--to say
nothing of Elvis--so some things haven't changed. But I knew something was up
when I found myself changing the station on the Allman Brothers and Steely Dan
and Pink Floyd and, God help me, the Beatles (just stuff like "Hello Goodbye"
and "I'll Cry Instead," though, not "Ticket to Ride" or "A Day in the Life;"
I'm not that far gone). It didn't take long to figure out what the problem
was; I'd been hearing the same songs for a quarter of a century, and I was
bored.
I tell you this to explain why, when my daughter and I drove back from dinner
the other night, the radio in my car was tuned, for the first time ever, to a
station using the slogan "There is no alternative."
Now, we're talking here about a ten-year-old who worships the Beatles and has
been raised on a steady diet of oldies. She loves melodies, catchy songs, and
good singers, none of which you're likely to find on an alternative rock
station. So it's no surprise that when I turned on the radio, the first word
out of her mouth was "Yuck!"
What did surprise me was that after listening for a while, she said, "You
know, Dad, it's actually kind of interesting."
Apart from giving me a clue as to what sort of music I'll hear blasting
through our house when she's a teenager, her quick uptake on alternative rock
reminded me of something that it's easy to forget as we become older: It's
essential to keep an open mind, and to be willing--better yet, eager--to try
new things. Programmers tend to become attached to familiar approaches, and
are inclined to stick with whatever is currently doing the job adequately, but
in programming there are always alternatives.
Not that I should have needed any reminding, considering the ever-evolving
nature of Quake.


Creative Flux


In my January/February column, I described the creative flux that led to John
Carmack's decision to use a precalculated, potentially visible set (PVS) of
polygons for each possible viewpoint in Quake, the game we're developing at id
Software. The precalculated PVS meant that instead of having to spend a lot of
time searching through the world database to find out which polygons were
visible from the current viewpoint, we could simply draw all the polygons in
the PVS from back to front (getting the ordering from the world BSP tree;
check out my May/June, July/August, and November/December 1995 columns for a
discussion of BSP trees). This draws the correct scene with no searching at
all, letting the back-to-front drawing perform the final stage of
hidden-surface removal (HSR). This was a terrific idea, but it was far from
the end of the road for Quake's design.


Drawing Moving Objects


For one thing, there was still the question of how to sort and draw moving
objects properly; in fact, this is the question I've been asked most often
since the January/February column came out, so I'll take a moment to address
it. The primary problem is that a moving model can span multiple BSP leaves,
and the leaves that are touched vary as the model moves. That, together with
the possibility of multiple models in one leaf, means there's no easy way to
use BSP order to draw the models in correctly sorted order. When I wrote the
January/February column, we were drawing sprites (such as explosions),
moveable BSP models (such as doors), and polygon models (such as monsters) by
clipping each into all the leaves it touched, then drawing the appropriate
parts as each BSP leaf was reached in back-to-front traversal. However, this
didn't solve the issue of sorting multiple moving models in a single leaf
against each other, and also left some ugly sorting problems with complex
polygon models.
John solved the sorting issue for sprites and polygon models in a startlingly
low-tech way: We now z-buffer them. (That is, before we draw each pixel, we
compare its distance, or z value, with the z value of the pixel currently on
the screen, drawing only if the new pixel is nearer than the current one.)
First, we draw the basic world--walls, ceilings, and the like. No z-buffer
testing is involved at this point (the world visible-surface determination is
done in a different way, as we'll soon see); however, we do fill the z-buffer
with the z values (actually, 1/z values, also discussed later) for all the
world pixels. Z-filling is a much faster process than z-buffering the entire
world, because no reads or compares are involved, just writes of z values.
Once drawing and z-filling the world are done, we can simply draw the sprites
and polygon models with z-buffering and get perfect sorting all around.
Whenever a z-buffer is involved, two questions inevitably arise: What's the
memory footprint, and what's the performance impact? The memory footprint at
320x200 is 128 KB, not trivial but not a big deal for a game that requires 8
MB. The performance impact is about 10 percent for z-filling the world, and
roughly 20 percent (with lots of variation) for drawing sprites and polygon
models. In return, we get a perfectly sorted world, and the ability to do
additional effects, such as particle explosions and smoke, because the
z-buffer lets us flawlessly sort such effects into the world. All in all, the
use of the z-buffer vastly improved the visual quality and flexibility of the
Quake engine, and also simplified the code quite a bit, at an acceptable
memory and performance cost.
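In miniature, the world/model split works like this; the sketch below uses a one-scanline "screen" and stores 1/z, so larger values are nearer (the buffer sizes and names are mine, not Quake's):

```c
/* Z-fill the "world" (unconditional writes), then z-buffer a "model"
   pixel (read, compare, conditionally write). Stores 1/z, so larger
   values are nearer. */
#define W 8
float zbuf[W];
int   color[W];

void zfill_world(const float *world_invz, const int *world_color)
{
    for (int x = 0; x < W; x++) {   /* no reads or compares: just writes */
        zbuf[x]  = world_invz[x];
        color[x] = world_color[x];
    }
}

void zbuffer_model(int x, float invz, int c)
{
    if (invz > zbuf[x]) {           /* nearer than what's already there? */
        zbuf[x]  = invz;
        color[x] = c;
    }
}
```

The world pass touches each z entry exactly once with a write, which is why z-filling is so much cheaper than fully z-buffering the world.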


Leveling and Improving Performance


As I said before, in the Quake architecture, the world itself is drawn
first--without z-buffer reads or compares, but filling the z-buffer with the
world polygons' z values--and then the moving objects are drawn on top of the
world, using full z-buffering. Thus far, I've discussed how to draw moving
objects. For the rest of this column, I'm going to talk about the other part
of the drawing equation: How to draw the world itself, where the entire world
is stored as a single BSP tree, and never moves.
As you may recall from the January/February column, we're concerned with both
raw performance and level performance. That is, we want the drawing code to
run as fast as possible, but we also want the difference in drawing speed
between the average scene and the slowest-drawing scene to be as small as
possible. It does little good to average 30 frames per second (fps) if 10
percent of the scenes draw at 5 fps, because the jerkiness of those scenes is
extremely obvious by comparison, and highly objectionable. It's better to
average 15 fps 100 percent of the time.
The precalculated PVS was an important step toward both faster and more level
performance, because it eliminated the need to identify visible polygons, a
relatively slow step that tended to be at its worst in the most complex
scenes. Nonetheless, in some spots in the game, the precalculated PVS contains
five times more polygons than are actually visible; together with the
back-to-front HSR approach, this created hot spots where the frame rate bogged
down visibly as hundreds of polygons were drawn back to front, most of which
were immediately overdrawn by nearer polygons. Raw performance in general was
also reduced by the typical 50 percent overdraw resulting from drawing
everything in the PVS. So, although drawing the PVS back to front as the final
HSR stage worked and was an improvement over previous designs, it was not
ideal. Surely, John thought, there's a better way to leverage the PVS than
back-to-front drawing. And indeed there is.


Sorted Spans


The ideal final HSR stage for Quake would reject all the polygons in the PVS
that are actually invisible, and draw only the visible pixels of the remaining
polygons, with no overdraw--that is, with every pixel drawn exactly once--all
at no performance cost, of course. One way to do that (although certainly not
at zero cost) would be to draw the polygons from front to back, maintaining a
region describing the currently occluded portions of the screen and clipping
each polygon to that region before drawing it. That sounds promising, but it
is in fact nothing more or less than the beam tree approach I described in the
January/February column, an approach that we found to have considerable
overhead and serious leveling problems.
We can do much better if we move the final HSR stage from the polygon level to
the span level and use a sorted-spans approach. In essence, this approach
consists of turning each polygon into a set of spans, as shown in Figure 1,
and then sorting and clipping the spans against each other until only the
visible portions of visible spans are left to be drawn, as shown in Figure 2.
This may sound a lot like z-buffering (which is simply too slow for drawing
the world, although it's fine for smaller moving objects, as described
earlier), but there are crucial differences. In contrast with z-buffering,
only visible portions of visible spans are scanned out pixel by pixel
(although all polygon edges must still be rasterized). Better yet, the sorting
that z-buffering does at each pixel becomes a per-span operation with sorted
spans, and because of the coherence implicit in a span list, each edge is
sorted against only some of the spans on the same line, and clipped only to
the few spans that it overlaps horizontally. Although complex scenes still
take longer to process than simple scenes, the worst case isn't as bad as with
the beam tree or back-to-front approaches, because there's no overdraw or
scanning of hidden pixels, because complexity is limited to pixel resolution,
and because span coherence tends to limit the worst-case sorting in any one
area of the screen. As a bonus, the output of sorted spans is in precisely the
form that a low-level rasterizer needs: a set of span descriptors, each
consisting of a start coordinate and a length.
In short, the sorted spans approach meets our original criteria pretty well:
Although it isn't zero-cost, it's not horribly expensive, it completely
eliminates both overdraw and pixel scanning of obscured portions of polygons,
and it tends to level worst-case performance. We wouldn't want to rely on
sorted spans alone as our hidden-surface mechanism, but the precalculated PVS
reduces the number of polygons to a level that sorted spans can handle quite
nicely.
So we've found the approach we need; now it's just a matter of writing some
code and we're on our way, right? Well, yes and no. Conceptually, the
sorted-spans approach is simple, but it's surprisingly difficult to implement,
with a couple of major design choices to be made, a subtle mathematical
element, and some tricky gotchas that we'll see in the next column. Let's look
at the design choices first.


Edges versus Spans


The first design choice is whether to sort spans or edges (both of which fall
into the general category of "sorted spans"). Although the results are the
same both ways--a list of spans to be drawn, with no overdraw--the
implementations and performance implications are quite different, because the
sorting and clipping are performed using very different data structures.

With span sorting, spans are stored in x-sorted, linked-list buckets,
typically with one bucket per scan line. Each polygon, in turn, is rasterized
into spans, as shown in Figure 1, and each span is sorted and clipped into the
bucket for the scan line the span is on, as shown in Figure 2, so that at any
time each bucket contains the nearest spans encountered thus far, always with
no overlap. This approach involves generating all spans for each polygon, in
turn, with each span immediately being sorted, clipped, and added to the
appropriate bucket.
With edge sorting, edges are stored in x-sorted, linked-list buckets according
to their start scan line. Each polygon, in turn, is decomposed into edges,
cumulatively building a list of all the edges in the scene. Once all edges for
all polygons in the view frustum have been added to the edge list, the whole
list is scanned out in a single top-to-bottom, left-to-right pass. An active
edge list (AEL) is maintained. With each step to a new scan line, edges that
end on that scan line are removed from the AEL, active edges are stepped to
their new x coordinates, edges starting on the new scan line are added to the
AEL, and the edges are sorted by the current x coordinate.
For each scan line, a z-sorted active polygon list (APL) is maintained. The
x-sorted AEL is stepped through in order. As each new edge is encountered
(that is, as each polygon starts or ends as we move left to right), the
associated polygon is activated and sorted into the APL, as shown in Figure 3,
or deactivated and removed from the APL, as shown in Figure 4, for a leading
or trailing edge, respectively. If the nearest polygon has changed (that is,
if the new polygon is nearest, or if the nearest polygon just ended), a span
is emitted for the polygon that just stopped being the nearest, starting at
the point where the polygon first became nearest and ending at the x
coordinate of the current edge, and the current x coordinate is recorded in
the polygon that is now the nearest. This saved coordinate later serves as the
start of the span emitted when the new nearest polygon ceases to be in front.
Don't worry about following all of that; this is just a quick overview of edge
sorting to help make the rest of this column clearer. There will be a more
thorough discussion in my next column.
The spans generated with edge sorting are exactly the same spans that
ultimately emerge from span sorting; the difference lies in the intermediate
data structures used to sort the spans in the scene. With edge sorting, the
spans are kept implicit in the edges until the final set of visible spans is
generated, so the sorting, clipping, and span emission is done as each edge
adds or removes a polygon, based on the span state implied by the edge and the
set of active polygons. With span sorting, spans are immediately made explicit
when each polygon is rasterized, and those intermediate spans are then sorted
and clipped against the other spans on the scan line to generate the final
spans, so the states of the spans are explicit at all times, and all work is
done directly with spans.
Both span sorting and edge sorting work well, and have been employed
successfully in commercial projects. We've chosen to use edge sorting in Quake
partly because it seems inherently more efficient, with excellent horizontal
coherence, which makes for quick sorting, in contrast with the potentially
costly sorting into linked lists involved in span sorting. A more important
reason, though, is that with edge sorting we're able to share edges between
adjacent polygons, and that cuts the work involved in sorting, clipping, and
rasterizing edges nearly in half, while also shrinking the world database
quite a bit.
One final advantage of edge sorting is that it makes no distinction between
convex and concave polygons. That's not an important consideration for most
graphics engines, but in Quake, edge clipping, transformation, projection, and
sorting have become a major bottleneck, so we're doing everything we can to get
the polygon and edge counts down, and concave polygons help a lot in that
regard. While it's possible to handle concave polygons with span sorting, that
can involve significant performance penalties.
Nonetheless, there's no cut-and-dried answer as to which approach is better.
In the end, span sorting and edge sorting amount to the same functionality,
and the choice between them is a matter of whatever you feel most comfortable
with. In the next column, I'll go into considerable detail about edge sorting,
complete with a full implementation. I'm going to spend the rest of this
column laying the foundation for next time by discussing sorting keys and 1/z
calculation. In the process, I'm going to have to make a few forward
references to aspects of edge sorting that I haven't covered in detail; my
apologies, but it's unavoidable, and all should become clear by the end of the
next column.


Edge-sorting Keys


Now that we know we're going to sort edges, using them to emit spans for the
polygons nearest the viewer, the question becomes how to tell which polygons are
nearest. Ideally, we'd just store a sorting key in each polygon, and whenever
a new edge came along, we'd compare its surface's key to the keys of other
currently active polygons, and easily tell which polygon was nearest. That
sounds too good to be true, but it is possible. If, for example, your world
database is stored as a BSP tree, with all polygons clipped into the BSP
leaves, then BSP walk order is a valid drawing order. So, for example, if you
walk the BSP back to front, assigning each polygon an incrementally higher key
as you reach it, polygons with higher keys are guaranteed to be in front of
polygons with lower keys. This is the approach Quake used for a while,
although a different approach is now being used, for reasons I'll explain
shortly.
If you don't happen to have a BSP or similar data structure handy, or if you
have lots of moving polygons (BSPs don't handle moving polygons very
efficiently), another way to accomplish our objectives would be to sort all
the polygons against one another before drawing the scene, assigning
appropriate keys based on their spatial relationships in viewspace.
Unfortunately, this is generally an extremely slow task, because every polygon
must be compared to every other polygon. There are techniques to improve the
performance of polygon sorts, but I don't know of anyone who's doing general
polygon sorts of complex scenes in real time on a PC.
An alternative is to sort by z distance from the viewer in screenspace, an
approach that dovetails nicely with the excellent spatial coherence of edge
sorting. As each new edge is encountered on a scan line, the corresponding
polygon's z distance can be calculated and compared to the other polygons'
distances, and the polygon can be sorted into the APL accordingly.
Getting z distances can be tricky, however. Remember that we need to be able
to calculate z at any arbitrary point on a polygon, because an edge may occur
and cause its polygon to be sorted into the APL at any point on the screen. We
could calculate z directly from the screen x- and y-coordinates and the
polygon's plane equation, but unfortunately this can't be done very quickly,
because the z for a plane doesn't vary linearly in screenspace; however, 1/z
does vary linearly, so we'll use that instead. (See Chris Hecker's series of
columns on texture mapping over the past year in Game Developer magazine for a
discussion of screenspace linearity and gradients for 1/z.) Another advantage
of using 1/z is that its resolution increases with decreasing distance,
meaning that by using 1/z, we'll have better depth resolution for nearby
features, where it matters most.
The obvious way to get a 1/z value at any arbitrary point on a polygon is to
calculate 1/z at the vertices, interpolate it down both edges of the polygon,
and interpolate between the edges to get the value at the point of interest.
Unfortunately, that requires doing a lot of work along each edge, and worse,
requires division to calculate the 1/z step per pixel across each span.
A better solution is to calculate 1/z directly from the plane equation and the
screen x and y of the pixel of interest. The equation is 1/z = (a/d)x' -
(b/d)y' + c/d, where z is the viewspace z coordinate of the point on the plane
that projects to screen coordinate (x',y') (the origin for this calculation is
the center of projection, the point on the screen straight ahead of the
viewpoint), [a b c] is the plane normal in viewspace, and d is the distance
from the viewspace origin to the plane along the normal. Division is done only
once per plane, because a, b, c, and d are per-plane constants.
The full 1/z calculation requires two multiplies and two adds, all of which
should be floating point to avoid range errors. That much floating-point math
sounds expensive but really isn't, especially on a Pentium, where a plane's
1/z value at any point can be calculated in as little as six cycles in
assembly language.
For those of you who are interested, here's a quick derivation of the 1/z
equation. The plane equation is ax + by + cz - d = 0, where x and y
are viewspace coordinates, and a, b, c, d, and z are defined above. If you
substitute x = x'z and y = -y'z (from the definition of the perspective
projection, with y inverted because y increases upward in viewspace but
downward in screenspace), and do some rearrangement, you get z = d / (ax' -
by' + c). Inverting and distributing yields 1/z = ax'/d - by'/d + c/d. You'll
see 1/z sorting in action next time.


Quake and z Sorting


I mentioned that Quake no longer uses BSP order as the sorting key; in fact,
it uses 1/z as the key now. Elegant as the gradients are, calculating 1/z from
them is clearly slower than just doing a compare on a BSP-ordered key, so why
have we switched Quake to 1/z?
The primary reason is to reduce the number of polygons. Drawing in BSP order
means following certain rules, including the rule that polygons must be split
if they cross BSP planes. This splitting increases the numbers of polygons and
edges considerably. By sorting on 1/z, we're able to leave polygons unsplit
but still get a correct drawing order, so we have far fewer edges to process
and faster drawing overall, despite the added cost of 1/z sorting.
Another advantage of 1/z sorting is that it solves the sorting issues
(mentioned before) involving moving models that are themselves small BSP
trees. Sorting in world BSP order wouldn't work here, because these models are
separate BSPs, and there's no easy way to work them into the world BSP's
sequence order. We don't want to use z buffering for these models because
they're often large objects such as doors, and we don't want to lose the
overdraw-reduction benefits that closed doors provide when drawn through the
edge list. With sorted spans, the edges of moving BSP models are simply placed
in the edge list (first clipping polygons so they don't cross any solid world
surfaces, to avoid complications associated with interpenetration), along with
all the world edges, and 1/z sorting takes care of the rest.


Onward to Next Time


There is, without a doubt, an awful lot of information in the preceding pages,
and it may not all connect yet in your mind. The code and accompanying
explanation next time should help; if you want to peek ahead, the code should
be available electronically (see "Availability," page 3) and from
ftp.idsoftware.com/mikeab/ddjdsort.zip by the time you read this column. You
may also want to take a look at Computer Graphics: Principles and Practice, by
James Foley and Andries van Dam (Addison-Wesley, 1992) or Procedural Elements
of Computer Graphics, by David F. Rogers (McGraw-Hill, 1985). 
As I write this, it's unclear whether Quake will end up sorting edges by BSP
order or 1/z. Actually, there's no guarantee that sorted spans in any form
will be the final design. Sometimes it seems like we change graphics engines
as often as they play Elvis on the '50s oldies stations (but, one would hope,
with more aesthetically pleasing results!), and no doubt we'll be considering
the alternatives right up until the day we ship.
Figure 1: Span generation.
Figure 2: The spans from polygon A from Figure 1, sorted and clipped with the
spans from polygon B, where polygon A is at a constant z distance of 100 and
polygon B is at a constant z distance of 50 (B is closer).
Figure 3: Activating a polygon when a leading edge is encountered in the AEL.
Figure 4: Deactivating a polygon when a trailing edge is encountered in the
AEL.


DTACK REVISITED


Small Catastrophes




Hal W. Hardenbergh


Hal is a hardware engineer who sometimes programs. He is the former editor of
DTACK Grounded, and can be contacted through the DDJ offices.


A "small" catastrophe is one that happens to someone else. Consider, for
instance, the San Jose, CA, resident who discovered that, while she had been
at work, somebody had dug a ditch in the street, run the ditch into her front
yard, and dug up her rose bushes. The next morning she discovered, when
Pacific Gas and Electric's work crew showed up, that no mistake had been made.
PG&E had fully intended to dig up her rose bushes.
A quick review: Rice, normally a crop for monsoon countries where the annual
rainfall is a quarter-mile or so, is being grown in the central California
desert. Rice requires so much of California's ample water supply that there's
precious little left over for people. So San Jose is recycling sewage water
for watering industrial-park lawns. While laying the pipe to carry this water,
San Jose contracted with PG&E to place fiber-optic cable in those sewage-water
ditches.
Well, the cable is in place. The optic cable must be coupled to regular coax
for home use, and that involves electronic equipment boxes measuring 6x5x2
feet (!). These boxes will run hot, so they have to be in the open air, not
underground. This means they will be a prominent feature of any front yard
they reside in, and if rose bushes have to be sacrificed or little Timmy's
play area is eliminated, tough. 
Meanwhile, several new cellular services are being established in the San Jose
area. Each cellular service requires its own set of transceivers every few
blocks. You guessed it: The new cellular service providers also have a
contract with the city of San Jose, and it turns out your house is just
perfect to bolt one of these antennas onto. Never mind that hearing aids and
pacemakers don't work close to such devices.
Just when most traditional utilities such as telephone lines and electrical
power have finally moved underground, the new information age is making even
upscale residential neighborhoods look like high-tech junkyards. Isn't
progress wonderful?


More Small Catastrophes


According to the industry newsletter Microprocessor Report (MPR), the fastest
shipping desktop computer (as measured by SPECint95) on January 1 was the
200-MHz Pentium Pro. This lead lasted for about a month before DEC started
shipping a 333-MHz DEC 21164-based workstation.
Several companies are continually vying to produce the fastest desktop CPU. At
any given moment, every company but one fails to do so. Each failure is a
not-so-small catastrophe--enormous sums of money are at stake.


Sun Overcomes Adversity


A few years back, Sun designed the SuperSparc in-house, then turned the design
over to Texas Instruments' process engineers. When Supe' saw the light of day,
it proved far slower than Sun had preannounced. Unseemly fighting broke out as
Sun and TI engineers pointed the finger of blame at each other. Everyone in
the industry, including Sun's workstation customers, became aware that Supe'
was even slower than cheap Pentium-based machines. But Sun learned its lesson.
Sun has started shipping systems based on the SuperSparc's successor, the
UltraSparc. Sun has announced that Ultra is wonderful; no public arguments or
finger pointing. Ultra systems are slower than relatively cheap Pentium Pro
boxes, but that isn't mentioned by Sun's PR pitchmen. The UltraSparc: success,
glory! Why, Sun even included a bunch of CISCy multimedia instructions, each
of which replaces a sequence of 20 to 30 RISC instructions. So much for RISC's
theoretical superiority to CISC. Don't laugh. Although the facts on the ground
remain constant, Sun's stock has gone way up since Sun changed its PR stance.


Silicon Graphics and MIPS


You may remember that MIPS, the microprocessor design firm, was going
bankrupt. Silicon Graphics (SGI) had to buy it to assure a continuing supply
of leading-edge CPUs. The new R10000 has seen first silicon and works well,
being competitive with the 200-MHz Pentium Pro and the 333-MHz DEC 21164. But
the die is huge and yields are, as yet, very low.
Remember that SGI and DEC can get by with shipping dozens of their highest-end
CPUs, while Intel needs shipments in the hundreds of thousands to be even
noticeable.
SGI's catastrophes, two of them, loom on the horizon. First, SGI specializes
in three-dimensional (3-D) graphics and has traditionally maintained 50
percent gross margins, as Apple once did. 3-D graphics boards are becoming
commonly available for x86-based PCs, and (stop me if you've heard this
before) SGI has discovered it cannot sustain 50 percent gross margins. In
fact, it recently announced major price cuts, halving some system prices.
The supercomputer industry, which was never an especially profitable
marketplace, peaked in 1992 before entering a terminal nosedive. SGI has
decided that the solution to its problems is the purchase of supercomputer
manufacturer Cray Research, the one in Minnesota. I am not making this up.


PPC CPUs Emerge Triumphant


Well, they were supposed to before reality set in. First Apple crippled its
initial 601-based systems (except for the most expensive 8100 line) by leaving
out the L2 cache. This caused a horrendous performance shortfall. The line
continues to be crippled by the absence of a native-code operating system.
The performance of the next-generation 604 was overpromised by 33 percent by
both IBM and Apple. When the 603 came out, the internal (L1) instruction cache
proved seriously undersized, so the 603e had to be rushed to market. IBM and
Apple were going to release hardware-compatible systems, but it didn't happen.
IBM was going to ship a PPC-native OS/2, but that didn't happen either.
All of the PPC's performance problems were going to be solved by the
third-generation 620. Well, the 620 staggered weakly into public view with
astonishingly poor performance, so much so that the chip has been withdrawn.
IBM fired the two heads of its Somerset PPC design center and hired a new boss
from a company that knows how to design fast CPUs. Yep, Cyrix.
(Andrew Allison, publisher of the computer-systems industry newsletter Inside
the New Computer Industry, thinks the 620 project would have been canceled
outright except for some inconvenient delivery contracts.)


Legacy Code and the Pentium Pro


The poor performance of the Pentium Pro on 16-bit code is now universally
recognized. When the CPU was introduced, Intel's PR carefully pointed out that
this performance shortfall was intended. But I've read that Intel's
competitors are "ecstatic" over the P6's poor 16-bit performance; surely that
was not intended!

As this situation first unfolded, a colleague told me the Pentium Pro would be
sold for Windows NT and the Pentiums for regular Windows. I replied that such
an artificial divide would be untenable. Right now my colleague is way the
heck ahead on points. Sigh.
As I write this, Intel is deciding which of three product lines will get the
first production allotment of its new 0.25-micron fab. The one that gets the
production slot will be the one with leading performance and will likely
recover the industry lead from the DEC 21164. The three contending products
are the regular Pentium, a new version of the Pentium with added multimedia
instructions (the P55), and the Pentium Pro.
You would think the Pentium Pro would be awarded that production slot, but if
Pentium Pros are only going to be sold to NT users then not many will be sold.
Intel is a mass-production company which cannot make a profit selling a very
few Tiffany CPUs. (You don't suppose Intel is quietly redesigning the Pro's
16-bit mode, do you? Naah. Too sensible. Face would be lost.)


The Emperor's New Fabs


A few years back the latest fad in the semiconductor industry was the fabless
chip-design firm. Now Cyrix has what's generally regarded as a really good
x86-compatible chip design, but has no leading-edge production capacity.
Neither IBM nor Thomson CSF, both of which have contracted to build chips for
Cyrix and for sale under their own names, has significant available production
capacity.
The spiffiest of CPU designs is worthless unless the part can be built, and in
the mass PC marketplace, that means built by the millions. I tell everybody
that Intel should buy Cyrix; everybody tells me I'm nuts. I guess Intel has as
severe a case of "not invented here" as most design companies.


AMD's K5 Bellyflop


A bellyflop won't win the Olympic diving competition this summer in Atlanta,
and AMD's K5 processor, the one that was going to drive Intel out of the
Pentium business, isn't going to win any computer system sales at all. After
trumpeting its fabulous (impending) performance for what seemed like years,
AMD finally produced working K5s and discovered, to its horror, that they were
pathetically slow (the SuperSparc scenario, only worse).
AMD's recovery plan is to rename the K5 as the SSA/5 to compete in the
shrinking upscale-486 arena, not with Pentiums. AMD has purchased NexGen and
renamed two NexGen developments as the new future K5 and K6. Like all future
CPUs, these are world beaters. Like all future CPUs, the reality that
eventuates may or may not prove propitious.


DEC's Fast-Clock CPUs


In the context of this column, DEC is more sinned against than sinning. DEC's
chip designers decided years back, when the Alpha was being designed, that a
fast clock was what it would take to provide industry-performance leadership
as measured by SPECmarks. They were right. Ever since the Alpha was
introduced, DEC led the SPECmark derby, as measured by SPECint89 and then
SPECint92, usually by a large margin.
Most of the members of SPEC are CPU producers. The other members got tired of
eating DEC's exhaust, so the new SPECint95 has been devised. Using this new
metric, DEC was knocked out of the SPECmark lead with no change whatever in
anyone's CPU designs. (True, DEC has regained the lead, but by the narrowest
of margins.)
If you think this result is unrelated to the nature of the new SPECint95, I
have some Montana beachfront property for sale. Other writers have explained
the what and how of this new benchmark; I'm going to tell you why.
Higher performance can be attained by either pushing up the clock frequency of
a CPU with few execution units (the DEC approach), or by using a more moderate
clock frequency and using superscalar parallelism with many execution units
(the approach used by everybody else).
The higher the clock frequency, the worse the performance hit of a cache miss
and stall while data is fetched from DRAM. DEC's 21164 is the first CPU to
include on-chip both the now-conventional L1 cache and a larger L2 cache
that's slower than L1 but still faster than the external cache (L3, in this
case) built from fast static RAM. In case there's a cache miss in both the
21164's internal L1 and L2 caches, DEC uses 8 MB (!) of L3 cache in its
fastest systems, again to
minimize cache misses.
The new SPECint95 test suite has been specifically designed to overload all
caches, including DEC's 8 MB, and force numerous data fetches from DRAM. This
slows down everyone's systems, not just DEC's. But since DEC uses the
fast-clock approach, this brings DEC's performance back to the front of the
pack. To overload an 8-MB cache (with enough margin to prevent DEC from simply
switching to, say, a 32-MB cache) requires a really big data set. That's why
it takes two days to run the new SPECint95 test suite.
If your application doesn't overflow conventional static-RAM caches, then a
DEC machine will perform better against the competition than SPECint95 would
indicate. If your application overflows conventional, smaller caches but
doesn't overflow DEC's 8 MB, then you definitely want to run your application
on a DEC system, all other factors being equal. If your application overflows
all caches, then SPECint95 is a very good performance indicator.


Here There be Dragons


In their attempts to provide us end users with excellent performance, and to
one-up their competitors, CPU designers are constantly pushing into terra
incognita. It's not possible to predict what strange beast is going to jump
out of the bushes to gnaw on the chip-designers' ankles, or worse.
All leading-edge CPU designs are world beaters at the start of the project,
else management would not commit the enormous piles of money needed to develop
such a device. When (if) the CPU sees first working silicon and its
performance can be measured, we then learn whether the PR puffery was
accurate. Nobody ever sets out to build a slow CPU.
If AMD/NexGen's upcoming K6 proves as fast as the PR folk claim, I'll finally
be able to run Win95 as fast as my 8-MHz Z80 used to run CP/M.






20/20


Peace Maker




Al Williams


Al, a consultant specializing in software development, training, and
documentation, is the author of Steal This Code! (Addison-Wesley, 1995). You
can find Al on the Web at
http://ourworld.compuserve.com/homepages/Al_Williams.


What do computer programming and religion have in common? (Falling to your
knees and praying to meet a deadline doesn't count.) It always strikes me that
programming tools and languages develop almost religious followings. If you
want to start a jihad, tell a group of programmers what text editor you use!
Language choice is at least as bad. I once worked with a very bright engineer
who wanted everything written in APL and WordPerfect macro language (that's
all he knew, so there's no danger of him reading this). Everyone knows that
Visual Basic is the answer to all programming problems. No, C++ is. Wait,
Delphi works for everything. A true Tower of Babel.
Since I've survived many languages (all the way back to Fortran IV), I find
the language crusades amusing. Imagine going to a hardware store and saying,
"I need to drive screws, nail up a fence, and cut my grass. Should I buy a
screwdriver or a hammer?" Programming is much the same. Tools work best when
applied to certain problems. I've pounded a few nails with a screwdriver, but
it didn't work very well. I'd never try to cut the grass with a screwdriver
(if I ever cut my grass, that is).
In this installment of "20/20," I'll show you how to create DLLs using Visual
Basic (VB) 4.0 and Borland's Delphi 2.0. I'll use DLLs from Delphi, VB, and
(for variety) C++. This will show off several interesting features of the
languages:
How to construct and use OLE DLLs from VB.
How to use OLE DLLs from Delphi and C++.
How to construct and use non-OLE DLLs from Delphi or use them with VB or C++.
You'll also see how to use a simple Telephony API (TAPI) call to add phone
dialing to any Windows application. If you program alone, you might not care
to learn multiple languages. Certainly, most languages can do many common
tasks. However, for most programmers, there are several reasons you might
consider mixing languages:
Another group in your company uses VB, but your group uses Delphi.
You've written code for a client in Delphi and would like to resell it to a
client that uses VB.
You find it easier to solve a particular problem using one language.
You need to interface to third-party programs that support one language, but
not the one you are using.
Even if you don't need to mix languages, it is interesting to see the
differences between Delphi DLLs and their VB counterparts.


Lingua Franca


If you look closely at the Windows programming environment, you'll notice that
DLLs are truly the universal code. Any language that can make calls to the API
certainly has some facility for calling DLLs, since the Windows API is just a
collection of DLLs. If you are careful, you can write DLLs that almost any
program can use. The key is to use standard calling conventions and data
types.
An alternative to standard DLLs is OLE. Although you might think of OLE as an
end-user tool, the underlying technology provides a good way to make objects
for software. You can make simple objects or full-blown controls (OCX
controls).
Delphi can create and use both OLE and non-OLE DLLs. Visual Basic can use
either type, but can only create OLE DLLs (unless you use third-party tools;
see the text box "Conventional DLLs with VB" for more details). Which type is
better? That depends. If you use Visual Basic, OLE DLLs are very easy to use.
They aren't much more difficult to use in Delphi, but they are more complex
than ordinary DLLs. Using ordinary DLLs in C or C++ is trivial. Using OLE DLLs
in C++ is a bit daunting and requires heroic efforts in C.


The Problem


Suppose you are working with other programmers on a suite of related programs.
All of the programs need access to a common phone book and they should all be
able to place a voice call as part of their features. However, some of the
applications are in C++, some are in VB, and still others are in Delphi. You
are the only member of the team that understands TAPI (for more information on
TAPI, see the accompanying text box entitled "About Assisted TAPI") and you've
already written some code in Delphi to dial a number. Too bad the phone book
programmer used Visual Basic.
Sure, in this case you could punt and make the dialer and phone book separate
programs, but in other cases that might not be an option. Besides, there are
many good reasons for integrating the programs into a single unit.
In a perfect world, everyone would work with one language, but this is our
world. You'll have to glue Delphi and VB code into different programs in
different languages. Obviously, you'll want to use some DLLs.


One Solution


If you want a DLL in VB, you have to go the OLE route--you don't have much
choice. That means the phone book will become an OLE object in a DLL (an
inproc OLE Server for you OLE people). Delphi can create either type of DLL.
To keep things simple, I decided to put the phone dialing code in an ordinary
DLL. Figure 1 shows the system design.
Certainly, there are other approaches. In this case, the Delphi code is very
simple. You could just rewrite it. But what if it was more complex? You could
also turn the Delphi code into an OLE DLL similar to the phone book. Since the
programs that access the phone book already need to deal with OLE objects,
that wouldn't be a very big problem. Still, ordinary DLLs are usually simpler,
so I decided to stick with them where possible.


Creating the VB DLL


When you create a VB DLL, you are really writing an OLE server. The main
interface to the server is a VB class object. That object may display forms
and do almost anything else you need. Here are a few things you can't do:
Use modeless forms. If you show a form, make sure to pass vbModal as an
argument to Show.
Use the End statement.
Pass or return objects that are private.
Read a command string via Command$.
To start the DLL project, select New Project from the File menu. Figure 2
shows the dialog that results. On the Project tab, you'll select Sub Main as
the startup form. It isn't really a form, of course, but all DLLs must begin
with Sub Main. They can't have a startup form because those are modeless.
You'll also want to pick OLE Server in the StartMode options. This doesn't
affect your project, but it does affect the way VB starts your DLL for
debugging. The purpose of a server is to supply objects to other programs. If
you start the server normally, it will notice that no other programs are using
its objects and will exit. When StartMode is set to OLE Server, VB will keep
your DLL open so you can start the other program and begin debugging. The
project name you select (PhoneBk, in the example) is part of how other
applications will identify your objects.
If you like, VB can also raise a run-time error if you try to do anything that
OLE DLLs don't allow. Go to the Advanced tab and select OLE DLL Restrictions.
Since you selected Sub Main as the DLL's entry point, it makes sense that
you'll need to create this subroutine. Insert a standard module, and create a
Main subroutine (just type Sub Main() in the code window and press Enter). If
you don't need any startup code, you can just leave the subroutine empty. For
debugging purposes, I usually put a temporary line here to show the form I'm
writing. However, you must remove that line before releasing your DLL to
others.
The next step is to create a class module. It is simple to insert a class
module using the Insert menu. Look at the properties for the module in Figure
3. The Name property is what other programs will use to create the object. In
this case, the complete object name is PhoneBk.Phonebook. The Public property
is True--without this, the object is not accessible to other programs (and
therefore, not an OLE object). Finally, the Instancing property must be
Creatable MultiUse. This allows other programs to successfully create multiple copies of
the object without starting multiple instances of your DLL. You must use this
option to create OLE DLLs.


Method and Madness


As with any other class module, you'll need to fill this class out with
methods and properties. You can find out more about VB's class modules in my
previous "20/20" column; see Dr. Dobb's Sourcebook, March/April 1996. The
phone book only needs one method: GetEntry (see Listing One). This function
fills in a name and phone number string for the caller. It returns True if it
succeeds, and False if the information it returns in the strings is invalid.
The code simply shows a modal form to collect the information and returns. The
form itself (available electronically; see "Availability," page 3) is
completely unremarkable. Although I could have used VB's extensive database
support, I wanted to keep the example simple, so the phone book resides in a
simple flat file. The name of the file is C:\PHONE.DAT by default. You can
alter the file name by changing the HKEY_CURRENT_USER\Software\VB and VBA
Program Settings\PhoneBk\Database entry in the registry. Set that key's File
value to the full path of the data file you want to use.
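Conveniently, that registry branch is the one VB's own GetSetting function
reads, so the DLL can fetch the path in one line (a sketch; the default value
shown is my assumption):

```vb
' Reads the File value under HKEY_CURRENT_USER\Software\VB and VBA
' Program Settings\PhoneBk\Database, falling back to C:\PHONE.DAT.
Dim DataFile As String
DataFile = GetSetting("PhoneBk", "Database", "File", "C:\PHONE.DAT")
```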
If you added a line to show the form in Sub Main, you can test the DLL very
easily: Just run it. When the program works to your satisfaction, remove the
testing line from Sub Main, and select Create OLE DLL from the File menu.
That's it. Unless, of course, you would like to distribute your DLL to another
machine.


Installation Woes


If you are content to use your DLLs on your own machine, VB does all the
registration work for you. On another machine, however, you must make the
crucial entries in the system registry yourself; without them, other programs
can't use your DLL. There are many possible ways to register your DLL, but the
simplest is to use the REGSVR32 program (found in the \TOOLS\PSS directory of
your VB CD-ROM). Just run REGSVR32 and pass the name of your DLL as an
argument. REGSVR32 does all the work.
Without REGSVR32, you would have to find all the registry entries related to
your server (there are many) and install the same entries in the target
machine's registry.
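For example, if the compiled server ends up as PHONEBK.DLL in C:\MYDLLS (both
names are hypothetical; substitute your actual path), the entire registration
step is:

```
C:\> REGSVR32 C:\MYDLLS\PHONEBK.DLL
```

REGSVR32 also accepts a /u switch to remove the entries again when you want to
unregister the server.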


Creating the Delphi DLL


Although you can create OLE DLLs in Delphi, it is much easier to use ordinary
ones. The steps are simple:
1. Create a DLL project using the New menu item.
2. Modify the provided code to use any units you need.
3. Add any necessary initialization code in the unit's main section.
4. Write any functions you want the DLL to expose. Use the export modifier.
Usually, you'll want to use the stdcall modifier, too. 
5. Add an exports section and name any exported functions you wrote in step 4
in that section.
You can find an example of this in Listing Two. Since this is a DLL, Delphi
doesn't automatically create your main form and associated objects the way it
does for ordinary programs. Instead, the code in Listing Two creates the form
(from the dialer unit) explicitly, as Example 1 shows.
The only other interesting feature of the dialer is that it uses TAPI to dial
the phone. See "About Assisted TAPI" for a quick overview of using TAPI to
manage a telephone connection.


Using the DLLs


Since the dialing DLL does not use OLE, it is simple to use from all three
target languages. Example 2 shows the declarations required for Delphi,
Visual Basic, and C++. Once you have the proper declaration, you can use the
DLL call as you would any native call.
The real trick to using DLLs is matching the calling conventions and
parameters. VB can only call DLLs that use the stdcall calling convention.
That's why the Delphi DLL uses the stdcall modifier. Pascal and Basic use
similar string types, and most languages use the same representation for
integers. Be prepared to experiment if you pass other data types around.
The OLE DLL takes a little more work. How much work depends on which language
you're using. In VB, using the OLE DLL doesn't take much more effort; see
Example 3(a). You declare a variable of type PhoneBk.Phonebook and specify the
new keyword. This creates an object that looks like an ordinary class object,
but is actually an OLE DLL.
In Delphi, the procedure is nearly as straightforward. In Example 3(b), the
program creates a Variant variable. Then, a call to CreateOleObject binds the
OLE DLL to that variable. Armed with the variable, you can make calls to the
DLL as you would any object.
The C++ program, of course, has to do the most work; see Example 3(c). This is
an MFC program, so it first has to call AfxOleInit(). This sets up the OLE
libraries. Later, when the program needs to load the object, it has to convert
the name PhoneBk.Phonebook into a class ID (CLSID). The CLSIDFromProgID call
handles that. Armed with the CLSID, the program can bind the object to a
_PhoneBook variable (created with Class Wizard) using the CreateDispatch
member function. Finally, the string format is different for C++, so the
program calls AfxBSTR2CString to make the appropriate conversions. Whew!
Another problem with C++ is that it insists on renaming functions without your
knowledge. A stdcall function gets an underscore prefixed to its name, followed
by an "@" sign and the number of bytes its arguments occupy on the stack.
However, the DLL's exported names don't use this format (neither do the
standard Windows API exports). The only way
around this is to build a dummy DEF file and use it to construct an import
library that matches the DLL's entry points to the oddball C++ names. More
details on the DUMMY.DEF file are available electronically. You will also find
simple example programs that use the phone book and dialer DLLs. Although none
of them do anything else, there is no reason they couldn't be three
full-fledged programs, each written in a different language.


Other Considerations


Of course, managing a project that uses multiple languages can be a nightmare.
Over the long haul, it is best to consider consolidating your code into one
language. However, if you don't have a choice, it is certainly possible to
glue together code from a variety of sources.
I'm certain the jihad concerning programming languages will continue for as
long as there are programmers. What's my favorite language? I'll keep that to
myself for now, but you can always ask me next time you see me.
Conventional DLLs with VB
I mentioned in the accompanying article that Visual Basic can only create OLE
DLLs. That isn't strictly true. With Visual DLL from Simply Solutions (Santa
Ana, CA, simply@netcom.com), you can create ordinary DLLs using VB. You can
use Visual DLL to write callbacks or any other ordinary DLL in VB. If you
wanted to create a control-panel applet, for example, this is the ticket.
Visual DLL creates a DLL shell for your VB code and automatically generates
declarations for VB and a C-compatible header file.
With the ability to create ordinary DLLs with VB, you can do practically
anything. Of course, performance may suffer compared to a true compiled C or
Pascal program, but for many applications this isn't a major problem. There
are a few quirks. For example, functions in your DLL appear as subs in VB.
Each function has a parameter named Return_Value to which you assign the
function's return value. Of course, Visual DLL is twisting VB to do something
it won't ordinarily do, so you should expect a quirk or two, but there is
nothing that you couldn't become accustomed to. You can find a slide show
about Visual DLL (VDLLIN.ZIP) in Simply Solutions' CompuServe forum (type GO
SIMSOL).
--A.W.
About Assisted TAPI
The main point of this month's programs was to illustrate how to integrate
code from different languages. The telephone example just happened to be
handy. However, you may have found it surprising that dialing the phone would
be so simple. Well, it wasn't always like that.
In the not-so-distant past, PC vendors noticed that hooking up a phone to a PC
made it more valuable. Sure, there have been data modems for a long time, but
actually using the PC to take and place calls is a relatively new idea. If you
don't believe this has caught on, take a look in the Sunday paper. Try to find
a major retailer selling PCs without a speakerphone/voice mail card in it. As
usual, every vendor used its own unique hardware with special commands. This
made it difficult to write telephone software that works with a variety of
hardware.
To help encourage this market, Microsoft introduced the Telephony API (TAPI),
which serves as an abstraction layer between your program and the telephone
hardware in the same way printer drivers prevent you from needing to know the
specifics about every printer you might encounter. In truth, TAPI serves as
several abstractions, depending on what you want to accomplish. TAPI can
handle small, single-line voice modems or control sophisticated multiline PBX
phone switches.
TAPI can provide your program several views of the hardware. The phone dialer
uses the simplified (or assisted) TAPI services. As the name implies, these
services are simple, and don't do very much. The simplest request (used by the
phone dialer) is tapiRequestMakeCall. This function does exactly what we want:
It places a call on the telephone.
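From C or C++, for example, the entire assisted-TAPI dialer amounts to one
call. This is a sketch, not the article's code; the app-name string is
arbitrary, and you must link with TAPI32.LIB:

```cpp
#include <windows.h>
#include <tapi.h>   // assisted TAPI; link with TAPI32.LIB

// Hand the call off to the system's call-control application.
// tapiRequestMakeCall returns zero when the request is accepted.
BOOL DialNumber(const char *number, const char *calledParty)
{
    return tapiRequestMakeCall(number, "TAPI Sketch",
                               calledParty, NULL) == 0;
}
```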
Without assisted TAPI, you would have to do much more work: You'd need to set
up a TAPI callback, obtain a telephone line, place the call, and process TAPI
messages. This would still be an improvement over manually controlling the
hardware.

--A.W.
Figure 1: System overview.
Figure 2: Starting an OLE DLL.
Figure 3: OLE DLL class module properties.
Example 1: Creating a form using the DialPhone procedure.
TDlg:=TDialDlg.Create(nil); { no parent window }
 .
 .
 .
if (TDlg.ShowModal=mrOK) ... { do it}
Example 2: Using an ordinary DLL. (a) Visual Basic; (b) Delphi; (c) C++.
(a)
Declare Function DialPhone Lib "dialdll.dll" (ByVal nm As String,
 ByVal num As String) As Integer

(b)
function DialPhone(n : PChar; n1 : PChar) : integer; stdcall;
 external 'dialdll.dll';

(c)
// This requires an import lib constructed in an odd way
// See dummy.def for details
extern "C" {
int _stdcall DialPhone(char const *name, char const *num);
}
Example 3: Using an OLE DLL. (a) Visual Basic; (b) Delphi; (c) C++.
(a)
Dim pb As New PhoneBk.Phonebook
If pb.GetEntry(nam, num) Then
 .
 .
 .

(b)
var
 pb:Variant;
 .
 .
 .
begin
 pb:=CreateOleObject('PhoneBk.Phonebook');
 if (pb.GetEntry(nam,num)) then
 .
 .
 .

(c)
 CString nam,num;
 BSTR bnam,bnum;
 CLSID clsid;
 _PhoneBook pbook; // Class generated by Class Wizard
 // Init pbook
 COleException e;
 if (CLSIDFromProgID(OLESTR("phonebk.phonebook"), &clsid)
 != NOERROR)
 {
 MessageBox("Error");
 EndDialog(IDABORT);
 }

// Load DLL and attach to pbook variable
 if (!pbook.CreateDispatch(clsid, &e))
 {
 MessageBox("Error");
 EndDialog(IDABORT);
 }
 pbook.GetEntry(&bnam,&bnum);
 AfxBSTR2CString(&nam,bnam);
 AfxBSTR2CString(&num,bnum);

Listing One 
VERSION 1.0 CLASS
BEGIN
 MultiUse = -1 'True
END
Attribute VB_Name = "PhoneBook"
Attribute VB_Creatable = True
Attribute VB_Exposed = True
Public Function GetEntry(ByRef nam As String, ByRef num As String) As Boolean
 Form1.Show 1
 If Form1.Result Then
 nam = Form1.Result_Name
 num = Form1.Result_Num
 End If
 GetEntry = Form1.Result
End Function

Listing Two
library dialdll;
{ Important note about DLL memory management: ShareMem must be the
 first unit in your library's USES clause AND your project's (select
 View-Project Source) USES clause if your DLL exports any procedures or
 functions that pass strings as parameters or function results. This
 applies to all strings passed to and from your DLL--even those that
 are nested in records and classes. ShareMem is the interface unit to
 the DELPHIMM.DLL shared memory manager, which must be deployed along
 with your DLL. To avoid using DELPHIMM.DLL, pass string information
 using PChar or ShortString parameters. }
uses
 SysUtils,
 Classes,
 dialer in 'dialer.pas' {DialDlg};
function tapiRequestMakeCall(num : PChar; app : PChar; name : PChar;
 comment : PChar): Integer; stdcall; external 'TAPI32' name
'tapiRequestMakeCall';
function DialPhone(Name : PChar;Num : PChar) : integer; stdcall; export; {
might add auto}
var TDlg : TDialDlg;
begin
 TDlg:=TDialDlg.Create(nil);
 TDlg.Telno.Caption:=Num;
 if (TDlg.ShowModal=mrOK) then
 result:=tapiRequestMakeCall(Num,'Delphi Dialer',Name,nil)
 else
 result:=-1;
end;
exports
 DialPhone;
begin
end.
End Listings
































































SOFTWARE AND THE LAW


Servicing, Upgrading, or Interfacing with Someone Else's Software




Marc E. Brown


Marc is a patent attorney and shareholder in the intellectual-property law
firm of Poms, Smith, Lande, & Rose in Los Angeles, CA. Marc specializes in
computer law and can be contacted at 73414.1226@compuserve.com.


I recently took my six-year-old's tricycle to the repair shop. The repairman
adjusted the height of the seat and added a bell. He never questioned for a
moment whether what he was doing was legal.
But software is different. If you earn a living by working with someone else's
products, you need to give the matter careful thought. Surprisingly,
servicing, upgrading, or interfacing with software written by another company
can lead to an unpleasant and damaging lawsuit. Although the line between
legal and illegal activity has not been clearly illuminated, certain
guidelines have emerged.


Loading Customer Software into RAM During Service


Consider MAI Systems Corp. v. Peak Computer, Inc., a 1993 decision by the
Ninth U.S. Circuit Court of Appeals.
MAI Systems manufactured computers and designed software to run in those
computers. The software included an operating system, utilities, and
diagnostics. Peak maintained MAI computers for more than 100 clients in
Southern California. Peak performed routine maintenance and made emergency
repairs. Peak's work on MAI's computers accounted for more than 50 percent of
its business.
When providing maintenance or making emergency repairs, Peak usually booted
the MAI computer, causing the MAI operating system to be loaded in RAM. MAI
also alleged that Peak ran MAI's diagnostic software during Peak's service
calls.
MAI contended that Peak's use of the MAI operating system constituted
copyright infringement. To Peak's surprise (and to the surprise of many in the
software industry), the federal court agreed and enjoined Peak from continuing
this work.
Under copyright law, making an unauthorized "copy" of software is an
infringement. The principal question addressed by the MAI case was whether a
copy was being made simply by loading software into RAM from a hard disk. 
The Copyright Act defines "copy" as a "material object...in which a work is
fixed." A work is defined to be "fixed" "when its embodiment...is sufficiently
permanent or stable to permit it to be perceived...for a period of more than
transitory duration." After the operating system was loaded, the court noted
that Peak was able to view a system-error log and to diagnose the problem with
the computer. The court then concluded that a copy was being made by loading
the operating system into RAM. Because the customer was not authorized to
allow another company to make such a copy, each service call generally
resulted in a copyright infringement.
The MAI decision was criticized by many legal scholars. But this federal
appellate court was not listening. Last year it reaffirmed the principles of
MAI in Triad Systems Corp. v. Southeastern Express Co.
Triad manufactured computers for use by automotive part stores. Triad included
an array of custom software with each computer, including an operating system,
applications, utilities, and diagnostics. Southeastern was an independent
service organization (ISO) that serviced and maintained Triad computers in
competition with Triad. Just as in MAI, Southeastern frequently used the
software that Triad provided during its service and maintenance calls. 
Following its earlier decision in MAI, this federal court again concluded that
Southeastern was making an unauthorized copy of the software by loading it
into RAM.
Southeastern urged the defense of "fair use," a strategy not attempted in the
MAI case. Although it may have made an unauthorized copy, Southeastern argued
that it was nevertheless fair, and thus legal, for it to have done so.
Fair use is a recognized defense to an allegation of copyright infringement. A
common example is a book review that quotes a paragraph from the book.
Although a portion of a copyrighted work has been copied, that copying is not
deemed to be an infringement because it is a fair use of the copyright.
But this defense was rejected in the Triad case. The court found that it was
not fair to make the copy because the entire program (not just a segment) was
copied, because the copying did not result in any new creative work, and
because there "was no appreciable public benefit" from the activity.
"Southeastern [was] simply commandeering its customer's software and using it
for the very purpose that, and in precisely the manner in that, it was
designed to be used." On February 26, the Supreme Court rejected, without
comment, a request by Southeastern to overturn the decision of the Ninth
Circuit.


Distinguish Between a License and a Sale


Section 117 of the Copyright Act states that "it is not an infringement for
the owner of a copy of a computer program to make or authorize the making of
another copy or adaptation of that computer program provided...that such a new
copy or adaptation is created as an essential step in the utilization of the
computer program...."
You could credibly argue that the service and maintenance that was done in
both MAI and Triad was an "essential step in the utilization of the computer
program," and thus immune to an allegation of infringement. The problem faced
by the defendants in these two cases was that the software that their
customers were using had been "licensed" to the customers, not "sold." In both
of these cases, the court noted that the duplication rights provided under
Section 117 only applied to an "owner" of a copy. The court concluded that a
"licensee" was not an "owner."
Congress is currently considering a bill (H.R. 533) that would substitute the
word "possessor" for "owner" in Section 117. The bill is intended to eliminate
the prohibition against "copying" established by the MAI and Triad decisions,
even when the customer only receives a license.


Examine the License Agreement


In the meantime, a customer's custom software should not be operated without
first checking the documentation that the customer received from the software
vendor.
Today, most software vendors do not "sell" their software--they "license" it.
(Although the MAI and Triad decisions underscore one reason for these
licenses, there are others. The uses of the software can be more easily
restricted and exposure to implied warranties is reduced.) The mere presence
of a license agreement, of course, does not necessarily prohibit use of the
customer's software by an ISO. The terms of the license must be studied to
determine whether the contemplated use is authorized.
Examine the license agreement to determine whether the software can be used by
persons other than the customer. Then see if the contemplated use of the
software is authorized. It is important to understand that the absence of
language prohibiting the contemplated activity is not decisive. According to
MAI and Triad, loading software in RAM will be a copyright infringement,
unless the act is authorized in the license agreement. Thus, it is important
to find language that can be fairly interpreted to authorize the contemplated
activity, not merely the absence of language that prohibits it. 


Modifications and Upgrades May Also Constitute an Infringement


The right to make "copies" is not the only right protected by copyright law.
Another is the right to prepare "derivative works" based upon the copyrighted
work. ISVs often modify software written by someone else. Modifications are
made to fix bugs, improve existing features, or add new features. Almost any
modification or upgrade might be characterized as a "derivative work."
Permission to make this derivative work should therefore be obtained from the
owner of the copyright.
Some software is designed to interface with other software and is promoted for
such use. The presence of this promotional material is probably sufficient to
provide the needed license by implication. 
If there is no such implied license, an express license will be needed to
avoid the risk of an infringement charge. If it is not contained in the
customer's licensing agreement, it should be obtained from the copyright owner
before proceeding.



Reverse Engineering to Facilitate Interoperability 


The pendulum swings the other way on the concept of "reverse
engineering"--disassembling object code for the purpose of extracting the
concepts that the software implements. The fact that the disassembly is done
to facilitate interoperability does not make the disassembly illegal.
The leading case in this area is Sega Enterprises, Ltd. v. Accolade, Inc., a
federal appellate court decision in 1992.
Sega developed and marketed video games, including the "Genesis" console. A
different game could be played on the console simply by plugging in a
different cartridge. Sega designed and sold game cartridges and also licensed
other companies to do the same.
Accolade was an independent developer of computer game software. Accolade
wanted to obtain a license from Sega, but decided against it when Sega
insisted that Sega be the exclusive manufacturer of all games produced by
Accolade.
Instead, Accolade purchased three different Sega game cartridges on the
marketplace. It decompiled the object code and wrote a manual containing the
functional descriptions that were necessary for the game cartridge to
interface with the Genesis console. The manual did not contain any source
code. It was given to Accolade's programmers. They used the manual to design
other game cartridges compatible with the Genesis console.
One of the functional specifications was a header code that Accolade believed
was needed to initialize the Genesis console. Accolade therefore copied this
precise header in its game cartridges. Upon detecting the header, the Genesis
console displayed the erroneous message "PRODUCED BY OR UNDER LICENSE FROM
SEGA ENTERPRISES, LTD."
Sega alleged that Accolade's conduct constituted copyright infringement. The
federal court agreed that the disassembly process that Accolade used created
an unauthorized copy of the copyrighted software. Notwithstanding, the federal
court concluded that Accolade's activities constituted a "fair use."
The court concluded that the functional specifications that Accolade extracted
were not protected by copyright law. Extraction of these unprotectable
components through disassembly of object code was therefore a "fair use." The
header code, on the other hand, was not simply a functional specification.
Rather, it was a specific implementation of a functional specification. This
was a type of information that traditionally had been accorded copyright
protection.
The federal court nevertheless held that Accolade had the right to extract and
utilize this header. The court permitted such an unauthorized use because
"there [was] no other method of access to the computer that [was] known or
readily available to rival cartridge manufacturers." Although Accolade's
unauthorized use of the header caused the game cartridge to display the false
message that the game was "PRODUCED BY OR UNDER LICENSE FROM SEGA ENTERPRISES,
LTD.," the court held that this impropriety was the fault of Sega for
designing the system to display this message, not Accolade.
A district court in Texas followed the Accolade decision in DSC Communications
Corp. v. DGI Technologies, Inc. DSC made digital switching equipment,
including a microprocessor card containing firmware. DGI wanted to compete in
the sale of a compatible microprocessor card. It purchased a DSC card, made a
copy of the firmware, disassembled the firmware, and made flow charts from the
disassembled source code. These flow charts were used to make the compatible
microprocessor card. 
Following the Sega decision, the court held that DGI's efforts were lawful,
notwithstanding the absence of DSC's consent. The court found that the flow
charts were not protected by the copyright on the DSC firmware. The court also
found that disassembly of the firmware was the only means to gain access to
the unprotectable flow charts.
All of the "reverse engineering" cases, however, critically hinge on one
common fact--lawful means were used to acquire the object code that was
disassembled. In the Sega case, the defendant purchased three game cartridges
on the marketplace. In the DSC case, the defendant extracted the object code
from a microprocessor board that it had lawfully purchased. If the object code
is stolen or otherwise obtained or decompiled in breach of a contract, it is
likely that no "fair use" would be found. 


Trade-Secret Restrictions May Apply


Software-licensing agreements often include clauses prohibiting the customer
from letting others see the software or from allowing anyone, including the
customer, to decompile the software. These types of restrictive clauses can
prohibit conduct that would not even be a copyright infringement, such as
decompilation of object code. They are based on the theory that the software
is a trade secret.
There is some doubt whether these clauses would be enforced when used with
software that is widely distributed to consumers through the retail market.
Trade secret protection is usually not afforded to information unless it truly
is a secret. It is questionable whether software that is widely distributed to
consumers through retail sales can retain a reasonable degree of secrecy, no
matter what contract language is used. The law also requires the trade-secret
owner to take reasonable steps to protect its secrecy. Widespread distribution
might be found to violate this additional requirement.
But for custom software, these trade-secret agreements can often be quite
effective and should usually be respected.
As a practical matter, of course, the third-party developer may not be aware
that his customer has agreed to protect the confidentiality of its software or
to restrict its use. Such ignorance could conceivably support a defense
against a claim of trade-secret misappropriation. But proceeding in this
manner is risky. The trade-secret owner might be able to successfully argue
that the third-party developer had reason to believe that confidentiality
restrictions were imposed, even if he had no actual knowledge of them.
Evidence of industry custom, as well as knowledge of such contracts with other
customers, might be cited. Even if the third-party developer escapes
liability, his customer might not be so lucky.


Conclusion


Working with another company's software is a risky business. Each step of the
effort can potentially violate a multitude of legal rights. And the governing
legal principles are only just beginning to crystallize.
The safest route is to fully describe the exact efforts that are contemplated
in writing and to have the owner of the software copyright (not merely the
customer) expressly consent to this activity in writing.
If strong objection to the activity is predicted, the greatest care must be
exercised. The customer's licensing agreement should be carefully reviewed,
preferably by an attorney. Consideration should also be given to obtaining a
written commitment from the customer to indemnify the third-party developer if
a lawsuit is filed. Procuring insurance should also be considered. 
U.S. Supreme Court Splits on Duplication of Menu Tree
Early last year, many in the software industry were shaken while others
cheered when the First Circuit federal appellate court ruled that Borland had
the right to copy the entire menu tree used in the Lotus 1-2-3 spreadsheet
program, free of any claim of copyright infringement. The court ruled that the
menu tree was a "method of operation" that was not protected by copyright law.

The ruling left many wondering what other types of software copying would be
permitted under the rubric of a "method of operation." What about a
programming language? What about structures necessary to achieve
interoperability? Could one argue that the entire content of a software
program is merely an unprotectable "method of operation?"
Many questioned the soundness of the decision, including the United States
Supreme Court, which agreed to review the decision last year.
But early this year, the U.S. Supreme Court surprised many by issuing a
one-sentence decision stating that "The judgment of the United States Court of
Appeals for the First Circuit is affirmed by an equally divided court. Justice
Stevens took no part in the consideration or decision of this case." No
reasons or explanations were provided. 
The evenly split vote of the Supreme Court leaves the issue unresolved.
Although many lower courts will probably now allow menu trees to be copied, we
will have to wait again for the final word.
--M.E.B.






















THE SOFTWARE ENGINEER


Let the Games Begin




Allen Holub


Allen is a programmer, educator, and OO design consultant. He can be reached
at http://www.holub.com or allen@holub.com.


This column is about software engineering, about how to make high-quality
programs using the best tools at hand. I'll be looking at design techniques
and patterns, programming styles, and language issues. You'll find a lot of
code in the coming months, but the code will be here to illustrate certain
design techniques, not as an end in itself. I'll also share the occasional
book and product review when I find something relevant to the design process.
Initially, my focus will be on object-oriented (OO) design and C++, but that
could change. I'm starting with OO and C++ because these are the most commonly
used tools today, and because they are the tools that I use in my daily work.


Who is this guy, anyway?


Since this is a new column, I should probably introduce myself. Some of you
will know me from "The C Chest," a column I wrote for Dr. Dobb's Journal from
1983 to 1987. My focus then was on practical programming, good design, and
good style in C, and I intend to continue that focus here, but without
limiting the discussion to a single language.
I've been working in the computer field since 1979 and as an independent
consultant since 1983. I've written all sorts of programs during this time:
robot controllers, small operating systems, compilers, lots of UNIX and
Windows applications--you name it. Lately, I've been using OO design
techniques and C++ almost exclusively. I've also been teaching a lot, both
through the University of California, Berkeley Extension, and in-house classes
for private companies. Most of this work has focused on object-oriented design
and C++ (which I strongly believe have to be approached together), primarily
in the Microsoft NT and Windows 95 environments. I've worked a lot with the
Microsoft Foundation Classes (MFC)--which will provide us with a rich set of
examples of bad design practices in future columns--and I often teach
MFC-related topics as well.
In addition to "C Chest," I've written for various computer magazines, am the
"MFC Pro" for inquiry.com (http://www.inquiry.com), and am the author of
numerous books, including Enough Rope to Shoot Yourself in the Foot
(McGraw-Hill, 1995), about programming styles in C and C++.


Committing to Object Orientation


I'm going to kick off the column, not with a discussion of a design process or
technique, but by discussing the problem of moving to a new design--in this
case OO--environment. Since this is a "games" issue, I'll examine how a
particular game can be used as a valuable tool in the adoption of OO.
First, some background: Whether a company can successfully adopt OOD as its
software-development methodology depends largely on the commitment of
high-level management, and "commitment" is the operative word here. ("In a
bacon-and-eggs breakfast, the chicken is involved, the pig is committed.") OOD
has some very real pluses: You really do get better maintenance, code really
does go together faster and less painfully, and once you have a library of
reusable objects, programs really are smaller and easier to write. This
translates to a real-world competitive edge. To get this benefit, you have to
be using the methodology, however, and using OO is a lot like being
pregnant--you are or aren't, there's no middle ground.
The main downside is that switching to OO can be a gut-wrenching process. It's
not surprising that the internal structure of a company reflects its
software-design methodology. For example, if your company follows a
traditional "waterfall" model, you'll probably have sales, design,
engineering, quality-assurance, documentation, and customer-relations or
consulting departments. The program will move from one department to the next
(probably in that order) as it's developed. As you'll see in future columns,
the OO design process does not follow the "waterfall" model at all.
Consequently, any company that adopts OO must change its internal structure to
adapt to the new methodology.
My point is that adopting OO can initially be both disruptive and painful, and
without serious commitment on the part of the upper management, it won't
happen: The company probably will try to take some middle ground that doesn't
involve stepping on too many toes, and that way lies madness. 
So how do you make the changeover succeed? How do you get the commitment
needed to make the change work? The answer is education. Of course, the
engineers and designers must be properly trained or they won't be able to do
the work. Usually this means a formal training program--OO is easy to do
once the ideas "click," but for some reason it's difficult to learn the ideas
from books. You have to learn either from trial and error (which is time
consuming and expensive) or by some sort of guided learning process.
Education at the managerial level is even more important. No matter how good
the programmers are, OO won't work unless the programmers are provided with an
environment in which they're happy. (Most good programmers define "happy" as
being able to do the best job possible with the least amount of interference.)
The only way to accomplish this is for the managers to really understand what
the programmers need, which means that they must thoroughly understand the
design process. But how can they do this if they're not programmers (or if the
last program they wrote was a Fortran 77 accounting package)? There are books,
of course (I'll discuss two at the end of this column), but there's no
substitute for experience.


Playing the OO Game


This brings me back to games. Playing computer games is one of the best ways
for novices to learn how to use computers. By the same token, games that model
the OO design process and, ideally, the behavior of the resulting program,
would be a great way to learn about OO.
To choose such a game, you first have to identify the key concepts in OO
design. Fortunately, there is more commonality between the various
methodologies than differences. All OO design methodologies are concerned
primarily with the management of complexity. Rather than pretend that a
program can be analyzed mathematically and its correctness proved, the OO
designer starts with the assumption that a large computer program cannot be
analyzed. To use a big word, large computer programs are nondeterministic: You
can't predict their behavior for all possible inputs. An OO designer doesn't
try to eliminate complexity, but rather, tries to manage it. The basic tool is
a rather extreme form of data abstraction. An "object" knows nothing about how
most other objects in the system actually do what they do. An object does know
that the objects around it have certain capabilities--a string knows how to
print itself, for example--and it knows how to ask its neighbors to exercise
those capabilities--in this case, by sending the string a print_yourself()
message.
The system I've described, where one "cell" knows only about the surrounding
cells and doesn't have a "big picture," is called a "cellular automaton."
These automata are usually quite simple to implement, and can model quite
complex natural processes (like the behavior of air when penetrated by an
object of a particular shape, say, an airfoil) quite accurately. The cellular
automaton with which most programmers are familiar is the game of life. Look
at each "cell" as representing a city block. If the population of the
surrounding blocks is too high, the cell dies from overcrowding; if the
population is too low, the cell dies from loneliness. Otherwise the cell is
happy as a clam. Life is easy to implement, and fascinating to watch. A
cellular automaton is also a nice application for OO technology since the idea
of a "cell" and an "object" are more-or-less equivalent.
Since most real-world objects have much more complex interfaces than most
cells, life is a bit simplistic for our purposes. There's a great game that
takes the principles of cellular automata in general (and life in particular)
to a complex-enough level to be useful for learning about OO at a gut level,
and that's Maxis Software's Sim City. If you haven't played it, Sim City is a
dynamic model of a city. You are in charge of creating the physical landscape,
zoning, and managing the city's infrastructure (building roads, power plants,
water mains, and the like). You also set tax rates, establish budgets, and so
forth. You are constrained by a budget, and you get your money from taxes.
The city is occupied by virtual inhabitants called "sims" who choose to build
(or not), and move into (or out of) your town based entirely on how well the
infrastructure supports them and how pleasant a place your town is. The goal
of the game is to increase population. If you do things wrong, the city decays
and everyone moves out.
So what does all this have to do with OO? First of all, setting up the
infrastructure is actually a form of programming. The basic object (a city
block) has attributes that you have to define (residential, commercial, and so
on) and will interact with adjacent objects based on those attributes. The
city is typically modularized into neighborhoods based on the zoning, and the
interaction between modules is determined by the transportation
infrastructure. The city as a whole (and the day-to-day movement of people
within the city) nicely parallels the run-time behavior of an OO program.
One of the first lessons you learn is maintenance. If the budget is too low,
roads become impassable, the electrical grid breaks down, and the city stops
working. Not the entire city at first, but only parts of it. If you don't
solve problems as they arise, the entire city collapses. There are a lot of
obvious parallels with computer programs, not only in terms of actually
maintaining programs, but also setting priorities and learning to identify the
complex interactions between seemingly unconnected parts of a program. This is
one of the most difficult lessons for many managers to learn. Changing a
seemingly insignificant part of a program can cause unpredictable major
problems in other far-away locations. The actual cost of making a change is
not just the time spent doing the changing, but also the time spent fixing the
resulting problems.
The next lesson is planning. The first few times you play the game, you tend
not to design the city, but rather let it grow in an ad hoc way. The city
works fine for a while. A road occasionally fails and you fix it. Eventually,
you spend all your time responding to crises, however, and growth stops. Every
tax dollar that comes in has to be spent just to keep the city viable.
Moreover, the situation is quite unstable. If you make even one mistake in
deciding what to fix at a given moment, the entire city collapses. Again, the
parallel to most large computer programs is obvious. Once you start planning,
these problems don't arise. The lessons of city planning apply directly to OO
programs. The only way to build a successful large city, in fact, is to design
the future city before building anything. You must design the infrastructure
to be extensible and learn to modularize your city in such a way that problems
in one location do not ripple to other locations. All these are
characteristics of good program design in general and OO design in particular.


By the Book


Obviously, just playing games is not in itself sufficient preparation for the
move to OO, but it's a good start. I can recommend two books that are relevant
to OO adoption at the managerial level. David Taylor's Object-Oriented
Technology: A Manager's Guide (Addison-Wesley, 1990) is a concise introduction
to OO concepts. Written for nontechnical managers, this book is light reading
for most hard-core techies, but it's a good, quick introduction to OO
concepts.
A much meatier (and more valuable) book is Adele Goldberg and Kenneth Rubin's
Succeeding with Objects: Decision Frameworks for Project Management
(Addison-Wesley, 1995). Goldberg is one of the original developers of
Smalltalk, and probably has more experience with OO development than anyone
alive. This book is not about the OO design process, but rather the process of
adopting OO design. It's full of good advice that comes from practical
experience; I consider it essential reading for any manager who is considering
the move to object orientation.
With respect to this column, a great "intelligent layman's" introduction to
cellular automata is Ivars Peterson's The Mathematical Tourist: Snapshots of
Modern Mathematics (W.H. Freeman and Co., 1988). A similarly wonderful
discussion of life is William Poundstone's The Recursive Universe: Cosmic
Complexity and the Limits of Scientific Knowledge (William Morrow & Co.,
1985).


